Re: which high level Java client
Not following this thread too much, but there is also https://github.com/Netflix/astyanax/ Astyanax is currently in use at Netflix http://movies.netflix.com/. Issues generally are fixed as quickly as possible and releases are done frequently. -sd On Thu, Jun 28, 2012 at 2:39 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: I use Pelops and have been very happy. In my opinion the interface is cleaner than that of Hector. I personally do like the serializer business. -Original Message- From: Radim Kolar [mailto:h...@filez.com] Sent: Thursday, June 28, 2012 5:06 AM To: user@cassandra.apache.org Subject: Re: which high level Java client I do not have experience with other clients, only Hector. But timeout management in Hector is really broken. If you expect your nodes to time out often (for example, if you are using a WAN), better to try something else first. -- Sasha Dolgy sasha.do...@gmail.com
Re: portability between enterprise and community version
I consistently move keyspaces from Linux machines onto Windows machines for development purposes. I've had no issues ... but would probably be hesitant in rolling this out into a production instance. Depends on the level of risk you want to take. : ) Run some tests ... mix things up and share your experiences ... Personally, I could see some value in not really caring what OS my Cassandra instances are running on ... just that the JVMs are consistent and the available hardware resources are sufficient. I don't speak for the vendors mentioned in this thread, but traditionally, the first step towards supportability is finding the problems / identifying the risks and seeing if they can be resolved ... -sd On Wed, Jun 13, 2012 at 10:26 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: Repair (streaming) will not work. Probably schema update will not work either; it was a long time ago, I don't remember. Migration of the cluster between Windows and Linux is also not an easy task, a lot of manual work. Finally, mixed Cassandra environments are not supported by DataStax or by anyone else. Best regards / Pagarbiai *Viktor Jevdokimov* Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
*From:* Abhijit Chanda [mailto:abhijit.chan...@gmail.com] *Sent:* Wednesday, June 13, 2012 10:54 *To:* user@cassandra.apache.org *Subject:* Re: portability between enterprise and community version Hi Viktor Jevdokimov, May I know what issues I may face if I mix a Windows cluster along with a Linux cluster?
Re: RESTful API for GET
https://github.com/hmsonline/virgil Brian O'Neill posted this a while ago ... sits on top of Cassandra to give you the RESTful API you want. Another option ... http://code.google.com/p/restish/ Or, you could simply build your own ... On Tue, Jun 12, 2012 at 8:46 AM, Tom fivemile...@gmail.com wrote: Hi James, No, Cassandra doesn't support a RESTful API. As Tamar points out, you have to supply this functionality yourself, specifically for your data model. When designing your RESTful server application: - consider using a RESTful framework (for example: Jersey) - use a Cassandra client to access your Cassandra data (for example: Astyanax) Good luck, Tom On 06/11/2012 11:15 PM, James Pirz wrote: Hi, Thanks for the reply. But can you tell me how you form your request URLs? I mean, does Cassandra support a native RESTful API for talking to the system, and if yes, on which specific port is it listening for incoming requests? And what format does it expect for the URLs? Thanks in advance, James On Mon, Jun 11, 2012 at 11:09 PM, Tamar Fraenkel ta...@tok-media.com wrote: Hi! I am using Java and Jersey. Works fine. *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Tue, Jun 12, 2012 at 9:06 AM, James Pirz james.p...@gmail.com wrote: Dear all, I am trying to query the system, specifically performing a GET for a specific key, through JMeter (or cURL), and I am wondering what is the best pure RESTful API for the system (with the lowest overhead) that I can use. Thanks, James -- Sasha Dolgy sasha.do...@gmail.com
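Neither Cassandra nor Thrift dictates a URL scheme; a REST layer such as Virgil defines its own routes. As a sketch only — the path shape below is a hypothetical illustration, not Virgil's actual API — a resource-style GET can be modeled as keyspace / column family / row key, with a plain dict standing in for the Cassandra client:

```python
# Hypothetical REST path scheme for a Cassandra GET:
# /{keyspace}/{column_family}/{row_key}. A dict stands in for the store.
STORE = {
    ("demo", "User", "james"): {"name": "James", "city": "LA"},
}

def rest_get(path):
    """Resolve a GET path against the stand-in store; None if missing."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError("expected /keyspace/column_family/row_key")
    keyspace, cf, key = parts
    return STORE.get((keyspace, cf, key))
```

Whatever layer you pick (or build), the port and URL format come from that layer's configuration, not from Cassandra itself.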
Re: how to configure cassandra as multi tenant
Google, man. http://wiki.apache.org/cassandra/MultiTenant http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/about-multitenant-datamodel-td7575966.html http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/For-multi-tenant-is-it-good-to-have-a-key-space-for-each-tenant-td6723290.html On Mon, Jun 11, 2012 at 11:37 AM, MOHD ARSHAD SALEEM marshadsal...@tataelxsi.co.in wrote: Hi Aaron, Can you send me some particular link related to multi tenant research Regards Arshad -- *From:* aaron morton [aa...@thelastpickle.com] *Sent:* Thursday, June 07, 2012 3:34 PM *To:* user@cassandra.apache.org *Subject:* Re: how to configure cassandra as multi tenant Cassandra is not designed to run as a multi tenant database. There have been some recent discussions on this, search the user group for more detailed answers. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/06/2012, at 7:03 PM, MOHD ARSHAD SALEEM wrote: Hi All, I wanted to know how to use cassandra as a multi tenant . Regards Arshad -- Sasha Dolgy sasha.do...@gmail.com
Re: how to configure cassandra as multi tenant
Arshad, I used Google with the following query: apache cassandra multitenant Suggest you do the same? As was mentioned earlier, there has been a lot of discussion about this topic for the past year -- especially on this mailing list. If you want to use Thrift or, to make your life easier, Hector or a similar API, you can create keyspaces however you want ... aligned to your design / architecture to support multitenancy. If it's code-specific help you want ... check out the mailing lists / resources for the various APIs that make working with Thrift easier: Hector Pycassa PHPCassa etc. -sd On Mon, Jun 11, 2012 at 12:05 PM, MOHD ARSHAD SALEEM marshadsal...@tataelxsi.co.in wrote: Hi Sasha, Thanks for your reply, but what you sent is just how to create a keyspace manually using the command prompt. How do I create a (multi-tenant) keyspace automatically using the Cassandra APIs? Regards Arshad -- *From:* Sasha Dolgy [sdo...@gmail.com] *Sent:* Monday, June 11, 2012 3:09 PM *To:* user@cassandra.apache.org *Subject:* Re: how to configure cassandra as multi tenant Google, man. http://wiki.apache.org/cassandra/MultiTenant http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/about-multitenant-datamodel-td7575966.html http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/For-multi-tenant-is-it-good-to-have-a-key-space-for-each-tenant-td6723290.html On Mon, Jun 11, 2012 at 11:37 AM, MOHD ARSHAD SALEEM marshadsal...@tataelxsi.co.in wrote: Hi Aaron, Can you send me some particular link related to multi-tenant research Regards Arshad -- *From:* aaron morton [aa...@thelastpickle.com] *Sent:* Thursday, June 07, 2012 3:34 PM *To:* user@cassandra.apache.org *Subject:* Re: how to configure cassandra as multi tenant Cassandra is not designed to run as a multi-tenant database. There have been some recent discussions on this; search the user group for more detailed answers.
Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/06/2012, at 7:03 PM, MOHD ARSHAD SALEEM wrote: Hi All, I wanted to know how to use cassandra as a multi tenant . Regards Arshad -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
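On the "create keyspaces automatically" question: whichever client you use (Hector, Pycassa, raw Thrift), the per-tenant part usually reduces to deriving a safe keyspace name for each tenant and issuing one create call with it. A minimal, hypothetical naming helper — the `tenant_` prefix and sanitization rules here are illustrative assumptions, not a Cassandra requirement:

```python
import re

def tenant_keyspace(tenant_id):
    """Derive a valid Cassandra keyspace name for a tenant.
    Keyspace names should be simple alphanumeric/underscore identifiers."""
    name = "tenant_" + re.sub(r"[^A-Za-z0-9_]", "_", tenant_id).lower()
    if not re.match(r"^[A-Za-z]\w*$", name):
        raise ValueError("cannot derive keyspace name from %r" % tenant_id)
    return name
```

You would then pass the derived name to your client's keyspace-creation call at tenant-provisioning time rather than typing it into the CLI.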
Zurich / Swiss / Alps meetup
All, A year ago I made a simple query to see if there were any users based in and around Zurich, Switzerland or the Alps region, interested in participating in some form of Cassandra User Group / Meetup. At the time, 1-2 replies happened. I didn't do much with that. Let's try this again. Who all is interested? I often am jealous about all the fun I miss out on with the regular meetups that happen stateside ... Regards, -sd -- Sasha Dolgy sasha.do...@gmail.com
Re: Matthew Dennis's Cassandra On EC2
Although, probably inappropriate, I would be willing to contribute some funds for someone to recreate it with animated stick-figures. thanks. ;) On Thu, May 17, 2012 at 6:02 PM, Jeremy Hanna jeremy.hanna1...@gmail.comwrote: Sorry - it was at the austin cassandra meetup and we didn't record the presentation. I wonder if this would be a popular topic to have at the upcoming Cassandra SF event which would be recorded...
Re: unsubscribe
List-Help: mailto:user-h...@cassandra.apache.org List-Unsubscribe: mailto:user-unsubscr...@cassandra.apache.orguser-unsubscr...@cassandra.apache.org http://wiki.apache.org/cassandra/FAQ#unsubscribe On Mon, Apr 16, 2012 at 8:53 AM, Dirk Dittmar d.ditt...@wortzwei.de wrote:
RE: Using Thrift
Best to read about Maven. It will save you some grief. On Apr 2, 2012 3:05 PM, Rishabh Agrawal rishabh.agra...@impetus.co.in wrote: I didn't find the slf4j files in the distribution, so I downloaded them. Can you help me with how to configure them? *From:* Dave Brosius [mailto:dbros...@mebigfatguy.com] *Sent:* Monday, April 02, 2012 6:28 PM *To:* user@cassandra.apache.org *Subject:* Re: Using Thrift For a Thrift client, you need the following jars at a minimum: apache-cassandra-clientutil-*.jar apache-cassandra-thrift-*.jar libthrift-*.jar slf4j-api-*.jar slf4j-log4j12-*.jar All of these jars can be found in the Cassandra distribution. On 04/02/2012 07:40 AM, Rishabh Agrawal wrote: Any suggestions… *From:* Rishabh Agrawal *Sent:* Monday, April 02, 2012 4:42 PM *To:* user@cassandra.apache.org *Subject:* Using Thrift Hello, I have just started exploring Cassandra from the Java side and wish to use Thrift as my API. The problem is that whenever I try to compile my Java code I get the following error: "package org.slf4j does not exist" Can anyone help me with this? Thanks and Regards Rishabh Agrawal -- Impetus to sponsor and exhibit at Structure Data 2012, NY; Mar 21-22. Know more about our Big Data quick-start program at the event.
New Impetus webcast ‘Cloud-enabled Performance Testing vis-à-vis On-premise’ available at http://bit.ly/z6zT4L. NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
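If you do go the Maven route, the jars Dave lists map onto a handful of dependencies. A sketch only — the version numbers below are illustrative and should be matched to your actual Cassandra release; `libthrift` is normally pulled in transitively by `cassandra-thrift`:

```xml
<!-- Illustrative versions; align them with your Cassandra release. -->
<dependency>
  <groupId>org.apache.cassandra</groupId>
  <artifactId>cassandra-thrift</artifactId>
  <version>1.0.8</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>1.6.1</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.6.1</version>
</dependency>
```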
Re: How to store a list of values?
Save the skills in a single column in JSON format. Job done. On Mar 26, 2012 7:04 PM, Ben McCann b...@benmccann.com wrote: True. But I don't need the skills to be searchable, so I'd rather embed them in the user than add another top-level CF. I was thinking of doing something along the lines of adding a skills super column to the User table: skills: { 'java': null, 'c++': null, 'cobol': null } However, I'm still not sure yet how to accomplish this with Astyanax. I've only figured out how to make composite columns with predefined column names with it and not dynamic column names like this. On Mon, Mar 26, 2012 at 9:08 AM, R. Verlangen ro...@us2.nl wrote: In this case you only need the columns for values. You don't need the column values to hold multiple columns (the super-column principle). So a normal CF would work. 2012/3/26 Ben McCann b...@benmccann.com Thanks for the reply Samal. I did not realize that you could store a column with a null value. Do you know if this solution would work with composite columns? It seems super columns are being phased out in favor of composites, but I do not understand composites very well yet. I'm trying to figure out if there's any way to accomplish what you've suggested using Astyanax https://github.com/Netflix/astyanax. Thanks for the help, Ben On Mon, Mar 26, 2012 at 8:46 AM, samal samalgo...@gmail.com wrote: Plus it is fully compatible with CQL: SELECT * FROM UserSkill WHERE KEY='ben'; On Mon, Mar 26, 2012 at 9:13 PM, samal samalgo...@gmail.com wrote: I would take a simple approach. Create one other CF, UserSkill, with the same row key as the profile CF key. In the user_skill CF, add each skill as a column name with a null value. Columns can be added or removed. UserProfile={ '*ben*'={ blah :blah blah :blah blah :blah } } UserSkill={ '*ben*'={ 'java':'' 'cassandra':'' . . .
'linux':'' 'skill':'infinity' } } On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.com wrote: I have a profile column family and want to store a list of skills in each profile. In BigTable I could store a Protocol Buffer http://code.google.com/apis/protocolbuffers/docs/overview.html with a repeated field, but I'm not sure how this is typically accomplished in Cassandra. One option would be to store a serialized Thrift http://thrift.apache.org/ or protobuf, but I'd prefer not to do this as I believe Cassandra doesn't have knowledge of these formats, and so the data in the datastore would not be human readable in CQL queries from the command line. The other solution I thought of would be to use a super column and put a random UUID as the key for each skill: skills: { '4b27c2b3ac48e8df': 'java', '84bf94ea7bc92018': 'c++', '9103b9a93ce9d18': 'cobol' } Is this a good way of handling lists in Cassandra? I imagine there's some idiom I'm not aware of. I'm using the Astyanax https://github.com/Netflix/astyanax/wiki client library, which only supports composite columns instead of super columns, and so the solution I proposed above would seem quite awkward in that case. Though I'm still having some trouble understanding composite columns as they seem not to be completely documented yet. Would this solution work with composite columns? Thanks, Ben -- With kind regards, Robin Verlangen www.robinverlangen.nl
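The thread's two suggestions can be sketched side by side — one column per skill with an empty value (the UserSkill layout above) versus a single JSON-serialized column. Plain dicts stand in for column families here; this illustrates the layouts, not any particular client's API:

```python
import json

# Option 1: serialize the whole skill list into one column value
# ("json format" suggestion). Human-readable, but not addressable
# per skill without rewriting the whole column.
def skills_as_json(skills):
    return {"skills": json.dumps(sorted(skills))}

# Option 2: one column per skill, empty value (the UserSkill CF above).
# Adding or removing a skill is a single column insert or delete.
def skills_as_columns(skills):
    return {skill: "" for skill in skills}
```

Option 2 is the one that stays CQL-queryable per skill; option 1 trades that for a single read/write.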
Re: cassandra 1.08 on java7 and win7
interesting. that behaviour _does_ happen in 1.0.8, but doesn't in 1.0.6 on windows 7 with Java 7. looks to be a problem with the CLI and not the actual Cassandra service. just tried it now. -sd On Mon, Mar 26, 2012 at 11:29 PM, R. Verlangen ro...@us2.nl wrote: Ben Coverston wrote earlier today: Use a version of the Java 6 runtime, Cassandra hasn't been tested at all with the Java 7 runtime So I think that might be a good way to start.
Re: cassandra 1.08 on java7 and win7
best to open an issue: https://issues.apache.org/jira/browse/CASSANDRA On Mon, Mar 26, 2012 at 11:35 PM, Frank Hsueh frank.hs...@gmail.com wrote: err ... same thing happens with Java 1.6 On Mon, Mar 26, 2012 at 2:35 PM, Frank Hsueh frank.hs...@gmail.com wrote: I'm using the latest of Java 1.6 from Oracle. On Mon, Mar 26, 2012 at 2:29 PM, R. Verlangen ro...@us2.nl wrote: Ben Coverston wrote earlier today: Use a version of the Java 6 runtime, Cassandra hasn't been tested at all with the Java 7 runtime So I think that might be a good way to start. 2012/3/26 Frank Hsueh frank.hs...@gmail.com I think I have the Cassandra server started. In another window: cassandra-cli.bat -h localhost -p 9160 Starting Cassandra Client Connected to: Test Cluster on localhost/9160 Welcome to Cassandra CLI version 1.0.8 Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit. [default@unknown] create keyspace DEMO; log4j:WARN No appenders could be found for logger (org.apache.cassandra.config.DatabaseDescriptor). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Cannot locate cassandra.yaml Fatal configuration error; unable to start server. See log for stacktrace. C:\Workspace\cassandra\apache-cassandra-1.0.8\bin Anybody seen this before? -- Frank Hsueh | frank.hs...@gmail.com -- With kind regards, Robin Verlangen www.robinverlangen.nl -- Frank Hsueh | frank.hs...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
design that mimics twitter tweet search
Hi All, With Twitter, when I search for words like: cassandra is the bestest, 4 tweets will appear, including one I just did. My understanding is that the internals of Twitter work such that each word in a tweet is indexed, irrespective of the presence of a # hash tag, and the tweet id is assigned to a row for that word. What is puzzling to me, and hopefully some smart people on here can shed some light on this -- is how would this work with Cassandra? row [ cassandra ]: key - tweetid / timestamp row [ bestest ]: key - tweetid / timestamp I had thought that I could simply pull a list of all column names from each row (representing each word) and flag all occurrences (tweet id's) that exist in each row ... however, these rows would get quite long over time. Am I missing an easier way to get a list of all tweetid's that exist in multiple rows? -- Sasha Dolgy sasha.do...@gmail.com
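The row-per-word layout described above can be modeled in a few lines, with a dict of dicts standing in for the column family (row key = word, column name = tweet id, value = timestamp). A sketch of the idea only, not client code:

```python
from collections import defaultdict

def index_tweet(index, tweet_id, timestamp, text):
    """One row per word; column name is the tweet id, value the
    timestamp -- the 'row [ word ]: key - tweetid / timestamp'
    layout from the question above."""
    for word in set(text.lower().split()):
        index[word][tweet_id] = timestamp

index = defaultdict(dict)
index_tweet(index, "t1", 100, "cassandra is the bestest")
index_tweet(index, "t2", 101, "cassandra rocks")
```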
Re: design that mimics twitter tweet search
yes -- but given I have two keywords, and want to find all tweets that have cassandra and bestest ... it means retrieving all columns + values in each row, iterating through both to see if tweet id's in one exist in the other, and finishing up with a consolidated list of tweet id's that only exist in both. Just seems clunky to me ... ? On Sun, Mar 18, 2012 at 4:12 PM, Benoit Perroud ben...@noisette.ch wrote: The simplest modeling you could have is using the keyword as key, a timestamp/time UUID as column name and the tweetid as value - cf['keyword']['timestamp'] = tweetid then you do a range query to get all tweetids sorted by time (you may want them in reverse order) and you can limit to the number of tweets displayed on the page. As some rows can become large, you could use key partitioning by concatenating, for instance, keyword and the month and year. 2012/3/18 Sasha Dolgy sdo...@gmail.com: Hi All, With Twitter, when I search for words like: cassandra is the bestest, 4 tweets will appear, including one I just did. My understanding is that the internals of Twitter work such that each word in a tweet is indexed, irrespective of the presence of a # hash tag, and the tweet id is assigned to a row for that word. What is puzzling to me, and hopefully some smart people on here can shed some light on this -- is how would this work with Cassandra? row [ cassandra ]: key - tweetid / timestamp row [ bestest ]: key - tweetid / timestamp I had thought that I could simply pull a list of all column names from each row (representing each word) and flag all occurrences (tweet id's) that exist in each row ... however, these rows would get quite long over time. Am I missing an easier way to get a list of all tweetid's that exist in multiple rows? -- Sasha Dolgy sdo...@gmail.com -- sent from my Nokia 3210 -- Sasha Dolgy sasha.do...@gmail.com
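The client-side merge being described — pull each keyword's row and keep only the tweet ids present in all of them — is a plain set intersection. A minimal sketch, with an in-memory dict standing in for the keyword rows:

```python
def tweets_matching_all(index, keywords):
    """Intersect per-keyword rows: a tweet id survives only if it
    appears in every keyword's row. This is the 'clunky' merge from
    the thread made explicit; Cassandra itself won't do it for you."""
    rows = [set(index.get(word, {})) for word in keywords]
    return set.intersection(*rows) if rows else set()

index = {
    "cassandra": {"t1": 100, "t2": 101, "t3": 102},
    "bestest": {"t1": 100, "t4": 103},
}
```

The cost is still proportional to the smallest row you fetch, which is why wide keyword rows (and the month/year sharding suggested above) matter.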
Re: data model question
Alternate would be to add another row to your user CF specific for Facebook ids. Column name would be the Facebook identifier and value would be your internal uuid. Consider when you want to add another service like Twitter. Will you then add another CF per service, or just another row specific to Twitter IDs? Queries will still be easy as it's against a single row in the same CF. On Mar 12, 2012 10:14 AM, aaron morton aa...@thelastpickle.com wrote: In this case, where you know the query upfront, I add a custom secondary index using another CF to support the query. It's a little easier here because the data won't change. UserLookupCF (using composite types for the key value) row_key: system_name:id e.g. facebook:12345 or twitter:12345 col_name : internal_user_id e.g. 5678 col_value: empty Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 11/03/2012, at 11:15 PM, Tamar Fraenkel wrote: Hi! Thanks for the response. From what I read, secondary indices are good only for columns with few possible values. Is this a good fit for my case? I have a unique Facebook id for every user. Thanks *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Sun, Mar 11, 2012 at 11:48 AM, Marcel Steinbach mstei...@gmail.com wrote: Either you do that or you could think about using a secondary index on the fb user name in your primary CF. See http://www.datastax.com/docs/1.0/ddl/indexes Cheers On 11.03.2012 at 09:51, Tamar Fraenkel ta...@tok-media.com wrote: Hi! I need some advice: I have a user CF, which has a UUID key which is my internal user id. One of the columns is the facebook_id of the user (if it exists). I need to have the reverse mapping from facebook_id to my UUID.
My intention is to add a CF for the mapping from Facebook id to my id: user_by_fbid = { // key is fb id, column name is our user id, value is empty 13101876963: { f94f6b20-161a-4f7e-995f-0466c62a1b6b : } } Does this make sense? This CF will be used whenever a user logs in through Facebook, to retrieve the internal id. Thanks *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
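Aaron's system_name:id key scheme can be modeled in a few lines — one lookup CF covers Facebook, Twitter, or any later service. A dict stands in for the CF; the ids below come from the thread, while the helper names are mine:

```python
def lookup_key(system, external_id):
    """Composite row key in Aaron's 'system_name:id' form."""
    return "%s:%s" % (system, external_id)

# One lookup CF keyed by 'system:id' covers Facebook, Twitter, etc.
lookup_cf = {
    lookup_key("facebook", "13101876963"):
        "f94f6b20-161a-4f7e-995f-0466c62a1b6b",
}

def internal_user_id(system, external_id):
    """Reverse mapping: external service id -> internal UUID."""
    return lookup_cf.get(lookup_key(system, external_id))
```

Adding Twitter later is then a new key prefix, not a new CF.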
Re: What is the best way to secure remote Cassandra (dev) server ?
Put it on a non-routable internal network. 192.168.x.x 172.16.x.x Etc... On Mar 2, 2012 1:56 PM, investtr investt...@gmail.com wrote: We have our development Cassandra 1.0.8 server running on EC2 and wanted to secure it. I read securing the entire server with firewall is one of the options. What are the other cheaper options to secure a development server ? regards, Ramesh
Re: Best way to know the cluster status
Tamil, what is the underlying purpose you are trying to achieve? To have your webpages know and detect when a node is down? To have a monitoring tool detect when a node is down? PHPCassa allows you to define multiple nodes. If one node is down, it should log information to the webserver logs and continue to work as expected if an alternate node is available. Parsing the output of nodetool ring is OK if you want the status at that very moment. Something more reliable should be considered, perhaps using JMX and a proper monitoring tool, like Nagios or Zenoss...etc. On Mon, Feb 6, 2012 at 8:59 AM, R. Verlangen ro...@us2.nl wrote: You might consider writing some kind of php script that runs nodetool ring and parse the output? 2012/2/6 Tamil selvan R.S tamil.3...@gmail.com Hi, What is the best way to know the cluster status via php? Currently we are trying to connect to individual cassandra instance with a specified timeout and if it fails we report the node to be down. But this test remains faulty. What are the other ways to test availability of nodes in cassandra cluster? How does datastax opscenter manage to do that? Regards, Tamil Selvan -- Sasha Dolgy sasha.do...@gmail.com
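If you do go the nodetool ring route, the parsing is trivial; the fragile part is the output format, which varies by Cassandra version. A sketch only — the sample columns below are illustrative, so check what your version actually prints before relying on this:

```python
def parse_ring_status(output):
    """Extract (address, status) pairs from 'nodetool ring'-style
    output. The expected column layout is an assumption; verify it
    against your Cassandra version."""
    nodes = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[3] in ("Up", "Down"):
            nodes[parts[0]] = parts[3]
    return nodes

SAMPLE = """\
Address         DC          Rack        Status State   Load    Owns    Token
10.0.0.1        datacenter1 rack1       Up     Normal  1.2 GB  50.0%   0
10.0.0.2        datacenter1 rack1       Down   Normal  1.1 GB  50.0%   85070591730234615865843651857942052864
"""
```

As noted above, JMX-based monitoring is the more reliable option; this kind of scraping only gives you a point-in-time snapshot.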
Re: Cannot start cassandra node anymore
why would you ever want to stop all nodes together? On Thu, Jan 26, 2012 at 1:24 PM, Carlo Pires carlopi...@gmail.com wrote: I found out this is related to schema change. Happens *every time* I drop and create a new CF with composite types. As a workaround I: * never stop all nodes together To stop a node: * repair and compact the node before stopping it * stop and start it again * if it starts fine, good; if not, remove all data and restart the node (and wait...)
Re: CLI exception :: A long is exactly 8 bytes: 1
Hi -- Sorry for the delay, and thanks for the response. Debug didn't print any stack traces and none are in the usual suspected places ... but thanks for that hint. Didn't know that option existed. The age column is an Integer ... Updating to IntegerType worked. Thanks. On Mon, Jan 2, 2012 at 11:23 AM, aaron morton aa...@thelastpickle.com wrote: If you use the --debug flag when you start the CLI it will always print full stack traces. What is the CF definition? I'm guessing the column_metadata specifies that the age column is a Long. Was there existing data in the age column, and if so, how was it encoded? Was the existing data encoded as a variable-length integer value? The standard IntegerType is not compatible with the LongType as the long is fixed width. If this is the case, try re-creating the index using an IntegerType. This worked for me… [default@dev] create column family User ... with comparator = AsciiType ... and column_metadata = ... [{ ... column_name : age, ... validation_class : LongType, ... index_type : 0, ... index_name : IdxAge}, ... ]; 2fd1a5c0-352b-11e1--242d50cf1fb6 Waiting for schema agreement... ... schemas agree across the cluster [default@dev] [default@dev] get User where age = 1; 0 Row Returned. Elapsed time: 33 msec(s). [default@dev] Hope that helps.
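The root cause is easy to demonstrate: LongType expects a fixed 8-byte big-endian value, while a variable-length integer encoding of 1 is a single byte — hence "A long is exactly 8 bytes: 1". A quick sketch of the two encodings:

```python
import struct

as_long = struct.pack(">q", 1)        # fixed-width: what LongType expects
as_varint = (1).to_bytes(1, "big")    # variable-length: what was stored

# The LongType validator rejects anything that isn't exactly 8 bytes.
print(len(as_long), len(as_varint))   # 8 1
```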
Re: cassandra site wsod's /mysql site functions
Have you looked at PHPCassa [ https://github.com/thobbs/phpcassa ] instead of using Thrift direct? I've had no issues with getting it to work with versions 0.7.x, 0.8.x and now 1.x ... it adds better error handling and overall, is fairly easy to get going. Some information to get you running: http://thobbs.github.com/phpcassa/ Instead of fighting with the heavy lifting .. it's often recommended around here to use the purpose built libraries that abstract thrift for you ... -sd On Tue, Jan 3, 2012 at 2:01 PM, Tim Dunphy bluethu...@gmail.com wrote: unfortunately not .. :( thanks for checking. still looking for advice on this! tx
CLI exception :: A long is exactly 8 bytes: 1
Hi Everyone, Been a while .. without any problems. Thanks for grinding out a good product! On 1.0.6, I applied an update to a column family to add a secondary index, and now via the CLI, when I perform a get user where something=1 I receive the following result: org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 1 This behaviour doesn't seem to be affecting phpcassa or hector retrieving the results of that query ... is this a silly something i've done, or something a bit more buggy with the CLI? Thanks in advance, -sd -- Sasha Dolgy sasha.do...@gmail.com
Re: CLI exception :: A long is exactly 8 bytes: 1
as per the wiki link you sent, I changed my query to: get user where something = '1'; Still throws the error ... This was fine *before* I ran the update CF command ... To Query Data get User where age = '12'; On Fri, Dec 30, 2011 at 6:05 PM, Moshiur Rahman moshi.b...@gmail.com wrote: I think you need to mention the data type in your command. You have to run the following command first: assume CFName keys as TypeName, i.e., utf8 Otherwise, you need to mention the type with each command, e.g., utf8('keyname'). http://wiki.apache.org/cassandra/CassandraCli Moshiur On Fri, Dec 30, 2011 at 10:50 AM, Sasha Dolgy sdo...@gmail.com wrote: Hi Everyone, Been a while .. without any problems. Thanks for grinding out a good product! On 1.0.6, I applied an update to a column family to add a secondary index, and now via the CLI, when I perform a get user where something=1 I receive the following result: org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 1 This behaviour doesn't seem to be affecting phpcassa or hector retrieving the results of that query ... is this a silly something I've done, or something a bit more buggy with the CLI? Thanks in advance, -sd
Re: cassandra as an email store ...
Hi Rustam, Thanks for posting that. Interesting to see that you opted to use Super Columns: https://github.com/elasticinbox/elasticinbox/wiki/Data-Model .. wondering, for the sake of argument/discussion .. if anyone can come up with an alternative data model that doesn't use SCs. -sd On Fri, Dec 16, 2011 at 11:10 AM, Rustam Aliyev rus...@code.az wrote: Hi Sasha, Replying to the old thread just for reference. We've released the code we use to store emails in Cassandra as an open source project: http://elasticinbox.com/ Hope you find it helpful. Regards, Rustam. On Fri Apr 29 15:20:07 2011, Sasha Dolgy wrote: Great read. Thanks. On Apr 29, 2011 4:07 PM, sridhar basam s...@basam.org wrote: Have you already looked at some research out of IBM about this use case? Paper is at http://ewh.ieee.org/r6/scv/computer/nfic/2009/IBM-Jun-Rao.pdf Sridhar -- Sasha Dolgy sasha.do...@gmail.com
Re: security
Firewall with appropriate rules. On Tue, Nov 8, 2011 at 6:30 PM, Guy Incognito dnd1...@gmail.com wrote: hi, is there a standard approach to securing cassandra eg within a corporate network? at the moment in our dev environment, anybody with network connectivity to the cluster can connect to it and mess with it. this would not be acceptable in prod. do people generally write custom authenticators etc, or just put the cluster behind a firewall with the appropriate rules to limit access?
Re: Value-Added Services Layer
I don't have grand visions of having fat clients connect directly to Cassandra to read/write data. Too much risk in my opinion. On Tue, Oct 25, 2011 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote: If you do not think restful API's are useful, try to make a fat client that speaks a non http or https protocol and put if on the desktops of thousands of corporate computers. Then wait for months/years for approval and firewall changes across said corporate network.
Re: Cassandra and Thrift on the Server Side
Hi Brian, It's an interesting one. Hope you don't mind some feedback. I see you have been making the rounds publicizing the concept and patch (like on my blog ; ) http://blog.sasha.dolgy.com/2011/05/apache-cassandra-restful-api.html). For me, and the goals I have, I'm not sure this is fit for purpose. I built an API that implements the business processes, business rules, security and policies of what I require. I made it RESTful to allow consumers quick and easy access. The API implements Hector or PHPCassa depending on my mood. Both libraries provide an element of connection pooling, meaning I don't have to worry about that in my code. It just works. The cost to me of writing code that leverages Hector or PHPCassa isn't that high when I compare it to writing code that would leverage a RESTful interface. I'd have to think about connection pooling / selecting the best available node, etc. I think the cost would be higher unless it's a one- or two-node infrastructure or there is a load balancer in front of all of the Cassandra interfaces so that I don't have to think about it .. Would I leverage the RESTful interface if it existed with Cassandra? Probably not. I am happy with the libraries as they are today .. and they let me bundle in a bunch of fun (batch mutates, connection pooling, etc). They aren't overly complicated and make overall development and integration quite simple. I definitely think that people who are looking into Apache Cassandra for the first time may look for this feature and/or CQL ... and in that respect, it's something good to have. Probably the best question I read in the JIRA ticket ( https://issues.apache.org/jira/browse/CASSANDRA-3380 ) is: ...what problem the REST API solves... , which is still not clear to me...
-sd On Tue, Oct 25, 2011 at 5:48 AM, Brian ONeill b...@alumni.brown.edu wrote: Peter Minearo Peter.Minearo at Reardencommerce.com writes: Thrift uses RPC, I was wondering if Cassandra uses Thrift on the server side to handle the requests from the clients? I know Thrift is used on the client side, but what about the server side? If this is true; is there a reason for it? Was a REST API with a JSON payload tried? Are there any plans to create a REST API for Cassandra? We started work on an extension to Cassandra that would deliver a REST layer. Check out: http://tinyurl.com/3ktnc9f http://code.google.com/a/apache-extras.org/p/virgil/ -brian
Re: Volunteers needed - Wiki
maybe that should be the first wiki update: the TODO On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe watanabe.m...@gmail.com wrote: Hello aaron, I raise my hand too. If you have a to-do list for the wiki, please let us know. maki
Operator on secondary indexes in 0.8.x (GTE/LTE)
I was trying to get a range of rows based on a secondary_index that was defined. Any rows where age was greater than or equal to ... it didn't work. Is this a continued limitation? Did a quick look in JIRA, couldn't find anything. The output from help get; on the cli contains the following, which led me to believe it was a limitation on Cassandra 0.7.x and not on 0.8.x ... get <cf> where <col> <operator> <value> [and <col> <operator> <value> ...] [limit <limit>]; get <cf> where <col> <operator> <function>(<value>) [and <col> <operator> <function> ...] [limit <limit>]; - operator: Operator to test the column value with. Supported operators are =, >, >=, <, <=. In Cassandra 0.7 at least one = operator must be present. [default@sdo] get user where age >= 18; No indexed columns present in index clause with operator EQ [default@sdo] get user where gender = 1 and age >= 18 (returns results) Tested this behavior on 0.8.2, 0.8.6 and now 0.8.7 ... create column family user with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'BytesType' and key_validation_class = 'BytesType' and memtable_operations = 0.248437498 and memtable_throughput = 53 and memtable_flush_after = 1440 and rows_cached = 0.0 and row_cache_save_period = 0 and keys_cached = 20.0 and key_cache_save_period = 14400 and read_repair_chance = 1.0 and gc_grace = 864000 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and row_cache_provider = 'ConcurrentLinkedHashCacheProvider' and column_metadata = [ {column_name : 'gender', validation_class : LongType, index_name : 'user_gender_idx', index_type : 0}, {column_name : 'year', validation_class : LongType, index_name : 'user_year_idx', index_type : 0}]; -- Sasha Dolgy sasha.do...@gmail.com
Re: Operator on secondary indexes in 0.8.x (GTE/LTE)
ah, hadn't even thought of that. simple. elegant. cheers. On Tue, Oct 11, 2011 at 11:01 PM, Jake Luciani jak...@gmail.com wrote: This hasn't changed, AFAIK. In Brisk we had the same problem in CFS, so we created a sentinel value that all rows shared; then it works. CASSANDRA-2915 should fix it. On Tue, Oct 11, 2011 at 4:48 PM, Sasha Dolgy sdo...@gmail.com wrote: I was trying to get a range of rows based on a secondary_index that was defined. Any rows where age was greater than or equal to ... it didn't work. Is this a continued limitation? Did a quick look in JIRA, couldn't find anything. The output from help get; on the cli contains the following, which led me to believe it was a limitation on Cassandra 0.7.x and not on 0.8.x ... get <cf> where <col> <operator> <value> [and <col> <operator> <value> ...] [limit <limit>]; get <cf> where <col> <operator> <function>(<value>) [and <col> <operator> <function> ...] [limit <limit>]; - operator: Operator to test the column value with. Supported operators are =, >, >=, <, <=. In Cassandra 0.7 at least one = operator must be present. [default@sdo] get user where age >= 18; No indexed columns present in index clause with operator EQ [default@sdo] get user where gender = 1 and age >= 18 (returns results) Tested this behavior on 0.8.2, 0.8.6 and now 0.8.7 ...
create column family user with column_type = 'Standard' and comparator = 'UTF8Type' and default_validation_class = 'BytesType' and key_validation_class = 'BytesType' and memtable_operations = 0.248437498 and memtable_throughput = 53 and memtable_flush_after = 1440 and rows_cached = 0.0 and row_cache_save_period = 0 and keys_cached = 20.0 and key_cache_save_period = 14400 and read_repair_chance = 1.0 and gc_grace = 864000 and min_compaction_threshold = 4 and max_compaction_threshold = 32 and replicate_on_write = true and row_cache_provider = 'ConcurrentLinkedHashCacheProvider' and column_metadata = [ {column_name : 'gender', validation_class : LongType, index_name : 'user_gender_idx', index_type : 0}, {column_name : 'year', validation_class : LongType, index_name : 'user_year_idx', index_type : 0}]; -- Sasha Dolgy sasha.do...@gmail.com -- http://twitter.com/tjake -- Sasha Dolgy sasha.do...@gmail.com
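Jake's sentinel workaround, sketched in cassandra-cli terms (the 'flag' column name and value here are hypothetical, not from the thread): write one identical indexed column to every row, so any range query can include the EQ-on-an-indexed-column clause the query planner requires. The 'flag' column would need to be added to column_metadata with an index (index_type: 0) first.

```
[default@sdo] set user['some-row-key']['flag'] = long(1);
[default@sdo] get user where flag = 1 and age >= 18;
```

The >= comparison then piggybacks on the indexed EQ match instead of failing with "No indexed columns present in index clause with operator EQ".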
Re: Volunteers needed - Wiki
while on the topic of the wiki ... it's not entirely pleasing to the senses or at all user friendly ... hacking around on it earlier today, there aren't that many options on how to give it some flair ... shame really that for such a cool piece of software, the wiki doesn't scream the same level of cool. FWIW, Cassandra doesn't show up on http://wiki.apache.org/general/ On Wed, Oct 12, 2011 at 12:05 AM, Daria Hutchinson da...@datastax.com wrote: Sounds like a good place to start! Thanks for taking the lead and please let me know how I can help! Daria On Tue, Oct 11, 2011 at 2:20 PM, aaron morton aa...@thelastpickle.com wrote: Thanks Daria, I'll have a look at what's there and get in touch. Right now I'm not thinking beyond getting the wiki complete (e.g. it lists all the command line tools) and correct for version 1.0. My main concern was people coming away from the site with incorrect information and having a bad out of the box experience. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 12/10/2011, at 7:42 AM, Daria Hutchinson wrote: DataStax would like to help with the wiki update effort. For example, we have a start on updates for 1.0, such as the storage configuration. http://www.datastax.com/docs/1.0/configuration/storage_configuration Let me know how we can help. Cheers, Daria (DataStax Tech Writer) Question - Are you planning on maintaining wiki docs by version going forward (starting with 1.0)? On Tue, Oct 11, 2011 at 1:55 AM, aaron morton aa...@thelastpickle.com wrote: @maki thanks, Could you take a look at the cli page http://wiki.apache.org/cassandra/CassandraCli ? There are a lot of online docs in the tool, so we don't need to replicate that. Just a simple getting started guide, some examples and a few tips about what to do if things don't work. e.g. often people have problems when using bytes comparator. If you could use the sample schema that ships in conf/ that would be handy.
You may want to snapshot the 0.7 CLI page in the same way the 0.6 one was and link back http://wiki.apache.org/cassandra/CassandraCli06 Just update the draft home page to say you are working on it http://wiki.apache.org/cassandra/FrontPage_draft_aaron @sasha I was going to use the draft home page as a todo list (do every page listed on there, and sensibly follow links) and as a checkout system http://wiki.apache.org/cassandra/FrontPage_draft_aaron @Jérémy Thanks, I'll keep that in mind. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 11/10/2011, at 8:12 PM, Jérémy SEVELLEC wrote: Hi Aaron, I think the CommitLog section is outdated ( http://wiki.apache.org/cassandra/ArchitectureCommitLog ): the CommitLogHeader no longer exists since this ticket: https://issues.apache.org/jira/browse/CASSANDRA-2419 Regards, Jérémy 2011/10/11 Sasha Dolgy sdo...@gmail.com maybe that should be the first wiki update: the TODO On Tue, Oct 11, 2011 at 7:21 AM, Maki Watanabe watanabe.m...@gmail.com wrote: Hello aaron, I raise my hand too. If you have a to-do list for the wiki, please let us know. maki -- Jérémy -- Sasha Dolgy sasha.do...@gmail.com
Re: ebs or ephemeral
just catching the tail end of this discussion. aaron, in your previous email, you said "And an explanation of why we normally avoid ephemeral." shouldn't this be avoiding EBS? EBS was a nightmare for us in terms of performance. On Mon, Oct 10, 2011 at 9:23 AM, aaron morton aa...@thelastpickle.com wrote: 6 nodes and RF3 will mean you can handle between 1 and 2 failed nodes. see http://thelastpickle.com/2011/06/13/Down-For-Me/ Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 7/10/2011, at 9:37 PM, Madalina Matei wrote: Hi Aaron, For a 6 node cluster, what RF can we use in order to support 2 failed nodes? From the article that you sent I understood avoid EBS and use ephemeral. am I missing anything? Thank you so much for your help, Madalina On Fri, Oct 7, 2011 at 9:15 AM, aaron morton aa...@thelastpickle.com wrote: DataStax have pre-built AMIs here http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami And an explanation of why we normally avoid ephemeral. Also, I would go with 6 nodes. You will then be able to handle up to 2 failed nodes. Hope that helps.
Re: what's the difference between repair CF separately and repair the entire node?
It was mentioned in another thread that Twitter uses 0.8 in production. For me that was a fairly strong testimonial... On Sep 14, 2011 9:28 AM, Yan Chunlu springri...@gmail.com wrote: is 0.8 ready for production use? as I know, currently many companies including reddit.com are using 0.7; how do they get rid of the repair problem? On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu springri...@gmail.com wrote: me neither, I don't want to repair one CF at a time. the node repair took a week and was still running; compactionstats and netstats showed nothing running on every node, and also no error message, no exception, really no idea what it was doing. To add to the list of things repair does wrong in 0.7, we'll have to add that if one of the nodes participating in the repair (so any node that shares a range with the node on which repair was started) goes down (even for a short time), then the repair will simply hang forever doing nothing. And no specific error message will be logged. That could be what happened. Again, recent releases of 0.8 fix that too. -- Sylvain I stopped it yesterday. maybe I should run repair again while disabling compaction on all nodes? thanks! On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller peter.schul...@infidyne.com wrote: I think it is a serious problem since I can not repair. I am using cassandra on production servers. is there some way to fix it without upgrading? I heard that 0.8.x is still not quite ready for production environments. It is a serious issue if you really need to repair one CF at a time. However, looking at your original post it seems this is not necessarily your issue. Do you need to, or was your concern rather the overall time repair took? There are other things that are improved in 0.8 relative to 0.7.
In particular, (1) in 0.7 compaction, including the validating compactions that are part of repair, is non-concurrent, so if your repair starts while there is a long-running compaction going it will have to wait, and (2) semi-related is that the merkle tree calculation that is part of repair/anti-entropy may happen out of sync if one of the participating nodes happens to be busy with compaction. This in turn causes additional data to be sent as part of repair. That might be why your immediately following repair took a long time, but it's difficult to tell. If you're having issues with repair and large data sets, I would generally say that upgrading to 0.8 is recommended. However, if you're on 0.7.4, beware of https://issues.apache.org/jira/browse/CASSANDRA-3166 -- / Peter Schuller (@scode on twitter)
AntiEntropyService.getNeighbors pulls information from where?
This relates to the issue I opened the other day: https://issues.apache.org/jira/browse/CASSANDRA-3175 .. basically, 'nodetool ring' throws an exception on two of the four nodes. In my fancy little world, the problems appear to be related to one of the nodes thinking that someone is their neighbor ... and that someone moved away a long time ago.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:5] 2011-09-10 21:20:02,182 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-d8cdb59a-04a4-4596-b73f-cba3bd2b9eab failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:7] 2011-09-11 21:20:02,258 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-ad17e938-f474-469c-9180-d88a9007b6b9 failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:9] 2011-09-12 21:20:02,256 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-636150a5-4f0e-45b7-b400-24d8471a1c88 failed.
This appears only in the logs for the one node that is generating the issue: 172.16.12.10. Where do I find where AntiEntropyService.getNeighbors(tablename, range) is pulling its information from?
On the two nodes that work: [default@system] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.Ec2Snitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10] [default@system] From the two nodes that don't work: [default@unknown] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.Ec2Snitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10] UNREACHABLE: [10.130.185.136] -- which is really 172.16.14.10 [default@unknown] Really now. Where does 10.130.185.136 exist? It's in none of the configurations I have AND the full ring has been shut down and started up ... not trying to give Vijay a hard time by posting here btw! Just thinking it could be something super silly ... that a wider audience has come across. -- Sasha Dolgy sasha.do...@gmail.com
Re: AntiEntropyService.getNeighbors pulls information from where?
use system; del LocationInfo[52696e67]; I ran this on the nodes that had the problems, stopped and started the nodes, and it re-did its job. Job done. All fixed, with a new bug! https://issues.apache.org/jira/browse/CASSANDRA-3186 On Tue, Sep 13, 2011 at 2:09 AM, aaron morton aa...@thelastpickle.com wrote: I'm pretty sure I'm behind on how to deal with this problem. Best I know is to start the node with -Dcassandra.load_ring_state=false as a JVM option. But if the ghost IP address is in gossip it will not work, and it should be in gossip. Does the ghost IP show up in nodetool ring? Anyone know a way to remove a ghost IP from gossip that does not have a token associated with it? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13/09/2011, at 6:39 AM, Sasha Dolgy wrote: This relates to the issue I opened the other day: https://issues.apache.org/jira/browse/CASSANDRA-3175 .. basically, 'nodetool ring' throws an exception on two of the four nodes. In my fancy little world, the problems appear to be related to one of the nodes thinking that someone is their neighbor ... and that someone moved away a long time ago /mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:5] 2011-09-10 21:20:02,182 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-d8cdb59a-04a4-4596-b73f-cba3bd2b9eab failed. /mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:7] 2011-09-11 21:20:02,258 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-ad17e938-f474-469c-9180-d88a9007b6b9 failed. /mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:9] 2011-09-12 21:20:02,256 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-636150a5-4f0e-45b7-b400-24d8471a1c88 failed. Appears only in the logs for one node that is generating the issue.
172.16.12.10 Where do I find where the AntiEntropyService.getNeighbors(tablename, range) is pulling it's information from? On the two nodes that work: [default@system] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.Ec2Snitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10] [default@system] From the two nodes that don't work: [default@unknown] describe cluster; Cluster Information: Snitch: org.apache.cassandra.locator.Ec2Snitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 1b871300-dbdc-11e0--564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10] UNREACHABLE: [10.130.185.136] -- which is really 172.16.14.10 [default@unknown] Really now. Where does 10.130.185.136 exist? It's in none of the configurations I have AND the full ring has been shut down and started up ... not trying to give Vijay a hard time by posting here btw! Just thinking it could be something super silly ... that a wider audience has come across. -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
Ec2Snitch nodetool issue after upgrade to 0.8.5
Upgraded one ring that has four nodes from 0.8.0 to 0.8.5 with only one minor problem. It relates to Ec2Snitch when running a 'nodetool ring' from two of the four nodes. the rest are all working fine: Address DC Rack Status State Load Owns Token 148362247927262972740864614603570725035 172.16.12.11 ap-southeast1a Up Normal 1.58 MB 24.02% 1909554714494251628118265338228798 172.16.12.10 ap-southeast1a Up Normal 1.63 MB 22.11% 56713727820156410577229101238628035242 172.16.14.10 ap-southeast1b Up Normal 1.85 MB 33.33% 113427455640312821154458202477256070484 172.16.14.12 ap-southeast1b Up Normal 1.36 MB 20.53% 14836224792726297274086461460357072503 works ... on 2 nodes which happen to be on the 172.16.14.0/24 network. the nodes where the error appears are on the 172.16.12.0/24 network and this is what is shown when nodetool ring is run: Address DC Rack Status State Load Owns Token 148362247927262972740864614603570725035 172.16.12.11 ap-southeast1a Up Normal 1.58 MB 24.02% 1909554714494251628118265338228798 172.16.12.10 ap-southeast1a Up Normal 1.62 MB 22.11% 56713727820156410577229101238628035242 Exception in thread main java.lang.NullPointerException at org.apache.cassandra.locator.Ec2Snitch.getDatacenter(Ec2Snitch.java:93) at org.apache.cassandra.locator.DynamicEndpointSnitch.getDatacenter(DynamicEndpointSnitch.java:122) at org.apache.cassandra.locator.EndpointSnitchInfo.getDatacenter(EndpointSnitchInfo.java:49) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at 
com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) I've stopped the node and started it ... still doesn't make a difference. I've also shut down all nodes in the ring so that it was fully offline, and then brought them all back up ... issue still persists on two of the nodes. 
There are no firewall rules restricting traffic between these nodes. For example, on a node where the ring throws the exception, I can still get netstats for the two hosts that don't show up:
nodetool -h 172.16.12.11 -p 9090 netstats 172.16.14.10
Mode: Normal
Nothing streaming to /172.16.14.10
Nothing streaming from /172.16.14.10
Pool Name    Active  Pending  Completed
Commands        n/a        0          3
Responses       n/a        1       1483
nodetool -h 172.16.12.11 -p 9090 netstats
Re: Ec2Snitch nodetool issue after upgrade to 0.8.5
maybe it's related to this ... https://issues.apache.org/jira/browse/CASSANDRA-3114 odd thing is, we haven't moved to Ec2Snitch ... been using it for quite a long time now ... On Sat, Sep 10, 2011 at 1:42 PM, Sasha Dolgy sdo...@gmail.com wrote: Upgraded one ring that has four nodes from 0.8.0 to 0.8.5 with only one minor problem. It relates to Ec2Snitch when running a 'nodetool ring' from two of the four nodes. the rest are all working fine: Address DC Rack Status State Load Owns Token 148362247927262972740864614603570725035 172.16.12.11 ap-southeast1a Up Normal 1.58 MB 24.02% 1909554714494251628118265338228798 172.16.12.10 ap-southeast1a Up Normal 1.63 MB 22.11% 56713727820156410577229101238628035242 172.16.14.10 ap-southeast1b Up Normal 1.85 MB 33.33% 113427455640312821154458202477256070484 172.16.14.12 ap-southeast1b Up Normal 1.36 MB 20.53% 14836224792726297274086461460357072503 works ... on 2 nodes which happen to be on the 172.16.14.0/24 network. the nodes where the error appears are on the 172.16.12.0/24 network and this is what is shown when nodetool ring is run: Address DC Rack Status State Load Owns Token 148362247927262972740864614603570725035 172.16.12.11 ap-southeast1a Up Normal 1.58 MB 24.02% 1909554714494251628118265338228798 172.16.12.10 ap-southeast1a Up Normal 1.62 MB 22.11% 56713727820156410577229101238628035242 Exception in thread main java.lang.NullPointerException at org.apache.cassandra.locator.Ec2Snitch.getDatacenter(Ec2Snitch.java:93) at org.apache.cassandra.locator.DynamicEndpointSnitch.getDatacenter(DynamicEndpointSnitch.java:122) at org.apache.cassandra.locator.EndpointSnitchInfo.getDatacenter(EndpointSnitchInfo.java:49) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) I've stopped the node and started it ... still doesn't make a difference. I've also shut down all nodes in the ring so that it was fully offline, and then brought them all back up ... issue still persists on two of the nodes. There are no firewall rules restricting traffic between these nodes. For example, on a node where the ring throws the exception, I can still get netstats for the two hosts that don't show up: nodetool -h 172.16.12.11 -p 9090
Re: Ec2Snitch nodetool issue after upgrade to 0.8.5
Of course. Hoping one day I create an issue related to Ec2 that CAN be reproduced... https://issues.apache.org/jira/browse/CASSANDRA-3175 On Sat, Sep 10, 2011 at 10:10 PM, Jonathan Ellis jbel...@gmail.com wrote: Can you create a Jira ticket?
Re: Is Cassandra suitable for this use case?
You can chunk the files into pieces and store the pieces in Cassandra... Munge all the pieces back together when delivering back to the client... On Aug 25, 2011 6:33 PM, Ruby Stevenson ruby...@gmail.com wrote: hi Evgeny I appreciate the input. The concern with HDFS is that it has its own share of problems - its name node, which is essentially a metadata server, loads all file information into memory (roughly 300 MB per million files) and its failure handling is far less attractive ... on top of configuring and maintaining two separate components and two APIs for handling data. I am still holding out hope that there might be some better way to go about it? Best Regards, Ruby On Thu, Aug 25, 2011 at 11:10 AM, Evgeniy Ryabitskiy evgeniy.ryabits...@wikimart.ru wrote: Hi, If you want to store files with partitioning/replication, you could use a Distributed File System (DFS), like http://hadoop.apache.org/hdfs/ or any other: http://en.wikipedia.org/wiki/Distributed_file_system Still, you could use Cassandra to store any metadata and the file path in DFS. So: Cassandra + HDFS would be my solution. Evgeny.
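The chunk-and-munge approach can be sketched generically; this is a sketch independent of any particular client library, and the 64 KB chunk size and the '<file_id>:<index>' key scheme are assumptions, not anything from the thread:

```python
CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune to whatever column size you're comfortable with


def split_into_chunks(blob, chunk_size=CHUNK_SIZE):
    """Split a file's bytes into ordered pieces.

    Piece i would then be written under a column/key such as
    '<file_id>:<i>' using whichever client you prefer.
    """
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]


def reassemble(chunks):
    """Munge the pieces back together, in index order, when serving the file."""
    return b"".join(chunks)
```

Reading the file back means fetching the chunks in index order (e.g. a column slice over the file's row) before joining them.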
Re: Changing the CLI, not a great idea!
Unfortunately, the perception that I have as a business consumer and night-time hack, is that more importance and effort is placed on ensuring information is up to date and correct on the http://www.datastax.com/docs/0.8/index website and less on keeping the wiki up to date or relevant... which forces people to be introduced to a for-profit company to get relevant information ... which just so happens to employ a substantial amount of Apache Cassandra contributors ... not that there's anything wrong with that, right? On Thu, Jul 28, 2011 at 10:46 AM, David Boxenhorn da...@citypath.com wrote: This is part of a much bigger problem, one which has many parts, among them: 1. Cassandra is complex. Getting a gestalt understanding of it makes me think I understand how Alzheimer's patients must feel. 2. There is no official documentation. Perhaps everything is out there somewhere, who knows? 3. Cassandra is a moving target. Books are out of date before they hit the press. 4. Most of the important knowledge about Cassandra exists in a kind of oral history, that is hard to keep up with, and even harder to understand once it's long past. I think it is clear that we need a better one-stop-shop for good documentation. What hasn't been talked about much - but I think it's just as important - is a good one-stop-shop for Cassandra's oral history. (You might think this list is the place, but it's too noisy to be useful, except at the very tip of the cowcatcher. Cassandra needs a canonized version of its oral history.)
Re: Equalizing nodes storage load
are you trying to balance load or owns? owns looks fine ... 33.33% each ... which to me says balanced. how did you calculate your tokens? On Fri, Jul 22, 2011 at 4:37 PM, Mina Naguib mina.nag...@bloomdigital.com wrote: Address Status State Load Owns Token xx.xx.x.105 Up Normal 41.98 GB 33.33% 37809151880104273718152734159085356828 xx.xx.x.107 Up Normal 59.4 GB 33.33% 94522879700260684295381835397713392071 xx.xx.x.18 Up Normal 74.65 GB 33.33% 151236607520417094872610936636341427313
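For reference, balanced initial tokens with RandomPartitioner are usually computed as i * 2**127 / N; a quick sketch (the ring above looks evenly spaced, apparently with a common offset added to every token, which is fine as long as the spacing is equal):

```python
def initial_tokens(node_count, offset=0):
    """Evenly spaced RandomPartitioner tokens (token space is 0 .. 2**127 - 1)."""
    ring_size = 2 ** 127
    return [(i * ring_size // node_count + offset) % ring_size
            for i in range(node_count)]
```

For 3 nodes without an offset this yields 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070485; the middle value matches the token that appears in the other rings quoted in this digest. Equal "Owns" means the token spacing is balanced, but "Load" can still differ if keys or row sizes are skewed.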
Re: Cassandra training in Bangalore, India
I am quite certain if you find enough people and pony up the fees a few people on this list would be willing to make the journey... On Jul 21, 2011 8:02 AM, samal sa...@wakya.in wrote: As per my knowledge, there is no such expert training available in India as of now. As Sameer said, there is enough online material available from which you can learn. I have been playing with Cassandra since the beginning. We can plan a Meetup/learning session near the Mumbai/Pune region.
Re: Need help json2sstable
You are missing after On Wed, Jul 20, 2011 at 8:03 AM, Nilabja Banerjee nilabja.baner...@gmail.com wrote: Hi All, Here is my JSON structure. {Fetch_CC :{ cc:{ :1000, :ICICI, :, city:{ name:banglore }; }; } If the structure is incorrect, please give me one small structure to use the below utility. I am using the 0.7.5 version. Now how can I use the json2sstable utility? Please provide me the steps. What are the things I have to configure? Thank You -- Sasha Dolgy sasha.do...@gmail.com
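For what it's worth, json2sstable does not take arbitrary nested JSON; in the 0.7 era it expects the same layout sstable2json produces. Treat the following as a hedged sketch (row key, column names and values are hex-encoded here, e.g. "726f7731" is "row1") and compare it against the output of sstable2json on a real SSTable from your keyspace:

```json
{
  "726f7731": [
    ["636f6c31", "76616c756531", 1311176819421000],
    ["636f6c32", "76616c756532", 1311176819421001]
  ]
}
```

Invocation is roughly bin/json2sstable -K <keyspace> -c <column_family> input.json <cf>-f-1-Data.db, but run bin/json2sstable with no arguments to confirm the flags in your version. A nested structure like the city object above would need to be flattened into columns (or serialized into a single column value) first.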
Re: best example of indexing
Examples exist in the conf directory of the distribution... On Jul 20, 2011 11:48 AM, CASSANDRA learner cassandralear...@gmail.com wrote: Hi Guys, Can you please give me the best example of creating index on a column family. As I am completely new to this, Can you please give me a simple and good example.
Re: One node down but it thinks its fine...
any firewall changes? ping is fine ... but if you can't get from node(a) to nodes(n) on the specific ports... On Wed, Jul 13, 2011 at 6:47 PM, samal sa...@wakya.in wrote: Check that the seed IP is the same on all nodes and is not a loopback IP on the cluster. On Wed, Jul 13, 2011 at 8:40 PM, Ray Slakinski ray.slakin...@gmail.com wrote: One of our nodes, which happens to be the seed, thinks it's up and all the other nodes are down. However, all the other nodes think the seed is down instead. The logs for the seed node show everything is running as it should be. I've tried restarting the node, turning on/off gossip and thrift, and nothing seems to get the node to see the rest of its ring as up and running. I have also tried restarting one of the other nodes, which had no effect on the situation. Below are the ring outputs for the seed and one other node in the ring, plus a ping to show that the seed can ping the other node. # bin/nodetool -h 0.0.0.0 ring Address Status State Load Owns Token 141784319550391026443072753096570088105 127.0.0.1 Up Normal 4.61 GB 16.67% 0 xx.xxx.30.210 Down Normal ? 16.67% 28356863910078205288614550619314017621 xx.xx.90.87 Down Normal ? 16.67% 56713727820156410577229101238628035242 xx.xx.22.236 Down Normal ? 16.67% 85070591730234615865843651857942052863 xx.xx.97.96 Down Normal ? 16.67% 113427455640312821154458202477256070484 xx.xxx.17.122 Down Normal ? 16.67% 141784319550391026443072753096570088105 # ping xx.xxx.30.210 PING xx.xxx.30.210 (xx.xxx.30.210) 56(84) bytes of data. 64 bytes from xx.xxx.30.210: icmp_req=1 ttl=61 time=0.299 ms 64 bytes from xx.xxx.30.210: icmp_req=2 ttl=61 time=0.287 ms ^C --- xx.xxx.30.210 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 0.287/0.293/0.299/0.006 ms # bin/nodetool -h xx.xxx.30.210 ring Address Status State Load Owns Token 141784319550391026443072753096570088105 xx.xxx.23.40 Down Normal ?
16.67% 0 xx.xxx.30.210 Up Normal 10.58 GB 16.67% 28356863910078205288614550619314017621 xx.xx.90.87 Up Normal 10.47 GB 16.67% 56713727820156410577229101238628035242 xx.xx.22.236 Up Normal 9.63 GB 16.67% 85070591730234615865843651857942052863 xx.xx.97.96 Up Normal 10.68 GB 16.67% 113427455640312821154458202477256070484 xx.xxx.17.122 Up Normal 10.18 GB 16.67% 141784319550391026443072753096570088105 -- Ray Slakinski -- Sasha Dolgy sasha.do...@gmail.com
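To the firewall point: ICMP working proves nothing about the ports gossip and thrift actually use. A rough bash sketch of checking real TCP reachability — the peer address is the placeholder from the thread, and 7000/9160 are the defaults of that era; adjust to your cassandra.yaml:

```shell
#!/bin/bash
# Check TCP connectivity to a peer on Cassandra's ports, not just ping.
# 7000 = gossip/storage, 9160 = thrift RPC (defaults; see cassandra.yaml).
check_port() {
  # bash's /dev/tcp pseudo-device: succeeds only if a TCP connect completes
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null && echo "open" || echo "closed"
}

peer="xx.xxx.30.210"   # placeholder from the thread; use your real node address
for port in 7000 9160; do
  echo "$peer:$port $(check_port "$peer" "$port")"
done
```

If ping succeeds but these report closed, a firewall or security-group rule is the likely culprit, which matches the symptom of nodes marking each other Down while the machines are reachable.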
Re: Survey: Cassandra/JVM Resident Set Size increase
I'll post more tomorrow ... However, we set up one node in a single-node cluster and have left it with no data. Reviewing memory consumption graphs, it increased daily until it gobbled (highly technical term) all memory. The system is now running just below 100% memory usage, which I find peculiar seeing that it is doing nothing, with no data and no peers. On Jul 12, 2011 3:29 PM, Chris Burroughs chris.burrou...@gmail.com wrote:

### Preamble

There have been several reports on the mailing list of the JVM running Cassandra using too much memory. That is, the resident set size exceeds (max java heap size + mmapped segments) and continues to grow until the process swaps, the kernel OOM killer comes along, or performance just degrades too far due to the lack of space for the page cache. It has been unclear from these reports whether there is a pattern. My hope here is that by comparing JVM versions, OS versions, JVM configuration, etc., we will find something. Thank you everyone for your time.

Some example reports:
- http://www.mail-archive.com/user@cassandra.apache.org/msg09279.html
- http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
- https://issues.apache.org/jira/browse/CASSANDRA-2868
- http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-or-what-settings-to-use-on-AWS-large-td6504060.html
- http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-memory-problem-td6545642.html

For reference, theories include (in no particular order):
- memory fragmentation
- JVM bug
- OS/glibc bug
- direct memory
- swap induced fragmentation
- some other bad interaction of cassandra/jdk/jvm/os/nio-insanity

### Survey

1. Do you think you are experiencing this problem?
2. Why? (This is a good time to share a graph like http://www.twitpic.com/5fdabn or http://img24.imageshack.us/img24/1754/cassandrarss.png)
3. Are you using mmap? (If yes, be sure to have read http://wiki.apache.org/cassandra/FAQ#mmap , and explain how you have used pmap [or another tool] to rule out mmap and top deceiving you.)
4. Are you using JNA? Was mlockall successful (it's in the logs on startup)?
5. Is swap enabled? Are you swapping?
6. What version of Apache Cassandra are you using?
7. What is the earliest version of Apache Cassandra you recall seeing this problem with?
8. Have you tried the patch from CASSANDRA-2654?
9. What JVM and version are you using?
10. What OS and version are you using?
11. What are your JVM flags?
12. Have you tried limiting direct memory (-XX:MaxDirectMemorySize)?
13. Can you characterise how much GC your cluster is doing?
14. Approximately how many reads/writes per unit time is your cluster doing (per node or the whole cluster)?
15. How are your column families configured (key cache size, row cache size, etc.)?
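On the pmap question, here is a sketch of the kind of evidence worth attaching to a survey answer — the CassandraDaemon process-name pattern is an assumption, and it falls back to the current shell just so the commands run somewhere:

```shell
#!/bin/bash
# Separate real heap/native usage from mmapped SSTable pages before
# concluding the JVM is leaking.
pid=$(pgrep -f CassandraDaemon 2>/dev/null | head -n 1)
pid=${pid:-$$}   # fall back to this shell so the commands still illustrate

# Total resident set size in KB, as top/free would report it
ps -o rss= -p "$pid"

# Per-mapping breakdown; file-backed maps (SSTable Data.db files, etc.) are
# page cache the kernel can reclaim, not leaked memory
if command -v pmap >/dev/null 2>&1; then
  pmap -x "$pid" | tail -n 2
fi
```

If RSS minus the file-backed mappings is still well above -Xmx plus a plausible amount of direct/native memory, then mmap is ruled out and the survey's other theories are in play.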
Re: Storing counters in the standard column families along with non-counter columns ?
No, it's not possible. To achieve it, there are two options ... contribute to the issue or wait for it to be resolved ... https://issues.apache.org/jira/browse/CASSANDRA-2614 -sd On Sun, Jul 10, 2011 at 5:04 PM, Aditya Narayan ady...@gmail.com wrote: Is it now possible to store counters in the standard column families along with non counter type columns ? How to achieve this ?
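Until CASSANDRA-2614 lands, the usual workaround is a dedicated counter CF keyed identically to the standard CF that holds the non-counter columns. A cassandra-cli sketch with made-up names:

```
create column family UserProfiles
    with comparator = UTF8Type;

create column family UserStats
    with default_validation_class = CounterColumnType
    and comparator = UTF8Type;

set UserProfiles['user42']['name'] = 'Aditya';
incr UserStats['user42']['page_views'];
```

Sharing the row key ('user42') across both CFs keeps the two reads cheap and the relationship obvious, at the cost of two round trips instead of one.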
Re: Repair doesn't work after upgrading to 0.8.1
This is the same behavior I reported in 2768 as Aaron referenced ... What was suggested for us was to do the following: - Shut down the entire ring - When you bring up each node, do a nodetool repair That didn't immediately resolve the problems. In the end, I backed up all the data, removed the keyspace and created a new one. That seemed to have solved our problems. That was from 0.7.6-2 to 0.8.0 However, in the issue reported, it was unable to be reproduced ... I'd be curious to know how Hector's keyspace is defined. Ours at the time was RF=3 and using Ec2 snitch... -sd On Fri, Jul 1, 2011 at 9:22 AM, Sylvain Lebresne sylv...@datastax.com wrote: Héctor, when you say I have upgraded all my cluster to 0.8.1, from which version was that: 0.7.something or 0.8.0 ? If this was 0.8.0, did you run successful repair on 0.8.0 previous to the upgrade ?
DataStax Brisk
How far behind is Brisk from the Cassandra release cycle? If 0.8.1 of Cassandra was released yesterday, when (if it isn't already) will the Brisk distribution implement 0.8.1? -sd -- Sasha Dolgy sasha.do...@gmail.com
Re: advice for EC2 deployment
are you able to open a connection from one of the nodes to a node on the other side? us-east to us-west? could your problem be as simple as connectivity and/or security group configuration? On Thu, Jun 23, 2011 at 1:51 PM, pankaj soni pankajsoni0...@gmail.com wrote: hey, I have got my ec2 multi-dc across AZ's but in same region us-east. Now I am trying to deploy cassandra over multiple regions that is ec2 us west, singapore and us-east. I have edited the config file as sasha's reply below. though when I run nodetool in each DC, I only see the nodes from that region. That is EC2 US west is showing only 2 nodes which are up in that region but not the other 2 which are there in US-east. Kindly suggest a solution. -thanks On Wed, Apr 27, 2011 at 5:45 PM, Sasha Dolgy sdo...@gmail.com wrote: Hi, If I understand you correctly, you are trying to get a private ip in us-east speaking to the private ip in us-west. to make your life easier, configure your nodes to use hostname of the server. if it's in a different region, it will use the public ip (ec2 dns will handle this for you) and if it's in the same region, it will use the private ip. this way you can stop worrying about if you are using the public or private ip to communicate with another node. let the aws dns do the work for you. just make sure you are using v0.8 with SSL turned on and have the appropriate security group definitions ... -sasha On Wed, Apr 27, 2011 at 1:55 PM, pankajsoni0126 pankajsoni0...@gmail.com wrote: I have been trying to deploy Cassandra cluster across regions and for that I posted this IP address resolution in MultiDC setup. But when it is to get nodes talking to each other on different regions say, us-east and us-west over private IP's of EC2 nodes I am facing problems. I am assuming if Cassandra is built for multi-DC setup it should be easily deployed with node1's DC1's public IP listed as seed in all nodes in DC2 and to gain idea about network topology? 
I have hit a dead end deploying in such a scenario. Or is there any way to use private IPs for such a scenario in EC2, as public IPs are less secure and costly? -- Sasha Dolgy sasha.do...@gmail.com
Re: How to create data model from RDBMS ERD
you can create the inverted index in the same CF ... just means you would have potentially lots more rows ... do you have a use-case or hypothetical you can share? if not ... here's one. http://code.google.com/p/oauth-php it has a suggested RDBMS model http://oauth-php.googlecode.com/svn/trunk/library/store/mysql/mysql.sql how would you model that? self serving as it's my plan today / tomorrow On Thu, Jun 23, 2011 at 6:43 PM, mcasandra mohitanch...@gmail.com wrote: How should one go about creating a data model from an RDBMS ER diagram into a Big Table data model? For example, an RDBMS has many indexes required for queries, and I think this is the most important aspect when designing the data model in Big Table. I was initially planning to denormalize into one CF and use secondary indexes. However, I also read that creating secondary indexes has a performance impact. So the other option is to create an inverted index. But it also seems bad to have too many CFs. We have requirements to support a high volume of at least 500 writes + 500 reads per sec. What would you advise?
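A cassandra-cli sketch of the same-CF inverted index idea (CF and names hypothetical): the forward row holds the entity's data, and a second row keyed by the indexed value holds the matching row keys as column names. Your client should issue both writes in the same batch so the index can't drift far from the data:

```
set Users['user42']['hometown'] = 'cheektowaga';
set Users['hometown:cheektowaga']['user42'] = '';
```

Looking up "everyone in cheektowaga" is then a single slice of the row Users['hometown:cheektowaga'] — which is exactly the "lots more rows" trade-off mentioned above.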
Re: advice for EC2 deployment
we use a combination of Vyatta and OpenVPN on the nodes that are EC2 and the nodes that aren't EC2 ... works a treat. On Thu, Jun 23, 2011 at 10:23 PM, Sameer Farooqui cassandral...@gmail.com wrote: EC2Snitch doesn't currently support multi-Regions in Amazon. Tickets to track: https://issues.apache.org/jira/browse/CASSANDRA-2452 https://issues.apache.org/jira/browse/CASSANDRA-2491 Let us know if/how you get the OpenVPN connection to work across Regions. On Thu, Jun 23, 2011 at 6:29 AM, pankajsoni0126 pankajsoni0...@gmail.com wrote: No, the nodes in the separate DCs are able to discover each other. But across the DCs it's not happening. I have double checked the config parameters, both those required in the Amazon settings and cassandra.yaml, before posting the query here. Has anybody got their nodes talking to each other across regions by just using public-dns? I am also looking into OpenVPN and how to deploy it.
Re: Storing files in blob into Cassandra
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Storing-photos-images-docs-etc-td6078278.html Of significance from that link (which was great until feeling lucky was removed...): Google of terms cassandra large files + feeling lucky http://www.google.com/search?q=cassandra+large+files&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a Yields: http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage --- store your images / documents / etc. somewhere and reference them in Cassandra. That's the consensus that's been bandied about on this list quite frequently. we employ a solution that uses Amazon S3 for storage and Cassandra as the reference to the metadata and location of the files. works a treat. On Wed, Jun 22, 2011 at 9:07 AM, Damien Picard picard.dam...@gmail.com wrote: Hi, I have to store some files (images, documents, etc.) for my users in a webapp. I use Cassandra for all of my data and I would like to know if it is a good idea to store these files as blobs in a Cassandra CF? Are there any contraindications, or special things to know, to achieve this? Thank you
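If you do decide to keep blobs in Cassandra anyway, the standard advice is to chunk them so no single column blows past the thrift frame size. The chunk/reassemble round trip is just this — 64 KB is an arbitrary chunk size, and the files here are throwaway stand-ins:

```shell
#!/bin/bash
# Demonstrate splitting a blob into fixed-size chunks and reassembling it;
# each chunk file would become one column (chunk.aa, chunk.ab, ...) in a row
# keyed by the blob's identifier.
workdir=$(mktemp -d)
cd "$workdir"

head -c 200000 /dev/urandom > original.bin   # a fake "uploaded" file
split -b 65536 original.bin chunk.           # 64 KB chunks -> would-be columns
cat chunk.* > reassembled.bin                # what a reader would do, in order
cmp original.bin reassembled.bin && echo "chunks reassemble cleanly"
```

The lexicographic chunk suffixes matter: a comparator that sorts column names the same way gives you the chunks back in the right order with a plain slice.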
Re: solandra or pig or....?
First, thanks everyone for the input. Appreciate it. The number crunching would already have been completed, and all statistics per game defined, and inserted into the appropriate CF/row/cols ... So, that being said, Solandra appears to be the right way to go ... except, this would require that my current application(s) be rewritten to consume Solandra and no longer Cassandra ... Your application isn't aware of Cassandra only Solr. or can I have the best of both worlds? Search is only one aspect of the consumer experience. If a consumer wanted to view a 'card' for a baseball player, all the information would be retrieved directly from Cassandra to build that card and search wouldn't be required... -sd On Tue, Jun 21, 2011 at 9:50 PM, Jake Luciani jak...@gmail.com wrote: Right, Solr will not do anything other than basic aggregations (facets) and range queries. On Tue, Jun 21, 2011 at 3:16 PM, Dan Kuebrich dan.kuebr...@gmail.com wrote: Solandra is indeed distributed search, not distributed number-crunching. As a previous poster said, you could imagine structuring the data in a series of documents with fields containing playername, teamname, position, location, day, time, inning, at bat, outcome, etc. Then you could query to get a slice of the data that matches your predicate and run statistics on that subset. The statistics would have to come from other code (eg. R), but solr will filter it for you. So, this approach only works if the slices are reasonably small, but gives you great granularity on search as long as you put all the info in. The users of this datastore (or you) must be willing to write their own simple aggregation functions (show me only the unique player names returned by this solr query, show me the average of field X returned by this solr query, ...) If the numbers of results are too great, MR may be the way to go.
Re: OOM (or, what settings to use on AWS large?)
We had a similar problem last month and found that the OS eventually killed the Cassandra process on each of our nodes ... I've upgraded to 0.8.0 from 0.7.6-2 and have not had the problem since, but I do see consumption levels rising consistently from one day to the next on each node .. On Wed, Jun 1, 2011 at 2:30 PM, Sasha Dolgy sdo...@gmail.com wrote: is there a specific string I should be looking for in the logs that isn't super obvious to me at the moment... On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis jbel...@gmail.com wrote: The place to start is with the statistics Cassandra logs after each GC. look for GCInspector I found this in the logs on all my servers but never did much after that On Wed, Jun 22, 2011 at 2:33 PM, William Oberman ober...@civicscience.com wrote: I woke up this morning to all 4 of my cassandra instances reporting they were down in my cluster. I quickly started them all, and everything seems fine. I'm doing a postmortem now, but it appears they all OOM'd at roughly the same time, which was not reported in any cassandra log, but I discovered something in /var/log/kern that showed java died of oom(*). In amazon, I'm using large instances for cassandra, and they have no swap (as recommended), so I have ~8GB of ram. Should I use a different max mem setting? I'm using a stock rpm from riptano/datastax.
If I run ps -aux I get: /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms3843M -Xmx3843M -Xmn200M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Djava.rmi.server.hostname=X.X.X.X -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dmx4jaddress=0.0.0.0 -Dmx4jport=8081 -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-pidfile=/var/run/cassandra/cassandra.pid -cp :/etc/cassandra/conf:/usr/share/cassandra/lib/antlr-3.1.3.jar:/usr/share/cassandra/lib/apache-cassandra-0.7.4.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-collections-3.2.1.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.1.jar:/usr/share/cassandra/lib/guava-r05.jar:/usr/share/cassandra/lib/high-scale-lib.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jetty-6.1.21.jar:/usr/share/cassandra/lib/jetty-util-6.1.21.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/jug-2.0.0.jar:/usr/share/cassandra/lib/libthrift-0.5.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/mx4j-tools.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon (*) Also, why would they all OOM so 
close to each other? Bad luck? Or once the first node went down, is there an increased chance of the rest? I'm still on 0.7.4, when I released cassandra to production that was the latest release. In addition to (or instead of?) fixing memory settings, I'm guessing I should upgrade. will
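For anyone doing the same postmortem: the two greps that matter are GCInspector in Cassandra's system.log and the OOM killer in the kernel log. Sketched here against a sample file, since log paths vary by install and the two lines below are fabricated approximations of the real formats:

```shell
#!/bin/bash
# Postmortem greps, demonstrated on sample log lines (not real output).
log=$(mktemp)
cat > "$log" <<'EOF'
 INFO [ScheduledTasks:1] 2011-06-22 06:10:12,345 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 2312 ms, 1033212312 reclaimed leaving 3511231232 used
Jun 22 06:12:01 ip-10-0-0-1 kernel: Out of memory: Killed process 1234 (java)
EOF

# Long CMS pauses mean the heap was under pressure well before any OOM
grep "GCInspector" "$log"

# The smoking gun for "all nodes just disappeared": the kernel, not cassandra
grep -i "killed process" "$log"
```

In a real cluster you would run these against /var/log/cassandra/system.log* and /var/log/kern* (or wherever your install writes them).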
Re: OOM (or, what settings to use on AWS large?)
Yes ... this is because it was the OS that killed the process, and wasn't related to Cassandra crashing. Reviewing our monitoring, we saw that memory utilization was pegged at 100% for days and days before the process was finally killed because 'apt' was fighting for resources. At least, that's as far as I got in my investigation before giving up, moving to 0.8.0 and implementing a 24hr nodetool repair on each node via cronjob. So far ... no problems. On Wed, Jun 22, 2011 at 2:49 PM, William Oberman ober...@civicscience.com wrote: Well, I managed to run 50 days before an OOM, so any changes I make will take a while to test ;-) I've seen the GCInspector log lines appear periodically in my logs, but I didn't see a correlation with the crash. I'll read the instructions on how to properly do a rolling upgrade today, practice on test, and try that on production first. will
Re: No Transactions: An Example
I'd implement the concept of a bank account using counters in a counter column family. one row per account ... each column for transaction data and one column for the actual balance. just so long as you use whole numbers ... no one needs pennies anymore. -sd On Wed, Jun 22, 2011 at 4:18 PM, Trevor Smith tre...@knewton.com wrote: Hello, I was wondering if anyone had architecture thoughts of creating a simple bank account program that does not use transactions. I think creating an example project like this would be a good thing to have for a lot of the discussions that pop up about transactions and Cassandra (and non-transactional datastores in general). Consider the simple system that has accounts, and users can transfer money between the accounts. There are these interesting papers as background (links below). Thank you. Trevor Smith http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-components-postattachments/00-09-20-52-14/BuildingOnQuicksand_2D00_V3_2D00_081212h_2D00_pdf.pdf http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
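A cassandra-cli sketch of that model, with CF and account names made up. Note the two counter updates are separate operations — Cassandra won't make them atomic together, which is exactly the gap the rest of this thread pokes at:

```
create column family Accounts
    with default_validation_class = CounterColumnType
    and comparator = UTF8Type;

decr Accounts['acct-alice']['balance'] by 500;
incr Accounts['acct-bob']['balance'] by 500;
```

Whole numbers only, as noted: counters are integers, so store cents (or don't bother with pennies).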
Re: Storing Accounting Data
but you can store the -details- of a transaction as json data and do some sanity checks to validate that the data you currently have stored aligns with the recorded transactions. maybe a batch job run every 24 hours ... On Wed, Jun 22, 2011 at 4:19 PM, Oleg Anastastasyev olega...@gmail.com wrote: Is C* suitable for storing customer account (financial) data, as well as billing, payroll, etc? This is a new company so migration is not an issue... starting from scratch. If you need only store them - then yes, but if you require transactions spanning multiple rows or column families, which i believe will be main functionality here - then definitely no, because cassandra has no ACID, no transactions spanning multiple rows and no ability to rollback.
Re: No Transactions: An Example
I would still maintain a record of the transaction ... so that I can do analysis afterwards to determine if/when problems occurred ... On Wed, Jun 22, 2011 at 4:31 PM, Trevor Smith tre...@knewton.com wrote: Sasha, How would you deal with a transfer between accounts in which only one half of the operation was successfully completed? Thank you. Trevor
Re: 99.999% uptime - Operations Best Practices?
Implement monitoring and be proactive ... that will stop you waking up to a big surprise. I'm sure there were symptoms leading up to all 4 nodes going down. Willing to wager that each node went down at a different time and not all went down at once... On Jun 22, 2011 11:50 PM, Les Hazlewood l...@katasoft.com wrote: I understand that every environment is different and it always 'depends' :) But recommending settings and techniques based on an existing real production environment (like the user's suggestion to run nodetool repair as a regular cron job) is always a better starting point for a new Cassandra evaluator than having to start from scratch. Ryan, do you have any 'seed' settings that you guys use for nodes at Twitter? Are there any resources/write-ups beyond the two I've listed already that address some of these 'gotchas'? If those two links are in fact the ideal starting point, that's fine - but it appears that this may not be the case however based on the aforementioned user as well as the other who helped him who saw similar warning signs. I'm hoping for someone to dispel these reports based on what people actually do in production today. Any info/settings/recommendations based on real production environments would be appreciated! Thanks again, Les
Re: OOM (or, what settings to use on AWS large?)
http://www.twitpic.com/5fdabn http://www.twitpic.com/5fdbdg i do love a good graph. two of the weekly memory utilization graphs for 2 of the 4 servers from this ring... week 21 was a nice week ... the week before 0.8.0 went out proper. since then, bumped up to 0.8 and have seen a steady increase in the memory consumption (used) but have not seen the swap do what it did ...and the buffered/cached seems much better -sd On Thu, Jun 23, 2011 at 12:09 AM, Chris Burroughs chris.burrou...@gmail.com wrote: In `free` terms, by pegged do you mean that free Mem was 0, or -/+ buffers/cache as 0?
Re: OOM (or, what settings to use on AWS large?)
yes. each one corresponds with taking a node down for various reasons. i think more people should show their graphs. it's great. hoping Oberman has some ... so we can see what his look like. On Thu, Jun 23, 2011 at 12:40 AM, Chris Burroughs chris.burrou...@gmail.com wrote: Do all of the reductions in Used on that graph correspond to node restarts? My Zabbix for reference: http://img194.imageshack.us/img194/383/2weekmem.png On 06/22/2011 06:35 PM, Sasha Dolgy wrote: http://www.twitpic.com/5fdabn http://www.twitpic.com/5fdbdg i do love a good graph. two of the weekly memory utilization graphs for 2 of the 4 servers from this ring... week 21 was a nice week ... the week before 0.8.0 went out proper. since then, bumped up to 0.8 and have seen a steady increase in the memory consumption (used) but have not seen the swap do what it did ... and the buffered/cached seems much better -sd On Thu, Jun 23, 2011 at 12:09 AM, Chris Burroughs chris.burrou...@gmail.com wrote: In `free` terms, by pegged do you mean that free Mem was 0, or -/+ buffers/cache as 0? -- Sasha Dolgy sasha.do...@gmail.com
Re: Storing files in blob into Cassandra
maybe you want to spend a few minutes reading about Haystack over at facebook to give you some ideas... https://www.facebook.com/note.php?note_id=76191543919 Not saying what they've done is the right way... just sayin' On Thu, Jun 23, 2011 at 6:29 AM, AJ a...@dude.podzone.net wrote: I was thinking of doing the same thing. But, to compensate for the bandwidth usage during the read, I was hoping to find a way for the httpd or app server to cache the file either in RAM or on disk so subsequent reads could just reference the in-mem cache or local hdd. I have big data requirements, so duplicating the storage of file blobs by adding them to the hdd would almost double my storage requirements. So, the hdd cache would have to be limited, with the LRU removed periodically. I was thinking about making the key for each file be a relative file path as if it were on disk. This same path could also be used as its actual location on disk in the local disk cache. Using a path as the key makes it flexible in many ways if I ever change my mind and want to store all files on disk, or when backing up or archiving, etc. I'm rusty on my Apache httpd knowledge, but I also thought there was an Apache cache mod that would use both RAM and disk depending on the frequency of use. But I don't know if you can tell it to cache this blob like it's a file. Just some thoughts.
Re: port 8080
it's defined in $CASSANDRA_HOME/conf/cassandra-env.sh JMX_PORT= Have it different for each instance ... On Tue, Jun 21, 2011 at 1:24 PM, osishkin osishkin osish...@gmail.com wrote: I want to have several daemons running on a machine, each belonging to a multi-node cluster. Is that a problem with regard to port 8080, for JMX monitoring? Is it hardcoded somewhere, so that changing it in the configuration files is not enough? Thank you osi
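Concretely, something like this in each instance's copy of the file, one port per daemon (the port numbers here are arbitrary examples):

```shell
# instance 1: conf/cassandra-env.sh
JMX_PORT="8080"

# instance 2: conf/cassandra-env.sh
JMX_PORT="8081"
```

Then point the tools at the right instance, e.g. bin/nodetool -h localhost -p 8081 ring for the second daemon.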
Re: port 8080
Personally speaking, I do not run JMX on 8080, and never have. The tools, like cassandra-cli and nodetool, expect it to be on the default port, but you can override with -p or -jmxport -sd On Tue, Jun 21, 2011 at 1:33 PM, osishkin osishkin osish...@gmail.com wrote: I did, and everything seemed to work fine. But I saw a reference here http://www.onemanclapping.org/2010/03/running-multiple-cassandra-nodes-on.html that said make sure you have at least one node listening on 8080 since all the Cassandra tools assume JMX is listening there, and then remembered that I saw a warning regarding that port when we uploaded one of the machines. Unfortunately I don't have access to them currently, so I can't replicate it immediately. But I thought perhaps someone can dispel my fear that there is something special about that port On Tue, Jun 21, 2011 at 2:28 PM, Sasha Dolgy sdo...@gmail.com wrote: it's defined in $CASSANDRA_HOME/conf/cassandra-env.sh JMX_PORT= Have it different for each instance ... On Tue, Jun 21, 2011 at 1:24 PM, osishkin osishkin osish...@gmail.com wrote: I want to have several daemons running on a machine, each belonging to a multi-node cluster. Is that a problem with regard to port 8080, for JMX monitoring? Is it hardcoded somewhere, so that changing it in the configuration files is not enough? Thank you osi -- Sasha Dolgy sasha.do...@gmail.com
Re: pig integration NoClassDefFoundError TypeParser
bang on ... no idea why ... a new day a fresh login ... environment variables gone. working now with cassandra 0.8.0 and pig 0.8.1 went through all my steps and all is working ... except line 45 in the bin/pig_cassandra is not proper when there are multiple pig*.jar files. On Mon, Jun 20, 2011 at 10:03 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: I think you might be having environment/classpath issues with an RC of cassandra 0.8 or something.
solandra or pig or....?
Folks, Simple question ... Assuming my current use case is the ability to log lots of trivial and seemingly useless sports statistics ... I want a user to be able to query / compare. For example: -- Show me all baseball players in Cheektowaga and Ontario, California who have hit a grand slam on Tuesdays where it was a leap year. Each baseball player is represented by a single row in a CF: player_uuid, fullname, hometown, game1, game2, game3, game4 Games are UUIDs that reference another row in the same CF that provides information about that game... location, final score, date (unix timestamp or ISO format), and statistics, which are represented as a new column timestamp:player_uuid I can use Pig, as I understand, to run a query to generate specific information about specific things and populate that data back into Cassandra in another CF ... similar to the hypothetical search above. As the information is structured already, I assume Pig is the right tool for the job, but it may not be ideal for a web application and enabling ad-hoc queries ... it could take anywhere from 2-? seconds for that query to generate, populate, and return to the user...? On the other hand, I have started to read about Solr / Solandra / Lucandra. Can this provide similar functionality or better? Or is it more geared towards full-text search and indexing ... I don't want to get into the habit of guessing what my potential users want to search for ... trying to think of ways to offload this to them. -- Sasha Dolgy sasha.do...@gmail.com
Re: solandra or pig or....?
Without getting overly complicated and long winded ... are there practical references / examples I can review that demonstrate the cassandra/solandra benefits? I had a quick look at https://github.com/tjake/Solandra/wiki/Solandra-Wiki and it wasn't dead obvious to me. On Tue, Jun 21, 2011 at 8:19 PM, Jake Luciani jak...@gmail.com wrote: Solandra can answer the question you used as an example, and it's more of a fit for low-latency ad-hoc reporting than Pig. Pig queries will take minutes, not seconds. On Tue, Jun 21, 2011 at 12:12 PM, Sasha Dolgy sdo...@gmail.com wrote: Folks, Simple question ... Assuming my current use case is the ability to log lots of trivial and seemingly useless sports statistics ... I want a user to be able to query / compare. For example: -- Show me all baseball players in Cheektowaga and Ontario, California who have hit a grand slam on Tuesdays where it was a leap year. Each baseball player is represented by a single row in a CF: player_uuid, fullname, hometown, game1, game2, game3, game4 Games are UUIDs that reference another row in the same CF that provides information about that game... location, final score, date (unix timestamp or ISO format), and statistics, which are represented as a new column timestamp:player_uuid I can use Pig, as I understand, to run a query to generate specific information about specific things and populate that data back into Cassandra in another CF ... similar to the hypothetical search above. As the information is structured already, I assume Pig is the right tool for the job, but it may not be ideal for a web application and enabling ad-hoc queries ... it could take anywhere from 2-? seconds for that query to generate, populate, and return to the user...? On the other hand, I have started to read about Solr / Solandra / Lucandra. Can this provide similar functionality or better? Or is it more geared towards full-text search and indexing ... I don't want to get into the habit of guessing what my potential users want to search for ... trying to think of ways to offload this to them. -- Sasha Dolgy sasha.do...@gmail.com -- http://twitter.com/tjake -- Sasha Dolgy sasha.do...@gmail.com
pig integration NoClassDefFoundError TypeParser
Been trying for the past little bit to get the PIG integration working with Cassandra 0.8.0

1. Downloaded the src for 0.8.0 and ran ant build
2. Went into contrib/pig and ran ant ... gives me: /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/build/cassandra_storage.jar and is copied into the lib/ directory
3. Downloaded pig-0.8.1, modified the ivy/libraries.properties so that it uses Jackson 1.8.2, and ran ant. It compiles and gives me two jars: pig-0.8.1-SNAPSHOT-core.jar and pig-0.8.1-SNAPSHOT.jar
   - I did try to run it with Jackson 1.4 as the contrib/pig/README.txt suggested, but that failed... The referenced JIRA ticket (PIG-1863) suggests 1.6.0 (still produces the same results)

Environment variables are set:
java version 1.6.0_24
PIG_INITIAL_ADDRESS=localhost
PIG_HOME=/usr/local/src/pig-0.8.1
PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
PIG_RPC_PORT=9160
CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src

I then start up cassandra ... no issues. I connect and create a new keyspace called foo with a column family called bar and a CF called foo... Inside the CF bar, I create a few rows with random columns. 4 rows. From contrib/pig I run:

bin/pig_cassandra -x local

... and immediately get the error:

[: 45: /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar: unexpected operator

-- this is a reference to this line:

if [ ! -e $PIG_JAR ]; then

*** Problem here is that $PIG_JAR is a reference to two files ... pig-0.8.1-core.jar pig.jar ... Changing line 44 to PIG_JAR=$PIG_HOME/pig*core*.jar fixes this ...
(or even referencing $PIG_HOME/build/pig*core*.jar or just pig.jar Try again to run: bin/pig_cassandra -x local and everything loads up nicely: 2011-06-21 02:07:23,671 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/pig_1308593243668.log 2011-06-21 02:07:23,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:/// grunt register /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar; register /usr/local/src/pig-0.8.1/pig.jar; register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-fixes.jar; register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-sources-fixes.jar; register /usr/local/src/apache-cassandra-0.8.0-src/lib/libthrift-0.6.jar; grunt grunt rows = LOAD 'cassandra://foo/bar' USING CassandraStorage(); grunt STORE rows into 'cassandra://foo/foo' USING CassandraStorage(); 2011-06-21 02:04:53,271 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN 2011-06-21 02:04:53,271 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used. 2011-06-21 02:04:53,324 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId= 2011-06-21 02:04:53,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: rows: Store(cassandra://foo/foo:CassandraStorage) - scope-1 Operator Key: scope-1) 2011-06-21 02:04:53,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? 
false
2011-06-21 02:04:53,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-06-21 02:04:53,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-06-21 02:04:53,480 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-06-21 02:04:53,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-06-21 02:04:59,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-06-21 02:04:59,718 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,719 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-06-21 02:04:59,948 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,960 [Thread-5] INFO
Re: pig integration NoClassDefFoundError TypeParser
Hi ... I still have the same problem with pig-0.8.0-cdh3u0... Maybe I'm doing something wrong. Where does org/apache/cassandra/db/marshal/TypeParser exist, or where should it exist? It's not in $CASSANDRA_HOME/libs or /usr/local/src/pig-0.8.0-cdh3u0/lib or /usr/local/src/apache-cassandra-0.8.0-src/build/lib/jars.

for jar in `ls *.jar`
do
  jar -tf $jar | grep TypeParser
  if [ $? -eq 0 ]; then
    echo $jar
  fi
done

Shows me nothing in all the lib dirs. On Mon, Jun 20, 2011 at 8:44 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Try running with the cdh3u0 version of pig and see if it has the same problem. They backported the patch (to pig 0.9, which should be out in time for the hadoop summit next week) that adds the updated jackson dependency for avro. The download URL for that is http://archive.cloudera.com/cdh/3/pig-0.8.0-cdh3u0.tar.gz Alternatively, I believe brisk beta 2 will be out today, which has pig integrated. Not sure if that would work for your current environment though. See if that works.
Re: pig integration NoClassDefFoundError TypeParser
Yes ... I ran an ant in the root directory on a fresh download of 0.8.0 src: /usr/local/src/apache-cassandra-0.8.0-src# ls /usr/local/src/apache-cassandra-0.8.0-src/build/classes/main/org/apache/cassandra/db/marshal/ AbstractCommutativeType.class AbstractType.class LexicalUUIDType.class UTF8Type.class AbstractType$1.classAbstractUUIDType.class LocalByPartionerType.class UTF8Type$UTF8Validator.class AbstractType$2.classAsciiType.class LongType.class UTF8Type$UTF8Validator$State.class AbstractType$3.classBytesType.class MarshalException.class UUIDType.class AbstractType$4.classCounterColumnType.class TimeUUIDType.class AbstractType$5.classIntegerType.class UTF8Type$1.class /usr/local/src/apache-cassandra-0.8.0-src# find . | grep TypeParser /usr/local/src/apache-cassandra-0.8.0-src# echo $? 1 /usr/local/src/apache-cassandra-0.8.0-src# /usr/local/src/apache-cassandra-0.8.0-src# grep -Ri TypeError . /usr/local/src/apache-cassandra-0.8.0-src# echo $? 1 /usr/local/src/apache-cassandra-0.8.0-src# TypeParser does not exist...? On Mon, Jun 20, 2011 at 9:11 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: hmmm, did you build the cassandra src in the root of your cassandra directory with ant? sounds like it can't find that cassandra class. That's required.
Re: pig integration NoClassDefFoundError TypeParser
cassandra-0.8.0/src/java/org/apache/cassandra/db/marshal/TypeParser.java : doesn't exist
cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/TypeParser.java : exists...

Pig integration doesn't work with the 0.8.0 release, but will with 0.8.1 .. fair assumption? -- Sasha Dolgy sasha.do...@gmail.com
Re: cassandra crash
What type of environment? We had issues with our cluster on 0.7.6-2 ... The messages you see and highlighted, from what I recall, aren't bad ... they are good. Investigating our crash, it turned out that the OS killed our Cassandra process, and this was found in /var/log/messages. Since then, I have implemented a routine nodetool repair and upgraded to 0.8.0, which seems to have fixed the problem. Can you post specifics about your environment? Version, # of nodes, size, etc...? That generally helps people to guess better where your problems are (with respect to the crash you had...) -sd 2011/6/17 Donna Li donna...@utstar.com All: Can you find some exception from the last sentence? Would cassandra crash when memory is not enough? There are some other applications running with cassandra; the other applications may use large memory. -- From: Donna Li Sent: 2011-06-17 09:58 To: user@cassandra.apache.org Subject: cassandra crash All: Why did cassandra crash after printing the following log?

INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-206-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-207-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,020 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-137-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-205-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-139-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted
/usr/local/rss/DDB/data/data/PSCluster/VCCCurScheduleTable-138-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2011-06-16 14:19:01,021 SSTableDeletingReference.java (line 104) Deleted /usr/local/rss/DDB/data/data/PSCluster/CsiStatusTab-208-Data.db
INFO [GC inspection] 2011-06-16 14:22:59,562 GCInspector.java (line 110) GC for ParNew: 385 ms, 26859800 reclaimed leaving 117789112 used; max is 118784

Best Regards
Donna Li
Re: Cassandra.yaml
Hi Vivek, When I write client code in Java, using Hector, I don't specify a cassandra.yaml ... I specify the host(s) and keyspace I want to connect to. Alternatively, I specify the host(s) and create the keyspace if the one I would like to use doesn't exist (a new cluster, for example). At no point do I use a yaml file with my client code. The conf/cassandra.yaml is there to tell the cassandra server how to behave / operate when it starts ... -sd On Fri, Jun 17, 2011 at 9:55 AM, Vivek Mishra vivek.mis...@impetus.co.in wrote: I have a query: I have my Cassandra server running on my local machine and it has loaded Cassandra-specific settings from apache-cassandra-0.8.0-src/apache-cassandra-0.8.0-src/conf/cassandra.yaml Now if I am writing a java program to connect to this server, why do I need to provide a new cassandra.yaml file again? Even if the server is already up and running? Even if I can create keyspaces and columnfamilies programmatically? Isn't it some type of redundancy? Might be my query is a bit irrelevant. -Vivek
Re: Querying superColumn
Write two records per employee ...

1. [department1] = { Vivek : India }
2. [India] = { Vivek : department1 }

1. [department1] = { Vivs : USA }
2. [USA] = { Vivs : department1 }

Now you can query a single row to display all employees in USA or all employees in department1 ... If an employee moves to a new department in a new country, simply remove the column from that department row and country row and re-insert into the new rows... My understanding with Cassandra and similar technologies is that you aren't designing to be smart and avoid data duplication. You are designing to address the searches and queries based on your business requirements ... when you know what those are, you cheat and pre-populate the data you will be searching on ... On Fri, Jun 17, 2011 at 1:16 PM, Vivek Mishra vivek.mis...@impetus.co.in wrote: Correct. But that will not solve the issue of data colocation (data locality)? From: Sasha Dolgy [mailto:sdo...@gmail.com] Sent: Thursday, June 16, 2011 8:47 PM To: user@cassandra.apache.org Subject: Re: Querying superColumn Have 1 row with employee info for country/office/division, each column an employee id and json info about the employee, or a reference to another row id for that employee's data. No more supercolumn. On Jun 16, 2011 1:56 PM, Vivek Mishra vivek.mis...@impetus.co.in wrote: I have a question about querying super columns. For example: I have a supercolumnFamily DEPARTMENT with dynamic superColumn 'EMPLOYEE' (name, country). Now for rowKey 'DEPT1' I have inserted multiple super columns like: Employee1 { Name: Vivek country: India } Employee2 { Name: Vivs country: USA } Now if I want to retrieve a super column whose rowkey is 'DEPT1' and employee name is 'Vivek', can I get only 'EMPLOYEE1'? -Vivek Write to us for a Free Gold Pass to the Cloud Computing Expo, NYC to attend a live session by Head of Impetus Labs on 'Secrets of Building a Cloud Vendor Agnostic PetaByte Scale Real-time Secure Web Application on the Cloud '.
Looking to leverage the Cloud for your Big Data Strategy ? Attend Impetus webinar on May 27 by registering at http://www.impetus.com/webinar?eventid=42 . NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference. -- Sasha Dolgy sasha.do...@gmail.com
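The two-row pattern described above can be sketched with plain dicts standing in for Cassandra rows (a conceptual illustration only, not Hector or Thrift code; all names here are hypothetical):

```python
# Denormalization sketch: each employee is written into two "rows" --
# one keyed by department, one keyed by country -- so that either
# query ("everyone in department1", "everyone in USA") is a single-row lookup.
by_department = {}  # row key = department, columns = {employee: country}
by_country = {}     # row key = country,    columns = {employee: department}

def add_employee(name, department, country):
    by_department.setdefault(department, {})[name] = country
    by_country.setdefault(country, {})[name] = department

def move_employee(name, new_department, new_country):
    # Remove the old columns from every department/country row,
    # then re-insert under the new rows -- the "remove and re-insert" step.
    for cols in by_department.values():
        cols.pop(name, None)
    for cols in by_country.values():
        cols.pop(name, None)
    add_employee(name, new_department, new_country)

add_employee("Vivek", "department1", "India")
add_employee("Vivs", "department1", "USA")

print(sorted(by_department["department1"]))  # all employees in department1
print(sorted(by_country["USA"]))             # all employees in USA
```

The write path does double the work, but both business queries become one-row reads, which is the trade the email is describing.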
Re: getFieldValue()
A good example, from what I understand of using Hector / pycassa / etc.: if you wanted connection pooling, you would have to craft your own solution, versus using the tested, ready-to-go solution provided by Hector. Thrift doesn't provide native connection pooling ...? There are a few scenarios / examples where using a library that abstracts the Thrift bindings will make your life easier ... and they are generally maintained and kept up to date in alignment with new releases of Cassandra. That's a +1 for me ... Nothing stops you from using Thrift .. it depends on how much work you want to implement yourself. -sd On Fri, Jun 17, 2011 at 5:30 PM, Markus Wiesenbacher | Codefreun.de m...@codefreun.de wrote: One question regarding point 2: Why should we always use Hector? Thrift is not that bad.
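To make the "craft your own solution" point concrete, even the simplest piece of a client-side pool — picking the next host — is code you'd have to write and maintain yourself with raw Thrift. A toy sketch (hypothetical class, no real sockets; libraries like Hector or pycassa handle this, plus health checks and reconnects, for you):

```python
import itertools

class RoundRobinPool:
    """Toy round-robin host selector, standing in for the connection
    pooling a higher-level client library provides out of the box."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("need at least one host")
        # itertools.cycle repeats the host list forever.
        self._cycle = itertools.cycle(hosts)

    def get_host(self):
        # A real pool would also track borrowed connections, retry on
        # failure, and evict dead hosts -- none of that is shown here.
        return next(self._cycle)

pool = RoundRobinPool(["192.168.1.115:9160", "192.168.1.110:9160"])
picks = [pool.get_host() for _ in range(4)]
```

This covers only host rotation; failover, timeouts, and connection reuse are where hand-rolled pooling gets genuinely hard.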
Re: Docs: Token Selection
+1 for this if it is possible... On Fri, Jun 17, 2011 at 6:31 PM, Eric tamme eta...@gmail.com wrote: What I don't like about NTS is I would have to have more replicas than I need. {DC1=2, DC2=2}, RF=4 would be the minimum. If I felt that 2 local replicas was insufficient, I'd have to move up to RF=6 which seems like a waste... I'm predicting data in the TB range so I'm trying to keep replicas to a minimum. My goal is to have 2-3 replicas in a local data center and 1 replica in another dc. I think that would be enough barring a major catastrophe. But, I'm not sure this is possible. I define local as in the same data center as the client doing the insert/update. Yes, not being able to configure the replication factor differently for each data center is a bit annoying. Im assuming you basically want DC1 to have a replication factor of {DC1:2, DC2:1} and DC2 to have {DC1:1,DC2:2}. I would very much like that feature as well, but I dont know the feasibility of it. -Eric
Re: urgent how to specify multiple hosts in cassandra
Have them all within a quoted string, and not as multiple comma-separated entries; for example: seeds: "192.168.1.115, 192.168.1.110" versus what you have... On Fri, Jun 17, 2011 at 7:00 PM, Anurag Gujral anurag.guj...@gmail.com wrote: Hi All, I specified multiple hosts in the seeds field when using cassandra-0.8, like this: seeds: 192.168.1.115,192.168.1.110,192.168.1.113 But I am getting the error: while parsing a block mapping in reader, line 106, column 13: - seeds: 192.168.1.115,192.168. ... ^ expected block end, but found FlowEntry in reader, line 106, column 35: - seeds: 192.168.1.115,192.168.1.110,192.168.1.113 ... ^ Please suggest; I am doing an upgrade right now. Thanks Anurag -- Sasha Dolgy sasha.do...@gmail.com
Re: Docs: Token Selection
Replication factor is defined per keyspace, if I'm not mistaken. Can't remember if NTS is per keyspace or per cluster ... if it's per keyspace, that would be a way around it ... without having to maintain multiple clusters, just have multiple keyspaces ... On Fri, Jun 17, 2011 at 9:23 PM, AJ a...@dude.podzone.net wrote: On 6/17/2011 12:32 PM, Jeremiah Jordan wrote: Run two clusters, one which has {DC1:2, DC2:1} and one which is {DC1:1, DC2:2}. You can't have both in the same cluster, otherwise it isn't possible to tell where the data got written when you want to read it. For a given key XYZ you must be able to compute which nodes it is stored on just using XYZ, so a strategy where it is on nodes DC1_1, DC1_2, and DC2_1 when a node in DC1 is the coordinator, and on DC1_1, DC2_1 and DC2_2 when a node in DC2 is the coordinator, won't work. Given just XYZ I don't know where to look for the data. But, from the way you describe what you want to happen, clients from DC1 aren't using data inserted by clients from DC2, so you should just make two different Cassandra clusters. One for the DC1 guys, which is {DC1:2, DC2:1}, and one for the DC2 guys, which is {DC1:1, DC2:2}. Interesting. Thx. -- Sasha Dolgy sasha.do...@gmail.com
Re: sstable2json2sstable bug with json data stored
The JSON you are showing below is an export from cassandra? { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] } Does this work? { "74657374": [["data", "{\"foo\":\"bar\"}", 1308209845388000]] } -sd On Thu, Jun 16, 2011 at 9:49 AM, Timo Nentwig timo.nent...@toptarif.de wrote: On 6/15/11 17:41, Timo Nentwig wrote: (json can likely be boiled down even more...) Any JSON (well, probably anything with quotes...) breaks it: { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] } [default@foo] set transactions[test][data]='{"foo":"bar"}'; I feared that storing data in a readable fashion would be a fateful idea. https://issues.apache.org/jira/browse/CASSANDRA-2780
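The underlying issue is JSON stored inside JSON: in a valid export, the inner quotes of the column value must come out backslash-escaped, or the file cannot be parsed back. A quick check of what correct escaping looks like (plain Python here, not the sstable2json code path):

```python
import json

# The column value is itself a JSON string ...
column_value = '{"foo":"bar"}'

# ... so a well-formed export must escape its quotes when embedding it.
export = json.dumps({"74657374": [["data", column_value, 1308209845388000]]})

# Round-trip: parse the export, then parse the stored value back out of it.
restored = json.loads(export)
inner = json.loads(restored["74657374"][0][1])
```

If the export emits the inner quotes unescaped (as in the broken sample above), json.loads on it fails, which is the sstable2json/json2sstable round-trip bug being reported.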
Re: Querying superColumn
Have 1 row with employee info for country/office/division, each column an employee id and json info about the employee, or a reference to another row id for that employee's data. No more supercolumn. On Jun 16, 2011 1:56 PM, Vivek Mishra vivek.mis...@impetus.co.in wrote: I have a question about querying super columns. For example: I have a supercolumnFamily DEPARTMENT with dynamic superColumn 'EMPLOYEE' (name, country). Now for rowKey 'DEPT1' I have inserted multiple super columns like: Employee1 { Name: Vivek country: India } Employee2 { Name: Vivs country: USA } Now if I want to retrieve a super column whose rowkey is 'DEPT1' and employee name is 'Vivek', can I get only 'EMPLOYEE1'? -Vivek
Re: Docs: Token Selection
So, with ec2 ... 3 regions (DCs), each one is +1 from another? On Jun 16, 2011 3:40 PM, AJ a...@dude.podzone.net wrote: Thanks Eric! I've finally got it! I feel like I've just been initiated or something by discovering this secret. I kid! But, I'm thinking about using OldNetworkTopStrat. Do you, or anyone else, know if the same rules for token assignment apply to ONTS? On 6/16/2011 7:21 AM, Eric tamme wrote: AJ, sorry I seemed to miss the original email on this thread. As Aaron said, when computing tokens for multiple data centers, you should compute them independently for each data center - as if it were its own Cassandra cluster. You can have overlapping token ranges between multiple data centers, but no two nodes can have the same token, so for subsequent data centers I just increment the tokens. For two data centers with two nodes each using RandomPartitioner, calculate the tokens for the first DC normally, but in the second data center, increment the tokens by one.

In DC 1:
node 1 = 0
node 2 = 85070591730234615865843651857942052864

In DC 2:
node 1 = 1
node 2 = 85070591730234615865843651857942052865

For RowMutations this will give each data center a local set of nodes that it can write to for complete coverage of the entire token space. If you are using NetworkTopologyStrategy for replication, it will give an offset mirror replication between the two data centers so that your replicas will not get pinned to a node in the remote DC. There are other ways to select the tokens, but the increment method is the simplest to manage and continue to grow with. Hope that helps. -Eric
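Eric's increment scheme is easy to compute. A small sketch for RandomPartitioner, whose token space is 0 to 2**127: tokens are evenly spaced within each data center, then offset by the DC index so no two nodes share a token (the helper function name is ours, not from any Cassandra tool):

```python
def tokens(num_nodes, dc_offset=0, token_space=2**127):
    """Evenly spaced RandomPartitioner tokens for one data center's nodes,
    shifted by +1 per additional DC so tokens stay globally unique."""
    return [i * token_space // num_nodes + dc_offset for i in range(num_nodes)]

dc1 = tokens(2, dc_offset=0)  # first DC: computed as its own cluster
dc2 = tokens(2, dc_offset=1)  # second DC: same spacing, incremented by one
```

For two nodes per DC this reproduces exactly the four tokens Eric lists: 0 and 2**127/2 in DC 1, and those values plus one in DC 2.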
Re: cascading failures due to memory
No. Upgraded to 0.8 and monitor the systems more. we schedule a repair every 24hrs via cron and so far no problems.. On Jun 15, 2011 5:44 PM, AJ a...@dude.podzone.net wrote: Sasha, Did you ever nail down the cause of this problem? On 5/31/2011 4:01 AM, Sasha Dolgy wrote: hi everyone, the current nodes i have deployed (4) have all been working fine, with not a lot of data ... more reads than writes at the moment. as i had monitoring disabled, when one node's OS killed the cassandra process due to out of memory problems ... that was fine. 24 hours later, another node, 24 hours later, another node ...until finally, all 4 nodes no longer had cassandra running. When all nodes are started fresh, CPU utilization is at about 21% on each box. after 24 hours, this goes up to 32% and then 51% 24 hours later. originally I had thought that this may be a result of 'nodetool repair' not being run consistently ... after adding a cronjob to run every 24 hours (staggered between nodes) the problem of the increasing memory utilization does not resolve. i've read the operations page and also the http://wiki.apache.org/cassandra/MemtableThresholds page. i am running defaults and 0.7.6-02 ... what are the best places to start in terms of finding why this is happening? CF design / usage? 'nodetool cfstats' gives me some good info ... and i've already implemented some changes to one CF based on how it had ballooned (too many rows versus not enough columns) suggestions appreciated
Re: What's the best approach to search in Cassandra
Datastax has pretty sufficient documentation on their site for secondary indexes. On Jun 16, 2011 6:57 AM, Mark Kerzner markkerz...@gmail.com wrote: Jake, "You need to maintain a huge number of distinct indexes." Are we talking about secondary indexes? If yes, this sounds like exactly my problem. There is so little documentation! - but I think that if I read all there is on GitHub, I can probably start using it. Thank you, Mark On Fri, Jun 3, 2011 at 8:07 PM, Jake Luciani jak...@gmail.com wrote: Mark, Check out Solandra. http://github.com/tjake/Solandra On Fri, Jun 3, 2011 at 7:56 PM, Mark Kerzner markkerz...@gmail.com wrote: Hi, I need to store, say, 10M-100M documents, with each document having, say, 100 fields, like author, creation date, access date, etc., and then I want to ask questions like: give me all documents whose author is like abc**, and creation date any time in 2010, and access date in 2010-2011, and so on, perhaps 10-20 conditions, matching a list of some keywords. What's best: Lucene, Katta, Cassandra CF with secondary indices, or a plain scan-and-compare of every record? Thanks a bunch! Mark -- http://twitter.com/tjake
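The "huge number of distinct indexes" cost is easy to see in miniature: one lookup table per queried field, and a multi-condition query becomes an intersection of index rows. Dicts stand in for index structures here; this is a conceptual sketch, not Cassandra's actual secondary-index implementation:

```python
docs = {
    "doc1": {"author": "abc", "year": 2010},
    "doc2": {"author": "abc", "year": 2009},
    "doc3": {"author": "xyz", "year": 2010},
}

# One index entry per (field, value) pair. With ~100 fields per document,
# every write fans out into ~100 index updates -- that is the overhead
# being discussed above.
index = {}
for key, fields in docs.items():
    for field, value in fields.items():
        index.setdefault((field, value), set()).add(key)

# "author = abc AND year = 2010" is an intersection of two index rows,
# instead of a plain scan-and-compare over every record.
hits = index[("author", "abc")] & index[("year", 2010)]
```

A full-text engine like Lucene/Solandra maintains these inverted indexes for you, which is why it is the usual answer for 10-20 ad-hoc conditions.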
Re: odd logs after repair
Hi ... Does anyone else see these type of INFO messages in their log files, or is i just me..? INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI Runtime] java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$KeyIterator.next(HashMap.java:828) at org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173) at org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776) I'm at a loss as to why this is showing up in the logs. -sd On Mon, Jun 13, 2011 at 3:58 PM, Sasha Dolgy sdo...@gmail.com wrote: hm. that's not it. we've been using a non-standard jmx port for some time i've dropped the keyspace and recreated ... wonder if that'll help On Mon, Jun 13, 2011 at 3:57 PM, Tyler Hobbs ty...@datastax.com wrote: On Mon, Jun 13, 2011 at 8:41 AM, Sasha Dolgy sdo...@gmail.com wrote: I recall there being a discussion about a default port changing from 0.7.x to 0.8.x ...this was JMX, correct? Or were there others. Yes, the default JMX port changed from 8080 to 7199. I don't think there were any others.
Re: odd logs after repair
Hi Sylvain, I verified on all nodes with nodetool version that they are 0.8, and have even restarted nodes. Still persists. The four nodes all report similar errors about the other nodes. When I upgraded to 0.8, maybe there were relics about the keyspace that say it's from an earlier version? I need to create a new keyspace to see if that fixes the error. On Jun 14, 2011 10:08 AM, Sylvain Lebresne sylv...@datastax.com wrote: The exception itself is a bug (I've created https://issues.apache.org/jira/browse/CASSANDRA-2767 to fix it). However, the important message is the previous one (even if the exception was not thrown, repair wouldn't be able to work correctly, so the fact that the exception is thrown is not such a big deal). Apparently, from the standpoint of whichever node this log is from, the node 10.128.34.18 is still running 0.7. You should check if that is the case (restarting 10.128.34.18 and looking for something like 'Cassandra version: 0.8.0' is one solution). If the node does run 0.8.0 and you still get this error, then it would point to a problem with our detection of the nodes. -- Sylvain
Re: odd logs after repair
https://issues.apache.org/jira/browse/CASSANDRA-2768 On Tue, Jun 14, 2011 at 10:55 AM, Sylvain Lebresne sylv...@datastax.com wrote: Could you open a ticket then please ? -- Sylvain
ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI Runtime] java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$KeyIterator.next(HashMap.java:828) at org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173) at org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776) I'm at a loss as to why this is showing up in the logs. -sd On Mon, Jun 13, 2011 at 3:58 PM, Sasha Dolgy sdo...@gmail.com wrote: hm. that's not it. we've been using a non-standard jmx port for some time i've dropped the keyspace and recreated ... wonder if that'll help On Mon, Jun 13, 2011 at 3:57 PM, Tyler Hobbs ty...@datastax.com wrote: On Mon, Jun 13, 2011 at 8:41 AM, Sasha Dolgy sdo...@gmail.com wrote: I recall there being a discussion about a default port changing from 0.7.x to 0.8.x ...this was JMX, correct? Or were there others. Yes, the default JMX port changed from 8080 to 7199. I don't think there were any others. -- Sasha Dolgy sasha.do...@gmail.com
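The ConcurrentModificationException in the trace above is the classic symptom of iterating a HashMap while something else mutates it, which is what CASSANDRA-2767 addressed in getNeighbors(). The same failure mode can be sketched in Python, where mutating a dict during iteration raises RuntimeError; the host names here are illustrative, not from the thread:

```python
# Sketch of the failure mode behind the trace above: walking a shared
# collection while it is being mutated. Java's HashMap iterator throws
# ConcurrentModificationException; a Python dict raises RuntimeError.
def iterate_while_mutating(endpoints):
    seen = []
    try:
        for host in endpoints:                # like getNeighbors() walking the ring
            seen.append(host)
            endpoints["10.0.0.99"] = "0.8.0"  # concurrent mutation (e.g. a gossip update)
    except RuntimeError as exc:               # "dictionary changed size during iteration"
        return seen, exc
    return seen, None

hosts = {"10.128.34.17": "0.8.0", "10.128.34.18": "0.7.6"}
seen, err = iterate_while_mutating(hosts)
print(type(err).__name__)  # RuntimeError
```

In the real code the fix is to iterate over a snapshot (or a concurrency-safe collection) rather than the live map.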
Re: New web client future API
Your application is built with the thrift bindings and not with a higher level client like Hector? On Tue, Jun 14, 2011 at 3:42 PM, Markus Wiesenbacher | Codefreun.de m...@codefreun.de wrote: Hi, what is the future API for Cassandra? Thrift, Avro, CQL? I just released an early version of my web client (http://www.codefreun.de/apollo) which is Thrift-based, and therefore I would like to know what the future is ... Many thanks MW -- Sasha Dolgy sasha.do...@gmail.com
Re: odd logs after repair
I recall there being a discussion about a default port changing from 0.7.x to 0.8.x ... this was JMX, correct? Or were there others? On Mon, Jun 13, 2011 at 3:34 PM, Sasha Dolgy sdo...@gmail.com wrote: Hi Aaron, The error is being reported on all 4 nodes. I have confirmed (for my own sanity) that each node is running: ReleaseVersion: 0.8.0 I can reproduce the error on any node by tailing cassandra/logs/system.log and running nodetool repair: INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI Runtime] java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) at java.util.HashMap$KeyIterator.next(HashMap.java:828) at org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173) at org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776) When I run nodetool ring, the ring looks balanced and nothing is out of the ordinary. I also have this set up with RF=3 on 4 nodes ... but repair was working fine prior to the 0.8.0 upgrade. Are there any special commands I need to run? I've tried scrub, cleanup, flush too ... still, repair gives the same issues. -- I have stopped one of the nodes and started it. The issue still persists. I stop another node that is reported in the logs (like .18 above) and start it ... run repair again ... the issue still appears in the log file. -sd On Mon, Jun 13, 2011 at 3:02 PM, aaron morton aa...@thelastpickle.com wrote: You can double-check with nodetool, e.g. 
$ ./bin/nodetool -h localhost version ReleaseVersion: 0.8.0-SNAPSHOT This error is about the internode wire protocol one node thinks another is using. Not sure how it could get confused, does it go away if you restart the node that logged the error ? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13 Jun 2011, at 06:19, Sasha Dolgy wrote: Hi Everyone, Last week, upgraded all 4 nodes to apache-cassandra-0.8.0 .. no issues. Trolling the logs today, I find messages like this on all four nodes: INFO [manual-repair-0b61c9e2-3593-4633-a80f-b6ca52cfe948] 2011-06-13 02:16:45,978 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again. Maybe it would be nice to have the version of all nodes print in nodetool ring ? I don't think I'm crazy though ... have manually checked all are on 0.8.0 -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
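Aaron's version check generalizes: collect the `nodetool version` output from every host and flag any that disagree with the majority. A minimal sketch of that comparison; the host-to-version mapping would come from running nodetool against each node yourself, and these addresses are made up:

```python
from collections import Counter

def version_stragglers(versions):
    """Given {host: release_version}, return hosts not on the majority version."""
    majority, _ = Counter(versions.values()).most_common(1)[0]
    return sorted(h for h, v in versions.items() if v != majority)

reported = {
    "10.128.34.17": "0.8.0",
    "10.128.34.18": "0.8.0",
    "10.128.34.19": "0.7.6",
}
print(version_stragglers(reported))  # ['10.128.34.19']
```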
Re: count column in Cassandra
probably helpful if you change the subject when posting about a different topic. Is your question about counters or the count function? Counters are cool. Count allows you to determine how many columns exist in a row. -sd On Mon, Jun 13, 2011 at 5:27 PM, Sijie YANG iyan...@gmail.com wrote: Hi, All I am newbie to cassandra. I have a simple question but don't find any clear answer by searching google: What's the meaning of count column in Cassandra? Thanks.
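The two concepts in a toy model: a count call (e.g. `get_count()` in most clients) reports how many columns a row currently holds, while a counter column is a value you increment. Plain dicts stand in for the column family here, so this is only an illustration of the semantics:

```python
# A dict stands in for one row of a column family.
row = {"col_a": "x", "col_b": "y", "col_c": "z"}
column_count = len(row)  # what a count/get_count() call reports

# A counter column, by contrast, supports increments.
counters = {}
def incr(row_key, column, delta=1):
    counters.setdefault(row_key, {})
    counters[row_key][column] = counters[row_key].get(column, 0) + delta

incr("page:home", "hits")
incr("page:home", "hits")
print(column_count, counters["page:home"]["hits"])  # 3 2
```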
Re: SSL Streaming
AJ was responding to an email I sent in March ... although I do appreciate the quick response from the community ;) I moved on to our implementation of VPN ... On Jun 14, 2011 1:35 AM, aaron morton aa...@thelastpickle.com wrote: Sasha, does https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L362 help? A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 13 Jun 2011, at 23:26, AJ wrote: Performance-wise, I think it would be better to just let the client encrypt sensitive data before storing it, versus encrypting all traffic all the time. If individual values are encrypted, then they don't have to be encrypted/decrypted in transit between nodes during initial updates, as well as when commissioning a new node or at other times. A drawback, however, is that now you have to manage one or more keys for the lifetime of the data. It will also complicate your data view interfaces. However, if Cassandra had data encryption built in somehow, that would solve this problem ... just thinking out loud. Can anyone think of other pros/cons of both strategies? On 3/22/2011 2:21 AM, Sasha Dolgy wrote: Hi, Is there documentation available anywhere that describes how one can use org.apache.cassandra.security.streaming.* ? After the EC2 posts yesterday, one question I was asked was about the security of data being shifted between nodes. Is it done in clear text, or encrypted..? I haven't seen anything to suggest that it's encrypted, but see in the source that security.streaming does leverage SSL ... Thanks in advance for some pointers to documentation. Also, for anyone who is using SSL .. how much of a performance impact have you noticed? Is it minimal or significant?
Re: Cassandra not starting right
Don't post to the list in HTML ... that should work. -f runs it in the foreground. Without -f it runs in the background. On Jun 11, 2011 7:29 AM, Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com wrote: Thanks for your help! It seems when I use this command: ./bin/cassandra -f It makes it work. I still need to do ctrl-C. Sorry I am emailing you directly, but for some reason every email I send to the newsletter sends me back an error. Thanks again.
Re: Cannot connect to Cassandra
netstat -an | grep 9160 see anything? maybe cassandra service isn't running.? look for hints in the log files. these are defined in the $CASSANDRA_HOME/conf/log4j-server.properties ... On Fri, Jun 10, 2011 at 9:23 PM, Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com wrote: My Cassandra used to work with no problems. I was able to connect with no problems but now for some reason it doesn't work anymore. [default@unknown] connect localhost/9160; Exception connecting to localhost/9160. Reason: Connection refused. and root# ./bin/cassandra-cli -host localhost -port 9160 Exception connecting to localhost/9160. Reason: Connection refused. Thanks in advance... -- Sasha Dolgy sasha.do...@gmail.com
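The netstat check can also be scripted: the question is simply "is anything listening on the Thrift port?". A small sketch of that test; 9160 is the default Thrift port, and `localhost` is an assumption about where the daemon runs:

```python
import socket

def port_open(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds, i.e. something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Connection refused here means the daemon isn't listening -- check the logs
# configured in $CASSANDRA_HOME/conf/log4j-server.properties before anything else.
print(port_open("localhost", 9160))
```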
Re: Python Client
pycassa.. http://pycassa.github.com/pycassa/ On Sat, Jun 11, 2011 at 4:58 AM, Carlos Sanchez papach...@gmail.com wrote: All, I was wondering if there are Cassandra python clients and which one would be the best to use Thanks a lot, Carlos
Re: after a while nothing happening with repair
I recall having this issue when one of the nodes wasn't available ... or there was a problem during the repair process. Cancelling the repair job and rerunning it would complete successfully. I believe there is a bug open for this https://issues.apache.org/jira/browse/CASSANDRA-2290 On Thu, Jun 9, 2011 at 10:28 AM, Jonathan Colby jonathan.co...@gmail.comwrote: When I run repair on a node in my 0.7.6-2 cluster, the repair starts to stream data and activity is seen in the logs. However, after a while (a day or so) it seems like everything freezes up. The repair command is still running (the command prompt has not returned) and netstats shows output similar to below. All streams at 0% and nothing happening. The logs indicate that things were started but there is no indication if anything is in fact still active. For example, this is the last log entry related to repair, just this morning: INFO [StreamStage:1] 2011-06-09 07:13:21,423 StreamOut.java (line 173) Stream context metadata [/var/lib/cassandra/data/DFS/main-f-144-Data.db sections=2 progress=0/31947748 - 0%, /var/lib/cassandra/data/DFS/main-f-145-Data.db section s=2 progress=0/25786564 - 0%, /var/lib/cassandra/data/DFS/main-f-143-Data.db sections=2 progress=0/5830103399 - 0%], 9 sstables. INFO [StreamStage:1] 2011-06-09 07:13:21,423 StreamOutSession.java (line 174) Streaming to /10.46.108.104 However, netstats on all related notes looks something like this. The nodes continue to handle read/write requests just fine. They are not overloaded at all. Any advice would be greatly appreciated. Because repairs seem like they never finish, I have a feeling we have a lot of garbage data in our cluster. /opt/cassandra/bin/nodetool -h $HOSTNAME -p 35014 netstats Mode: Normal Not sending any streams. 
Streaming from: /10.46.108.104 DFS: /var/lib/cassandra/data/DFS/main-f-209-Data.db sections=2 progress=0/276461810 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-153-Data.db sections=2 progress=0/100340568 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-40-Data.db sections=2 progress=0/62726190502 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-180-Data.db sections=1 progress=0/158898493 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-109-Data.db sections=2 progress=0/87250515569 - 0% Streaming from: /10.47.108.102 DFS: /var/lib/cassandra/data/DFS/main-f-304-Data.db sections=2 progress=0/13563864214 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-350-Data.db sections=1 progress=0/2877129955 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-379-Data.db sections=2 progress=0/143804948 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-370-Data.db sections=2 progress=0/683716174 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-371-Data.db sections=2 progress=0/56650 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-368-Data.db sections=2 progress=0/4005533616 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-369-Data.db sections=2 progress=0/155515922 - 0% Streaming from: /10.46.108.103 DFS: /var/lib/cassandra/data/DFS/main-f-888-Data.db sections=2 progress=0/158096259 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-828-Data.db sections=1 progress=0/29508276 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-886-Data.db sections=2 progress=0/133704150 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-759-Data.db sections=2 progress=0/83629797522 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-889-Data.db sections=2 progress=0/96903803 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-751-Data.db sections=2 progress=0/17944852950 - 0% Streaming from: /10.46.108.101 DFS: /var/lib/cassandra/data/DFS/main-f-1318-Data.db sections=2 progress=0/60617216778 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-1179-Data.db sections=2 progress=0/11870790009 - 0% DFS: /var/lib/cassandra/data/DFS/main-f-1324-Data.db sections=2 progress=0/710603722 
- 0% DFS: /var/lib/cassandra/data/DFS/main-f-1322-Data.db sections=2 progress=0/5844992187 - 0% -- Sasha Dolgy sasha.do...@gmail.com
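Since the symptom above is every stream sitting at 0%, the netstats output can be checked programmatically rather than by eye. This parser assumes the exact line format pasted in the thread (other Cassandra versions may format it differently):

```python
import re

# Matches lines like:
#   DFS: /var/lib/.../main-f-209-Data.db sections=2 progress=0/276461810 - 0%
STREAM_LINE = re.compile(r"(\S+-Data\.db) sections=\d+ progress=\d+/\d+ - (\d+)%")

def stalled_streams(netstats_output):
    """Return (sstable, percent) pairs for streams reporting 0% progress."""
    return [(f, int(p)) for f, p in STREAM_LINE.findall(netstats_output)
            if int(p) == 0]

sample = ("Streaming from: /10.46.108.104 "
          "DFS: /var/lib/cassandra/data/DFS/main-f-209-Data.db sections=2 "
          "progress=0/276461810 - 0% "
          "DFS: /var/lib/cassandra/data/DFS/main-f-153-Data.db sections=2 "
          "progress=5/100340568 - 42%")
print(stalled_streams(sample))  # [('/var/lib/cassandra/data/DFS/main-f-209-Data.db', 0)]
```

Run periodically, an unchanging list of 0% entries is a strong hint the repair session has hung, as described above.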
Re: how to retrieve data from supercolumns by phpcassa ?
you'll find a response to this question on the phpcassa mailing list ... where you asked the same question. -sd On Wed, Jun 8, 2011 at 10:22 AM, amrita amritajayakuma...@gmail.com wrote: Hi, Can you please tell me how to create a supercolumn and retrieve data from it using phpcassa? student_details{id{sid,lesson_id,answers{time_expired,answer_opted}}}
upgrading to cassandra 0.8
Hi, Good news on the 0.8 release. So ... if I upgrade one node out of four, and let it run for a bit ... I should have no issues, correct? If I make schema changes, specifically, adding a new column family for counters, how will this behave with the other three nodes that aren't upgraded? Or ... should schema changes not be done until all nodes are upgraded? -- Sasha Dolgy sasha.do...@gmail.com
Re: cascading failures due to memory
is there a specific string I should be looking for in the logs that isn't super obvious to me at the moment... On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis jbel...@gmail.com wrote: The place to start is with the statistics Cassandra logs after each GC. On Tue, May 31, 2011 at 5:01 AM, Sasha Dolgy sdo...@gmail.com wrote: hi everyone, the current nodes i have deployed (4) have all been working fine, with not a lot of data ... more reads than writes at the moment. as i had monitoring disabled, when one node's OS killed the cassandra process due to out of memory problems ... that was fine. 24 hours later, another node, 24 hours later, another node ...until finally, all 4 nodes no longer had cassandra running. When all nodes are started fresh, CPU utilization is at about 21% on each box. after 24 hours, this goes up to 32% and then 51% 24 hours later. originally I had thought that this may be a result of 'nodetool repair' not being run consistently ... after adding a cronjob to run every 24 hours (staggered between nodes) the problem of the increasing memory utilization does not resolve. i've read the operations page and also the http://wiki.apache.org/cassandra/MemtableThresholds page. i am running defaults and 0.7.6-02 ... what are the best places to start in terms of finding why this is happening? CF design / usage? 'nodetool cfstats' gives me some good info ... and i've already implemented some changes to one CF based on how it had ballooned (too many rows versus not enough columns) suggestions appreciated -- Sasha Dolgy sasha.do...@gmail.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- Sasha Dolgy sasha.do...@gmail.com
Re: cascading failures due to memory
and is there anything specific that could be causing the issue between Java SE 1.6.0_24 and 1.6.0_25? All nodes are on _24. Memory usage is up to 64% today. -sd On Wed, Jun 1, 2011 at 9:30 PM, Sasha Dolgy sdo...@gmail.com wrote: is there a specific string I should be looking for in the logs that isn't super obvious to me at the moment... On Tue, May 31, 2011 at 8:21 PM, Jonathan Ellis jbel...@gmail.com wrote: The place to start is with the statistics Cassandra logs after each GC. On Tue, May 31, 2011 at 5:01 AM, Sasha Dolgy sdo...@gmail.com wrote: hi everyone, the current nodes i have deployed (4) have all been working fine, with not a lot of data ... more reads than writes at the moment. as i had monitoring disabled, when one node's OS killed the cassandra process due to out of memory problems ... that was fine. 24 hours later, another node, 24 hours later, another node ...until finally, all 4 nodes no longer had cassandra running. When all nodes are started fresh, CPU utilization is at about 21% on each box. after 24 hours, this goes up to 32% and then 51% 24 hours later. originally I had thought that this may be a result of 'nodetool repair' not being run consistently ... after adding a cronjob to run every 24 hours (staggered between nodes) the problem of the increasing memory utilization does not resolve. i've read the operations page and also the http://wiki.apache.org/cassandra/MemtableThresholds page. i am running defaults and 0.7.6-02 ... what are the best places to start in terms of finding why this is happening? CF design / usage? 'nodetool cfstats' gives me some good info ... 
and i've already implemented some changes to one CF based on how it had ballooned (too many rows versus not enough columns) suggestions appreciated -- Sasha Dolgy sasha.do...@gmail.com -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- Sasha Dolgy sasha.do...@gmail.com -- Sasha Dolgy sasha.do...@gmail.com
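Jonathan's pointer -- the per-GC statistics Cassandra logs -- can be filtered mechanically rather than scanned by eye. The pattern below assumes a GCInspector-style line of the form `GC for <collector>: <n> ms`; verify the exact wording against your own system.log, since it varies between versions:

```python
import re

# Assumed GCInspector-style line: "GC for <collector>: <pause> ms ..."
GC_LINE = re.compile(r"GC for (\w+): (\d+) ms")

def long_pauses(lines, threshold_ms=200):
    """Return (collector, pause_ms) for every logged GC pause over the threshold."""
    hits = []
    for line in lines:
        m = GC_LINE.search(line)
        if m and int(m.group(2)) >= threshold_ms:
            hits.append((m.group(1), int(m.group(2))))
    return hits

log = [
    "INFO [ScheduledTasks:1] 2011-05-31 GC for ParNew: 45 ms",
    "INFO [ScheduledTasks:1] 2011-05-31 GC for ConcurrentMarkSweep: 2310 ms",
]
print(long_pauses(log))  # [('ConcurrentMarkSweep', 2310)]
```

Steadily lengthening ConcurrentMarkSweep pauses over the 24-hour cycle described above would point at heap pressure rather than repair behavior.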
cascading failures due to memory
hi everyone, the current nodes i have deployed (4) have all been working fine, with not a lot of data ... more reads than writes at the moment. as i had monitoring disabled, when one node's OS killed the cassandra process due to out of memory problems ... that was fine. 24 hours later, another node, 24 hours later, another node ...until finally, all 4 nodes no longer had cassandra running. When all nodes are started fresh, CPU utilization is at about 21% on each box. after 24 hours, this goes up to 32% and then 51% 24 hours later. originally I had thought that this may be a result of 'nodetool repair' not being run consistently ... after adding a cronjob to run every 24 hours (staggered between nodes) the problem of the increasing memory utilization does not resolve. i've read the operations page and also the http://wiki.apache.org/cassandra/MemtableThresholds page. i am running defaults and 0.7.6-02 ... what are the best places to start in terms of finding why this is happening? CF design / usage? 'nodetool cfstats' gives me some good info ... and i've already implemented some changes to one CF based on how it had ballooned (too many rows versus not enough columns) suggestions appreciated -- Sasha Dolgy sasha.do...@gmail.com
Re: starting with PHPcassa
http://thobbs.github.com/phpcassa/installation.html If you already have the log files, pycassa (python) may be better suited and quicker http://pycassa.github.com/pycassa/ On Tue, May 31, 2011 at 4:03 PM, Amrita Jayakumar amritajayakuma...@gmail.com wrote: I have log files of the format id key value. I want to load these files into cassandra using PHPcassa. I have installed Cassandra 7. Can anyone please guide me with the exact procedures as in how to install PHPcassa and take things forward?
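For the "id key value" files described above, the loading loop first splits each line into a row key and a column/value pair, then hands it to the client's insert call (`cf.insert(row_key, {column: value})` in pycassa). The parsing half is shown here; the server-side call is left as a comment since it needs a running cluster:

```python
def parse_log_line(line):
    """Split an 'id key value' line into (row_key, column, value).
    The value may itself contain spaces, so split at most twice."""
    row_key, column, value = line.strip().split(None, 2)
    return row_key, column, value

print(parse_log_line("42 answer option b"))  # ('42', 'answer', 'option b')

# With a live cluster, each parsed line would then be written via pycassa,
# roughly: cf.insert(row_key, {column: value}), ideally inside a batch.
```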