Re: portability between enterprise and community version
@Viktor: I've read/heard this many times before, but I've never seen a real explanation. Java is cross-platform: if Cassandra runs properly on both Linux and Windows clusters, why would it be impossible for them to communicate? Of course I understand the disadvantages of having a mixed cluster.

2012/6/13 Viktor Jevdokimov viktor.jevdoki...@adform.com:
Do not mix Linux and Windows nodes.
Best regards, Viktor Jevdokimov, Senior Developer, Adform

From: Abhijit Chanda [mailto:abhijit.chan...@gmail.com], Sent: Wednesday, June 13, 2012 09:21, Subject: portability between enterprise and community version:
Hi All, is it possible to communicate from a DataStax Enterprise edition to a DataStax Community edition? Actually I want to set up one of my nodes on a Linux box and the other on Windows. Please suggest. With Regards, Abhijit Chanda, VeHere Interactive Pvt. Ltd., +91-974395

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
Re: Why Hector is taking more time than Thrift
Hector is a higher-level client that provides some abstraction and an easy-to-use interface; the Thrift API is pretty raw. So for most cases the Hector client is the better choice, except for use cases where ultimate performance is a hard requirement (at the cost of a lot more maintenance whenever the Thrift API changes).

2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com:
Dear all, I am trying to evaluate the performance of Cassandra and wrote code to retrieve a complete row (having 43707 columns) using Thrift and Hector. The Thrift client code took 0.767 seconds while the Hector code took 0.883 seconds. Is it expected that Hector will be slower than Thrift? If yes, then why are we using Hector and not Thrift? Thanks and Regards, Prakrati Agrawal | Developer - Big Data (ID) | www.mu-sigma.com

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
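To illustrate the abstraction difference: a single-column read in Hector is a few lines, while the raw Thrift equivalent requires manual ColumnPath and ColumnOrSuperColumn handling. A minimal sketch against the Hector 0.8-era API; cluster, keyspace, CF and key names are made up:

    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.ColumnQuery;
    import me.prettyprint.hector.api.query.QueryResult;

    // Read one column of one row; Hector hides connection pooling and failover.
    Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
    Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);
    ColumnQuery<String, String, String> query = HFactory.createStringColumnQuery(keyspace);
    query.setColumnFamily("Users").setKey("user1").setName("email");
    QueryResult<HColumn<String, String>> result = query.execute();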
Re: Problem in getting data from a 2 node cluster
Did you run repair on the new node?

2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com:
Dear all, I had a 1-node cluster, then I added one more node to it. When I ran my query on the 1-node cluster I got all my data, but when I run my query on the 2-node cluster (Hector code) I am not getting the same data. How do I ensure that my Hector code retrieves data from all the nodes? Also, when I decommission my node and then add it again I get the following message: "This node will not auto bootstrap because it is configured to be a seed node". Please tell me the meaning of that as well. Thanks and Regards, Prakrati Agrawal | Developer - Big Data (ID) | www.mu-sigma.com

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
Re: Problem in getting data from a 2 node cluster
Repair ensures that all data is consistent and available on the node.

2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com:
When I run the nodetool command I get the following information:

    ./nodetool -h localhost ring
    Address         DC           Rack   Status  State   Load       Effective-Ownership  Token
                                                                                        85070591730234615865843651857942052864
    162.192.100.16  datacenter1  rack1  Up      Normal  238.22 MB  50.00%               0
    162.192.100.48  datacenter1  rack1  Up      Normal  115.6 MB   50.00%               85070591730234615865843651857942052864

Please help me. Thanks and Regards, Prakrati Agrawal | Developer - Big Data (ID) | www.mu-sigma.com

From: Prakrati Agrawal, Sent: Wednesday, June 06, 2012 3:55 PM:
What does repair do?

From: R. Verlangen, Sent: Wednesday, June 06, 2012 3:56 PM:
Did you run repair on the new node?

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
Re: nodetool repair -pr enough in this scenario?
In your case -pr would be just fine (see Viktor's explanation).

2012/6/5 Viktor Jevdokimov viktor.jevdoki...@adform.com:
Understand the simple mechanics first, decide how to act later.

Without -pr there's no difference from which host you run repair: it runs for the whole 100% range, from start to end, the whole cluster, all nodes, at once.

With -pr it runs only for the primary range of the node you run the repair on. Say you have a simple ring of 3 nodes with RF=2 and ranges (per node) N1=C-A, N2=A-B, N3=B-C (node tokens are N1=A, N2=B, N3=C); no rack, no DC awareness. Running repair with -pr on node N2 will only repair the range A-B, for which node N2 is the primary and N3 the backup: N2 and N3 will synchronize the A-B range with each other. For the other ranges you need to run it on the other nodes.

Without -pr, running on any node will repair all ranges: A-B, B-C and C-A. The node you run a repair on without -pr is just the repair coordinator, so it makes no difference which one you pick next time.

Best regards, Viktor Jevdokimov, Senior Developer, Adform

From: David Daeschler [mailto:david.daesch...@gmail.com], Sent: Tuesday, June 05, 2012 08:59:
Hello, currently I have a 4-node Cassandra cluster on CentOS 64. I have been running nodetool repair (no -pr option) on a weekly schedule: Host1: Tue, Host2: Wed, Host3: Thu, Host4: Fri. In this scenario, if I were to add the -pr option, would this still be sufficient to prevent forgotten deletes and properly maintain consistency? Thank you, - David

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
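Concretely, David's weekly schedule with -pr would come down to something like this (a sketch; the host names stand in for his four nodes, and together the four primary ranges cover the full ring once a week):

    ./nodetool -h host1 repair -pr    # Tuesday
    ./nodetool -h host2 repair -pr    # Wednesday
    ./nodetool -h host3 repair -pr    # Thursday
    ./nodetool -h host4 repair -pr    # Friday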
Re: about multitenant datamodel
Every CF has a certain amount of overhead in memory. It's just not how Cassandra is designed to be used. Maybe you could think of a way to map the data down to a fixed set of indices and entities: with an abstraction layer you can store practically anything in Cassandra.

2012/6/5 Toru Inoko in...@ms.scsk.jp:
"IMHO a model that allows external users to create CF's is a bad one." Why do you think so? I'll let users create restricted CFs, and limit the number of CFs users can create. Is it still a bad one?

On Thu, 31 May 2012 06:44:05 +0900, aaron morton aa...@thelastpickle.com wrote:
"Do a lot of keyspaces cause some problems? (If I have 1,000 users, cassandra creates 1,000 keyspaces...)" It's not the keyspaces, but the number of column families. Without storing any data each CF uses about 1MB of RAM; when they start storing and reading data they use more. IMHO a model that allows external users to create CF's is a bad one. Hope that helps. - Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 25/05/2012, at 12:52 PM, Toru Inoko wrote:
Hi, all. I'm designing a data API service (like cassandra.io, but without a dedicated server for each user) on Cassandra 1.1, on which users can run DML/DDL methods like CQL. Users can create/read/delete column families, rows and columns (almost the same as the Cassandra API). Now I'm thinking about a multitenant data model, with a keyspace for each user as that user's tenant space:

    keyspace1 (for user1) --- column families ...
    keyspace2 (for user2) --- column families ...

My questions: Is this a good data model for multitenancy? Do a lot of keyspaces cause problems? (If I have 1,000 users, cassandra creates 1,000 keyspaces...) Please help. Thank you in advance. Toru Inoko, SCSK Corporation (in...@ms.scsk.jp)

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
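One common shape for such an abstraction layer (a hypothetical sketch, not from this thread): share a fixed set of column families across all tenants and put the tenant ID in the row key, so new tenants add rows instead of schema. For example, with Hector:

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    // All tenants share one "Entities" CF; isolation comes from the key prefix.
    // "keyspace" is an already-connected Hector Keyspace; all names are made up.
    String rowKey = tenantId + ":" + entityKey;  // e.g. "user42:product-1337"
    Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
    mutator.addInsertion(rowKey, "Entities",
        HFactory.createStringColumn("name", "example value"));
    mutator.execute();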
Re: Finding whether a new node is successfully added or not
Hi there,

You can check the ring info with nodetool. Furthermore you can take a look at the streaming statistics: lots of pending streams indicates a node that is still receiving data from its seed(s). As far as I'm aware the seed value is only read upon start, so a restart is required. Good luck.

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com:
Dear all, I added a new node to my 1-node Cassandra cluster. Now I want to find out whether it was added successfully or not. Also, do I need to restart the already running node after entering the seed value? Please help me. Thanks and Regards, Prakrati Agrawal | Developer - Big Data (ID) | www.mu-sigma.com

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
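The two checks mentioned above, as commands (a sketch, assuming default ports and a local node):

    ./nodetool -h localhost ring        # the new node should be listed as Up / Normal
    ./nodetool -h localhost netstats    # pending streams here mean it is still bootstrapping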
Re: Adding a new node to Cassandra cluster
Hi there,

When you speak to one node it will internally redirect the request to the proper node (local / external), but you won't be able to fail over on a crash of the localhost node. For adding another node to the connection pool you should take a look at the documentation of your Java client. Good luck!

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com:
Dear all, I successfully added a new node to my cluster, so now it's a 2-node cluster. But how do I mention it in my Java code? When I am retrieving data it retrieves only from the one node that I specify as localhost. How do I specify more than one node? Please help me. Thanks and Regards, Prakrati Agrawal | Developer - Big Data (ID) | www.mu-sigma.com

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
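As an illustration with Hector, which comes up later in this thread, multiple nodes go into the host configurator (a sketch; cluster name and addresses are assumptions):

    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.factory.HFactory;

    // List both nodes; Hector pools connections and fails over when one is down.
    CassandraHostConfigurator hosts = new CassandraHostConfigurator("node1:9160,node2:9160");
    hosts.setAutoDiscoverHosts(true);  // optionally let the client find the rest of the ring
    Cluster cluster = HFactory.getOrCreateCluster("TestCluster", hosts);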
Re: Adding a new node to Cassandra cluster
You might consider using a higher-level client (like Hector indeed). If you don't want that, you will have to write your own connection pool; for a start take a look at Hector's. But keep in mind that you might be reinventing the wheel.

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com:
Hi, I am using the Thrift API and I am not able to find anything on the internet about how to configure it for multiple nodes. I am not using a proper client like Hector. Prakrati Agrawal | Developer - Big Data (ID) | www.mu-sigma.com

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
Re: Adding a new node to Cassandra cluster
Connection pooling involves things like:
- (transparent) failover / retry
- disposal of connections after X messages
- keeping track of connections

Again: take a look at the Hector connection pool (the bare idea is sketched after this message). Source: https://github.com/rantav/hector/tree/master/core/src/main/java/me/prettyprint/cassandra/connection

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com:
Yes, I know I am trying to reinvent the wheel, but I have to: the requirement is such that I have to use the Java Thrift API without any client like Hector. Can you please tell me how to do it?

From: samal [mailto:samalgo...@gmail.com], Sent: Monday, June 04, 2012 3:12 PM:
If you use the Thrift API you have to maintain a lot of low-level code yourself which is already being polished by high-level clients like Hector and pycassa. With a high-level client you can also easily switch between Thrift and the growing CQL.

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
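For those who really must stay on raw Thrift, the core of such a pool can be sketched as a round-robin host list with reconnect-on-failure. This is a rough illustration only, nothing like Hector's actual implementation; host names are made up:

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    // Naive "pool": cycle through the hosts until a connection succeeds.
    public class NaiveThriftPool {
        private final String[] hosts = {"node1", "node2"};  // hypothetical nodes
        private int next = 0;

        public synchronized Cassandra.Client getClient() throws Exception {
            Exception last = null;
            for (int i = 0; i < hosts.length; i++) {
                String host = hosts[(next++) % hosts.length];
                try {
                    TTransport transport = new TFramedTransport(new TSocket(host, 9160));
                    transport.open();
                    return new Cassandra.Client(new TBinaryProtocol(transport));
                } catch (Exception e) {
                    last = e;  // node unreachable: try the next one
                }
            }
            throw last;  // all nodes failed
        }
    }

A real pool would additionally reuse open connections, retire them after a number of requests and run health checks; that is the part Hector has already polished.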
Re: repair
Repair with -pr only repairs the node's primary range, so it is only useful for day-to-day maintenance. When you're recovering from a crash, use it without -pr.

2012/6/4 Romain HARDOUIN romain.hardo...@urssaf.fr:
Run repair -pr in your cron.

Tamar Fraenkel ta...@tok-media.com wrote on 04/06/2012 13:44:32:
Thanks. I actually did just that, with cron jobs running at different hours. I asked the question because I saw that while one of the nodes was running the repair, all nodes logged some repair-related entries in /var/log/cassandra/system.log. Thanks again, Tamar Fraenkel, Senior Software Engineer, TOK Media

-- With kind regards, Robin Verlangen, Software engineer, W: www.robinverlangen.nl, E: ro...@us2.nl
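Side by side, the two modes look like this (a sketch; the keyspace argument is optional and the name is made up):

    ./nodetool -h localhost repair -pr MyKeyspace   # routine: only this node's primary range
    ./nodetool -h localhost repair MyKeyspace       # crash recovery: all ranges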
Re: Data Versioning Support
Hi Felipe,

There recently was a thread about this ( http://www.mail-archive.com/user@cassandra.apache.org/msg22298.html ). The answer in short: no. However, you can build your own data model to support it. Cheers!

2012/5/24 Felipe Schmidt felipef...@gmail.com:
Does Cassandra support data versioning? I'm trying to find it in many places but I'm not quite sure about it. Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil)

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
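One way such a "build it yourself" model is often sketched (an assumption for illustration, not from the thread): store every version of a value as its own column in the row, named by a timestamp, so the full history is one slice query and the newest version a reversed slice of count 1. With Hector:

    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    // Hypothetical scheme: column name = version timestamp, value = that revision.
    // "keyspace" is an already-connected Hector Keyspace; CF/key names are made up.
    long version = System.currentTimeMillis();
    Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
    mutator.addInsertion("document-42", "DocumentVersions",
        HFactory.createColumn(version, "contents of this revision",
            LongSerializer.get(), StringSerializer.get()));
    mutator.execute();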
Re: Number of keyspaces
Yes, it does. However, there's no real answer as to what the limit is: it depends on your hardware and cluster configuration. You might even want to search the archives of this mailing list; I remember this has been asked before. Cheers!

2012/5/21 Luís Ferreira zamith...@gmail.com:
Hi, does the number of keyspaces affect the overall cassandra performance? Regards, Luís Ferreira

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
Re: Number of keyspaces
Hmm, you got me on that one. I wrongly assumed that more keyspaces would mean more CF's.

2012/5/22 aaron morton aa...@thelastpickle.com:
It's more the number of CF's than keyspaces. Cheers. - Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
Re: is it possible to run cassandra process in client mode as smart proxy
Hi there,

I'm using HAProxy for PHP projects to take care of this. It improved connection pooling enormously on the client side, while preserving failover capabilities. Maybe that is something for you to use in combination with PHP. Good luck!

2012/5/16 Piavlo lolitus...@gmail.com:
Hi, I'm interested in a smart proxy cassandra process that could act as a coordinator node and be aware of cluster state, and in running this smart proxy on each client-side host where the application (PHP) with short-lived cassandra connections runs. Besides being aware of cluster state, if it could act as a coordinator node it would save unneeded network trips, and maybe even have an option to take care of hinted handoffs. IMHO the best candidate for this is cassandra itself (like it's done in elasticsearch: http://www.elasticsearch.org/guide/reference/modules/node.html ). I also see there was work done in this direction at https://issues.apache.org/jira/browse/CASSANDRA-535 . So maybe this is something that is already usable? Or maybe there is some third-party project that could be used as a smart cassandra proxy? Thanks, Alex

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
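For context, the HAProxy setup referred to here is typically just a local TCP round-robin over the Thrift port; a minimal haproxy.cfg sketch (node addresses are made up):

    listen cassandra
        bind 127.0.0.1:9160
        mode tcp
        balance roundrobin
        server node1 10.0.0.1:9160 check
        server node2 10.0.0.2:9160 check
        server node3 10.0.0.3:9160 check

The PHP application connects to 127.0.0.1:9160 and HAProxy spreads the short-lived connections over the live nodes. Note the plain TCP health checks: that is exactly the weakness Alex raises in the follow-up below.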
Re: is it possible to run cassandra process in client mode as smart proxy
Yes, I'm aware of those issues; however, in our use case they don't cause any problems. But if there's something better out there I'm really curious, so I'll keep up with this thread.

2012/5/16 Piavlo lolitus...@gmail.com:
On 05/16/2012 01:24 PM, R. Verlangen wrote: "I'm using HAProxy for PHP projects to take care of this." I already use it exactly like this :) But I don't think it's a good solution, and it's totally unaware of the thrift/cassandra protocol; it was pretty well discussed here: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-amp-HAProxy-td5473144.html . I even see the plain TCP healthchecks failing from time to time for no reason. I'm planning to make it a bit smarter with localhost HTTP-level healthchecks, where a check would make a cassandra write to a CF in a keyspace with replication factor 1, to a key that maps to the specific cassandra node being checked (of course the keys need to be recalculated each time the cluster is rebalanced). But IMHO that's a very ugly hack, and not as reliable as a real smart proxy, which is far superior and more efficient (especially if it could do the read/write coordination itself). HAProxy also has the issue that when one of the backend IPs changes (which happens often in clouds) it has to be restarted to resolve the correct hostname, though it looks like Willy is finally seriously considering implementing more dynamic hostname lookups (which was not the case about a year ago when I asked for such a feature); the problem was discussed here recently: http://marc.info/?l=haproxy&m=133559164408814&w=1 . HAProxy has some more issues I don't remember off the top of my head. A smart proxy would simply not have all those issues, as it's aware of the ring state and the protocol, and if the smart proxy were cassandra itself it would have all the needed features tested and reliable at no effort. Thanks, Alex

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
Re: get dynamic snitch info from php
I struggled with this before and decided to use HAProxy, which suits my needs. You can read a little more about it on my personal blog: http://www.robinverlangen.nl/index/view/4fa902c1596cb-44a627/how-to-solve-the-pain-of-stateless-php-with-cassandra.html

Good luck with it!

2012/5/14 Viktor Jevdokimov viktor.jevdoki...@adform.com:
Let's say you have an 8-node cluster with replication factor 3. If one node is down, for its token range you have only 2 nodes left, not 7, that can process your requests; other nodes will forward requests to the nearest (depends on snitch) or lowest-latency (depends on dynamic snitch) of the 2 remaining.

I have no idea about PHP and its multithreading capabilities; if it's impossible to run a background thread to return a dead endpoint to the list, instead of checking it on the HTTP request thread, you're stuck. For lower latencies the dynamic snitch already does the job for you, selecting a node with lower latencies.

If you'd like Cassandra to avoid forwarding requests, and instead make a direct request to the node where the data is, you need a smarter client, capable of selecting a node by key, among other things.

Best regards, Viktor Jevdokimov, Senior Developer, Adform

From: ruslan usifov [mailto:ruslan.usi...@gmail.com], Sent: Monday, May 14, 2012 17:41:
Sorry for my bad English. I want to solve the following problem: we take one node down for maintenance, for a long time (30 min). We now use TSocketPool for pooling connections to cassandra, but this pool implementation is, I think, not so good. It has a custom parameter setRetryInterval (we set it to 10 sec) which takes a broken node out of rotation, but this means that every 10 sec the pool will try to connect to the down node (I repeat: we shut the node down for maintenance reasons), because it doesn't know whether the node is dead or not; the cassandra cluster does know this, so these connection attempts are senseless. Also, when a node compacts it can be heavily loaded and can't serve client requests very well (at that moment we see a small increase in average backend response time).

2012/5/14 Viktor Jevdokimov viktor.jevdoki...@adform.com:
I'm not sure that selecting a node based on the dynamic snitch is a good idea. First of all, every node has values about every node, including itself, and its own values are always better than the others'. For example, 3 nodes, RF=2:

          N1      N2      N3
    N1  0.5ms    2ms     2ms
    N2   2ms    0.5ms    2ms
    N3   2ms     2ms    0.5ms

We have monitored many Cassandra counters, including dynamic snitch values for every node, and the graphs show that latencies are not simply about load.
So the strategy should be based on use case, node count, RF, replica placement strategy, read repair chance, and more. What do you want to achieve?

Best regards, Viktor Jevdokimov, Senior Developer, Adform
Re: Use-case: multi-instance webshop
@Aaron: Solr will probably be the solution to our problem. Thank you!

@Radim: We already have a Cassandra cluster; we do not want to add an extra MongoDB cluster. At this moment the data would easily fit in SQL, but we don't know how our platform will grow and we want to be prepared for the future. Would it be stupid to go for manual indexing on top of Cassandra?

2012/5/10 Radim Kolar h...@filez.com:
"Is Cassandra a fit for this use-case or should we just stick with the oldskool MySQL and put things like votes, reviews etc in our C* store?" If all your data fits on one computer and you expect only tens of millions of records per table, go for SQL: it has far more features and people are comfortable working with it. If you want NoSQL, go for MongoDB.

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
Use-case: multi-instance webshop
Hi there,

I'm working on a data model for a multi-website, multi-customer system. Things we would like to do:
- search products (Lucene / Solr / Solandra)
- multi-filter (e.g. categories)
- reviews
- voting

I can't really see how to do the filtering of the products by categories, or even by things like price (ranges would be possible with C*). Is Cassandra a fit for this use-case, or should we just stick with the oldskool MySQL and put things like votes, reviews etc in our C* store?

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
Re: Bad Request: No indexed columns present in by-columns clause with equals operator
I read a while ago that a compaction would rebuild the index. You can trigger this by running a repair with nodetool.

2012/4/24 mdione@orange.com:
[default@avatars] describe HBX_FILE;

    ColumnFamily: HBX_FILE
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds / keys to save: 0.0/0/all
      Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
      Key cache size / save period in seconds: 20.0/14400
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Bloom Filter FP chance: default
      Built indexes: []
      Column Metadata:
        Column Name: HBX_FIL_DATE
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
        Column Name: HBX_FIL_LARGE
          Validation Class: org.apache.cassandra.db.marshal.AsciiType
        Column Name: HBX_FIL_MEDIUM
          Validation Class: org.apache.cassandra.db.marshal.AsciiType
        Column Name: HBX_FIL_SMALL
          Validation Class: org.apache.cassandra.db.marshal.AsciiType
        Column Name: HBX_FIL_STATUS
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
          Index Name: HBX_FILE_HBX_FIL_STATUS_idx
          Index Type: KEYS
        Column Name: HBX_FIL_TINY
          Validation Class: org.apache.cassandra.db.marshal.AsciiType
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

Someone in #cassandra pointed out that the index might be created but shown as not built («Built indexes: []»). Is that right? Any idea how to build it? -- Marcos Dione, SysAdmin, mdione@orange.com

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
Re: blob fields, bynary or hexa?
phpcassa does support binaries, so that should not be the problem.

2012/4/19 phuduc nguyen duc.ngu...@pearson.com:
Well, I'm not sure exactly how you're passing a blob to the CLI; it would be helpful if you pasted your commands/code, maybe there is a simple oversight. With that said, Cassandra can most definitely store blob/binary values. I think most people use a high-level client; we use Hector. If you're in PHP land, see if your problems exist in phpcassa. Duc

On 4/19/12 2:25 AM, mdione@orange.com wrote:
"How are you passing a blob or binary stream to the CLI? It sounds like you're passing in a representation of a binary stream as ascii/UTF8, which will create the problems you describe." So this is only a limitation of cassandra-cli? -- Marcos Dione, SysAdmin, mdione@orange.com

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
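If memory serves, binary values go into cassandra-cli as hexadecimal through its bytes() function; a hypothetical sketch against the CF from this thread (this is an assumption about an old CLI release, so verify with `help set;` in your version):

    [default@avatars] set HBX_FILE['some-key']['HBX_FIL_SMALL'] = bytes('89504e470d0a1a0a');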
Re: swap grows
It's recommended to disable swap entirely when you run Cassandra on a dedicated server.

2012/4/14 ruslan usifov ruslan.usi...@gmail.com:
I forgot to say that the system has 24GB of physical memory.

2012/4/14 ruslan usifov ruslan.usi...@gmail.com:
Hello. We have a 6-node cluster (cassandra 0.8.10). On one node I increased the Java heap size to 6GB, and now swap begins to grow on this node, while the system has about 3GB of free memory:

    root@6wd003:~# free
                 total       used       free     shared    buffers     cached
    Mem:      24733664   21702812    3030852          0       6792   13794724
    -/+ buffers/cache:    7901296   16832368
    Swap:      1998840       2352    1996488

And the swap space slowly grows, but I don't understand why. PS: We have JNA mlock, and set vm.swappiness = 0. OS: Ubuntu 10.04 (2.6.32-40-generic).

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
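For reference, disabling swap on Linux usually comes down to this (a sketch, run as root):

    swapoff -a    # release all swap space immediately
    # then comment out the swap line(s) in /etc/fstab so it stays off after a reboot

The alternative is what ruslan describes above: keep swap but pin the JVM in memory via JNA's mlockall and set vm.swappiness to 0.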
Re: swap grows
Maybe it has got something to do with swappiness; it's something you can configure. More info here: https://www.linux.com/news/software/applications/8208-all-about-linux-swap-space

2012/4/14 ruslan usifov ruslan.usi...@gmail.com:
I know :-) but this is not the answer :-(. I found that the other nodes also have about 3GB of free memory (the node with JAVA_HEAP=6GB has 3GB free as well) but they run with JAVA_HEAP=5GB. So this looks like some sysctl ratio of about 10% (3 / 24 * 100), probably somewhere in /proc/sys/vm, but I don't know which one. Can anybody explain this situation?

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
Re: Trouble with wrong data
It sounds like the commitlog has been replayed; however, I have really no idea whether that could have happened. Anyone?

2012/4/13 Alain RODRIGUEZ arodr...@gmail.com:
The commitlog_total_space_in_mb was not set; I set it to avoid having the same problem in the future. I am aware of the over-counting problem introduced by the counters. The point is that I use them to make statistics per hour. I can understand having some wrong counts in the column corresponding to the crash time, but how to explain that all my counts since the start (months ago) have become wrong after the crash? After the crash I tried to repair my entire keyspace from one of the 2 nodes, and this made my server crash again, no idea why. Can this failed repair be at the origin of the corrupted data? I'm still replaying all my counts of the past months and I'm afraid this kind of bug could happen again... I was using Cassandra for months without any issue. Alain

2012/4/11 aaron morton aa...@thelastpickle.com:
"However after recovering from this issue (freeing some space and fixing the value of commitlog_total_space_in_mb in cassandra.yaml)" Did the commit log grow larger than commitlog_total_space_in_mb? "I realized that all statistics were destroyed. I have bad values on every single counter since I started using them (September)!" Counter operations are not idempotent: if your client retries a counter operation it may be applied twice. Could this have been your issue? Cheers. - Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 11/04/2012, at 2:35 AM, Alain RODRIGUEZ wrote:
By the way, I am using Cassandra 1.0.7, CL = ONE (R/W), RF = 2, on a 2-node EC2 c1.medium cluster. Alain

2012/4/10 Alain RODRIGUEZ arodr...@gmail.com:
Hi, I'm experiencing a strange and very annoying phenomenon. I had a problem with the commit log size, which grew too much and filled one of the hard disks on all my nodes at almost the same time (2 nodes only, RF=2, so the 2 nodes behave exactly the same way). My data is mounted on another partition that was not full. However, after recovering from this issue (freeing some space and fixing the value of commitlog_total_space_in_mb in cassandra.yaml) I realized that all statistics were destroyed. I have bad values on every single counter since I started using them (September)! Has anyone experienced something similar, or any clue about this? Do you need more information? Alain

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
Re: need of regular nodetool repair
Yes. I personally have configured it to perform a repair once a week, as GCGraceSeconds is at 10 days. This is also what the manual suggests: http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data (point 2).

2012/4/11 ruslan usifov ruslan.usi...@gmail.com:
Hello. I have the following question: if we read and write to the cassandra cluster with QUORUM consistency level, does that allow us to skip running nodetool repair regularly (i.e. every GCGraceSeconds)?

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
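Such a schedule fits in a single crontab entry per node (a sketch; day and hour are arbitrary picks, the essential constraint is that the repair interval stays below GCGraceSeconds, here 7 days < 10 days):

    # m h dom mon dow  command   (every Sunday at 03:00)
    0 3 * * 0  /usr/bin/nodetool -h localhost repair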
Re: need of regular nodetool repair
Well, if everything works 100% of the time there should be nothing to repair; however, in a distributed cluster it would be pretty rare for that to occur. At least that is how I interpret this.

2012/4/11 Igor i...@4friends.od.ua:
BTW, I heard that we don't need to run repair if all our data has a TTL, hinted handoff works, and we never delete data.

On 04/11/2012 11:34 AM, ruslan usifov wrote:
Sorry for my bad English: so QUORUM allows us to skip regular repairs? That doesn't follow from your answer.

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?
Are you sure this isn't read repair? http://wiki.apache.org/cassandra/ReadRepair

2012/4/11 Thibaut Britz thibaut.br...@trendiction.com:
Also, executing the same multiget rangeslice query over the same range again will trigger the same writes again and again.

On Wed, Apr 11, 2012 at 5:41 PM, Thibaut Britz thibaut.br...@trendiction.com wrote:
Hi, I just diagnosed this strange behavior: when I fetch a range slice through Hector and set the consistency level to QUORUM, then according to cfstats (and also the output files on the disk), cassandra seems to execute a write request for each read I execute. The write count in cfstats increases when I execute the rangeslice function over the same range again and again (without saving anything at all). If I set the consistency level to ONE, no writes are executed. How can I disable this? Why are the records rewritten each time, even though I don't want them to be rewritten? Thanks, Thibaut.

Code:

    Keyspace ks = getConnection(cluster, consistencylevel);
    RangeSlicesQuery<String, String, V> rangeSlicesQuery =
        HFactory.createRangeSlicesQuery(ks, StringSerializer.get(), StringSerializer.get(), s);
    rangeSlicesQuery.setColumnFamily(columnFamily);
    rangeSlicesQuery.setColumnNames(column);
    rangeSlicesQuery.setKeys(start, end);
    rangeSlicesQuery.setRowCount(maxrows);
    QueryResult<OrderedRows<String, String, V>> result = rangeSlicesQuery.execute();
    return result.get();

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
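If it is indeed read repair, the chance is tunable per column family with cassandra-cli; a sketch (keyspace and CF names are made up; a describe output earlier in this archive shows the default read repair chance of 1.0):

    [default@MyKeyspace] update column family MyCF with read_repair_chance = 0.1;

Setting it to 0 disables read repair for that CF entirely, at the cost of slower convergence of inconsistent replicas.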
Re: Nodetool snapshot, consistency and replication
Ok, thank you.

2012/4/2 Rob Coli rc...@palominodb.com:
On Mon, Apr 2, 2012 at 9:19 AM, R. Verlangen ro...@us2.nl wrote: "3 node cluster, RF = 3, fully consistent (not measured, but let's say it is). Is it true that when I take a snapshot at only one of the 3 nodes this contains all the data in the cluster (at least 1 replica)?" Yes. =Rob

-- With kind regards, Robin Verlangen, www.robinverlangen.nl
Nodetool snapshot, consistency and replication
Hi there, I have a question about the nodetool snapshot. Situation: - 3 node cluster - RF = 3 - fully consistent (not measured, but let's say it is) Is it true that when I take a snapshot on only one of the 3 nodes this contains all the data in the cluster (at least 1 replica)? With kind regards, Robin Verlangen www.robinverlangen.nl
Re: another DataStax OpsCenter question
Nick, would that also result in useless duplicates of the statistics? 2012/3/30 Nick Bailey n...@datastax.com Unfortunately at the moment OpsCenter only really supports having one instance per cluster. It may be possible to set up an instance in each datacenter; however, it has not been tested and each OpsCenter instance would lose some functionality. On Fri, Mar 30, 2012 at 3:13 AM, Alexandru Sicoe adsi...@gmail.com wrote: Hi Nick, I forgot to say I was using 1.2.3 which I think uses different ports. So I will upgrade to 1.4.1 and open those ports across the firewall, although that's kind of a pain. I already have about 320 config lines for the Cassandra cluster itself. So, just to make things clear, is it mandatory to have one OpsCenter instance per Cassandra cluster? Even if that cluster is split into multiple Cassandra DCs across separate regions? Is there a way to have one OpsCenter per Cassandra DC (monitor Cassandra DCs individually)? That would get rid of many configuration issues! Cheers, Alex On Thu, Mar 29, 2012 at 9:35 PM, Nick Bailey n...@datastax.com wrote: This setup may be possible although there are a few potential issues. Firstly, see: http://www.datastax.com/docs/opscenter/configure_opscenter#configuring-firewall-port-access Basically the agents and OpsCenter communicate on ports 61620 and 61621 by default (those can be configured though). The agents will contact the OpsCenter machine on port 61620. You can specify the interface the agents will use to connect to this port when installing/setting up the agents. The OpsCenter machine will contact the agents on port 61621. Right now the OpsCenter machine will only talk to the nodes using the listen_address configured in your cassandra conf. We have a task to fix this in the future so that you can configure the interface that OpsCenter will contact each agent on. In the meantime though, OpsCenter will need to be able to hit the listen_address for each node. On Thu, Mar 29, 2012 at 12:47 PM, Alexandru Sicoe adsi...@gmail.com wrote: Hello, I am planning on testing OpsCenter to see how it can monitor a multi-DC cluster. There are 2 DCs, each on a different side of a firewall. I've configured NAT on the firewall to allow the communication between all Cassandra nodes on ports 7000, 7199 and 9160. The cluster works fine. However when I start OpsCenter (obviously on one side of the firewall) the OpsCenter CF gives me two schema versions in the cluster and basically messes up everything. Plus, I can only see the nodes on one (the same) side. What are the requirements to let the OpsCenter on one side see the Cassandra nodes and the OpsCenter agents on the other, and vice versa? Is it possible to use OpsCenter across a firewall? Cheers, Alex -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: opscenter
As far as I'm aware, that is not possible using OpsCenter. I recommend you use cassandra-cli and perform an update column family statement. 2012/3/29 puneet loya puneetl...@gmail.com I'm currently using the DataStax OpsCenter. How do we add a column to a column family in OpsCenter? -- With kind regards, Robin Verlangen www.robinverlangen.nl
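For example, adding a column definition would look something like this in cassandra-cli (hypothetical names; be aware that the column_metadata list you pass replaces the existing definitions, so repeat any you want to keep):

update column family MyColumnFamily with column_metadata = [{column_name: new_column, validation_class: UTF8Type}];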
Re: Any improvements in Cassandra JDBC driver ?
The best would be to not use UPDATE / INSERT at all, but set / put / save. Cheers! 2012/3/29 Dinusha Dilrukshi sdddilruk...@gmail.com What I wanted to say was that this driver does not use the INSERT keyword. Since CQL supports the INSERT keyword and it is the more generic keyword used to add new records, it's more user-friendly to use the INSERT keyword to add a new record set rather than the UPDATE keyword. Regards, ~Dinusha~ On Thu, Mar 29, 2012 at 8:34 PM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: There is no such thing as a pure insert which will give an error if the thing already exists. Everything is really UPDATE OR INSERT. Whether you say UPDATE or INSERT, it will all act like UPDATE OR INSERT: if the thing is there it gets overwritten, if it isn't there it gets inserted. -Jeremiah -- *From:* Dinusha Dilrukshi [sdddilruk...@gmail.com] *Sent:* Wednesday, March 28, 2012 11:41 PM *To:* user@cassandra.apache.org *Subject:* Any improvements in Cassandra JDBC driver ? Hi, We are using the Cassandra JDBC driver (found in [1]) to call the Cassandra server using CQL and JDBC calls. One of the main disadvantages is that this driver is not available in a Maven repository where people can access it publicly. Currently we have to check out the source and build it ourselves. Is there any possibility to host this driver in a Maven repository? Another limitation in the driver is that it does not support the insert query. If we need to do an insert, it can be done using the update statement. So basically the same query is used for both UPDATE and INSERT. As an example, if you execute the following query: update USER set 'username'=?, 'password'=? where key = ? and the provided 'KEY' already exists in the column family, then it will do an update of the existing columns. If the provided KEY does not already exist, then it will do an insert. Is the INSERT query option now available in the latest driver? Are there any other improvements/supports added to this driver recently? Is this driver compatible with Cassandra 1.1.0, and will the changes done to the driver be backward compatible with older Cassandra versions (1.0.0)? [1]. http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/ Regards, ~Dinusha~ -- With kind regards, Robin Verlangen www.robinverlangen.nl
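To illustrate the upsert behaviour in CQL (a sketch against a hypothetical User column family), the statements

UPDATE User SET 'username' = 'ben' WHERE KEY = 'user1';
INSERT INTO User (KEY, 'username') VALUES ('user1', 'ben');

are equivalent: if the row exists the column is overwritten, otherwise the row and column are created.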
Re: Graveyard compactions, when do they occur?
A Cassandra graveyard compaction sounds like a lot of tombstones that will be compacted during a normal compaction. You can trigger that manually using nodetool. 2012/3/28 Erik Forsberg forsb...@opera.com Hi! I was trying out the truncate command in cassandra-cli. http://wiki.apache.org/cassandra/CassandraCli08 says: A snapshot of the data is created, which is deleted asynchronously during a 'graveyard' compaction. When do graveyard compactions happen? Do I have to trigger them somehow? Thanks, \EF -- With kind regards, Robin Verlangen www.robinverlangen.nl
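For example, a major compaction of a single column family can be triggered like this (hypothetical keyspace and column family names):

nodetool -h localhost compact MyKeyspace MyColumnFamily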
Re: How to store a list of values?
Yes, that is one of the possible solutions to your problem. When you want to retrieve only the skills of a particular row, just get the columns with 'skill:' as start value. A suggestion for your example might be to use a ~ instead of : as separator. A tilde is used less often in normal text, so you could replace any tilde that occurs in a skill name with some other character (e.g. a dash or whitespace). 2012/3/27 Ben McCann b...@benmccann.com I was given one other suggestion (which may have been suggested earlier in this thread, but is clearer to me with an example). The suggestion was to use composite columns and have the first part of the key name be skill and the second part be the specific skill, and then store a null value. I hope I understood this suggestion correctly. user: { 'name': 'ben', 'title': 'software engineer', 'company': 'google', 'location': 'orange county', 'skill:java': '', 'skill:html': '', 'skill:javascript': '' } On Tue, Mar 27, 2012 at 12:04 AM, samal samalgo...@gmail.com wrote: YEAH! Agree, it only matters for time-bucketed data. On Tue, Mar 27, 2012 at 12:31 PM, R. Verlangen ro...@us2.nl wrote: That's true, but it does not sound like a real problem to me. Maybe someone else can shed some light upon this. 2012/3/27 samal samalgo...@gmail.com On Tue, Mar 27, 2012 at 1:47 AM, R. Verlangen ro...@us2.nl wrote: but any schema change will break it How do you mean? You don't have to specify the columns in Cassandra so it should work perfectly. Except that the skill~ prefix is reserved for your list. In case skill~ is changed to skill::, it needs to be handled at the app level. Otherwise you would have to update all rows: read each first, modify it, insert the new version and delete the old version. -- With kind regards, Robin Verlangen www.robinverlangen.nl -- With kind regards, Robin Verlangen www.robinverlangen.nl
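A minimal Hector sketch of such a prefix slice, assuming ks is an initialized Keyspace and the "User" column family and row key are the ones from the example above:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class SkillSlice {
    public static ColumnSlice<String, String> fetchSkills(Keyspace ks, String userId) {
        StringSerializer ss = StringSerializer.get();
        SliceQuery<String, String, String> query = HFactory.createSliceQuery(ks, ss, ss, ss);
        query.setColumnFamily("User");
        query.setKey(userId);
        // all columns from "skill:" up to the highest possible "skill:..." name
        query.setRange("skill:", "skill:" + Character.MAX_VALUE, false, 100);
        return query.execute().get();
    }
}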
Re: How to store a list of values?
If you use a CompositeType comparator it does, but it looked to me like in your example you just used the simple UTF8-based solution. My apologies for the confusion. 2012/3/28 Ben McCann b...@benmccann.com Hmm. I thought that Cassandra would encode the composite column without the colon and that it was only there for illustration purposes, so the suggestion to use ~ is confusing. Are there some docs you can point me to? Also, after some reading, it seems to me that it is not even possible to have a composite column together with a regular column in a column family in this manner. On Wed, Mar 28, 2012 at 12:34 AM, R. Verlangen ro...@us2.nl wrote: Yes, that is one of the possible solutions to your problem. When you want to retrieve only the skills of a particular row, just get the columns with 'skill:' as start value. A suggestion for your example might be to use a ~ instead of : as separator. A tilde is used less often in normal text, so you could replace any tilde that occurs in a skill name with some other character (e.g. a dash or whitespace). 2012/3/27 Ben McCann b...@benmccann.com I was given one other suggestion (which may have been suggested earlier in this thread, but is clearer to me with an example). The suggestion was to use composite columns and have the first part of the key name be skill and the second part be the specific skill, and then store a null value. I hope I understood this suggestion correctly. user: { 'name': 'ben', 'title': 'software engineer', 'company': 'google', 'location': 'orange county', 'skill:java': '', 'skill:html': '', 'skill:javascript': '' } On Tue, Mar 27, 2012 at 12:04 AM, samal samalgo...@gmail.com wrote: YEAH! Agree, it only matters for time-bucketed data. On Tue, Mar 27, 2012 at 12:31 PM, R. Verlangen ro...@us2.nl wrote: That's true, but it does not sound like a real problem to me. Maybe someone else can shed some light upon this. 2012/3/27 samal samalgo...@gmail.com On Tue, Mar 27, 2012 at 1:47 AM, R. Verlangen ro...@us2.nl wrote: but any schema change will break it How do you mean? You don't have to specify the columns in Cassandra so it should work perfectly. Except that the skill~ prefix is reserved for your list. In case skill~ is changed to skill::, it needs to be handled at the app level. Otherwise you would have to update all rows: read each first, modify it, insert the new version and delete the old version. -- With kind regards, Robin Verlangen www.robinverlangen.nl -- With kind regards, Robin Verlangen www.robinverlangen.nl -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: problem in create column family
Not sure about that; what version of Cassandra are you using? Maybe someone else here knows how to solve this. 2012/3/27 puneet loya puneetl...@gmail.com Yeah, had created it with UTF8Type before; it gave the same error. Executing the help assume command gives 'utf8' as a type, so can I use comparator='utf8' or not? Please reply. On Mon, Mar 26, 2012 at 9:17 PM, R. Verlangen ro...@us2.nl wrote: You should use the full type names, e.g. create column family MyColumnFamily with comparator=UTF8Type; 2012/3/26 puneet loya puneetl...@gmail.com It is giving errors like Unable to find abstract-type class 'org.apache.cassandra.db.marshal.utf8' and java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: cannot parse 'catalogueId' as hex bytes where catalogueId is a column that has utf8 as its data type. They may be just syntactical errors. Please suggest if you can help me out on this? -- With kind regards, Robin Verlangen www.robinverlangen.nl -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: How to store a list of values?
That's true, but it does not sound like a real problem to me. Maybe someone else can shed some light upon this. 2012/3/27 samal samalgo...@gmail.com On Tue, Mar 27, 2012 at 1:47 AM, R. Verlangen ro...@us2.nl wrote: but any schema change will break it How do you mean? You don't have to specify the columns in Cassandra so it should work perfectly. Except that the skill~ prefix is reserved for your list. In case skill~ is changed to skill::, it needs to be handled at the app level. Otherwise you would have to update all rows: read each first, modify it, insert the new version and delete the old version. -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: Fwd: information on cassandra
Thank you Maki, wasn't aware of that. 2012/3/27 Maki Watanabe watanabe.m...@gmail.com auto_bootstrap has been removed from cassandra.yaml and is always enabled since 1.0. fyi. maki 2012/3/26 R. Verlangen ro...@us2.nl: Yes, you can add nodes to a running cluster. It's very simple: configure the cluster name and seed node(s) in cassandra.yaml, set auto_bootstrap to true and start the node. 2012/3/26 puneet loya puneetl...@gmail.com Consider I'm starting on a single node: can I add nodes later? Please reply :) On Sun, Mar 25, 2012 at 7:41 PM, Ertio Lew ertio...@gmail.com wrote: I guess a 2-node cluster with RF=2 might also be a starting point, isn't it? Are there any issues with this? On Sun, Mar 25, 2012 at 12:20 AM, samal samalgo...@gmail.com wrote: Cassandra has a distributed architecture, so 1 node does not really fit into it. It can be used, but you lose its benefits; OK if you are just playing around. Use VMs to learn how the cluster communicates and handles requests. To get full fault tolerance, redundancy and consistency a minimum of 3 nodes is required. Important reads here: http://wiki.apache.org/cassandra/ http://www.datastax.com/docs/1.0/index http://thelastpickle.com/ http://www.acunu.com/blogs/all/ On Sat, Mar 24, 2012 at 11:37 PM, Garvita Mehta garvita.me...@tcs.com wrote: It's not advisable to use Cassandra on a single node; its basic premise is that if a node fails, data still remains in the system, so at least 3 nodes should be there when setting up a Cassandra cluster. Garvita Mehta CEG - Open Source Technology Group Tata Consultancy Services Ph:- +91 22 67324756 Mailto: garvita.me...@tcs.com Website: http://www.tcs.com -puneet loya wrote: - To: user@cassandra.apache.org From: puneet loya puneetl...@gmail.com Date: 03/24/2012 06:36PM Subject: Fwd: information on cassandra Hi, I'm Puneet, an engineering student. I would like to know whether Cassandra is useful considering we just have a single node (rather, a single system) holding all the information. I'm looking for decent response times from the database. Can you please respond? Thank you, Regards, Puneet Loya -- With kind regards, Robin Verlangen www.robinverlangen.nl -- With kind regards, Robin Verlangen www.robinverlangen.nl
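For reference, the relevant settings in cassandra.yaml on the joining node would look something like this (a sketch for Cassandra 1.0; the cluster name and seed IPs are placeholders):

cluster_name: 'MyCluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1,10.0.0.2"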
Re: Schema advice/help
You can just get a slice range with 'userId:' as start and no end. 2012/3/27 Maciej Miklas mac.mik...@googlemail.com multiget would require the Order Preserving Partitioner, and this can lead to an unbalanced ring and hot spots. Maybe you can use a secondary index on itemtype - it must have small cardinality: http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/ On Tue, Mar 27, 2012 at 10:10 AM, Guy Incognito dnd1...@gmail.com wrote: Without the ability to do disjoint column slices, I would probably use 5 different rows: userId:itemType - activityId. Then it's a multiget slice of 10 items from each of your 5 rows. On 26/03/2012 22:16, Ertio Lew wrote: I need to store activities by each user, on 5 item types. I always want to read the last 10 activities on each item type, by a user (i.e., total activities to read at a time = 50). I want to store these activities in a single row for each user so that they can be retrieved in a single-row query, since I want to read all the last 10 activities on each item. I am thinking of creating composite names appending itemtype : activityId (activityId is just a timestamp value), but then I don't see how to read the last 10 activities from all item types. Any ideas about a schema to do this in a better way? -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: counter column family
*create column family MyCounterColumnFamily with default_validation_class=CounterColumnType and key_validation_class=UTF8Type and comparator=UTF8Type;* There you go! Keys must be UTF8, as well as the column names. Of course you can change those validators. Cheers! 2012/3/27 puneet loya puneetl...@gmail.com Can you give an example of creating a column family with a counter column in it? Please reply. Regards, Puneet Loya -- With kind regards, Robin Verlangen www.robinverlangen.nl
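Incrementing such a counter from Hector is then a one-liner. A sketch, assuming ks is an initialized Keyspace and using the column family created above:

Mutator<String> mutator = HFactory.createMutator(ks, StringSerializer.get());
// add 1 to the "views" counter column of row "page1"
mutator.incrementCounter("page1", "MyCounterColumnFamily", "views", 1L);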
Re: counter column family
You should use a connection pool without retries to prevent a single increment of +1 from having a result of e.g. +3. 2012/3/27 Rishabh Agrawal rishabh.agra...@impetus.co.in You can even define how much you want to increment. But let me just warn you: as far as I know, it has consistency issues. *From:* puneet loya [mailto:puneetl...@gmail.com] *Sent:* Tuesday, March 27, 2012 5:59 PM *To:* user@cassandra.apache.org *Subject:* Re: counter column family Thanks a ton :) :) The counter column family works like 'auto increment' in other databases, right? I mean we have a column of type integer which increments with every insert. Am I going the right way? Please reply :) On Tue, Mar 27, 2012 at 5:50 PM, R. Verlangen ro...@us2.nl wrote: *create column family MyCounterColumnFamily with default_validation_class=CounterColumnType and key_validation_class=UTF8Type and comparator=UTF8Type;* There you go! Keys must be UTF8, as well as the column names. Of course you can change those validators. Cheers! 2012/3/27 puneet loya puneetl...@gmail.com Can you give an example of creating a column family with a counter column in it? Please reply. Regards, Puneet Loya -- With kind regards, Robin Verlangen www.robinverlangen.nl -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: import
You can write your own script to parse the Excel file (export it as CSV) and import it with batch inserts. Should be pretty easy if you have experience with those techniques. 2012/3/27 puneet loya puneetl...@gmail.com I want to import files from Excel to Cassandra. Is it possible? Any tool that can help? What's the best way? Please reply :) -- With kind regards, Robin Verlangen www.robinverlangen.nl
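A rough sketch of such an import script in Java with Hector (assumes ks is an initialized Keyspace, a CSV file with the row key in the first field, and hypothetical column family and column names; no quoting or error handling):

import java.io.BufferedReader;
import java.io.FileReader;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class CsvImporter {
    public static void importCsv(Keyspace ks, String file) throws Exception {
        Mutator<String> mutator = HFactory.createMutator(ks, StringSerializer.get());
        BufferedReader reader = new BufferedReader(new FileReader(file));
        String line;
        int count = 0;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.split(",");
            // first field is the row key, second one a column value
            mutator.addInsertion(parts[0], "MyColumnFamily",
                    HFactory.createStringColumn("name", parts[1]));
            if (++count % 500 == 0) {
                mutator.execute(); // send the inserts in batches of 500
            }
        }
        mutator.execute();
        reader.close();
    }
}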
Re: Error in FAQ?
If you want to modify a column family, just open the command line interface (cassandra-cli) and connect to a node (probably: connect localhost/9160;). When you have to create your first keyspace type: create keyspace MyKeyspace; For modifying an existing keyspace type: use MyKeyspace; If you need more information you can just type help; Good luck! 2012/3/26 Ben McCann b...@benmccann.com Hmmm, I don't see anything regarding column families in cassandra.yaml. It seems like the answer for that question in the FAQ is very outdated. On Sun, Mar 25, 2012 at 4:04 PM, Serge Fonville serge.fonvi...@gmail.com wrote: Hi, 2012/3/26 Ben McCann b...@benmccann.com: There's a line that says Make necessary changes to your storage-conf.xml. I can't find this file. Does it still exist? If so, where should I look? I installed the packaged version of Cassandra available in the DataStax community edition. From http://wiki.apache.org/cassandra/StorageConfiguration Prior to the 0.7 release, Cassandra storage configuration is described by the conf/storage-conf.xml file. As of 0.7, it is described by the conf/cassandra.yaml file. After googling cassandra storage-conf.xml Kind regards/met vriendelijke groet, Serge Fonville http://www.sergefonville.nl Thanks, Ben
Re: problem in create column family
You should use the full type names, e.g. create column family MyColumnFamily with comparator=UTF8Type; 2012/3/26 puneet loya puneetl...@gmail.com It is giving errors like Unable to find abstract-type class 'org.apache.cassandra.db.marshal.utf8' and java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: cannot parse 'catalogueId' as hex bytes where catalogueId is a column that has utf8 as its data type. They may be just syntactical errors. Please suggest if you can help me out on this? -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: How to store a list of values?
but any schema change will break it How do you mean? You don't have to specify the columns in Cassandra so it should work perfectly. Except that the skill~ prefix is reserved for your list. 2012/3/26 samal samalgo...@gmail.com Save the skills in a single column in JSON format. Job done. Good if it has a fixed set of skills; any add or delete changes need to be handled in the app: read the column first, reformat the JSON, update the column (2 Thrift calls). skill~Java: null, skill~Cassandra: null This is also a good option, but any schema change will break it. On Mar 26, 2012 7:04 PM, Ben McCann b...@benmccann.com wrote: True. But I don't need the skills to be searchable, so I'd rather embed them in the user than add another top-level CF. I was thinking of doing something along the lines of adding a skills super column to the User table: skills: { 'java': null, 'c++': null, 'cobol': null } However, I'm still not sure yet how to accomplish this with Astyanax. I've only figured out how to make composite columns with predefined column names with it, and not dynamic column names like this. On Mon, Mar 26, 2012 at 9:08 AM, R. Verlangen ro...@us2.nl wrote: In this case you only need the column names, not the values. You don't need the column values to hold multiple columns (the super-column principle). So a normal CF would work. 2012/3/26 Ben McCann b...@benmccann.com Thanks for the reply Samal. I did not realize that you could store a column with a null value. Do you know if this solution would work with composite columns? It seems super columns are being phased out in favor of composites, but I do not understand composites very well yet. I'm trying to figure out if there's any way to accomplish what you've suggested using Astyanax https://github.com/Netflix/astyanax. Thanks for the help, Ben On Mon, Mar 26, 2012 at 8:46 AM, samal samalgo...@gmail.com wrote: Plus it is fully compatible with CQL: SELECT * FROM UserSkill WHERE KEY='ben'; On Mon, Mar 26, 2012 at 9:13 PM, samal samalgo...@gmail.com wrote: I would take a simple approach: create one other CF UserSkill with the same row key as the profile CF's key. In the UserSkill CF, add each skill as a column name with a null value. Columns can be added or removed. UserProfile={ '*ben*'={ blah :blah blah :blah blah :blah } } UserSkill={ '*ben*'={ 'java':'' 'cassandra':'' . . . 'linux':'' 'skill':'infinity' } } On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.com wrote: I have a profile column family and want to store a list of skills in each profile. In BigTable I could store a Protocol Buffer (http://code.google.com/apis/protocolbuffers/docs/overview.html) with a repeated field, but I'm not sure how this is typically accomplished in Cassandra. One option would be to store a serialized Thrift (http://thrift.apache.org/) or protobuf, but I'd prefer not to do this as I believe Cassandra doesn't have knowledge of these formats, and so the data in the datastore would not be human readable in CQL queries from the command line. The other solution I thought of would be to use a super column and put a random UUID as the key for each skill: skills: { '4b27c2b3ac48e8df': 'java', '84bf94ea7bc92018': 'c++', '9103b9a93ce9d18': 'cobol' } Is this a good way of handling lists in Cassandra? I imagine there's some idiom I'm not aware of. I'm using the Astyanax (https://github.com/Netflix/astyanax/wiki) client library, which only supports composite columns instead of super columns, and so the solution I proposed above would seem quite awkward in that case. 
Though I'm still having some trouble understanding composite columns as they seem not to be completely documented yet. Would this solution work with composite columns? Thanks, Ben -- With kind regards, Robin Verlangen www.robinverlangen.nl -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: Performance overhead when using start and end columns
@Aaron: Very interesting article! Mentioned it on my Dutch blog. 2012/3/26 Mohit Anchlia mohitanch...@gmail.com Thanks! On Mon, Mar 26, 2012 at 10:53 AM, aaron morton aa...@thelastpickle.com wrote: See the tests in the article. The code I used for profiling is also available. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 27/03/2012, at 6:21 AM, Mohit Anchlia wrote: Thanks, but if I do have to specify start and end columns, roughly how much overhead would that translate to, since reading metadata should be constant overall? On Mon, Mar 26, 2012 at 10:18 AM, aaron morton aa...@thelastpickle.com wrote: Some information on query plans: http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ TL;DR: Select columns with no start, in the natural comparator order. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 25/03/2012, at 2:25 PM, Mohit Anchlia wrote: I have rows with around 2K-50K columns, but when I do a query I only need to fetch a few columns between start and end columns. I was wondering what performance overhead is caused by using a slice query with start and end columns? Looking at the code it looks like when you give start and end columns it goes into the IndexSliceReader logic, but it's hard to tell how much overhead on average one would see. Or is it even worth worrying about? -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: cassandra 1.08 on java7 and win7
Ben Coverston wrote earlier today: Use a version of the Java 6 runtime, Cassandra hasn't been tested at all with the Java 7 runtime. So I think that might be a good way to start. 2012/3/26 Frank Hsueh frank.hs...@gmail.com I think I have the Cassandra server started. In another window: cassandra-cli.bat -h localhost -p 9160 Starting Cassandra Client Connected to: Test Cluster on localhost/9160 Welcome to Cassandra CLI version 1.0.8 Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit. [default@unknown] create keyspace DEMO; log4j:WARN No appenders could be found for logger (org.apache.cassandra.config.DatabaseDescriptor). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Cannot locate cassandra.yaml Fatal configuration error; unable to start server. See log for stacktrace. C:\Workspace\cassandra\apache-cassandra-1.0.8\bin Anybody seen this before? -- Frank Hsueh | frank.hs...@gmail.com -- With kind regards, Robin Verlangen www.robinverlangen.nl
Re: cassandra-cli and uncreachable status confusion
That's correct. If you run describe cluster normally you'll see something like: Cluster Information: Snitch: org.apache.cassandra.locator.SimpleSnitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 3a0f6a80-7140-11e1--511aec3785ff: [IP_OF_NODE, IP_OF_NODE, IP_OF_NODE] If there are troubles with the schema, multiple versions will be shown, like: Cluster Information: Snitch: org.apache.cassandra.locator.SimpleSnitch Partitioner: org.apache.cassandra.dht.RandomPartitioner Schema versions: 3a0f6a80-7140-11e1--511aec3785ff: [IP_OF_NODE, IP_OF_NODE] 4e252abe-7140-11e1--511aec3785ff: [IP_OF_NODE] 2012/3/19 Shoaib Mir shoaib...@gmail.com On Tue, Mar 20, 2012 at 4:18 AM, aaron morton aa...@thelastpickle.com wrote: There is a server side check to ensure that all available nodes share the same schema version. Is that checked using describe cluster? cheers, Shoaib
Re: Single Node Cassandra Installation
By default Cassandra tries to write to both nodes, always. Writes will only fail (on a node) if it is down, and even then hinted handoff will attempt to keep both nodes in sync when the troubled node comes back up. The point of having two nodes is to have read and write availability in the face of transient failure. Even more: if you enable read repair, the chance of reading stale data decreases with every further read. This will make your cluster become consistent again faster after some failure. Also consider using different CLs for different operations. E.g. a Twitter timeline can miss some records; however, if you were to display my bank account I would prefer to see the right thing, or at least a nice error message. 2012/3/16 Ben Coverston ben.covers...@datastax.com Doing reads and writes at CL=1 with RF=2 N=2 does not imply that the reads will be inconsistent. It's more complicated than the simple counting of blocked replicas. It is easy to support the notion that it will be largely consistent, in fact very consistent for most use cases. By default Cassandra tries to write to both nodes, always. Writes will only fail (on a node) if it is down, and even then hinted handoff will attempt to keep both nodes in sync when the troubled node comes back up. The point of having two nodes is to have read and write availability in the face of transient failure. If you are interested there is a good exposition of what 'consistency' means in a system like Cassandra at the link below [1]. [1] http://www.eecs.berkeley.edu/~pbailis/projects/pbs/ On Fri, Mar 16, 2012 at 6:50 AM, Thomas van Neerijnen t...@bossastudios.com wrote: You'll need to either read or write at at least quorum to get consistent data from the cluster, so you may as well do both. Now that you mention it, I was wrong about downtime: with a two node cluster, reads or writes at quorum will mean both nodes need to be online. Perhaps you could have an emergency switch in your application which flips to consistency of 1 if one of your Cassandra servers goes down? Just make sure it's set back to quorum when the second one returns, or again you could end up with inconsistent data. On Fri, Mar 16, 2012 at 2:04 AM, Drew Kutcharian d...@venarc.com wrote: Thanks for the comments, I guess I will end up doing a 2 node cluster with replica count 2 and read consistency 1. -- Drew On Mar 15, 2012, at 4:20 PM, Thomas van Neerijnen wrote: So long as data loss and downtime are acceptable risks, a one node cluster is fine. Personally this is usually only acceptable on my workstation; even my dev environment is redundant, because servers fail, usually when you least want them to, like for example when you've decided to save costs by waiting before implementing redundancy. Could a failure end up costing you more than you've saved? I'd rather get cheaper servers (maybe even used off eBay?) so I could have at least two of them. If you do go with a one node solution, although I haven't tried it myself, Priam looks like a good place to start for backups; otherwise roll your own with incremental snapshotting turned on and a watch on the snapshot directory. Storage on something like S3 or Cloud Files is very cheap so there's no good excuse for no backups. On Thu, Mar 15, 2012 at 7:12 PM, R. Verlangen ro...@us2.nl wrote: Hi Drew, One other disadvantage is the lack of consistency levels and replication. Both are part of the high availability / redundancy. So you would really need to backup your single-node-cluster to some other external location. Good luck! 
2012/3/15 Drew Kutcharian d...@venarc.com Hi, We are working on a project that initially is going to have very little data, but we would like to use Cassandra to ease the future scalability. Due to budget constraints, we were thinking to run a single-node Cassandra for now and then add more nodes as required. I was wondering if it is recommended to run a single-node Cassandra in production? Are there any other issues besides lack of high availability? Thanks, Drew -- Ben Coverston DataStax -- The Apache Cassandra Company
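With Hector, for example, different default consistency levels can be configured for reads and writes. A sketch (cluster is an initialized Hector Cluster; the keyspace name is an example):

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

// reads may be slightly stale, writes are acknowledged by a quorum of replicas
ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
Keyspace ks = HFactory.createKeyspace("MyKeyspace", cluster, ccl);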
Re: 0.8.1 Vs 1.0.7
Check your log for messages about rebuilding indices: that might grow your dataset some. One thing is for sure: the data import removed all the cruft that lingered in the 0.8.1 cluster (duplicates, tombstones etc). The decrease is fairly dramatic, but not illogical at all. 2012/3/16 Jeremiah Jordan jeremiah.jor...@morningstar.com I would guess more aggressive compaction settings; did you update rows or insert some twice? If you run major compaction a couple of times on the 0.8.1 cluster, does the data size get smaller? You can use the describe command to check if compression got turned on. -Jeremiah -- *From:* Ravikumar Govindarajan [ravikumar.govindara...@gmail.com] *Sent:* Thursday, March 15, 2012 4:41 AM *To:* user@cassandra.apache.org *Subject:* 0.8.1 Vs 1.0.7 Hi, I ran some data import tests for Cassandra 0.8.1 and 1.0.7. The results were a little bit surprising.
0.8.1, SimpleStrategy, Rep_Factor=3, QUORUM writes, RP, SimpleSnitch
XXX.XXX.XXX.A datacenter1 rack1 Up Normal 140.61 GB 12.50%
XXX.XXX.XXX.B datacenter1 rack1 Up Normal 139.92 GB 12.50%
XXX.XXX.XXX.C datacenter1 rack1 Up Normal 138.81 GB 12.50%
XXX.XXX.XXX.D datacenter1 rack1 Up Normal 139.78 GB 12.50%
XXX.XXX.XXX.E datacenter1 rack1 Up Normal 137.44 GB 12.50%
XXX.XXX.XXX.F datacenter1 rack1 Up Normal 138.48 GB 12.50%
XXX.XXX.XXX.G datacenter1 rack1 Up Normal 140.52 GB 12.50%
XXX.XXX.XXX.H datacenter1 rack1 Up Normal 145.24 GB 12.50%
1.0.7, NTS, Rep_Factor={DC1:3, DC2:2}, LOCAL_QUORUM writes, RP [DC2 machines yet to join ring], PropertyFileSnitch
XXX.XXX.XXX.A DC1 RAC1 Up Normal 48.72 GB 12.50%
XXX.XXX.XXX.B DC1 RAC1 Up Normal 51.23 GB 12.50%
XXX.XXX.XXX.C DC1 RAC1 Up Normal 52.4 GB 12.50%
XXX.XXX.XXX.D DC1 RAC1 Up Normal 49.64 GB 12.50%
XXX.XXX.XXX.E DC1 RAC1 Up Normal 48.5 GB 12.50%
XXX.XXX.XXX.F DC1 RAC1 Up Normal 53.38 GB 12.50%
XXX.XXX.XXX.G DC1 RAC1 Up Normal 51.11 GB 12.50%
XXX.XXX.XXX.H DC1 RAC1 Up Normal 53.36 GB 12.50%
There seems to be a 3X saving in size for the same dataset running 1.0.7. I have not enabled compression for any of the CFs. Will it be enabled by default when creating a new CF in 1.0.7? cassandra.yaml is also mostly identical. Thanks and Regards, Ravi
Re: Single Node Cassandra Installation
Hi Drew, One other disadvantage is the lack of consistency levels and replication. Both are part of the high availability / redundancy. So you would really need to backup your single-node-cluster to some other external location. Good luck! 2012/3/15 Drew Kutcharian d...@venarc.com Hi, We are working on a project that initially is going to have very little data, but we would like to use Cassandra to ease the future scalability. Due to budget constraints, we were thinking to run a single-node Cassandra for now and then add more nodes as required. I was wondering if it is recommended to run a single-node Cassandra in production? Are there any other issues besides lack of high availability? Thanks, Drew
Re: Node joining / unknown
It seemed that one of the other nodes had trouble with a compaction task. The C node was waiting for that. It's now streaming all its data into place. Thank you all for your time! 2012/3/7 i...@4friends.od.ua Just run nodetool compactionstats on the other nodes. -Original Message- From: R. Verlangen ro...@us2.nl To: user@cassandra.apache.org Sent: Wed, 07 Mar 2012 23:09 Subject: Re: Node joining / unknown @Brandon: Thank you for the information. I'll do that next time. @Igor: Any ways to find out whether that is the current state? And if so, how to solve it? 2012/3/7 i...@4friends.od.ua Maybe it is waiting for a validation compaction on another node? -Original Message- From: R. Verlangen ro...@us2.nl To: user@cassandra.apache.org Sent: Wed, 07 Mar 2012 22:15 Subject: Re: Node joining / unknown At this moment the node has joined the ring (after a restart: tried that before, but now it finally had a result). When I try to run repair on the new node, the log says (the new node is NODE C): INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,453 AntiEntropyService.java (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle tree for StorageMeta from NODE A INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,643 AntiEntropyService.java (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle tree for StorageMeta from NODE B And then it doesn't do anything anymore. Tried it a couple of times again; it's just not starting. Results from netstats on NODE C: Mode: NORMAL Not sending any streams. Not receiving any streams.
Pool Name    Active   Pending   Completed
Commands     n/a      0         5
Responses    n/a      93        4296
Any suggestions? Thank you! 2012/3/7 aaron morton aa...@thelastpickle.com - When I try to remove the token, it says: Exception in thread main java.lang.UnsupportedOperationException: Token not found. Am assuming you ran nodetool removetoken on a node other than the joining node? What did nodetool ring look like on that machine? Take a look at nodetool netstats on the joining node to see if streaming has failed. If it's dead then… 1) Try restarting the joining node and run nodetool repair on it immediately. Note: am assuming QUORUM CL, otherwise things may get inconsistent. or 2) Stop the node. Try to remove the token again from another node. Note that removing a token will stream data around the place as well. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/03/2012, at 9:11 PM, R. Verlangen wrote: Hi there, I'm currently in a really weird situation. - Nodetool ring says node X is joining (this has already taken 12 hours, with no activity) - When I try to remove the token, it says: Exception in thread main java.lang.UnsupportedOperationException: Token not found. - Removetoken status = No token removals in process. How do I get that node out of my cluster? With kind regards, Robin Verlangen
Node joining / unknown
Hi there, I'm currently in a really weird situation. - Nodetool ring says node X is joining (this has already taken 12 hours, with no activity) - When I try to remove the token, it says: Exception in thread main java.lang.UnsupportedOperationException: Token not found. - Removetoken status = No token removals in process. How do I get that node out of my cluster? With kind regards, Robin Verlangen
Re: Node joining / unknown
At this moment the node has joined the ring (after a restart: tried that before, but now it finally had a result). When I try to run repair on the new node, the log says (the new node is NODE C): INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,453 AntiEntropyService.java (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle tree for StorageMeta from NODE A INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,643 AntiEntropyService.java (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle tree for StorageMeta from NODE B And then it doesn't do anything anymore. Tried it a couple of times again; it's just not starting. Results from netstats on NODE C: Mode: NORMAL Not sending any streams. Not receiving any streams.
Pool Name    Active   Pending   Completed
Commands     n/a      0         5
Responses    n/a      93        4296
Any suggestions? Thank you! 2012/3/7 aaron morton aa...@thelastpickle.com - When I try to remove the token, it says: Exception in thread main java.lang.UnsupportedOperationException: Token not found. Am assuming you ran nodetool removetoken on a node other than the joining node? What did nodetool ring look like on that machine? Take a look at nodetool netstats on the joining node to see if streaming has failed. If it's dead then… 1) Try restarting the joining node and run nodetool repair on it immediately. Note: am assuming QUORUM CL, otherwise things may get inconsistent. or 2) Stop the node. Try to remove the token again from another node. Note that removing a token will stream data around the place as well. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/03/2012, at 9:11 PM, R. Verlangen wrote: Hi there, I'm currently in a really weird situation. - Nodetool ring says node X is joining (this has already taken 12 hours, with no activity) - When I try to remove the token, it says: Exception in thread main java.lang.UnsupportedOperationException: Token not found. - Removetoken status = No token removals in process. How do I get that node out of my cluster? With kind regards, Robin Verlangen
Re: Node joining / unknown
@Brandon: Thank you for the information. I'll do that next time. @Igor: Any ways to find out whether that is the current state? And if so, how to solve it? 2012/3/7 i...@4friends.od.ua Maybe it is waiting for a validation compaction on another node? -Original Message- From: R. Verlangen ro...@us2.nl To: user@cassandra.apache.org Sent: Wed, 07 Mar 2012 22:15 Subject: Re: Node joining / unknown At this moment the node has joined the ring (after a restart: tried that before, but now it finally had a result). When I try to run repair on the new node, the log says (the new node is NODE C): INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,453 AntiEntropyService.java (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle tree for StorageMeta from NODE A INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,643 AntiEntropyService.java (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle tree for StorageMeta from NODE B And then it doesn't do anything anymore. Tried it a couple of times again; it's just not starting. Results from netstats on NODE C: Mode: NORMAL Not sending any streams. Not receiving any streams.
Pool Name    Active   Pending   Completed
Commands     n/a      0         5
Responses    n/a      93        4296
Any suggestions? Thank you! 2012/3/7 aaron morton aa...@thelastpickle.com - When I try to remove the token, it says: Exception in thread main java.lang.UnsupportedOperationException: Token not found. Am assuming you ran nodetool removetoken on a node other than the joining node? What did nodetool ring look like on that machine? Take a look at nodetool netstats on the joining node to see if streaming has failed. If it's dead then… 1) Try restarting the joining node and run nodetool repair on it immediately. Note: am assuming QUORUM CL, otherwise things may get inconsistent. or 2) Stop the node. Try to remove the token again from another node. Note that removing a token will stream data around the place as well. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/03/2012, at 9:11 PM, R. Verlangen wrote: Hi there, I'm currently in a really weird situation. - Nodetool ring says node X is joining (this has already taken 12 hours, with no activity) - When I try to remove the token, it says: Exception in thread main java.lang.UnsupportedOperationException: Token not found. - Removetoken status = No token removals in process. How do I get that node out of my cluster? With kind regards, Robin Verlangen
Re: TimeUUID
For querying purposes it would be better to use readable strings, because you can really get information out of them. A TimeUUID is just a unique value based on time, but not only the time. 2012/2/28 Tamar Fraenkel ta...@tok-media.com Hi! I have a column family where I use rows as time buckets. What I do is take the epoch time in seconds and round it to 1 hour (taking the result of time_since_epoch_seconds divided by 3600). My key validation type is LongType. I wonder whether it is better to use TimeUUID or even a readable string representation of time? Thanks, -- *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
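To illustrate the difference, a small Java sketch (using Hector's TimeUUIDUtils; the one-hour bucket matches the question):

// readable row key: epoch seconds rounded down to the hour, e.g. 1330423200
long bucketSize = 3600L;
long rowKey = (System.currentTimeMillis() / 1000L / bucketSize) * bucketSize;

// TimeUUID: unique and time-sortable, but not human readable
java.util.UUID columnName =
        me.prettyprint.cassandra.utils.TimeUUIDUtils.getTimeUUID(System.currentTimeMillis());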
Combining Cassandra with some SQL language
Hi there, I'm currently busy with the technical design of a new project. Of course it will depend on your needs, but is it weird to combine Cassandra with an SQL database like MySQL? In my use case it would be nice because we have some tables/CFs with lots and lots of data that does not really have to be 100% consistent, but also some data that should always be consistent. What do you think of this? With kind regards, Robin Verlangen
Re: Combining Cassandra with some SQL language
Ok, thank you all for your opinions. Seems that I can continue without any extra db-model headaches ;-) 2012/2/27 Sanjay Sharma sanjay.sha...@impetus.co.in Kundera (https://github.com/impetus-opensource/Kundera), an open source APL Java ORM, allows polyglot persistence between RDBMS and NoSQL databases such as Cassandra, MongoDB, HBase etc., transparently to the business logic developer. A note of caution: this does not mean that Cassandra data modeling can be bypassed; NoSQL entities still need to be modeled in such a way as to best use Cassandra's capabilities. Kundera can also take care of relationships between the entities in the RDBMS. Transaction management is still pending, however. Regards, Sanjay *From:* Adam Haney [mailto:adam.ha...@retickr.com] *Sent:* Sunday, February 26, 2012 7:51 PM *To:* user@cassandra.apache.org *Subject:* Re: Combining Cassandra with some SQL language I've been using a combination of MySQL and Cassandra for about a year now on a project that now serves about 20k users. We use Cassandra for storing large entities and MySQL to store metadata that allows us to do better ad hoc querying. It's worked quite well for us. During this time we have also been able to migrate some of our tables in MySQL to Cassandra when MySQL performance / capacity became a problem. This may seem obvious, but if you're planning on creating a data model that spans multiple databases, make sure you encapsulate the logic to read/write/delete information in a good data model library and only use that library to access your data. This is good practice anyway, but when you add the extra complication of multiple databases that may reference one another it's an absolute must. On Sun, Feb 26, 2012 at 8:06 AM, R. Verlangen ro...@us2.nl wrote: Hi there, I'm currently busy with the technical design of a new project. Of course it will depend on your needs, but is it weird to combine Cassandra with an SQL database like MySQL? In my use case it would be nice because we have some tables/CFs with lots and lots of data that does not really have to be 100% consistent, but also some data that should always be consistent. What do you think of this? With kind regards, Robin Verlangen
Re: List all keys with RandomPartitioner
You can leave the end key empty. 1) Start with an empty start key 2) On the next iteration, start with startkey = last key of the previous batch 3) Keep going until you run out of results 2012/2/22 Rafael Almeida almeida...@yahoo.com From: Franc Carter franc.car...@sirca.org.au To: user@cassandra.apache.org Sent: Wednesday, February 22, 2012 9:24 AM Subject: Re: List all keys with RandomPartitioner On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti f.baro...@list-group.com wrote: I need to iterate over all the rows in a column family stored with RandomPartitioner. When I reach the end of a key slice, I need to find the token of the last key in order to ask for the next slice. I saw in an old email that the token for a specific key can be recovered through FBUtilities.hash(). That class however is inside the full Cassandra jar, not inside the client-specific part. Is there a way to iterate over all the keys which does not require the server-side Cassandra jar? Does this help? http://wiki.apache.org/cassandra/FAQ#iter_world I don't get it. It says to use the last key read as start key, but what should be used as end key?
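A sketch of that loop with Hector (assumes ks is an initialized Keyspace and a hypothetical column family; page size 100):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class KeyIterator {
    public static void listAllKeys(Keyspace ks) {
        StringSerializer ss = StringSerializer.get();
        String start = "";
        while (true) {
            RangeSlicesQuery<String, String, String> query =
                    HFactory.createRangeSlicesQuery(ks, ss, ss, ss);
            query.setColumnFamily("MyColumnFamily");
            query.setKeys(start, ""); // empty end key
            query.setRowCount(100);
            query.setReturnKeysOnly();
            OrderedRows<String, String, String> rows = query.execute().get();
            for (Row<String, String, String> row : rows) {
                if (!row.getKey().equals(start)) {
                    System.out.println(row.getKey()); // process the key
                }
            }
            if (rows.getCount() < 100) {
                break; // ran out of results
            }
            start = rows.peekLast().getKey(); // overlaps one row with the next batch
        }
    }
}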
Re: Please advise -- 750MB object possible?
I would suggest you chunk them down into small pieces (~10-50MB) and just fetch all the parts you need. A problem might be that if fetching one part fails, the whole blob is useless. 2012/2/22 Rafael Almeida almeida...@yahoo.com Keep them where? -- *From:* Mohit Anchlia mohitanch...@gmail.com *To:* user@cassandra.apache.org *Cc:* potek...@bnl.gov *Sent:* Wednesday, February 22, 2012 3:44 PM *Subject:* Re: Please advise -- 750MB object possible? In my opinion, if you are a busy site or application, keep blobs out of the database. On Wed, Feb 22, 2012 at 9:37 AM, Dan Retzlaff dretzl...@gmail.com wrote: Chunking is a good idea, but you'll have to do it yourself. A few of the columns in our application got quite large (maybe ~150MB) and the failure mode was RPC timeout exceptions. Nodes couldn't always move that much data across our data center interconnect in the default 10 seconds. With enough heap and a faster network you could probably get by without chunking, but it's not ideal. On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin potek...@bnl.gov wrote: Hello everybody, I'm being asked whether we can serve an object, which I assume is a blob, of 750MB size? I guess the real question is how to chunk it and/or whether it's even possible to chunk it. Thanks! Maxim
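A sketch of the chunked write in Java with Hector (hypothetical names; assumes blob is a byte[], objectId a String row key and mutator a Mutator<String> for the keyspace; reading is the reverse: fetch the columns in order and concatenate):

import java.util.Arrays;
import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.factory.HFactory;

int chunkSize = 10 * 1024 * 1024; // 10MB pieces
for (int offset = 0, n = 0; offset < blob.length; offset += chunkSize, n++) {
    byte[] part = Arrays.copyOfRange(blob, offset, Math.min(blob.length, offset + chunkSize));
    // column names like chunk-00000, chunk-00001, ... keep the parts ordered
    mutator.addInsertion(objectId, "Blobs",
            HFactory.createColumn(String.format("chunk-%05d", n), part,
                    StringSerializer.get(), BytesArraySerializer.get()));
}
mutator.execute();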
Re: Newbie Question: Cassandra consuming 100% CPU on ubuntu server
You might want to check your Cassandra logs; they contain important information that might lead you to the actual cause of the problems. 2012/2/18 Aditya Gupta ady...@gmail.com Thanks! But what about the 100% CPU consumption that is causing the server to hang? On Sat, Feb 18, 2012 at 6:19 PM, Watanabe Maki watanabe.m...@gmail.com wrote: I haven't used the packaged kit, but Cassandra uses half of the physical memory on your system by default. You need to edit cassandra-env.sh to decrease the heap size. Update MAX_HEAP_SIZE and HEAP_NEWSIZE and restart. From iPhone On 2012/02/18, at 20:40, Aditya Gupta ady...@gmail.com wrote: I just installed Cassandra on my Ubuntu server by adding the following to the sources list: deb http://www.apache.org/dist/cassandra/debian 10x main deb-src http://www.apache.org/dist/cassandra/debian 10x main Soon after the install I started getting OOM errors, then the server became unresponsive. I added more RAM to the server but found that Cassandra was consuming 100% CPU and 1GB RAM as soon as the server was started. Why is this happening and how can I get it back to normal conditions?
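For example, in cassandra-env.sh (the values are just an illustration; size them to your machine):

MAX_HEAP_SIZE="1G"
HEAP_NEWSIZE="256M"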
Re: Replication factor per column family
Ok, that's clear, thank you for your time! 2012/2/16 aaron morton aa...@thelastpickle.com yes. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/02/2012, at 10:15 PM, R. Verlangen wrote: Hmm ok. This means if I want to have a CF with RF = 3 and another CF with RF = 1 (e.g. some debug logging) I will have to create 2 keyspaces? 2012/2/16 aaron morton aa...@thelastpickle.com Multiple CF mutations for a row are treated atomically in the commit log, and they are sent together to the replicas. Replication occurs at the row level, not the row+cf level. If each CF had its own RF, odd things might happen, like sending a batch mutation for one row and two CFs that fails because there are not enough nodes for one of the CFs. There would be other reasons as well. In short: it's baked in. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/02/2012, at 9:54 PM, R. Verlangen wrote: Hi there, As the subject states: Is it possible to set a replication factor per column family? Could not find anything about it for recent releases. I'm running Cassandra 1.0.7 and I think it should be possible on a per-CF basis instead of the whole keyspace. With kind regards, Robin
Re: CQL query issue when fetching data from Cassandra
I'm not sure about your first 2 questions. The third might be an exception: check your Cassandra logs. About the LIKE-thing: there's no such query possibility in Cassandra / CQL. You can take a look at Hadoop / Hive to tackle those problems. 2012/2/16 Roshan codeva...@gmail.com Hi I am using Cassandra 1.0.6 and have one column family in my keyspace. create column family TestCF with comparator = UTF8Type and column_metadata = [ {column_name : userid, validation_class : BytesType, index_name : userid_idx, index_type : KEYS}, {column_name : workspace, validation_class : BytesType, index_name : wp_idx, index_type : KEYS}, {column_name : module, validation_class : BytesType, index_name : module_idx, index_type : KEYS}, {column_name : action, validation_class : BytesType, index_name : action_idx, index_type : KEYS}, {column_name : description, validation_class : BytesType}, {column_name : status, validation_class : BytesType, index_name : status_idx, index_type : KEYS}, {column_name : createdtime, validation_class : BytesType}, {column_name : created, validation_class : BytesType, index_name : created_idx, index_type : KEYS}, {column_name : logdetail, validation_class : BytesType}] and keys_cached = 1 and rows_cached = 1000 and row_cache_save_period = 0 and key_cache_save_period = 3600 and memtable_throughput = 255 and memtable_operations = 0.29; 1) The IN operator is not working: SELECT * FROM TestCF WHERE status IN ('Failed', 'Success') 2) The OR operator is not fetching data: SELECT * FROM TestCF WHERE status='Failed' OR status='Success' 3) If I use the AND operator, it also does not return data. The query doesn't have issues, but the result set is null: SELECT * FROM TestCF WHERE status='Failed' AND status='Success' 4) Is there anything similar to LIKE in CQL? I want to search data based on some part of a string. Could someone please help me solve the above issues? Thanks.
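A common workaround for 1) and 2) is to issue one query per value and merge the results client-side, e.g.:

SELECT * FROM TestCF WHERE status = 'Failed';
SELECT * FROM TestCF WHERE status = 'Success';

For 3) an empty result is actually expected: a single column can never be both 'Failed' and 'Success' at the same time, so the AND of those two predicates matches no rows.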
Re: Wide row column slicing - row size shard limit
Things you should know: - Thrift has a limit on the amount of data it will accept / send; you can configure this in Cassandra: 64MB should still work fine - Rows should not become huge: this will make perfect load balancing impossible in your cluster - A single row should fit on a disk - The limit of columns per row is 2 billion You should pick a size for your time buckets (e.g. second, minute, ..) that suits your needs. As far as I'm aware, there's no such limit as 10MB in Cassandra for a single row before performance decreases. It might be a memory / IO problem. 2012/2/15 Data Craftsman database.crafts...@gmail.com Hello experts, Based on this blog about basic time series data modeling with Cassandra, http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/ This (wide row column slicing) works well enough for a while, but over time, this row will get very large. If you are storing sensor data that updates hundreds of times per second, that row will quickly become gigantic and unusable. The answer to that is to shard the data up in some way. There is a limit on how big the row size can be before it slows down update and query performance; that is 10MB or less. Is this still true in the latest Cassandra version? Or in what release will Cassandra remove this limit? Manually sharding the wide row will increase the application complexity; it would be better if Cassandra could handle it transparently. Thanks, Charlie | DBA & Developer p.s. Quora link: http://www.quora.com/Cassandra-database/What-are-good-ways-to-design-data-model-in-Cassandra-for-historical-data
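A manual sharding scheme can be as simple as encoding a time bucket into the row key. A Java sketch (sensorId and timestampMs are assumed inputs; the bucket size is something you tune so rows stay at a manageable size):

long bucketMs = 24L * 3600L * 1000L; // one row per sensor per day
String rowKey = sensorId + ":" + (timestampMs / bucketMs);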
Replication factor per column family
Hi there, As the subject states: is it possible to set a replication factor per column family? I could not find anything about this in recent releases. I'm running Cassandra 1.0.7 and I think it should be possible on a per-CF basis instead of for the whole keyspace. With kind regards, Robin
Re: Replication factor per column family
Hmm ok. This means if I want to have a CF with RF = 3 and another CF with RF = 1 (e.g. some debug logging) I will have to create 2 keyspaces? 2012/2/16 aaron morton aa...@thelastpickle.com Multiple CF mutations for a row are treated atomically in the commit log, and they are sent together to the replicas. Replication occurs at the row level, not the row+cf level. If each CF had its own RF, odd things might happen, like sending a batch mutation for one row and two CF's that fails because there are not enough nodes for one of the CF's. There would be other reasons as well. In short: it's baked in. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 16/02/2012, at 9:54 PM, R. Verlangen wrote: Hi there, As the subject states: is it possible to set a replication factor per column family? I could not find anything about this in recent releases. I'm running Cassandra 1.0.7 and I think it should be possible on a per-CF basis instead of for the whole keyspace. With kind regards, Robin
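For reference, that is what it comes down to: one keyspace per replication factor. In cassandra-cli it would look roughly like this (the exact strategy_options syntax differs slightly between CLI versions, so treat this as a sketch):

create keyspace MainData with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:3};
create keyspace DebugLogs with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};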
Re: Deleting a column vs setting it's value to empty
"Setting to empty may cause you less headaches as you won't have to deal with tombstones": you won't have to deal with tombstones manually; the Thrift API will take care of this. Deleting will almost always be better than storing an empty column value, with one exception: when empty actually means something different from non-existing. 2012/2/10 Narendra Sharma narendra.sha...@gmail.com IMO deleting is always better. It is better to not store the column if there is no value associated. -Naren On Fri, Feb 10, 2012 at 12:15 PM, Drew Kutcharian d...@venarc.com wrote: Hi Everyone, Let's say I have the following object which I would like to save in Cassandra: class User { UUID id; //row key String name; //columnKey: name, columnValue: the name of the user String description; //columnKey: description, columnValue: the description of the user } Description can be nullable. What's the best approach when a user updates her description and sets it to null? Should I delete the description column or set it to an empty string? In addition, if I go with the delete column strategy, since I don't know what was the previous value of description (the column could not even exist), what would happen when I delete a non existent column? Thanks, Drew -- Narendra Sharma Software Engineer *http://www.aeris.com http://www.persistentsys.com* *http://narendrasharma.blogspot.com/*
Re: Querying for rows without a particular column
One option might be to maintain an index containing the keys of the rows. The index would then have the same TTL as the row itself so when you iterate over the index columns you'll find exactly the same results. Although I'm not really sure whether this is the best option. Another might be to use Hadoop to find your results with a map/reduce task. 2012/2/13 Asankha C. Perera asan...@apache.org Hi All I am using expiring columns in my column family, and need to search for the rows where a particular column expired (and no longer exists).. I am using Hector client. How can I make a query to find the rows of my interest? thanks asankha -- Asankha C. Perera AdroitLogic, http://adroitlogic.org http://esbmagic.blogspot.com
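The index-row idea above can be sketched as follows. insertColumn is a hypothetical stand-in for your client's mutator, not an actual Hector call; the key point is that the index entry carries the same TTL as the data column, so both expire together (if you ever refresh the data's TTL, re-write the index entry too):

public class TtlIndexWriter {

    // Hypothetical stand-in for your client's insert:
    // (columnFamily, rowKey, columnName, value, ttlSeconds).
    static void insertColumn(String cf, String rowKey, String col,
                             String value, int ttlSeconds) {
        // delegate to Hector / Thrift here
    }

    static void writeWithIndex(String rowKey, String value, int ttlSeconds) {
        insertColumn("Data", rowKey, "value", value, ttlSeconds);
        // One wide index row; each column name is a data row key and
        // expires at the same moment as the data it points to.
        insertColumn("DataIndex", "all", rowKey, "", ttlSeconds);
    }
}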
Re: deleting rows and tombstones
Are you planning to insert rows with keys that existed before? If that's true, there will be no tombstones (as far as I understand Cassandra). If not, you will get tombstones that might slow down reads, because they have to be skipped until the next compaction. 2012/2/14 Todd Burruss bburr...@expedia.com my design calls for deleting a row (by key, not individual columns) and re-inserting it a lot and I'm concerned about tombstone build up slowing down reads. I know if I delete a lot of individual columns the tombstones will build up and slow down reads until they are cleaned up, but not sure if the same holds for deleting the whole row. thoughts?
Re: timed-out retrieving a giant row.
I'm familiar with this in PHPCassa, but with Hector it would be something like this: query your CF with a slice range whose start is lastColName and whose finish is empty (e.g. an empty byte array), where lastColName is the name of the last column from the previous read. You can continue this until you run out of results. 2012/2/14 Yuhan Zhang yzh...@onescreen.com Hi all, I'm using the Hector client 0.8, trying to retrieve a list of IDs from a giant row. Each ID is a columnName in the row. It works ok when there aren't many IDs, but SliceQuery starts to time out after the row becomes big. Is this approach the correct way to store a list of IDs? Are there some settings that I'm missing? By looking at the code, it sets the range of the columnNames to be setRange(null, null, false, Integer.MAX_VALUE); is there a way in cassandra to retrieve the first 100 columns, then the next 100 columns, and so forth? Thank you. Yuhan
Re: timed-out retrieving a giant row.
Of course you should set your limit to 100 or something like that, not Integer.MAX_VALUE ;-) 2012/2/14 R. Verlangen ro...@us2.nl I'm familiar to this in PHPCassa, but with Hector it would be something like this: Query you CF with a range.setStart(lastColName) and range.setFinish(StringUtils.byte() where the lastColName is the name of the column from the previous read. You can continue this until you run out of results. 2012/2/14 Yuhan Zhang yzh...@onescreen.com Hi all, I'm using the Hector client 0.8, trying to retrieve a list of IDs from a gaint row. each ID is a columnName in the row It works ok when there's not many IDs, but SliceQuery starts to time-out after the row becomes big. Is this approach the correct way to store a list of IDs? are there some settings that I'm missing? by looking at the code, it sets the range of the columnNames to be setRange(null, null, false, Integer.MAX_VALUE); is there a way in cassandra to retrieve the first 100 columns, then the next 100 columns, and so forth? Thank you. Yuhan
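Putting the two messages together, the paging loop would look roughly like this in Hector (method names as of the Hector 0.8/1.0 API; double-check against your version):

import java.util.List;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class ColumnPager {

    static void readAllColumns(Keyspace ks, String cf, String rowKey) {
        StringSerializer se = StringSerializer.get();
        final int pageSize = 100;
        String start = ""; // empty start = beginning of the row
        while (true) {
            SliceQuery<String, String, String> q = HFactory.createSliceQuery(ks, se, se, se);
            q.setColumnFamily(cf);
            q.setKey(rowKey);
            // fetch one extra column: the first one overlaps the previous page
            q.setRange(start, "", false, pageSize + 1);
            List<HColumn<String, String>> cols = q.execute().get().getColumns();
            if (!start.equals("") && !cols.isEmpty()) {
                cols = cols.subList(1, cols.size()); // drop the overlapping column
            }
            if (cols.isEmpty()) {
                break;
            }
            for (HColumn<String, String> col : cols) {
                // process col.getName() / col.getValue() here
            }
            if (cols.size() < pageSize) {
                break; // a short page means we reached the end of the row
            }
            start = cols.get(cols.size() - 1).getName(); // resume after the last column
        }
    }
}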
Re: keycache persisted to disk ?
This is because of the warm-up of Cassandra as it starts. On startup it will fetch the rows that were cached before shutdown: these have to be loaded from disk, as there is nothing in the cache yet. You can read more about this at http://wiki.apache.org/cassandra/LargeDataSetConsiderations 2012/2/13 Franc Carter franc.car...@sirca.org.au On Mon, Feb 13, 2012 at 5:03 PM, zhangcheng zhangch...@jike.com wrote: ** I think the keycaches and rowcaches are both persisted to disk on shutdown, and restored from disk on restart, which then improves the performance. Thanks - that would explain at least some of what I am seeing cheers 2012-02-13 -- zhangcheng -- *From:* Franc Carter *Sent:* 2012-02-13 13:53:56 *To:* user *Cc:* *Subject:* keycache persisted to disk ? Hi, I am testing Cassandra on Amazon and finding performance can vary fairly wildly. I'm leaning towards it being an artifact of the AWS I/O system but have one other possibility. Are keycaches persisted to disk and restored on a clean shutdown and restart? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: keycache persisted to disk ?
I also noticed that Cassandra appears to perform better under a continuous load. Are you sure the rows you're querying are actually in the cache? 2012/2/13 Franc Carter franc.car...@sirca.org.au 2012/2/13 R. Verlangen ro...@us2.nl This is because of the warm-up of Cassandra as it starts. On startup it will fetch the rows that were cached before shutdown: these have to be loaded from disk, as there is nothing in the cache yet. You can read more about this at http://wiki.apache.org/cassandra/LargeDataSetConsiderations I actually have the opposite 'problem'. I have a pair of servers that have been static since mid last week, but have seen performance vary significantly (x10) for exactly the same query. I hypothesised it was various caches so I shut down Cassandra, flushed the O/S buffer cache and then brought it back up. The performance wasn't significantly different to the pre-flush performance cheers 2012/2/13 Franc Carter franc.car...@sirca.org.au On Mon, Feb 13, 2012 at 5:03 PM, zhangcheng zhangch...@jike.com wrote: ** I think the keycaches and rowcaches are both persisted to disk on shutdown, and restored from disk on restart, which then improves the performance. Thanks - that would explain at least some of what I am seeing cheers 2012-02-13 -- zhangcheng -- *From:* Franc Carter *Sent:* 2012-02-13 13:53:56 *To:* user *Cc:* *Subject:* keycache persisted to disk ? Hi, I am testing Cassandra on Amazon and finding performance can vary fairly wildly. I'm leaning towards it being an artifact of the AWS I/O system but have one other possibility. Are keycaches persisted to disk and restored on a clean shutdown and restart? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Best way to know the cluster status
You might consider writing some kind of PHP script that runs nodetool ring and parses the output? 2012/2/6 Tamil selvan R.S tamil.3...@gmail.com Hi, What is the best way to know the cluster status via PHP? Currently we are trying to connect to each individual Cassandra instance with a specified timeout, and if it fails we report the node to be down. But this test remains faulty. What are the other ways to test the availability of nodes in a Cassandra cluster? How does DataStax OpsCenter manage to do that? Regards, Tamil Selvan
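The wrapper-script idea is simple enough to sketch; shown here in Java for consistency with the rest of this digest (from PHP you would shell out with exec() and scan the lines the same way). The host and flags are assumptions for a local node:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RingStatus {

    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("nodetool", "-h", "localhost", "ring").start();
        BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = out.readLine()) != null) {
            // Each node line carries a status column: "Up" or "Down".
            if (line.contains("Down")) {
                System.out.println("Node reported down: " + line.trim());
            }
        }
        p.waitFor();
    }
}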
Re: nodetool hangs and didn't print anything with firewall
Do you allow both outbound and inbound traffic? You might also try allowing both TCP and UDP. 2012/2/6 Roshan codeva...@gmail.com Yes, if the firewall is disabled it works. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/nodetool-hangs-and-didn-t-print-anything-with-firewall-tp7257286p7257310.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: yet a couple more questions on composite columns
Yiming, I am using 2 CF's. Performance wise this should not be an issue. I use it for small files data store. My 2 CF's are: FilesMeta FilesData 2012/2/5 Yiming Sun yiming@gmail.com Interesting idea, Jim. Is there a reason you don't you use metadata:{accountId} instead? For performance reasons? On Sat, Feb 4, 2012 at 6:24 PM, Jim Ancona j...@anconafamily.com wrote: I've used special values which still comply with the Composite schema for the metadata columns, e.g. a column of 1970-01-01:{accountId} for a metadata column where the Composite is DateType:UTF8Type. Jim On Sat, Feb 4, 2012 at 2:13 PM, Yiming Sun yiming@gmail.com wrote: Thanks Andrey and Chris. It sounds like we don't necessarily have to use composite columns. From what I understand about dynamic CF, each row may have completely different data from other rows; but in our case, the data in each row is similar to other rows; my concern was more about the homogeneity of the data between columns. In our original supercolumn-based schema, one special supercolumn is called metadata which contains a number of subcolumns to hold metadata describing each collection (e.g. number of documents, etc.), then the rest of the supercolumns in the same row are all IDs of documents belong to the collection, and for each document supercolumn, the subcolumns contain the document content as well as metadata on individual document (e.g. checksum of each document). To move away from the supercolumn schema, I could either create two CFs, one to hold metadata, the other document content; or I could create just one CF mixing metadata and doc content in the same row, and using composite column names to identify if the particular column is metadata or a document. I am just wondering if you have any inputs on the pros and cons of each schema. -- Y. On Fri, Feb 3, 2012 at 10:27 PM, Chris Gerken chrisger...@mindspring.com wrote: On 4 February 2012 06:21, Yiming Sun yiming@gmail.com wrote: I cannot have one composite column name with 3 components while another with 4 components? Just put 4 components and left last empty (if it is same type)?! Another question I have is how flexible composite columns actually are. If my data model has a CF containing US zip codes with the following composite columns: {OH:Spring Field} : 45503 {OH:Columbus} : 43085 {FL:Spring Field} : 32401 {FL:Key West} : 33040 I know I can ask cassandra to give me the zip codes of all cities in OH. But can I ask it to give me the zip codes of all cities named Spring Field using this model? Thanks. No. You set first composite component at first. I'd use a dynamic CF: row key = state abbreviation column name = city name column value = zip code (or a complex object, one of whose properties is zip code) you can iterate over the columns in a single row to get a state's city names and their zip code and you can do a get_range_slices on all keys for the columns starting and ending on the city name to find out the zip codes for a cities with the given name. I think - Chris
Re: yet a couple more questions on composite columns
I also made something like this a while ago. I decided to go for the 2-rows-solution: by doing that you don't have the need for super columns. Cassandra is really good at reading, so this should not be an issue. Cheers! 2012/2/4 Yiming Sun yiming@gmail.com Thanks Andrey and Chris. It sounds like we don't necessarily have to use composite columns. From what I understand about dynamic CF, each row may have completely different data from other rows; but in our case, the data in each row is similar to other rows; my concern was more about the homogeneity of the data between columns. In our original supercolumn-based schema, one special supercolumn is called metadata which contains a number of subcolumns to hold metadata describing each collection (e.g. number of documents, etc.), then the rest of the supercolumns in the same row are all IDs of documents belong to the collection, and for each document supercolumn, the subcolumns contain the document content as well as metadata on individual document (e.g. checksum of each document). To move away from the supercolumn schema, I could either create two CFs, one to hold metadata, the other document content; or I could create just one CF mixing metadata and doc content in the same row, and using composite column names to identify if the particular column is metadata or a document. I am just wondering if you have any inputs on the pros and cons of each schema. -- Y. On Fri, Feb 3, 2012 at 10:27 PM, Chris Gerken chrisger...@mindspring.comwrote: On 4 February 2012 06:21, Yiming Sun yiming@gmail.com wrote: I cannot have one composite column name with 3 components while another with 4 components? Just put 4 components and left last empty (if it is same type)?! Another question I have is how flexible composite columns actually are. If my data model has a CF containing US zip codes with the following composite columns: {OH:Spring Field} : 45503 {OH:Columbus} : 43085 {FL:Spring Field} : 32401 {FL:Key West} : 33040 I know I can ask cassandra to give me the zip codes of all cities in OH. But can I ask it to give me the zip codes of all cities named Spring Field using this model? Thanks. No. You set first composite component at first. I'd use a dynamic CF: row key = state abbreviation column name = city name column value = zip code (or a complex object, one of whose properties is zip code) you can iterate over the columns in a single row to get a state's city names and their zip code and you can do a get_range_slices on all keys for the columns starting and ending on the city name to find out the zip codes for a cities with the given name. I think - Chris
Re: yet a couple more questions on composite columns
I just kept both row keys the same. This was very trivial for fetching them both. When you have A, you can fetch B, and vice versa. 2012/2/4 Yiming Sun yiming@gmail.com Interesting idea, R.V. But what did you do with the row keys? On Sat, Feb 4, 2012 at 2:29 PM, R. Verlangen ro...@us2.nl wrote: I also made something like this a while ago. I decided to go for the 2-rows-solution: by doing that you don't have the need for super columns. Cassandra is really good at reading, so this should not be an issue. Cheers! 2012/2/4 Yiming Sun yiming@gmail.com Thanks Andrey and Chris. It sounds like we don't necessarily have to use composite columns. From what I understand about dynamic CF, each row may have completely different data from other rows; but in our case, the data in each row is similar to other rows; my concern was more about the homogeneity of the data between columns. In our original supercolumn-based schema, one special supercolumn is called metadata which contains a number of subcolumns to hold metadata describing each collection (e.g. number of documents, etc.), then the rest of the supercolumns in the same row are all IDs of documents belong to the collection, and for each document supercolumn, the subcolumns contain the document content as well as metadata on individual document (e.g. checksum of each document). To move away from the supercolumn schema, I could either create two CFs, one to hold metadata, the other document content; or I could create just one CF mixing metadata and doc content in the same row, and using composite column names to identify if the particular column is metadata or a document. I am just wondering if you have any inputs on the pros and cons of each schema. -- Y. On Fri, Feb 3, 2012 at 10:27 PM, Chris Gerken chrisger...@mindspring.com wrote: On 4 February 2012 06:21, Yiming Sun yiming@gmail.com wrote: I cannot have one composite column name with 3 components while another with 4 components? Just put 4 components and left last empty (if it is same type)?! Another question I have is how flexible composite columns actually are. If my data model has a CF containing US zip codes with the following composite columns: {OH:Spring Field} : 45503 {OH:Columbus} : 43085 {FL:Spring Field} : 32401 {FL:Key West} : 33040 I know I can ask cassandra to give me the zip codes of all cities in OH. But can I ask it to give me the zip codes of all cities named Spring Field using this model? Thanks. No. You set first composite component at first. I'd use a dynamic CF: row key = state abbreviation column name = city name column value = zip code (or a complex object, one of whose properties is zip code) you can iterate over the columns in a single row to get a state's city names and their zip code and you can do a get_range_slices on all keys for the columns starting and ending on the city name to find out the zip codes for a cities with the given name. I think - Chris
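In code the shared-key pattern is about as simple as it gets; insert here is a hypothetical stand-in for your client's mutator, and the CF names are the ones from the FilesMeta / FilesData example above:

public class FileStore {

    // Hypothetical stand-in: (columnFamily, rowKey, columnName, value).
    static void insert(String cf, String rowKey, String col, byte[] value) {
        // delegate to your client here
    }

    static void storeFile(String fileId, byte[] metadata, byte[] content) {
        insert("FilesMeta", fileId, "meta", metadata);   // same row key in both CFs,
        insert("FilesData", fileId, "content", content); // so either lookup leads to the other
    }
}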
Re: Restart cassandra every X days?
Yes, I already did a repair and cleanup. Currently my ring looks like this: Address DC RackStatus State LoadOwns Token ***.89datacenter1 rack1 Up Normal 2.44 GB 50.00% 0 ***.135datacenter1 rack1 Up Normal 6.99 GB 50.00% 85070591730234615865843651857942052864 It's not really a problem, but I'm still wondering why this happens. 2012/2/1 aaron morton aa...@thelastpickle.com Do you mean the load in nodetool ring is not even, despite the tokens been evenly distributed ? I would assume this is not the case given the difference, but it may be hints given you have just done an upgrade. Check the system using nodetool cfstats to see. They will eventually be delivered and deleted. More likely you will want to: 1) nodetool repair to make sure all data is distributed then 2) nodetool cleanup if you have changed the tokens at any point finally Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 31/01/2012, at 11:56 PM, R. Verlangen wrote: After running 3 days on Cassandra 1.0.7 it seems the problem has been solved. One weird thing remains, on our 2 nodes (both 50% of the ring), the first's usage is just over 25% of the second. Anyone got an explanation for that? 2012/1/29 aaron morton aa...@thelastpickle.com Yes but… For every upgrade read the NEWS.TXT it will go through the upgrade procedure in detail. If you want to feel extra smart scan through the CHANGES.txt to get an idea of whats going on. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/01/2012, at 4:14 AM, Maxim Potekhin wrote: Sorry if this has been covered, I was concentrating solely on 0.8x -- can I just d/l 1.0.x and continue using same data on same cluster? Maxim On 1/28/2012 7:53 AM, R. Verlangen wrote: Ok, seems that it's clear what I should do next ;-) 2012/1/28 aaron morton aa...@thelastpickle.com There are no blockers to upgrading to 1.0.X. A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/01/2012, at 7:48 AM, R. Verlangen wrote: Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x stable enough to upgrade for, or should we wait for a couple of weeks? 2012/1/27 Edward Capriolo edlinuxg...@gmail.com I would not say that issuing restart after x days is a good idea. You are mostly developing a superstition. You should find the source of the problem. It could be jmx or thrift clients not closing connections. We don't restart nodes on a regiment they work fine. On Thursday, January 26, 2012, Mike Panchenko m...@mihasya.com wrote: There are two relevant bugs (that I know of), both resolved in somewhat recent versions, which make somewhat regular restarts beneficial https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak in GCInspector, fixed in 0.7.9/0.8.5) https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap fragmentation due to the way memtables used to be allocated, refactored in 1.0.0) Restarting daily is probably too frequent for either one of those problems. We usually notice degraded performance in our ancient cluster after ~2 weeks w/o a restart. As Aaron mentioned, if you have plenty of disk space, there's no reason to worry about cruft sstables. The size of your active set is what matters, and you can determine if that's getting too big by watching for iowait (due to reads from the data partition) and/or paging activity of the java process. When you hit that problem, the solution is to 1. try to tune your caches and 2. add more nodes to spread the load. 
I'll reiterate - looking at raw disk space usage should not be your guide for that. Forcing a gc generally works, but should not be relied upon (note suggest in http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()). It's great news that 1.0 uses a better mechanism for releasing unused sstables. nodetool compact triggers a major compaction and is no longer a recommended by datastax (details here http://www.datastax.com/docs/1.0/operations/tuning#tuning-compactionbottom of the page). Hope this helps. Mike. On Wed, Jan 25, 2012 at 5:14 PM, aaron morton aa...@thelastpickle.com wrote: That disk usage pattern is to be expected in pre 1.0 versions. Disk usage is far less interesting than disk free space, if it's using 60 GB and there is 200GB thats ok. If it's using 60Gb and there is 6MB free thats a problem. In pre 1.0 the compacted files are deleted on disk by waiting for the JVM do decide to GC all remaining references. If there is not enough space (to store the total size of the files it is about to write or compact) on disk GC is forced and the files are deleted. Otherwise they will get deleted at some point in the future. In 1.0 files are reference counted
Re: Restart cassandra every X days?
Well, it seems it's balancing itself, 24 hours later the ring looks like this: ***.89datacenter1 rack1 Up Normal 7.36 GB 50.00% 0 ***.135datacenter1 rack1 Up Normal 8.84 GB 50.00% 85070591730234615865843651857942052864 Looks pretty normal, right? 2012/2/2 aaron morton aa...@thelastpickle.com Speaking technically, that ain't right. I would: * Check if node .135 is holding a lot of hints. * Take a look on disk and see what is there. * Go through a repair and compact on each node. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/02/2012, at 9:55 PM, R. Verlangen wrote: Yes, I already did a repair and cleanup. Currently my ring looks like this: Address DC RackStatus State Load OwnsToken ***.89datacenter1 rack1 Up Normal 2.44 GB 50.00% 0 ***.135datacenter1 rack1 Up Normal 6.99 GB 50.00% 85070591730234615865843651857942052864 It's not really a problem, but I'm still wondering why this happens. 2012/2/1 aaron morton aa...@thelastpickle.com Do you mean the load in nodetool ring is not even, despite the tokens been evenly distributed ? I would assume this is not the case given the difference, but it may be hints given you have just done an upgrade. Check the system using nodetool cfstats to see. They will eventually be delivered and deleted. More likely you will want to: 1) nodetool repair to make sure all data is distributed then 2) nodetool cleanup if you have changed the tokens at any point finally Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 31/01/2012, at 11:56 PM, R. Verlangen wrote: After running 3 days on Cassandra 1.0.7 it seems the problem has been solved. One weird thing remains, on our 2 nodes (both 50% of the ring), the first's usage is just over 25% of the second. Anyone got an explanation for that? 2012/1/29 aaron morton aa...@thelastpickle.com Yes but… For every upgrade read the NEWS.TXT it will go through the upgrade procedure in detail. If you want to feel extra smart scan through the CHANGES.txt to get an idea of whats going on. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/01/2012, at 4:14 AM, Maxim Potekhin wrote: Sorry if this has been covered, I was concentrating solely on 0.8x -- can I just d/l 1.0.x and continue using same data on same cluster? Maxim On 1/28/2012 7:53 AM, R. Verlangen wrote: Ok, seems that it's clear what I should do next ;-) 2012/1/28 aaron morton aa...@thelastpickle.com There are no blockers to upgrading to 1.0.X. A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/01/2012, at 7:48 AM, R. Verlangen wrote: Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x stable enough to upgrade for, or should we wait for a couple of weeks? 2012/1/27 Edward Capriolo edlinuxg...@gmail.com I would not say that issuing restart after x days is a good idea. You are mostly developing a superstition. You should find the source of the problem. It could be jmx or thrift clients not closing connections. We don't restart nodes on a regiment they work fine. 
On Thursday, January 26, 2012, Mike Panchenko m...@mihasya.com wrote: There are two relevant bugs (that I know of), both resolved in somewhat recent versions, which make somewhat regular restarts beneficial https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak in GCInspector, fixed in 0.7.9/0.8.5) https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap fragmentation due to the way memtables used to be allocated, refactored in 1.0.0) Restarting daily is probably too frequent for either one of those problems. We usually notice degraded performance in our ancient cluster after ~2 weeks w/o a restart. As Aaron mentioned, if you have plenty of disk space, there's no reason to worry about cruft sstables. The size of your active set is what matters, and you can determine if that's getting too big by watching for iowait (due to reads from the data partition) and/or paging activity of the java process. When you hit that problem, the solution is to 1. try to tune your caches and 2. add more nodes to spread the load. I'll reiterate - looking at raw disk space usage should not be your guide for that. Forcing a gc generally works, but should not be relied upon (note suggest in http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()). It's great news that 1.0 uses a better mechanism for releasing unused sstables. nodetool compact triggers a major compaction and is no longer a recommended by datastax (details here http://www.datastax.com/docs/1.0/operations/tuning#tuning-compactionbottom of the page). Hope
Re: Can you query Cassandra while it's doing major compaction
It will have a performance penalty, so it would be better to spread the compactions over a period of time. But Cassandra will still serve any reads/writes during a major compaction (within the configured timeout). 2012/2/3 myreasoner myreaso...@gmail.com If every node in the cluster is running major compaction, would it be able to answer any read request? And is it wise to write anything to a cluster while it's doing major compaction? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-you-query-Cassandra-while-it-s-doing-major-compaction-tp7249985p7249985.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Restart cassandra every X days?
After running 3 days on Cassandra 1.0.7 it seems the problem has been solved. One weird thing remains, on our 2 nodes (both 50% of the ring), the first's usage is just over 25% of the second. Anyone got an explanation for that? 2012/1/29 aaron morton aa...@thelastpickle.com Yes but… For every upgrade read the NEWS.TXT it will go through the upgrade procedure in detail. If you want to feel extra smart scan through the CHANGES.txt to get an idea of whats going on. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 29/01/2012, at 4:14 AM, Maxim Potekhin wrote: Sorry if this has been covered, I was concentrating solely on 0.8x -- can I just d/l 1.0.x and continue using same data on same cluster? Maxim On 1/28/2012 7:53 AM, R. Verlangen wrote: Ok, seems that it's clear what I should do next ;-) 2012/1/28 aaron morton aa...@thelastpickle.com There are no blockers to upgrading to 1.0.X. A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/01/2012, at 7:48 AM, R. Verlangen wrote: Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x stable enough to upgrade for, or should we wait for a couple of weeks? 2012/1/27 Edward Capriolo edlinuxg...@gmail.com I would not say that issuing restart after x days is a good idea. You are mostly developing a superstition. You should find the source of the problem. It could be jmx or thrift clients not closing connections. We don't restart nodes on a regiment they work fine. On Thursday, January 26, 2012, Mike Panchenko m...@mihasya.com wrote: There are two relevant bugs (that I know of), both resolved in somewhat recent versions, which make somewhat regular restarts beneficial https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak in GCInspector, fixed in 0.7.9/0.8.5) https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap fragmentation due to the way memtables used to be allocated, refactored in 1.0.0) Restarting daily is probably too frequent for either one of those problems. We usually notice degraded performance in our ancient cluster after ~2 weeks w/o a restart. As Aaron mentioned, if you have plenty of disk space, there's no reason to worry about cruft sstables. The size of your active set is what matters, and you can determine if that's getting too big by watching for iowait (due to reads from the data partition) and/or paging activity of the java process. When you hit that problem, the solution is to 1. try to tune your caches and 2. add more nodes to spread the load. I'll reiterate - looking at raw disk space usage should not be your guide for that. Forcing a gc generally works, but should not be relied upon (note suggest in http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()). It's great news that 1.0 uses a better mechanism for releasing unused sstables. nodetool compact triggers a major compaction and is no longer a recommended by datastax (details here http://www.datastax.com/docs/1.0/operations/tuning#tuning-compactionbottom of the page). Hope this helps. Mike. On Wed, Jan 25, 2012 at 5:14 PM, aaron morton aa...@thelastpickle.com wrote: That disk usage pattern is to be expected in pre 1.0 versions. Disk usage is far less interesting than disk free space, if it's using 60 GB and there is 200GB thats ok. If it's using 60Gb and there is 6MB free thats a problem. In pre 1.0 the compacted files are deleted on disk by waiting for the JVM do decide to GC all remaining references. 
If there is not enough space (to store the total size of the files it is about to write or compact) on disk GC is forced and the files are deleted. Otherwise they will get deleted at some point in the future. In 1.0 files are reference counted and space is freed much sooner. With regard to regular maintenance, node tool cleanup remvos data from a node that it is no longer a replica for. This is only of use when you have done a token move. I would not recommend a daily restart of the cassandra process. You will lose all the run time optimizations the JVM has made (i think the mapped files pages will stay resident). As well as adding additional entropy to the system which must be repaired via HH, RR or nodetool repair. If you want to see compacted files purged faster the best approach would be to upgrade to 1.0. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 26/01/2012, at 9:51 AM, R. Verlangen wrote: In his message he explains that it's for Forcing a GC . GC stands for garbage collection. For some more background see: http://en.wikipedia.org/wiki/Garbage_collection_(computer_science) Cheers! 2012/1/25 mike...@thomsonreuters.com Karl, Can you give a little more details on these 2 lines, what do they do? java -jar cmdline
Re: Any tools like phpMyAdmin to see data stored in Cassandra ?
You might run it from a VM? 2012/1/30 Ertio Lew ertio...@gmail.com On Mon, Jan 30, 2012 at 7:16 AM, Frisch, Michael michael.fri...@nuance.com wrote: OpsCenter? http://www.datastax.com/products/opscenter - Mike I have tried Sebastien's phpMyAdmin For Cassandra ( https://github.com/sebgiroux/Cassandra-Cluster-Admin ) to see the data stored in Cassandra in the same manner as phpMyAdmin allows. But since it makes assumptions about the datatypes of the column name/column value and doesn't allow configuring the datatype data should be read as on a per-CF basis, I couldn't make the best use of it. Are there any similar other tools out there that can do the job better? Thanks, that's a great product but unfortunately doesn't work with Windows. Any tools for Windows?
Re: Restart cassandra every X days?
Ok, seems that it's clear what I should do next ;-) 2012/1/28 aaron morton aa...@thelastpickle.com There are no blockers to upgrading to 1.0.X. A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 28/01/2012, at 7:48 AM, R. Verlangen wrote: Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x stable enough to upgrade for, or should we wait for a couple of weeks? 2012/1/27 Edward Capriolo edlinuxg...@gmail.com I would not say that issuing restart after x days is a good idea. You are mostly developing a superstition. You should find the source of the problem. It could be jmx or thrift clients not closing connections. We don't restart nodes on a regiment they work fine. On Thursday, January 26, 2012, Mike Panchenko m...@mihasya.com wrote: There are two relevant bugs (that I know of), both resolved in somewhat recent versions, which make somewhat regular restarts beneficial https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak in GCInspector, fixed in 0.7.9/0.8.5) https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap fragmentation due to the way memtables used to be allocated, refactored in 1.0.0) Restarting daily is probably too frequent for either one of those problems. We usually notice degraded performance in our ancient cluster after ~2 weeks w/o a restart. As Aaron mentioned, if you have plenty of disk space, there's no reason to worry about cruft sstables. The size of your active set is what matters, and you can determine if that's getting too big by watching for iowait (due to reads from the data partition) and/or paging activity of the java process. When you hit that problem, the solution is to 1. try to tune your caches and 2. add more nodes to spread the load. I'll reiterate - looking at raw disk space usage should not be your guide for that. Forcing a gc generally works, but should not be relied upon (note suggest in http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()). It's great news that 1.0 uses a better mechanism for releasing unused sstables. nodetool compact triggers a major compaction and is no longer a recommended by datastax (details here http://www.datastax.com/docs/1.0/operations/tuning#tuning-compactionbottom of the page). Hope this helps. Mike. On Wed, Jan 25, 2012 at 5:14 PM, aaron morton aa...@thelastpickle.com wrote: That disk usage pattern is to be expected in pre 1.0 versions. Disk usage is far less interesting than disk free space, if it's using 60 GB and there is 200GB thats ok. If it's using 60Gb and there is 6MB free thats a problem. In pre 1.0 the compacted files are deleted on disk by waiting for the JVM do decide to GC all remaining references. If there is not enough space (to store the total size of the files it is about to write or compact) on disk GC is forced and the files are deleted. Otherwise they will get deleted at some point in the future. In 1.0 files are reference counted and space is freed much sooner. With regard to regular maintenance, node tool cleanup remvos data from a node that it is no longer a replica for. This is only of use when you have done a token move. I would not recommend a daily restart of the cassandra process. You will lose all the run time optimizations the JVM has made (i think the mapped files pages will stay resident). As well as adding additional entropy to the system which must be repaired via HH, RR or nodetool repair. If you want to see compacted files purged faster the best approach would be to upgrade to 1.0. Hope that helps. 
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 26/01/2012, at 9:51 AM, R. Verlangen wrote: In his message he explains that it's for Forcing a GC . GC stands for garbage collection. For some more background see: http://en.wikipedia.org/wiki/Garbage_collection_(computer_science) Cheers! 2012/1/25 mike...@thomsonreuters.com Karl, Can you give a little more details on these 2 lines, what do they do? java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory gc Thank you, Mike -Original Message- From: Karl Hiramoto [mailto:k...@hiramoto.org] Sent: Wednesday, January 25, 2012 12:26 PM To: user@cassandra.apache.org Subject: Re: Restart cassandra every X days? On 01/25/12 19:18, R. Verlangen wrote: Ok thank you for your feedback. I'll add these tasks to our daily cassandra maintenance cronjob. Hopefully this will keep things under controll. I forgot to mention that we found that Forcing a GC also cleans up some space. in a cronjob you can do this with http://crawler.archive.org/cmdline-jmxclient/ my cron
Re: Restart cassandra every X days?
Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x stable enough to upgrade for, or should we wait for a couple of weeks? 2012/1/27 Edward Capriolo edlinuxg...@gmail.com I would not say that issuing restart after x days is a good idea. You are mostly developing a superstition. You should find the source of the problem. It could be jmx or thrift clients not closing connections. We don't restart nodes on a regiment they work fine. On Thursday, January 26, 2012, Mike Panchenko m...@mihasya.com wrote: There are two relevant bugs (that I know of), both resolved in somewhat recent versions, which make somewhat regular restarts beneficial https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak in GCInspector, fixed in 0.7.9/0.8.5) https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap fragmentation due to the way memtables used to be allocated, refactored in 1.0.0) Restarting daily is probably too frequent for either one of those problems. We usually notice degraded performance in our ancient cluster after ~2 weeks w/o a restart. As Aaron mentioned, if you have plenty of disk space, there's no reason to worry about cruft sstables. The size of your active set is what matters, and you can determine if that's getting too big by watching for iowait (due to reads from the data partition) and/or paging activity of the java process. When you hit that problem, the solution is to 1. try to tune your caches and 2. add more nodes to spread the load. I'll reiterate - looking at raw disk space usage should not be your guide for that. Forcing a gc generally works, but should not be relied upon (note suggest in http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()). It's great news that 1.0 uses a better mechanism for releasing unused sstables. nodetool compact triggers a major compaction and is no longer a recommended by datastax (details here http://www.datastax.com/docs/1.0/operations/tuning#tuning-compactionbottom of the page). Hope this helps. Mike. On Wed, Jan 25, 2012 at 5:14 PM, aaron morton aa...@thelastpickle.com wrote: That disk usage pattern is to be expected in pre 1.0 versions. Disk usage is far less interesting than disk free space, if it's using 60 GB and there is 200GB thats ok. If it's using 60Gb and there is 6MB free thats a problem. In pre 1.0 the compacted files are deleted on disk by waiting for the JVM do decide to GC all remaining references. If there is not enough space (to store the total size of the files it is about to write or compact) on disk GC is forced and the files are deleted. Otherwise they will get deleted at some point in the future. In 1.0 files are reference counted and space is freed much sooner. With regard to regular maintenance, node tool cleanup remvos data from a node that it is no longer a replica for. This is only of use when you have done a token move. I would not recommend a daily restart of the cassandra process. You will lose all the run time optimizations the JVM has made (i think the mapped files pages will stay resident). As well as adding additional entropy to the system which must be repaired via HH, RR or nodetool repair. If you want to see compacted files purged faster the best approach would be to upgrade to 1.0. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 26/01/2012, at 9:51 AM, R. Verlangen wrote: In his message he explains that it's for Forcing a GC . GC stands for garbage collection. 
For some more background see: http://en.wikipedia.org/wiki/Garbage_collection_(computer_science) Cheers! 2012/1/25 mike...@thomsonreuters.com Karl, Can you give a little more details on these 2 lines, what do they do? java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory gc Thank you, Mike -Original Message- From: Karl Hiramoto [mailto:k...@hiramoto.org] Sent: Wednesday, January 25, 2012 12:26 PM To: user@cassandra.apache.org Subject: Re: Restart cassandra every X days? On 01/25/12 19:18, R. Verlangen wrote: Ok thank you for your feedback. I'll add these tasks to our daily cassandra maintenance cronjob. Hopefully this will keep things under controll. I forgot to mention that we found that Forcing a GC also cleans up some space. in a cronjob you can do this with http://crawler.archive.org/cmdline-jmxclient/ my cron
Re: How to create a table in Cassandra
A table is called a column family in Cassandra. From the CLI you can just create one by typing: create column family MyApplication; -- Forwarded message -- From: anandbab...@polarisft.com Date: Fri, Jan 27, 2012 at 2:36 PM Subject: How to create a table in Cassandra To: d...@cassandra.apache.org Can anyone tell me how to create a table in the Cassandra. I have installed it... and I am new to this... Thanks, Barnabas This e-Mail may contain proprietary and confidential information and is sent for the intended recipient(s) only. If by an addressing or transmission error this mail has been misdirected to you, you are requested to delete this mail immediately. You are also hereby notified that any use, any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication of this e-mail message, contents or its attachment other than by its intended recipient/s is strictly prohibited. Visit us at http://www.polarisFT.com
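One detail for a completely fresh install: a column family lives inside a keyspace, so you create and select one first. Roughly (CLI syntax circa 0.8/1.0; defaults vary a bit per version):

create keyspace MyKeyspace;
use MyKeyspace;
create column family MyApplication with comparator = UTF8Type;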
Restart cassandra every X days?
Hi there, I'm currently running a 2-node cluster for some small projects that might need to scale up in the future: that's why we chose Cassandra. The actual problem is that one of the nodes' hard drive usage keeps growing. For example:
- after a fresh restart ~ 10GB
- after a couple of days running ~ 60GB
I know that Cassandra uses lots of disk space, but is this still normal? I'm running Cassandra 0.8.7 Gr. Robin
Re: Restart cassandra every X days?
Ok, thank you for your feedback. I'll add these tasks to our daily Cassandra maintenance cronjob. Hopefully this will keep things under control. 2012/1/25 Karl Hiramoto k...@hiramoto.org On 01/25/12 16:09, R. Verlangen wrote: Hi there, I'm currently running a 2-node cluster for some small projects that might need to scale up in the future: that's why we chose Cassandra. The actual problem is that one of the nodes' hard drive usage keeps growing. For example: - after a fresh restart ~ 10GB - after a couple of days running ~ 60GB I know that Cassandra uses lots of disk space, but is this still normal? I'm running Cassandra 0.8.7 I run 9 nodes with Cassandra 0.7.8 and we see this same behaviour, but we keep it under control by doing the sequence: nodetool repair, nodetool compact, nodetool cleanup. According to the 1.0.x changelog, IIRC, this disk usage is supposed to be improved. -- Karl
Re: Restart cassandra every X days?
Thanks for the reminder. I'm going to start by adding the cleanup / compact steps to the chain of maintenance tasks. In my opinion Java should determine itself when to start a GC: it doesn't feel natural to do this manually. 2012/1/25 Karl Hiramoto k...@hiramoto.org On 01/25/12 19:18, R. Verlangen wrote: Ok, thank you for your feedback. I'll add these tasks to our daily Cassandra maintenance cronjob. Hopefully this will keep things under control. I forgot to mention that we found that forcing a GC also cleans up some space. In a cronjob you can do this with http://crawler.archive.org/cmdline-jmxclient/ my cronjob looks more like: nodetool repair, nodetool cleanup, nodetool compact, java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory gc -- Karl
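For completeness, Karl's sequence as a crontab sketch. The times and weekly stagger are assumptions; note that later messages in this thread advise against scheduling routine major compactions, so you may want to drop the compact step:

# staggered weekly maintenance on one node (shift the hours on the other nodes)
0 3 * * 0 nodetool -h localhost repair
0 6 * * 0 nodetool -h localhost cleanup
0 9 * * 0 nodetool -h localhost compact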
Re: Restart cassandra every X days?
In his message he explains that it's for "forcing a GC". GC stands for garbage collection. For some more background see: http://en.wikipedia.org/wiki/Garbage_collection_(computer_science) Cheers! 2012/1/25 mike...@thomsonreuters.com Karl, Can you give a little more detail on these 2 lines, what do they do? java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory gc Thank you, Mike -Original Message- From: Karl Hiramoto [mailto:k...@hiramoto.org] Sent: Wednesday, January 25, 2012 12:26 PM To: user@cassandra.apache.org Subject: Re: Restart cassandra every X days? On 01/25/12 19:18, R. Verlangen wrote: Ok, thank you for your feedback. I'll add these tasks to our daily Cassandra maintenance cronjob. Hopefully this will keep things under control. I forgot to mention that we found that forcing a GC also cleans up some space. In a cronjob you can do this with http://crawler.archive.org/cmdline-jmxclient/ my cronjob looks more like: nodetool repair, nodetool cleanup, nodetool compact, java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory gc -- Karl This email was sent to you by Thomson Reuters, the global news and information company. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Thomson Reuters.
Re: Tips for using OrderedPartitioner
If you would like to index your rows in an index-row, you could also index the index-rows themselves. This will scale up for any needs and creates a tree structure. 2012/1/24 aaron morton aa...@thelastpickle.com Nothing I can think of other than making the keys uniform. Having a single index row with the RP can be a pain. Is there a way to partition it? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 23/01/2012, at 11:42 PM, Tharindu Mathew wrote: Hi, We use Cassandra in a way where we always want range slice queries. Because of the tendency to create hotspots with OrderedPartitioner we decided to use RandomPartitioner. Then we would use a row as an index row, holding values of the other row keys of the CF. I feel this has become a burden and would like to move to an OrderedPartitioner to avoid this workaround. The index row workaround has become cumbersome when we query the data store. Are there any tips we can follow to allow for a lesser amount of hotspots? -- Regards, Tharindu blog: http://mackiemathew.com/
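A minimal sketch of the tree idea: split the single index row into hash buckets so no one row takes all the index traffic. All names here are made up for illustration:

public class IndexTree {

    static final int BUCKETS = 64; // fan-out; raise it (or add a level) as the index grows

    // Data row keys are indexed under "idx:<bucket>" instead of one huge "idx" row.
    static String bucketRowFor(String dataRowKey) {
        return "idx:" + ((dataRowKey.hashCode() & 0x7fffffff) % BUCKETS);
    }

    // Write path: add dataRowKey as a column in bucketRowFor(dataRowKey).
    // Read path: slice each of the BUCKETS rows, or recurse another level down.
}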
Re: Enable thrift logging
Pick a custom log level and redirect those messages with /etc/syslog.conf? 2012/1/24 ruslan usifov ruslan.usi...@gmail.com Hello, I am trying to log the Thrift log messages (we need this to solve a communication problem between the Cassandra daemon and a PHP client), so in log4j-server.properties I wrote the following lines: log4j.logger.org.apache.thrift.transport=DEBUG,THRIFT log4j.appender.THRIFT=org.apache.log4j.RollingFileAppender log4j.appender.THRIFT.maxFileSize=20MB log4j.appender.THRIFT.maxBackupIndex=50 log4j.appender.THRIFT.layout=org.apache.log4j.PatternLayout log4j.appender.THRIFT.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n log4j.appender.THRIFT.File=/var/log/cassandra/8.0/thrift.log But no messages appear in that log (and they should, i.e. exception traces). If we enable DEBUG on the rootLogger, i.e.: log4j.rootLogger=DEBUG,stdout,R the Thrift log messages appear in system.log as expected. How can we separate them into their own log? PS: cassandra 0.8.9
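Two guesses worth trying here, neither verified against 0.8.9: the messages may come from Cassandra's own wrapper classes in org.apache.cassandra.thrift rather than org.apache.thrift, and standard log4j additivity should be switched off so the messages go only to the THRIFT appender:

log4j.logger.org.apache.thrift=DEBUG,THRIFT
log4j.additivity.org.apache.thrift=false
log4j.logger.org.apache.cassandra.thrift=DEBUG,THRIFT
log4j.additivity.org.apache.cassandra.thrift=false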
Re: Data Model Question
A couple of days ago I came across Countandra ( http://countandra.org/ ). It seems that it might be a solution for you. Gr. Robin 2012/1/20 Tamar Fraenkel ta...@tok-media.com ** Hi! I am a newbie to Cassandra and seeking some advice regarding the data model I should use to best address my needs. For simplicity, what I want to accomplish is: I have a system that has users (potentially ~10,000 per day) and they perform actions in the system (total of ~50,000 a day). Each user's action takes place at a certain point in time, and is also classified into categories (1 to 5) and tagged by 1-30 tags. Each action's categories and tags have a score associated with them; the score is between 0 and 1 (let's assume precision of 0.0001). I want to be able to identify similar actions in the system (performed usually by more than one user). Similarity of actions is calculated based on their common categories and tags, taking scores into account. I need the system to store: - The list of my users with attributes like name, age etc - For each action: the categories and tags associated with it and their score, the time of the action, and the user who performed it. - Groups of similar actions (ActionGroups): the ids of actions in the group, and the categories and tags describing the group, with their scores. Those are calculated using an algorithm that takes into account the categories and tags of the actions in the group. When a user performs a new action in the system, I want to add it to fitting ActionGroups (with similar categories and tags). For this I need to be able to perform the following: find all the recent ActionGroups (those which were updated with actions performed during the last T minutes), which have at least one of the new action's categories AND at least one of the new action's tags. I thought of two ways to address the issue and I would appreciate your insights. First one, using secondary indexes: Column Family: *Users* Key: userId Compare with Bytes Type Columns: name: , age: etc… Column Family: *Actions* Key: actionId Compare with Bytes Type Columns: Category1 : Score …. CategoryN: Score, Tag1 : Score, …. TagK: Score, Time: timestamp, user: userId Column Family: *ActionGroups* Key: actionGroupId Compare with Bytes Type Columns: Category1 : Score …. CategoryN: Score, Tag1 : Score …. TagK: Score, lastUpdateTime: timestamp, actionId1: null, … , actionIdM: null I will then define a secondary index on each tag column, category column, and the update time column. Let's assume the new action I want to add to an ActionGroup has NewActionCategory1 - NewActionCategoryK, and NewActionTag1 - NewActionTagN. I will perform the following query: Select * From ActionGroups where (NewActionCategory1 > 0 … or NewActionCategoryK > 0) and (NewActionTag1 > 0 … or NewActionTagN > 0) and lastUpdateTime > T; Second solution: have the same CFs as in the first solution *without the secondary* *index*, and have two additional CFs: Column Family: *CategoriesToActionGroupId* Key: categoryId Compare with ByteType Columns: {Timestamp, ActionGroupsId1} : null {Timestamp, ActionGroupsId2} : null ... *timestamp is the update time for the ActionGroup A similar CF will be defined for tags. I will then be able to run several queries on CategoriesToActionGroupId (one for each of the new action's categories), with a column slice for the right update time of the ActionGroup. I will do the same for TagsToActionGroupId. I will then use my client code to remove duplicates (ActionGroups which are associated with more than one tag or category).
My questions are: 1. Are the two solutions viable? If yes, which is better? 2. Is there any better way of doing this? 3. Can I use JDBC and CQL with both methods, or do I have to use Hector (I am using Java)? Thanks, Tamar
Re: nodetool ring question
I will have a look very soon and if I find something I'll let you know. Thank you in advance! 2012/1/19 aaron morton aa...@thelastpickle.com Michael, Robin Let us know if the reported live load is increasing and diverging from the on disk size. If it is can you check nodetool cfstats and find an example of a particular CF where Space Used Live has diverged from the on disk size. The provide the schema for the CF and any other info that may be handy. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/01/2012, at 10:58 PM, Michael Vaknine wrote: I did restart the cluster and now it is normal 5GB. ** ** *From:* R. Verlangen [mailto:ro...@us2.nl] *Sent:* Wednesday, January 18, 2012 11:32 AM *To:* user@cassandra.apache.org *Subject:* Re: nodetool ring question ** ** I also have this problem. My data on nodes grows to roughly 30GB. After a restart only 5GB remains. Is a factor 6 common for Cassandra? 2012/1/18 aaron morton aa...@thelastpickle.com Good idea Jeremiah, are you using compression Michael ? ** ** Scanning through the CF stats this jumps out… ** ** Column Family: Attractions SSTable count: 3 Space used (live): 27542876685 Space used (total): 1213220387 Thats 25Gb of live data but only 1.3GB total. ** ** Otherwise want to see if a restart fixes it :) Would be interesting to know if it's wrong from the start or drifts during streaming or compaction. ** ** Cheers ** ** - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com ** ** On 18/01/2012, at 12:04 PM, Jeremiah Jordan wrote: There were some nodetool ring load reporting issues with early version of 1.0.X don't remember when they were fixed, but that could be your issue. Are you using compressed column families, a lot of the issues were with those. Might update to 1.0.7. -Jeremiah On 01/16/2012 04:04 AM, Michael Vaknine wrote: Hi, I have a 4 nodes cluster 1.0.3 version This is what I get when I run nodetool ring Address DC RackStatus State Load OwnsToken 127605887595351923798765477786913079296 10.8.193.87 datacenter1 rack1 Up Normal 46.47 GB 25.00% 0 10.5.7.76 datacenter1 rack1 Up Normal 48.01 GB 25.00% 42535295865117307932921825928971026432 10.8.189.197datacenter1 rack1 Up Normal 53.7 GB 25.00% 85070591730234615865843651857942052864 10.5.3.17 datacenter1 rack1 Up Normal 43.49 GB 25.00% 127605887595351923798765477786913079296 I have finished running repair on all 4 nodes. I have less then 10 GB on the /var/lib/cassandra/data/ folders My question is Why nodetool reports almost 50 GB on each node? Thanks Michael ** **
Re: Schema clone ...
A null response is usually caused by an exception; try taking a look at the Cassandra logs to find out what causes the problem. 2012/1/9 cbert...@libero.it cbert...@libero.it I was just trying it, but ... in the 0.7 CLI there is no show schema command. And when I connect with the 1.0 CLI to my 0.7 cluster ... [default@social] show schema; null ... I always get null as the answer! :-| Any tip for this? ty, Cheers Carlo Original message From: aa...@thelastpickle.com Date: 09/01/2012 11.33 To: user@cassandra.apache.org, cbert...@libero.it Subject: Re: Schema clone ... Try show schema in the CLI. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 9/01/2012, at 11:12 PM, cbert...@libero.it wrote: Hi, I have created a new dev-cluster with Cassandra 1.0 -- I would like to have the same CFs that I have in the 0.7 one, but I don't need the data to be there, just the schema. Which is the fastest way to do it without running 30 create column family commands ... Best regards, Carlo
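If the CLI route keeps returning null, the clone can also be attempted programmatically. A minimal Hector sketch (untested; hosts and cluster names are placeholders, and a 1.0-era client may hit the same Thrift mismatch against a 0.7 node that the CLI apparently does, so check the server logs either way):

    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
    import me.prettyprint.hector.api.factory.HFactory;

    public class SchemaClone {
        public static void main(String[] args) {
            // Hosts and cluster names are placeholders
            Cluster oldCluster = HFactory.getOrCreateCluster("old07", "old-host:9160");
            Cluster newCluster = HFactory.getOrCreateCluster("new10", "new-host:9160");

            // Read the schema (not the data) from the 0.7 cluster...
            KeyspaceDefinition ksDef = oldCluster.describeKeyspace("social");

            // ...and create the same keyspace plus its column families on the 1.0 cluster
            newCluster.addKeyspace(ksDef);
        }
    }

Whether the 1.0-side definition accepts every 0.7-era CF attribute unchanged is something to verify on a test keyspace first.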
Re: What is the future of supercolumns?
My suggestion is simple: don't use any deprecated stuff out there. In practically every case there is a good reason why it's deprecated. I've seen a couple of composite-column vs. supercolumn discussions here in the past weeks: I think a little bit of searching will get you around. Cheers 2012/1/7 Aklin_81 asdk...@gmail.com I read the entire columns inside the supercolumns at any time, but as for writing them, I write the columns at different times. I don't have the need to update them, except that they die after their TTL period of 60 days. But since they are going to be deprecated, I don't know if it would be really advisable to use them right now. I believe that if it were possible to do wildcard querying for a list of column names, then the supercolumn use cases could easily be replaced by normal columns. Could that be made practical in the future? On Sat, Jan 7, 2012 at 8:05 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Please realize that I do not make any decisions here and I am not part of the core Cassandra developer team. What has been said before is that they will most likely go away and, at least under the hood, be replaced by composite columns. Jonathan has however stated that he would like the supercolumn API/abstraction to remain, at least for backwards compatibility. Please understand that under the hood, supercolumns are merely groups of columns serialized as a single block of data. The fact that there is a specialized and hardcoded way to serialize these column groups into supercolumns is a problem, however, and they should probably go away to make room for a more generic implementation allowing more flexible data structures and less code specific to one special data structure. Today there is a ton of extra code to deal with the slight differences in serialization and features of supercolumns vs. columns, and hopefully most of that could go away if things were structured a bit differently. I also hope that we keep APIs that allow simple access to groups of key/value pairs, to simplify application logic, as working with just columns can add a lot of application code which should not be needed. If you almost always need all or nearly all of the columns in a supercolumn, and you normally update all of them at the same time, they will most likely be faster than normal columns. Processing-wise you will actually do a bit more work on serialization/deserialization of SCs, but the I/O part will usually be better grouped and require fewer operations. I think we did some benchmarks on some heavy use cases with ~30 small columns per SC some time back, and I think we ended up with SCs being 10-20% faster. Terje On Jan 5, 2012, at 2:37 PM, Aklin_81 wrote: I have seen supercolumn usage being discouraged most of the time. However, sometimes supercolumns seem to fit the scenario most appropriately, not only in terms of how the data is stored but also in terms of how it is retrieved. Some of the queries supported by SCs are uniquely capable of doing a task that no alternative schema could do. (Like recently I asked about getting the equivalent of retrieving a list of (full) supercolumns by name through the use of composite columns; unfortunately there was no way to do this without reading lots of extra columns.) So I am really confused whether: 1. Should I really not use supercolumns at all, however appropriate they seem, or do I just need to be careful when deciding that supercolumns fit my use case, or what!? 2.
Are there any performance concerns with supercolumns even in the cases where they are used most appropriately? For example, when you need to retrieve the entire supercolumn every time and the max number of subcolumns varies between 0 and 10. (I don't write all the subcolumns inside a supercolumn at once, though! Does this also matter?) 3. What is their future? Are they going to be deprecated, or maybe enhanced later?
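To make the composite-column alternative concrete, here is a hedged Hector sketch of reading "one whole supercolumn" from a composite-comparator CF. All keyspace/CF/row/group names are made up for illustration, and the CF is assumed to be declared with comparator CompositeType(UTF8Type, UTF8Type); the ComponentEquality idiom is how slice bounds over the first component are usually expressed in Hector, but verify it against your client version:

    import me.prettyprint.cassandra.serializers.CompositeSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.AbstractComposite.ComponentEquality;
    import me.prettyprint.hector.api.beans.Composite;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class SuperAsComposite {
        public static void main(String[] args) {
            // Placeholder cluster/keyspace/CF names
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
            Keyspace ksp = HFactory.createKeyspace("MyKS", cluster);

            // "Read the whole supercolumn 'group1'" becomes a slice over all columns
            // whose first composite component equals "group1"
            Composite start = new Composite();
            start.addComponent(0, "group1", ComponentEquality.EQUAL);
            Composite end = new Composite();
            end.addComponent(0, "group1", ComponentEquality.GREATER_THAN_EQUAL);

            SliceQuery<String, Composite, String> q = HFactory.createSliceQuery(
                    ksp, StringSerializer.get(), new CompositeSerializer(), StringSerializer.get());
            q.setColumnFamily("ActionsByGroup");
            q.setKey("row1");
            q.setRange(start, end, false, Integer.MAX_VALUE);

            for (HColumn<Composite, String> col : q.execute().get().getColumns()) {
                String sub = col.getName().get(1, StringSerializer.get()); // former subcolumn name
                System.out.println(sub + " = " + col.getValue());
            }
        }
    }

Note that this also illustrates the limitation Aklin raised: each named group needs its own contiguous slice, so fetching many non-adjacent "supercolumns" by name means either multiple queries or reading the extra columns in between.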
Re: How to find out when a nodetool operation has ended?
The repair will continue even if you ctrl+c nodetool, it runs on the server not the client. Hmm, didn't know that. Maybe a tweak for nodetool that just displays a message after starting: Started with ... and some kind of notification (with wall) when it's done? 2012/1/7 aaron morton aa...@thelastpickle.com The repair will continue even if you ctrl+c nodetool; it runs on the server, not the client. Aside from using ops centre you can also look at TP stats to see when there is nothing left in the AntiEntropyStage, or look for a log message from the StorageService that says… Repair command #{} completed successfully Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/01/2012, at 12:32 PM, Maxim Potekhin wrote: Thanks, so I take it there is no solution outside of OpsCenter. I mean, of course I can redirect the output, with additional timestamps if needed, to a log file -- which I can access remotely. I just thought there would be some status command, by chance, to tell me what maintenance the node is doing. Too bad there is not! Maxim On 1/6/2012 5:40 PM, R. Verlangen wrote: You might consider: - installing DataStax OpsCenter ( http://www.datastax.com/products/opscenter ) - starting the repair in a linux screen (so you can attach to the screen from another location)
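Along the lines Aaron describes, here is a crude remote watcher that polls the AntiEntropyStage thread pool over JMX until it drains. The MBean and attribute names are my recollection of the 1.0-era layout (the same counters nodetool tpstats prints), so verify them against your version; the node host is passed as an argument, and an idle stage is only a heuristic for "repair done", not a guarantee:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class RepairWatch {
        public static void main(String[] args) throws Exception {
            // args[0] = node host; 7199 is Cassandra's default JMX port
            JMXConnector c = JMXConnectorFactory.connect(new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + args[0] + ":7199/jmxrmi"));
            MBeanServerConnection mbs = c.getMBeanServerConnection();
            ObjectName stage = new ObjectName(
                    "org.apache.cassandra.internal:type=AntiEntropyStage");
            while (true) {
                long active = ((Number) mbs.getAttribute(stage, "ActiveCount")).longValue();
                long pending = ((Number) mbs.getAttribute(stage, "PendingTasks")).longValue();
                System.out.println("AntiEntropyStage active=" + active + " pending=" + pending);
                if (active == 0 && pending == 0) break; // the same signal tpstats shows
                Thread.sleep(60 * 1000L);               // poll once a minute
            }
            c.close();
        }
    }

Grepping the system log for the "Repair command ... completed successfully" message from the StorageService, as Aaron mentions, remains the more definitive check.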
Re: java.lang.IllegalArgumentException occurred when creating a keyspace with replication factor
Try this: create keyspace testkeyspace; update keyspace testkeyspace with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:3}; Good luck! 2012/1/6 Sajith Kariyawasam saj...@gmail.com Hi all, I tried creating a keyspace with replication factor 3 using the CLI interface ... in Cassandra 1.0.6 (earlier tried in 0.8.2 and failed too). But I'm getting an exception: java.lang.IllegalArgumentException: No enum const class org.apache.cassandra.cli.CliClient$AddKeyspaceArgument.REPLICATION_FACTOR The command I used was [default@unknown] create keyspace testkeyspace with replication_factor=3; What has gone wrong? Many thanks in advance -- Best Regards Sajith
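For anyone doing the same from Java, a rough Hector equivalent of those two CLI commands (untested sketch; the cluster name and host are placeholders):

    import java.util.Collections;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
    import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
    import me.prettyprint.hector.api.factory.HFactory;

    public class CreateKeyspaceRf3 {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");

            // SimpleStrategy with replication_factor = 3, and no column families yet
            KeyspaceDefinition ksDef = HFactory.createKeyspaceDefinition(
                    "testkeyspace",
                    "org.apache.cassandra.locator.SimpleStrategy",
                    3,
                    Collections.<ColumnFamilyDefinition>emptyList());
            cluster.addKeyspace(ksDef);
        }
    }

The underlying point is the same as the CLI fix: since 0.8 the replication factor is a strategy option, not a top-level keyspace argument, which is why the old with replication_factor=3 syntax throws the enum exception.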
Re: How to find out when a nodetool operation has ended?
You might consider: - installing DataStax OpsCenter ( http://www.datastax.com/products/opscenter ) - starting the repair in a linux screen (so you can attach to the screen from another location) I prefer OpsCenter. 2012/1/6 Maxim Potekhin potek...@bnl.gov Suppose I start a repair on one or a few nodes in my cluster from an interactive machine in the office and leave for the day (which is a very realistic scenario, imho). Is there a way to know, from a remote machine, when a particular action, such as a compaction or repair, has finished? I figured that compaction stats can be mum at times, thus they're not a reliable indicator. Many thanks, Maxim