Can not find auto bootstrap property in cassandra.yaml for Cassandra 1.1.0
Dear all, I am trying to add a new node to the Cassandra cluster. All the documentation available online says to set the auto bootstrap property in cassandra.yaml to true, but I cannot find the property in the file. Please help me. Thanks and Regards, Prakrati Agrawal | Developer - Big Data (ID) | 9731648376 | www.mu-sigma.com

This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system. Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software.
Re: Can not find auto bootstrap property in cassandra.yaml for Cassandra 1.1.0
Hi, I ran into the same problem. You can safely skip that step in the latest versions; I read somewhere that auto bootstrap defaults to true in recent releases.

-- Pushpalanka Jayawardhana | Undergraduate | Computer Science and Engineering, University of Moratuwa | +94779716248 | http://pushpalankajaya.blogspot.com | Twitter: http://twitter.com/Pushpalanka | Slideshare: http://www.slideshare.net/Pushpalanka
Re: Can not find auto bootstrap property in cassandra.yaml for Cassandra 1.1.0
Hi Prakrati, In 1.1.0 you don't need to set this; it's enabled by default. I'm also on 1.1.0 and I didn't need to set it. Regards, Roshni
RE: Can not find auto bootstrap property in cassandra.yaml for Cassandra 1.1.0
Hello, Auto bootstrap, as an attribute, is not present in the latest versions of Cassandra (1.0 and later). You can add 'auto_bootstrap: true' yourself, or leave initial_token blank, to make the node bootstrap. As far as I have noticed, it defaults to true. Regards, Rishabh Agrawal
NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
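To make the advice in this thread concrete: since the option is absent from the stock cassandra.yaml shipped with 1.1.x, a node that should bootstrap can have the line added explicitly. A minimal sketch (the property name and the initial_token behaviour are what the replies above state; placing both at the top level of cassandra.yaml is the standard layout):

```yaml
# cassandra.yaml (fragment) — auto_bootstrap defaults to true in 1.0+
# when the line is omitted; add it explicitly only to be certain,
# or set it to false to disable bootstrapping.
auto_bootstrap: true

# Leave initial_token blank so the joining node picks a token and
# streams its data range from the existing nodes:
initial_token:
```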
Getting error on adding seed to add a new node
Dear all, I am trying to add a new node to my existing one-node Cassandra cluster, so I edited the seeds value in cassandra.yaml and added the IP addresses of both nodes. But it gives me the following error:

ERROR 13:16:48,342 Fatal configuration error
error while parsing a block mapping
 in reader, line 164, column 13:
    - seeds: 162.192.100.16,162.192 ...
             ^
expected block end, but found FlowEntry
 in reader, line 164, column 36:
    - seeds: 162.192.100.16,162.192.100.48
                                          ^

Please help me. Thanks and Regards, Prakrati Agrawal | Developer - Big Data (ID) | 9731648376 | www.mu-sigma.com
Re: Getting error on adding seed to add a new node
Hi, As the comments in cassandra.yaml say, you need to define the whole seed list as one string, e.g. "ip1,ip2,ip3", not as bare comma-separated values.
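Concretely, quoting the whole list fixes the YAML parse error shown above (the IP addresses are taken from the error message; the surrounding seed_provider structure is the stock 1.1 layout):

```yaml
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # One quoted string containing the entire comma-separated list;
          # unquoted, YAML tries to parse the commas as flow entries.
          - seeds: "162.192.100.16,162.192.100.48"
```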
Finding whether a new node is successfully added or not
Dear all, I added a new node to my one-node Cassandra cluster. Now I want to find out whether it was added successfully or not. Also, do I need to restart the already-running node after entering the seed value? Please help me. Thanks and Regards, Prakrati Agrawal | Developer - Big Data (ID) | 9731648376 | www.mu-sigma.com
Re: Finding whether a new node is successfully added or not
Hi there, You can check the ring info with nodetool. Furthermore, you can take a look at the streaming statistics: lots of pending streams indicates a node that is still receiving data from its seed(s). As far as I'm aware, the seed value is read on startup, so a restart is required. Good luck.

-- With kind regards, Robin Verlangen, Software engineer. W www.robinverlangen.nl E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
Re: Finding whether a new node is successfully added or not
Hi Prakrati, Run: bin/nodetool -host <ip> ring. See the Cassandra wiki at http://wiki.apache.org/cassandra/NodeTool for more details. A restart is needed, as far as I know, since the node needs to communicate with the seeds to make sense of the cluster it is in.
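Spelling out the two checks suggested above (the host IPs are illustrative; run against any live node):

```shell
# List cluster members with their tokens, load, and status.
# A successfully joined node appears with status "Up" and state "Normal".
bin/nodetool -host 162.192.100.16 ring

# Show streaming progress; pending streams mean the new node is still
# bootstrapping data from the existing node(s).
bin/nodetool -host 162.192.100.48 netstats
```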
Re: row_cache_provider = 'SerializingCacheProvider'
Yes, SerializingCacheProvider is the off-heap caching provider. Can you do some more digging into what is using the heap? Cheers, Aaron Morton, Freelance Developer, @aaronmorton, http://www.thelastpickle.com

On 1/06/2012, at 9:52 PM, ruslan usifov wrote: Hello, I began using SerializingCacheProvider for row caching and saw extreme Java heap growth. But I thought this cache provider doesn't use the Java heap.
Re: batch isolation
On Sun, Jun 3, 2012 at 6:05 PM, Todd Burruss bburr...@expedia.com wrote: I just meant there is a row delete in the same batch as inserts, all to the same column family and key. Then it's the timestamp that decides what happens: whatever has a timestamp lower than or equal to the tombstone's timestamp will be deleted (and that holds for inserts in the batch itself). -- Sylvain

-Original Message- From: Sylvain Lebresne [sylv...@datastax.com] Received: Sunday, 03 Jun 2012, 3:44am To: user@cassandra.apache.org Subject: Re: batch isolation On Sun, Jun 3, 2012 at 2:53 AM, Todd Burruss bburr...@expedia.com wrote: 1 – does this mean that a batch_mutate that first sends a row delete mutation on key X, then subsequent insert mutations for key X, is isolated? I'm not sure what you mean by a batch_mutate that "first sends ... then ...", since a batch_mutate is a single API call. 2 – does isolation span column families for the same key within the same batch_mutate? No, it doesn't span column families (unlike atomicity). There are more details at http://www.datastax.com/dev/blog/row-level-isolation. -- Sylvain
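Sylvain's timestamp rule can be modeled in a few lines. A toy sketch for illustration (not Cassandra's actual implementation; the column names and timestamps are made up) showing that a row delete at timestamp 10 shadows any write with a timestamp <= 10, even one issued in the same batch:

```python
# Toy model of how a row-level tombstone interacts with column timestamps.

def resolve(columns, tombstone_ts):
    """Apply a row delete with timestamp `tombstone_ts` to `columns`,
    a dict of name -> (value, write_timestamp). Only writes strictly
    newer than the tombstone survive."""
    return {name: (value, ts)
            for name, (value, ts) in columns.items()
            if ts > tombstone_ts}

# A batch that deletes the row at ts=10 and also inserts columns:
columns = {
    "a": ("old", 5),    # written before the delete   -> shadowed
    "c": ("same", 10),  # equal to the tombstone ts   -> shadowed too
    "b": ("new", 11),   # newer than the tombstone    -> survives
}
survivors = resolve(columns, tombstone_ts=10)
# Only "b" remains: timestamps <= the tombstone's are deleted.
```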
Adding a new node to Cassandra cluster
Dear all, I successfully added a new node to my cluster, so now it's a 2-node cluster. But how do I mention this in my Java code? When I retrieve data, it only retrieves from the one node that I specify as localhost. How do I specify more than one node? Please help me. Thanks and Regards, Prakrati Agrawal | Developer - Big Data (ID) | 9731648376 | www.mu-sigma.com
Re: Adding a new node to Cassandra cluster
Hi there, When you speak to one node it will internally redirect the request to the proper node (local/external), but you won't be able to fail over if the localhost node crashes. For adding another node to the connection pool, you should take a look at the documentation of your Java client. Good luck! -- With kind regards, Robin Verlangen
RE: Adding a new node to Cassandra cluster
Hi, I am using the Thrift API, and I am not able to find anything on the internet about how to configure it for multiple nodes. I am not using a higher-level client like Hector. Prakrati Agrawal | Developer - Big Data (ID) | 9731648376 | www.mu-sigma.com
Re: Adding a new node to Cassandra cluster
You might consider using a higher-level client (like Hector, indeed). If you don't want that, you will have to write your own connection pool. For a start, take a look at Hector, but keep in mind that you might be reinventing the wheel. -- With kind regards, Robin Verlangen
Query
Hi all, I wanted to know how to read and write data using the Cassandra APIs. Is there a link to a sample program? Regards, Arshad
Re: Adding a new node to Cassandra cluster
Prakrati, I believe even though you would specify one node in your code, internally the request would be going to any – perhaps more than 1 node based on your replication factors consistency level settings. You can try this by connecting to one node and writing to it and then reading the same data from another node. You can see this replication happening via CLI as well. Regards, Roshni From: R. Verlangen ro...@us2.nlmailto:ro...@us2.nl Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Date: Mon, 4 Jun 2012 02:30:40 -0700 To: user@cassandra.apache.orgmailto:user@cassandra.apache.org user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Adding a new node to Cassandra cluster You might consider using a higher level client (like Hector indeed). If you don't want this you will have to write your own connection pool. For start take a look at Hector. But keep in mind that you might be reinventing the wheel. 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com Hi, I am using Thrift API and I am not able to find anything on the internet about how to configure it for multiple nodes. I am not using any proper client like Hector. Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.comhttp://www.mu-sigma.com From: R. Verlangen [mailto:ro...@us2.nlmailto:ro...@us2.nl] Sent: Monday, June 04, 2012 2:44 PM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Adding a new node to Cassandra cluster Hi there, When you speak to one node it will internally redirect the request to the proper node (local / external): but you won't be able to failover on a crash of the localhost. For adding another node to the connection pool you should take a look at the documentation of your java client. Good luck! 
2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com Dear all I successfully added a new node to my cluster so now it’s a 2 node cluster. But how do I mention it in my Java code as when I am retrieving data its retrieving only for one node that I am specifying in the localhost. How do I specify more than one node in the localhost. Please help me Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.comhttp://www.mu-sigma.com This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system. Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software. -- With kind regards, Robin Verlangen Software engineer W www.robinverlangen.nlhttp://www.robinverlangen.nl E ro...@us2.nlmailto:ro...@us2.nl Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. 
RE: Query
If you are using Java, try out Kundera or Hector; both are good and have good documentation available. From: MOHD ARSHAD SALEEM [mailto:marshadsal...@tataelxsi.co.in] Sent: Monday, June 04, 2012 2:37 AM To: user@cassandra.apache.org Subject: Query Hi all, I wanted to know how to read and write data using the Cassandra APIs. Is there any link to a sample program? Regards Arshad Register for Impetus webinar 'User Experience Design for iPad Applications' June 8 (10:00am PT). http://lf1.me/f9/ Impetus' Head of Labs to present on 'Integrating Big Data technologies in your IT portfolio' at Cloud Expo, NY (June 11-14). Contact us for a complimentary pass. Impetus also sponsoring the Yahoo Summit 2012. NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
Re: Adding a new node to Cassandra cluster
If you use the Thrift API, you have to maintain a lot of low-level code yourself that is already polished in higher-level clients such as Hector and pycassa; with a higher-level client you can also easily switch between Thrift and the growing CQL. On Mon, Jun 4, 2012 at 3:00 PM, R. Verlangen ro...@us2.nl wrote: You might consider using a higher-level client (like Hector indeed). If you don't want this, you will have to write your own connection pool. For a start, take a look at Hector. But keep in mind that you might be reinventing the wheel. 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com Hi, I am using the Thrift API and I am not able to find anything on the internet about how to configure it for multiple nodes. I am not using any higher-level client like Hector. Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: R. Verlangen [mailto:ro...@us2.nl] Sent: Monday, June 04, 2012 2:44 PM To: user@cassandra.apache.org Subject: Re: Adding a new node to Cassandra cluster Hi there, When you speak to one node, it will internally redirect the request to the proper node (local / external), but you won't be able to fail over if the node you connect to crashes. For adding another node to the connection pool, you should take a look at the documentation of your Java client. Good luck! 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com Dear all I successfully added a new node to my cluster, so now it's a 2-node cluster. But how do I mention it in my Java code? When I am retrieving data, it retrieves only from the one node that I specify as localhost. How do I specify more than one node? Please help me. Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com -- With kind regards, Robin Verlangen Software engineer W www.robinverlangen.nl E ro...@us2.nl
Re: Query
Here is a link that will help you out if you use Kundera as a high-level client for Cassandra: https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes Regards, Amresh On Mon, Jun 4, 2012 at 3:09 PM, Rishabh Agrawal rishabh.agra...@impetus.co.in wrote: If you are using Java, try out Kundera or Hector; both are good and have good documentation available. From: MOHD ARSHAD SALEEM [mailto:marshadsal...@tataelxsi.co.in] Sent: Monday, June 04, 2012 2:37 AM To: user@cassandra.apache.org Subject: Query Hi all, I wanted to know how to read and write data using the Cassandra APIs. Is there any link to a sample program? Regards Arshad
Re: Query
On Mon, Jun 4, 2012 at 7:36 PM, MOHD ARSHAD SALEEM marshadsal...@tataelxsi.co.in wrote: Hi all, I wanted to know how to read and write data using the Cassandra APIs. Is there any link to a sample program? I did a proof of concept using a Python client, pycassa (https://github.com/pycassa/pycassa), which works well. Cheers Regards Arshad -- *Franc Carter* | Systems architect | Sirca Ltd franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
RE: Adding a new node to Cassandra cluster
Yes, I know I am trying to reinvent the wheel, but I have to. The requirement is such that I have to use the Java Thrift API without any client like Hector. Can you please tell me how to do it? Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: samal [mailto:samalgo...@gmail.com] Sent: Monday, June 04, 2012 3:12 PM To: user@cassandra.apache.org Subject: Re: Adding a new node to Cassandra cluster If you use the Thrift API, you have to maintain a lot of low-level code yourself that is already polished in higher-level clients such as Hector and pycassa; with a higher-level client you can also easily switch between Thrift and the growing CQL. On Mon, Jun 4, 2012 at 3:00 PM, R. Verlangen ro...@us2.nl wrote: You might consider using a higher-level client (like Hector indeed). If you don't want this, you will have to write your own connection pool. For a start, take a look at Hector. But keep in mind that you might be reinventing the wheel. 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com Hi, I am using the Thrift API and I am not able to find anything on the internet about how to configure it for multiple nodes. I am not using any higher-level client like Hector. Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: R. Verlangen [mailto:ro...@us2.nl] Sent: Monday, June 04, 2012 2:44 PM To: user@cassandra.apache.org Subject: Re: Adding a new node to Cassandra cluster Hi there, When you speak to one node, it will internally redirect the request to the proper node (local / external), but you won't be able to fail over if the node you connect to crashes. For adding another node to the connection pool, you should take a look at the documentation of your Java client. Good luck! 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com Dear all I successfully added a new node to my cluster, so now it's a 2-node cluster. But how do I mention it in my Java code? When I am retrieving data, it retrieves only from the one node that I specify as localhost. How do I specify more than one node? Please help me. Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com -- With kind regards, Robin Verlangen Software engineer W www.robinverlangen.nl E ro...@us2.nl
Re: Adding a new node to Cassandra cluster
Connection pooling involves things like: (transparent) failover / retry, disposal of connections after X messages, and keeping track of connections. Again: take a look at the Hector connection pool. Source: https://github.com/rantav/hector/tree/master/core/src/main/java/me/prettyprint/cassandra/connection 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com Yes, I know I am trying to reinvent the wheel, but I have to. The requirement is such that I have to use the Java Thrift API without any client like Hector. Can you please tell me how to do it? Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: samal [mailto:samalgo...@gmail.com] Sent: Monday, June 04, 2012 3:12 PM To: user@cassandra.apache.org Subject: Re: Adding a new node to Cassandra cluster If you use the Thrift API, you have to maintain a lot of low-level code yourself that is already polished in higher-level clients such as Hector and pycassa; with a higher-level client you can also easily switch between Thrift and the growing CQL. On Mon, Jun 4, 2012 at 3:00 PM, R. Verlangen ro...@us2.nl wrote: You might consider using a higher-level client (like Hector indeed). If you don't want this, you will have to write your own connection pool. For a start, take a look at Hector. But keep in mind that you might be reinventing the wheel. 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com Hi, I am using the Thrift API and I am not able to find anything on the internet about how to configure it for multiple nodes. I am not using any higher-level client like Hector. Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com From: R. Verlangen [mailto:ro...@us2.nl] Sent: Monday, June 04, 2012 2:44 PM To: user@cassandra.apache.org Subject: Re: Adding a new node to Cassandra cluster Hi there, When you speak to one node it will internally redirect the request to the proper node (local / external), but you won't be able to fail over if the node you connect to crashes. For adding another node to the connection pool, you should take a look at the documentation of your Java client. Good luck! 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com Dear all I successfully added a new node to my cluster, so now it's a 2-node cluster. But how do I mention it in my Java code? When I am retrieving data, it retrieves only from the one node that I specify as localhost. How do I specify more than one node? Please help me. Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com -- With kind regards, Robin Verlangen Software engineer W www.robinverlangen.nl E ro...@us2.nl
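The pooling tasks Robin lists (rotate across hosts, skip a dead host on failure, keep track of connections) can be sketched in plain Java. This is only a minimal illustration of the idea, with hypothetical class and host names; a real Thrift pool would additionally manage TSocket/TFramedTransport lifecycles and retries, which is exactly what Hector already provides.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin host selection with naive failover -- the core of
// the "write your own connection pool" advice in this thread. A real pool
// would open/reuse a Thrift connection per host and retry on failure.
class RoundRobinHostPool {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger(0);

    RoundRobinHostPool(List<String> hosts) {
        this.hosts = hosts;
    }

    // Pick the next host in rotation.
    String nextHost() {
        int i = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }

    // Skip a host known to be down and return the following one.
    String failover(String deadHost) {
        String candidate = nextHost();
        if (candidate.equals(deadHost) && hosts.size() > 1) {
            candidate = nextHost();
        }
        return candidate;
    }
}
```

With two hosts, successive calls alternate between them, so losing one node no longer means losing the whole client, which is the failover gap Robin describes when talking only to localhost.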
Re: Retrieving old data version for a given row
I was taking a look at tombstones stored in an SSTable and I noticed that if I perform a key deletion, the tombstone doesn't have any timestamp; it has this appearance: "key":[ ] At all the other deletion granularities the tombstone has a timestamp. Without this information it seems impossible to resolve conflicts when an insertion for the same key is done after this deletion. If that happens, I think Cassandra will always delete the new information because of this tombstone. I'm using a single-node configuration, and maybe that changes how the tombstones look. Thanks in advance. Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil) 2012/5/31 aaron morton aa...@thelastpickle.com -Is there any other way to extract the content of an SSTable, for example by writing a Java program instead of using sstable2json? Look at the code in sstable2json and copy it :) -I tried to get tombstones using the Thrift API, but it seems to be not possible, is that right? When I try, the program throws an exception. No. Tombstones are not returned from the API (see ColumnFamilyStore.getColumnFamily()). You can see them if you use sstable2json. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 30/05/2012, at 9:53 PM, Felipe Schmidt wrote: I have further questions: -Is there any other way to extract the content of an SSTable, for example by writing a Java program instead of using sstable2json? -I tried to get tombstones using the Thrift API, but it seems to be not possible, is that right? When I try, the program throws an exception. Thanks in advance Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil) 2012/5/24 aaron morton aa...@thelastpickle.com: Ok... it's really strange to me that Cassandra doesn't support data versioning, because all the other key-value databases I know support it. You can design it into your data model if you need it. I have one remaining question: -in the case that I have more than 1 SSTable on disk for the same column but with different data versions, is it possible to make a query to get the old version instead of the newest one? No. There is only ever 1 value for a column. The older copies of the column in the SSTables are artefacts of immutable on-disk structures. If you want to see what's inside an SSTable, use bin/sstable2json Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/05/2012, at 9:42 PM, Felipe Schmidt wrote: Ok... it's really strange to me that Cassandra doesn't support data versioning, because all the other key-value databases I know support it. I have one remaining question: -in the case that I have more than 1 SSTable on disk for the same column but with different data versions, is it possible to make a query to get the old version instead of the newest one? Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil) 2012/5/16 Dave Brosius dbros...@mebigfatguy.com: You're in for a world of hurt going down that rabbit hole. If you truly want versioned data then you should think about changing your keying to perhaps be a composite key, where the key is of the form NaturalKey/VersionId. Or if you want the versioning at the column level, use composite columns with a ColumnName/VersionId format. On 05/16/2012 10:16 AM, Felipe Schmidt wrote: That was very helpful, thank you very much! I still have some questions: -is it possible to make Cassandra keep old value data after flushing? The same question for the memtable, before flushing. It seems to me that when I update some tuple, the old data will be overwritten in the memtable, even before flushing. -is it possible to scan values from the memtable, maybe using the so-called Thrift API? Using the client API I can only see the newest data version; I can't see what's really happening with the memtable. I ask because what I'll try to do is Change Data Capture for Cassandra, and the answers will define what kind of approaches I'm able to use. Thanks in advance. Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil) 2012/5/14 aaron morton aa...@thelastpickle.com: Cassandra does not provide access to multiple versions of the same column. It is essentially an implementation detail. All mutations are written to the commit log in a binary format, see o.a.c.db.RowMutation.getSerializedBuffer() (if you want to tail it for analysis you may want to change commitlog_sync in cassandra.yaml). Here is a post about looking at multiple versions of columns in an SSTable: http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/ Remember that not all versions of a column are written to disk (see http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/).
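Dave's composite-key suggestion can be illustrated with a plain-Java sketch: store every write under its own VersionId instead of overwriting, and read the highest VersionId for the latest value. This only models the data layout (the class and method names are hypothetical, not Cassandra API code); in Cassandra the sorted map below would correspond to columns stored under a NaturalKey/VersionId scheme.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Application-level versioning per the NaturalKey/VersionId idea: each
// write adds a new (versionId -> value) entry, so old versions stay
// queryable instead of being compacted away.
class VersionedValue {
    // Sorted ascending by versionId, like columns under a row key.
    private final NavigableMap<Long, String> versions = new TreeMap<>();

    void put(long versionId, String value) {
        versions.put(versionId, value);
    }

    // Latest value = entry with the highest versionId.
    String latest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // Historical versions remain directly addressable.
    String atVersion(long versionId) {
        return versions.get(versionId);
    }
}
```

This is the trade-off the thread describes: Cassandra itself keeps only one value per column, so if you need history you must encode the version into your keys or column names yourself.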
Re: row_cache_provider = 'SerializingCacheProvider'
I have set up a 5GB Java heap with the following tuning: MAX_HEAP_SIZE=5G HEAP_NEWSIZE=800M JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8 JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=5 JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65 JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly JVM_OPTS=$JVM_OPTS -XX:CMSFullGCsBeforeCompaction=1 I also set up 2GB for memtables (memtable_total_space_in_mb: 2048). My average heap usage (nodetool -h localhost info) is 3G. Based on nodetool -h localhost cfhistograms I calculate an average row size of 70KB. I set up the row cache for only one CF with the following settings: update column family building with rows_cached=1 and row_cache_provider='SerializingCacheProvider'; When I set up the row cache I got a promotion failure in GC (with a stop-the-world pause of about 30secs) with the heap almost filled. I am very confused by this behavior. PS: I use Cassandra 1.0.10, with JNA 3.4.0, on Ubuntu Lucid (kernel 2.6.32-41) 2012/6/4 aaron morton aa...@thelastpickle.com: Yes, SerializingCacheProvider is the off-heap caching provider. Can you do some more digging into what is using the heap? Cheers A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 1/06/2012, at 9:52 PM, ruslan usifov wrote: Hello I began using SerializingCacheProvider for row caching, and got extreme Java heap growth. But I thought that this cache provider doesn't use the Java heap
Which client to use for Cassandra real time insertion and retrieval
Dear all, I am trying to explore Cassandra for real-time applications. Can you please suggest which client is the best to use? Is the client choice based on the user's comfort level or on use cases? Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com
RPM of Cassandra 1.1.0
Hi, I need to install Apache Cassandra 1.1.0 from RPM. Please provide me a link to download the RPM for CentOS. Thanks and Regards Adeel Akbar
repair
Hi! I apologize for this naive question. When I run nodetool repair, is it enough to run it on one of the nodes, or do I need to run it on each one of them? Thanks *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
RE: repair
Hello, As far as my knowledge goes, it works on a per-node basis, so you have to run it on the different nodes. I would suggest you not execute it simultaneously on all nodes in a production environment. Regards Rishabh Agrawal From: Tamar Fraenkel [mailto:ta...@tok-media.com] Sent: Monday, June 04, 2012 4:25 AM To: user@cassandra.apache.org Subject: repair Hi! I apologize for this naive question. When I run nodetool repair, is it enough to run it on one of the nodes, or do I need to run it on each one of them? Thanks Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
RE: repair
Hi, It is not enough to run the repair on one node, except if the node contains all the data (e.g. a 3-node cluster with RF=3). In the general case, the best is to launch the repair on every node, with the -pr option (use -pr to repair only the first range returned by the partitioner). Tamar Fraenkel ta...@tok-media.com 04/06/2012 13:24 Please reply to user@cassandra.apache.org To user@cassandra.apache.org cc Subject repair Hi! I apologize for this naive question. When I run nodetool repair, is it enough to run it on one of the nodes, or do I need to run it on each one of them? Thanks Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
RE: RPM of Cassandra 1.1.0
Hi, The RPM from DataStax: http://rpm.datastax.com/community/noarch/apache-cassandra11-1.1.0-2.noarch.rpm Regards, Samuel Adeel Akbar adeel.ak...@panasiangroup.com 04/06/2012 13:20 Please reply to user@cassandra.apache.org To user@cassandra.apache.org cc Subject RPM of Cassandra 1.1.0 Hi, I need to install Apache Cassandra 1.1.0 from RPM. Please provide me a link to download the RPM for CentOS. Thanks and Regards Adeel Akbar
Re: repair
Thanks. I actually did just that, with cron jobs running at different hours. I asked the question because I saw that when one of the nodes was running the repair, all nodes logged some repair-related entries in /var/log/cassandra/system.log Thanks again, *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Mon, Jun 4, 2012 at 2:35 PM, Rishabh Agrawal rishabh.agra...@impetus.co.in wrote: Hello, As far as my knowledge goes, it works on a per-node basis, so you have to run it on the different nodes. I would suggest you not execute it simultaneously on all nodes in a production environment. Regards Rishabh Agrawal
Re: repair
Run repair -pr in your cron. Tamar Fraenkel ta...@tok-media.com wrote on 04/06/2012 13:44:32: Thanks. I actually did just that, with cron jobs running at different hours. I asked the question because I saw that when one of the nodes was running the repair, all nodes logged some repair-related entries in /var/log/cassandra/system.log Thanks again, Tamar Fraenkel Senior Software Engineer, TOK Media
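A sketch of this advice as crontab entries, staggered so no two nodes repair at once. The hosts, days, and hours here are purely illustrative assumptions, not values from the thread:

```shell
# Hypothetical per-node crontab entries: weekly `nodetool repair -pr`,
# staggered a few hours apart so the repairs on different nodes never overlap.
# On node 1:
0 1 * * 0  nodetool -h localhost repair -pr
# On node 2:
0 5 * * 0  nodetool -h localhost repair -pr
# On node 3:
0 9 * * 0  nodetool -h localhost repair -pr
```

Because -pr repairs only each node's primary range, running it once per node covers the whole ring without the duplicated work a plain repair on every node would do.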
Re: Which client to use for Cassandra real time insertion and retrieval
I'm assuming you are looking for a Java client. From my own experience, Hector is a good client that can be used in real-time applications (it supports connection pooling and automatic retries). But I would suggest having a look at Astyanax from Netflix (https://github.com/Netflix/astyanax). I didn't have the opportunity to use it, but it seems VERY good. Regards, Samuel
Re: repair
Repair with -pr only repairs the node's primary range, so it is only useful for day-to-day use. When you're recovering from a crash, use it without -pr.

2012/6/4 Romain HARDOUIN romain.hardo...@urssaf.fr
> Run repair -pr in your cron. [...]

-- With kind regards, Robin Verlangen *Software engineer* W www.robinverlangen.nl E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
Re: repair
Thank you all! *Tamar Fraenkel * Senior Software Engineer, TOK Media [image: Inline image 1] ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956

On Mon, Jun 4, 2012 at 3:16 PM, R. Verlangen ro...@us2.nl wrote:
> The repair -pr only repairs the nodes primary range: so is only usefull in day to day use. When you're recovering from a crash use it without -pr. [...]
Re: Integration Testing for Cassandra
That article is a good starting point. To make your life a bit easier, consider checking out CassandraUnit, which provides facilities to load example data in a variety of ways: https://github.com/jsevellec/cassandra-unit Then you just need to be able to pass in which Cassandra instance to connect to before you execute your code (embedded versus external environment). On Mon, Jun 4, 2012 at 12:10 AM, Eran Chinthaka Withana eran.chinth...@gmail.com wrote: Hi, I want to write integration tests for my Cassandra code where, instead of accessing production clusters, I can start an embedded Cassandra instance within my unit test code, populate some data and run test cases. I found this[1] as the closest to what I'm looking for (I prefer to use the Thrift API, so I didn't even consider the storage proxy API). I'm using Hector 1.0.x as my client to connect to my Cassandra 1.0.x clusters. Before I go ahead and use it, is this the recommended way to test Cassandra-related client code? Are there any test utils already in the Cassandra code base? I'd really appreciate it if someone can shed some light here. [1] http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/ Thanks, Eran Chinthaka Withana
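A test skeleton along those lines might look like the following. The dataset file name and embedded port are hypothetical placeholders, and the exact cassandra-unit API varies by version, so check its README before relying on this sketch.

```java
// Sketch of a cassandra-unit integration test (hypothetical dataset
// file "dataset.xml"; 9171 is cassandra-unit's usual embedded port,
// but verify against the version you use).
import org.cassandraunit.DataLoader;
import org.cassandraunit.dataset.xml.ClassPathXmlDataSet;
import org.cassandraunit.utils.EmbeddedCassandraServerHelper;
import org.junit.BeforeClass;
import org.junit.Test;

public class MyCassandraIT {
    @BeforeClass
    public static void startEmbedded() throws Exception {
        // Boots an in-JVM Cassandra, so no external cluster is needed.
        EmbeddedCassandraServerHelper.startEmbeddedCassandra();
        // Seed example rows from an XML dataset on the classpath.
        new DataLoader("TestCluster", "localhost:9171")
            .load(new ClassPathXmlDataSet("dataset.xml"));
    }

    @Test
    public void readsSeededData() {
        // Point your Hector (Thrift) client at localhost:9171 here
        // and assert on the rows defined in dataset.xml.
    }
}
```

The same test code can then run against an external cluster by making the host:port a configurable parameter instead of a constant.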
RE: repair
Why without -pr when recovering from a crash?

Repair without -pr runs a full repair of the cluster: the node which receives the command is the repair controller, and ALL nodes synchronize replicas at the same time, streaming data between each other. Problems that may arise:

- When streaming hangs (it tends to hang even on a stable network), the repair session hangs (does any version re-stream?)
- The network will be highly saturated
- In case of high inconsistency some nodes may receive a lot of data, with disk usage much more than 2x (depends on RF)
- A lot of compactions will be pending

IMO, the best way to run repair is from a script, with -pr, for a single CF from a single node at a time, monitoring progress, like:

repair -pr node1 ks1 cf1
repair -pr node2 ks1 cf1
repair -pr node3 ks1 cf1
repair -pr node1 ks1 cf2
repair -pr node2 ks1 cf2
repair -pr node3 ks1 cf2

With some progress check or other control in between, your choice. Use repair with care; do not let your cluster go down.

Best regards / Pagarbiai
Viktor Jevdokimov, Senior Developer
Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

From: R. Verlangen [mailto:ro...@us2.nl] Sent: Monday, June 04, 2012 15:17 To: user@cassandra.apache.org Subject: Re: repair
> The repair -pr only repairs the nodes primary range: so is only usefull in day to day use. When you're recovering from a crash use it without -pr. [...]
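Viktor's per-node, per-CF schedule can be expressed as a small shell loop. This is a dry-run sketch: the node, keyspace and CF names are placeholders, and the echo prefix only prints the nodetool commands instead of executing them.

```shell
# Dry-run sketch of a rolling "repair -pr" schedule: one column family
# on one node at a time, CF-major order as in the example above.
# Node/keyspace/CF names are placeholders; execute $cmd (instead of
# echoing it) to actually run nodetool.
NODES="node1 node2 node3"
KEYSPACE="ks1"
CFS="cf1 cf2"

CMDS=""
for cf in $CFS; do              # CF-major order, as in the example
  for node in $NODES; do
    cmd="nodetool -h $node repair -pr $KEYSPACE $cf"
    echo "$cmd"                 # dry run: print instead of executing
    CMDS="$CMDS$cmd
"
    # insert a progress check / sleep between runs here
  done
done
```

Adding a wait between iterations (e.g. polling nodetool compactionstats) keeps only one repair session active at a time, which is the point of the schedule.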
Re: repair
Thanks, one more question. On a regular basis, should I run repair for the system keyspace?

*Tamar Fraenkel * Senior Software Engineer, TOK Media [image: Inline image 1] ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956

On Mon, Jun 4, 2012 at 5:02 PM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote:
> Why without -pr when recovering from a crash? Repair without -pr runs a full repair of the cluster [...]
Replication factor via hector
Hi, I'm trying to see the effect of different replication factors and consistency levels for a keyspace on a 4-node Cassandra cluster. I'm doing this using the Hector client. I could not find an API to set the replication factor for a keyspace, though I could find ways to modify the consistency level. Is it possible to change the replication factor using Hector, or does it have to be done using the CLI? Regards, Roshni This email and any files transmitted with it are confidential and intended solely for the individual or entity to whom they are addressed. If you have received this email in error destroy it immediately. *** Walmart Confidential ***
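For reference, in cassandra-cli of that era the replication factor is changed with an update keyspace statement. The keyspace name below is a placeholder; note that after raising the RF you should run nodetool repair so existing rows gain their additional replicas.

```
update keyspace MyKeyspace
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = {replication_factor : 3};
```

If memory serves, Hector also exposes schema operations (cluster.addKeyspace / cluster.updateKeyspace taking a KeyspaceDefinition), so check your Hector version's Cluster interface before assuming the CLI is the only route.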
Re: batch isolation
I don't think I'm being clear. I was just wondering if a row delete is isolated with all the other inserts or deletes to a specific column family and key in the same batch. On 6/4/12 1:58 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Sun, Jun 3, 2012 at 6:05 PM, Todd Burruss bburr...@expedia.com wrote: I just meant there is a row delete in the same batch as inserts - all to the same column family and key Then it's the timestamp that will decide what happens. Whatever has a timestamp lower than or equal to the tombstone timestamp will be deleted (that stands for inserts in the batch itself). -- Sylvain -Original Message- From: Sylvain Lebresne [sylv...@datastax.com] Received: Sunday, 03 Jun 2012, 3:44am To: user@cassandra.apache.org Subject: Re: batch isolation On Sun, Jun 3, 2012 at 2:53 AM, Todd Burruss bburr...@expedia.com wrote: 1) does this mean that a batch_mutate that first sends a row delete mutation on key X, then subsequent insert mutations for key X, is isolated? I'm not sure what you mean by having a batch_mutate that first sends ... then ..., since a batch_mutate is a single API call. 2) does isolation span column families for the same key within the same batch_mutate? No, it doesn't span column families (contrary to atomicity). There are more details in http://www.datastax.com/dev/blog/row-level-isolation. -- Sylvain
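Sylvain's timestamp rule can be illustrated with a toy model: a row tombstone shadows every column whose timestamp is less than or equal to the tombstone's, even columns written in the same batch. The reconcile function below is a hypothetical illustration of that rule only, not Cassandra's actual code.

```python
# Toy model of row-tombstone reconciliation (illustration only, not
# Cassandra's implementation). A batch carries a row delete at
# timestamp tomb_ts plus some inserts; a column survives only if its
# timestamp is strictly greater than the tombstone's.

def reconcile(tomb_ts, inserts):
    """inserts: dict of column name -> (value, timestamp)."""
    return {name: val
            for name, (val, ts) in inserts.items()
            if ts > tomb_ts}

# Row delete at ts=100 in the same batch as three inserts:
batch = {"a": ("old", 90), "b": ("same", 100), "c": ("new", 101)}
surviving = reconcile(100, batch)
# Only "c" survives: ts 90 and ts 100 are <= the tombstone's timestamp.
```

So within one batch the outcome is decided purely by timestamps, not by the order in which the mutations appear in the batch.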
Re: 1.1 not removing commit log files?
Applying the local hint mutation follows the same code path as regular mutations. When the commit log is being truncated you should see flush activity, logged from the ColumnFamilyStore with "Enqueuing flush of ..." messages. If you set DEBUG logging for org.apache.cassandra.db.ColumnFamilyStore it will log if it thinks the CF is clean and no flush takes place. If you set DEBUG logging on org.apache.cassandra.db.commitlog.CommitLog we will see if the commit log file could not be deleted because a dirty CF was not flushed. Cheers A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/06/2012, at 4:43 AM, Rob Coli wrote: On Thu, May 31, 2012 at 7:01 PM, aaron morton aa...@thelastpickle.com wrote: But that talks about segments not being cleared at startup. It does not explain why they were allowed to get past the limit in the first place. Perhaps the commit log size tracking for this limit does not, for some reason, track hints? This seems like the obvious answer given the state which appears to trigger it? This doesn't explain why the files aren't getting deleted after the hints are delivered, of course... =Rob -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
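Assuming the default 1.x logging setup, Aaron's suggestion corresponds to two lines in the server's log4j configuration (the file is conf/log4j-server.properties in the standard layout; the path may differ in packaged installs):

```
# conf/log4j-server.properties: enable the DEBUG logging suggested above
log4j.logger.org.apache.cassandra.db.ColumnFamilyStore=DEBUG
log4j.logger.org.apache.cassandra.db.commitlog.CommitLog=DEBUG
```

A restart (or a JMX log-level change, if you prefer) is needed for the new levels to take effect.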
Re: Secondary Indexes, Quorum and Cluster Availability
IIRC index slices work a little differently with consistency, they need to have CL level nodes available for all token ranges. If you drop it to CL ONE the read is local only for a particular token range. The problem when doing index reads is the nodes that contain the results can no longer be selected by the partitioner. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/06/2012, at 5:15 AM, Jim Ancona wrote: Hi, We have an application with two code paths, one of which uses a secondary index query and the other, which doesn't. While testing node down scenarios in our cluster we got a result which surprised (and concerned) me, and I wanted to find out if the behavior we observed is expected. Background: 6 nodes in the cluster (in order: A, B, C, E, F and G) RF = 3 All operations at QUORUM Operation 1: Read by row key followed by write Operation 2: Read by secondary index, followed by write While running a mixed workload of operations 1 and 2, we got the following results: Scenario Result All nodes up All operations succeed One node down All operations succeed Nodes A and E down All operations succeed Nodes A and B down Operation 1: ~33% fail Operation 2: All fail Nodes A and C down Operation 1: ~17% fail Operation 2: All fail We had expected (perhaps incorrectly) that the secondary index reads would fail in proportion to the portion of the ring that was unable to reach quorum, just as the row key reads did. For both operation types the underlying failure was an UnavailableException. The same pattern repeated for the other scenarios we tried. The row key operations failed at the expected ratios, given the portion of the ring that was unable to meet quorum because of nodes down, while all the secondary index reads failed as soon as 2 out of any 3 adjacent nodes were down. Is this an expected behavior? Is it documented anywhere? I didn't find it with a quick search. 
The operation doing secondary index query is an important one for our app, and we'd really prefer that it degrade gracefully in the face of cluster failures. My plan at this point is to do that query at ConsistencyLevel.ONE (and accept the increased risk of inconsistency). Will that work? Thanks in advance, Jim
Re: Can't delete from SCF wide row
Delete is a no-look write operation, like normal writes, so it should not directly cause a lot of memory allocation. It may be causing a lot of compaction activity, which, due to the wide row, may be throwing up lots of GC. Try the following to get through the deletions:

* disable compaction by setting min_compaction_level and max_compaction_level to 0 (via nodetool on current versions)

Once you have finished the deletions:

* lower the in_memory_compaction_limit in the yaml
* set concurrent_compactions to 2 in the yaml
* enable compaction again

Once everything has settled down, restore the in_memory_compaction_limit and concurrent_compactions. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/06/2012, at 7:53 AM, Rustam Aliyev wrote: Hi all, I have an SCF with ~250K rows. One of these rows is relatively large - it's a wide row (according to compaction logs) containing ~100,000 super columns and an overall size of 1GB. Each super column has an average size of 10K and ~10 sub columns. When I'm trying to delete ~90% of the columns in this particular row, the Cassandra nodes which own this wide row (3 of 5, RF=3) quickly run out of heap space. See logs from one of the hosts here: http://pastebin.com/raw.php?i=kwn7b3rP After that, all 3 nodes start flapping up/down and GC messages (like the one at the bottom of the pastebin above) appear in the logs. Cassandra never recovers from this mode and the only way out is to kill -9 and start again. On IRC it was suggested that it enters a GC death spiral. I tried to throttle delete requests on the client side - sending a batch of 100 delete requests every 500ms, so no more than 200 deletes/sec. But it didn't help. I can reduce it further to 100/sec, but I don't think it will help much. I delete millions of columns from another row in this SCF at the same rate and never hit this problem. It only happens when I try to delete from this particular wide row.
So right now I don't know how I can delete these columns. Any ideas? Many thanks, Rustam.
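Aaron's sequence can be sketched as a dry run using nodetool's setcompactionthreshold command (keyspace and CF names below are placeholders; the echo prefix keeps the sketch from executing anything, and 4/32 are the usual default min/max thresholds to restore afterwards):

```shell
# Dry-run sketch of the steps above (MyKeyspace/MyCF are placeholders).
# "nodetool setcompactionthreshold <keyspace> <cf> <min> <max>" with
# 0 0 disables minor compactions on 1.0.x. Remove "echo" to run for real.
KS="MyKeyspace"; CF="MyCF"

DISABLE="nodetool -h localhost setcompactionthreshold $KS $CF 0 0"
ENABLE="nodetool -h localhost setcompactionthreshold $KS $CF 4 32"

echo "$DISABLE"   # 1. stop compactions before the bulk deletes
# ... run the deletions, then edit cassandra.yaml: lower
#     in_memory_compaction_limit_in_mb and set concurrent_compactors: 2
echo "$ENABLE"    # 2. restore the default 4/32 thresholds afterwards
```

The yaml changes (in_memory_compaction_limit_in_mb, concurrent_compactors) require a restart to take effect, which is why they sit between the two threshold changes rather than inside the script.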
Re: TimedOutException()
Does the node we are connecting to try to proxy requests? Wouldn't our configuration ensure all nodes have replicas? It can still time out even when reading locally (the thread running the query is waiting on the read thread). Look in the server-side logs to see if there are any errors. If you are getting a timeout in this situation I would guess either the node is heavily overloaded or you are asking for a lot of data from a wide row. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/06/2012, at 11:00 AM, Oleg Dulin wrote: Tyler Hobbs ty...@datastax.com wrote: On Fri, Jun 1, 2012 at 9:39 AM, Oleg Dulin oleg.du...@gmail.com wrote: Is my understanding correct that this is where Cassandra is telling us it can't accomplish something within that timeout value -- as opposed to a network timeout? Where is it set? That's correct. Basically, the coordinator sees that a replica has not responded (or can not respond) before hitting a timeout. This is controlled by rpc_timeout_in_ms in cassandra.yaml. -- Tyler Hobbs DataStax http://datastax.com/ So if we are using the random partitioner and a read consistency of one, what does that mean? We have a 3-node cluster, use write/read consistency of one, and a replication factor of 3. Does the node we are connecting to try to proxy requests? Wouldn't our configuration ensure all nodes have replicas?
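The timeout Tyler mentions is a single setting in cassandra.yaml. A sketch of the relevant line (10000 ms is the value shipped in the 1.x default config, shown here only as an example):

```yaml
# cassandra.yaml: how long the coordinator waits for replica responses
# before throwing TimedOutException, in milliseconds.
rpc_timeout_in_ms: 10000
```

Raising it can paper over an overloaded node or an oversized read, but it doesn't fix the underlying cause; the server-side logs are the better first stop.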
Re: Errors with Cassandra 1.0.10, 1.1.0, 1.1.1-SNAPSHOT and 1.2.0-SNAPSHOT
I remember someone having the "file exists" issue a few weeks ago; IIRC it magically went away. Do you have steps to reproduce this fault? If you can reproduce it on a release version please create a ticket on https://issues.apache.org/jira/browse/CASSANDRA and update the email thread. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 3/06/2012, at 2:56 PM, Horacio G. de Oro wrote: Permissions are ok. The writes work ok, and the data can be read. Thanks! Horacio

On Sat, Jun 2, 2012 at 11:50 PM, Kirk True k...@mustardgrain.com wrote: Permissions problems on /var for the user running Cassandra? Sent from my iPhone

On Jun 2, 2012, at 6:56 PM, Horacio G. de Oro hgde...@gmail.com wrote: Hi! While using Cassandra, I've seen these log messages when running some test cases (which insert lots of columns in 4 rows). I've tried Cassandra 1.0.10, 1.1.0, 1.1.1-SNAPSHOT and 1.2.0-SNAPSHOT (built from git). I'm using the default configuration, Oracle JDK 1.6.0_32, Ubuntu 12.04 and pycassa. Since I'm very new to Cassandra (I'm just starting to learn it) I don't know if I'm doing something wrong, or maybe there are some bugs in the several versions of Cassandra I've tested.

cassandra-1.0.10 - IOException: unable to mkdirs
cassandra-1.1.0 - IOException: Unable to create directory
cassandra-1.1.1-SNAPSHOT - IOException: Unable to create directory
cassandra-1.2.0-SNAPSHOT - IOException: Unable to create directory - CLibrary.java (line 191) Unable to create hard link (...) command output: ln: failed to create hard link `(...)/lolog_tests-Logs_by_app-ia-3-Summary.db': File exists

Thanks in advance!
Horacio

system-cassandra-1.0.10.log

ERROR [MutationStage:1] 2012-06-02 20:37:41,115 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MutationStage:1,5,main]
java.io.IOError: java.io.IOException: unable to mkdirs /var/lib/cassandra/data/lolog_tests/snapshots/1338680261112-Logs_by_app
    at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
    at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
    at org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
    at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)
ERROR [MutationStage:11] 2012-06-02 20:37:55,730 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MutationStage:11,5,main]
java.io.IOError: java.io.IOException: unable to mkdirs /var/lib/cassandra/data/lolog_tests/snapshots/1338680275729-Logs_by_app
    at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
    at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
    at org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
    at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)
ERROR [MutationStage:19] 2012-06-02 20:37:57,395 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MutationStage:19,5,main]
java.io.IOError: java.io.IOException: unable to mkdirs /var/lib/cassandra/data/lolog_tests/snapshots/1338680277394-Logs_by_app
    at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
    at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
    at org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
    at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)
ERROR [MutationStage:20] 2012-06-02 20:41:26,666 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MutationStage:20,5,main]
java.io.IOError: java.io.IOException: unable to mkdirs /var/lib/cassandra/data/lolog_tests/snapshots/133868048-Logs_by_app
    at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
    at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
    at org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
    at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)

system-cassandra-1.1.0.log

ERROR [MutationStage:1] 2012-06-02 20:45:15,609 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[MutationStage:1,5,main]
java.io.IOError: java.io.IOException: Unable to
[RELEASE] Apache Cassandra 1.1.1 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 1.1.1. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here: http://cassandra.apache.org/ Downloads of source and binary distributions are listed in our download section: http://cassandra.apache.org/download/ This version is the first maintenance/bug fix release[1] in the 1.1 series. As always, please pay attention to the release notes[2] and let us know[3] if you encounter any problems. Enjoy! [1]: http://goo.gl/4Dxae (CHANGES.txt) [2]: http://goo.gl/ZE8ZK (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA
RE: nodes moving spontaneously
Thanks for the tip. Checked nodetool ring on all nodes and they all have a consistent view of the ring. We have had other problems like nodes crashing etc., so anything could have happened, but we're sure we didn't issue a nodetool move command. From: Tyler Hobbs [mailto:ty...@datastax.com] OpsCenter just periodically calls describe_ring() on different nodes in the cluster, so that's how it's getting that information. Maybe try running nodetool ring on each node in your cluster to make sure they all have the same view of the ring?
Re: row_cache_provider = 'SerializingCacheProvider'
I think SerializingCacheProvider has a bigger Java heap footprint than I expected.

2012/6/4 ruslan usifov ruslan.usi...@gmail.com: I have set up 5GB of Java heap with the following tuning:

MAX_HEAP_SIZE=5G
HEAP_NEWSIZE=800M
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=5"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:CMSFullGCsBeforeCompaction=1"

Also I set up 2GB for memtables (memtable_total_space_in_mb: 2048). My avg heap usage (nodetool -h localhost info): 3G. Based on nodetool -h localhost cfhistograms I calculated an avg row size of 70KB. I set up the row cache only for one CF with the following settings: update column family building with rows_cached=1 and row_cache_provider='SerializingCacheProvider'; When I set up the row cache I got a promotion failure in GC (with a stop-the-world pause of about 30 secs) with the heap almost full. I am very confused by this behavior. PS: I use Cassandra 1.0.10, with JNA 3.4.0, on Ubuntu lucid (kernel 2.6.32-41)

2012/6/4 aaron morton aa...@thelastpickle.com: Yes, SerializingCacheProvider is the off-heap caching provider. Can you do some more digging into what is using the heap? Cheers A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 1/06/2012, at 9:52 PM, ruslan usifov wrote: Hello. I began using SerializingCacheProvider for row caching, and got extreme Java heap growth. But I thought this cache provider doesn't use the Java heap
Re: memory issue on 1.1.0
Had a look at the log; this message: INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families appears correct: it happens after some flush activity and there are no CFs with memtable data. But the heap is still full. Overall the server is overloaded, but it seems like it should be handling it better. What JVM settings do you have? What is the machine spec? What settings do you have for key and row cache? Do the CFs have secondary indexes? How many clients / requests per second? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote: Running a very write-intensive (new column, delete old column, etc.) process and failing on memory. Log file attached. Curiously, when I add new data I have never seen this, having in the past sent hundreds of millions of new transactions. It seems to happen when I modify. My process is as follows: key slice to get columns to modify in batches of 100; in separate threads, modify those columns. I advance the slice with the start key, each time with the last key of the previous batch. The mutations done are: update a column value in one column family (token); delete a column and add a new column in another (pan). It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small. WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory.
Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
    Number of Keys (estimate): 16334848
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 74
    Read Count: 14985122
    Read Latency: 0.408 ms.
    Write Count: 19972441
    Write Latency: 0.022 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 829
    Bloom Filter False Ratio: 0.00073
    Bloom Filter Space Used: 37048400
    Compacted row minimum size: 125
    Compacted row maximum size: 149
    Compacted row mean size: 149

    Column Family: token
    SSTable count: 4
    Space used (live): 1250973873
    Space used (total): 1250973873
    Number of Keys (estimate): 14217216
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 49
    Read Count: 30059563
    Read Latency: 0.167 ms.
    Write Count: 14985488
    Write Latency: 0.014 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 13642
    Bloom Filter False Ratio: 0.00322
    Bloom Filter Space Used: 28002984
    Compacted row minimum size: 150
    Compacted row maximum size: 258
    Compacted row mean size: 224

    Column Family: counters
    SSTable count: 2
    Space used (live): 561549994
    Space used (total): 561549994
    Number of Keys (estimate): 9985024
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 38
    Read Count: 4997947
    Read Latency: 0.092 ms.
    Write Count: 9990394
    Write Latency: 0.023 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 191
    Bloom Filter False Ratio: 0.37525
    Bloom Filter Space Used: 18741152
    Compacted row
Re: Cassandra upgrade from 0.8.1 to 1.1.0
In addition always read the NEWS.txt file in the distribution and glance at the CHANGES.txt file. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 4/06/2012, at 12:19 PM, Roshan wrote: Hi Hope this will help to you. http://www.datastax.com/docs/1.0/install/upgrading http://www.datastax.com/docs/1.1/install/upgrading Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-upgrade-from-0-8-1-to-1-1-0-tp7580198p7580210.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Node join streaming stuck at 100%
Are there any errors in the logs about failed streaming? If you are getting timeouts, 1.0.8 added a streaming socket timeout https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L323

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 3:12 PM, koji wrote:

aaron morton aaron at thelastpickle.com writes: Did you restart? All good? Cheers - Aaron Morton Freelance Developer at aaronmorton http://www.thelastpickle.com On 27/04/2012, at 9:49 AM, Bryce Godfrey wrote: This is the second node I’ve joined to my cluster in the last few days, and so far both have become stuck at 100% on a large file according to netstats. This is on 1.0.9; is there anything I can do to make it move on besides restarting Cassandra? I don’t see any errors or warns in the logs for either server, and there is plenty of disk space. On the sender side I see this:

Streaming to: /10.20.1.152
/opt/cassandra/data/MonitoringData/PropertyTimeline-hc-80540-Data.db sections=1 progress=82393861085/82393861085 - 100%

On the node joining I don’t see this file in netstats, and all pending streams are sitting at 0%.

Hi, we have the same problem (1.0.7); our netstats output is like this:

Mode: NORMAL
Streaming to: /1.1.1.1
/mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3757-Data.db sections=1234 progress=325/325 - 100%
/mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3641-Data.db sections=4386 progress=0/1025272214 - 0%
/mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3761-Data.db sections=2956 progress=0/17826723 - 0%
/mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3730-Data.db sections=3792 progress=0/56066299 - 0%
/mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3760-Data.db sections=4384 progress=0/90941161 - 0%
/mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3687-Data.db sections=3958 progress=0/54729557 - 0%
/mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3762-Data.db sections=766 progress=0/2605165 - 0%
Streaming to: /1.1.1.2
/mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-709-Data.db sections=3228 progress=29175698/29175698 - 100%
/mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-789-Data.db sections=2102 progress=0/618938 - 0%
/mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-765-Data.db sections=3044 progress=0/1996687 - 0%
/mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-788-Data.db sections=2773 progress=0/1374636 - 0%
/mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-729-Data.db sections=3150 progress=0/22111512 - 0%
Nothing streaming from /1.1.1.1
Nothing streaming from /1.1.1.2
Pool Name    Active   Pending   Completed
Commands     n/a      1         23825242
Responses    n/a      25        19644808

After restart, the pending streams are cleared, but the next time we run nodetool repair -pr the pending streams appear again. And this always happens on the same node (we have 12 nodes in total).

koji
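For reference, the streaming socket timeout added in 1.0.8 is controlled from cassandra.yaml. A hedged sketch of the relevant setting (check the option name and default against the yaml shipped with your version; the value shown is illustrative):

```yaml
# cassandra.yaml -- put a timeout on streaming connections so a hung
# stream fails and can be retried, instead of sitting at 100% forever.
# A value of 0 disables the timeout (the default in 1.0.x builds).
streaming_socket_timeout_in_ms: 600000   # e.g. 10 minutes
```

With the timeout disabled, a stream whose socket dies silently can stay "in progress" until the node is restarted, which matches the stuck-at-100% symptom above.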
Re: Retrieving old data version for a given row
This is an old issue with sstable2json https://issues.apache.org/jira/browse/CASSANDRA-4054

Internally the tombstone is associated with the o.a.c.db.AbstractColumnContainer; see o.a.c.db.RowMutation.delete() for how a row-level delete works.

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 9:58 PM, Felipe Schmidt wrote: I was taking a look at tombstones stored in the SSTable and I noticed that if I perform a key deletion, the tombstone doesn’t have any timestamp; it looks like this: “key”:[ ] At all the other deletion granularities the tombstone has a timestamp. Without this information it seems impossible to resolve conflicts when an insertion for the same key is done after this deletion. If that happens, I think Cassandra will always delete the new information because of this tombstone. I’m using a single-node configuration, and maybe that changes what tombstones look like. Thanks in advance. Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil) 2012/5/31 aaron morton aa...@thelastpickle.com: -Is there any other way to extract the content of an SSTable, for example writing a Java program instead of using sstable2json? Look at the code in sstable2json and copy it :) -I tried to get tombstones using the Thrift API, but it seems not to be possible, is that right? When I try, the program throws an exception. No. Tombstones are not returned from the API (see ColumnFamilyStore.getColumnFamily()). You can see them if you use sstable2json. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 30/05/2012, at 9:53 PM, Felipe Schmidt wrote: I have further questions: -Is there any other way to extract the content of an SSTable, for example writing a Java program instead of using sstable2json? -I tried to get tombstones using the Thrift API, but it seems not to be possible, is that right? When I try, the program throws an exception.
thanks in advance Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil) 2012/5/24 aaron morton aa...@thelastpickle.com: Ok... it's really strange to me that Cassandra doesn't support data versioning, because all the other key-value databases I know support it. You can design it into your data model if you need it. I have one remaining question: -in the case that I have more than 1 SSTable on disk for the same column but with different data versions, is it possible to make a query to get the old version instead of the newest one? No. There is only ever 1 value for a column. The older copies of the column in the SSTables are artefacts of immutable on-disk structures. If you want to see what's inside an SSTable, use bin/sstable2json. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/05/2012, at 9:42 PM, Felipe Schmidt wrote: [...] 2012/5/16 Dave Brosius dbros...@mebigfatguy.com: You're in for a world of hurt going down that rabbit hole. If you truly want versioned data then you should think about changing your keying to perhaps be a composite key of the form NaturalKey/VersionId. Or if you want the versioning at the column level, use composite columns with a ColumnName/VersionId format. On 05/16/2012 10:16 AM, Felipe Schmidt wrote: That was very helpful, thank you very much! I still have some questions: -is it possible to make Cassandra keep old values after flushing? The same question for the memtable, before flushing. It seems to me that when I update some tuple, the old data will be overwritten in the memtable, even before flushing. -is it possible to scan values from the memtable, maybe using the so-called Thrift API? Using the client API I can just see the newest data version; I can't see what's really happening with the memtable. I ask because what I'll try to do is Change Data Capture for Cassandra, and the answers will define what kind of approaches I'm able to use. Thanks in advance. Regards, Felipe Mathias Schmidt (Computer Science UFRGS, RS, Brazil) 2012/5/14 aaron morton aa...@thelastpickle.com: Cassandra does not provide access to multiple versions of the same column. It is essentially implementation detail. All mutations are written to the commit log in a binary
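Dave's NaturalKey/VersionId suggestion from the thread above can be modelled outside Cassandra. A minimal Python sketch, with plain dicts standing in for a row of composite columns; the helper names (put, latest, get_version) are illustrative, not a real client API:

```python
# Model a row whose columns are (name, version_id) composites, so every
# write creates a new column and old versions remain queryable.

def put(row, name, version_id, value):
    """Write value under a composite (name, version_id) column."""
    row[(name, version_id)] = value

def latest(row, name):
    """Return the value with the highest version_id for this name."""
    versions = [(vid, val) for (n, vid), val in row.items() if n == name]
    return max(versions)[1] if versions else None

def get_version(row, name, version_id):
    """Fetch one specific historical version, or None if absent."""
    return row.get((name, version_id))

row = {}
put(row, "email", 1, "old@example.com")
put(row, "email", 2, "new@example.com")
```

With real composite columns a slice on the name component returns all versions in order, which is what makes this pattern workable even though Cassandra itself keeps only one value per column.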
RE: 1.1 not removing commit log files?
I'll try to get some log files for this with DEBUG enabled. Tough on production though.

From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, June 04, 2012 11:15 AM To: user@cassandra.apache.org Subject: Re: 1.1 not removing commit log files?

Applying a local hint mutation follows the same code path as regular mutations. When the commit log is being truncated you should see flush activity, logged from the ColumnFamilyStore with "Enqueuing flush of ..." messages. If you set DEBUG logging for org.apache.cassandra.db.ColumnFamilyStore it will log if it thinks the CF is clean and no flush takes place. If you set DEBUG logging on org.apache.cassandra.db.commitlog.CommitLog we will see if the commit log file could not be deleted because a dirty CF was not flushed.

Cheers
A
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/06/2012, at 4:43 AM, Rob Coli wrote: On Thu, May 31, 2012 at 7:01 PM, aaron morton aa...@thelastpickle.com wrote: But that talks about segments not being cleared at startup. Does not explain why they were allowed to get past the limit in the first place. Perhaps the commit log size tracking for this limit does not, for some reason, track hints? This seems like the obvious answer given the state which appears to trigger it? This doesn't explain why the files aren't getting deleted after the hints are delivered, of course... =Rob -- =Robert Coli AIM/GTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
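To turn on the per-package DEBUG logging Aaron describes, the levels go in the stock log4j properties file. A sketch, assuming the standard 1.1 layout of conf/log4j-server.properties (the logger names are taken from the message above; verify the file path against your install):

```properties
# conf/log4j-server.properties -- targeted DEBUG for commit log diagnosis,
# without flipping the whole server to DEBUG.
log4j.logger.org.apache.cassandra.db.ColumnFamilyStore=DEBUG
log4j.logger.org.apache.cassandra.db.commitlog.CommitLog=DEBUG
```

Scoping DEBUG to just these two classes keeps the log volume manageable on a production node while still showing whether a CF was considered clean and why a segment could not be deleted.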
Mixing Ec2MultiregionSnitch with private network
Hi All, Does anyone have experience with a Cassandra deployment mixing EC2 and their own data center? We plan to use Ec2MultiRegionSnitch to build a Cassandra cluster across EC2 regions, and at the same time have a couple of nodes (in the cluster) sitting in our own data center. Any comment on whether it’s doable? Thanks. Patrick.
RE: memory issue on 1.1.0
What JVM settings do you have?
-Xms8G -Xmx8G -Xmn800m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.rmi.server.hostname=127.0.0.1 -Djava.net.preferIPv4Stack=true -Dcassandra-pidfile=cassandra.pid

What is the machine spec?
It is an RH AS5 x64 box: 16gb memory, 2 CPU cores at 2.8 GHz. As it turns out it is somewhat wimpier than I thought. While weak on CPU, it does have a good amount of memory. It is paired with a larger machine.

What settings do you have for key and row cache?
A: All the defaults (yaml template attached).

Do the CF's have secondary indexes?
A: Yes, one has two. One of them is used in the key slice used to get the row keys used to do the further mutations.

How many clients / requests per second?
A: One client process with 10 threads connected to one of the two nodes in the cluster. One thread reads the slice and puts work in a queue; 9 others read from this queue and apply the mutations. Mutations complete at roughly 20,000/minute.

From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, June 04, 2012 4:17 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0

Had a look at the log. This message:

INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families

appears correct; it happens after some flush activity and there are no CF's with memtable data. But the heap is still full. Overall the server is overloaded, but it seems like it should be handling it better.

What JVM settings do you have?
What is the machine spec?
What settings do you have for key and row cache?
Do the CF's have secondary indexes?
How many clients / requests per second?

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote: Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached. [...]
Re: about multitenant datamodel
IMHO a model that allows external users to create CF's is a bad one.

Why do you think so? I'll let users create restricted CFs, and limit the number of CFs a user can create. Is it still a bad one?

On Thu, 31 May 2012 06:44:05 +0900, aaron morton aa...@thelastpickle.com wrote:

- Do a lot of keyspaces cause some problems? (If I have 1,000 users, cassandra creates 1,000 keyspaces…)

It's not keyspaces, but the number of column families. Without storing any data each CF uses about 1MB of ram. When they start storing and reading data they use more. IMHO a model that allows external users to create CF's is a bad one. Hope that helps.

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/05/2012, at 12:52 PM, Toru Inoko wrote: Hi, all. I'm designing a data API service (like cassandra.io, but not using a dedicated server for each user) on Cassandra 1.1, on which users can run DML/DDL methods like CQL. The following are the APIs users can use (almost the same as the Cassandra API): - create/read/delete ColumnFamilies/Rows/Columns Now I'm thinking about a multitenant data model for that. My data model is like the following: I'm going to prepare a keyspace for each user as that user's tenant space.

| keyspace1 | --- | column family | (for user1) | ...
| keyspace2 | --- | column family | (for user2) | ...

My questions are: - Is this data model good for multitenancy? - Do a lot of keyspaces cause some problems? (If I have 1,000 users, cassandra creates 1,000 keyspaces...) Please help. Thank you in advance. Toru Inoko. -- SCSK Corporation, Technology, Quality & Information Group, Technology Development Dept., Advanced Technology Section, Toru Inoko tel : 03-6438-3544 mail : in...@ms.scsk.jp
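Aaron's roughly 1MB-per-CF baseline makes the 1,000-keyspace question easy to sanity-check. A rough Python estimate; the 1MB figure comes from the message above, and real usage only grows from there once data is stored and read:

```python
# Back-of-envelope heap overhead for a keyspace-per-tenant design,
# using the ~1MB idle overhead per column family quoted in the thread.

CF_BASELINE_MB = 1  # approximate idle overhead per column family

def min_overhead_mb(tenants, cfs_per_tenant):
    """Lower bound on CF memory overhead before any data is stored."""
    return tenants * cfs_per_tenant * CF_BASELINE_MB

# 1,000 tenants with 5 CFs each: at least ~5GB of heap before any data.
print(min_overhead_mb(1000, 5))  # 5000
```

This is a lower bound only; it is the scaling with tenant count, not the exact constant, that makes unrestricted CF creation by external users risky.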
Re: Mixing Ec2MultiregionSnitch with private network
Hi Patrick, I'm not sure if it's doable, but I can tell you for sure that there are lots of differences in the way the networks will need to be set up. If you've got to secure client traffic, it's going to get even more complicated with encrypted traffic, etc. We did some performance testing and configuration testing with Cassandra across regions using a virtual network (my company's product). Have a look at what we did. I think when you add in your own datacenter, things are going to get even more complicated. One of the nice things about using a virtual network in EC2 is that you can set up multiple network interfaces, so you don't have to use the multi-region snitch. These interfaces are also clever about using the real and the NAT'ed EC2 interfaces for cluster traffic (better performance and $0 EC2 data bandwidth costs), so things can be set up just like in your own datacenter without worrying about EC2's public/private IPs, NATing, etc. You can read about what we did on our blog: http://blog.vcider.com/2011/09/running-cassandra-on-a-virtual-network-in-ec2/ and http://blog.vcider.com/2011/09/virtual-networks-can-run-cassandra-up-to-60-faster/ Let me know if you have any questions. CM On Mon, Jun 4, 2012 at 3:27 PM, Patrick Lu kuma...@hotmail.com wrote: [...]
RE: memory issue on 1.1.0
I have repeated the test on two quite large machines (12-core, 64 GB AS5 boxes) and still observed the problem, interestingly at about the same point. Anything I can monitor? Perhaps I'll hook the YourKit profiler up to it to see if there is some kind of leak?

Wade

From: Poziombka, Wade L Sent: Monday, June 04, 2012 7:23 PM To: user@cassandra.apache.org Subject: RE: memory issue on 1.1.0 [...]
Re: memory issue on 1.1.0
Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741

-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote:

Running a very write intensive (new column, delete old column, etc.) process and failing on memory. Log file attached. Curiously, I have never seen this when adding new data; in the past I have sent hundreds of millions of new transactions. It seems to happen when I modify. My process is as follows: a key slice to get columns to modify, in batches of 100; in separate threads, modify those columns. I advance the slice start key with the last key in the previous batch. The mutations done are: update a column value in one column family (token), and delete a column and add a new column in another (pan). It runs well until about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
    Number of Keys (estimate): 16334848
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 74
    Read Count: 14985122
    Read Latency: 0.408 ms.
    Write Count: 19972441
    Write Latency: 0.022 ms.
    Pending Tasks: 0
    Bloom Filter False Positives: 829
    Bloom Filter False Ratio: 0.00073
    Bloom Filter Space Used: 37048400
    Compacted row minimum size: 125
    Compacted row maximum size: 149
    Compacted row mean size: 149

    Column Family: token
    SSTable count: 4
    Space used (live): 1250973873
    Space used (total): 1250973873
    Number of Keys (estimate): 14217216
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 49
    Read Count: 30059563
    Read Latency: 0.167 ms.
    Write Count: 14985488
    Write Latency: 0.014 ms.
    Pending Tasks: 0
    Bloom Filter False Positives: 13642
    Bloom Filter False Ratio: 0.00322
    Bloom Filter Space Used: 28002984
    Compacted row minimum size: 150
    Compacted row maximum size: 258
    Compacted row mean size: 224

    Column Family: counters
    SSTable count: 2
    Space used (live): 561549994
    Space used (total): 561549994
    Number of Keys (estimate): 9985024
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 38
    Read Count: 4997947
    Read Latency: 0.092 ms.
    Write Count: 9990394
    Write Latency: 0.023 ms.
    Pending Tasks: 0
    Bloom Filter False Positives: 191
    Bloom Filter False Ratio: 0.37525
    Bloom Filter Space Used: 18741152
    Compacted row minimum size: 125
    Compacted row maximum size: 179
    Compacted row mean size: 150
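The warning in the quoted log is driven by a simple heap-occupancy check. A quick Python sanity check using the figures from the ParNew GC line above; 0.75 is the usual cassandra.yaml default for flush_largest_memtables_at, so treat that constant as an assumption for your version:

```python
# GCInspector warns and triggers memtable flushes once heap usage
# crosses the flush_largest_memtables_at fraction of max heap.

FLUSH_LARGEST_MEMTABLES_AT = 0.75  # assumed cassandra.yaml default

def heap_fraction(used_bytes, max_bytes):
    return used_bytes / max_bytes

# Figures from the "GC for ParNew" line in the log above:
frac = heap_fraction(7345969520, 8506048512)
print(round(frac, 2))  # 0.86 -- above the 0.75 threshold, so flushes fire
```

With no dirty column families left to flush ("Unable to reduce heap usage since there are no dirty column families"), the heap pressure must be coming from something other than memtables, which is consistent with the accumulating deletes Brandon points at.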
Re: How to use Hector to retrieve data from Cassandra
Please refer to the following URL; you can find some examples of how to use Hector: https://github.com/zznate/hector-examples/tree/master/src/main/java/com/riptano/cassandra/hector/example

Toru

On Tue, 05 Jun 2012 13:08:31 +0900, Prakrati Agrawal prakrati.agra...@mu-sigma.com wrote: Dear all, I am unable to find a good elaborate example of how to use Hector to get data stored in Cassandra. Please help me. Thanks and Regards Prakrati Agrawal | Developer - Big Data (ID) | 9731648376 | www.mu-sigma.com

-- SCSK Corporation Toru Inoko tel : 03-6438-3544 mail : in...@ms.scsk.jp
nodetool repair -pr enough in this scenario?
Hello, Currently I have a 4 node cassandra cluster on CentOS64. I have been running nodetool repair (no -pr option) on a weekly schedule like: Host1: Tue, Host2: Wed, Host3: Thu, Host4: Fri In this scenario, if I were to add the -pr option, would this still be sufficient to prevent forgotten deletes and properly maintain consistency? Thank you, - David
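The usual answer is yes, provided every node runs it: with -pr each node repairs only its primary range, and the primary ranges of all nodes tile the ring exactly once, so each range still gets repaired each week (each run should also complete within gc_grace_seconds). A toy Python model of why the coverage works; the tokens are illustrative, not real Cassandra tokens:

```python
# With -pr a node repairs only its primary range (previous_token, token].
# Across all nodes these ranges partition the ring, so `repair -pr` on
# every node covers every range once, instead of RF times as with a
# full repair on every node.

def primary_ranges(tokens):
    """Return each node's primary range as (predecessor_token, token)."""
    ring = sorted(tokens)
    return [(ring[i - 1], ring[i]) for i in range(len(ring))]

tokens = [0, 25, 50, 75]  # toy 4-node ring
ranges = primary_ranges(tokens)
# ranges holds the wrap-around (75, 0) plus (0, 25), (25, 50), (50, 75),
# which together cover the whole ring with no overlap.
```

The staggered Tue-Fri schedule is fine with -pr for exactly this reason: by the end of the week, every range has been repaired exactly once.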