Can not find auto bootstrap property in cassandra.yaml for Cassandra 1.1.0

2012-06-04 Thread Prakrati Agrawal
Dear all

I am trying to add a new node to the Cassandra cluster. All the 
documentation available on the net says to set the auto bootstrap property in 
cassandra.yaml to true, but I cannot find the property in the file. Please 
help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com



This email message may contain proprietary, private and confidential 
information. The information transmitted is intended only for the person(s) or 
entities to which it is addressed. Any review, retransmission, dissemination or 
other use of, or taking of any action in reliance upon, this information by 
persons or entities other than the intended recipient is prohibited and may be 
illegal. If you received this in error, please contact the sender and delete 
the message from your system.

Mu Sigma takes all reasonable steps to ensure that its electronic 
communications are free from viruses. However, given Internet accessibility, 
the Company cannot accept liability for any virus introduced by this e-mail or 
any attachment and you are advised to use up-to-date virus checking software.


Re: Can not find auto bootstrap property in cassandra.yaml for Cassandra 1.1.0

2012-06-04 Thread Pushpalanka Jayawardhana
Hi,

I met with that problem too. You can safely skip that step for the latest
versions; I read somewhere that it is set to true by default in recent
releases.





-- 
Pushpalanka Jayawardhana | Undergraduate | Computer Science and Engineering
University of Moratuwa

+94779716248 | http://pushpalankajaya.blogspot.com

Twitter: http://twitter.com/Pushpalanka | Slideshare:
http://www.slideshare.net/Pushpalanka


Re: Can not find auto bootstrap property in cassandra.yaml for Cassandra 1.1.0

2012-06-04 Thread Roshni Rajagopal
Hi Prakrati,

In 1.1.0 you don't need to set this; it's the default. I'm also on 1.1.0 and I 
didn't need to set it.


Regards,
Roshni


This email and any files transmitted with it are confidential and intended 
solely for the individual or entity to whom they are addressed. If you have 
received this email in error destroy it immediately. *** Walmart Confidential 
***


RE: Can not find auto bootstrap property in cassandra.yaml for Cassandra 1.1.0

2012-06-04 Thread Rishabh Agrawal
Hello,


Auto bootstrap, as an attribute, is not present in the default cassandra.yaml of 
recent Cassandra versions (1.0 and later). You can add 'auto_bootstrap: true' 
yourself, or leave initial_token blank to make bootstrapping happen. As far as I 
have noticed, it defaults to true.
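For readers looking for the exact lines: a minimal sketch of what you could add to cassandra.yaml, assuming a 1.x layout (neither line ships in the stock file; the default-to-true behaviour described above is as reported, not guaranteed):

```yaml
# auto_bootstrap is absent from the stock 1.x cassandra.yaml; add it
# explicitly if you want the behaviour spelled out. It reportedly
# defaults to true.
auto_bootstrap: true

# Leaving initial_token blank lets the joining node pick a token and
# bootstrap itself.
initial_token:
```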
Regards
Rishabh Agrawal





Register for Impetus webinar 'User Experience Design for iPad Applications' 
June 8(10:00am PT). http://lf1.me/f9/

Impetus' Head of Labs to present on 'Integrating Big Data technologies in your 
IT portfolio' at Cloud Expo, NY (June 11-14). Contact us for a complimentary 
pass.Impetus also sponsoring the Yahoo Summit 2012.


NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.


Getting error on adding seed to add a new node

2012-06-04 Thread Prakrati Agrawal
Dear all,

I am trying to add a new node to my existing one-node Cassandra cluster. So I 
edited the seeds value in cassandra.yaml and added the IP addresses of both 
nodes. But it's giving me the following error:

ERROR 13:16:48,342 Fatal configuration error
error while parsing a block mapping
 in "<reader>", line 164, column 13:
            - seeds: 162.192.100.16,162.192 ...
            ^
expected <block end>, but found FlowEntry
 in "<reader>", line 164, column 36:
            - seeds: 162.192.100.16,162.192.100.48
                                   ^

Please help me.
Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com





Re: Getting error on adding seed to add a new node

2012-06-04 Thread Pushpalanka Jayawardhana
Hi,

As the comment in cassandra.yaml says (# Ex: "ip1,ip2,ip3"), you need to
define the whole list as one string.
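Concretely, for the seed list from the error in the original message, the relevant cassandra.yaml fragment would look something like this (a sketch assuming the stock SimpleSeedProvider layout):

```yaml
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # The whole list must be ONE quoted string. An unquoted
          # "ip1,ip2" is parsed as a YAML flow entry and produces the
          # "expected block end, but found FlowEntry" error.
          - seeds: "162.192.100.16,162.192.100.48"
```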








Finding whether a new node is successfully added or not

2012-06-04 Thread Prakrati Agrawal
Dear all,

I added a new node to my 1-node Cassandra cluster. Now I want to find out 
whether it was added successfully or not. Also, do I need to restart the 
already-running node after entering the seed value? Please help me.

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com





Re: Finding whether a new node is successfully added or not

2012-06-04 Thread R. Verlangen
Hi there,

You can check the ring info with nodetool. Furthermore, you can take a look
at the streaming statistics: lots of pending streams indicates a node that is
still receiving data from its seed(s). As far as I'm aware, the seed value
is read on startup, so a restart is required.

Good luck.





-- 
With kind regards,

Robin Verlangen
Software engineer
W www.robinverlangen.nl
E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.


Re: Finding whether a new node is successfully added or not

2012-06-04 Thread Pushpalanka Jayawardhana
Hi Prakrati,


bin/nodetool -host <ip> ring

Refer to the Cassandra wiki (http://wiki.apache.org/cassandra/NodeTool) for
more details. As far as I know, a restart is needed, since the node has to
communicate with the seeds to make sense of the cluster it is in.








Re: row_cache_provider = 'SerializingCacheProvider'

2012-06-04 Thread aaron morton
Yes, SerializingCacheProvider is the off-heap caching provider.

Can you do some more digging into what is using the heap?

Cheers
A

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/06/2012, at 9:52 PM, ruslan usifov wrote:

 Hello
 
 I began using SerializingCacheProvider for row caching, and got extreme
 Java heap growth. But I think that this cache provider doesn't use the
 Java heap.



Re: batch isolation

2012-06-04 Thread Sylvain Lebresne
On Sun, Jun 3, 2012 at 6:05 PM, Todd Burruss bburr...@expedia.com wrote:
 I just meant there is a row delete in the same batch as inserts - all to
 the same column family and key

Then it's the timestamp that decides what happens. Whatever has a
timestamp lower than or equal to the tombstone's timestamp will be deleted
(that holds for inserts in the batch itself).
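As a toy illustration of that rule (a sketch only, not Cassandra's actual reconciliation code; the class and method names are invented):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of row-tombstone reconciliation: a column survives only if
// its timestamp is strictly greater than the tombstone's timestamp.
public class TombstoneModel {
    public static Map<String, Long> reconcile(Map<String, Long> columns,
                                              long tombstoneTs) {
        Map<String, Long> survivors = new HashMap<>();
        for (Map.Entry<String, Long> e : columns.entrySet()) {
            if (e.getValue() > tombstoneTs) { // ts <= tombstone: deleted
                survivors.put(e.getKey(), e.getValue());
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        Map<String, Long> cols = new HashMap<>();
        cols.put("a", 100L); // same timestamp as the delete: removed
        cols.put("b", 101L); // newer than the delete: kept
        System.out.println(reconcile(cols, 100L).keySet()); // prints [b]
    }
}
```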

--
Sylvain




 -Original Message-
 From: Sylvain Lebresne [sylv...@datastax.com]
 Received: Sunday, 03 Jun 2012, 3:44am
 To: user@cassandra.apache.org [user@cassandra.apache.org]
 Subject: Re: batch isolation

 On Sun, Jun 3, 2012 at 2:53 AM, Todd Burruss bburr...@expedia.com wrote:
 1 – does this mean that a batch_mutate that first sends a row delete
 mutation on key X, then subsequent insert mutations for key X is isolated?

 I'm not sure what you mean by having a batch_mutate that first sends
 ... then ..., since a batch_mutate is a single API call.

 2 – does isolation span column families for the same key  within the same
 batch_mutate?

 No, it doesn't span column families (contrarily to atomicity). There
 is more details in
 http://www.datastax.com/dev/blog/row-level-isolation.

 --
 Sylvain


Adding a new node to Cassandra cluster

2012-06-04 Thread Prakrati Agrawal
Dear all

I successfully added a new node to my cluster, so now it's a 2-node cluster. But 
how do I mention it in my Java code? When I am retrieving data, it retrieves 
only from the one node that I specify as localhost. How do I specify more 
than one node instead of just localhost?

Please help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com





Re: Adding a new node to Cassandra cluster

2012-06-04 Thread R. Verlangen
Hi there,

When you speak to one node it will internally redirect the request to the
proper node (local / external), but you won't be able to fail over on a
crash of the localhost.
For adding another node to the connection pool, you should take a look at
the documentation of your Java client.

Good luck!







RE: Adding a new node to Cassandra cluster

2012-06-04 Thread Prakrati Agrawal
Hi,

I am using the Thrift API and I am not able to find anything on the internet 
about how to configure it for multiple nodes. I am not using any proper client 
like Hector.

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com



Re: Adding a new node to Cassandra cluster

2012-06-04 Thread R. Verlangen
You might consider using a higher-level client (like Hector, indeed). If you
don't want this, you will have to write your own connection pool. For a start,
take a look at Hector. But keep in mind that you might be reinventing the
wheel.
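At a minimum, such a home-grown pool needs a host rotation like the sketch below (class and host names are invented; a real pool also needs connection reuse, timeouts, and health checks, which clients such as Hector already provide):

```java
import java.util.Arrays;
import java.util.List;

// Minimal round-robin host selector for a hand-rolled Thrift client pool.
public class RoundRobinHosts {
    private final List<String> hosts;
    private int next = 0;

    public RoundRobinHosts(List<String> hosts) {
        this.hosts = hosts;
    }

    // Return the next host in rotation. A caller whose connection attempt
    // fails simply calls nextHost() again to fail over to the next node.
    public synchronized String nextHost() {
        String host = hosts.get(next);
        next = (next + 1) % hosts.size();
        return host;
    }

    public static void main(String[] args) {
        RoundRobinHosts pool =
                new RoundRobinHosts(Arrays.asList("10.0.0.1", "10.0.0.2"));
        System.out.println(pool.nextHost()); // 10.0.0.1
        System.out.println(pool.nextHost()); // 10.0.0.2
        System.out.println(pool.nextHost()); // 10.0.0.1 again
    }
}
```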






Query

2012-06-04 Thread MOHD ARSHAD SALEEM
Hi all,

I wanted to know how to read and write data using the Cassandra APIs. Is there 
any link to a sample program?

Regards
Arshad


Re: Adding a new node to Cassandra cluster

2012-06-04 Thread Roshni Rajagopal
Prakrati,

I believe that even though you specify one node in your code, internally the 
request would be going to one or more nodes, based on your replication factor 
and consistency level settings.
You can try this by connecting to one node and writing to it, and then reading 
the same data from another node. You can see this replication happening via the 
CLI as well.

Regards,
Roshni


From: R. Verlangen ro...@us2.nlmailto:ro...@us2.nl
Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Date: Mon, 4 Jun 2012 02:30:40 -0700
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Adding a new node to Cassandra cluster

You might consider using a higher level client (like Hector indeed). If you 
don't want this you will have to write your own connection pool. For start take 
a look at Hector. But keep in mind that you might be reinventing the wheel.

2012/6/4 Prakrati Agrawal 
prakrati.agra...@mu-sigma.commailto:prakrati.agra...@mu-sigma.com
Hi,

I am using Thrift API and I am not able to find anything on the internet about 
how to configure it for multiple nodes. I am not using any proper client like 
Hector.

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | 
www.mu-sigma.comhttp://www.mu-sigma.com

From: R. Verlangen [mailto:ro...@us2.nlmailto:ro...@us2.nl]
Sent: Monday, June 04, 2012 2:44 PM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Adding a new node to Cassandra cluster

Hi there,

When you speak to one node it will internally redirect the request to the 
proper node (local / external): but you won't be able to failover on a crash of 
the localhost.
For adding another node to the connection pool you should take a look at the 
documentation of your java client.

Good luck!

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com
Dear all

I successfully added a new node to my cluster, so now it's a two-node cluster. 
But how do I reference it in my Java code? When I retrieve data, it retrieves 
only from the one node I specify as localhost. How do I specify more than one 
node instead of just localhost?

Please help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com






--
With kind regards,

Robin Verlangen
Software engineer

W www.robinverlangen.nl
E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.







RE: Query

2012-06-04 Thread Rishabh Agrawal
If you are using Java, try Kundera or Hector; both are good and have good 
documentation available.

From: MOHD ARSHAD SALEEM [mailto:marshadsal...@tataelxsi.co.in]
Sent: Monday, June 04, 2012 2:37 AM
To: user@cassandra.apache.org
Subject: Query

Hi all,

I wanted to know how to read and write data using Cassandra's APIs. Is there 
any link to a sample program?

Regards
Arshad



Register for Impetus webinar 'User Experience Design for iPad Applications' 
June 8(10:00am PT). http://lf1.me/f9/

Impetus' Head of Labs to present on 'Integrating Big Data technologies in your 
IT portfolio' at Cloud Expo, NY (June 11-14). Contact us for a complimentary 
pass.Impetus also sponsoring the Yahoo Summit 2012.


NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.


Re: Adding a new node to Cassandra cluster

2012-06-04 Thread samal
If you use the Thrift API, you have to maintain a lot of low-level code
yourself that is already polished in high-level clients (HLCs) such as Hector
and pycassa. With an HLC you can also easily switch between Thrift and the
growing CQL.

On Mon, Jun 4, 2012 at 3:00 PM, R. Verlangen ro...@us2.nl wrote:

 You might consider using a higher level client (like Hector indeed). If
 you don't want this you will have to write your own connection pool. For
 start take a look at Hector. But keep in mind that you might be
 reinventing the wheel.






Re: Query

2012-06-04 Thread Amresh Singh
Here is a link that will help you out if you use Kundera as high level
client for Cassandra:

https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes

Regards,
Amresh

On Mon, Jun 4, 2012 at 3:09 PM, Rishabh Agrawal 
rishabh.agra...@impetus.co.in wrote:

  If you are using Java try out Kundera or Hector, both are good and have
 good documentation available.



 *From:* MOHD ARSHAD SALEEM [mailto:marshadsal...@tataelxsi.co.in]
 *Sent:* Monday, June 04, 2012 2:37 AM
 *To:* user@cassandra.apache.org
 *Subject:* Query



 Hi all,

 I wanted to know how to read and write data using cassandra API's . is
 there any link related to sample program .

 Regards
 Arshad




Re: Query

2012-06-04 Thread Franc Carter
On Mon, Jun 4, 2012 at 7:36 PM, MOHD ARSHAD SALEEM 
marshadsal...@tataelxsi.co.in wrote:

  Hi all,

 I wanted to know how to read and write data using cassandra API's . is
 there any link related to sample program .


I did a proof of concept using a Python client, pycassa
(https://github.com/pycassa/pycassa), which works well.

cheers


 Regards
 Arshad




-- 

*Franc Carter* | Systems architect | Sirca Ltd

franc.car...@sirca.org.au | www.sirca.org.au

Tel: +61 2 9236 9118

Level 9, 80 Clarence St, Sydney NSW 2000

PO Box H58, Australia Square, Sydney NSW 1215


RE: Adding a new node to Cassandra cluster

2012-06-04 Thread Prakrati Agrawal
Yes, I know I am trying to reinvent the wheel, but I have to. The requirement is 
that I use the Java Thrift API without any client like Hector. Can you 
please tell me how to do it?

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: samal [mailto:samalgo...@gmail.com]
Sent: Monday, June 04, 2012 3:12 PM
To: user@cassandra.apache.org
Subject: Re: Adding a new node to Cassandra cluster

If you use thrift API, you have to maintain lot of low level code by yourself 
which is already being polished by HLC  hector, pycassa also with HLC your can 
easily switch between thrift and growing CQL.
On Mon, Jun 4, 2012 at 3:00 PM, R. Verlangen ro...@us2.nl wrote:
You might consider using a higher level client (like Hector indeed). If you 
don't want this you will have to write your own connection pool. For start take 
a look at Hector. But keep in mind that you might be reinventing the wheel.

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com
Hi,

I am using Thrift API and I am not able to find anything on the internet about 
how to configure it for multiple nodes. I am not using any proper client like 
Hector.

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com

From: R. Verlangen [mailto:ro...@us2.nl]
Sent: Monday, June 04, 2012 2:44 PM
To: user@cassandra.apache.org
Subject: Re: Adding a new node to Cassandra cluster

Hi there,

When you speak to one node it will internally redirect the request to the 
proper node (local / external): but you won't be able to failover on a crash of 
the localhost.
For adding another node to the connection pool you should take a look at the 
documentation of your java client.

Good luck!

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com
Dear all

I successfully added a new node to my cluster so now it’s a 2 node cluster. But 
how do I mention it in my Java code as when I am retrieving data its retrieving 
only for one node that I am specifying in the localhost. How do I specify more 
than one node in the localhost.

Please help me

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com






--
With kind regards,

Robin Verlangen
Software engineer

W www.robinverlangen.nl
E ro...@us2.nl








Re: Adding a new node to Cassandra cluster

2012-06-04 Thread R. Verlangen
Connection pooling involves things like:
- (transparent) failover / retry
- disposal of connections after X messages
- keeping track of connections

Again: take a look at the hector connection pool. Source:
https://github.com/rantav/hector/tree/master/core/src/main/java/me/prettyprint/cassandra/connection
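Since the requirement rules out Hector, the features above can still be borrowed as a design. Here is a minimal, language-agnostic sketch of round-robin host selection with transparent failover and connection tracking — the class, the `connect` factory, and the host names are all invented for illustration; in a real Thrift client the factory would open a TSocket/TTransport instead:

```python
import itertools

class RoundRobinPool:
    """Minimal sketch of a client-side connection pool: rotate through
    hosts, retry on failure, and keep track of borrowed connections."""

    def __init__(self, hosts, connect, max_retries=3):
        self.hosts = itertools.cycle(hosts)   # round-robin host selection
        self.connect = connect                # factory: host -> connection
        self.max_retries = max_retries
        self.open_connections = []            # keep track of live connections

    def borrow(self):
        # Transparent failover: on a connect failure, move to the next host.
        last_error = None
        for _ in range(self.max_retries):
            host = next(self.hosts)
            try:
                conn = self.connect(host)
                self.open_connections.append(conn)
                return conn
            except ConnectionError as e:
                last_error = e                # dead node: try the next one
        raise last_error

def fake_connect(host):
    # Stand-in for Thrift transport setup; "node2" simulates a crashed node.
    if host == "node2":
        raise ConnectionError("node2 is down")
    return ("connection", host)

pool = RoundRobinPool(["node1", "node2", "node3"], fake_connect)
print(pool.borrow())  # ('connection', 'node1')
print(pool.borrow())  # node2 fails, so this fails over to node3
```

A production pool would also need health checks, connection disposal after N uses, and thread safety — which is exactly the code Hector already maintains.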

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com

  Ye I know I am trying to reinvent the wheel but I have to. The
 requirement is such that I have to use Java Thrift API without any client
 like Hector. Can you please tell me how do I do it.

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com


Re: Retrieving old data version for a given row

2012-06-04 Thread Felipe Schmidt
I was taking a look at the tombstones stored in an SSTable and I noticed that
if I perform a key deletion, the tombstone doesn't have any timestamp; it
looks like this:
"key": [ ]
At all the other deletion granularities the tombstone has a timestamp. Without
this information it seems impossible to resolve conflicts when an insertion
for the same key is done after this deletion. If that happens, I think
Cassandra will always delete the new information because of this tombstone.
I'm using a single-node configuration, which may change how the tombstones
look.

Thanks in advance.

Regards,
Felipe Mathias Schmidt
(Computer Science UFRGS, RS, Brazil)
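For what it's worth, conflict resolution between a tombstone and a later insert is, as far as I understand it, plain last-write-wins on timestamps — a row-level tombstone does carry a deletion timestamp internally even if a particular sstable2json rendering omits it. A toy sketch of that reconciliation, with made-up record shapes (not Cassandra's internal structures):

```python
def reconcile(a, b):
    """Last-write-wins: the record with the higher timestamp survives.
    A tombstone is just a (deleted, timestamp) marker competing like
    any other write."""
    return a if a["timestamp"] >= b["timestamp"] else b

tombstone = {"deleted": True, "value": None, "timestamp": 100}
later_insert = {"deleted": False, "value": "new data", "timestamp": 200}

winner = reconcile(tombstone, later_insert)
print(winner["value"])  # new data: the later insert beats the older tombstone
```

So an insert written after the delete (with a higher timestamp) is not shadowed by the tombstone.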





2012/5/31 aaron morton aa...@thelastpickle.com

 -Is there any other way to stract the contect of SSTable, writing a
 java program for example instead of using sstable2json?

 Look at the code in sstale2json and copy it :)

 -I tried to get tombstons using the thrift API, but seems to be not
 possible, is it right? When I try, the program throws an exception.

 No.
 Tombstones are not returned from API (See
 ColumnFamilyStore.getColumnFamily() ).
 You can see them if you use sstable2json.

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 30/05/2012, at 9:53 PM, Felipe Schmidt wrote:

 I have further questions:
 -Is there any other way to stract the contect of SSTable, writing a
 java program for example instead of using sstable2json?
 -I tried to get tombstons using the thrift API, but seems to be not
 possible, is it right? When I try, the program throws an exception.

 thanks in advance

 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)




 2012/5/24 aaron morton aa...@thelastpickle.com:

 Ok... it's really strange to me that Cassandra doesn't support data
 versioning cause all of other key-value databases support it (at least
 those who I know).

 You can design it into your data model if you need it.

 I have one remaining question:
 -in the case that I have more than 1 SSTable in the disk for the same
 column but with different data versions, is it possible to make a
 query to get the old version instead of the newest one?

 No.
 There is only ever 1 value for a column.
 The older copies of the column in the SSTables are artefacts of immutable
 on disk structures.
 If you want to see what's inside an SSTable use bin/sstable2json

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 24/05/2012, at 9:42 PM, Felipe Schmidt wrote:

 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)

 2012/5/16 Dave Brosius dbros...@mebigfatguy.com:

 You're in for a world of hurt going down that rabbit hole. If you truly
 want version data then you should think about changing your keying to
 perhaps be a composite key where key is of form

 NaturalKey/VersionId

 Or if you want the versioning at the column level, use composite columns
 with ColumnName/VersionId format

 On 05/16/2012 10:16 AM, Felipe Schmidt wrote:

 That was very helpful, thank you very much!

 I still have some questions:
 -is it possible to make Cassandra keep old value data after flushing?
 The same question for the memtable, before flushing. It seems to me that
 when I update some tuple, the old data will be overwritten in the
 memtable, even before flushing.
 -is it possible to scan values from the memtable, maybe using the
 so-called Thrift API? Using the client API I can just see the newest
 data version; I can't see what's really happening with the memtable.

 I ask that because what I'll try to do is Change Data Capture for
 Cassandra, and the answers will define what kind of approaches I'm able
 to use.

 Thanks in advance.

 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)

 2012/5/14 aaron morton aa...@thelastpickle.com:

 Cassandra does not provide access to multiple versions of the same
 column. It is essentially implementation detail.

 All mutations are written to the commit log in a binary format, see the
 o.a.c.db.RowMutation.getSerializedBuffer() (If you want to tail it for
 analysis you may want to change commitlog_sync in cassandra.yaml)

 Here is a post about looking at multiple versions of columns in an
 sstable: http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/

 Remember that not all versions of a column are written to disk
 (see http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/).
 Also
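Dave's NaturalKey/VersionId suggestion can be mocked up outside Cassandra to see what it buys you. This is an illustrative sketch only — a plain dict stands in for a column family, and the class and method names are invented, not Cassandra API code:

```python
class VersionedStore:
    """Sketch of version-aware keying: each write goes to a composite
    key (natural_key, version), so old versions stay addressable."""

    def __init__(self):
        self.rows = {}               # (natural_key, version) -> value
        self.latest = {}             # natural_key -> highest version id

    def put(self, key, value):
        # Allocate the next version id instead of overwriting in place.
        version = self.latest.get(key, 0) + 1
        self.rows[(key, version)] = value
        self.latest[key] = version
        return version

    def get(self, key, version=None):
        # Default to the newest version; pass an explicit id for history.
        if version is None:
            version = self.latest[key]
        return self.rows[(key, version)]

store = VersionedStore()
store.put("user:42", "v1 data")
store.put("user:42", "v2 data")
print(store.get("user:42"))      # v2 data
print(store.get("user:42", 1))   # v1 data
```

In Cassandra terms the (key, version) pair would map to a composite row key or composite column name; the point is that versioning becomes part of the data model rather than relying on SSTable internals.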



Re: row_cache_provider = 'SerializingCacheProvider'

2012-06-04 Thread ruslan usifov
I have set up a 5GB Java heap with the following tuning:

MAX_HEAP_SIZE=5G
HEAP_NEWSIZE=800M

JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=5"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:CMSFullGCsBeforeCompaction=1"

I also set up 2GB for memtables (memtable_total_space_in_mb: 2048)


My average heap usage (nodetool -h localhost info) is about 3G.

Based on nodetool -h localhost cfhistograms I calculated an average row size
of about 70KB.

I set up the row cache for only one CF, with the following settings:

update column family building with rows_cached=1 and
row_cache_provider='SerializingCacheProvider';


When I enabled the row cache I got a promotion failure in GC (with a
stop-the-world pause of about 30 seconds) with the heap almost filled. I am
very confused by this behavior.


PS: I use Cassandra 1.0.10, with JNA 3.4.0, on Ubuntu Lucid (kernel 2.6.32-41)


2012/6/4 aaron morton aa...@thelastpickle.com:
 Yes SerializingCacheProvider is the off heap caching provider.

 Can you do some more digging into what is using the heap ?

 Cheers
 A

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 1/06/2012, at 9:52 PM, ruslan usifov wrote:

 Hello

 I began using SerializingCacheProvider for row caching and got extreme
 Java heap growth. But I thought this cache provider doesn't use the
 Java heap.




Which client to use for Cassandra real time insertion and retrieval

2012-06-04 Thread Prakrati Agrawal
Dear all,

I am trying to explore Cassandra for real-time applications. Can you please 
suggest which client is best to use? Is the client choice based on the user's 
comfort level or on use cases?

Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 | www.mu-sigma.com





RPM of Cassandra 1.1.0

2012-06-04 Thread Adeel Akbar
Hi,

 

I need to install Apache Cassandra 1.1.0 from an RPM. Please provide me a link
to download the RPM for CentOS.

 

Thanks & Regards

 

Adeel Akbar



repair

2012-06-04 Thread Tamar Fraenkel
Hi!
I apologize for this naive question.
When I run nodetool repair, is it enough to run on one of the nodes, or do
I need to run on each one of them?
Thanks

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

RE: repair

2012-06-04 Thread Rishabh Agrawal
Hello,

As far as I know, it works on a per-node basis, so you have to run it on each 
node. I would suggest not executing it simultaneously on all nodes in a 
production environment.

Regards
Rishabh Agrawal

From: Tamar Fraenkel [mailto:ta...@tok-media.com]
Sent: Monday, June 04, 2012 4:25 AM
To: user@cassandra.apache.org
Subject: repair

Hi!
I apologize for this naive question.
When I run nodetool repair, is it enough to run on one of the nodes, or do I 
need to run on each one of them?
Thanks

Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956







RE repair

2012-06-04 Thread Samuel CARRIERE
Hi,
It is not enough to run the repair on one node, unless that node contains 
all the data (e.g. a 3-node cluster with RF=3).
In the general case, it is best to launch the repair on every node, with 
the -pr option (use -pr to repair only the primary range assigned to the 
node by the partitioner).
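One common way to run this is to stagger the per-node repairs with cron so they never overlap. A hypothetical sketch of crontab entries — the hours, weekly schedule, and log path are placeholders to adapt to your cluster size and gc_grace_seconds:

```
# node1's crontab: weekly repair of this node's primary range, Sunday 01:00
0 1 * * 0  nodetool -h localhost repair -pr >> /var/log/cassandra/repair.log 2>&1

# node2's crontab: same job, offset by a few hours so repairs don't run simultaneously
0 5 * * 0  nodetool -h localhost repair -pr >> /var/log/cassandra/repair.log 2>&1
```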





From: Tamar Fraenkel ta...@tok-media.com
Date: 04/06/2012 13:24
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: repair






Hi!
I apologize for this naive question.
When I run nodetool repair, is it enough to run on one of the nodes, or do 
I need to run on each one of them?
Thanks

Tamar Fraenkel 
Senior Software Engineer, TOK Media 



ta...@tok-media.com
Tel:   +972 2 6409736 
Mob:  +972 54 8356490 
Fax:   +972 2 5612956 





RE RPM of Cassandra 1.1.0

2012-06-04 Thread Samuel CARRIERE
Hi,
The RPM from datastax : 
http://rpm.datastax.com/community/noarch/apache-cassandra11-1.1.0-2.noarch.rpm 

Regards,
Samuel




Adeel Akbar adeel.ak...@panasiangroup.com 
04/06/2012 13:20
Please reply to: user@cassandra.apache.org

To: user@cassandra.apache.org
Subject: RPM of Cassandra 1.1.0






Hi,

I need to install Apache Cassandra 1.1.0 from RPM. Please provide me a link 
to download the RPM for CentOS.

Thanks & Regards
Adeel Akbar


Re: repair

2012-06-04 Thread Tamar Fraenkel
Thanks.

I actually did just that with cron jobs running on different hours.

I asked the question because I saw that when one of the nodes was running
the repair, all nodes logged some repair-related entries in /var/log/
cassandra/system.log

Thanks again,
*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Mon, Jun 4, 2012 at 2:35 PM, Rishabh Agrawal 
rishabh.agra...@impetus.co.in wrote:

  Hello,



 As far as my knowledge goes, it works on a per-node basis. So you have to run
 on different nodes. I would suggest you not to execute it simultaneously
 on all nodes in a production environment.



 Regards

 Rishabh Agrawal



 *From:* Tamar Fraenkel [mailto:ta...@tok-media.com]
 *Sent:* Monday, June 04, 2012 4:25 AM
 *To:* user@cassandra.apache.org
 *Subject:* repair



 Hi!

 I apologize for this naive question.

 When I run nodetool repair, is it enough to run on one of the nodes, or
 do I need to run on each one of them?

 Thanks


   *Tamar Fraenkel *
 Senior Software Engineer, TOK Media



 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956








Re: repair

2012-06-04 Thread Romain HARDOUIN
Run repair -pr in your cron.

Tamar Fraenkel ta...@tok-media.com wrote on 04/06/2012 13:44:32:

 Thanks. 
 
 I actually did just that with cron jobs running on different hours.
 
 I asked the question because I saw that when one of the logs was 
 running the repair, all nodes logged some repair related entries in 
 /var/log/cassandra/system.log
 
 Thanks again,
 Tamar Fraenkel 
 Senior Software Engineer, TOK Media 

Which client to use for Cassandra real time insertion and retrieval

2012-06-04 Thread Samuel CARRIERE
I'm assuming you are looking for a java client.
From my own experience, Hector is a good client that can be used in real 
time applications (it supports connection pooling and automatic retries).
But I would suggest having a look at Astyanax from Netflix 
(https://github.com/Netflix/astyanax). I didn't have the opportunity to 
use it, 
but it seems VERY good.

Regards,
Samuel

Re: repair

2012-06-04 Thread R. Verlangen
The repair -pr only repairs the node's primary range, so it is only useful
in day-to-day use. When you're recovering from a crash, use it without -pr.

2012/6/4 Romain HARDOUIN romain.hardo...@urssaf.fr


 Run repair -pr in your cron.

 Tamar Fraenkel ta...@tok-media.com wrote on 04/06/2012 13:44:32:

  Thanks.
 
  I actually did just that with cron jobs running on different hours.
 
  I asked the question because I saw that when one of the logs was
  running the repair, all nodes logged some repair related entries in
  /var/log/cassandra/system.log
 
  Thanks again,
  Tamar Fraenkel
  Senior Software Engineer, TOK Media




-- 
With kind regards,

Robin Verlangen
*Software engineer*
W www.robinverlangen.nl
E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.


Re: repair

2012-06-04 Thread Tamar Fraenkel
Thank you all!
*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Mon, Jun 4, 2012 at 3:16 PM, R. Verlangen ro...@us2.nl wrote:

 The repair -pr only repairs the node's primary range, so it is only useful
 in day-to-day use. When you're recovering from a crash, use it without -pr.


 2012/6/4 Romain HARDOUIN romain.hardo...@urssaf.fr


 Run repair -pr in your cron.

 Tamar Fraenkel ta...@tok-media.com wrote on 04/06/2012 13:44:32:

  Thanks.
 
  I actually did just that with cron jobs running on different hours.
 
  I asked the question because I saw that when one of the logs was
  running the repair, all nodes logged some repair related entries in
  /var/log/cassandra/system.log
 
  Thanks again,
  Tamar Fraenkel
  Senior Software Engineer, TOK Media




 --
 With kind regards,

 Robin Verlangen
 *Software engineer*
 W www.robinverlangen.nl
 E ro...@us2.nl


Re: Integration Testing for Cassandra

2012-06-04 Thread David McNelis
That article is a good starting point. To make your life a bit easier,
consider checking out CassandraUnit, which provides facilities to load
example data in a variety of ways.

https://github.com/jsevellec/cassandra-unit

Then you just need to be able to pass in which cassandra instance to
connect to before you execute your code (embedded versus external
environment).

On Mon, Jun 4, 2012 at 12:10 AM, Eran Chinthaka Withana 
eran.chinth...@gmail.com wrote:

 Hi,

 I want to write integration tests related to my cassandra code where
 instead of accessing production clusters I should be able to start an
 embedded cassandra instance, within my unit test code, populate some data
 and run test cases.

 I found this[1]  as the closest to what I'm looking for (I prefer to use
 thrift API so didn't even think about using storage proxy API). I'm using
 Hector 1.0.x as my client to connect my cassandra 1.0.x clusters. Before I
 go ahead and use it, is this the recommended way to test Cassandra related
 client code? Are there any test utils already in Cassandra code base? I
 really appreciate if someone can shed some light here.

 [1]
 http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/

 Thanks,
 Eran Chinthaka Withana



RE: repair

2012-06-04 Thread Viktor Jevdokimov
Why without -PR when recovering from crash?

Repair without -PR runs full repair of the cluster; the node which receives the 
command is the repair controller, and ALL nodes synchronize replicas at the same 
time, streaming data between each other.
The problems may arise:

· When streaming hangs (it tends to hang even on a stable network), 
repair session hangs (any version does re-stream?)

· Network will be highly saturated

· In case of high inconsistency some nodes may receive a lot of data, 
disk usage much more than 2x (depends on RF)

· A lot of compactions will be pending

IMO, best way to run repair is from script with -PR for single CF from single 
node at a time and monitoring progress, like:
repair -pr node1 ks1 cf1
repair -pr node2 ks1 cf1
repair -pr node3 ks1 cf1
repair -pr node1 ks1 cf2
repair -pr node2 ks1 cf2
repair -pr node3 ks1 cf2
With some progress or other control in between, your choice.

Use repair with care, do not let your cluster go down.
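Viktor's one-CF-on-one-node-at-a-time sequence can be generated mechanically; the sketch below (hypothetical host names and an assumed nodetool invocation syntax) reproduces the ordering in his list:

```python
# Generate the repair sequence Viktor describes: for each column family,
# run "repair -pr" on each node in turn, never two at once.
from itertools import product

nodes = ["node1", "node2", "node3"]            # assumed host names
column_families = [("ks1", "cf1"), ("ks1", "cf2")]

def repair_plan(nodes, column_families):
    # CFs in the outer loop, nodes in the inner loop, matching the
    # cf1-on-all-nodes-then-cf2 ordering shown in the message above.
    return ["nodetool -h %s repair -pr %s %s" % (node, ks, cf)
            for (ks, cf), node in product(column_families, nodes)]

for cmd in repair_plan(nodes, column_families):
    print(cmd)
# First line printed: nodetool -h node1 repair -pr ks1 cf1
```

Progress checks (or sleeps) would go between commands, per Viktor's advice.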





Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider (http://twitter.com/#!/adforminsider)
What is Adform: watch this short video (http://vimeo.com/adform/display)


Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: R. Verlangen [mailto:ro...@us2.nl]
Sent: Monday, June 04, 2012 15:17
To: user@cassandra.apache.org
Subject: Re: repair

The repair -pr only repairs the node's primary range, so it is only useful in 
day-to-day use. When you're recovering from a crash, use it without -pr.
2012/6/4 Romain HARDOUIN romain.hardo...@urssaf.fr

Run repair -pr in your cron.

Tamar Fraenkel ta...@tok-media.com wrote on 04/06/2012 13:44:32:

 Thanks.

 I actually did just that with cron jobs running on different hours.

 I asked the question because I saw that when one of the logs was
 running the repair, all nodes logged some repair related entries in
 /var/log/cassandra/system.log

 Thanks again,
 Tamar Fraenkel
 Senior Software Engineer, TOK Media



--
With kind regards,

Robin Verlangen
Software engineer

W www.robinverlangen.nl
E ro...@us2.nl


Re: repair

2012-06-04 Thread Tamar Fraenkel
Thanks, one more question: on a regular basis, should I run repair for the
system keyspace?

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Mon, Jun 4, 2012 at 5:02 PM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:

  Why without -PR when recovering from crash?

 Repair without -PR runs full repair of the cluster; the node which
 receives the command is the repair controller, and ALL nodes synchronize
 replicas at the same time, streaming data between each other.

 The problems may arise:

 · When streaming hangs (it tends to hang even on a stable
 network), repair session hangs (any version does re-stream?)

 · Network will be highly saturated

 · In case of high inconsistency some nodes may receive a lot
 of data, disk usage much more than 2x (depends on RF)

 · A lot of compactions will be pending

 IMO, best way to run repair is from script with -PR for single CF from
 single node at a time and monitoring progress, like:

 repair -pr node1 ks1 cf1

 repair -pr node2 ks1 cf1

 repair -pr node3 ks1 cf1

 repair -pr node1 ks1 cf2

 repair -pr node2 ks1 cf2

 repair -pr node3 ks1 cf2

 With some progress or other control in between, your choice.

 Use repair with care, do not let your cluster go down.


 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer

 From: R. Verlangen [mailto:ro...@us2.nl]
 Sent: Monday, June 04, 2012 15:17
 To: user@cassandra.apache.org
 Subject: Re: repair

 The repair -pr only repairs the node's primary range, so it is only useful
 in day-to-day use. When you're recovering from a crash, use it without -pr.

 2012/6/4 Romain HARDOUIN romain.hardo...@urssaf.fr

 Run repair -pr in your cron.


Replication factor via hector

2012-06-04 Thread Roshni Rajagopal
Hi ,

   I'm trying to see the effect of different replication factors and 
consistency levels for a keyspace on a 4 node cassandra cluster.

I'm doing this using the Hector client.
I could not find an API to set the replication factor for a keyspace, though I 
could find ways to modify the consistency level.

Is it possible to change the replication factor using Hector or does it have to 
be done using the CLI?

Regards,
Roshni

This email and any files transmitted with it are confidential and intended 
solely for the individual or entity to whom they are addressed. If you have 
received this email in error destroy it immediately. *** Walmart Confidential 
***


Re: batch isolation

2012-06-04 Thread Todd Burruss
I don't think I'm being clear.  I just was wondering if a row delete is
isolated with all the other inserts or deletes to a specific column family
and key in the same batch.

On 6/4/12 1:58 AM, Sylvain Lebresne sylv...@datastax.com wrote:

On Sun, Jun 3, 2012 at 6:05 PM, Todd Burruss bburr...@expedia.com wrote:
 I just meant there is a row delete in the same batch as inserts - all
to
 the same column family and key

Then it's the timestamp that will decide what happens. Whatever has a
timestamp lower than or equal to the tombstone timestamp will be deleted
(that holds for inserts in the batch itself).
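A minimal sketch of that timestamp rule (a toy reconcile function for illustration, not Cassandra code): a column in the batch survives the batch's row tombstone only if its timestamp is strictly greater than the tombstone's.

```python
# Toy reconcile: which inserted columns survive a row tombstone issued in
# the same batch?  Rule: ts <= tombstone_ts is shadowed by the delete.
def surviving_columns(columns, tombstone_ts):
    # columns: {column_name: write_timestamp}
    return {name: ts for name, ts in columns.items() if ts > tombstone_ts}

batch_inserts = {"a": 10, "b": 20, "c": 30}
print(surviving_columns(batch_inserts, tombstone_ts=20))   # {'c': 30}
```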

--
Sylvain




 -Original Message-
 From: Sylvain Lebresne [sylv...@datastax.com]
 Received: Sunday, 03 Jun 2012, 3:44am
 To: user@cassandra.apache.org [user@cassandra.apache.org]
 Subject: Re: batch isolation

 On Sun, Jun 3, 2012 at 2:53 AM, Todd Burruss bburr...@expedia.com
wrote:
 1 - does this mean that a batch_mutate that first sends a row delete
 mutation on key X, then subsequent insert mutations for key X, is
 isolated?

 I'm not sure what you mean by having a batch_mutate that first sends
 ... then ..., since a batch_mutate is a single API call.

 2 - does isolation span column families for the same key within the same
 batch_mutate?

 No, it doesn't span column families (contrary to atomicity). There
 are more details in
 http://www.datastax.com/dev/blog/row-level-isolation.

 --
 Sylvain



Re: 1.1 not removing commit log files?

2012-06-04 Thread aaron morton
Applying the local hint mutation follows the same code path as regular mutations. 

When the commit log is being truncated you should see flush activity, logged 
from the ColumnFamilyStore with "Enqueuing flush of ..." messages. 

If you set DEBUG logging for org.apache.cassandra.db.ColumnFamilyStore it 
will log if it thinks the CF is clean and no flush takes place. 

If you set DEBUG logging on org.apache.cassandra.db.commitlog.CommitLog we will 
see if the commit log file could not be deleted because a dirty CF was not 
flushed. 

Cheers
A


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/06/2012, at 4:43 AM, Rob Coli wrote:

 On Thu, May 31, 2012 at 7:01 PM, aaron morton aa...@thelastpickle.com wrote:
 But that talks about segments not being cleared at startup. Does not explain
 why they were allowed to get past the limit in the first place.
 
 Perhaps the commit log size tracking for this limit does not, for some
 reason, track hints? This seems like the obvious answer given the
 state which appears to trigger it? This doesn't explain why the files
 aren't getting deleted after the hints are delivered, of course...
 
 =Rob
 
 -- 
 =Robert Coli
 AIMGTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb



Re: Secondary Indexes, Quorum and Cluster Availability

2012-06-04 Thread aaron morton
IIRC index slices work a little differently with consistency: they need to have 
CL-level nodes available for all token ranges. If you drop it to CL ONE the 
read is local only for a particular token range. 

The problem when doing index reads is the nodes that contain the results can no 
longer be selected by the partitioner. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/06/2012, at 5:15 AM, Jim Ancona wrote:

 Hi,
 
 We have an application with two code paths, one of which uses a secondary 
 index query and the other, which doesn't. While testing node down scenarios 
 in our cluster we got a result which surprised (and concerned) me, and I 
 wanted to find out if the behavior we observed is expected.
 
 Background:
 6 nodes in the cluster (in order: A, B, C, E, F and G)
 RF = 3
 All operations at QUORUM
 Operation 1: Read by row key followed by write
 Operation 2: Read by secondary index, followed by write
 While running a mixed workload of operations 1 and 2, we got the following 
 results:
 
 Scenario             Result
 All nodes up         All operations succeed
 One node down        All operations succeed
 Nodes A and E down   All operations succeed
 Nodes A and B down   Operation 1: ~33% fail; Operation 2: All fail
 Nodes A and C down   Operation 1: ~17% fail; Operation 2: All fail
 
 We had expected (perhaps incorrectly) that the secondary index reads would 
 fail in proportion to the portion of the ring that was unable to reach 
 quorum, just as the row key reads did. For both operation types the 
 underlying failure was an UnavailableException.
 
 The same pattern repeated for the other scenarios we tried. The row key 
 operations failed at the expected ratios, given the portion of the ring that 
 was unable to meet quorum because of nodes down, while all the secondary 
 index reads failed as soon as 2 out of any 3 adjacent nodes were down.
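 The observed ratios fall out of a simple availability model (a sketch under an
 assumed SimpleStrategy placement, where each range is replicated to the next
 RF nodes on the ring): a row-key read fails only if its own replica set cannot
 reach quorum, while an index read needs quorum on every range.

```python
# Toy model of the cluster above: 6 nodes, RF=3, QUORUM reads.
# Assumption: range i is replicated to nodes i, i+1, i+2 around the ring.
NODES = ["A", "B", "C", "E", "F", "G"]
RF, QUORUM = 3, 2

def replicas(rng):
    return [NODES[(rng + k) % len(NODES)] for k in range(RF)]

def range_has_quorum(rng, down):
    return sum(1 for n in replicas(rng) if n not in down) >= QUORUM

def row_key_failure_ratio(down):
    # A row-key read fails only when its own range lacks quorum.
    failing = [r for r in range(len(NODES)) if not range_has_quorum(r, down)]
    return len(failing) / len(NODES)

def index_reads_succeed(down):
    # Observed behavior: an index slice needs quorum on *all* ranges.
    return all(range_has_quorum(r, down) for r in range(len(NODES)))

print(row_key_failure_ratio({"A", "B"}))   # 0.333... (~33%, as observed)
print(row_key_failure_ratio({"A", "C"}))   # 0.166... (~17%)
print(index_reads_succeed({"A", "E"}))     # True  (no range loses two replicas)
print(index_reads_succeed({"A", "B"}))     # False (all index reads fail)
```

 The model reproduces every row of the table: index reads fail exactly when
 some range has two of its three adjacent replicas down.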
 
 Is this an expected behavior? Is it documented anywhere? I didn't find it 
 with a quick search.
 
 The operation doing secondary index query is an important one for our app, 
 and we'd really prefer that it degrade gracefully in the face of cluster 
 failures. My plan at this point is to do that query at ConsistencyLevel.ONE 
 (and accept the increased risk of inconsistency). Will that work?
 
 Thanks in advance,
 
 Jim



Re: Can't delete from SCF wide row

2012-06-04 Thread aaron morton
Delete is a no-look write operation, like normal writes. So it should not be 
directly causing a lot of memory allocation. 

It may be causing a lot of compaction activity, which due to the wide row may 
be throwing up lots of GC. 

Try the following to get through the deletions:

* disable compaction by setting min_compaction_threshold and max_compaction_threshold 
to 0 (via nodetool setcompactionthreshold on current versions)

Once you have finished compaction
* lower the in_memory_compaction_limit in the yaml. 
* set concurrent_compactions to 2 in the yaml
* enable compaction again

Once everything has settled down restore the in_memory_compaction_limit and 
concurrent_compactions

Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/06/2012, at 7:53 AM, Rustam Aliyev wrote:

 Hi all,
 
 I have SCF with ~250K rows. One of these rows is relatively large - it's a 
 wide row (according to compaction logs) containing ~100.000 super columns and 
 overall size of 1GB. Each super column has average size of 10K and ~10 sub 
 columns.
 
 When I'm trying to delete ~90% of the columns in this particular row, 
 Cassandra nodes which own this wide row (3 of 5, RF=3) quickly run out of the 
 heap space. See logs from one of the hosts here:
 
 http://pastebin.com/raw.php?i=kwn7b3rP
 
 After that, all 3 nodes start flapping up/down and GC messages (like the one 
 in the bottom of the pastebin above) appearing in the logs. Cassandra never 
 repairs from this mode and the only way out if to kill -9 and start again. 
 On IRC it was suggested that it enters GC death spiral.
 
 I tried to throttle delete requests on the client side - sending batch of 100 
 delete requests each 500ms. So no more than 200 deletes/sec. But it didn't 
 help. I can reduce it further to 100/sec, but I don't think it will help much.
 
 I delete millions of columns from other row in this SCF at the same rate and 
 never have hit this problem. It only happens when I try to delete from this 
 particular wide row.
 
 So right now I don't know how can I delete these columns. Any ideas?
 
 
 Many thanks,
 Rustam.



Re: TimedOutException()

2012-06-04 Thread aaron morton
 Is the node we are connecting to try to proxy requests ? Wouldn't our
 configuration ensure all nodes have replicas ?
It can still time out even when reading locally. (The thread running the query 
is waiting on the read thread). 

Look in the server side logs to see if there are any errors. If you are getting 
a timeout in this situation I would guess either the node is heavily overloaded 
or you are asking for a lot of data from a wide row. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/06/2012, at 11:00 AM, Oleg Dulin wrote:

 Tyler Hobbs ty...@datastax.com wrote:
 On Fri, Jun 1, 2012 at 9:39 AM, Oleg Dulin oleg.du...@gmail.com wrote:
 
 Is my understanding correct that this is where cassandra is telling us it
 can't accomplish something within that timeout value -- as opposed to
 network timeout ? Where is it set ?
 
 That's correct.  Basically, the coordinator sees that a replica has not
 responded (or can not respond) before hitting a timeout.  This is
 controlled by rpc_timeout_in_ms in cassandra.yaml.
 
 --
 Tyler Hobbs
 DataStax http://datastax.com/
 
 So if we are using random partitioner, and read consistency of one, what
 does that mean ? 
 
 We have a 3 node cluster, use write / read consistency of one, replication
 factor of 3. 
 
 Is the node we are connecting to try to proxy requests ? Wouldn't our
 configuration ensure all nodes have replicas ?
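 The rpc_timeout_in_ms knob Tyler refers to is a cassandra.yaml setting; a
 fragment with the 1.0/1.1-era default of 10 seconds (an illustration, not a
 recommendation):

```yaml
# cassandra.yaml: how long the coordinator waits for replica responses
# before answering the client with a TimedOutException.
rpc_timeout_in_ms: 10000
```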
 



Re: Errors with Cassandra 1.0.10, 1.1.0, 1.1.1-SNAPSHOT and 1.2.0-SNAPSHOT

2012-06-04 Thread aaron morton
I remember someone having the "File exists" issue a few weeks ago; IIRC it 
magically went away. 

Do you have steps to reproduce this fault ? If you can reproduce it on a release 
version please create a ticket on 
https://issues.apache.org/jira/browse/CASSANDRA and update the email thread.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/06/2012, at 2:56 PM, Horacio G. de Oro wrote:

 Permissions are ok. The writes works ok, and the data can be read.
 
 Thanks!
 Horacio
 
 
 
 On Sat, Jun 2, 2012 at 11:50 PM, Kirk True k...@mustardgrain.com wrote:
 Permissions problems on /var for the user running Cassandra?
 
 Sent from my iPhone
 
 On Jun 2, 2012, at 6:56 PM, Horacio G. de Oro hgde...@gmail.com wrote:
 
  Hi! While using Cassandra, I've seen this log messages when running some 
  test cases (which insert lots of columns in 4 rows).
  I've tryied Cassandra 1.0.10, 1.1.0, 1.1.1-SNAPSHOT and 1.2.0-SNAPSHOT 
  (built from git). I'm using the default configuration, Oracle jdk 1.6.0_32, 
  Ubuntu 12.04 and pycassa.
 
  Since I'm very new to Cassandra (I'm just starting to learn it) I don't 
  know if I'm doing something wrong, or maybe there are some bugs in the 
  several versions of Cassandra I've tested.
 
  cassandra-1.0.10
 
   - IOException: unable to mkdirs
 
  cassandra-1.1.0
 
   - IOException: Unable to create directory
 
  cassandra-1.1.1-SNAPSHOT
 
   - IOException: Unable to create directory
 
  cassandra-1.2.0-SNAPSHOT
 
   - IOException: Unable to create directory
 
   - CLibrary.java (line 191) Unable to create hard link (...)
command output: ln: failed to create hard link 
  `(...)/lolog_tests-Logs_by_app-ia-3-Summary.db': File exists
 
 
 
  Thanks in advance!
  Horacio
 
 
  
  system-cassandra-1.0.10.log
  
 
  ERROR [MutationStage:1] 2012-06-02 20:37:41,115 
  AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
  Thread[MutationStage:1,5,main]
  java.io.IOError: java.io.IOException: unable to mkdirs 
  /var/lib/cassandra/data/lolog_tests/snapshots/1338680261112-Logs_by_app
at 
  org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
at 
  org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
at 
  org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
at 
  org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)
 
  ERROR [MutationStage:11] 2012-06-02 20:37:55,730 
  AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
  Thread[MutationStage:11,5,main]
  java.io.IOError: java.io.IOException: unable to mkdirs 
  /var/lib/cassandra/data/lolog_tests/snapshots/1338680275729-Logs_by_app
at 
  org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
at 
  org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
at 
  org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
at 
  org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)
 
  ERROR [MutationStage:19] 2012-06-02 20:37:57,395 
  AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
  Thread[MutationStage:19,5,main]
  java.io.IOError: java.io.IOException: unable to mkdirs 
  /var/lib/cassandra/data/lolog_tests/snapshots/1338680277394-Logs_by_app
at 
  org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
at 
  org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
at 
  org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
at 
  org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)
 
  ERROR [MutationStage:20] 2012-06-02 20:41:26,666 
  AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
  Thread[MutationStage:20,5,main]
  java.io.IOError: java.io.IOException: unable to mkdirs 
  /var/lib/cassandra/data/lolog_tests/snapshots/133868048-Logs_by_app
at 
  org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)
at 
  org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)
at 
  org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)
at 
  org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)
 
  
  system-cassandra-1.1.0.log
  
 
  ERROR [MutationStage:1] 2012-06-02 20:45:15,609 
  AbstractCassandraDaemon.java (line 134) Exception in thread 
  Thread[MutationStage:1,5,main]
  java.io.IOError: java.io.IOException: Unable to 

[RELEASE] Apache Cassandra 1.1.1 released

2012-06-04 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.1.1.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is the first maintenance/bug fix release[1] on the 1.1 series. As
always, please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: http://goo.gl/4Dxae (CHANGES.txt)
[2]: http://goo.gl/ZE8ZK (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


RE: nodes moving spontaneously

2012-06-04 Thread Curt Allred
Thanks for the tip.  Checked nodetool ring on all nodes and they all have a 
consistent view of the ring.  We have had other problems like nodes crashing 
etc. so anything could have happened, but we're sure we didn't issue a nodetool 
move command.

From: Tyler Hobbs [mailto:ty...@datastax.com]

OpsCenter just periodically calls describe_ring() on different nodes in the 
cluster, so that's how it's getting that information.

Maybe try running nodetool ring on each node in your cluster to make sure they 
all have the same view of the ring?



Re: row_cache_provider = 'SerializingCacheProvider'

2012-06-04 Thread ruslan usifov
I think that SerializingCacheProvider has a bigger Java heap footprint
than I thought

2012/6/4 ruslan usifov ruslan.usi...@gmail.com:
 I have setup 5GB of Java heap with the following tuning:

 MAX_HEAP_SIZE=5G
 HEAP_NEWSIZE=800M

 JVM_OPTS=$JVM_OPTS -XX:+UseParNewGC
 JVM_OPTS=$JVM_OPTS -XX:+UseConcMarkSweepGC
 JVM_OPTS=$JVM_OPTS -XX:+CMSParallelRemarkEnabled
 JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=8
 JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=5
 JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65
 JVM_OPTS=$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly
 JVM_OPTS=$JVM_OPTS -XX:CMSFullGCsBeforeCompaction=1

 Also I set up 2GB to memtables (memtable_total_space_in_mb: 2048)


 My avg heap usage (nodetool -h localhost info):

 3G


 Based on nodetool -h localhost cfhistograms I calculated the avg row size

 70KB

 I setup row cache only for one CF with the following settings:

 update column family building with rows_cached=1 and
 row_cache_provider='SerializingCacheProvider';


 When I set up the row cache I got a promotion failure in GC (with a
 stop-the-world pause of about 30 seconds) with the heap almost full. I am
 very confused by this behavior.


 PS: I use Cassandra 1.0.10, with JNA 3.4.0, on Ubuntu Lucid (kernel 2.6.32-41)


 2012/6/4 aaron morton aa...@thelastpickle.com:
 Yes SerializingCacheProvider is the off heap caching provider.

 Can you do some more digging into what is using the heap ?

 Cheers
 A

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 1/06/2012, at 9:52 PM, ruslan usifov wrote:

 Hello

 I started using SerializingCacheProvider for row caching and saw extreme
 Java heap growth. But I thought this cache provider doesn't use the Java
 heap.




Re: memory issue on 1.1.0

2012-06-04 Thread aaron morton
Had a look at the log, this message 

 INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 
 2772) Unable to reduce heap usage since there are no dirty column families
appears correct: it happens after some flush activity, when there are no CFs 
with memtable data. But the heap is still full. 

Overall the server is overloaded, but it seems like it should be handling it 
better. 

What JVM settings do you have? What is the machine spec ? 
What settings do you have for key and row cache ? 
Do the CF's have secondary indexes ?
How many clients / requests per second ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote:

 Running a very write intensive (new column, delete old column etc.) process 
 and failing on memory.  Log file attached.
 
 Curiously, I have never seen this when adding new data; in the past I have 
 sent hundreds of millions of new transactions.  It seems to happen when I 
 modify.  My process is as follows:
 
 A key slice gets the columns to modify in batches of 100; separate threads 
 modify those columns.  I advance the slice, setting the start key to the last 
 key of the previous batch.  The mutations are: update a column value in one 
 column family (token), delete a column and add a new column in another (pan).
 
 Runs well until after about 5 million rows then it seems to run out of 
 memory.  Note that these column families are quite small.
 
 WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) 
 Heap is 0.7967470834946492 full.  You may need to reduce memtable and/or 
 cache sizes.  Cassandra will now flush up to the two largest memtables to 
 free up memory.  Adjust flush_largest_memtables_at threshold in 
 cassandra.yaml if you don't want Cassandra to do this automatically
 INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 
 2772) Unable to reduce heap usage since there are no dirty column families
 INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
 InetAddress /10.230.34.170 is now UP
 INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) 
 GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
 INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) 
 GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max 
 is 8506048512
 
 
 Keyspace: keyspace
Read Count: 50042632
Read Latency: 0.23157864418482224 ms.
Write Count: 44948323
Write Latency: 0.019460829472992797 ms.
Pending Tasks: 0
Column Family: pan
SSTable count: 5
Space used (live): 1977467326
Space used (total): 1977467326
Number of Keys (estimate): 16334848
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 74
Read Count: 14985122
Read Latency: 0.408 ms.
Write Count: 19972441
Write Latency: 0.022 ms.
Pending Tasks: 0
Bloom Filter False Postives: 829
Bloom Filter False Ratio: 0.00073
Bloom Filter Space Used: 37048400
Compacted row minimum size: 125
Compacted row maximum size: 149
Compacted row mean size: 149
 
Column Family: token
SSTable count: 4
Space used (live): 1250973873
Space used (total): 1250973873
Number of Keys (estimate): 14217216
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 49
Read Count: 30059563
Read Latency: 0.167 ms.
Write Count: 14985488
Write Latency: 0.014 ms.
Pending Tasks: 0
Bloom Filter False Postives: 13642
Bloom Filter False Ratio: 0.00322
Bloom Filter Space Used: 28002984
Compacted row minimum size: 150
Compacted row maximum size: 258
Compacted row mean size: 224
 
Column Family: counters
SSTable count: 2
Space used (live): 561549994
Space used (total): 561549994
Number of Keys (estimate): 9985024
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 38
Read Count: 4997947
Read Latency: 0.092 ms.
Write Count: 9990394
Write Latency: 0.023 ms.
Pending Tasks: 0
Bloom Filter False Postives: 191
Bloom Filter False Ratio: 0.37525
Bloom Filter Space Used: 18741152
Compacted row 

Re: Cassandra upgrade from 0.8.1 to 1.1.0

2012-06-04 Thread aaron morton
In addition always read the NEWS.txt file in the distribution and glance at the 
CHANGES.txt file. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 12:19 PM, Roshan wrote:

 Hi
 
 Hope this will help to you.
 
 http://www.datastax.com/docs/1.0/install/upgrading
 http://www.datastax.com/docs/1.1/install/upgrading
 
 Thanks.
 
 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-upgrade-from-0-8-1-to-1-1-0-tp7580198p7580210.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Node join streaming stuck at 100%

2012-06-04 Thread aaron morton
Are there any errors in the logs about failed streaming? 

If you are getting timeouts, note that 1.0.8 added a streaming socket timeout 
https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L323

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 3:12 PM, koji wrote:

 
 aaron morton aaron at thelastpickle.com writes:
 
 
 Did you restart ? All good?
 Cheers
 
 
 -
 Aaron Morton
 Freelance Developer
 at aaronmorton
 http://www.thelastpickle.com
 
 
 On 27/04/2012, at 9:49 AM, Bryce Godfrey wrote:
 
 This is the second node I’ve joined to my cluster in the last few days, and 
 so far both have become stuck at 100% on a large file according to netstats.  
 This is on 1.0.9; is there anything I can do to make it move on besides 
 restarting Cassandra?  I don’t see any errors or warnings in the logs for 
 either server, and there is plenty of disk space.
 
  
 On the sender side I see this:
 
 Streaming to: /10.20.1.152
 
/opt/cassandra/data/MonitoringData/PropertyTimeline-hc-80540-Data.db 
 sections=1 progress=82393861085/82393861085 - 100%
 
  
 On the node joining I don’t see this file in netstats, and all pending 
 streams are sitting at 0%
 
  
  
 
 
 Hi,
 we have the same problem (1.0.7); our netstats log looks like this:
 
 Mode: NORMAL
 Streaming to: /1.1.1.1
   /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3757-Data.db 
   sections=1234 progress=325/325 - 100%
   /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3641-Data.db 
   sections=4386 progress=0/1025272214 - 0%
   /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3761-Data.db 
   sections=2956 progress=0/17826723 - 0%
   /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3730-Data.db 
   sections=3792 progress=0/56066299 - 0%
   /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3760-Data.db 
   sections=4384 progress=0/90941161 - 0%
   /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3687-Data.db 
   sections=3958 progress=0/54729557 - 0%
   /mnt/ebs1/cassandra-data/data/NemoModel/OfflineMessage-hc-3762-Data.db 
   sections=766 progress=0/2605165 - 0%
 Streaming to: /1.1.1.2
   /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-709-Data.db 
   sections=3228 progress=29175698/29175698 - 100%
   /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-789-Data.db 
   sections=2102 progress=0/618938 - 0%
   /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-765-Data.db 
   sections=3044 progress=0/1996687 - 0%
   /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-788-Data.db 
   sections=2773 progress=0/1374636 - 0%
   /mnt/ebs1/cassandra-data/data/NemoModel/OneWayFriend-hc-729-Data.db 
   sections=3150 progress=0/22111512 - 0%
 Nothing streaming from /1.1.1.1
 Nothing streaming from /1.1.1.2
 Pool Name            Active   Pending  Completed
 Commands                n/a         1   23825242
 Responses               n/a        25   19644808
 
 
 After a restart the pending streams are cleared, but the next time we run 
 nodetool repair -pr the pending streams appear again. This always happens 
 on the same node (we have 12 nodes in total).
 
 koji
 
 



Re: Retrieving old data version for a given row

2012-06-04 Thread aaron morton
This is an old issue with sstable2json 
https://issues.apache.org/jira/browse/CASSANDRA-4054

Internally the tombstone is associated with o.a.c.db.AbstractColumnContainer; 
see o.a.c.db.RowMutation.delete() for how a row-level delete works. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
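For what it's worth, row-level deletes are reconciled against later writes purely by timestamp, last-write-wins. A simplified sketch of that reconciliation (illustrative only; Cassandra's real logic lives in the classes named above, and these function names are hypothetical):

```python
# Simplified last-write-wins reconciliation between a row-level tombstone and
# a later insert for the same key. A column survives only if its write
# timestamp is newer than the tombstone's timestamp.
def resolve(value, value_ts, tombstone_ts):
    """Return the surviving value, or None if the tombstone shadows it."""
    if tombstone_ts is not None and value_ts <= tombstone_ts:
        return None  # shadowed by the row-level delete
    return value

print(resolve("old", 100, 200))  # None: the delete at ts=200 wins
print(resolve("new", 300, 200))  # new: a re-insert after the delete survives
```

This is why a tombstone without a timestamp, as described in the message below, would indeed be alarming: without one, a later re-insert could not be distinguished from a pre-delete write.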

On 4/06/2012, at 9:58 PM, Felipe Schmidt wrote:

 I was taking a look at the tombstones stored in the SSTable and noticed that 
 if I perform a key deletion, the tombstone doesn’t have any timestamp; it 
 looks like this:
   “key”:[ ]
 At all other deletion granularities the tombstone has a timestamp. Without 
 this information it seems impossible to resolve conflicts when an insertion 
 for the same key is done after the deletion. If that happens, I think 
 Cassandra will always delete the new information because of this tombstone.
 I’m using a single-node configuration; maybe that changes what tombstones 
 look like.
 
 Thanks in advance.
 
 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)
 
 
 
 
 
 2012/5/31 aaron morton aa...@thelastpickle.com
 -Is there any other way to extract the contents of an SSTable, e.g. by
 writing a Java program instead of using sstable2json?
 Look at the code in sstable2json and copy it :)
 
 -I tried to get tombstones using the Thrift API, but it seems not to be
 possible; is that right? When I try, the program throws an exception.
 No. 
 Tombstones are not returned from API (See ColumnFamilyStore.getColumnFamily() 
 ). 
 You can see them if you use sstable2json.
 
 Cheers
 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 30/05/2012, at 9:53 PM, Felipe Schmidt wrote:
 
 I have further questions:
 -Is there any other way to extract the contents of an SSTable, e.g. by
 writing a Java program instead of using sstable2json?
 -I tried to get tombstones using the Thrift API, but it seems not to be
 possible; is that right? When I try, the program throws an exception.
 
 thanks in advance
 
 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)
 
 
 
 
 2012/5/24 aaron morton aa...@thelastpickle.com:
 Ok... it's really strange to me that Cassandra doesn't support data
 versioning, because all the other key-value databases I know support it.
 
 You can design it into your data model if you need it.
 
 
 I have one remaining question:
 -in the case that I have more than one SSTable on disk for the same column
 but with different data versions, is it possible to make a query to get the
 old version instead of the newest one?
 
 No.
 There is only ever 1 value for a column.
 The older copies of the column in the SSTables are artefacts of immutable
 on disk structures.
 If you want to see what's inside an SSTable use bin/sstable2json
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 24/05/2012, at 9:42 PM, Felipe Schmidt wrote:
 
 Ok... it's really strange to me that Cassandra doesn't support data
 versioning, because all the other key-value databases I know support it.
 
 I have one remaining question:
 -in the case that I have more than one SSTable on disk for the same column
 but with different data versions, is it possible to make a query to get the
 old version instead of the newest one?
 
 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)
 
 
 
 
 2012/5/16 Dave Brosius dbros...@mebigfatguy.com:
 
 You're in for a world of hurt going down that rabbit hole. If you truly want
 versioned data then you should think about changing your keying, perhaps to
 a composite key of the form:
 
 NaturalKey/VersionId
 
 Or, if you want versioning at the column level, use composite columns with a
 ColumnName/VersionId format.
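A minimal in-memory sketch of the NaturalKey/VersionId idea (the dict stands in for a column family; all names and helpers here are illustrative, not a Cassandra API):

```python
# Versioning via composite keys: each write goes to "naturalkey/versionid",
# so old versions remain addressable instead of being overwritten in place.
store = {}

def put(natural_key, version, value):
    store[f"{natural_key}/{version}"] = value

def get(natural_key, version):
    return store.get(f"{natural_key}/{version}")

def latest(natural_key):
    """Return the value under the highest version id for this natural key."""
    versions = [int(k.rsplit("/", 1)[1])
                for k in store if k.startswith(natural_key + "/")]
    return get(natural_key, max(versions)) if versions else None

put("user42", 1, "v1 data")
put("user42", 2, "v2 data")
print(get("user42", 1))   # v1 data -- the old version is still readable
print(latest("user42"))   # v2 data
```

In a real column family the `latest` lookup would be a range/slice query rather than a scan, which is why the version id belongs in a composite key or composite column name where it sorts naturally.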
 
 
 
 
 
 On 05/16/2012 10:16 AM, Felipe Schmidt wrote:
 
 
 That was very helpful, thank you very much!
 
 I still have some questions:
 -Is it possible to make Cassandra keep the old column value after flushing?
 The same question applies to the memtable, before flushing: it seems to me
 that when I update some tuple, the old data is overwritten in the memtable,
 even before flushing.
 -Is it possible to scan values from the memtable, maybe using the so-called
 Thrift API? Using the client API I can only see the newest data version; I
 can't see what's really happening in the memtable.
 
 I ask because what I'll try to do is Change Data Capture for Cassandra, and
 the answers will define what kind of approaches I'm able to use.
 
 Thanks in advance.
 
 
 Regards,
 
 Felipe Mathias Schmidt
 
 (Computer Science UFRGS, RS, Brazil)
 
 
 
 2012/5/14 aaron morton aa...@thelastpickle.com:
 
 
 Cassandra does not provide access to multiple versions of the same
 
 column.
 
 It is essentially implementation detail.
 
 
 All mutations are written to the commit log in a binary 

RE: 1.1 not removing commit log files?

2012-06-04 Thread Bryce Godfrey
I'll try to get some log files for this with DEBUG enabled.  Tough on 
production though.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, June 04, 2012 11:15 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

Apply the local hint mutation follows the same code path and regular mutations.

When the commit log is being truncated you should see flush activity, logged 
from the ColumnFamilyStore with "Enqueuing flush of ..." messages.

If you set DEBUG logging for org.apache.cassandra.db.ColumnFamilyStore, it 
will log if it thinks the CF is clean and no flush takes place.

If you set DEBUG logging on org.apache.cassandra.db.commitlog.CommitLog we will 
see if the commit log file could not be deleted because a dirty CF was not 
flushed.

Cheers
A


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/06/2012, at 4:43 AM, Rob Coli wrote:


On Thu, May 31, 2012 at 7:01 PM, aaron morton aa...@thelastpickle.com wrote:

But that talks about segments not being cleared at startup. Does not explain
why they were allowed to get past the limit in the first place.

Perhaps the commit log size tracking for this limit does not, for some
reason, track hints? This seems like the obvious answer given the
state which appears to trigger it? This doesn't explain why the files
aren't getting deleted after the hints are delivered, of course...

=Rob

--
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb



Mixing Ec2MultiregionSnitch with private network

2012-06-04 Thread Patrick Lu
Hi All,

Does anyone have experience on Cassandra deployment mixing with EC2 and own 
data center? 

We plan to use ec2multiregionsnitch to build a Cassandra cluster across EC2 
regions, and at the same time to have a couple of nodes (in the cluster) 
sitting in our own data center. 

Any comment whether it’s doable? 

Thanks.

Patrick. 

RE: memory issue on 1.1.0

2012-06-04 Thread Poziombka, Wade L
What JVM settings do you have?
-Xms8G
-Xmx8G
-Xmn800m
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.rmi.server.hostname=127.0.0.1
-Djava.net.preferIPv4Stack=true
-Dcassandra-pidfile=cassandra.pid


What is the machine spec ?
It is an RH AS5 x64
16gb memory
2 CPU cores 2.8 Ghz

As it turns out it is somewhat wimpier than I thought.  While weak on CPU, it 
does have a good amount of memory.
It is paired with a larger machine.


What settings do you have for key and row cache ?
A: All the defaults. (yaml template attached);

Do the CF's have secondary indexes ?
A: Yes, one has two.  One of them is used in the key slice that gets the row 
keys used to do the further mutations.

How many clients / requests per second ?
A: One client process with 10 threads connected to one of the two nodes in the 
cluster.  One thread reads the slice and puts work in a queue; 9 others read 
from this queue and apply the mutations.  Mutations are completing at roughly 
20,000/minute.
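The batched key-slice walk described above (advance the start key to the last key of the previous batch) has a classic subtlety: the start of a key range is inclusive, so each subsequent batch re-reads one row and must drop it. A sketch over a sorted in-memory key list (illustrative only, not the Thrift API; with RandomPartitioner the iteration order would be by token, not by key):

```python
# Page through keys in batches, advancing the start key to the last key seen.
# Because the start of each range is inclusive, every page after the first
# fetches one extra row and drops the duplicate first key.
def page_keys(sorted_keys, page_size=100):
    start = None
    while True:
        if start is None:
            batch = sorted_keys[:page_size]
        else:
            i = sorted_keys.index(start)                  # inclusive start
            batch = sorted_keys[i:i + page_size + 1][1:]  # drop duplicate
        if not batch:
            return
        yield batch
        start = batch[-1]

keys = [f"key{i:03d}" for i in range(250)]
pages = list(page_keys(keys, page_size=100))
print([len(p) for p in pages])  # [100, 100, 50]
```

If the duplicate is not dropped, the same row gets mutated twice per boundary, which both wastes work and can re-delete columns that a parallel thread just rewrote.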


From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, June 04, 2012 4:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Had a look at the log, this message

INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) 
Unable to reduce heap usage since there are no dirty column families
appears correct, it happens after some flush activity and there are not CF's 
with memtable data. But the heap is still full.

Overall the server is overloaded, but it seems like it should be handling it 
better.

What JVM settings do you have? What is the machine spec ?
What settings do you have for key and row cache ?
Do the CF's have secondary indexes ?
How many clients / requests per second ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote:


Running a very write intensive (new column, delete old column etc.) process and 
failing on memory.  Log file attached.

Curiously, I have never seen this when adding new data; in the past I have sent 
hundreds of millions of new transactions.  It seems to happen when I modify.  
My process is as follows:

A key slice gets the columns to modify in batches of 100; separate threads 
modify those columns.  I advance the slice, setting the start key to the last 
key of the previous batch.  The mutations are: update a column value in one 
column family (token), delete a column and add a new column in another (pan).

Runs well until after about 5 million rows then it seems to run out of memory.  
Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) 
Heap is 0.7967470834946492 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) 
Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC 
for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC 
for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 
8506048512


Keyspace: keyspace
   Read Count: 50042632
   Read Latency: 0.23157864418482224 ms.
   Write Count: 44948323
   Write Latency: 0.019460829472992797 ms.
   Pending Tasks: 0
   Column Family: pan
   SSTable count: 5
   Space used (live): 1977467326
   Space used (total): 1977467326
   Number of Keys (estimate): 16334848
   Memtable Columns Count: 0
   Memtable Data Size: 0
   Memtable Switch Count: 74
   Read Count: 14985122
   Read Latency: 0.408 ms.
   Write Count: 19972441
   Write Latency: 0.022 ms.
   Pending Tasks: 0
   Bloom Filter False Postives: 829
   Bloom Filter False Ratio: 0.00073
   Bloom Filter Space Used: 37048400
   Compacted row minimum size: 125
   Compacted row maximum size: 149
   Compacted row mean size: 149

   Column Family: token
   SSTable count: 4
   Space used (live): 1250973873
   Space used 

Re: about multitenant datamodel

2012-06-04 Thread Toru Inoko

IMHO a model that allows external users to create CF's is a bad one.


Why do you think so? I'll let users create restricted CFs, and limit the  
number of CFs each user can create.

Is it still a bad one?

On Thu, 31 May 2012 06:44:05 +0900, aaron morton aa...@thelastpickle.com  
wrote:


- Do a lot of keyspaces cause some problems? (If I have 1,000 users,  
cassandra creates 1,000 keyspaces…)

It's not keyspaces, but the number of column families.

Without storing any data each CF uses about 1MB of ram. When they start  
storing and reading data they use more.
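That baseline adds up quickly in a keyspace-per-tenant design. A rough, purely illustrative estimate using the ~1 MB-per-empty-CF figure above (the tenant and per-tenant CF counts are hypothetical):

```python
# Rough baseline memory cost of a keyspace-per-tenant model, using the
# ~1 MB-per-empty-CF figure quoted above. Real usage grows with data.
mb_per_cf = 1
tenants = 1000
cfs_per_tenant = 5          # hypothetical per-tenant schema size

baseline_mb = tenants * cfs_per_tenant * mb_per_cf
print(baseline_mb)  # 5000 MB of heap before any data is stored
```

Even a small per-tenant schema can consume several gigabytes of heap before a single row is written, which is one reason models that let external users create CFs scale poorly.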


IMHO a model that allows external users to create CF's is a bad one.

Hope that helps.
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/05/2012, at 12:52 PM, Toru Inoko wrote:


Hi, all.

I'm designing a data API service (like cassandra.io, but not using a  
dedicated server for each user) on Cassandra 1.1, on which users can perform  
DML/DDL operations like CQL.

The following are the APIs users can use (almost the same as the Cassandra API):
- create/read/delete ColumnFamilies/Rows/Columns

Now I'm thinking about a multitenant data model on that.
My data model is like the following:
I'm going to prepare a keyspace for each user, as that user's tenant space.

| keyspace1  | --- | column family |
| (for user1)|     | ...           |

| keyspace2  | --- | column family |
| (for user2)|     | ...           |

The following are my questions:
- Is this data model good for multitenancy?
- Do a lot of keyspaces cause problems? (If I have 1,000 users,  
Cassandra creates 1,000 keyspaces...)


please, help.
thank you in advance.

Toru Inoko.






--
---
SCSK Corporation
Technology, Quality & Information Group, Technology Development Dept.
Advanced Technology Section

猪子 徹(Toru Inoko)
tel   : 03-6438-3544
mail  : in...@ms.scsk.jp
---



Re: Mixing Ec2MultiregionSnitch with private network

2012-06-04 Thread Chris Marino
Hi Patrick,

I'm not sure if it's doable, but I can tell you for sure that there are
lots of differences in the way the networks will need to be set up.  If you've
got to secure client traffic, it's going to get even more complicated with
encrypted traffic, etc.

We did some performance testing and configuration testing
with Cassandra across regions using a virtual network (my company's
product).

Have a look at what we did. I think when you add in your own datacenter,
things are going to get even more complicated.  One of the nice things
about using a virtual network in EC2 is that you can set up multiple
network interfaces so you don't have to use the multi-region snitch.
 These interfaces are also clever about using the real and the NAT'ed EC2
interfaces for cluster traffic (better performance and $0 EC2 data
bandwidth costs), so things can be set up just like in your own datacenter
without worrying about EC2's public/private IPs, NATing, etc.

You can read about what we did on our blog.

http://blog.vcider.com/2011/09/running-cassandra-on-a-virtual-network-in-ec2/

and

http://blog.vcider.com/2011/09/virtual-networks-can-run-cassandra-up-to-60-faster/

Let me know if you have any questions.
CM


On Mon, Jun 4, 2012 at 3:27 PM, Patrick Lu kuma...@hotmail.com wrote:

   Hi All,

 Does anyone have experience on Cassandra deployment mixing with EC2 and
 own data center?

  We plan to use ec2multiregionsnitch to build a Cassandra cluster across
  EC2 regions, and at the same time to have a couple of nodes (in the
  cluster) sitting in our own data center.

 Any comment whether it’s doable?

 Thanks.

 Patrick.



RE: memory issue on 1.1.0

2012-06-04 Thread Poziombka, Wade L
I have repeated the test on two quite large machines (12-core, 64 GB AS5 boxes) 
and still observed the problem.  Interestingly, it happened at about the same 
point.

Is there anything I can monitor? Perhaps I'll hook the YourKit profiler up to 
it to see if there is some kind of leak.

Wade

From: Poziombka, Wade L
Sent: Monday, June 04, 2012 7:23 PM
To: user@cassandra.apache.org
Subject: RE: memory issue on 1.1.0

What JVM settings do you have?
-Xms8G
-Xmx8G
-Xmn800m
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.rmi.server.hostname=127.0.0.1
-Djava.net.preferIPv4Stack=true
-Dcassandra-pidfile=cassandra.pid


What is the machine spec ?
It is an RH AS5 x64
16gb memory
2 CPU cores 2.8 Ghz

As it turns out it is somewhat wimpier than I thought.  While weak on CPU, it 
does have a good amount of memory.
It is paired with a larger machine.


What settings do you have for key and row cache ?
A: All the defaults. (yaml template attached);

Do the CF's have secondary indexes ?
A: Yes, one has two.  One of them is used in the key slice that gets the row 
keys used to do the further mutations.

How many clients / requests per second ?
A: One client process with 10 threads connected to one of the two nodes in the 
cluster.  One thread reads the slice and puts work in a queue; 9 others read 
from this queue and apply the mutations.  Mutations are completing at roughly 
20,000/minute.


From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, June 04, 2012 4:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Had a look at the log, this message

INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) 
Unable to reduce heap usage since there are no dirty column families
appears correct, it happens after some flush activity and there are not CF's 
with memtable data. But the heap is still full.

Overall the server is overloaded, but it seems like it should be handling it 
better.

What JVM settings do you have? What is the machine spec ?
What settings do you have for key and row cache ?
Do the CF's have secondary indexes ?
How many clients / requests per second ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote:

Running a very write intensive (new column, delete old column etc.) process and 
failing on memory.  Log file attached.

Curiously, I have never seen this when adding new data; in the past I have sent 
hundreds of millions of new transactions.  It seems to happen when I modify.  
My process is as follows:

A key slice gets the columns to modify in batches of 100; separate threads 
modify those columns.  I advance the slice, setting the start key to the last 
key of the previous batch.  The mutations are: update a column value in one 
column family (token), delete a column and add a new column in another (pan).

Runs well until after about 5 million rows then it seems to run out of memory.  
Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) 
Heap is 0.7967470834946492 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) 
Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC 
for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC 
for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 
8506048512


Keyspace: keyspace
   Read Count: 50042632
   Read Latency: 0.23157864418482224 ms.
   Write Count: 44948323
   Write Latency: 0.019460829472992797 ms.
   Pending Tasks: 0
   Column Family: pan
   SSTable count: 5
   Space used (live): 1977467326
   Space used (total): 1977467326
   Number of Keys (estimate): 16334848
   Memtable Columns Count: 0
   Memtable Data Size: 0
   Memtable Switch Count: 74
   Read Count: 14985122
   Read Latency: 0.408 ms.
   Write Count: 19972441
   Write Latency: 0.022 ms.
   Pending Tasks: 0
   Bloom Filter False 

Re: memory issue on 1.1.0

2012-06-04 Thread Brandon Williams
Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741

-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L
wade.l.poziom...@intel.com wrote:
 Running a very write intensive (new column, delete old column etc.) process 
 and failing on memory.  Log file attached.

 Curiously, I have never seen this when adding new data; in the past I have 
 sent hundreds of millions of new transactions.  It seems to happen when I 
 modify.  My process is as follows:

 A key slice gets the columns to modify in batches of 100; separate threads 
 modify those columns.  I advance the slice, setting the start key to the last 
 key of the previous batch.  The mutations are: update a column value in one 
 column family (token), delete a column and add a new column in another (pan).

 Runs well until after about 5 million rows then it seems to run out of 
 memory.  Note that these column families are quite small.

 WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) 
 Heap is 0.7967470834946492 full.  You may need to reduce memtable and/or 
 cache sizes.  Cassandra will now flush up to the two largest memtables to 
 free up memory.  Adjust flush_largest_memtables_at threshold in 
 cassandra.yaml if you don't want Cassandra to do this automatically
  INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 
 2772) Unable to reduce heap usage since there are no dirty column families
  INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
 InetAddress /10.230.34.170 is now UP
  INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) 
 GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
  INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) 
 GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max 
 is 8506048512

 
 Keyspace: keyspace
        Read Count: 50042632
        Read Latency: 0.23157864418482224 ms.
        Write Count: 44948323
        Write Latency: 0.019460829472992797 ms.
        Pending Tasks: 0
                Column Family: pan
                SSTable count: 5
                Space used (live): 1977467326
                Space used (total): 1977467326
                Number of Keys (estimate): 16334848
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 74
                Read Count: 14985122
                Read Latency: 0.408 ms.
                Write Count: 19972441
                Write Latency: 0.022 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 829
                Bloom Filter False Ratio: 0.00073
                Bloom Filter Space Used: 37048400
                Compacted row minimum size: 125
                Compacted row maximum size: 149
                Compacted row mean size: 149

                Column Family: token
                SSTable count: 4
                Space used (live): 1250973873
                Space used (total): 1250973873
                Number of Keys (estimate): 14217216
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 49
                Read Count: 30059563
                Read Latency: 0.167 ms.
                Write Count: 14985488
                Write Latency: 0.014 ms.
                Pending Tasks: 0
                Bloom Filter False Positives: 13642
                Bloom Filter False Ratio: 0.00322
                Bloom Filter Space Used: 28002984
                Compacted row minimum size: 150
                Compacted row maximum size: 258
                Compacted row mean size: 224

                Column Family: counters
                SSTable count: 2
                Space used (live): 561549994
                Space used (total): 561549994
                Number of Keys (estimate): 9985024
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 38
                Read Count: 4997947
                Read Latency: 0.092 ms.
                Write Count: 9990394
                Write Latency: 0.023 ms.
                Pending Tasks: 0
                Bloom Filter False Positives: 191
                Bloom Filter False Ratio: 0.37525
                Bloom Filter Space Used: 18741152
                Compacted row minimum size: 125
                Compacted row maximum size: 179
                Compacted row mean size: 150



Re: How to use Hector to retrieve data from Cassandra

2012-06-04 Thread Toru Inoko

Please refer to the following URL; it contains several examples of how to use Hector:
https://github.com/zznate/hector-examples/tree/master/src/main/java/com/riptano/cassandra/hector/example
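
For instance, a single-column read with Hector looks roughly like the sketch below (host, port, keyspace, column family, and key names are all placeholders to adapt):

```java
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class HectorReadExample {
    public static void main(String[] args) {
        // Connect to the cluster over Thrift (host/port are assumptions).
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("myKeyspace", cluster);

        // Read one column ("colName") of row "rowKey" from column family "cf".
        ColumnQuery<String, String, String> query = HFactory.createColumnQuery(
                keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily("cf").setKey("rowKey").setName("colName");

        QueryResult<HColumn<String, String>> result = query.execute();
        HColumn<String, String> column = result.get();
        if (column != null) {
            System.out.println(column.getName() + " = " + column.getValue());
        }
    }
}
```

For range or multi-row reads, Hector's RangeSlicesQuery and MultigetSliceQuery follow the same serializer/builder pattern; the linked repository covers those cases.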

Toru

On Tue, 05 Jun 2012 13:08:31 +0900, Prakrati Agrawal  
prakrati.agra...@mu-sigma.com wrote:



Dear all,

I am unable to find a good elaborate example on how to use Hector to get  
data stored in Cassandra. Please help me.


Thanks and Regards

Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |  
www.mu-sigma.com







--
---
SCSK Corporation

Toru Inoko
tel   : 03-6438-3544
mail  : in...@ms.scsk.jp
---



Is nodetool repair -pr enough in this scenario?

2012-06-04 Thread David Daeschler
Hello,

Currently I have a 4-node Cassandra cluster on CentOS64. I have been
running nodetool repair (no -pr option) on a weekly schedule like:

Host1: Tue, Host2: Wed, Host3: Thu, Host4: Fri
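
For reference, that weekly rotation could be written as crontab entries like the sketch below (the 02:00 start time, run-as user, and keyspace name are assumptions):

```shell
# One line per host; day-of-week field: 2=Tue, 3=Wed, 4=Thu, 5=Fri.
0 2 * * 2  cassandra  nodetool repair -pr keyspace   # Host1
0 2 * * 3  cassandra  nodetool repair -pr keyspace   # Host2
0 2 * * 4  cassandra  nodetool repair -pr keyspace   # Host3
0 2 * * 5  cassandra  nodetool repair -pr keyspace   # Host4
```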

In this scenario, if I were to add the -pr option, would this still be
sufficient to prevent forgotten deletes and properly maintain consistency?

Thank you,
- David