Re: portability between enterprise and community version

2012-06-13 Thread R. Verlangen
@Viktor: I've read/heard this many times before, but I've never seen a
real explanation. Java is cross-platform. If Cassandra runs properly on
both Linux and Windows clusters, why would it be impossible for them to
communicate? Of course I understand the disadvantages of a mixed cluster.

2012/6/13 Viktor Jevdokimov viktor.jevdoki...@adform.com

  Do not mix Linux and Windows nodes.

Best regards / Pagarbiai
 *Viktor Jevdokimov*
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider
 What is Adform: watch this short video http://vimeo.com/adform/display
 http://www.adform.com

 Disclaimer: The information contained in this message and attachments is
 intended solely for the attention and use of the named addressee and may be
 confidential. If you are not the intended recipient, you are reminded that
 the information remains the property of the sender. You must not use,
 disclose, distribute, copy, print or rely on this e-mail. If you have
 received this message in error, please contact the sender immediately and
 irrevocably delete this message and any copies.

   *From:* Abhijit Chanda [mailto:abhijit.chan...@gmail.com]
 *Sent:* Wednesday, June 13, 2012 09:21
 *To:* user@cassandra.apache.org
 *Subject:* portability between enterprise and community version

 Hi All,

 Is it possible for a DataStax Enterprise edition node to communicate
 with a DataStax Community edition node?

 Actually I want to set up one of my nodes on a Linux box and the other
 on Windows. Please suggest.

 With Regards,

 --
 Abhijit Chanda
 VeHere Interactive Pvt. Ltd.
 +91-974395




-- 
With kind regards,

Robin Verlangen
*Software engineer*
W http://www.robinverlangen.nl
E ro...@us2.nl

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.

Re: Why Hector is taking more time than Thrift

2012-06-06 Thread R. Verlangen
Hector is a higher-level client that provides some abstraction and an
easy-to-use interface; the Thrift API is pretty raw. For most cases the
Hector client is the better choice, except for use cases where ultimate
performance is a requirement (which also means a lot more maintenance
whenever the Thrift API changes).

2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 Dear all,

 I am trying to evaluate the performance of Cassandra and wrote code to
 retrieve a complete row (having 43707 columns) using Thrift and Hector.

 The Thrift client code took 0.767 seconds while the Hector code took
 0.883 seconds. Is it expected that Hector will be slower than Thrift?
 If yes, then why are we using Hector and not Thrift?

 Thanks and Regards

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com

 --
 This email message may contain proprietary, private and confidential
 information. The information transmitted is intended only for the person(s)
 or entities to which it is addressed. Any review, retransmission,
 dissemination or other use of, or taking of any action in reliance upon,
 this information by persons or entities other than the intended recipient
 is prohibited and may be illegal. If you received this in error, please
 contact the sender and delete the message from your system.

 Mu Sigma takes all reasonable steps to ensure that its electronic
 communications are free from viruses. However, given Internet
 accessibility, the Company cannot accept liability for any virus introduced
 by this e-mail or any attachment and you are advised to use up-to-date
 virus checking software.




-- 
With kind regards,

Robin Verlangen
*Software engineer*
W http://www.robinverlangen.nl
E ro...@us2.nl



Re: Problem in getting data from a 2 node cluster

2012-06-06 Thread R. Verlangen
Did you run repair on the new node?

2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 Dear all,

 I had a 1 node cluster. Then I added 1 more node to it.

 When I ran my query on the 1 node cluster I got all my data, but when
 I ran my query on the 2 node cluster (Hector code) I am not getting
 the same data.

 How do I ensure that my Hector code retrieves data from all the nodes?

 Also, when I decommission my node and then add it again I get the
 following message:

 This node will not auto bootstrap because it is configured to be a seed
 node

 Please tell me the meaning of it also.

 Thanks and Regards

 Prakrati

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com





-- 
With kind regards,

Robin Verlangen
*Software engineer*
W http://www.robinverlangen.nl
E ro...@us2.nl



Re: Problem in getting data from a 2 node cluster

2012-06-06 Thread R. Verlangen
Repair ensures that all data is consistent and available on the node.

2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 When I run the nodetool command I get the following information:

 ./nodetool -h localhost ring

 Address         DC          Rack   Status  State   Load       Effective-Ownership  Token
                                                                                    85070591730234615865843651857942052864
 162.192.100.16  datacenter1 rack1  Up      Normal  238.22 MB  50.00%               0
 162.192.100.48  datacenter1 rack1  Up      Normal  115.6 MB   50.00%               85070591730234615865843651857942052864

 Please help me

 Thanks and Regards

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com

 *From:* Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com]
 *Sent:* Wednesday, June 06, 2012 3:55 PM
 *To:* user@cassandra.apache.org
 *Subject:* RE: Problem in getting data from a 2 node cluster

 What does repair do?

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com

 *From:* R. Verlangen [mailto:ro...@us2.nl]
 *Sent:* Wednesday, June 06, 2012 3:56 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Problem in getting data from a 2 node cluster

 Did you run repair on the new node?

 2012/6/6 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 Dear all,

 I had a 1 node cluster. Then I added 1 more node to it.

 When I ran my query on the 1 node cluster I got all my data, but when
 I ran my query on the 2 node cluster (Hector code) I am not getting
 the same data.

 How do I ensure that my Hector code retrieves data from all the nodes?

 Also when I decommission my node and then add it again I get the
 following message:

 This node will not auto bootstrap because it is configured to be a seed
 node

 Please tell me the meaning of it also.

 Thanks and Regards

 Prakrati

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com



 

 --
 With kind regards,

 Robin Verlangen
 *Software engineer*

 W http://www.robinverlangen.nl
 E ro...@us2.nl

Re: nodetool repair -pr enough in this scenario?

2012-06-05 Thread R. Verlangen
In your case -pr would be just fine (see Viktor's explanation).

2012/6/5 Viktor Jevdokimov viktor.jevdoki...@adform.com

 Understand the simple mechanics first, decide how to act later.

 Without -pr there is no difference which host you run repair from: it
 runs for the whole 100% range, from start to end, the whole cluster,
 all nodes, at once.

 With -pr it runs only for the primary range of the node you are
 running the repair on.

 Let's say you have a simple ring of 3 nodes with RF=2 and ranges (per
 node) N1=C-A, N2=A-B, N3=B-C (node tokens are N1=A, N2=B, N3=C). No
 rack or DC awareness.

 So running repair with -pr on node N2 will only repair the range A-B,
 for which node N2 is the primary and N3 holds the replica. N2 and N3
 will synchronize the A-B range with each other. For the other ranges
 you need to run it on the other nodes.

 Without -pr, running it on any node will repair all ranges: A-B, B-C
 and C-A. The node you run a repair without -pr on is just the repair
 coordinator, so it makes no difference which one you pick next time.
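The three-node example above can be sketched in a few lines. This is an illustration only (the tokens and node names are hypothetical, and real Cassandra tokens are large integers); it just shows why running repair -pr once per node covers every range exactly once:

```python
# Sketch of the example above: 3 nodes, RF=2, tokens on a ring.
TOKENS = {"N1": 10, "N2": 50, "N3": 90}  # N1=A, N2=B, N3=C (hypothetical values)

def primary_range(node):
    """The range a node is primary for: (previous token, own token]."""
    ordered = sorted(TOKENS.items(), key=lambda kv: kv[1])
    names = [n for n, _ in ordered]
    i = names.index(node)
    prev = ordered[i - 1][1]  # index -1 wraps around for the first node
    return (prev, ordered[i][1])

def replicas(node, rf=2):
    """SimpleStrategy-style placement: the primary plus the next rf-1 nodes."""
    names = [n for n, _ in sorted(TOKENS.items(), key=lambda kv: kv[1])]
    i = names.index(node)
    return [names[(i + k) % len(names)] for k in range(rf)]

# repair -pr on N2 only touches N2's primary range (A-B], held by N2 and N3:
print(primary_range("N2"))   # (10, 50)
print(replicas("N2"))        # ['N2', 'N3']

# Running repair -pr once per node covers each range exactly once:
covered = [primary_range(n) for n in TOKENS]
assert len(set(covered)) == len(TOKENS)
```

Running repair without -pr on any one node would touch all three ranges at once, which is why staggering -pr runs per node is the cheaper day-to-day option.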



Best regards / Pagarbiai
 *Viktor Jevdokimov*
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider
 What is Adform: watch this short video http://vimeo.com/adform/display
 http://www.adform.com


   *From:* David Daeschler [mailto:david.daesch...@gmail.com]
 *Sent:* Tuesday, June 05, 2012 08:59
 *To:* user@cassandra.apache.org
 *Subject:* nodetool repair -pr enough in this scenario?

 Hello,

 Currently I have a 4 node Cassandra cluster on CentOS64. I have been
 running nodetool repair (no -pr option) on a weekly schedule like:

 Host1: Tue, Host2: Wed, Host3: Thu, Host4: Fri

 In this scenario, if I were to add the -pr option, would this still be
 sufficient to prevent forgotten deletes and properly maintain
 consistency?

 Thank you,
 - David
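For a weekly schedule like the one above, each host's crontab needs only a single line. This is an illustrative sketch (the 02:00 time is an assumption, and nodetool is assumed to be on the PATH); with -pr every host repairs only its own primary range, so the four staggered entries together cover the whole ring:

```shell
# Host1 (Tuesday, 02:00): repair only this node's primary range
0 2 * * 2  nodetool -h localhost repair -pr
# Host2..Host4 use the same line with day-of-week field 3, 4 and 5 respectively
```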




-- 
With kind regards,

Robin Verlangen
*Software engineer*
W http://www.robinverlangen.nl
E ro...@us2.nl


Re: about multitenant datamodel

2012-06-05 Thread R. Verlangen
Every CF has a certain amount of overhead in memory. It's just not how
Cassandra is designed to be used. Maybe you could think of a way to smash
data down to indices and entities. With an abstraction layer you can store
practically anything in Cassandra.
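As a concrete (hypothetical) illustration of such an abstraction layer: instead of one keyspace or CF per user, a single shared column family can hold all tenants by baking the tenant id into the row key. The dict below stands in for a real Cassandra client; the key-prefixing logic is the point:

```python
# Sketch: multitenancy in ONE shared column family by prefixing row keys
# with the tenant id, instead of creating a keyspace/CF per user.
class TenantStore:
    def __init__(self):
        self._cf = {}  # shared "column family": row_key -> {column: value}

    def _row_key(self, tenant, key):
        # Tenant id baked into the row key keeps tenants disjoint.
        return "%s:%s" % (tenant, key)

    def put(self, tenant, key, column, value):
        self._cf.setdefault(self._row_key(tenant, key), {})[column] = value

    def get(self, tenant, key):
        return self._cf.get(self._row_key(tenant, key), {})

store = TenantStore()
store.put("user1", "profile", "name", "Alice")
store.put("user2", "profile", "name", "Bob")
print(store.get("user1", "profile"))  # {'name': 'Alice'} -- user2's row untouched
```

This keeps the per-CF memory overhead constant no matter how many tenants there are, at the cost of enforcing tenant isolation in the application layer.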

2012/6/5 Toru Inoko in...@ms.scsk.jp

 IMHO a model that allows external users to create CF's is a bad one.


 Why do you think so? I'll let users create restricted CFs, and limit
 the number of CFs users can create. Is it still a bad one?


 On Thu, 31 May 2012 06:44:05 +0900, aaron morton aa...@thelastpickle.com
 wrote:

  - Do a lot of keyspaces cause some problems? (If I have 1,000 users,
 cassandra creates 1,000 keyspaces…)

 It's not the keyspaces, but the number of column families.

 Without storing any data, each CF uses about 1 MB of RAM. When they
 start storing and reading data they use more.

 IMHO a model that allows external users to create CF's is a bad one.

 Hope that helps.
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 25/05/2012, at 12:52 PM, Toru Inoko wrote:

  Hi, all.

 I'm designing a data API service (like cassandra.io, but not using a
 dedicated server for each user) on Cassandra 1.1, on which users can
 run DML/DDL methods like CQL.
 The following are the APIs which users can use (almost the same as the
 Cassandra API):
 - create/read/delete ColumnFamilies/Rows/Columns

 Now I'm thinking about multitenant datamodel on that.
 My data model like the following.
 I'm going to prepare a keyspace for each user as a user's tenant space.

 | keyspace1 | --- | column family |
 |(for user1)|  |
  ...

 | keyspace2 | --- | column family |
 |(for user2)|  |
  ...

 The following are my questions:
 - Is this data model good for multitenancy?
 - Do a lot of keyspaces cause problems? (If I have 1,000 users,
 Cassandra creates 1,000 keyspaces...)

 please, help.
 thank you in advance.

 Toru Inoko.




 --
 -------------------------------
 SCSK Corporation
 Technology, Quality & Information Group, Technology Development Dept.
 Advanced Technology Section

 Toru Inoko
 tel   : 03-6438-3544
 mail  : in...@ms.scsk.jp
 -------------------------------




-- 
With kind regards,

Robin Verlangen
*Software engineer*
W http://www.robinverlangen.nl
E ro...@us2.nl



Re: Finding whether a new node is successfully added or not

2012-06-04 Thread R. Verlangen
Hi there,

You can check the ring info with nodetool. Furthermore you can take a
look at the streaming statistics: lots of pending streams indicate a
node that is still receiving data from its seed(s). As far as I'm
aware the seed list is only read on startup, so a restart is required.

Good luck.

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 Dear all,

 I added a new node to my 1 node Cassandra cluster. Now I want to find
 out whether it was added successfully or not. Also, do I need to
 restart the already running node after entering the seed value? Please
 help me.

 Thanks and Regards

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com





-- 
With kind regards,

Robin Verlangen
*Software engineer*
W www.robinverlangen.nl
E ro...@us2.nl



Re: Adding a new node to Cassandra cluster

2012-06-04 Thread R. Verlangen
Hi there,

When you speak to one node it will internally route the request to the
proper node (local / remote), but you won't be able to fail over if
the localhost node crashes.
For adding another node to the connection pool you should take a look
at the documentation of your Java client.
Good luck!

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 Dear all

 I successfully added a new node to my cluster, so now it's a 2 node
 cluster. But how do I mention it in my Java code? When I am retrieving
 data it retrieves only from the one node that I specify as localhost.
 How do I specify more than one node instead of just localhost?

 Please help me

 Thanks and Regards

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com





-- 
With kind regards,

Robin Verlangen
*Software engineer*
W www.robinverlangen.nl
E ro...@us2.nl



Re: Adding a new node to Cassandra cluster

2012-06-04 Thread R. Verlangen
You might consider using a higher-level client (like Hector indeed).
If you don't want that, you will have to write your own connection
pool. For a start, take a look at Hector. But keep in mind that you
might be reinventing the wheel.

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 Hi,

 I am using the Thrift API and I am not able to find anything on the
 internet about how to configure it for multiple nodes. I am not using
 any proper client like Hector.

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com

 *From:* R. Verlangen [mailto:ro...@us2.nl]
 *Sent:* Monday, June 04, 2012 2:44 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Adding a new node to Cassandra cluster

 Hi there,

 When you speak to one node it will internally redirect the request to
 the proper node (local / external): but you won't be able to failover
 on a crash of the localhost.
 For adding another node to the connection pool you should take a look
 at the documentation of your java client.

 Good luck!

 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 Dear all

 I successfully added a new node to my cluster so now it's a 2 node
 cluster. But how do I mention it in my Java code as when I am
 retrieving data its retrieving only for one node that I am specifying
 in the localhost. How do I specify more than one node in the
 localhost.

 Please help me

 Thanks and Regards

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com
 --
 With kind regards,

 Robin Verlangen
 *Software engineer*

 W www.robinverlangen.nl
 E ro...@us2.nl





-- 
With kind regards,

Robin Verlangen
*Software engineer*
W www.robinverlangen.nl
E ro...@us2.nl



Re: Adding a new node to Cassandra cluster

2012-06-04 Thread R. Verlangen
Connection pooling involves things like:
- (transparent) failover / retry
- disposal of connections after X messages
- keeping track of open connections

Again: take a look at the Hector connection pool. Source:
https://github.com/rantav/hector/tree/master/core/src/main/java/me/prettyprint/cassandra/connection
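A minimal sketch of those three responsibilities looks roughly like the following. This is not Hector's actual implementation; the host addresses, `connect` factory and `MAX_MESSAGES` value are all hypothetical stand-ins for opening Thrift sockets to Cassandra hosts:

```python
import random

class Pool:
    MAX_MESSAGES = 1000  # dispose of a connection after this many uses

    def __init__(self, hosts, connect):
        self.hosts = list(hosts)
        self.connect = connect   # stand-in for opening a Thrift socket
        self.open = {}           # tracking: host -> (connection, messages_sent)

    def execute(self, request):
        # Transparent failover: try every host before giving up.
        for host in random.sample(self.hosts, len(self.hosts)):
            try:
                conn, used = self.open.get(host) or (self.connect(host), 0)
                result = conn(request)
                used += 1
                if used >= self.MAX_MESSAGES:
                    self.open.pop(host, None)   # disposal after X messages
                else:
                    self.open[host] = (conn, used)
                return result
            except ConnectionError:
                self.open.pop(host, None)       # drop the broken connection
        raise ConnectionError("all hosts failed")

# Usage: one dead host, one healthy host -- the pool fails over silently.
def connect(host):
    if host == "10.0.0.1":
        def conn(req):
            raise ConnectionError("down")
    else:
        def conn(req):
            return "ok:" + req
    return conn

pool = Pool(["10.0.0.1", "10.0.0.2"], connect)
print(pool.execute("get_slice"))  # ok:get_slice
```

Even this toy version shows why rolling your own is real work: retry policy, connection lifetime and bookkeeping all have to be handled explicitly.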

2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 Yes, I know I am trying to reinvent the wheel, but I have to. The
 requirement is such that I have to use the Java Thrift API without any
 client like Hector. Can you please tell me how to do it?

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com

 *From:* samal [mailto:samalgo...@gmail.com]
 *Sent:* Monday, June 04, 2012 3:12 PM

 *To:* user@cassandra.apache.org
 *Subject:* Re: Adding a new node to Cassandra cluster

 If you use the Thrift API, you have to maintain a lot of low-level
 code yourself which is already being polished by high-level clients
 (Hector, and pycassa too). Also, with a high-level client you can
 easily switch between Thrift and the growing CQL.

 On Mon, Jun 4, 2012 at 3:00 PM, R. Verlangen ro...@us2.nl wrote:

 You might consider using a higher level client (like Hector indeed). If
 you don't want this you will have to write your own connection pool. For
 start take a look at Hector. But keep in mind that you might be
 reinventing the wheel.

 2012/6/4 Prakrati Agrawal prakrati.agra...@mu-sigma.com

 Hi,

 I am using Thrift API and I am not able to find anything on the
 internet about how to configure it for multiple nodes. I am not using
 any proper client like Hector.

 Prakrati Agrawal | Developer - Big Data(ID)| 9731648376 |
 www.mu-sigma.com

Re: repair

2012-06-04 Thread R. Verlangen
The repair -pr option only repairs the node's primary range, so it is
only useful for day-to-day maintenance. When you're recovering from a
crash, run repair without -pr.

2012/6/4 Romain HARDOUIN romain.hardo...@urssaf.fr


 Run repair -pr in your cron.

 Tamar Fraenkel ta...@tok-media.com wrote on 04/06/2012 13:44:32:

  Thanks.
 
  I actually did just that with cron jobs running on different hours.
 
  I asked the question because I saw that when one of the nodes was
  running the repair, all nodes logged some repair-related entries in
  /var/log/cassandra/system.log
 
  Thanks again,
  Tamar Fraenkel
  Senior Software Engineer, TOK Media




-- 
With kind regards,

Robin Verlangen
*Software engineer*
W www.robinverlangen.nl
E ro...@us2.nl



Re: Data Versioning Support

2012-05-24 Thread R. Verlangen
Hi Felipe,

There was recently a thread about this (
http://www.mail-archive.com/user@cassandra.apache.org/msg22298.html ). The
answer, in short: no. However, you can build your own data model to support
it.
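For illustration, one common way to build such a model is to keep every version of a value as its own column, ordered by timestamp (in Cassandra you would typically use TimeUUID column names in a wide row). A plain-Python sketch of the idea — the class, keys, and in-memory storage are made up for illustration:

```python
import bisect

class VersionedStore:
    """Toy model of versioning on top of a wide row: each write adds a
    (timestamp, value) column instead of overwriting, so old versions
    stay readable."""

    def __init__(self):
        self.rows = {}  # row key -> sorted list of (timestamp, value)

    def put(self, key, ts, value):
        versions = self.rows.setdefault(key, [])
        bisect.insort(versions, (ts, value))

    def latest(self, key):
        return self.rows[key][-1][1]

    def as_of(self, key, ts):
        # newest version with timestamp <= ts
        versions = self.rows[key]
        i = bisect.bisect_right(versions, (ts, chr(0x10FFFF)))
        return versions[i - 1][1]

store = VersionedStore()
store.put("doc1", 100, "v1")
store.put("doc1", 200, "v2")
print(store.latest("doc1"))      # v2
print(store.as_of("doc1", 150))  # v1
```

The "as of" read maps naturally onto a column slice by TimeUUID range in a real column family.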

Cheers!

2012/5/24 Felipe Schmidt felipef...@gmail.com

 Does Cassandra support data versioning?

 I'm trying to find it in many places but I'm not quite sure about it.

 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Number of keyspaces

2012-05-22 Thread R. Verlangen
Yes, it does. However, there's no definitive answer on the limit: it depends
on your hardware and cluster configuration.

You might even want to search the archives of this mailing list; I remember
this has been asked before.

Cheers!

2012/5/21 Luís Ferreira zamith...@gmail.com

 Hi,

 Does the number of keyspaces affect the overall cassandra performance?


 Cumprimentos,
 Luís Ferreira






-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Number of keyspaces

2012-05-22 Thread R. Verlangen
Hmm, you got me on that. I assumed (wrongly) that more keyspaces would mean
more CFs.

2012/5/22 aaron morton aa...@thelastpickle.com

 It's more the number of CF's than keyspaces.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 22/05/2012, at 6:58 PM, R. Verlangen wrote:

 Yes, it does. However, there's no definitive answer on the limit: it depends
 on your hardware and cluster configuration.

 You might even want to search the archives of this mailing list; I remember
 this has been asked before.

 Cheers!

 2012/5/21 Luís Ferreira zamith...@gmail.com

 Hi,

 Does the number of keyspaces affect the overall cassandra performance?


 Cumprimentos,
 Luís Ferreira






 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl





-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: is it possible to run cassandra process in client mode as smart proxy

2012-05-16 Thread R. Verlangen
Hi there,

I'm using HAProxy for PHP projects to take care of this. It improved
connection pooling enormously on the client side, while preserving failover
capabilities. Maybe that is something for you to use in combination with
PHP.
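For illustration, a minimal haproxy.cfg fragment for such a setup — addresses, ports, and server names are assumptions:

```
# Clients connect to localhost:9160; HAProxy spreads the Thrift
# connections over the cluster and drops nodes that fail the TCP check.
listen cassandra-thrift
    bind 127.0.0.1:9160
    mode tcp
    balance roundrobin
    option tcplog
    server cass1 10.0.0.1:9160 check inter 5s
    server cass2 10.0.0.2:9160 check inter 5s
    server cass3 10.0.0.3:9160 check inter 5s
```

Note this only gives plain TCP health checking; as discussed later in this thread, HAProxy knows nothing about the Thrift protocol or ring state.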

Good luck!

2012/5/16 Piavlo lolitus...@gmail.com


  Hi,

 I'm interested in using a smart proxy cassandra process that could act
 as a coordinator node and be aware of the cluster state,
 and run this smart proxy process on each client-side host where
 the application (PHP) with short-lived cassandra connections runs.
 Besides being aware of the cluster state, if it could act as a coordinator node
 it would save unneeded network trips.
 And maybe even have an option to take care of hinted handoffs.
 IMHO the best candidate for this is the cassandra itself (like it's done
 in elasticsearch http://www.elasticsearch.org/guide/reference/modules/node.html )
 I also see there was a work done in this direction at
 https://issues.apache.org/jira/browse/CASSANDRA-535
 So maybe this is something that is already usable?

 Or maybe there is some third party project that could be used as smart
 cassandra proxy?

 Thanks
 Alex




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: is it possible to run cassandra process in client mode as smart proxy

2012-05-16 Thread R. Verlangen
Yes, I'm aware of those issues however in our use case they don't cause any
problems.

But ... If there's something better out there I'm really curious: so I'll
keep up with this thread.

2012/5/16 Piavlo lolitus...@gmail.com

  On 05/16/2012 01:24 PM, R. Verlangen wrote:

 Hi there,

  I'm using HAProxy for PHP projects to take care of this. It improved
 connection pooling enormously on the client side, while preserving failover
 capabilities. Maybe that is something for you to use in combination with
 PHP.

 I already use it exactly like this :)
 But I don't think it's a good solution: it's totally unaware of the
 thrift/cassandra protocol, as was pretty well discussed here

 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-amp-HAProxy-td5473144.html
 I even see the plain TCP healthchecks failing from time to time for no
 reason.
 I'm planning to make it a bit smarter with localhost HTTP-level
 healthchecks: each check would make a cassandra write to a CF in a
 keyspace with replication factor 1, writing to a key that maps to the
 specific cassandra node being checked (of course the keys need to be
 recalculated each time the cluster is rebalanced).
 But IMHO that's a very ugly hack, and not as reliable as a real smart proxy,
 which would be far superior and more efficient (especially if it could do the
 read/write coordination itself).

 HAProxy also has the issue that when one of the backend IPs changes (which
 happens often in clouds) it has to be restarted to resolve the correct
 hostname, though it looks like Willy is finally seriously considering
 implementing more dynamic hostname lookups (which was not the case about a
 year ago when I asked for such a feature);
 the problem was discussed here recently -
 http://marc.info/?l=haproxy&m=133559164408814&w=1

 haproxy has some more issues - I don't remember them off the top of my head.

 A smart proxy would simply not have all those issues, as it's aware of the
 ring state and the protocol, and if the smart proxy were cassandra itself
 then it would have all the needed features, tested and reliable, at no extra
 effort.

 Thanks
 Alex


  Good luck!

 2012/5/16 Piavlo lolitus...@gmail.com


  Hi,

 I'm interested in using some smart proxy cassandra process that could act
 as coordinator node and be aware of cluster state.
 And run this smart proxy cassandra process on each client side host
  where the application(php) with short lived cassandra connections runs.
 Besides being aware of cluster state if it could act as coordinator node
 it would save unneeded network trips.
 And maybe even have an option to take care of hinted handoffs.
 IMHO the best candidate for this is the cassandra itself (like it's done
 in elasticsearch
 http://www.elasticsearch.org/guide/reference/modules/node.html)
 I also see there was a work done in this direction at
 https://issues.apache.org/jira/browse/CASSANDRA-535
 So maybe this is something that is already usable?

 Or maybe there is some third party project that could be used as smart
 cassandra proxy?

 Thanks
 Alex




  --
 With kind regards,

  Robin Verlangen
 www.robinverlangen.nl





-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: get dynamic snitch info from php

2012-05-14 Thread R. Verlangen
I struggled with this before and decided to use HAProxy which suits my
needs, you can read a little more about it at my personal blog:

http://www.robinverlangen.nl/index/view/4fa902c1596cb-44a627/how-to-solve-the-pain-of-stateless-php-with-cassandra.html


Good luck with it!

2012/5/14 Viktor Jevdokimov viktor.jevdoki...@adform.com

  Let's say you have an 8-node cluster with replication factor 3. If one node
 is down, for its token range you have only 2 nodes left, not 7, that can
 process your requests – other nodes will forward requests to the nearest
 (depending on the snitch) or lowest-latency (depending on the dynamic
 snitch) of the 2 remaining.


 I have no idea about PHP and its multithreading capabilities; if it's
 impossible to run a background thread to return a dead endpoint to the list,
 instead of checking it on the HTTP request thread, you're stuck. For lower
 latencies the dynamic snitch already does the job for you, selecting a node
 with lower latency.


 If you'd like Cassandra to avoid forwarding requests to the appropriate
 node, and instead make a direct request to the node where the data is, you
 need a smarter client, capable of selecting the node by key, among other
 things.
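To sketch what such token-aware node selection could look like, assuming a RandomPartitioner-style ring — the node names and token assignments below are made up for illustration:

```python
import hashlib
from bisect import bisect_left

# Hypothetical 3-node ring: tokens are md5 hashes of keys mapped onto
# the range 0..2**127, evenly split between three imaginary nodes.
RING = [
    (0, "node-a"),
    (2**127 // 3, "node-b"),
    (2 * 2**127 // 3, "node-c"),
]

def token_for(key: str) -> int:
    # RandomPartitioner derives a token from the md5 of the key
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**127)

def coordinator_for(key: str) -> str:
    """Pick the first node whose token is >= the key's token,
    wrapping around the ring (the node owning that token range)."""
    tokens = [tok for tok, _ in RING]
    i = bisect_left(tokens, token_for(key)) % len(RING)
    return RING[i][1]

print(coordinator_for("user:42"))
```

A real smart client would also track ring changes and replica placement, but the key-to-node mapping is the core of it.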



Best regards / Pagarbiai
 *Viktor Jevdokimov*
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider
 What is Adform: watch this short video http://vimeo.com/adform/display
  [image: Adform News] http://www.adform.com


 *From:* ruslan usifov [mailto:ruslan.usi...@gmail.com]
 *Sent:* Monday, May 14, 2012 17:41
 *To:* user@cassandra.apache.org
 *Subject:* Re: get dynamic snitch info from php


 Sorry for my bad English.


 I want to solve the following problem. For example, we take one node down
 for maintenance for a long time (30 min). We currently use TSocketPool for
 pooling connections to cassandra, but this pool implementation is, I think,
 not so good. It has a parameter, setRetryInterval, which allows a broken
 node to be taken out of the pool (we now set it to 10 sec), but this means
 that every 10 sec the pool will try to connect to the down node (again: we
 shut the node down for maintenance), because it doesn't know whether the
 node is dead or not - but the cassandra cluster does know this, so these
 connection attempts are pointless. Also, when a node runs compactions it
 can be heavily loaded and can't serve client requests very well (at such
 moments we see a small increase in average backend response time)

 2012/5/14 Viktor Jevdokimov viktor.jevdoki...@adform.com

 I'm not sure that selecting a node based on the dynamic snitch is a good
 idea. First of all, every node has DS values about every node, including
 itself, and a node's own values are always better than the others'.

  

 For example, 3 nodes, RF=2:

        N1     N2     N3
  N1    0.5ms  2ms    2ms
  N2    2ms    0.5ms  2ms
  N3    2ms    2ms    0.5ms

  

 We have monitored many Cassandra counters, including DS values for every
 node, and the graphs show that latency is not simply a function of load.

  

 So the strategy should be based on the use case, node count, RF, replica
 placement strategy, read repair chance, and more…

  

 What do you want to achieve?

  

  


 Best regards / Pagarbiai

 *Viktor Jevdokimov*

 Senior Developer


 Email: viktor.jevdoki...@adform.com

 Phone: +370 5 212 3063, Fax +370 5 261 0453

 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

 Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider

 What is Adform: watch this short video http://vimeo.com/adform/display

 [image: Adform News] http://www.adform.com




 *From:* ruslan usifov [mailto:ruslan.usi...@gmail.com]
 *Sent:* Monday, May 14, 2012 16:58
 *To:* user@cassandra.apache.org
 *Subject:* get dynamic snitch info from php

  

Re: Use-case: multi-instance webshop

2012-05-10 Thread R. Verlangen
@Aaron: Solr will probably be the solution to our problem. Thank you!

@Radim: We already have a Cassandra cluster; we do not want to add an extra
MongoDB cluster. At this moment the data would fit easily in SQL, but we
don't know how our platform will grow and we want to be prepared for the future.

Would it be stupid to go for the manual indexing on top of Cassandra?
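For what it's worth, manual indexing usually means maintaining "index rows" yourself on every write: one wide row per category whose columns are the matching product keys. A toy sketch of the bookkeeping, with plain Python dicts standing in for the index column family (all names are illustrative):

```python
from collections import defaultdict

products = {}             # product key -> product data (the "products" CF)
index = defaultdict(set)  # "category:<name>" -> product keys (the index CF)

def save_product(key, data):
    # On re-save, first remove the key from the old category's index row
    old = products.get(key)
    if old:
        index["category:" + old["category"]].discard(key)
    products[key] = data
    index["category:" + data["category"]].add(key)

def products_in_category(name):
    return [products[k] for k in sorted(index["category:" + name])]

save_product("p1", {"name": "Mug", "category": "kitchen"})
save_product("p2", {"name": "Pan", "category": "kitchen"})
save_product("p1", {"name": "Mug", "category": "gifts"})  # re-categorize
print([p["name"] for p in products_in_category("kitchen")])  # ['Pan']
```

The cost is that every write becomes two or three writes, and the remove-then-add step must not be forgotten when an attribute changes.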

2012/5/10 Radim Kolar h...@filez.com


  Is Cassandra a fit for this use-case or should we just stick with the
 oldskool MySQL and put things like votes, reviews etc in our C* store?

 If all your data fits on one computer and you expect only tens of millions
 of records per table, then go for SQL. It has far more features and people
 are comfortable working with it.

 If you want noSQL then go for mongoDB




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Use-case: multi-instance webshop

2012-05-08 Thread R. Verlangen
Hi there,

I'm working on a datamodel for a multi-website, multi-customer system.
Things we would like to do:
- search products (lucene / solr / solandra)
- multi-filter (e.g. categories)
- reviews
- voting

I can't really see how to do the filtering of the products by category,
or by things like price (ranges would be possible with C*).

Is Cassandra a fit for this use-case or should we just stick with the
oldskool MySQL and put things like votes, reviews etc in our C* store?

-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Bad Request: No indexed columns present in by-columns clause with equals operator

2012-04-24 Thread R. Verlangen
I read a while ago that a compaction would rebuild the index. You can
trigger this by running repair with nodetool.

2012/4/24 mdione@orange.com

 From: mdione@orange.com [mailto:mdione@orange.com]
  [default@avatars] describe HBX_FILE;
  ColumnFamily: HBX_FILE
Key Validation Class: org.apache.cassandra.db.marshal.BytesType
Default column value validator:
  org.apache.cassandra.db.marshal.BytesType
Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
Row cache size / save period in seconds / keys to save :
  0.0/0/all
Row Cache Provider:
  org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
Key cache size / save period in seconds: 20.0/14400
GC grace seconds: 864000
Compaction min/max thresholds: 4/32
Read repair chance: 1.0
Replicate on write: true
Bloom Filter FP chance: default
Built indexes: []
Column Metadata:
  Column Name: HBX_FIL_DATE
Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Column Name: HBX_FIL_LARGE
Validation Class: org.apache.cassandra.db.marshal.AsciiType
  Column Name: HBX_FIL_MEDIUM
Validation Class: org.apache.cassandra.db.marshal.AsciiType
  Column Name: HBX_FIL_SMALL
Validation Class: org.apache.cassandra.db.marshal.AsciiType
  Column Name: HBX_FIL_STATUS
Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Index Name: HBX_FILE_HBX_FIL_STATUS_idx
Index Type: KEYS
  Column Name: HBX_FIL_TINY
Validation Class: org.apache.cassandra.db.marshal.AsciiType
Compaction Strategy:
  org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

   Someone in #cassandra pointed out that the index might be created, but
 it's shown as not built («Built indexes: []»). Is that right? Any idea
 how to build it?

 --
 Marcos Dione
 SysAdmin
 Astek Sud-Est
 pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo
 04 97 12 62 45 - mdione@orange.com



 _

 Ce message et ses pieces jointes peuvent contenir des informations
 confidentielles ou privilegiees et ne doivent donc
 pas etre diffuses, exploites ou copies sans autorisation. Si vous avez
 recu ce message par erreur, veuillez le signaler
 a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
 electroniques etant susceptibles d'alteration,
 France Telecom - Orange decline toute responsabilite si ce message a ete
 altere, deforme ou falsifie. Merci.

 This message and its attachments may contain confidential or privileged
 information that may be protected by law;
 they should not be distributed, used or copied without authorisation.
 If you have received this email in error, please notify the sender and
 delete this message and its attachments.
 As emails may be altered, France Telecom - Orange is not liable for
 messages that have been modified, changed or falsified.
 Thank you.




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: blob fields, bynary or hexa?

2012-04-19 Thread R. Verlangen
PHPCassa does support binaries, so that should not be the problem.
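For what it's worth, the usual source of confusion here is that cassandra-cli expects BytesType values as hex, while client libraries send raw bytes; a quick Python illustration of the round-trip:

```python
# A blob as raw bytes (what a client library like phpcassa or Hector sends)
blob = bytes([0x89, 0x50, 0x4E, 0x47])  # e.g. the start of a PNG header

# What you would type into cassandra-cli for a BytesType column: hex
hex_form = blob.hex()
print(hex_form)  # 89504e47

# And decoding it back recovers the original bytes
assert bytes.fromhex(hex_form) == blob
```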

2012/4/19 phuduc nguyen duc.ngu...@pearson.com

 Well, I'm not sure exactly how you're passing a blob to the CLI. It would be
 helpful if you pasted your commands/code and maybe there is a simple
 oversight.

 With that said, Cassandra can most definitely save blob/binary values. I
 think most people use a high-level client; we use Hector. If you're in PHP
 land, see if your problems exist in phpcassa.


 Duc



 On 4/19/12 2:25 AM, mdione@orange.com mdione@orange.com wrote:

  From: phuduc nguyen [mailto:duc.ngu...@pearson.com]
  How are you passing a blob or binary stream to the CLI? It sounds like
  you're passing in a representation of a binary stream as ascii/UTF8
  which will create the problems you describe.
 
So this is only a limitation of Cassandra-cli?
 
  --
  Marcos Dione
  SysAdmin
  Astek Sud-Est
  pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo
  04 97 12 62 45 - mdione@orange.com
 
 
 




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: swap grows

2012-04-14 Thread R. Verlangen
It's recommended to disable swap entirely when you run Cassandra on a server.

2012/4/14 ruslan usifov ruslan.usi...@gmail.com

 I forgot to say that the system has 24GB of physical memory


 2012/4/14 ruslan usifov ruslan.usi...@gmail.com

 Hello

 We have a 6 node cluster (cassandra 0.8.10). On one node I increased the
 java heap size to 6GB, and now on this node swap begins to grow, even
 though the system has about 3GB of free memory:


 root@6wd003:~# free
  total   used   free sharedbuffers cached
 Mem:  24733664   217028123030852  0   6792   13794724
 -/+ buffers/cache:7901296   16832368
 Swap:  1998840   23521996488


 And the swap space slowly grows, but I don't understand why.


 PS: We have JNA mlock, and set  vm.swappiness = 0
 PS: OS ubuntu 10.0.4(2.6.32-40-generic)






-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: swap grows

2012-04-14 Thread R. Verlangen
Maybe it has got something to do with swappiness; it's something you can
configure, more info here:
https://www.linux.com/news/software/applications/8208-all-about-linux-swap-space
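For reference, a sketch of the two usual options (verify paths and behavior on your own distro):

```
# /etc/sysctl.conf -- discourage swapping (the setting discussed in this thread)
vm.swappiness = 0

# Or, as often recommended for Cassandra, disable swap entirely:
#   swapoff -a      (and remove the swap entries from /etc/fstab)
```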


2012/4/14 ruslan usifov ruslan.usi...@gmail.com

 I know :-) but this is not the answer :-(. I found that the other nodes
 also have about 3GB of free memory (the node with JAVA_HEAP=6GB also has
 3GB free), but there JAVA_HEAP=5G, so this looks like some sysctl
 (/proc/sys/vm?) ratio (about 10% = 3 / 24 * 100), but I don't know which
 one; can anybody explain this situation?

 2012/4/14 R. Verlangen ro...@us2.nl

 Its recommended to disable swap entirely when you run Cassandra on a
 server.


 2012/4/14 ruslan usifov ruslan.usi...@gmail.com

 I forgot to say that the system has 24GB of physical memory


 2012/4/14 ruslan usifov ruslan.usi...@gmail.com

 Hello

 We have a 6 node cluster (cassandra 0.8.10). On one node I increased the
 java heap size to 6GB, and now on this node swap begins to grow, even
 though the system has about 3GB of free memory:


 root@6wd003:~# free
  total   used   free sharedbuffers
 cached
 Mem:  24733664   217028123030852  0   6792
 13794724
 -/+ buffers/cache:7901296   16832368
 Swap:  1998840   23521996488


 And the swap space slowly grows, but I don't understand why.


 PS: We have JNA mlock, and set  vm.swappiness = 0
 PS: OS ubuntu 10.0.4(2.6.32-40-generic)






 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl





-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Trouble with wrong data

2012-04-13 Thread R. Verlangen
It sounds like the commitlog has been replayed, however I have really no
idea whether this could have happened. Anyone?

2012/4/13 Alain RODRIGUEZ arodr...@gmail.com

 The commitlog_total_space_in_mb was not set, I set it to avoid having the
 same problem in the future.

 I am aware of the over-counting problem introduced by the counters. The
 point is that I use them to make statistics per hour. I can understand
 having some wrong counts in the column corresponding to the crash time, but
 how do you explain that all my counts since the start (months ago) have
 become wrong after the crash?

 After the crash I tried to repair my entire keyspace from one of the 2
 nodes and this made my server crash again, no idea why. Can this failed
 repair be at the origin of the corrupted data ?

 I'm still replaying all my counts of the past months and I'm afraid this
 kind of bug could happen again...

 I was using cassandra for months without any issue.

 Alain

 2012/4/11 aaron morton aa...@thelastpickle.com

 However after recovering from this issue (freeing some space and fixing
 the value of  commitlog_total_space_in_mb in cassandra.yaml)

 Did the commit log grow larger than commitlog_total_space_in_mb ?

 I realized that all statistics were all destroyed. I have bad values on
 every single counter since I start using them (september) !

 Counter operations are not idempotent. If your client retries a counter
 operation it may result in the increment being applied twice. Could this
 have been your issue?
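To illustrate why a retried counter increment over-counts, here is a toy simulation (pure Python, no Cassandra involved; the function name is made up):

```python
def increment_with_retry(counter, amount, timed_out=False):
    """Model a client retrying a counter add after a timeout.
    If the first attempt actually reached the server before the
    timeout, the retry applies the same logical increment twice."""
    counter["value"] += amount       # first attempt lands on the server
    if timed_out:
        counter["value"] += amount   # client saw a timeout, so it retries
    return counter["value"]

c = {"value": 0}
increment_with_retry(c, 1)                  # clean write
print(c["value"])  # 1
increment_with_retry(c, 1, timed_out=True)  # timeout + retry
print(c["value"])  # 3 (one logical +1 applied twice)
```

This is why a client-side retry policy that is safe for regular (idempotent) column writes can silently corrupt counters.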

 Cheers


   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 11/04/2012, at 2:35 AM, Alain RODRIGUEZ wrote:

 By the way, I am using Cassandra 1.0.7, CL = ONE (R/W), RF = 2, 2 EC2
 c1.medium nodes cluster

 Alain

 2012/4/10 Alain RODRIGUEZ arodr...@gmail.com

 Hi, I'm experimenting a strange and very annoying phenomena.

 I had a problem with the commit log size, which grew too much and filled
 one of the hard disks on all my nodes almost at the same time (2 nodes
 only, RF=2, so the 2 nodes behave exactly the same way)

 My data are mounted on another partition that was not full. However, after
 recovering from this issue (freeing some space and fixing the value of
 commitlog_total_space_in_mb in cassandra.yaml) I realized that all
 statistics were destroyed. I have bad values on every single counter since
 I started using them (September)!

 Does anyone experimented something similar or have any clue on this ?

 Do you need more information ?

 Alain







-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: need of regular nodetool repair

2012-04-11 Thread R. Verlangen
Yes, I personally have configured it to perform a repair once a week, as
the GCGraceSeconds is at 10 days.

This is also what's in the manual (point 2):
http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data
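As an illustration, such a weekly repair cron job might look like this (the binary path, log path, and schedule are assumptions):

```
# m h dom mon dow  command -- weekly primary-range repair, run well
# inside a GCGraceSeconds of 10 days
0 3 * * 0  /usr/bin/nodetool -h localhost repair -pr >> /var/log/cassandra/repair.log 2>&1
```

Staggering the hour per node (as Tamar does below) avoids running repairs on all nodes at once.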

2012/4/11 ruslan usifov ruslan.usi...@gmail.com

 Hello

 I have the following question: if we read and write to a cassandra cluster
 with QUORUM consistency level, does this allow us to not run nodetool
 repair regularly (i.e. every GCGraceSeconds)?




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: need of regular nodetool repair

2012-04-11 Thread R. Verlangen
Well, if everything worked 100% all the time there would be nothing to
repair; however, in a distributed cluster it would be pretty rare for that
to occur. At least that is how I interpret it.

2012/4/11 Igor i...@4friends.od.ua

  BTW, I heard that we don't need to run repair if all your data has a TTL,
 hinted handoff works, and you never delete your data.


 On 04/11/2012 11:34 AM, ruslan usifov wrote:

 Sorry for my bad English - so QUORUM doesn't allow us to skip running
 repair regularly? But that does not follow from your answer.

 2012/4/11 R. Verlangen ro...@us2.nl

 Yes, I personally have configured it to perform a repair once a week, as
 the GCGraceSeconds is at 10 days.

  This is also what's in the manual
 http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data
  (point
 2)


  2012/4/11 ruslan usifov ruslan.usi...@gmail.com

 Hello

 I have the following question: if we read and write to a cassandra cluster
 with QUORUM consistency level, does this allow us to not run nodetool
 repair regularly (i.e. every GCGraceSeconds)?




  --
 With kind regards,

  Robin Verlangen
 www.robinverlangen.nl






-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: cassandra 0.8.7 + hector 0.8.3: All Quorum reads result in writes?

2012-04-11 Thread R. Verlangen
Are you sure this isn't read-repair?
http://wiki.apache.org/cassandra/ReadRepair
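If it is read repair, note that its probability is tunable per column family; an illustrative cassandra-cli statement (the CF name and value are assumptions):

```
update column family MyCF with read_repair_chance = 0.1;
```

A value of 0 disables read repair entirely, at the cost of letting replicas drift until the next repair.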

2012/4/11 Thibaut Britz thibaut.br...@trendiction.com

 Also, executing the same multiget rangeslice query over the same range
 again will trigger the same writes again and again.

 On Wed, Apr 11, 2012 at 5:41 PM, Thibaut Britz 
 thibaut.br...@trendiction.com wrote:

 Hi,

 I just diagnosted this strange behavior:

 When I fetch a rangeslice through hector and set the consistency level to
 quorum, according to cfstats (and also to the output files on the disk),
 cassandra seems to execute a write request for each read I execute. The
 write count in cfstats increases when I execute the rangeslice function
 over the same range again and again (without saving anything at all).

 If I set the consistency level to ONE, no writes are executed.

 How can I disable this? Why are the records rewritten each time, even
 though I don't want them to be rewritten?

 Thanks,
 Thibaut.


 Code:
 Keyspace ks = getConnection(cluster,
 consistencylevel);

  RangeSlicesQueryString, String, V rangeSlicesQuery =
 HFactory.createRangeSlicesQuery(ks, StringSerializer.get(),
 StringSerializer.get(), s);

 rangeSlicesQuery.setColumnFamily(columnFamily);
 rangeSlicesQuery.setColumnNames(column);

 rangeSlicesQuery.setKeys(start, end);
 rangeSlicesQuery.setRowCount(maxrows);

 QueryResultOrderedRowsString, String, V result =
 rangeSlicesQuery.execute();
 return result.get();







-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Nodetool snapshot, consistency and replication

2012-04-03 Thread R. Verlangen
Ok, thank you.

2012/4/2 Rob Coli rc...@palominodb.com

 On Mon, Apr 2, 2012 at 9:19 AM, R. Verlangen ro...@us2.nl wrote:
  - 3 node cluster
  - RF = 3
  - fully consistent (not measured, but let's say it is)
 
  Is it true that when I take a snaphot at only one of the 3 nodes this
  contains all the data in the cluster (at least 1 replica)?

 Yes.

 =Rob

 --
 =Robert Coli
 AIMGTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Nodetool snapshot, consistency and replication

2012-04-02 Thread R. Verlangen
Hi there,

I have a question about the nodetool snapshot.

Situation:
- 3 node cluster
- RF = 3
- fully consistent (not measured, but let's say it is)

Is it true that when I take a snaphot at only one of the 3 nodes this
contains all the data in the cluster (at least 1 replica)?

With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: another DataStax OpsCenter question

2012-03-31 Thread R. Verlangen
Nick, would that also result in useless duplicates of the statistics?

2012/3/30 Nick Bailey n...@datastax.com

 Unfortunately at the moment OpsCenter only really supports having one
 instance per cluster. It may be possible to set up an instance in each
 datacenter, however it has not been tested and each opscenter instance
 would lose some functionality.

 On Fri, Mar 30, 2012 at 3:13 AM, Alexandru Sicoe adsi...@gmail.com
 wrote:
  Hi Nick,
 
   I forgot to say I was using 1.2.3 which I think uses different ports. So I
   will upgrade to 1.4.1 and open those ports across the firewall although
   that's kind of a pain. I already have about 320 config lines for the
   Cassandra cluster itself.
 
  So, just to make things clear, is it mandatory to have one OpsCenter
  instance per Cassandra cluster? Even if that cluster is split in multiple
  Cassandra DCs across separate regions?
 
  Is there a way to have one OpsCenter per Cassandra DC (monitor Cassandra
 DCs
  individually)? That would get rid of many configuration issues!
 
  Cheers,
  Alex
 
 
  On Thu, Mar 29, 2012 at 9:35 PM, Nick Bailey n...@datastax.com wrote:
 
  This setup may be possible although there are a few potential issues.
  Firstly, see:
 
 http://www.datastax.com/docs/opscenter/configure_opscenter#configuring-firewall-port-access
 
  Basically the agents and OpsCenter communicate on ports 61620 and
  61621 by default (those can be configured though). The agents will
  contact the the OpsCenter machine on port 61620. You can specify the
  interface the agents will use to connect to this port when
  installing/setting up the agents.
 
  The OpsCenter machine will contact the agents on port 61621. Right now
  the OpsCenter machine will only talk to the nodes using the
  listen_address configured in your cassandra conf. We have a task to
  fix this in the future so that you can configure the interface that
  opscenter will contact each agent on. In the meantime though OpsCenter
  will need to be able to hit the listen_address for each node.
 
  On Thu, Mar 29, 2012 at 12:47 PM, Alexandru Sicoe adsi...@gmail.com
  wrote:
   Hello,
 I am planning on testing OpsCenter to see how it can monitor a multi-DC
    cluster. There are 2 DCs each on a different side of a firewall. I've
   configured NAT on the firewall to allow the communication between all
   Cassandra nodes on ports 7000, 7199 and 9160. The cluster works fine.
   However when I start OpsCenter (obviously on one side of the firewall)
   the
   OpsCenter CF gives me two schema versions in the cluster and basically
    messes up everything. Plus, I can only see the nodes on the same
    side.
  
   What are the requirements to let the OpsCenter on one side see the
   Cassandra
   nodes and the OpsCenter agents on the other, and viceversa?
  
   Is it possible to use OpsCenter across a firewall?
  
   Cheers,
   Alex
 
 




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: opscenter

2012-03-29 Thread R. Verlangen
As far as I'm aware, that is not possible using OpsCenter.

I recommend you use the cassandra-cli and perform an update column family
query.

2012/3/29 puneet loya puneetl...@gmail.com

  I'm currently using the DataStax OpsCenter.

  How do we add a column to the column families in OpsCenter?





-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Any improvements in Cassandra JDBC driver ?

2012-03-29 Thread R. Verlangen
The best would be to not use update / insert at all, but set / put / save.

Cheers!

2012/3/29 Dinusha Dilrukshi sdddilruk...@gmail.com

 What I meant was that this driver does not use the INSERT keyword. Since
 CQL supports the INSERT keyword and it is the more generic keyword used
 to add new records, it's more user friendly to use the INSERT keyword to add
 a new record set rather than the UPDATE keyword.

 Regards,
 ~Dinusha~




 On Thu, Mar 29, 2012 at 8:34 PM, Jeremiah Jordan 
 jeremiah.jor...@morningstar.com wrote:

  There is no such thing as a pure insert that will give an error if the
 thing already exists. Everything is really UPDATE OR INSERT. Whether
 you say UPDATE or INSERT, it will all act like UPDATE OR INSERT: if the
 thing is there it gets overwritten, and if it isn't there it gets inserted.

 -Jeremiah


  --
 *From:* Dinusha Dilrukshi [sdddilruk...@gmail.com]
 *Sent:* Wednesday, March 28, 2012 11:41 PM
 *To:* user@cassandra.apache.org
 *Subject:* Any improvements in Cassandra JDBC driver ?

  Hi,

  We are using the Cassandra JDBC driver (found in [1]) to call the Cassandra
 server using CQL and JDBC calls. One of the main disadvantages is that this
 driver is not available in a Maven repository that people can access
 publicly. Currently we have to check out the source and build it ourselves.
 Is there any possibility to host this driver in a Maven repository?

  And another limitation of the driver is that it does not support the
 INSERT query. If we need to do an insert, it can be done using the
 UPDATE statement. So basically the same query is used for both UPDATE
 and INSERT. As an example, if you execute the following query:
 update USER set 'username'=?, 'password'=? where key = ?
 and the provided KEY already exists in the column family, then it will
 update the existing columns. If the provided KEY does not already
 exist, then it will do an insert.
 Is the INSERT query option now available in the latest driver?

  Are there any other improvements/supports added to this driver recently
 ?

  Is this driver compatible with Cassandra 1.1.0, and will the changes
 made to the driver be backward compatible with older Cassandra versions
 (1.0.0)?

  [1]. http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/

  Regards,
 ~Dinusha~





-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Graveyard compactions, when do they occur?

2012-03-28 Thread R. Verlangen
A Cassandra graveyard sounds like a lot of tombstones that will be
compacted during a normal compaction.

You can trigger that manually using nodetool.

2012/3/28 Erik Forsberg forsb...@opera.com

 Hi!

 I was trying out the truncate command in cassandra-cli.

 http://wiki.apache.org/cassandra/CassandraCli08 says:
  A snapshot of the data is created, which is deleted asynchronously
 during a 'graveyard' compaction.

 When do graveyard compactions happen? Do I have to trigger them somehow?

 Thanks,
 \EF




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: How to store a list of values?

2012-03-28 Thread R. Verlangen
Yes, that is one of the possible solutions to your problem.

When you want to retrieve only the skills of a particular row, just get the
columns with skill: as the slice start value.

A suggestion for your example might be to use a ~ instead of : as the
separator. A tilde is used less often in normal text, so you could
replace any tildes occurring in the skill names with some other character
(e.g. a dash or whitespace).
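The prefix-slice idea can be sketched with a plain sorted map standing in for a row's columns. This is a toy model only, not a Cassandra client; `slice_by_prefix` is a hypothetical helper that mimics how a column slice over sorted names works, using the ~ separator suggested here.

```python
from bisect import bisect_left, bisect_right

# Toy model of a Cassandra row: columns are kept sorted by name, so all
# "skill~..." columns form one contiguous slice of the name space.
row = {
    "company": "google",
    "name": "ben",
    "skill~cassandra": "",
    "skill~java": "",
    "skill~javascript": "",
    "title": "software engineer",
}

def slice_by_prefix(columns: dict, prefix: str) -> dict:
    """Return the contiguous range of columns whose names start with prefix."""
    names = sorted(columns)
    lo = bisect_left(names, prefix)
    hi = bisect_right(names, prefix + "\uffff")  # high sentinel after prefix
    return {n: columns[n] for n in names[lo:hi]}

skills = slice_by_prefix(row, "skill~")
```

A real client would express the same thing as a column slice with start `skill~` and finish `skill~\uffff` (or equivalent) on the row key.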

2012/3/27 Ben McCann b...@benmccann.com

 I was given one other suggestion (which may have been suggested earlier in
 this thread, but is clearer to me with an example).  The suggestion was to
 use composite columns and have the first part of the key name be skill
 and the second part be the specific skill and then store a null value.  I
 hope I understood this suggestion correctly.

 user: {
   'name': 'ben',
   'title': 'software engineer',
   'company': 'google',
   'location': 'orange county',
   'skill:java': '',
   'skill:html': '',
   'skill:javascript': ''
 }


 On Tue, Mar 27, 2012 at 12:04 AM, samal samalgo...@gmail.com wrote:

 YEAH! agree, it only matter for time bucket data.


 On Tue, Mar 27, 2012 at 12:31 PM, R. Verlangen ro...@us2.nl wrote:

 That's true, but it does not sound like a real problem to me.. Maybe
 someone else can shed some light upon this.


 2012/3/27 samal samalgo...@gmail.com



 On Tue, Mar 27, 2012 at 1:47 AM, R. Verlangen ro...@us2.nl wrote:

  but any schema change will break it 

 How do you mean? You don't have to specify the columns in Cassandra, so
 it should work perfectly. The only caveat is that the skill~ prefix is
 reserved for your list.


  In case skill~ is later changed to skill::, it needs to be handled
 at the app level. Otherwise you would have to update all rows: read each
 first, modify it, insert the new version and delete the old version.




 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl






-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: How to store a list of values?

2012-03-28 Thread R. Verlangen
If you use a CompositeType column it does, but it looked to me like in your
example you just used the simple UTF8-based solution. My apologies for the
confusion.

2012/3/28 Ben McCann b...@benmccann.com

 Hmm. I thought that Cassandra would encode the composite column without
 the colon and that it was only there for illustration purposes, so the
 suggestion to use ~ is confusing.  Are there some docs you can point me to?
  Also, after some reading, it seems to me that it is not even possible to
 have a composite column together with a regular column in a column family
 in this manner.


 On Wed, Mar 28, 2012 at 12:34 AM, R. Verlangen ro...@us2.nl wrote:

 Yes, that is one of the possible solutions to your problem.

 When you want to retrieve only the skills of a particular row, just get
 the columns with skill: as the slice start value.

 A suggestion for your example might be to use a ~ instead of : as the
 separator. A tilde is used less often in normal text, so you could
 replace any tildes occurring in the skill names with some other character
 (e.g. a dash or whitespace).

 2012/3/27 Ben McCann b...@benmccann.com

 I was given one other suggestion (which may have been suggested earlier
 in this thread, but is clearer to me with an example).  The suggestion was
 to use composite columns and have the first part of the key name be skill
 and the second part be the specific skill and then store a null value.  I
 hope I understood this suggestion correctly.

 user: {
   'name': 'ben',
   'title': 'software engineer',
   'company': 'google',
   'location': 'orange county',
   'skill:java': '',
   'skill:html': '',
   'skill:javascript': ''
 }


 On Tue, Mar 27, 2012 at 12:04 AM, samal samalgo...@gmail.com wrote:

 YEAH! agree, it only matter for time bucket data.


 On Tue, Mar 27, 2012 at 12:31 PM, R. Verlangen ro...@us2.nl wrote:

 That's true, but it does not sound like a real problem to me.. Maybe
 someone else can shed some light upon this.


 2012/3/27 samal samalgo...@gmail.com



 On Tue, Mar 27, 2012 at 1:47 AM, R. Verlangen ro...@us2.nl wrote:

  but any schema change will break it 

 How do you mean? You don't have to specify the columns in Cassandra,
 so it should work perfectly. The only caveat is that the skill~ prefix is
 reserved for your list.


  In case skill~ is later changed to skill::, it needs to be
 handled at the app level. Otherwise you would have to update all rows: read
 each first, modify it, insert the new version and delete the old version.




 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl






 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl





-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: problem in create column family

2012-03-27 Thread R. Verlangen
Not sure about that, what version of Cassandra are you using? Maybe someone
else here knows how to solve this..

2012/3/27 puneet loya puneetl...@gmail.com

 Yes, I had created it with UTF8Type before; it gave the same error.

 On executing the help assume command, it gives 'utf8' as a type.

 So can I use comparator='utf8' or not?


 Please reply


 On Mon, Mar 26, 2012 at 9:17 PM, R. Verlangen ro...@us2.nl wrote:

 You should use the full type names, e.g.

 create column family MyColumnFamily with comparator=UTF8Type;


 2012/3/26 puneet loya puneetl...@gmail.com

 It is giving errors like  Unable to find abstract-type class
 'org.apache.cassandra.db.marshal.utf8' 

 and java.lang.RuntimeException:
 org.apache.cassandra.db.marshal.MarshalException: cannot parse
 'catalogueId' as hex bytes

 where catalogueId is a column that has utf8 as its data type. They may
 be just syntactical errors.

 Please suggest if you can help me out with this.




 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl





-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: How to store a list of values?

2012-03-27 Thread R. Verlangen
That's true, but it does not sound like a real problem to me.. Maybe
someone else can shed some light upon this.

2012/3/27 samal samalgo...@gmail.com



 On Tue, Mar 27, 2012 at 1:47 AM, R. Verlangen ro...@us2.nl wrote:

  but any schema change will break it 

 How do you mean? You don't have to specify the columns in Cassandra so it
 should work perfect. Except for the skill~ is preserverd for your list.


  In case skill~ is decided to change to skill:: , it need to be handle at
 app level. Or otherwise had t update in all row, read it first, modify it,
 insert new version and delete old version.




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Fwd: information on cassandra

2012-03-27 Thread R. Verlangen
Thank you Maki, wasn't aware of that.

2012/3/27 Maki Watanabe watanabe.m...@gmail.com

 auto_bootstrap has been removed from cassandra.yaml and always enabled
 since 1.0.
 fyi.

 maki

 2012/3/26 R. Verlangen ro...@us2.nl:
  Yes, you can add nodes to a running cluster. It's very simple:
 configure
  the cluster name and seed node(s) in cassandra.yaml, set auto_bootstrap
 to
  true and start the node.
 
 
  2012/3/26 puneet loya puneetl...@gmail.com
 
  5n.. consider i m starting on a single node. can I add nodes later?? plz
  reply :)
 
 
  On Sun, Mar 25, 2012 at 7:41 PM, Ertio Lew ertio...@gmail.com wrote:
 
  I guess 2 node cluster with RF=2 might also be a starting point. Isn't
 it
  ? Are there any issues with this ?
 
  On Sun, Mar 25, 2012 at 12:20 AM, samal samalgo...@gmail.com wrote:
 
  Cassandra has a distributed architecture, so 1 node does not fit into
 it.
  Although it can be used that way, you lose its benefits; it's OK if you
 are just playing
  around. Use VMs to learn how the cluster communicates and handles requests.
 
  To get full tolerance, redundancy and consistency minimum 3 node is
  required.
 
  Imp read here:
  http://wiki.apache.org/cassandra/
  http://www.datastax.com/docs/1.0/index
  http://thelastpickle.com/
  http://www.acunu.com/blogs/all/
 
 
 
  On Sat, Mar 24, 2012 at 11:37 PM, Garvita Mehta 
 garvita.me...@tcs.com
  wrote:
 
  It's not advisable to use Cassandra on a single node, as its basic
  definition says that if a node fails, data still remains in the system;
 at least 3
  nodes should be there when setting up a Cassandra cluster.
 
 
  Garvita Mehta
  CEG - Open Source Technology Group
  Tata Consultancy Services
  Ph:- +91 22 67324756
  Mailto: garvita.me...@tcs.com
  Website: http://www.tcs.com
  
  Experience certainty. IT Services
  Business Solutions
  Outsourcing
  
 
  -puneet loya wrote: -
 
  To: user@cassandra.apache.org
  From: puneet loya puneetl...@gmail.com
  Date: 03/24/2012 06:36PM
  Subject: Fwd: information on cassandra
 
 
 
 
  hi,
 
  I m puneet, an engineering student. I would like to know that, is
  cassandra useful considering we just have a single node(rather a
 single
  system) having all the information.
  I m looking for decent response time for the database. can you please
  respond?
 
  Thank you ,
 
  Regards,
 
  Puneet Loya
 
  =-=-=
  Notice: The information contained in this e-mail
  message and/or attachments to it may contain
  confidential or privileged information. If you are
  not the intended recipient, any dissemination, use,
  review, distribution, printing or copying of the
  information contained in this e-mail message
  and/or attachments to it are strictly prohibited. If
  you have received this communication in error,
  please notify us by reply e-mail or telephone and
  immediately and permanently delete the message
  and any attachments. Thank you
 
 
 
 
 
 
 
  --
  With kind regards,
 
  Robin Verlangen
  www.robinverlangen.nl
 




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Schema advice/help

2012-03-27 Thread R. Verlangen
You can just get a slice range with userId: as start and no end.

2012/3/27 Maciej Miklas mac.mik...@googlemail.com

 multiget would require the Order Preserving Partitioner, and this can lead to
 an unbalanced ring and hot spots.

 Maybe you can use a secondary index on itemtype - it must have small
 cardinality:
 http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/




 On Tue, Mar 27, 2012 at 10:10 AM, Guy Incognito dnd1...@gmail.com wrote:

 Without the ability to do disjoint column slices, I would probably use 5
 different rows.

 userId:itemType - activityId

 then it's a multiget slice of 10 items from each of your 5 rows.


 On 26/03/2012 22:16, Ertio Lew wrote:

 I need to store activities by each user, on 5 item types. I always want
 to read the last 10 activities on each item type, by a user (i.e., total
 activities to read at a time = 50).

 I want to store these activities in a single row for each user so
 that they can be retrieved in a single row query, since I want to read all
 the last 10 activities on each item. I am thinking of creating composite
 names appending itemtype : activityId (activityId is just a timestamp
 value), but then I don't see how to read the last 10 activities from
 all itemtypes.

 Any ideas about schema to do this better way ?






-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: counter column family

2012-03-27 Thread R. Verlangen
*create column family MyCounterColumnFamily with
default_validation_class=CounterColumnType and
key_validation_class=UTF8Type and comparator=UTF8Type;*

There you go! Keys must be utf8, as well as the column names. Of course you
can change those validators.

Cheers!

2012/3/27 puneet loya puneetl...@gmail.com

 Can u give an example of create column family with counter column in it.


 Please reply


 Regards,

 Puneet Loya




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: counter column family

2012-03-27 Thread R. Verlangen
You should use a connection pool without retries to prevent a single
increment of +1 from resulting in e.g. +3.
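The reason retries are dangerous here is that counter increments are not idempotent: if a write is applied on the server but the acknowledgement times out, a client retry applies the increment again. A toy simulation of that failure mode (all names are hypothetical; no Cassandra client involved):

```python
class FlakyCounter:
    """Toy server-side counter: the write is always applied, but the ack is
    lost for the first `drop_acks` calls (simulating a client timeout)."""
    def __init__(self, drop_acks: int):
        self.value = 0
        self._drop = drop_acks

    def incr(self, delta: int) -> None:
        self.value += delta                  # write applied either way
        if self._drop > 0:
            self._drop -= 1
            raise TimeoutError("ack lost")   # client only sees a timeout

def incr_with_retries(counter, delta: int, retries: int) -> None:
    """Naive client: retry on timeout, re-applying the increment."""
    for _ in range(retries + 1):
        try:
            counter.incr(delta)
            return
        except TimeoutError:
            continue

c = FlakyCounter(drop_acks=2)
incr_with_retries(c, 1, retries=2)
# A single logical +1 ends up as +3 on the server.
```

With retries disabled, the worst case is an undercount you can detect, instead of a silent overcount.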

2012/3/27 Rishabh Agrawal rishabh.agra...@impetus.co.in

  You can even define how much you want to increment by. But let me just warn
 you: as far as my knowledge goes, it has consistency issues.



 *From:* puneet loya [mailto:puneetl...@gmail.com]
 *Sent:* Tuesday, March 27, 2012 5:59 PM

 *To:* user@cassandra.apache.org
 *Subject:* Re: counter column family



 thanxx a ton :) :)



  The counter column family works analogously to 'auto increment' in other
 databases, right?



 I mean we have a column of  type integer which increments with every
 insert.



  Am I going the right way?



 please reply :)

 On Tue, Mar 27, 2012 at 5:50 PM, R. Verlangen ro...@us2.nl wrote:

 *create column family MyCounterColumnFamily with
 default_validation_class=CounterColumnType and
 key_validation_class=UTF8Type and comparator=UTF8Type;*



 There you go! Keys must be utf8, as well as the column names. Of course
 you can change those validators.



 Cheers!



 2012/3/27 puneet loya puneetl...@gmail.com

 Can u give an example of create column family with counter column in it.





 Please reply





 Regards,



 Puneet Loya





 --
 With kind regards,



 Robin Verlangen

 www.robinverlangen.nl





 --

 Impetus to sponsor and exhibit at Structure Data 2012, NY; Mar 21-22. Know
 more about our Big Data quick-start program at the event.

 New Impetus webcast ‘Cloud-enabled Performance Testing vis-à-vis
 On-premise’ available at http://bit.ly/z6zT4L.


 NOTE: This message may contain information that is confidential,
 proprietary, privileged or otherwise protected by law. The message is
 intended solely for the named addressee. If received in error, please
 destroy and notify the sender. Any use of this email is prohibited when
 received in error. Impetus does not represent, warrant and/or guarantee,
 that the integrity of this communication has been maintained nor that the
 communication is free of errors, virus, interception or interference.




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: import

2012-03-27 Thread R. Verlangen
You can write your own script to parse the excel file (export as csv) and
import it with batch inserts.

Should be pretty easy if you have experience with those techniques.
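A rough sketch of such a script, assuming a CSV export from Excel and leaving the actual batch-insert call to whatever client library you use (the helper names `rows_from_csv` and `chunked` are made up for illustration):

```python
import csv
import io

def rows_from_csv(text: str, key_column: str):
    """Parse CSV text into (row_key, columns) pairs, using one CSV column
    as the Cassandra row key and the rest as column name/value pairs."""
    reader = csv.DictReader(io.StringIO(text))
    for record in reader:
        key = record.pop(key_column)
        yield key, record

def chunked(iterable, size: int):
    """Group items into batches of `size` for batch inserts."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

data = "id,name,skill\n1,ben,java\n2,ann,cql\n3,sam,linux\n"
batches = list(chunked(rows_from_csv(data, "id"), 2))
# Each batch would then be handed to your client's batch-insert call.
```

In a real script, each batch would be passed to your client library's batch mutation API instead of being collected in a list.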

2012/3/27 puneet loya puneetl...@gmail.com

 I want to import files from Excel to Cassandra. Is it possible?

 Any tool that can help?

 What's the best way?

 Please reply :)




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Error in FAQ?

2012-03-26 Thread R. Verlangen
If you want to modify a column family, just open the command line interface
(cassandra-cli), connect to a node (probably: connect localhost/9160;).

When you have to create your first keyspace type: create keyspace
MyKeyspace;

To work within an existing keyspace, type: use MyKeyspace;

If you need more information you can just type help;

Good luck!

2012/3/26 Ben McCann b...@benmccann.com

 Hmmm, I don't see anything regarding column families in cassandra.yaml.
  It seems like the answer for that question in the FAQ is very outdated.


 On Sun, Mar 25, 2012 at 4:04 PM, Serge Fonville 
  serge.fonvi...@gmail.com wrote:

 Hi,

 2012/3/26 Ben McCann b...@benmccann.com:
  There's a line that says Make necessary changes to your
 storage-conf.xml.
  I can't find this file.  Does it still exist?  If so, where should I
 look?
   I installed the packaged version of Cassandra available in the Datastax
  community edition.

 From  http://wiki.apache.org/cassandra/StorageConfiguration
 Prior to the 0.7 release, Cassandra storage configuration is described
 by the conf/storage-conf.xml file. As of 0.7, it is described by the
 conf/cassandra.yaml file.

 After googling cassandra storage-conf.xml

 Kind regards/met vriendelijke groet,

 Serge Fonville

 http://www.sergefonville.nl

 Convince Google!!
 They need to add GAL support on Android (star to agree)
 http://code.google.com/p/android/issues/detail?id=4602



 2012/3/26 Ben McCann b...@benmccann.com:
  There's a line that says Make necessary changes to your
 storage-conf.xml.
  I can't find this file.  Does it still exist?  If so, where should I
 look?
   I installed the packaged version of Cassandra available in the Datastax
  community edition.
 
  Thanks,
  Ben
 





Re: problem in create column family

2012-03-26 Thread R. Verlangen
You should use the full type names, e.g.

create column family MyColumnFamily with comparator=UTF8Type;

2012/3/26 puneet loya puneetl...@gmail.com

 It is giving errors like  Unable to find abstract-type class
 'org.apache.cassandra.db.marshal.utf8' 

 and java.lang.RuntimeException:
 org.apache.cassandra.db.marshal.MarshalException: cannot parse
 'catalogueId' as hex bytes

 where catalogueId is a column that has utf8 as its data type. They may be
 just syntactical errors.

 Please suggest if you can help me out with this.




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: How to store a list of values?

2012-03-26 Thread R. Verlangen
 but any schema change will break it 

How do you mean? You don't have to specify the columns in Cassandra, so it
should work perfectly. The only caveat is that the skill~ prefix is reserved
for your list.

2012/3/26 samal samalgo...@gmail.com


 Save the skills in a single column in json format.  Job done.

 Good if it has a fixed set of skills; any add or delete change then needs
 to be handled in the app: read the column first, reformat the JSON, update
 the column (2 Thrift calls).

  skill~Java: null,
  skill~Cassandra: null
 This is also a good option, but any schema change will break it.


 On Mar 26, 2012 7:04 PM, Ben McCann b...@benmccann.com wrote:

 True.  But I don't need the skills to be searchable, so I'd rather embed
 them in the user than add another top-level CF.  I was thinking of doing
 something along the lines of adding a skills super column to the User table:

 skills: {
   'java': null,
   'c++': null,
   'cobol': null
 }

 However, I'm still not sure yet how to accomplish this with Astyanax.
  I've only figured out how to make composite columns with predefined column
 names with it and not dynamic column names like this.



 On Mon, Mar 26, 2012 at 9:08 AM, R. Verlangen ro...@us2.nl wrote:

 In this case you only need the columns, not the values. You don't need
 the column values to hold multiple columns (the super-column principle), so
 a normal CF would work.


 2012/3/26 Ben McCann b...@benmccann.com

 Thanks for the reply Samal.  I did not realize that you could store a
 column with null value.  Do you know if this solution would work with
 composite columns?  It seems super columns are being phased out in favor 
 of
 composites, but I do not understand composites very well yet.  I'm trying
 to figure out if there's any way to accomplish what you've suggested using
 Astyanax https://github.com/Netflix/astyanax.

 Thanks for the help,
 Ben


 On Mon, Mar 26, 2012 at 8:46 AM, samal samalgo...@gmail.com wrote:

 plus it is fully compatible with CQL.
 SELECT * FROM UserSkill WHERE KEY='ben';


 On Mon, Mar 26, 2012 at 9:13 PM, samal samalgo...@gmail.com wrote:

 I would take a simple approach: create another CF, UserSkill, with the
 same row key as the profile CF's key.
 In the user_skill CF, add the skill as the column name with a null value.
 Columns can be added or removed.

 UserProfile={
   '*ben*'={
blah :blah
blah :blah
blah :blah
  }
 }

 UserSkill={
   '*ben*'={
 'java':''
 'cassandra':''
   .
   .
   .
   'linux':''
   'skill':'infinity'

  }

 }


  On Mon, Mar 26, 2012 at 12:34 PM, Ben McCann b...@benmccann.com wrote:

 I have a profile column family and want to store a list of skills
 in each profile.  In BigTable I could store a Protocol Buffer
 (http://code.google.com/apis/protocolbuffers/docs/overview.html) with
 a repeated field, but I'm not sure how this is typically accomplished
 in Cassandra.  One option would be to store a serialized Thrift
 (http://thrift.apache.org/) or protobuf, but I'd prefer not to do
 this as I believe Cassandra doesn't have knowledge of these formats,
 and so the data in the datastore would not be human readable in CQL
 queries from the command line.  The other solution I thought of would
 be to use a super column and put a random UUID as the key for each skill:

 skills: {
   '4b27c2b3ac48e8df': 'java',
   '84bf94ea7bc92018': 'c++',
   '9103b9a93ce9d18': 'cobol'
 }

 Is this a good way of handling lists in Cassandra?  I imagine
 there's some idiom I'm not aware of.  I'm using the Astyanax
 (https://github.com/Netflix/astyanax/wiki) client library,
 which only supports composite columns instead of super
 columns, and so the solution I proposed above would seem quite awkward
 in that case.  Though I'm still having some trouble understanding
 composite columns as they seem not to be completely documented yet.
 Would this solution work with composite columns?

 Thanks,
 Ben







 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl






-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: Performance overhead when using start and end columns

2012-03-26 Thread R. Verlangen
@Aaron: Very interesting article! Mentioned it on my Dutch blog.

2012/3/26 Mohit Anchlia mohitanch...@gmail.com

 Thanks!


  On Mon, Mar 26, 2012 at 10:53 AM, aaron morton aa...@thelastpickle.com wrote:

 See the test's in the article.

 The code I used for profiling is also available.

 Cheers

-
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

   On 27/03/2012, at 6:21 AM, Mohit Anchlia wrote:

 Thanks, but if I do have to specify start and end columns, roughly how much
 overhead would that translate to, since reading metadata should be
 constant overall?

 On Mon, Mar 26, 2012 at 10:18 AM, aaron morton 
 aa...@thelastpickle.com wrote:

 Some information on query plans
 http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

 Tl;Dr; Select columns with no start, in the natural Comparator order.

 Cheers


-
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

  On 25/03/2012, at 2:25 PM, Mohit Anchlia wrote:

   I have rows with around 2K-50K columns, but when I do a query I only
 need to fetch a few columns between start and end columns. I was wondering
 what performance overhead is caused by using a slice query with start and
 end columns?

 Looking at the code, it looks like when you give start and end columns it
 goes into the IndexSliceReader logic, but it's hard to tell how much
 overhead one would see on average. Or is it even worth worrying about?








-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: cassandra 1.08 on java7 and win7

2012-03-26 Thread R. Verlangen
Ben Coverston wrote earlier today:  Use a version of the Java 6 runtime,
Cassandra hasn't been tested at all with the Java 7 runtime

So I think that might be a good way to start.

2012/3/26 Frank Hsueh frank.hs...@gmail.com

 I think I have cassandra the server started

 In another window:
 
  cassandra-cli.bat -h localhost -p 9160
 Starting Cassandra Client
 Connected to: Test Cluster on localhost/9160
 Welcome to Cassandra CLI version 1.0.8

 Type 'help;' or '?' for help.
 Type 'quit;' or 'exit;' to quit.

 [default@unknown] create keyspace DEMO;
 log4j:WARN No appenders could be found for logger
 (org.apache.cassandra.config.DatabaseDescriptor).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
 more info.
 Cannot locate cassandra.yaml
 Fatal configuration error; unable to start server.  See log for stacktrace.

 C:\Workspace\cassandra\apache-cassandra-1.0.8\bin
 

 anybody seen this before ?


 --
 Frank Hsueh | frank.hs...@gmail.com




-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl


Re: cassandra-cli and unreachable status confusion

2012-03-20 Thread R. Verlangen
That's correct. If you run describe cluster normally you'll see something
like:

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
3a0f6a80-7140-11e1--511aec3785ff: [IP_OF_NODE,  IP_OF_NODE ,
IP_OF_NODE ]

If there are troubles with the schema, multiple versions will be shown, like:

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
3a0f6a80-7140-11e1--511aec3785ff: [IP_OF_NODE,  IP_OF_NODE ]
4e252abe-7140-11e1--511aec3785ff: [IP_OF_NODE ]


2012/3/19 Shoaib Mir shoaib...@gmail.com

  On Tue, Mar 20, 2012 at 4:18 AM, aaron morton aa...@thelastpickle.com wrote:

 There is a server side check to ensure that all available nodes share the
 same schema version.


 Is that checked using describe cluster ??

 cheers,
 Shoaib




Re: Single Node Cassandra Installation

2012-03-17 Thread R. Verlangen
 By default Cassandra tries to write to both nodes, always. Writes will
only fail (on a node) if it is down, and even then hinted handoff will
attempt to keep both nodes in sync when the troubled node comes back up.
The point of having two nodes is to have read and write availability in the
face of transient failure. 

Even more: if you enable read repair, the chance of reading inconsistent data
decreases with any further reads. This will make your cluster become
consistent again faster after a failure.

Also consider using different CLs for different operations. E.g. a Twitter
timeline can miss some records; however, if you were to display my bank
account, I would prefer to see the right thing, or else a clear error
message.
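The rule of thumb behind choosing per-operation consistency levels is the replica-overlap condition: with replication factor N, reads at CL R and writes at CL W are guaranteed to overlap on at least one replica iff R + W > N. A minimal sketch (plain Python, no client library; function names are illustrative):

```python
# Replica-overlap rule: a read of R replicas must intersect a write of W
# replicas out of N total, which holds exactly when R + W > N.
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    return r + w > n

def quorum(n: int) -> int:
    """A quorum is a strict majority of the N replicas."""
    return n // 2 + 1

# RF=3: QUORUM reads + QUORUM writes overlap (2 + 2 > 3);
# ONE + ONE does not (1 + 1 <= 3), which is why it can return stale data.
```

This is why QUORUM/QUORUM (or ONE writes with ALL reads, etc.) gives read-your-writes behavior, while ONE/ONE trades that guarantee for availability.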

2012/3/16 Ben Coverston ben.covers...@datastax.com

 Doing reads and writes at CL=1 with RF=2 N=2 does not imply that the reads
 will be inconsistent. It's more complicated than the simple counting of
 blocked replicas. It is easy to support the notion that it will be largely
 consistent, in fact very consistent for most use cases.

 By default Cassandra tries to write to both nodes, always. Writes will
 only fail (on a node) if it is down, and even then hinted handoff will
 attempt to keep both nodes in sync when the troubled node comes back up.
 The point of having two nodes is to have read and write availability in the
 face of transient failure.

 If you are interested there is a good exposition of what 'consistency'
 means in a system like Cassandra from the link below[1].

 [1]
 http://www.eecs.berkeley.edu/~pbailis/projects/pbs/


 On Fri, Mar 16, 2012 at 6:50 AM, Thomas van Neerijnen 
 t...@bossastudios.com wrote:

 You'll need to either read or write at at least quorum to get consistent
 data from the cluster so you may as well do both.
 Now that you mention it, I was wrong about downtime, with a two node
 cluster reads or writes at quorum will mean both nodes need to be online.
 Perhaps you could have an emergency switch in your application which flips
 to consistency of 1 if one of your Cassandra servers goes down? Just make
 sure it's set back to quorum when the second one returns or again you could
 end up with inconsistent data.


 On Fri, Mar 16, 2012 at 2:04 AM, Drew Kutcharian d...@venarc.com wrote:

 Thanks for the comments, I guess I will end up doing a 2 node cluster
 with replica count 2 and read consistency 1.

 -- Drew



 On Mar 15, 2012, at 4:20 PM, Thomas van Neerijnen wrote:

 So long as data loss and downtime are acceptable risks a one node
 cluster is fine.
 Personally this is usually only acceptable on my workstation, even my
 dev environment is redundant, because servers fail, usually when you least
 want them to, like for example when you've decided to save costs by waiting
 before implementing redundancy. Could a failure end up costing you more
 than you've saved? I'd rather get cheaper servers (maybe even used off
 ebay??) so I could have at least two of them.

 If you do go with a one-node solution: although I haven't tried it myself,
 Priam looks like a good place to start for backups. Otherwise roll your own
 with incremental snapshotting turned on and a watch on the snapshot
 directory. Storage on something like S3 or Cloud Files is very cheap, so
 there's no good excuse for having no backups.

 On Thu, Mar 15, 2012 at 7:12 PM, R. Verlangen ro...@us2.nl wrote:

 Hi Drew,

 One other disadvantage is the lack of consistency level and
 replication. Both ware part of the high availability / redundancy. So you
 would really need to backup your single-node-cluster to some other
 external location.

 Good luck!


 2012/3/15 Drew Kutcharian d...@venarc.com

 Hi,

 We are working on a project that initially is going to have very
 little data, but we would like to use Cassandra to ease the future
 scalability. Due to budget constraints, we were thinking to run a single
 node Cassandra for now and then add more nodes as required.

 I was wondering if it is recommended to run a single node cassandra in
 production? Are there any other issues besides lack of high availability?

 Thanks,

 Drew








 --
 Ben Coverston
 DataStax -- The Apache Cassandra Company




Re: 0.8.1 Vs 1.0.7

2012-03-17 Thread R. Verlangen
Check your log for messages about rebuilding indices: that might grow your
dataset somewhat.

One thing is for sure: the data import removed all the cruft that remained in
the 0.8.1 cluster (duplicates, tombstones etc). The decrease is fairly
dramatic but not illogical at all.

2012/3/16 Jeremiah Jordan jeremiah.jor...@morningstar.com

  I would guess more aggressive compaction settings; did you update rows,
 or insert some twice?
 If you run major compaction a couple of times on the 0.8.1 cluster, does the
 data size get smaller?

 You can use the describe command to check if compression got turned on.

 -Jeremiah

  --
 *From:* Ravikumar Govindarajan [ravikumar.govindara...@gmail.com]
 *Sent:* Thursday, March 15, 2012 4:41 AM
 *To:* user@cassandra.apache.org
 *Subject:* 0.8.1 Vs 1.0.7

  Hi,

  I ran some data import tests for cassandra 0.8.1 and 1.0.7. The results
 were a little bit surprising

  0.8.1, SimpleStrategy, Rep_Factor=3,QUORUM Writes, RP, SimpleSnitch

  XXX.XXX.XXX.A  datacenter1 rack1   Up Normal  140.61 GB   12.50%
  XXX.XXX.XXX.B  datacenter1 rack1   Up Normal  139.92 GB   12.50%
  XXX.XXX.XXX.C  datacenter1 rack1   Up Normal  138.81 GB   12.50%
  XXX.XXX.XXX.D  datacenter1 rack1   Up Normal  139.78 GB   12.50%
  XXX.XXX.XXX.E  datacenter1 rack1   Up Normal  137.44 GB   12.50%
  XXX.XXX.XXX.F  datacenter1 rack1   Up Normal  138.48 GB   12.50%
  XXX.XXX.XXX.G  datacenter1 rack1   Up Normal  140.52 GB   12.50%
  XXX.XXX.XXX.H  datacenter1 rack1   Up Normal  145.24 GB   12.50%

  1.0.7, NTS, Rep_Factor{DC1:3, DC2:2}, LOCAL_QUORUM writes, RP [DC2 m/c
 yet to join ring],
 PropertyFileSnitch

  XXX.XXX.XXX.A  DC1 RAC1   Up Normal   48.72 GB   12.50%
  XXX.XXX.XXX.B  DC1 RAC1   Up Normal   51.23 GB   12.50%
  XXX.XXX.XXX.C  DC1 RAC1   Up Normal   52.40 GB   12.50%
  XXX.XXX.XXX.D  DC1 RAC1   Up Normal   49.64 GB   12.50%
  XXX.XXX.XXX.E  DC1 RAC1   Up Normal   48.50 GB   12.50%
  XXX.XXX.XXX.F  DC1 RAC1   Up Normal   53.38 GB   12.50%
  XXX.XXX.XXX.G  DC1 RAC1   Up Normal   51.11 GB   12.50%
  XXX.XXX.XXX.H  DC1 RAC1   Up Normal   53.36 GB   12.50%

  There seems to be a 3x saving in size for the same dataset running 1.0.7.
 I have not enabled compression for any of the CFs. Will it be enabled by
 default when creating a new CF in 1.0.7? The cassandra.yaml files are also
 mostly identical.

  Thanks and Regards,
 Ravi



Re: Single Node Cassandra Installation

2012-03-15 Thread R. Verlangen
Hi Drew,

One other disadvantage is the lack of consistency level and
replication. Both ware part of the high availability / redundancy. So you
would really need to backup your single-node-cluster to some other
external location.

Good luck!

2012/3/15 Drew Kutcharian d...@venarc.com

 Hi,

 We are working on a project that initially is going to have very little
 data, but we would like to use Cassandra to ease the future scalability.
 Due to budget constraints, we were thinking to run a single node Cassandra
 for now and then add more nodes as required.

 I was wondering if it is recommended to run a single node cassandra in
 production? Are there any other issues besides lack of high availability?

 Thanks,

 Drew




Re: Node joining / unknown

2012-03-08 Thread R. Verlangen
It seemed that one of the other nodes had trouble with a compaction task.
The C node was waiting for that.

It's now streaming all its data into place.

Thank you all for your time!

2012/3/7 i...@4friends.od.ua

 just run nodetool compactionstats on the other nodes.


 -Original Message-
 From: R. Verlangen ro...@us2.nl
 To: user@cassandra.apache.org
 Sent: Wed, 07 Mar 2012 23:09
 Subject: Re: Node joining / unknown

 @Brandon: Thank you for the information. I'll do that next time.

 @Igor: Any ways to find out whether that is the current state? And if so,
 how to solve it?

 2012/3/7 i...@4friends.od.ua

 Maybe it wait for verification compaction on other node?





 -Original Message-
 From: R. Verlangen ro...@us2.nl
 To: user@cassandra.apache.org
 Sent: Wed, 07 Mar 2012 22:15
 Subject: Re: Node joining / unknown

 At this moment the node has joined the ring (after a restart: tried that
 before, but now it had finally result).

 When I try to run repair on the new node, the log says (the new node is
 NODE C):

 INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,453 AntiEntropyService.java
 (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
 tree for StorageMeta from NODE A
  INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,643
 AntiEntropyService.java (line 190) [repair
 #cfcc12b0-6891-11e1--70a329caccff] Received merkle tree for StorageMeta
 from NODE B

 And then doesn't do anything anymore. Tried it a couple of times again.
 It's just not starting.

 Results from netstats on NODE C:

 Mode: NORMAL
 Not sending any streams.
 Not receiving any streams.
 Pool Name            Active   Pending   Completed
 Commands                n/a         0           5
 Responses               n/a        93        4296


 Any suggestions?

 Thank you!

 2012/3/7 aaron morton aa...@thelastpickle.com

 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.

 I am assuming you ran nodetool removetoken on a node other than the
 joining node? What did nodetool ring look like on that machine?

 Take a look at nodetool netstats on the joining node to see if streaming
 has failed. If it's dead then…

 1) Try restarting the joining node and run nodetool repair on it
 immediately. Note: I am assuming QUORUM CL, otherwise things may get
 inconsistent.
 or
 2) Stop the node. Try to remove the token again from another node.
 Note that removing a token will stream data around the place as well.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7/03/2012, at 9:11 PM, R. Verlangen wrote:

 Hi there,

 I'm currently in a really weird situation.
 - Nodetool ring says node X is joining (this already takes 12 hours,
 with no activity)
 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.
 - Removetoken status = No token removals in process.

 How to get that node out of my cluster?

 With kind regards,
 Robin Verlangen







Node joining / unknown

2012-03-07 Thread R. Verlangen
Hi there,

I'm currently in a really weird situation.
- Nodetool ring says node X is joining (this already takes 12 hours, with
no activity)
- When I try to remove the token, it says: Exception in thread main
java.lang.UnsupportedOperationException: Token not found.
- Removetoken status = No token removals in process.

How to get that node out of my cluster?

With kind regards,
Robin Verlangen


Re: Node joining / unknown

2012-03-07 Thread R. Verlangen
At this moment the node has joined the ring (after a restart: tried that
before, but now it had finally result).

When I try to run repair on the new node, the log says (the new node is
NODE C):

INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,453 AntiEntropyService.java
(line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
tree for StorageMeta from NODE A
 INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,643 AntiEntropyService.java
(line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
tree for StorageMeta from NODE B

And then doesn't do anything anymore. Tried it a couple of times again.
It's just not starting.

Results from netstats on NODE C:

Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool Name            Active   Pending   Completed
Commands                n/a         0           5
Responses               n/a        93        4296


Any suggestions?

Thank you!

2012/3/7 aaron morton aa...@thelastpickle.com

 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.

 I am assuming you ran nodetool removetoken on a node other than the joining
 node? What did nodetool ring look like on that machine?

 Take a look at nodetool netstats on the joining node to see if streaming
 has failed. If it's dead then…

 1) Try restarting the joining node and run nodetool repair on it
 immediately. Note: I am assuming QUORUM CL, otherwise things may get
 inconsistent.
 or
 2) Stop the node. Try to remove the token again from another node.
 Note that removing a token will stream data around the place as well.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7/03/2012, at 9:11 PM, R. Verlangen wrote:

 Hi there,

 I'm currently in a really weird situation.
 - Nodetool ring says node X is joining (this already takes 12 hours, with
 no activity)
 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.
 - Removetoken status = No token removals in process.

 How to get that node out of my cluster?

 With kind regards,
 Robin Verlangen





Re: Node joining / unknown

2012-03-07 Thread R. Verlangen
@Brandon: Thank you for the information. I'll do that next time.

@Igor: Any ways to find out whether that is the current state? And if so,
how to solve it?

2012/3/7 i...@4friends.od.ua

 Maybe it wait for verification compaction on other node?





 -Original Message-
 From: R. Verlangen ro...@us2.nl
 To: user@cassandra.apache.org
 Sent: Wed, 07 Mar 2012 22:15
 Subject: Re: Node joining / unknown

 At this moment the node has joined the ring (after a restart: tried that
 before, but now it had finally result).

 When I try to run repair on the new node, the log says (the new node is
 NODE C):

 INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,453 AntiEntropyService.java
 (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
 tree for StorageMeta from NODE A
  INFO [AntiEntropyStage:1] 2012-03-07 21:12:06,643 AntiEntropyService.java
 (line 190) [repair #cfcc12b0-6891-11e1--70a329caccff] Received merkle
 tree for StorageMeta from NODE B

 And then doesn't do anything anymore. Tried it a couple of times again.
 It's just not starting.

 Results from netstats on NODE C:

 Mode: NORMAL
 Not sending any streams.
 Not receiving any streams.
 Pool Name            Active   Pending   Completed
 Commands                n/a         0           5
 Responses               n/a        93        4296


 Any suggestions?

 Thank you!

 2012/3/7 aaron morton aa...@thelastpickle.com

 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.

 I am assuming you ran nodetool removetoken on a node other than the joining
 node? What did nodetool ring look like on that machine?

 Take a look at nodetool netstats on the joining node to see if streaming
 has failed. If it's dead then…

 1) Try restarting the joining node and run nodetool repair on it
 immediately. Note: I am assuming QUORUM CL, otherwise things may get
 inconsistent.
 or
 2) Stop the node. Try to remove the token again from another node.
 Note that removing a token will stream data around the place as well.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7/03/2012, at 9:11 PM, R. Verlangen wrote:

 Hi there,

 I'm currently in a really weird situation.
 - Nodetool ring says node X is joining (this already takes 12 hours, with
 no activity)
 - When I try to remove the token, it says: Exception in thread main
 java.lang.UnsupportedOperationException: Token not found.
 - Removetoken status = No token removals in process.

 How to get that node out of my cluster?

 With kind regards,
 Robin Verlangen






Re: TimeUUID

2012-02-28 Thread R. Verlangen
For querying purposes it would be better to use readable strings, because
you can really get information out of them.

TimeUUID is just a unique value based on time, but not only on the time.
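The hour bucketing described in the quoted question below, and the readable-string alternative suggested here, can be sketched as follows (function names are illustrative, not an API):

```python
import datetime

def hour_bucket(epoch_seconds):
    """Numeric row key: seconds-since-epoch divided by 3600."""
    return epoch_seconds // 3600

def bucket_label(epoch_seconds):
    """Readable alternative row key, e.g. '2012-02-28-14' (UTC)."""
    dt = datetime.datetime.fromtimestamp(epoch_seconds, datetime.timezone.utc)
    return dt.strftime("%Y-%m-%d-%H")

ts = 1330437600  # 2012-02-28 14:00:00 UTC
print(hour_bucket(ts))   # 369566
print(bucket_label(ts))  # 2012-02-28-14
```

The readable form costs a few bytes per key but makes ad-hoc inspection of row keys trivial.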

2012/2/28 Tamar Fraenkel ta...@tok-media.com

 Hi!
 I have a column family where I use rows as time buckets.
 What I do is take the epoch time in seconds and round it down to one hour
 (taking the result of seconds-since-epoch divided by 3600).
 My key validation type is LongType.
 I wonder whether it is better to use TimeUUID or even readable string
 representation for time?
 Thanks,

 --
 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





Combining Cassandra with some SQL language

2012-02-26 Thread R. Verlangen
Hi there,

I'm currently busy with the technical design of a new project. Of course it
will depend on your needs, but is it weird to combine Cassandra with a SQL
database like MySQL?

In my use case it would be nice, because we have some tables/CFs with lots
and lots of data that does not really have to be 100% consistent, but also
some data that should always be consistent.

What do you think of this?

With kind regards,
Robin Verlangen


Re: Combining Cassandra with some SQL language

2012-02-26 Thread R. Verlangen
Ok, thank you all for your opinions. Seems that I can continue without any
extra db-model headaches ;-)

2012/2/27 Sanjay Sharma sanjay.sha...@impetus.co.in

  Kundera (https://github.com/impetus-opensource/Kundera), an open-source
 APL Java ORM, allows polyglot persistence between RDBMS and NoSQL databases
 such as Cassandra, MongoDB, HBase etc., transparently to the business logic
 developer.



 A note of caution- this does not mean that Cassandra data modeling can be
 bypassed- NoSQL entities still need to be modeled in such a way so as to
 best use Cassandra capabilities.

 Kundera can also take care of relationships between the entities in an
 RDBMS. Transaction management, however, is still pending.





 Regards,

 Sanjay

 *From:* Adam Haney [mailto:adam.ha...@retickr.com]
 *Sent:* Sunday, February 26, 2012 7:51 PM
 *To:* user@cassandra.apache.org
 *Subject:* Re: Combining Cassandra with some SQL language



 I've been using a combination of MySQL and Cassandra for about a year now
 on a project that now serves about 20k users. We use Cassandra for storing
 large entities and MySQL to store meta data that allows us to do better ad
 hoc querying. It's worked quite well for us. During this time we have also
 been able to migrate some of our tables in MySQL to Cassandra if MySQL
 performance / capacity became a problem. This may seem obvious but if
 you're planning on creating a data model that spans multiple databases make
 sure you encapsulate the logic to read/write/delete information in a good
 data model library and only use that library to access your data. This is
 good practice anyway but when you add the extra complication of multiple
 databases that may reference one another it's an absolute must.

 On Sun, Feb 26, 2012 at 8:06 AM, R. Verlangen ro...@us2.nl wrote:

 Hi there,



 I'm currently busy with the technical design of a new project. Of course
 it will depend on your needs, but is it weird to combine Cassandra with a
 SQL database like MySQL?



 In my use case it would be nice, because we have some tables/CFs with lots
 and lots of data that does not really have to be 100% consistent, but also
 some data that should always be consistent.



 What do you think of this?

 With kind regards,

 Robin Verlangen



 --




Re: List all keys with RandomPartitioner

2012-02-22 Thread R. Verlangen
You can leave the end key empty.

1) Start with an empty start key (startkey = "")
2) On the next iteration, start with startkey = the last key of the previous batch
3) Keep going until you run out of results
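A minimal sketch of this iteration pattern against a toy backend. `fetch_page` stands in for a client's get_range_slices call; with RandomPartitioner the real results come back in token order rather than key order, but the resume logic is identical:

```python
def iterate_all_keys(fetch_page, page_size=100):
    """Iterate every row key: empty start key first, then resume from
    the last key of the previous batch, dropping the duplicate first
    result. fetch_page stands in for a client's get_range_slices."""
    start, first = "", True
    while True:
        page = fetch_page(start, page_size)
        if not first and page:
            page = page[1:]      # first result repeats the last key seen
        if not page:
            return
        for key in page:
            yield key
        start, first = page[-1], False

# Toy backend holding eight row keys in sorted order.
rows = sorted("hgfedcba")
def fetch_page(start, count):
    return [k for k in rows if k >= start][:count]

print(list(iterate_all_keys(fetch_page, page_size=3)))
# ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
```

Note the duplicate-skip: a range slice that starts at the last key seen returns that key again, so every page after the first drops its first element.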

2012/2/22 Rafael Almeida almeida...@yahoo.com

 
  From: Franc Carter franc.car...@sirca.org.au
 To: user@cassandra.apache.org
 Sent: Wednesday, February 22, 2012 9:24 AM
 Subject: Re: List all keys with RandomPartitioner
 
 
 On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti f.baro...@list-group.com
 wrote:
 
 I need to iterate over all the rows in a column family stored with
 RandomPartitioner.
 When I reach the end of a key slice, I need to find the token of the
 last key in order to ask for the next slice.
  I saw in an old email that the token for a specific key can be recovered
  through FBUtilities.hash(). That class however is inside the full Cassandra
  jar, not inside the client-specific part.
 Is there a way to iterate over all the keys which does not require the
 server-side Cassandra jar?
 
 
 
 Does this help ?
 
 
  http://wiki.apache.org/cassandra/FAQ#iter_world


 I don't get it. It says to use the last key read as start key, but what
 should be used as end key?



Re: Please advise -- 750MB object possible?

2012-02-22 Thread R. Verlangen
I would suggest you chunk them into small pieces (~10-50 MB) and just fetch
the parts you need. One caveat: if fetching a single part fails, the whole
blob is useless.
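A toy sketch of the chunking idea, with a checksum column so a missing part is detected instead of silently yielding a broken blob. The column naming scheme (`chunk:NNNNN`, `meta:sha1`) is made up for the example:

```python
import hashlib

def split_blob(blob_id, data, chunk_size=10 * 1024 * 1024):
    """Break a blob into columns chunk:00000, chunk:00001, ..., plus a
    checksum column so a missing/failed part is detected on reassembly."""
    columns = {"meta:sha1": hashlib.sha1(data).hexdigest()}
    for i in range(0, max(len(data), 1), chunk_size):
        columns["chunk:%05d" % (i // chunk_size)] = data[i:i + chunk_size]
    return {blob_id: columns}

def join_blob(row):
    columns = next(iter(row.values()))
    data = b"".join(value for name, value in sorted(columns.items())
                    if name.startswith("chunk:"))
    if hashlib.sha1(data).hexdigest() != columns["meta:sha1"]:
        raise ValueError("blob incomplete or corrupt")
    return data

blob = b"x" * 2500
row = split_blob("video-1", blob, chunk_size=1000)  # tiny chunks for demo
print(join_blob(row) == blob)  # True
```

Zero-padded chunk names keep lexicographic column order equal to chunk order, so a plain column slice returns the parts in sequence.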

2012/2/22 Rafael Almeida almeida...@yahoo.com

 Keep them where?

   --
 *From:* Mohit Anchlia mohitanch...@gmail.com
 *To:* user@cassandra.apache.org
 *Cc:* potek...@bnl.gov
 *Sent:* Wednesday, February 22, 2012 3:44 PM
 *Subject:* Re: Please advise -- 750MB object possible?

 In my opinion, if you are a busy site or application, keep blobs out of the
 database.

 On Wed, Feb 22, 2012 at 9:37 AM, Dan Retzlaff dretzl...@gmail.com wrote:

 Chunking is a good idea, but you'll have to do it yourself. A few of the
 columns in our application got quite large (maybe ~150MB) and the failure
 mode was RPC timeout exceptions. Nodes couldn't always move that much data
 across our data center interconnect in the default 10 seconds. With enough
 heap and a faster network you could probably get by without chunking, but
 it's not ideal.


 On Wed, Feb 22, 2012 at 9:04 AM, Maxim Potekhin potek...@bnl.gov wrote:

 Hello everybody,

 I'm being asked whether we can serve an object, which I assume is a
 blob, of 750MB in size.
 I guess the real question is how to chunk it, and whether it's possible
 to chunk it at all.

 Thanks!

 Maxim








Re: Newbie Question: Cassandra consuming 100% CPU on ubuntu server

2012-02-18 Thread R. Verlangen
You might want to check your Cassandra logs; they contain important
information that might lead you to the actual cause of the problems.

2012/2/18 Aditya Gupta ady...@gmail.com

 Thanks! But what about the 100% cpu consumption that is causing the server
 to hang?


 On Sat, Feb 18, 2012 at 6:19 PM, Watanabe Maki watanabe.m...@gmail.com wrote:

 I haven't used the packaged kit, but Cassandra uses half of the physical
 memory on your system by default.
 You need to edit cassandra-env.sh to decrease the heap size.
 Update MAX_HEAP_SIZE and HEAP_NEWSIZE and restart.

 From iPhone


 On 2012/02/18, at 20:40, Aditya Gupta ady...@gmail.com wrote:

 I just installed Cassandra on my ubuntu server by adding the following to
 the sources list:

 deb http://www.apache.org/dist/cassandra/debian 10x main
 deb-src http://www.apache.org/dist/cassandra/debian 10x main


 Soon after the install I started getting OOM errors, and then the server
 became unresponsive. I added more RAM to the server but found that Cassandra
 was consuming 100% CPU and 1 GB of RAM as soon as the server started. Why is
 this happening, and how can I get it back to normal?





Re: Replication factor per column family

2012-02-17 Thread R. Verlangen
Ok, that's clear, thank you for your time!

2012/2/16 aaron morton aa...@thelastpickle.com

 yes.

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 16/02/2012, at 10:15 PM, R. Verlangen wrote:

 Hmm ok. This means if I want to have a CF with RF = 3 and another CF with
 RF = 1 (e.g. some debug logging) I will have to create 2 keyspaces?

 2012/2/16 aaron morton aa...@thelastpickle.com

 Multiple CF mutations for a row are treated atomically in the commit log,
 and they are sent together to the replicas. Replication occurs at the row
 level, not the row+cf level.

 If each CF had its own RF, odd things may happen, like sending a batch
 mutation for one row and two CFs that fails because there are not enough
 nodes for one of the CFs.

 There would be other reasons as well. In short, it's baked in.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 16/02/2012, at 9:54 PM, R. Verlangen wrote:

 Hi there,

 As the subject states: Is it possible to set a replication factor per
 column family?

 Could not find anything of recent releases. I'm running Cassandra 1.0.7
 and I think it should be possible on a per CF basis instead of the whole
 keyspace.

 With kind regards,
 Robin







Re: CQL query issue when fetching data from Cassandra

2012-02-16 Thread R. Verlangen
I'm not sure about your first two questions. The third is expected behaviour:
a single column can never equal two different values at once, so an AND of
two different equality conditions on the same column matches no rows.

About the LIKE question: there's no such query possibility in Cassandra / CQL.

You can take a look at Hadoop / Hive to tackle those problems.
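Since IN and OR are unavailable, one common workaround is to run one indexed equality query per value and merge the results client-side. A toy sketch, where `fetch_by_index` stands in for a secondary-index query such as get_indexed_slices (names are illustrative):

```python
def where_in(fetch_by_index, column, values):
    """Emulate `WHERE col IN (...)` / OR on the client side: run one
    indexed equality query per value and merge results by row key."""
    merged = {}
    for value in values:
        merged.update(fetch_by_index(column, value))
    return merged

# Toy table and index lookup standing in for the server-side index.
table = {
    "r1": {"status": "Failed"},
    "r2": {"status": "Success"},
    "r3": {"status": "Pending"},
}
def fetch_by_index(column, value):
    return {key: row for key, row in table.items()
            if row.get(column) == value}

print(sorted(where_in(fetch_by_index, "status", ["Failed", "Success"])))
# ['r1', 'r2']
```

Merging by row key also deduplicates, which matters if the same row could match more than one value on a different column.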

2012/2/16 Roshan codeva...@gmail.com

 Hi

 I am using Cassandra 1.0.6 version and having one column family in my
 keyspace.

 create column family TestCF
with comparator = UTF8Type
and column_metadata = [
{column_name : userid,
validation_class : BytesType,
index_name : userid_idx,
index_type : KEYS},
{column_name : workspace,
validation_class : BytesType,
index_name : wp_idx,
index_type : KEYS},
{column_name : module,
validation_class : BytesType,
index_name : module_idx,
index_type : KEYS},
{column_name : action,
validation_class : BytesType,
index_name : action_idx,
index_type : KEYS},
{column_name : description,
validation_class : BytesType},
{column_name : status,
validation_class : BytesType,
index_name : status_idx,
index_type : KEYS},
{column_name : createdtime,
validation_class : BytesType},
{column_name : created,
validation_class : BytesType,
index_name : created_idx,
index_type : KEYS},
{column_name : logdetail,
validation_class : BytesType}]
and keys_cached = 1
and rows_cached = 1000
and row_cache_save_period = 0
and key_cache_save_period = 3600
and memtable_throughput = 255
and memtable_operations = 0.29;

 1) The IN operator is not working:
 SELECT * FROM TestCF WHERE status IN ('Failed', 'Success')
 2) The OR operator is not fetching data:
SELECT * FROM TestCF WHERE status='Failed' OR status='Success'
 3) If I use the AND operator, it also returns no data. The query doesn't
 raise an error, but the result set is null:
SELECT * FROM TestCF WHERE status='Failed' AND status='Success'
 4) Is there anything similar to LIKE in CQL? I want to search data based
 on some part of a string.

 Could someone please help me to solve the above issues? Thanks.



 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/CQL-query-issue-when-fetching-data-from-Cassandra-tp7290072p7290072.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: Wide row column slicing - row size shard limit

2012-02-16 Thread R. Verlangen
Things you should know:

- Thrift has a limit on the amount of data it will accept / send; you can
configure this in Cassandra. 64 MB should still work fine (1)
- Rows should not become huge: this will make perfect load balancing
impossible in your cluster
- A single row must fit on a disk
- The limit is 2 billion columns per row

You should pick a bucket size for your time range (e.g. second, minute, ...)
that suits your needs.

As far as I'm aware, there's no 10 MB limit in Cassandra at which a single
row starts to degrade performance. That would more likely be a memory / IO
problem.
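A sketch of the manual sharding the quoted article describes: encode the time bucket into the row key, so one logical series spans many bounded physical rows. The key format is illustrative, not a Cassandra convention:

```python
def shard_row_key(series_id, epoch_seconds, bucket_seconds=3600):
    """Composite row key '<series>:<bucket>' so one logical time series
    is split across many physical rows. Pick bucket_seconds so a row
    stays comfortably small at your write rate."""
    return "%s:%d" % (series_id, epoch_seconds // bucket_seconds)

ts = 1330437600
print(shard_row_key("sensor-42", ts))         # sensor-42:369566
print(shard_row_key("sensor-42", ts + 3600))  # sensor-42:369567 -- new row
```

Reads then enumerate the bucket keys covering the queried time span and slice each row, which keeps every individual row bounded regardless of how long the series runs.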

2012/2/15 Data Craftsman database.crafts...@gmail.com

 Hello experts,

 Based on this blog of Basic Time Series with Cassandra data modeling,
 http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/

 This (wide row column slicing) works well enough for a while, but over
 time, this row will get very large. If you are storing sensor data that
 updates hundreds of times per second, that row will quickly become gigantic
 and unusable. The answer to that is to shard the data up in some way

 There is a limit on how big the row size can be before slowing down the
 update and query performance, that is 10MB or less.

 Is this still true in Cassandra latest version? or in what release
 Cassandra will remove this limit?

 Manually sharding the wide row will increase the application complexity,
 it would be better if Cassandra can handle it transparently.

 Thanks,
 Charlie | DBA & Developer

 p.s. Quora link,

 http://www.quora.com/Cassandra-database/What-are-good-ways-to-design-data-model-in-Cassandra-for-historical-data





Replication factor per column family

2012-02-16 Thread R. Verlangen
Hi there,

As the subject states: Is it possible to set a replication factor per
column family?

Could not find anything of recent releases. I'm running Cassandra 1.0.7 and
I think it should be possible on a per CF basis instead of the whole
keyspace.

With kind regards,
Robin


Re: Replication factor per column family

2012-02-16 Thread R. Verlangen
Hmm ok. This means if I want to have a CF with RF = 3 and another CF with
RF = 1 (e.g. some debug logging) I will have to create 2 keyspaces?

2012/2/16 aaron morton aa...@thelastpickle.com

 Multiple CF mutations for a row are treated atomically in the commit log,
 and they are sent together to the replicas. Replication occurs at the row
 level, not the row+cf level.

 If each CF had its own RF, odd things may happen, like sending a batch
 mutation for one row and two CFs that fails because there are not enough
 nodes for one of the CFs.

 There would be other reasons as well. In short, it's baked in.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 16/02/2012, at 9:54 PM, R. Verlangen wrote:

 Hi there,

 As the subject states: Is it possible to set a replication factor per
 column family?

 Could not find anything of recent releases. I'm running Cassandra 1.0.7
 and I think it should be possible on a per CF basis instead of the whole
 keyspace.

 With kind regards,
 Robin





Re: Deleting a column vs setting it's value to empty

2012-02-14 Thread R. Verlangen
"Setting to '' may cause you less headaches as you won't have to deal
with tombstones"

You won't have to deal with tombstones manually; the Thrift API will take
care of this. Deleting will always be better than writing an empty column
value, with one exception: when empty actually means something other than
non-existing.
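A toy model of the distinction discussed here, where only deletion makes the column disappear from reads (a plain dict stands in for a row; tombstone mechanics are only noted in the comments):

```python
def set_empty(row, col):
    row[col] = ""            # column still exists; its value is empty

def delete_col(row, col):
    row.pop(col, None)       # deleting a non-existent column is a no-op
                             # (server-side this just writes a tombstone)

user = {"name": "drew", "description": "hello"}
set_empty(user, "description")
print("description" in user)     # True: empty is not the same as absent
delete_col(user, "description")
delete_col(user, "description")  # safe to repeat
print("description" in user)     # False
```

This is the crux of Drew's question: after deletion the column is genuinely absent, whereas an empty value still shows up in slices and still occupies storage.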

2012/2/10 Narendra Sharma narendra.sha...@gmail.com

 IMO deleting is always better. It is better to not store the column if
 there is no value associated.

 -Naren


 On Fri, Feb 10, 2012 at 12:15 PM, Drew Kutcharian d...@venarc.com wrote:

 Hi Everyone,

 Let's say I have the following object which I would like to save in
 Cassandra:

 class User {
  UUID id; //row key
  String name; //columnKey: name, columnValue: the name of the user
  String description; //columnKey: description, columnValue: the
 description of the user
 }

 Description can be nullable. What's the best approach when a user updates
 her description and sets it to null? Should I delete the description column
 or set it to an empty string?

 In addition, if I go with the delete column strategy, since I don't know
 what was the previous value of description (the column could not even
 exist), what would happen when I delete a non existent column?

 Thanks,

 Drew




 --
 Narendra Sharma
 Software Engineer
 *http://www.aeris.com http://www.persistentsys.com*
 *http://narendrasharma.blogspot.com/*





Re: Querying for rows without a particular column

2012-02-14 Thread R. Verlangen
One option might be to maintain an index row containing the keys of the
rows. The index columns would get the same TTL as the rows themselves, so
iterating over the index columns yields exactly the rows that are still
live. I'm not really sure whether this is the best option, though.
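A toy sketch of such a TTL-mirroring index. Here expiry is tracked explicitly with timestamps; in Cassandra the index columns would carry the TTL and be expired server-side (function names are made up):

```python
def index_put(index, row_key, ttl, now):
    """Record row_key in the index with the same TTL as the row."""
    index[row_key] = now + ttl

def index_live_keys(index, now):
    """Keys whose index entry has not expired. Cassandra would purge
    expired index columns itself; here we filter explicitly."""
    return sorted(key for key, expires in index.items() if expires > now)

idx = {}
index_put(idx, "row-a", ttl=60, now=1000)
index_put(idx, "row-b", ttl=10, now=1000)
print(index_live_keys(idx, now=1005))  # ['row-a', 'row-b']
print(index_live_keys(idx, now=1030))  # ['row-a']
```

Comparing a previous snapshot of the index against the current one would then reveal which keys have expired in between, which is what the original question asks for.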

Another might be to use Hadoop to find your results with a map/reduce task.

2012/2/13 Asankha C. Perera asan...@apache.org

 Hi All

 I am using expiring columns in my column family, and need to search for
 the rows where a particular column expired (and no longer exists).. I am
 using Hector client. How can I make a query to find the rows of my interest?

 thanks
 asankha

 --
 Asankha C. Perera
 AdroitLogic, http://adroitlogic.org

 http://esbmagic.blogspot.com







Re: deleting rows and tombstones

2012-02-14 Thread R. Verlangen
Are you planning to insert rows with keys that existed before?

If that's true, there will be no tombstones (as far as I understand
Cassandra).

It that's not, then you will get tombstones that might slow down the reads
because they have to be skipped until the next compaction.

2012/2/14 Todd Burruss bburr...@expedia.com

 my design calls for deleting a row (by key, not individual columns) and
 re-inserting it a lot and I'm concerned about tombstone build up slowing
 down reads.  I know if I delete a lot of individual columns the tombstones
 will build up and slow down reads until they are cleaned up, but not sure
 if the same holds for deleting the whole row.

 thoughts?



Re: timed-out retrieving a giant row.

2012-02-14 Thread R. Verlangen
I'm familiar with this in phpcassa, but with Hector it would be something
like this:

Query your CF with range.setStart(lastColName) and range.setFinish(""),
where lastColName is the name of the last column from the previous read.

You can continue this until you run out of results.
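A sketch of paging the columns instead of asking for Integer.MAX_VALUE at once. `get_slice` stands in for a client slice query; the one extra column fetched per page tells us whether another page exists and becomes the inclusive start of the next slice:

```python
def page_columns(get_slice, row_key, page_size=100):
    """Read a giant row's columns page by page. Fetching page_size + 1
    columns lets us detect a following page without a duplicate-skip:
    the extra column is not yielded yet, it starts the next slice."""
    start = ""
    while True:
        cols = get_slice(row_key, start, page_size + 1)
        for name in cols[:page_size]:
            yield name
        if len(cols) <= page_size:
            return
        start = cols[page_size]

# Toy row with 250 column names; get_slice stands in for a SliceQuery.
row = {"giant": ["id%03d" % i for i in range(250)]}
def get_slice(key, start, count):
    return [c for c in row[key] if c >= start][:count]

names = list(page_columns(get_slice, "giant", page_size=100))
print(len(names))  # 250
```

This bounds both the Thrift payload and the client's memory use per request, which is exactly what the timed-out MAX_VALUE slice was missing.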

2012/2/14 Yuhan Zhang yzh...@onescreen.com

 Hi all,

 I'm using the Hector client 0.8, trying to retrieve a list of IDs from a
 giant row; each ID is a column name in the row.
 It works OK when there aren't many IDs, but the SliceQuery starts to time
 out after the row becomes big.

 Is this approach the correct way to store a list of IDs? Are there some
 settings that I'm missing?
 Looking at the code, it sets the range of the column names to
 setRange(null, null, false, Integer.MAX_VALUE);

 Is there a way in Cassandra to retrieve the first 100 columns, then the
 next 100 columns, and so forth?


 Thank you.

 Yuhan



Re: timed-out retrieving a giant row.

2012-02-14 Thread R. Verlangen
Of course you should set your limit to 100 or something like that, not
Integer.MAX_VALUE ;-)

2012/2/14 R. Verlangen ro...@us2.nl

 I'm familiar to this in PHPCassa, but with Hector it would be something
 like this:

 Query you CF with a range.setStart(lastColName) and
 range.setFinish(StringUtils.byte() where the  lastColName  is the name
 of the column from the previous read.

 You can continue this until you run out of results.


 2012/2/14 Yuhan Zhang yzh...@onescreen.com

 Hi all,

 I'm using the Hector client 0.8, trying to retrieve a list of IDs from a
 giant row; each ID is a column name in the row.
 It works OK when there aren't many IDs, but the SliceQuery starts to time
 out after the row becomes big.

 Is this approach the correct way to store a list of IDs? Are there some
 settings that I'm missing?
 Looking at the code, it sets the range of the column names to
 setRange(null, null, false, Integer.MAX_VALUE);

 Is there a way in Cassandra to retrieve the first 100 columns, then the
 next 100 columns, and so forth?


 Thank you.

 Yuhan





Re: keycache persisted to disk ?

2012-02-13 Thread R. Verlangen
This is because Cassandra warms up as it starts. On startup it will
re-fetch the rows that were cached; these have to be loaded from disk, as
there is nothing in the cache yet. You can read more
about this at http://wiki.apache.org/cassandra/LargeDataSetConsiderations

2012/2/13 Franc Carter franc.car...@sirca.org.au

 On Mon, Feb 13, 2012 at 5:03 PM, zhangcheng zhangch...@jike.com wrote:


 I think the key caches and row caches are both persisted to disk on
 shutdown, and restored from disk on restart, which improves performance.


 Thanks - that would explain at least some of what I am seeing

 cheers



 2012-02-13
 --
  zhangcheng
 --
 *From:* Franc Carter
 *Sent:* 2012-02-13 13:53:56
 *To:* user
 *Cc:*
 *Subject:* keycache persisted to disk ?

 Hi,

 I am testing Cassandra on Amazon and finding performance can vary fairly
 wildly. I'm leaning towards it being an artifact of the AWS I/O system but
 have one other possibility.

 Are keycaches persisted to disk and restored on a clean shutdown and
 restart ?

 cheers

 --

 *Franc Carter* | Systems architect | Sirca Ltd
 marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 9236 9118

 Level 9, 80 Clarence St, Sydney NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215




 --

 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 9236 9118

 Level 9, 80 Clarence St, Sydney NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215




Re: keycache persisted to disk ?

2012-02-13 Thread R. Verlangen
I also noticed that; Cassandra appears to perform better under a continuous
load.

Are you sure the rows you're querying are actually in the cache?

2012/2/13 Franc Carter franc.car...@sirca.org.au

 2012/2/13 R. Verlangen ro...@us2.nl

 This is because of the warm up of Cassandra as it starts. On a start it
 will start fetching the rows that were cached: this will have to be loaded
 from the disk, as there is nothing in the cache yet. You can read more
 about this at
 http://wiki.apache.org/cassandra/LargeDataSetConsiderations


 I actually have the opposite 'problem'. I have a pair of servers that have
 been static since mid last week, but have seen performance vary
 significantly (x10) for exactly the same query. I hypothesised it was
 various caches, so I shut down Cassandra, flushed the O/S buffer cache and
 then brought it back up. The performance wasn't significantly different
 from the pre-flush performance.

 cheers




 2012/2/13 Franc Carter franc.car...@sirca.org.au

 On Mon, Feb 13, 2012 at 5:03 PM, zhangcheng zhangch...@jike.com wrote:


 I think the key caches and row caches are both persisted to disk on
 shutdown, and restored from disk on restart, which improves
 performance.


 Thanks - that would explain at least some of what I am seeing

 cheers



 2012-02-13
 --
  zhangcheng
 --
 *From:* Franc Carter
 *Sent:* 2012-02-13 13:53:56
 *To:* user
 *Cc:*
 *Subject:* keycache persisted to disk ?

 Hi,

 I am testing Cassandra on Amazon and finding performance can vary
 fairly wildly. I'm leaning towards it being an artifact of the AWS I/O
 system but have one other possibility.

 Are keycaches persisted to disk and restored on a clean shutdown and
 restart ?

 cheers

 --

 *Franc Carter* | Systems architect | Sirca Ltd
 marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 9236 9118

 Level 9, 80 Clarence St, Sydney NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215




 --

 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 9236 9118

 Level 9, 80 Clarence St, Sydney NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215





 --

 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 9236 9118

 Level 9, 80 Clarence St, Sydney NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215




Re: Best way to know the cluster status

2012-02-06 Thread R. Verlangen
You might consider writing some kind of PHP script that runs nodetool
ring and parses the output?
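The parsing half of that script, language aside, amounts to splitting each data line of the ring output on whitespace and checking the Status column. A sketch (in Java rather than PHP, and with illustrative column positions and sample output — nodetool ring formatting varies across versions, so treat this as a sketch, not a robust parser):

```java
import java.util.ArrayList;
import java.util.List;

public class RingParser {

    // Extract the addresses of nodes whose Status column is "Down" from
    // nodetool ring output. Column positions are illustrative; real
    // output differs slightly between Cassandra versions.
    static List<String> downNodes(String ringOutput) {
        List<String> down = new ArrayList<>();
        for (String line : ringOutput.split("\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 5) {
                continue; // skip header remnants and blank lines
            }
            String address = fields[0];
            String status = fields[3];
            if ("Down".equals(status)) {
                down.add(address);
            }
        }
        return down;
    }

    public static void main(String[] args) {
        String sample =
            "Address    DC           Rack   Status  State   Load     Owns    Token\n"
          + "10.0.0.1   datacenter1  rack1  Up      Normal  2.44 GB  50.00%  0\n"
          + "10.0.0.2   datacenter1  rack1  Down    Normal  6.99 GB  50.00%  85070591730234615865843651857942052864\n";
        System.out.println(downNodes(sample)); // [10.0.0.2]
    }
}
```

The advantage over opening a Thrift connection per node is that nodetool reports gossip state from one coordinator, so a single call covers the whole ring.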

2012/2/6 Tamil selvan R.S tamil.3...@gmail.com

 Hi,
  What is the best way to know the cluster status via php?
  Currently we are trying to connect to individual cassandra instance with
 a specified timeout and if it fails we report the node to be down.
  But this test remains faulty. What are the other ways to test
 availability of nodes in cassandra cluster?
  How does DataStax OpsCenter manage to do that?

 Regards,
 Tamil Selvan



Re: nodetool hangs and didn't print anything with firewall

2012-02-06 Thread R. Verlangen
Do you allow both outbound and inbound traffic? You might also try allowing
both TCP and UDP.

2012/2/6 Roshan codeva...@gmail.com

 Yes, If the firewall is disable it works.

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/nodetool-hangs-and-didn-t-print-anything-with-firewall-tp7257286p7257310.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: yet a couple more questions on composite columns

2012-02-05 Thread R. Verlangen
Yiming, I am using 2 CFs. Performance-wise this should not be an issue. I
use it as a data store for small files. My 2 CFs are:

FilesMeta
FilesData

2012/2/5 Yiming Sun yiming@gmail.com

 Interesting idea, Jim.  Is there a reason you don't use
 metadata:{accountId} instead?  For performance reasons?


 On Sat, Feb 4, 2012 at 6:24 PM, Jim Ancona j...@anconafamily.com wrote:

 I've used special values which still comply with the Composite
 schema for the metadata columns, e.g. a column of
 1970-01-01:{accountId} for a metadata column where the Composite is
 DateType:UTF8Type.

 Jim

 On Sat, Feb 4, 2012 at 2:13 PM, Yiming Sun yiming@gmail.com wrote:
  Thanks Andrey and Chris.  It sounds like we don't necessarily have to
 use
  composite columns.  From what I understand about dynamic CF, each row
 may
  have completely different data from other rows;  but in our case, the
 data
  in each row is similar to other rows; my concern was more about the
  homogeneity of the data between columns.
 
  In our original supercolumn-based schema, one special supercolumn is
 called
  metadata which contains a number of subcolumns to hold metadata
 describing
  each collection (e.g. number of documents, etc.), then the rest of the
  supercolumns in the same row are all IDs of documents belong to the
  collection, and for each document supercolumn, the subcolumns contain
 the
  document content as well as metadata on individual document (e.g.
 checksum
  of each document).
 
  To move away from the supercolumn schema, I could either create two
 CFs, one
  to hold metadata, the other document content; or I could create just
 one CF
  mixing metadata and doc content in the same row, and using composite
 column
  names to identify if the particular column is metadata or a document.
  I am
  just wondering if you have any inputs on the pros and cons of each
 schema.
 
  -- Y.
 
 
  On Fri, Feb 3, 2012 at 10:27 PM, Chris Gerken 
 chrisger...@mindspring.com
  wrote:
 
 
 
 
  On 4 February 2012 06:21, Yiming Sun yiming@gmail.com wrote:
 
  I cannot have one composite column name with 3 components while
 another
  with 4 components?
 
   Just put 4 components and leave the last one empty (if it is the same type)!
 
  Another question I have is how flexible composite columns actually
 are.
   If my data model has a CF containing US zip codes with the following
  composite columns:
 
  {OH:Spring Field} : 45503
  {OH:Columbus} : 43085
  {FL:Spring Field} : 32401
  {FL:Key West}  : 33040
 
  I know I can ask cassandra to give me the zip codes of all cities in
  OH.  But can I ask it to give me the zip codes of all cities named
 Spring
  Field using this model?  Thanks.
 
  No. You have to set the first composite component first.
 
 
  I'd use a dynamic CF:
  row key = state abbreviation
  column name = city name
  column value = zip code (or a complex object, one of whose properties
 is
  zip code)
 
  you can iterate over the columns in a single row to get a state's city
  names and their zip codes, and you can do a get_range_slices on all keys
  for the columns starting and ending on the city name to find out the zip
  codes for cities with the given name.
 
  I think
 
  - Chris
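The ordering argument above can be made concrete with a sorted map: because columns sort on the whole composite name, one state's cities form a contiguous slice, while all cities sharing a name are scattered across the row. A sketch — plain `state:city` strings stand in for real composite columns, and the `\uffff` upper bound is an illustrative trick for "everything with this prefix", not a Cassandra API:

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

public class CompositeOrdering {
    public static void main(String[] args) {
        // Columns sort on the full composite name, here "state:city".
        NavigableMap<String, Integer> zips = new TreeMap<>();
        zips.put("FL:Key West", 33040);
        zips.put("FL:Spring Field", 32401);
        zips.put("OH:Columbus", 43085);
        zips.put("OH:Spring Field", 45503);

        // All cities in OH sit next to each other: one contiguous slice,
        // so a single range query answers "zip codes of all OH cities".
        SortedMap<String, Integer> ohio = zips.subMap("OH:", "OH:\uffff");
        System.out.println(ohio.keySet()); // [OH:Columbus, OH:Spring Field]

        // All cities named "Spring Field" are interleaved with other
        // states' columns, so no single slice can fetch them; that query
        // needs either a second CF keyed city-first or a full scan.
    }
}
```

This is why the first composite component has to be fixed before you can slice on the second: only prefixes of the sort order are contiguous.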
 
 





Re: yet a couple more questions on composite columns

2012-02-04 Thread R. Verlangen
I also made something like this a while ago. I decided to go for the
two-row solution: that way you don't need super columns.
Cassandra is really good at reading, so this should not be an issue.

Cheers!

2012/2/4 Yiming Sun yiming@gmail.com

 Thanks Andrey and Chris.  It sounds like we don't necessarily have to use
 composite columns.  From what I understand about dynamic CF, each row may
 have completely different data from other rows;  but in our case, the data
 in each row is similar to other rows; my concern was more about the
 homogeneity of the data between columns.

 In our original supercolumn-based schema, one special supercolumn is
 called metadata which contains a number of subcolumns to hold metadata
 describing each collection (e.g. number of documents, etc.), then the rest
 of the supercolumns in the same row are all IDs of documents belong to the
 collection, and for each document supercolumn, the subcolumns contain the
 document content as well as metadata on individual document (e.g. checksum
 of each document).

 To move away from the supercolumn schema, I could either create two CFs,
 one to hold metadata, the other document content; or I could create just
 one CF mixing metadata and doc content in the same row, and using composite
 column names to identify if the particular column is metadata or a
 document.  I am just wondering if you have any inputs on the pros and cons
 of each schema.

 -- Y.


 On Fri, Feb 3, 2012 at 10:27 PM, Chris Gerken 
  chrisger...@mindspring.com wrote:




 On 4 February 2012 06:21, Yiming Sun yiming@gmail.com wrote:

 I cannot have one composite column name with 3 components while another
 with 4 components?

  Just put 4 components and leave the last one empty (if it is the same type)!

 Another question I have is how flexible composite columns actually are.
  If my data model has a CF containing US zip codes with the following
 composite columns:

 {OH:Spring Field} : 45503
 {OH:Columbus} : 43085
 {FL:Spring Field} : 32401
 {FL:Key West}  : 33040

 I know I can ask cassandra to give me the zip codes of all cities in
 OH.  But can I ask it to give me the zip codes of all cities named Spring
 Field using this model?  Thanks.

 No. You have to set the first composite component first.


 I'd use a dynamic CF:
 row key = state abbreviation
 column name = city name
 column value = zip code (or a complex object, one of whose properties is
 zip code)

 you can iterate over the columns in a single row to get a state's city
 names and their zip codes, and you can do a get_range_slices on all keys
 for the columns starting and ending on the city name to find out the zip
 codes for cities with the given name.

 I think

 - Chris





Re: yet a couple more questions on composite columns

2012-02-04 Thread R. Verlangen
I just kept both row keys the same. That made fetching them both trivial:
when you have A, you can fetch B, and vice versa.

2012/2/4 Yiming Sun yiming@gmail.com

 Interesting idea, R.V.  But what did you do with the row keys?


 On Sat, Feb 4, 2012 at 2:29 PM, R. Verlangen ro...@us2.nl wrote:

 I also made something like this a while ago. I decided to go for the
 2-rows-solution: by doing that you don't have the need for super columns.
 Cassandra is really good at reading, so this should not be an issue.

 Cheers!


 2012/2/4 Yiming Sun yiming@gmail.com

 Thanks Andrey and Chris.  It sounds like we don't necessarily have to
 use composite columns.  From what I understand about dynamic CF, each row
 may have completely different data from other rows;  but in our case, the
 data in each row is similar to other rows; my concern was more about the
 homogeneity of the data between columns.

 In our original supercolumn-based schema, one special supercolumn is
 called metadata which contains a number of subcolumns to hold metadata
 describing each collection (e.g. number of documents, etc.), then the rest
 of the supercolumns in the same row are all IDs of documents belong to the
 collection, and for each document supercolumn, the subcolumns contain the
 document content as well as metadata on individual document (e.g. checksum
 of each document).

 To move away from the supercolumn schema, I could either create two CFs,
 one to hold metadata, the other document content; or I could create just
 one CF mixing metadata and doc content in the same row, and using composite
 column names to identify if the particular column is metadata or a
 document.  I am just wondering if you have any inputs on the pros and cons
 of each schema.

 -- Y.


 On Fri, Feb 3, 2012 at 10:27 PM, Chris Gerken 
 chrisger...@mindspring.com wrote:




 On 4 February 2012 06:21, Yiming Sun yiming@gmail.com wrote:

 I cannot have one composite column name with 3 components while
 another with 4 components?

   Just put 4 components and leave the last one empty (if it is the same type)!

 Another question I have is how flexible composite columns actually are.
  If my data model has a CF containing US zip codes with the following
 composite columns:

 {OH:Spring Field} : 45503
 {OH:Columbus} : 43085
 {FL:Spring Field} : 32401
 {FL:Key West}  : 33040

 I know I can ask cassandra to give me the zip codes of all cities in
 OH.  But can I ask it to give me the zip codes of all cities named 
 Spring
 Field using this model?  Thanks.

  No. You have to set the first composite component first.


 I'd use a dynamic CF:
 row key = state abbreviation
 column name = city name
 column value = zip code (or a complex object, one of whose properties
 is zip code)

  you can iterate over the columns in a single row to get a state's city
  names and their zip codes, and you can do a get_range_slices on all keys
  for the columns starting and ending on the city name to find out the zip
  codes for cities with the given name.

 I think

 - Chris







Re: Restart cassandra every X days?

2012-02-02 Thread R. Verlangen
Yes, I already did a repair and cleanup. Currently my ring looks like this:

Address DC  RackStatus State   LoadOwns
   Token
***.89datacenter1 rack1   Up Normal  2.44 GB 50.00%  0
***.135datacenter1 rack1   Up Normal  6.99 GB 50.00%
 85070591730234615865843651857942052864

It's not really a problem, but I'm still wondering why this happens.

2012/2/1 aaron morton aa...@thelastpickle.com

 Do you mean the load in nodetool ring is not even, despite the tokens being
 evenly distributed?

 I would assume this is not the case given the difference, but it may be
 hints given you have just done an upgrade. Check the system using nodetool
 cfstats to see. They will eventually be delivered and deleted.

 More likely you will want to:
 1) nodetool repair to make sure all data is distributed then
 2) nodetool cleanup if you have changed the tokens at any point finally

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 31/01/2012, at 11:56 PM, R. Verlangen wrote:

 After running 3 days on Cassandra 1.0.7 it seems the problem has been
 solved. One weird thing remains, on our 2 nodes (both 50% of the ring), the
 first's usage is just over 25% of the second.

 Anyone got an explanation for that?

 2012/1/29 aaron morton aa...@thelastpickle.com

 Yes but…

 For every upgrade, read NEWS.txt; it goes through the upgrade
 procedure in detail. If you want to feel extra smart, scan through
 CHANGES.txt to get an idea of what's going on.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 29/01/2012, at 4:14 AM, Maxim Potekhin wrote:

  Sorry if this has been covered, I was concentrating solely on 0.8x --
 can I just d/l 1.0.x and continue using same data on same cluster?

 Maxim


 On 1/28/2012 7:53 AM, R. Verlangen wrote:

 Ok, seems that it's clear what I should do next ;-)

 2012/1/28 aaron morton aa...@thelastpickle.com

 There are no blockers to upgrading to 1.0.X.

  A
  -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

   On 28/01/2012, at 7:48 AM, R. Verlangen wrote:

 Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x
 stable enough to upgrade for, or should we wait for a couple of weeks?

 2012/1/27 Edward Capriolo edlinuxg...@gmail.com

 I would not say that issuing a restart after X days is a good idea. You
 are mostly developing a superstition. You should find the source of the
 problem. It could be JMX or Thrift clients not closing connections. We
 don't restart nodes on a regimen; they work fine.


 On Thursday, January 26, 2012, Mike Panchenko m...@mihasya.com wrote:
  There are two relevant bugs (that I know of), both resolved in
 somewhat recent versions, which make somewhat regular restarts beneficial
  https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak in
 GCInspector, fixed in 0.7.9/0.8.5)
  https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap
 fragmentation due to the way memtables used to be allocated, refactored in
 1.0.0)
  Restarting daily is probably too frequent for either one of those
 problems. We usually notice degraded performance in our ancient cluster
 after ~2 weeks w/o a restart.
  As Aaron mentioned, if you have plenty of disk space, there's no
 reason to worry about cruft sstables. The size of your active set is what
 matters, and you can determine if that's getting too big by watching for
 iowait (due to reads from the data partition) and/or paging activity of the
 java process. When you hit that problem, the solution is to 1. try to tune
 your caches and 2. add more nodes to spread the load. I'll reiterate -
 looking at raw disk space usage should not be your guide for that.
  Forcing a GC generally works, but should not be relied upon (note the
 suggests wording in
 http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()).
 It's great news that 1.0 uses a better mechanism for releasing unused
 sstables.
  nodetool compact triggers a major compaction and is no longer
 recommended by DataStax (details here:
 http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction,
 bottom of the page).
  Hope this helps.
  Mike.
  On Wed, Jan 25, 2012 at 5:14 PM, aaron morton 
 aa...@thelastpickle.com wrote:
 
   That disk usage pattern is to be expected in pre-1.0 versions. Disk
  usage is far less interesting than disk free space: if it's using 60 GB
  and there is 200 GB, that's OK. If it's using 60 GB and there is 6 MB
  free, that's a problem.
  In pre 1.0 the compacted files are deleted on disk by waiting for the
  JVM to decide to GC all remaining references. If there is not enough space
 (to store the total size of the files it is about to write or compact) on
 disk GC is forced and the files are deleted. Otherwise they will get
 deleted at some point in the future.
  In 1.0 files are reference counted

Re: Restart cassandra every X days?

2012-02-02 Thread R. Verlangen
Well, it seems it's balancing itself; 24 hours later the ring looks like
this:

***.89datacenter1 rack1   Up Normal  7.36 GB 50.00%  0
***.135datacenter1 rack1   Up Normal  8.84 GB 50.00%
 85070591730234615865843651857942052864

Looks pretty normal, right?

2012/2/2 aaron morton aa...@thelastpickle.com

 Speaking technically, that ain't right.

 I would:
 * Check if node .135 is holding a lot of hints.
 * Take a look on disk and see what is there.
 * Go through a repair and compact on each node.


 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 2/02/2012, at 9:55 PM, R. Verlangen wrote:

 Yes, I already did a repair and cleanup. Currently my ring looks like this:

 Address DC  RackStatus State   Load
  OwnsToken
 ***.89datacenter1 rack1   Up Normal  2.44 GB 50.00%  0
 ***.135datacenter1 rack1   Up Normal  6.99 GB 50.00%
  85070591730234615865843651857942052864

 It's not really a problem, but I'm still wondering why this happens.

 2012/2/1 aaron morton aa...@thelastpickle.com

 Do you mean the load in nodetool ring is not even, despite the tokens
 being evenly distributed?

 I would assume this is not the case given the difference, but it may be
 hints given you have just done an upgrade. Check the system using nodetool
 cfstats to see. They will eventually be delivered and deleted.

 More likely you will want to:
 1) nodetool repair to make sure all data is distributed then
 2) nodetool cleanup if you have changed the tokens at any point finally

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 31/01/2012, at 11:56 PM, R. Verlangen wrote:

 After running 3 days on Cassandra 1.0.7 it seems the problem has been
 solved. One weird thing remains, on our 2 nodes (both 50% of the ring), the
 first's usage is just over 25% of the second.

 Anyone got an explanation for that?

 2012/1/29 aaron morton aa...@thelastpickle.com

 Yes but…

 For every upgrade, read NEWS.txt; it goes through the upgrade
 procedure in detail. If you want to feel extra smart, scan through
 CHANGES.txt to get an idea of what's going on.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 29/01/2012, at 4:14 AM, Maxim Potekhin wrote:

  Sorry if this has been covered, I was concentrating solely on 0.8x --
 can I just d/l 1.0.x and continue using same data on same cluster?

 Maxim


 On 1/28/2012 7:53 AM, R. Verlangen wrote:

 Ok, seems that it's clear what I should do next ;-)

 2012/1/28 aaron morton aa...@thelastpickle.com

 There are no blockers to upgrading to 1.0.X.

  A
  -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

   On 28/01/2012, at 7:48 AM, R. Verlangen wrote:

 Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x
 stable enough to upgrade for, or should we wait for a couple of weeks?

 2012/1/27 Edward Capriolo edlinuxg...@gmail.com

 I would not say that issuing a restart after X days is a good idea. You
 are mostly developing a superstition. You should find the source of the
 problem. It could be JMX or Thrift clients not closing connections. We
 don't restart nodes on a regimen; they work fine.


 On Thursday, January 26, 2012, Mike Panchenko m...@mihasya.com wrote:
  There are two relevant bugs (that I know of), both resolved in
 somewhat recent versions, which make somewhat regular restarts beneficial
  https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak
 in GCInspector, fixed in 0.7.9/0.8.5)
  https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap
 fragmentation due to the way memtables used to be allocated, refactored in
 1.0.0)
  Restarting daily is probably too frequent for either one of those
 problems. We usually notice degraded performance in our ancient cluster
 after ~2 weeks w/o a restart.
  As Aaron mentioned, if you have plenty of disk space, there's no
 reason to worry about cruft sstables. The size of your active set is 
 what
 matters, and you can determine if that's getting too big by watching for
 iowait (due to reads from the data partition) and/or paging activity of 
 the
 java process. When you hit that problem, the solution is to 1. try to tune
 your caches and 2. add more nodes to spread the load. I'll reiterate -
 looking at raw disk space usage should not be your guide for that.
  Forcing a GC generally works, but should not be relied upon (note the
 suggests wording in
 http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()).
 It's great news that 1.0 uses a better mechanism for releasing unused
 sstables.
  nodetool compact triggers a major compaction and is no longer
 recommended by DataStax (details here:
 http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction,
 bottom of the page).
  Hope

Re: Can you query Cassandra while it's doing major compaction

2012-02-02 Thread R. Verlangen
It will carry a performance penalty, so it would be better to spread the
compactions over a period of time. But Cassandra will still serve any
reads/writes (within the configured timeout).

2012/2/3 myreasoner myreaso...@gmail.com

 If every node in the cluster is running major compaction, would it be able
 to
 answer any read request?  And is it wise to write anything to a cluster
 while it's doing major compaction?



 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-you-query-Cassandra-while-it-s-doing-major-compaction-tp7249985p7249985.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: Restart cassandra every X days?

2012-01-31 Thread R. Verlangen
After running 3 days on Cassandra 1.0.7 it seems the problem has been
solved. One weird thing remains, on our 2 nodes (both 50% of the ring), the
first's usage is just over 25% of the second.

Anyone got an explanation for that?

2012/1/29 aaron morton aa...@thelastpickle.com

 Yes but…

 For every upgrade, read NEWS.txt; it goes through the upgrade
 procedure in detail. If you want to feel extra smart, scan through
 CHANGES.txt to get an idea of what's going on.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 29/01/2012, at 4:14 AM, Maxim Potekhin wrote:

  Sorry if this has been covered, I was concentrating solely on 0.8x --
 can I just d/l 1.0.x and continue using same data on same cluster?

 Maxim


 On 1/28/2012 7:53 AM, R. Verlangen wrote:

 Ok, seems that it's clear what I should do next ;-)

 2012/1/28 aaron morton aa...@thelastpickle.com

 There are no blockers to upgrading to 1.0.X.

  A
  -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

   On 28/01/2012, at 7:48 AM, R. Verlangen wrote:

 Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x
 stable enough to upgrade for, or should we wait for a couple of weeks?

 2012/1/27 Edward Capriolo edlinuxg...@gmail.com

 I would not say that issuing a restart after X days is a good idea. You
 are mostly developing a superstition. You should find the source of the
 problem. It could be JMX or Thrift clients not closing connections. We
 don't restart nodes on a regimen; they work fine.


 On Thursday, January 26, 2012, Mike Panchenko m...@mihasya.com wrote:
  There are two relevant bugs (that I know of), both resolved in
 somewhat recent versions, which make somewhat regular restarts beneficial
  https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak in
 GCInspector, fixed in 0.7.9/0.8.5)
  https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap
 fragmentation due to the way memtables used to be allocated, refactored in
 1.0.0)
  Restarting daily is probably too frequent for either one of those
 problems. We usually notice degraded performance in our ancient cluster
 after ~2 weeks w/o a restart.
  As Aaron mentioned, if you have plenty of disk space, there's no
 reason to worry about cruft sstables. The size of your active set is what
 matters, and you can determine if that's getting too big by watching for
 iowait (due to reads from the data partition) and/or paging activity of the
 java process. When you hit that problem, the solution is to 1. try to tune
 your caches and 2. add more nodes to spread the load. I'll reiterate -
 looking at raw disk space usage should not be your guide for that.
  Forcing a GC generally works, but should not be relied upon (note the
 suggests wording in
 http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()).
 It's great news that 1.0 uses a better mechanism for releasing unused
 sstables.
  nodetool compact triggers a major compaction and is no longer
 recommended by DataStax (details here:
 http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction,
 bottom of the page).
  Hope this helps.
  Mike.
  On Wed, Jan 25, 2012 at 5:14 PM, aaron morton aa...@thelastpickle.com
 wrote:
 
   That disk usage pattern is to be expected in pre-1.0 versions. Disk
  usage is far less interesting than disk free space: if it's using 60 GB
  and there is 200 GB, that's OK. If it's using 60 GB and there is 6 MB
  free, that's a problem.
  In pre 1.0 the compacted files are deleted on disk by waiting for the
 JVM to decide to GC all remaining references. If there is not enough space
 (to store the total size of the files it is about to write or compact) on
 disk GC is forced and the files are deleted. Otherwise they will get
 deleted at some point in the future.
  In 1.0 files are reference counted and space is freed much sooner.
  With regard to regular maintenance, nodetool cleanup removes data from
 a node that it is no longer a replica for. This is only of use when you
 have done a token move.
  I would not recommend a daily restart of the Cassandra process. You
 will lose all the runtime optimizations the JVM has made (I think the
 mapped file pages will stay resident), as well as adding additional
 entropy to the system which must be repaired via HH, RR or nodetool repair.
  If you want to see compacted files purged faster the best approach
 would be to upgrade to 1.0.
  Hope that helps.
  -
  Aaron Morton
  Freelance Developer
  @aaronmorton
  http://www.thelastpickle.com
  On 26/01/2012, at 9:51 AM, R. Verlangen wrote:
 
  In his message he explains that it's for "Forcing a GC". GC stands
 for garbage collection. For some more background see:
 http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
  Cheers!
 
  2012/1/25 mike...@thomsonreuters.com
 
  Karl,
 
  Can you give a little more details on these 2 lines, what do they do?
 
  java -jar cmdline

Re: Any tools like phpMyAdmin to see data stored in Cassandra ?

2012-01-30 Thread R. Verlangen
You might run it from a VM?

2012/1/30 Ertio Lew ertio...@gmail.com



 On Mon, Jan 30, 2012 at 7:16 AM, Frisch, Michael 
 michael.fri...@nuance.com wrote:

  OpsCenter?

  http://www.datastax.com/products/opscenter

  - Mike


  I have tried Sebastien's phpMyAdmin for Cassandra (
 https://github.com/sebgiroux/Cassandra-Cluster-Admin) to
 see the data stored in Cassandra in the same manner as phpMyAdmin allows.
 But since it makes assumptions about the datatypes of the column
 name/column value and doesn't allow configuring the datatype data should
 be read as on a per-CF basis, I couldn't make the best use of it.

  Are there any similar other tools out there that can do the job better ?


 Thanks, that's a great product but unfortunately it doesn't work on
 Windows. Any tools for Windows?




Re: Restart cassandra every X days?

2012-01-28 Thread R. Verlangen
Ok, seems that it's clear what I should do next ;-)

2012/1/28 aaron morton aa...@thelastpickle.com

 There are no blockers to upgrading to 1.0.X.

 A
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 28/01/2012, at 7:48 AM, R. Verlangen wrote:

 Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x
 stable enough to upgrade for, or should we wait for a couple of weeks?

 2012/1/27 Edward Capriolo edlinuxg...@gmail.com

 I would not say that issuing a restart after X days is a good idea. You are
 mostly developing a superstition. You should find the source of the
 problem. It could be JMX or Thrift clients not closing connections. We
 don't restart nodes on a regimen; they work fine.


 On Thursday, January 26, 2012, Mike Panchenko m...@mihasya.com wrote:
  There are two relevant bugs (that I know of), both resolved in somewhat
 recent versions, which make somewhat regular restarts beneficial
  https://issues.apache.org/jira/browse/CASSANDRA-2868 (memory leak in
 GCInspector, fixed in 0.7.9/0.8.5)
  https://issues.apache.org/jira/browse/CASSANDRA-2252 (heap
 fragmentation due to the way memtables used to be allocated, refactored in
 1.0.0)
  Restarting daily is probably too frequent for either one of those
 problems. We usually notice degraded performance in our ancient cluster
 after ~2 weeks w/o a restart.
  As Aaron mentioned, if you have plenty of disk space, there's no reason
 to worry about cruft sstables. The size of your active set is what
 matters, and you can determine if that's getting too big by watching for
 iowait (due to reads from the data partition) and/or paging activity of the
 java process. When you hit that problem, the solution is to 1. try to tune
 your caches and 2. add more nodes to spread the load. I'll reiterate -
 looking at raw disk space usage should not be your guide for that.
  Forcing a GC generally works, but should not be relied upon (note the
 word suggests in
 http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc() ).
 It's great news that 1.0 uses a better mechanism for releasing unused
 sstables.
  nodetool compact triggers a major compaction and is no longer
 recommended by DataStax (details at the bottom of
 http://www.datastax.com/docs/1.0/operations/tuning#tuning-compaction ).
  Hope this helps.
  Mike.
  On Wed, Jan 25, 2012 at 5:14 PM, aaron morton aa...@thelastpickle.com
 wrote:
 
  That disk usage pattern is to be expected in pre-1.0 versions. Disk
 usage is far less interesting than disk free space: if it's using 60 GB and
 there is 200 GB free, that's OK. If it's using 60 GB and there is 6 MB free,
 that's a problem.
  In pre 1.0 the compacted files are deleted on disk by waiting for the
 JVM to decide to GC all remaining references. If there is not enough space
 (to store the total size of the files it is about to write or compact) on
 disk GC is forced and the files are deleted. Otherwise they will get
 deleted at some point in the future.
  In 1.0 files are reference counted and space is freed much sooner.
  With regard to regular maintenance, nodetool cleanup removes data from
 a node that it is no longer a replica for. This is only of use when you
 have done a token move.
  I would not recommend a daily restart of the cassandra process. You
 will lose all the run time optimizations the JVM has made (i think the
 mapped files pages will stay resident). As well as adding additional
 entropy to the system which must be repaired via HH, RR or nodetool repair.
  If you want to see compacted files purged faster the best approach
 would be to upgrade to 1.0.
  Hope that helps.
  -
  Aaron Morton
  Freelance Developer
  @aaronmorton
  http://www.thelastpickle.com
  On 26/01/2012, at 9:51 AM, R. Verlangen wrote:
 
  In his message he explains that it's for "Forcing a GC". GC stands
 for garbage collection. For some more background see:
 http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
  Cheers!
 
  2012/1/25 mike...@thomsonreuters.com
 
  Karl,
 
  Can you give a little more details on these 2 lines, what do they do?
 
  java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080
  java.lang:type=Memory gc
 
  Thank you,
  Mike
 
  -Original Message-
  From: Karl Hiramoto [mailto:k...@hiramoto.org]
  Sent: Wednesday, January 25, 2012 12:26 PM
  To: user@cassandra.apache.org
  Subject: Re: Restart cassandra every X days?
 
 
  On 01/25/12 19:18, R. Verlangen wrote:
  Ok thank you for your feedback. I'll add these tasks to our daily
  cassandra maintenance cronjob. Hopefully this will keep things under
  controll.
 
  I forgot to mention that we found that Forcing a GC also cleans up some
  space.
 
 
  in a cronjob you can do this with
  http://crawler.archive.org/cmdline-jmxclient/
 
 
  my cron
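Aaron's rule of thumb above (watch free space rather than usage, and keep enough headroom for the files a compaction is about to write) is easy to check from a script. A minimal sketch in Python; the path and the threshold are examples, not from the thread:

```python
import shutil

def has_headroom(data_dir, bytes_needed):
    """True if the volume holding data_dir has at least bytes_needed free,
    e.g. enough room for the sstables a compaction is about to write."""
    usage = shutil.disk_usage(data_dir)  # named tuple: total, used, free
    return usage.free >= bytes_needed

# Hypothetical check before kicking off a major compaction:
# has_headroom("/var/lib/cassandra/data", 60 * 1024**3)
```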






Re: Restart cassandra every X days?

2012-01-27 Thread R. Verlangen
Ok. Seems that an upgrade might fix these problems. Is Cassandra 1.x.x
stable enough to upgrade for, or should we wait for a couple of weeks?




Re: How to create a table in Cassandra

2012-01-27 Thread R. Verlangen
A table is called a column family in Cassandra.

From the CLI you can just create one by typing:

create column family MyApplication;
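If the column family needs typed column names or values, the CLI also accepts comparator and validator options. A sketch; the UTF8 choices below are only an example, not a requirement:

```
create column family MyApplication
    with comparator = UTF8Type
    and key_validation_class = UTF8Type
    and default_validation_class = UTF8Type;
```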


-- Forwarded message --
 From:  anandbab...@polarisft.com
 Date: Fri, Jan 27, 2012 at 2:36 PM
 Subject: How to create a table in Cassandra
 To: d...@cassandra.apache.org



 Can anyone tell me how to create a table in the Cassandra. I have
 installed it... and I am new to this...
 Thanks,
 Barnabas



 This e-Mail may contain proprietary and confidential information and
 is sent for the intended recipient(s) only.  If by an addressing or
 transmission error this mail has been misdirected to you, you are
 requested to delete this mail immediately. You are also hereby
 notified that any use, any form of reproduction, dissemination,
 copying, disclosure, modification, distribution and/or publication of
 this e-mail message, contents or its attachment other than by its
 intended recipient/s is strictly prohibited.

 Visit us at http://www.polarisFT.com



Restart cassandra every X days?

2012-01-25 Thread R. Verlangen
Hi there,

I'm currently running a 2-node cluster for some small projects that might
need to scale-up in the future: that's why we chose Cassandra. The actual
problem is that one of the node's harddrive usage keeps growing.

For example:
- after a fresh restart ~ 10GB
- after a couple of days running ~ 60GB

I know that Cassandra uses lots of disk space, but is this still normal? I'm
running Cassandra 0.8.7.

Gr. Robin


Re: Restart cassandra every X days?

2012-01-25 Thread R. Verlangen
Ok thank you for your feedback. I'll add these tasks to our daily cassandra
maintenance cronjob. Hopefully this will keep things under controll.

2012/1/25 Karl Hiramoto k...@hiramoto.org

 On 01/25/12 16:09, R. Verlangen wrote:

 Hi there,

 I'm currently running a 2-node cluster for some small projects that might
 need to scale-up in the future: that's why we chose Cassandra. The actual
 problem is that one of the node's harddrive usage keeps growing.

 For example:
 - after a fresh restart ~ 10GB
 - after a couple of days running ~ 60GB

 I know that Cassandra uses lots of diskspace but is this still normal?
 I'm running cassandra 0.8.7



 I run 9 nodes with cassandra 0.7.8   and we see this same behaviour, but
 we keep it under control by doing the sequence:

 nodetool repair
 nodetool compact
 nodetool cleanup

 According to the 1.0.x changelog IIRC this disk usage is supposed to be
 improved.


 --
 Karl



Re: Restart cassandra every X days?

2012-01-25 Thread R. Verlangen
Thanks for reminding me. I'm going to start with adding the cleanup & compact
to the chain of maintenance tasks. In my opinion Java should determine by
itself when to start a GC: it doesn't feel natural to do this manually.

2012/1/25 Karl Hiramoto k...@hiramoto.org


 On 01/25/12 19:18, R. Verlangen wrote:

 Ok thank you for your feedback. I'll add these tasks to our daily
 cassandra maintenance cronjob. Hopefully this will keep things under
 controll.


 I forgot to mention that we found that Forcing a GC also cleans up some
 space.


 in a cronjob you can do this with
 http://crawler.archive.org/cmdline-jmxclient/


 my cronjob looks more like

 nodetool repair
 nodetool cleanup
 nodetool compact
 java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080
 java.lang:type=Memory gc

 --
 Karl



Re: Restart cassandra every X days?

2012-01-25 Thread R. Verlangen
In his message he explains that it's for "Forcing a GC". GC stands for
garbage collection. For some more background see:
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

Cheers!

2012/1/25 mike...@thomsonreuters.com

 Karl,

 Can you give a little more details on these 2 lines, what do they do?

 java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080
 java.lang:type=Memory gc

 Thank you,
 Mike

 -Original Message-
 From: Karl Hiramoto [mailto:k...@hiramoto.org]
 Sent: Wednesday, January 25, 2012 12:26 PM
 To: user@cassandra.apache.org
 Subject: Re: Restart cassandra every X days?


 On 01/25/12 19:18, R. Verlangen wrote:
  Ok thank you for your feedback. I'll add these tasks to our daily
  cassandra maintenance cronjob. Hopefully this will keep things under
  controll.

 I forgot to mention that we found that Forcing a GC also cleans up some
 space.


 in a cronjob you can do this with
 http://crawler.archive.org/cmdline-jmxclient/


 my cronjob looks more like

 nodetool repair
 nodetool cleanup
 nodetool compact
 java -jar cmdline-jmxclient-0.10.3.jar - localhost:8080
 java.lang:type=Memory gc

 --
 Karl

 This email was sent to you by Thomson Reuters, the global news and
 information company. Any views expressed in this message are those of the
 individual sender, except where the sender specifically states them to be
 the views of Thomson Reuters.



Re: Tips for using OrderedPartitioner

2012-01-24 Thread R. Verlangen
If you would like to index your rows in an index-row, you could also
choose to index the index-rows themselves. This scales to any size and
creates a tree structure.

2012/1/24 aaron morton aa...@thelastpickle.com

 Nothing I can think of other than making the keys uniform.

 Having a single index row with the RP can be a pain. Is there a way to
 partition it ?

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 23/01/2012, at 11:42 PM, Tharindu Mathew wrote:

 Hi,

 We use Cassandra in a way where we always want to run range slice queries.
 Because of the tendency to create hotspots with the OrderedPartitioner, we
 decided to use the RandomPartitioner. Then we use a row as an index row,
 holding the row keys of the other rows in the CF.

 I feel this has become a burden and would like to move to the
 OrderedPartitioner to avoid this workaround, which has become cumbersome
 when we query the data store.

 Are there any tips we can follow to reduce the number of hotspots?

 --
 Regards,

 Tharindu

 blog: http://mackiemathew.com/
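The "index the index-rows" idea amounts to bucketing row keys across many index rows so that no single row becomes a hotspot under the RandomPartitioner, with a root row per level forming the tree. A toy in-memory sketch in Python; the dict stands in for a column family and all names are hypothetical:

```python
import hashlib

NUM_BUCKETS = 16  # fan-out per level; tune to row-size limits

def bucket_for(key):
    """Route a row key to one of NUM_BUCKETS index rows by hashing,
    so the index load spreads evenly under the RandomPartitioner."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return "idx-%d" % (h % NUM_BUCKETS)

def index_insert(index_cf, row_key):
    # Each index row holds the row keys routed to it (columns in the CF).
    index_cf.setdefault(bucket_for(row_key), set()).add(row_key)

def index_scan(index_cf):
    # A full scan walks every bucket; a root index row listing the
    # bucket names would form the next level of the tree.
    for bucket in sorted(index_cf):
        yield from sorted(index_cf[bucket])

index = {}
for k in ("user:1", "user:2", "user:3"):
    index_insert(index, k)
```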





Re: Enable thrift logging

2012-01-24 Thread R. Verlangen
Pick a custom loglevel and redirect them with the /etc/syslog.conf ?

2012/1/24 ruslan usifov ruslan.usi...@gmail.com

 Hello

  I am trying to log Thrift messages (we need this to debug a communication
  problem between the Cassandra daemon and a PHP client), so in
  log4j-server.properties I added the following lines:

 log4j.logger.org.apache.thrift.transport=DEBUG,THRIFT

 log4j.appender.THRIFT=org.apache.log4j.RollingFileAppender
 log4j.appender.THRIFT.maxFileSize=20MB
 log4j.appender.THRIFT.maxBackupIndex=50
 log4j.appender.THRIFT.layout=org.apache.log4j.PatternLayout
 log4j.appender.THRIFT.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F
 (line %L) %m%n
 log4j.appender.THRIFT.File=/var/log/cassandra/8.0/thrift.log


  But no messages show up in that log (though they should, e.g. exception
  traces). If we enable DEBUG on the rootLogger, i.e.:

 log4j.rootLogger=DEBUG,stdout,R

  Thrift log messages appear in system.log as expected, but how can we
  route them to a separate log?

 PS: cassandra 0.8.9
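Two things may be worth checking here, stated as assumptions rather than a confirmed fix: log4j only keeps a child logger's messages out of the root appenders when additivity is switched off, and Cassandra's own Thrift server classes log under their own package rather than org.apache.thrift. A sketch of the extra lines (untested against 0.8.9):

```properties
# Keep Thrift messages out of system.log once they have their own appender:
log4j.additivity.org.apache.thrift.transport=false

# Cassandra's Thrift server classes log under their own package, so this
# logger may be needed as well:
log4j.logger.org.apache.cassandra.thrift=DEBUG,THRIFT
log4j.additivity.org.apache.cassandra.thrift=false
```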



Re: Data Model Question

2012-01-21 Thread R. Verlangen
A couple of days ago I came across Countandra ( http://countandra.org/ ).
It seems that it might be a solution for you.

Gr. Robin

2012/1/20 Tamar Fraenkel ta...@tok-media.com


   Hi!

 I am a newbie to Cassandra and seeking some advice regarding the data
 model I should use to best address my needs.

 For simplicity, what I want to accomplish is:

 I have a system that has users (potentially ~10,000 per day) and they
 perform actions in the system (total of ~50,000 a day).

 Each user's action takes place at a certain point in time, and is also
 classified into categories (1 to 5) and tagged with 1-30 tags. Each action's
 categories and tags have a score associated with them; the score is between 0
 and 1 (let's assume a precision of 0.0001).

 I want to be able to identify similar actions in the system (performed
 usually by more than one user). Similarity of actions is calculated based
 on their common Categories and Tags taking scores into account.

 I need the system to store:

- The list of my users with attributes like name, age etc
- For each action – the categories and tags associated with it and
their score, the time of the action, and the user who performed it.
- Groups of similar actions (ActionGroups) – the id’s of actions in
the group, the categories and tags describing the group, with their scores.
Those are calculated using an algorithm that takes into account the
categories and tags of the actions in the group.

 When a user performs a new action in the system, I want to add it to a
 fitting ActionGroups (with similar categories and tags).

 For this I need to be able to perform the following:

 Find all the recent ActionGroups (those that were updated with actions
 performed during the last T minutes) that have at least one of the new
 action's categories AND at least one of the new action's tags.



 I thought of two ways to address the issue and I would appreciate your
 insights.



 First one using secondary indexes

 Column Family: *Users*

 Key: userId

 Compare with Bytes Type

 Columns: name: , age:  etc…



 Column Family: *Actions*

 Key: actionId

 Compare with Bytes Type

 Columns:  Category1 : Score ….

   CategoryN: Score,

   Tag1 : Score, ….

   TagK:Score

   Time: timestamp

   user: userId



 Column Family: *ActionGroups*

 Key: actionGroupId

 Compare with Bytes Type

 Columns: Category1 : Score ….

  CategoryN: Score,

  Tag1 : Score ….

  TagK:Score

  lastUpdateTime: timestamp

  actionId1: null, … ,

  actionIdM: null



 I will then define secondary index on each tag columns, category columns,
 and the update time column.

 Let’s assume the new action I want to add to ActionGroup has
 NewActionCategory1 - NewActionCategoryK, and has NewActionTag1 –
 NewActionTagN. I will perform the following query:

 Select * From ActionGroups where
    (NewActionCategory1 > 0 or … or NewActionCategoryK > 0) and
    (NewActionTag1 > 0 or … or NewActionTagN > 0) and
    lastUpdateTime > T;



 Second solution

 Have the same CF as in the first solution without the secondary index,
 and have two additional CFs:

 Column Family: *CategoriesToActionGroupId*

 Key: categoryId

 Compare with ByteType

 Columns: {Timestamp, ActionGroupsId1 } : null

  {Timestamp, ActionGroupsId2} : null

  ...

 *timestamp is the update time for the ActionGroup



 A similar CF will be defined for tags.



 I will then be able to run several queries on CategoriesToActionGroupId
 (one for each of the new story Categories), with column slice for the right
 update time of the ActionGroup.

 I will do the same for the TagsToActionGroupId.

 I will then use my client code to remove duplicates (ActionGroups who are
 associated with more than one Tag or Category).



 My questions are:

1. Are the two solutions viable? If yes, which is better?
2. Is there any better way of doing this?
3. Can I use JDBC and CQL with both methods, or do I have to use Hector
(I am using Java)?

 Thanks

 Tamar
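The similarity matching described above can be prototyped independently of the storage layout. A toy sketch in Python that scores an action against an ActionGroup by their overlapping category/tag scores; the weighting scheme is my assumption, not something from the thread:

```python
def similarity(action, group):
    """Score overlap of two {label: score} maps (categories and tags merged).
    Only shared labels contribute; each contributes the product of its two
    scores, normalised by the larger map so the result stays in [0, 1]."""
    shared = action.keys() & group.keys()
    if not shared:
        return 0.0
    overlap = sum(action[k] * group[k] for k in shared)
    return overlap / max(len(action), len(group))

a = {"cat:news": 0.9, "tag:sports": 0.5}
g = {"cat:news": 1.0, "tag:politics": 0.4}
print(round(similarity(a, g), 2))  # prints 0.45
```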







Re: nodetool ring question

2012-01-19 Thread R. Verlangen
I will have a look very soon and if I find something I'll let you know.

Thank you in advance!

2012/1/19 aaron morton aa...@thelastpickle.com

 Michael, Robin

 Let us know if the reported live load is increasing and diverging from the
 on disk size.

 If it is can you check nodetool cfstats and find an example of a
 particular CF where Space Used Live has diverged from the on disk size. The
 provide the schema for the CF and any other info that may be handy.

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 18/01/2012, at 10:58 PM, Michael Vaknine wrote:

 I did restart the cluster and now it is normal, 5 GB.
 *From:* R. Verlangen [mailto:ro...@us2.nl]
 *Sent:* Wednesday, January 18, 2012 11:32 AM
 *To:* user@cassandra.apache.org
 *Subject:* Re: nodetool ring question

 I also have this problem. My data on nodes grows to roughly 30GB. After a
 restart only 5GB remains. Is a factor 6 common for Cassandra?
 2012/1/18 aaron morton aa...@thelastpickle.com
 Good idea Jeremiah, are you using compression Michael ?

 Scanning through the CF stats this jumps out…

 Column Family: Attractions
 SSTable count: 3
 Space used (live): 27542876685
 Space used (total): 1213220387
 That's 25 GB of live data but only 1.3 GB total.

 Otherwise want to see if a restart fixes it :) Would be interesting to
 know if it's wrong from the start or drifts during streaming or compaction.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 On 18/01/2012, at 12:04 PM, Jeremiah Jordan wrote:


 
 There were some nodetool ring load reporting issues with early versions of
 1.0.X; I don't remember when they were fixed, but that could be your issue.
 Are you using compressed column families? A lot of the issues were with
 those.
 Might update to 1.0.7.

 -Jeremiah

 On 01/16/2012 04:04 AM, Michael Vaknine wrote:
 Hi,
  
 I have a 4 nodes cluster 1.0.3 version
  
 This is what I get when I run nodetool ring
  
 Address DC  RackStatus State   Load
 OwnsToken

 127605887595351923798765477786913079296
 10.8.193.87 datacenter1 rack1   Up Normal  46.47 GB
 25.00%  0
 10.5.7.76   datacenter1 rack1   Up Normal  48.01 GB
 25.00%  42535295865117307932921825928971026432
 10.8.189.197datacenter1 rack1   Up Normal  53.7 GB
 25.00%  85070591730234615865843651857942052864
 10.5.3.17   datacenter1 rack1   Up Normal  43.49 GB
 25.00%  127605887595351923798765477786913079296
  
 I have finished running repair on all 4 nodes.
  
 I have less than 10 GB in the /var/lib/cassandra/data/ folders
  
 My question is: why does nodetool report almost 50 GB on each node?
  
 Thanks
 Michael
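Aaron's request (find a CF whose Space used (live) has diverged from the on-disk figure) can be scripted against nodetool cfstats output. A sketch in Python; the line format is taken from the 1.0-era cfstats sample quoted above and should be treated as an assumption:

```python
import re

def diverging_cfs(cfstats_text, ratio=2.0):
    """Yield (cf_name, live, total) where live space exceeds the on-disk
    (total) figure by more than `ratio`, as in the Attractions example."""
    cf, live = None, None
    for line in cfstats_text.splitlines():
        m = re.search(r"Column Family: (\S+)", line)
        if m:
            cf, live = m.group(1), None
        m = re.search(r"Space used \(live\): (\d+)", line)
        if m:
            live = int(m.group(1))
        m = re.search(r"Space used \(total\): (\d+)", line)
        if m and cf and live is not None:
            total = int(m.group(1))
            if total and live / total > ratio:
                yield cf, live, total

sample = """Column Family: Attractions
SSTable count: 3
Space used (live): 27542876685
Space used (total): 1213220387"""
for cf, live, total in diverging_cfs(sample):
    print(cf, live, total)
```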





Re: nodetool ring question

2012-01-18 Thread R. Verlangen
I also have this problem. My data on nodes grows to roughly 30GB. After a
restart only 5GB remains. Is a factor 6 common for Cassandra?






Re: Re: Schema clone ...

2012-01-09 Thread R. Verlangen
A null response is most of the time caused by an exception; take a look at
the Cassandra logs to find out what the problem is.

2012/1/9 cbert...@libero.it cbert...@libero.it

 I was just trying it but ... in 0.7 CLI there is no show schema command.

 When I connect with 1.0 CLI to my 0.7 cluster ...


 [default@social] show schema;

 null


 I always get a null as answer! :-|

 Any tip for this?


 ty, Cheers


 Carlo

   Original Message
  From: aa...@thelastpickle.com
  Date: 09/01/2012 11.33
  To: user@cassandra.apache.org, cbert...@libero.itcbert...@libero.it
  Subject: Re: Schema clone ...

  Try "show schema" in the CLI.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 9/01/2012, at 11:12 PM, cbert...@libero.it wrote:

 Hi,
  I have created a new dev cluster with Cassandra 1.0 -- I would like to have
  the
  same CFs that I have in the 0.7 one, but I don't need the data to be there,
  just the
  schema. What is the fastest way to do it without issuing 30 create column
  family commands?

 Best regards,

 Carlo









Re: What is the future of supercolumns ?

2012-01-07 Thread R. Verlangen
My suggestion is simple: don't use any deprecated stuff out there. In
practically any case there is a good reason why it's deprecated.

I've seen a couple of composite-column vs supercolumn discussions in the
past weeks here: I think a little bit of searching will get you around.

Cheers

2012/1/7 Aklin_81 asdk...@gmail.com

 I read the entire set of columns inside a supercolumn at any time, but as
 for writing them, I write the columns at different times. I don't have any
 need to update them, except that they die after their TTL period of 60 days.
 But since they are going to be deprecated, I don't know if it would be
 really advisable to use them right now.

 I believe that if it were possible to do wildcard querying on a list of
 column names, then the supercolumn use cases could easily be replaced by
 normal columns. Could that become practically possible in the future?

 On Sat, Jan 7, 2012 at 8:05 AM, Terje Marthinussen
 tmarthinus...@gmail.com wrote:
  Please realize that I do not make any decisions here and I am not part
 of the core Cassandra developer team.
 
  What has been said before is that they will most likely go away and at
 least under the hood be replaced by composite columns.
 
  Jonathan have however stated that he would like the supercolumn
 API/abstraction to remain at least for backwards compatibility.
 
  Please understand that under the hood, supercolumns are merely groups of
 columns serialized as a single block of data.
 
 
  The fact that there is a specialized and hardcoded way to serialize
 these column groups into supercolumns is a problem however and they should
 probably go away to make space for a more generic implementation allowing
 more flexible data structures and less code specific for one special data
 structure.
 
  Today there are tons of extra code to deal with the slight difference in
 serialization and features of supercolumns vs columns and hopefully most of
 that could go away if things got structured a bit different.
 
  I also hope that we keep APIs to allow simple access to groups of
 key/value pairs to simplify application logic as working with just columns
 can add a lot of application code which should not be needed.
 
  If you almost always need all or mostly all of the columns in a
 supercolumn, and you normally update all of them at the same time, they
 will most likely be faster than normal columns.
 
  Processing wise, you will actually do a bit more work on
 serialization/deserialization of SC's but the I/O part will usually be
 better grouped/require less operations.
 
  I think we did some benchmarks on some heavy use cases with ~30 small
 columns per SC some time back and I think we ended up with  SCs being
 10-20% faster.
 
 
  Terje
 
  On Jan 5, 2012, at 2:37 PM, Aklin_81 wrote:
 
  I have seen supercolumns usage been discouraged most of the times.
  However sometimes the supercolumns seem to fit the scenario most
  appropriately not only in terms of how the data is stored but also in
  terms of how is it retrieved. Some of the queries supported by SCs are
  uniquely capable of doing the task which no other alternative schema
  could do.(Like recently I asked about getting the equivalent of
  retrieving a list of (full)supercolumns by name, through use of
  composite columns, unfortunately there was no way to do this without
  reading lots of extra columns).
 
  So I am really confused whether:
 
  1. Should I really not use the supercolumns for any case at all,
  however appropriate, or I just need to be just careful while realizing
  that supercolumns fit my use case appropriately or what!?
 
  2. Are there any performance concerns with supercolumns even in the
  cases where they are used most appropriately. Like when you need to
  retrieve the entire supercolumn every time, and the max. number of
  subcolumns varies between 0 and 10.
  (I don't write all the subcolumns inside supercolumn, at once though!
  Does this also matter?)
 
  3. What is their future? Are they going to be deprecated or may be
  enhanced later?
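Terje's point that a supercolumn is merely a serialized group of columns is easy to model with composite column names: the supercolumn name becomes the first component, and reading a whole supercolumn becomes a contiguous slice on that prefix. A toy in-memory sketch in Python, not a client API:

```python
from bisect import bisect_left, bisect_right

class CompositeRow:
    """Columns keyed by (group, name) tuples, kept sorted the way a
    composite comparator would sort them on disk."""
    def __init__(self):
        self.cols = {}  # (group, name) -> value

    def put(self, group, name, value):
        self.cols[(group, name)] = value

    def slice_group(self, group):
        """Equivalent of reading one whole supercolumn: a contiguous
        slice over the composite range for this group prefix."""
        keys = sorted(self.cols)
        lo = bisect_left(keys, (group, ""))
        hi = bisect_right(keys, (group, "\uffff"))
        return {k[1]: self.cols[k] for k in keys[lo:hi]}

row = CompositeRow()
row.put("address", "city", "Vilnius")
row.put("address", "zip", "01112")
row.put("phone", "mobile", "+370...")
print(row.slice_group("address"))  # only the 'address' group
```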
 



Re: How to find out when a nodetool operation has ended?

2012-01-07 Thread R. Verlangen
 The repair will continue even if you ctrl+c nodetool; it runs on the
server, not the client.

Hmm, didn't know that. Maybe a tweak for nodetool so that it just displays a
message after starting (Started with ...) and gives some kind of notification
(with wall) when it's done?

2012/1/7 aaron morton aa...@thelastpickle.com

 The repair will continue even if you ctrl+c  nodetool, it runs on the
 server not the client.

 Aside from using ops centre you can also look at TP Stats to see when
 there is nothing left in the AntiEntropyStage or look for a log messages
 from the StorageService that says…

 Repair command #{} completed successfully

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7/01/2012, at 12:32 PM, Maxim Potekhin wrote:

 Thanks, so I take it there is no solution outside of Opcenter.

 I mean of course I can redirect the output, with additional timestamps if
 needed,
 to a log file -- which I can access remotely. I just thought there would
 be some status
 command by chance, to tell me what maintenance the node is doing. Too bad
 there is not!

 Maxim


 On 1/6/2012 5:40 PM, R. Verlangen wrote:

 You might consider:

 - installing DataStax OpsCenter (
 http://www.datastax.com/products/opscenter )

 - starting the repair in a linux screen (so you can attach to the screen
 from another location)
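The log message Aaron quotes gives a simple way to poll for completion from a remote machine. A sketch in Python that scans system.log lines for it; the exact wording is version-dependent, so treat the pattern as an assumption:

```python
import re

REPAIR_DONE = re.compile(r"Repair command #(\d+) completed successfully")

def completed_repairs(log_lines):
    """Return the repair command ids reported finished in the log."""
    return [int(m.group(1)) for line in log_lines
            for m in [REPAIR_DONE.search(line)] if m]

lines = [
    "INFO ... StorageService ... Starting repair command #3 ...",
    "INFO ... StorageService ... Repair command #3 completed successfully",
]
print(completed_repairs(lines))  # prints [3]
```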







Re: java.lang.IllegalArgumentException occurred when creating a keyspcace with replication factor

2012-01-06 Thread R. Verlangen
Try this:

create keyspace testkeyspace;
update keyspace testkeyspace with placement_strategy =
'org.apache.cassandra.locator.SimpleStrategy' and strategy_options =
{replication_factor:3};

Good luck!

2012/1/6 Sajith Kariyawasam saj...@gmail.com

 Hi all,

 I tried creating a keyspace with the replication factor 3, using cli
 interface ... in Cassandra 1.0.6  (earlier tried in 0.8.2 and failed too)

 But I'm getting an exception

 java.lang.IllegalArgumentException: No enum const class
 org.apache.cassandra.cli.CliClient$AddKeyspaceArgument.REPLICATION_FACTOR

 The command I used was

 [default@unknown] create keyspace testkeyspace with replication_factor=3;

 What has gone wrong?

 Many thanks in advance
 --
 Best Regards
 Sajith




Re: How to find out when a nodetool operation has ended?

2012-01-06 Thread R. Verlangen
You might consider:
- installing DataStax OpsCenter ( http://www.datastax.com/products/opscenter
 )
- starting the repair in a linux screen (so you can attach to the screen
from another location)

I prefer OpsCenter.

2012/1/6 Maxim Potekhin potek...@bnl.gov

 Suppose I start a repair on one or a few nodes in my cluster,
 from an interactive machine in the office, and leave for the day
 (which is a very realistic scenario imho).

 Is there a way to know, from a remote machine, when a particular
 action, such as compaction or repair, has been finished?

 I figured that compaction stats can be mum at times, thus
 it's not a reliable indicator.

 Many thanks,

 Maxim



