Re: Repair/Compaction Completion Confirmation

2014-10-28 Thread Alain RODRIGUEZ
I tried this yesterday too.

https://github.com/BrianGallew/cassandra_range_repair

Not 100% bulletproof. Indeed, I found that some operations are done
multiple times, so it is not very optimised. It is open source, though, so I
guess you can improve things as much as you want and contribute. Here is
the issue I raised yesterday:
https://github.com/BrianGallew/cassandra_range_repair/issues/14

I am also trying to improve our repair automation, since we now have
multiple DCs and up to 800 GB per node. Repairs are quite heavy right now.
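
For what it is worth, the subrange idea can be sketched in a few lines of
Python. This is only an illustration, not how cassandra_range_repair is
implemented: split_range, repair_command and repair_subranges are hypothetical
helper names. It assumes that nodetool's -st/-et (start/end token) options are
available in your version and that a non-zero exit status means the subrange
failed. It does not handle a range that wraps around the ring.

```python
import subprocess


def split_range(start, end, steps):
    """Split the token range (start, end] into `steps` contiguous subranges."""
    width = end - start
    bounds = [start + (width * i) // steps for i in range(steps + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_command(keyspace, start, end, host="localhost"):
    """Build the argument list for one subrange repair (nothing is executed here)."""
    return ["nodetool", "-h", host, "repair",
            "-st", str(start), "-et", str(end), keyspace]


def repair_subranges(keyspace, start, end, steps, run=subprocess.call):
    """Run one blocking repair per subrange and return the subranges that failed.

    `run` takes an argument list and returns an exit status; it defaults to
    subprocess.call and can be replaced for a dry run.
    """
    failed = []
    for lo, hi in split_range(start, end, steps):
        if run(repair_command(keyspace, lo, hi)) != 0:
            failed.append((lo, hi))
    return failed
```

Because each call blocks until its subrange is done, the loop returning is
itself the signal that the whole pass has finished, and the returned list
says which subranges to retry.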

Good luck,

Alain

2014-10-28 4:59 GMT+01:00 Ben Bromhead b...@instaclustr.com:

 https://github.com/BrianGallew/cassandra_range_repair

 This breaks down the repair operation into very small portions of the ring
 as a way to try and work around the current fragile nature of repair.

 Leveraging range repair should go some way towards automating repair (this
 is how the automatic repair service in DataStax opscenter works, this is
 how we perform repairs).

 We have had a lot of success running repairs in a similar manner against
 vnode enabled clusters. Not 100% bullet proof, but way better than nodetool
 repair



 On 28 October 2014 08:32, Tim Heckman t...@pagerduty.com wrote:

 On Mon, Oct 27, 2014 at 1:44 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Mon, Oct 27, 2014 at 1:33 PM, Tim Heckman t...@pagerduty.com wrote:

 I know that when issuing some operations via nodetool, the command
 blocks until the operation is finished. However, is there a way to reliably
 determine whether or not the operation has finished without monitoring that
 invocation of nodetool?

 In other words, when I run 'nodetool repair' what is the best way to
 reliably determine that the repair is finished without running something
 equivalent to a 'pgrep' against the command I invoked? I am curious about
 trying to do the same for major compactions too.


 This is beyond a FAQ at this point, unfortunately; non-incremental
 repair is awkward to deal with and probably impossible to automate.

 In The Future [1] the correct solution will be to use incremental
 repair, which mitigates but does not solve this challenge entirely.

 As brief meta commentary, it would have been nice if the project had
 spent more time optimizing the operability of the critically important
 thing you must do once a week [2].

 https://issues.apache.org/jira/browse/CASSANDRA-5483

 =Rob
 [1] http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1
 [2] Or, more sensibly, once a month with gc_grace_seconds set to 34 days.


 Thank you for getting back to me so quickly. Not the answer that I was
 secretly hoping for, but it is nice to have confirmation. :)

 Cheers!
 -Tim




 --

 Ben Bromhead

 Instaclustr | www.instaclustr.com | @instaclustr
 http://twitter.com/instaclustr | +61 415 936 359



Re: Repair/Compaction Completion Confirmation

2014-10-28 Thread Colin
When I use virtual nodes, I typically use a much smaller number, usually 
around 10. This lets me add nodes more easily without the 
performance hit.



--
Colin Clark 
+1-320-221-9531
 



EC2 Snitch load imbalance

2014-10-28 Thread Oleg Dulin
I have a setup with 6 Cassandra nodes (1.2.18), using RandomPartitioner, 
not using vnodes -- this is a legacy cluster.


We went from 3 nodes to 6 in the last few days to add capacity. 
However, there appears to be an imbalance:


Datacenter: us-east
==
Replicas: 2

Address    Rack  Status  State   Load       Owns    Token
                                                    113427455640312821154458202477256070484
x.x.x.73   1d    Up      Normal  154.64 GB  33.33%  85070591730234615865843651857942052863
x.x.x.251  1a    Up      Normal  62.26 GB   16.67%  28356863910078205288614550619314017621
x.x.x.238  1b    Up      Normal  243.7 GB   50.00%  56713727820156410577229101238628035242
x.x.x.25   1a    Up      Normal  169.3 GB   33.33%  210
x.x.x.162  1b    Up      Normal  118.24 GB  50.00%  141784319550391026443072753096570088105
x.x.x.208  1d    Up      Normal  226.85 GB  16.67%  113427455640312821154458202477256070484



What is the cause of this imbalance ? How can I rectify it ?
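
As a quick sanity check on the tokens themselves (not an answer to the
ownership question), here is a small Python sketch. It assumes the usual
convention for RandomPartitioner without vnodes, where the token space runs
from 0 to 2**127 and node i of n is given i * (2**127 // n). balanced_tokens
is a hypothetical helper name.

```python
# RandomPartitioner tokens are integers in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced initial tokens for `node_count` nodes."""
    step = RING_SIZE // node_count
    return [i * step for i in range(node_count)]


if __name__ == "__main__":
    for token in balanced_tokens(6):
        print(token)
```

For six nodes this yields multiples of 28356863910078205288614550619314017621,
which is what the long tokens in the listing above are, so the token spacing
itself looks even.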

Regards,
Oleg




Load balancing in C* Cluster

2014-10-28 Thread Syed, Basit B. (NSN - FI/Espoo)
Hi,
I am learning C* and its usage these days. I have a very simple, possibly naive 
question about load balancing.

I know that C* can automatically balance the load itself by using tokens. But 
what about connecting my cluster to a system? For example, say we have a client or a 
set of clients (e.g. 12 client machines) accessing a 3-node C* cluster. All 
three nodes are independent and talk with each other through gossip. This means 
that we have three IP addresses to connect to a cluster.

What would be the best strategy for clients to access these IP addresses? 
Should we connect four clients each to only one node? Or should all 12 clients 
see and connect to all three nodes? Which strategy is better? Are there any 
resources available on the web for this kind of issue?

Regards,
Basit



Re: Load balancing in C* Cluster

2014-10-28 Thread Jonathan Lacefield
Hello,

  Most drivers will handle the load balancing for you and provide policies
for configuring your desired approach for load balancing, i.e. load balance
around the entire ring or localize around a specific DC.  Your clients will
leverage the driver for connections so that the client machines do not
simply select one node for data and coordination.
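
To make the idea concrete, here is a toy Python sketch of what a round-robin
policy does on the client side. It is not the DataStax driver's API:
RoundRobinHosts and next_host are made-up names and the three addresses are
placeholders. A real driver additionally discovers the rest of the ring and
routes around nodes that are down.

```python
import itertools


class RoundRobinHosts:
    """Hand out coordinator hosts in turn so no single node takes every request."""

    def __init__(self, contact_points):
        self._hosts = list(contact_points)
        self._cycle = itertools.cycle(self._hosts)

    def next_host(self):
        """Return the host that should coordinate the next request."""
        return next(self._cycle)


if __name__ == "__main__":
    # Placeholder addresses for a 3-node cluster.
    policy = RoundRobinHosts(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    for _ in range(6):
        print(policy.next_host())
```

In this picture each of the 12 clients knows all three nodes and spreads its
requests across them, rather than four clients being pinned to each node.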

  Check out DataStax's driver's documentation on load balancing for more
information.  [1]

  Other drivers, like Astyanax [2], provide similar capabilities as well.

[1]
http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/introduction/introArchOverview_c.html
[2] https://github.com/Netflix/astyanax

Thanks,

Jonathan

Jonathan Lacefield

Solution Architect | (404) 822 3487 | jlacefi...@datastax.com






Re: EC2 Snitch load imbalance

2014-10-28 Thread Mark Reddy
Oleg,

If you are running nodetool status, be sure to specify the keyspace also.
If you don't specify the keyspace the results will be nonsense.

https://issues.apache.org/jira/browse/CASSANDRA-7173


Regards,
Mark






opscenter with community cassandra

2014-10-28 Thread Tim Dunphy
Hey all,

 I'd like to set up DataStax OpsCenter to monitor my Cassandra ring. However,
I'm using the open source version of 2.1.1. Before I expend any time
and effort on setting this up, I'm wondering whether it will work with the open
source version, or whether I would need to be running DataStax Cassandra in order
to get this going.

Thanks
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: EC2 Snitch load imbalance

2014-10-28 Thread Oleg Dulin

Thanks Mark.

The output in my original post is with keyspace specified.



Re: opscenter with community cassandra

2014-10-28 Thread Duncan Sands

Hi Tim,

On 28/10/14 15:42, Tim Dunphy wrote:

Hey all,

  I'd like to setup datastax opscenter to monitor my cassandra ring. However I'm
using the open source version of 2.1.1. And before I expend any time and effort
in setting this up, I'm wondering if it will work with the open source version?
Or would I need to be running datastax cassandra in order to get this going?


Yes, it works fine with open source Cassandra, though some advanced 
functionality is disabled.


Ciao, Duncan.

PS: I didn't try it with a 2.1 version of Cassandra though.








Re: Explode a keyspace into multiple ones online

2014-10-28 Thread Alain RODRIGUEZ
By the way, I found this reference:
http://grokbase.com/t/cassandra/user/13824z7ykm/best-way-to-split-cluster-online
Is this still the easiest solution? Does it work when keeping the same table name
but transferring the data to a new keyspace? Can we migrate sstables this way?

Alain

2014-10-28 15:46 GMT+01:00 Alain RODRIGUEZ arodr...@gmail.com:

 Hi guys,

 We have a keyspace with more or less 40 tables. We have been adding things
 to it since Cassandra 0.8, up to our current C* 1.2.18, for
 simplicity and cost reasons. Now we need to duplicate part of the data to
 another DC, to improve end user latency.

 What would be the best way of splitting this ks while staying online (or
 with the smallest downtime)?
 What would be the advantages / disadvantages of keeping the newly
 created ks in the same cluster rather than in a distinct cluster?

 The more general question behind this is: in your
 experience, what are the main reasons for grouping tables into one ks,
 multiple ks or directly multiple clusters, and how does one migrate between
 these different architectures?

 Any thoughts, blog posts addressing this question or even documentation on this?



RE: opscenter with community cassandra

2014-10-28 Thread Josh Smith
Yes, OpsCenter does work with the open source version of Cassandra. I am 
currently running both in the cloud and in our private datacenter with no 
problems. I have not tried 2.1.1 yet, but I do not see why it wouldn't work as well.

Josh



Re: opscenter with community cassandra

2014-10-28 Thread Colin
I can't run OpsCenter in a secure environment for a couple of reasons: one, it 
phones home; two, lack of role-based security.

It is a mistake to call a proprietary piece of software community when you can't 
use it in production.

It is easy enough to automate what OpsCenter does rather than relying on a 
third party in my enterprise.





Re: opscenter with community cassandra

2014-10-28 Thread Nick Bailey
OpsCenter does work with Apache Cassandra; however, it isn't always
immediately compatible with new releases of Cassandra. It will work
somewhat with Cassandra 2.1, but there is definitely some functionality that
is broken. You can check the release notes for new versions of OpsCenter to
see when full 2.1 support is included.





Re: opscenter with community cassandra

2014-10-28 Thread Tyler Hobbs
On Tue, Oct 28, 2014 at 10:08 AM, Colin colpcl...@gmail.com wrote:

 It is a mistake to call a proprietary piece of software community when you
 cant use it in production.


You can use OpsCenter community in production (however you'd like).


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: opscenter with community cassandra

2014-10-28 Thread Rahul Neelakantan
You can turn off the phone-home function in the opscenterd.conf file:

[stat_reporter]
interval = 0


Rahul Neelakantan



Re: opscenter with community cassandra

2014-10-28 Thread Colin
No, actually, you can't, Tyler.

If you mean the useless information it provides outside of licence, fine; if 
you mean the components outside, then same argument.

Last time I checked, this forum was about Apache and not about DataStax. 
Maybe a separate group should be dedicated to provider-specific offerings.

--
Colin Clark 
+1-320-221-9531
 



Re: opscenter with community cassandra

2014-10-28 Thread Ken Hancock
Your criteria for what is appropriate for production may differ from
others, but it's equally incorrect of you to make a blanket statement that
OpsCenter isn't suitable for production.  A number of people use it in
production.







-- 
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
http://www.schange.com/en-US/Company/InvestorRelations.aspx
Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com |
Skype: hancockks | Yahoo IM: hancockks | LinkedIn:
http://www.linkedin.com/in/kenhancock

SeaChange International http://www.schange.com/

This e-mail and any attachments may contain
information which is SeaChange International confidential. The information
enclosed is intended only for the addressees herein and may not be copied
or forwarded without permission from SeaChange International.


Re: opscenter with community cassandra

2014-10-28 Thread Redmumba
Furthermore, people ask questions about monitoring and management utilities
for Cassandra all the time--this is in the same vein.




OldGen saturation

2014-10-28 Thread Adria Arcarons
Hi,

I work for a company that gathers time series data from different sensors. I've 
been trying to set up C* in a single-node test environment in order to get an 
idea of what performance Cassandra will give in our use case. To do so, I have 
implemented a test that simulates our real insertion pattern.

We have about 50,000 CFs of varying size, each grouping sensors that are in the 
same physical location. Our partition key is made up of the id of the sensor and 
the type of the value that is being measured. Hence, there is a single row for 
each combination of (sensorId, parameterId). Our primary key is made up of the 
partition key plus the timestamp and the measured value. Moreover, we 
cluster by timestamp in order to make slice reads fast.

The writing test consists of a continuous flow of inserts. The inserts are done 
inside BATCH statements in groups of 1,000, to a single CF at a time, to make 
them faster. The client is executed on a separate machine.
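
For readers who want to reproduce the pattern, a rough Python sketch of this
kind of batching follows. It only builds CQL text. The table and column names
(sensor_id, parameter_id, ts, value) and the helpers chunked and
batch_statement are assumptions for illustration, not the actual schema or
client code from this test, and real code would normally use bound parameters
rather than string formatting.

```python
def chunked(rows, size=1000):
    """Yield consecutive slices of `rows`, each holding at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def batch_statement(table, rows):
    """Build one CQL BATCH that inserts every row into a single table.

    Each row is a (sensor_id, parameter_id, ts, value) tuple; the column
    names are hypothetical.
    """
    lines = ["BEGIN BATCH"]
    for sensor_id, parameter_id, ts, value in rows:
        lines.append(
            "  INSERT INTO %s (sensor_id, parameter_id, ts, value) "
            "VALUES (%d, %d, %d, %r);" % (table, sensor_id, parameter_id, ts, value)
        )
    lines.append("APPLY BATCH;")
    return "\n".join(lines)
```

With 2,500 pending rows for one CF, chunked yields three slices (1,000, 1,000
and 500 rows), and each slice becomes one BATCH statement.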

The problem I'm experiencing is that, eventually, when the script has been 
running for almost 40mins, the heap gets saturated. OldGen gets full and then 
there is an intensive GC activity trying to free OldGen objects, but it can 
only free very little space in each pass. Then GC saturates the CPU. Here are 
the graphs obtained with VisualVM that show this behavior:

CPU: 
https://www.dropbox.com/s/oqqqg0ygbd72n0n/CPU%202014-10-28%2014_24_06-VisualVM%201.3.8.jpg?dl=0
HEAP usage: 
https://www.dropbox.com/s/qp7iyc5o0fpr1xa/Estancament%20MEM%202014-10-28%2014_21_53-VisualVM%201.3.8.jpg?dl=0
OLDGEN full (via VisualGC): 
https://www.dropbox.com/s/5udvqq95qkjuppq/HEAP%202014-10-28%2014_22_27-VisualVM%201.3.8.jpg?dl=0

Moreover, when the heap is saturated, IO activity drops from an average of 90% 
HD utilization to roughly 15%. So I end up in a situation where very little 
data is flushed, very little data is freed from memory, and the insert rate gets 
very slow. If the insert process is stopped, C* completes all its pending flushes 
and after a certain time GC activity stops, but OldGen occupancy remains almost 
full.

Why is the GC not capable of freeing more memory?
Isn't Cassandra supposed to stop accepting writes until a certain amount of 
memory is freed?
I'm sceptical about increasing the size of the memtables. If the IO subsystem 
isn't able to cope with the flush activity, the problem would only be delayed.
Can this problem be related in any way to our CF indexing settings?
Why, after completing all pending flushes and compactions, is OldGen still 
almost full, even when mct is set to 0.15?
Is the BATCH statement the appropriate way to insert multiple values into the 
same CF?

Any thoughts on this would be appreciated. I can provide full logs or config 
files to anyone interested.

Regards,
Adrià.

P.S. Details on the setup:
I'm working with the default values except for:
- offheap_objects enabled
- on-heap memtable size set to 128MB. I have found that this problem is also 
reproduced with greater on-heap memtable sizes.
- off-heap memtable size set to 2.5GB.
- The number of memtable flusher threads is 3.
- memtable_cleanup_threshold (mct) is set to 0.15 to perform regular flushes to disk.

My total heap size is 1GB, with a NewGen region of 256MB. The C* node has 
4GB RAM, an Intel Xeon CPU E5520 @ 2.27GHz (3 cores) and a 500GB SATA HD. It runs 
Debian 7 + Cassandra 2.1.0 + Oracle Java JRE (build 1.7.0_71-b14). The 
writing client is implemented in PHP with the YACassandraPDO CQL library, 
which is based on Thrift. The client is executed on a separate machine.


Re: opscenter with community cassandra

2014-10-28 Thread Tim Dunphy

 Furthermore, people ask questions about monitoring and management
 utilities for Cassandra all the time--this is in the same vein.


Speaking of which, are there any viable alternatives to OpsCenter that
people also like?








-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


CCM 2.0 Announcement

2014-10-28 Thread Philip Thompson
Hello,

I am pleased to announce CCM 2.0, a new major release that includes
breaking changes to the CCMLib API. If you are using CCM via its command
line interface, moving to CCM 2.0 should require no changes to your
workflow. Direct consumers of CCM's python modules, such as
cassandra-dtest, will need to make changes in order to use CCM 2.0.

Notably, CCM 2.0 will add support for running DataStax Enterprise clusters,
and running Cassandra 2.1+ clusters on Windows. I would like to thank Mike
Adamson for contributing the pull requests that enabled DSE support.

CCM 2.0 is available immediately on PyPI.  Future updates will also be
published to PyPI, allowing easy installation via pip.

In a future release of CCM, ownership of the project will be transferred to
DataStax. CCM will remain open source and Apache licensed.

Thank you,
Philip Thompson


Re: OldGen saturation

2014-10-28 Thread Bryan Talbot
On Tue, Oct 28, 2014 at 9:02 AM, Adria Arcarons 
adria.arcar...@greenpowermonitor.com wrote:

 Hi,

 We have about 50.000 CFs of varying size

 The writing test consists of a continuous flow of inserts. The inserts are
 done inside BATCH statements in groups of 1.000 to a single CF at a time to
 make them faster.






 The problem I’m experiencing is that, eventually, when the script has been
 running for almost 40mins, the heap gets saturated. OldGen gets full and
 then there is an intensive GC activity trying to free OldGen objects, but
 it can only free very little space in each pass. Then GC saturates the CPU.
 Here are the graphs obtained with VisualVM that show this behavior:





 My total heap size is 1GB and the NewGen region is 256MB. The C* node
 has 4GB RAM. Intel Xeon CPU E5520 @



Without looking at your VM graphs, I'm going to go out on a limb here and
say that your host is woefully underpowered to host fifty-thousand column
families and batch writes of one-thousand statements.

A 1 GB Java heap is sometimes acceptable for a unit test or for playing
around, but you can't actually expect it to be adequate for a load test,
can you?

Every CF consumes some permanent heap space for its metadata. Too many CFs
are a bad thing. You probably have ten times more CFs than would be
recommended as an upper limit.
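
A rough back-of-the-envelope check of the figures quoted above (1 GB heap,
256 MB NewGen, 50,000 CFs). It assumes, purely for illustration, that the
old generation is shared evenly between column families; that is not how
Cassandra actually allocates memory, and the function name is made up for
this sketch.

```python
# Back-of-the-envelope only: pretends OldGen is split evenly across CFs.
MIB = 1024 * 1024


def old_gen_bytes_per_cf(heap_mib, new_gen_mib, cf_count):
    """Average number of OldGen bytes available per column family."""
    old_gen = (heap_mib - new_gen_mib) * MIB
    return old_gen / cf_count


# Figures taken from the original post.
per_cf = old_gen_bytes_per_cf(heap_mib=1024, new_gen_mib=256, cf_count=50000)
print("%.1f KiB of OldGen per CF" % (per_cf / 1024))
```

Whatever the real per-CF overhead is, it has to fit in that budget together
with memtables and everything else that lives in the heap.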

-Bryan


Re: Explode a keyspace into multiple ones online

2014-10-28 Thread Robert Coli
On Tue, Oct 28, 2014 at 8:01 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 By the way I found this reference:
 http://grokbase.com/t/cassandra/user/13824z7ykm/best-way-to-split-cluster-online.
 Is this still the easiest solution ? Does this work for same table name
 but transferring data to a new keyspace ? Can we migrate sstables this way ?


The author of that solution is as wise as he is modest... that said,
there's a really easy way to split some column families out of a
Keyspace, even easier than the above.

https://issues.apache.org/jira/browse/CASSANDRA-1585?focusedCommentId=13488959&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13488959

To anyone who is wondering about the manual way to do this :

1) create schema for NEW_Keyspace
2) stop writes to OLD_Keyspace from app (reads can continue)
3) flush OLD_Keyspace on every node, via nodetool
4) hard link all sstables from OLD_Keyspace directory to NEW_Keyspace
directory
5) call nodetool -h localhost refresh NEW_Keyspace
6) enable reads/writes from/to NEW_Keyspace from app (disable reads on
OLD_Keyspace)
7) clean up OLD_Keyspace (drop schema, delete files, etc.)

Alternately, if you don't want to do 2/6 because you can't tolerate
OLD_Keyspace not being writable, you can enable writes to NEW_Keyspace,
flush OLD_Keyspace, hard link the just-flushed tables and then enable reads
from NEW_Keyspace. This resolves the delta with a shorter window where you
can't write.

The same technique could also be applied to renaming Columnfamilies,
although in the Columnfamily case the files also need to be renamed. In
Cassandra 1.1+, the files get renamed to include the Keyspace name, so that
would have to change as appropriate.

(Additional Notes :
In 1) You have to create the schema with all the column families and indexes.

In 4) remember that the names of the files that store the sstables start
with the name of the keyspace. You have to rename the files in order for
them to be recognized by nodetool refresh.)

In 5), there is a race whereby if you are writing to NEW_Keyspace, you have
a nonzero chance of clobbering files with newly flushed files [1]
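
Step 4 and its note about file names could be scripted roughly as follows.
This is a minimal sketch, not a tested migration tool: it assumes sstable
file names of the form <Keyspace>-<rest>, as described in the notes above,
and that it is run once per column family directory after the flush in
step 3. The function name and the example paths are hypothetical.

```python
import os


def link_sstables(old_dir, new_dir, old_ks, new_ks):
    """Hard link every '<old_ks>-*' file in old_dir into new_dir,
    replacing the keyspace prefix with new_ks.

    Returns the list of file names created in new_dir.
    """
    os.makedirs(new_dir, exist_ok=True)
    prefix = old_ks + "-"
    linked = []
    for name in sorted(os.listdir(old_dir)):
        src = os.path.join(old_dir, name)
        # Skip sub-directories (e.g. snapshots) and unrelated files.
        if not (name.startswith(prefix) and os.path.isfile(src)):
            continue
        new_name = new_ks + "-" + name[len(prefix):]
        # os.link raises if the target already exists, so nothing that is
        # already in new_dir gets silently overwritten.
        os.link(src, os.path.join(new_dir, new_name))
        linked.append(new_name)
    return linked


# Hypothetical usage, one call per column family directory:
# link_sstables("/var/lib/cassandra/data/OLD_Keyspace/events",
#               "/var/lib/cassandra/data/NEW_Keyspace/events",
#               "OLD_Keyspace", "NEW_Keyspace")
```

Hard links only work within one filesystem, and the race described for
step 5 still applies if anything is writing to NEW_Keyspace while the links
are being created.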


I see you on that ticket (in July 2013!) but perhaps you do not remember
its relevance.. :D

The primary reason to move Columnfamilies between Keyspaces is that
replication configuration is on the Keyspace level. So you can for example
have some Columnfamilies only available in some DCs, or with a lower RF.

=Rob
http://twitter.com/rcolidba
[1] https://issues.apache.org/jira/browse/CASSANDRA-6245


Re: EC2 Snitch load imbalance

2014-10-28 Thread Robert Coli
On Tue, Oct 28, 2014 at 3:35 AM, Oleg Dulin oleg.du...@gmail.com wrote:

 I have a setup with 6 cassandra nodes (1.2.18), using RandomPartitioner, not
 using vnodes -- this is a legacy cluster.

 We went from 3 nodes to 6 in the last few days to add capacity. However,
 there appears to be an imbalance:


The first step to check is if the ranges are even.

Did you pick initial_token for these new nodes?
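
For reference, a small sketch of how evenly spaced initial_token values can
be computed for RandomPartitioner (token range 0 to 2**127) on a non-vnode
ring, and how to check what share of the ring a given set of tokens owns.
The function names are made up for this example, and the ownership figure is
token-range ownership only; it ignores replication and rack/AZ placement.

```python
# RandomPartitioner tokens live in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced initial_token values for a ring without vnodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A node owns the range from the previous token (exclusive) up to its
    own token (inclusive), wrapping around the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # i == 0 wraps around to the last token
        span = (tok - prev) % RING_SIZE or RING_SIZE
        shares[tok] = span / RING_SIZE
    return shares


if __name__ == "__main__":
    for tok, share in sorted(ownership(balanced_tokens(6)).items()):
        print(tok, "%.2f%%" % (share * 100))
```

When doubling from 3 to 6 nodes, the three original balanced tokens are the
even-numbered entries of balanced_tokens(6), so the new nodes would need the
odd-numbered entries for the ranges to come out even.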

=Rob


Re: opscenter with community cassandra

2014-10-28 Thread Colin
Ken, go ahead and check the difference in functionality between licensed and 
not, and what it takes to run a cluster and then get back to me.

Not only did I use to work for DataStax, but before then I was the lead
architect for a very large project using Cassandra.  Please don't confuse
Apache Cassandra with DataStax proprietary offerings.

And now I will tell you that, in a blanket statement, opscenter *IS NOT READY 
FOR A PRODUCTION ENVIRONMENT*

Your mileage may vary, and you might not take security very seriously, so go
ahead and expose your cluster.

Enjoy!
--
Colin Clark 
+1-320-221-9531
 

 On Oct 28, 2014, at 10:52 AM, Ken Hancock ken.hanc...@schange.com wrote:
 
 Your criteria for what is appropriate for production may differ from others, 
 but it's equally incorrect of you to make a blanket statement that OpsCenter 
 isn't suitable for production.  A number of people use it in production.
 
 
 
 On Tue, Oct 28, 2014 at 11:48 AM, Colin co...@clark.ws wrote:
 No, actually, you can't, Tyler.
 
 If you mean the useless information it provides outside of the licence, fine;
 if you mean the components outside, then the same argument applies.
 
 Last time I checked, this forum was about Apache and not about DataStax.
 Maybe a separate group should be dedicated to provider-specific offerings.
 
 --
 Colin Clark 
 +1-320-221-9531
  
 
 On Oct 28, 2014, at 10:41 AM, Tyler Hobbs ty...@datastax.com wrote:
 
 
 On Tue, Oct 28, 2014 at 10:08 AM, Colin colpcl...@gmail.com wrote:
 It is a mistake to call a proprietary piece of software "community" when you
 can't use it in production.
 
 You can use OpsCenter community in production (however you'd like).
 
 
 -- 
 Tyler Hobbs
 DataStax
 
 
 
 -- 
 Ken Hancock | System Architect, Advanced Advertising 
 SeaChange International 
 50 Nagog Park
 Acton, Massachusetts 01720
 ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC 
 Office: +1 (978) 889-3329 |  ken.hanc...@schange.com | hancockks | hancockks  
 
 
 This e-mail and any attachments may contain information which is SeaChange 
 International confidential. The information enclosed is intended only for the 
 addressees herein and may not be copied or forwarded without permission from 
 SeaChange International.


2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-10-28 Thread Peter Haggerty
On a 3 node test cluster we recently upgraded one node from 2.0.10 to
2.0.11. This is a cluster that had been happily running 2.0.10 for
weeks and that has very little load and very capable hardware. The
upgrade was just your typical package upgrade:

$ dpkg -s cassandra | egrep '^Ver|^Main'
Maintainer: Eric Evans eev...@apache.org
Version: 2.0.11

Immediately after starting, it ran a couple of ParNews and then started
executing CMS runs. Within 10 minutes the node had become unreachable and
was marked as down by the two other nodes in the ring, which are still
2.0.10.

We have jstack output and the server logs but nothing seems to be
jumping out. Has anyone else run into this? What should we be looking
for?


Peter


Re: OOM at Bootstrap Time

2014-10-28 Thread Maxime
Doan, thanks for the tip, I just read about it this morning, just waiting
for the new version to pop up on the debian datastax repo.

Michael, I do believe you are correct in the general running of the cluster
and I've reset everything.

Sorry it took me a while to reply. I finally got the SSTable count down, as
seen in the OpsCenter graphs. I'm stumped, however, because when I bootstrap
the new node, I still see a very large number of files being streamed (~1500
for some nodes) and the bootstrap process is failing exactly as it did
before, in a flurry of "Enqueuing flush of ..." messages.

Any ideas? I'm reaching the end of what I know I can do. OpsCenter says
around 32 SSTables per CF, but it is still streaming tons of files. :-/


On Mon, Oct 27, 2014 at 1:12 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Tombstones will be a very important issue for me since the dataset is
 very much a rolling dataset using TTLs heavily.

 -- You can try the new DateTiered compaction strategy (
 https://issues.apache.org/jira/browse/CASSANDRA-6602) released on 2.1.1
 if you have a time series data model to eliminate tombstones

 On Mon, Oct 27, 2014 at 5:47 PM, Laing, Michael michael.la...@nytimes.com
  wrote:

 Again, from our experience w 2.0.x:

 Revert to the defaults - you are manually setting heap way too high IMHO.

 On our small nodes we tried LCS - way too much compaction - switch all
 CFs to STCS.

 We do a major rolling compaction on our small nodes weekly during less
 busy hours - works great. Be sure you have enough disk.

 We never explicitly delete and only use ttls or truncation. You can set
 gc_grace_seconds to 0 in that case, so tombstones are more readily expunged.
 There are a couple of threads in the list that discuss this... also normal
 rolling repair becomes optional, reducing load (still repair if something
 unusual happens tho...).

 In your current situation, you need to kickstart compaction - are there
 any CFs you can truncate at least temporarily? Then try compacting a small
 CF, then another, etc.

 Hopefully you can get enough headroom to add a node.

 ml




 On Sun, Oct 26, 2014 at 6:24 PM, Maxime maxim...@gmail.com wrote:

 Hmm, thanks for the reading.

 I initially followed some (perhaps too old) maintenance scripts, which
 included weekly 'nodetool compact'. Is there a way for me to undo the
 damage? Tombstones will be a very important issue for me since the dataset
 is very much a rolling dataset using TTLs heavily.

 On Sun, Oct 26, 2014 at 6:04 PM, DuyHai Doan doanduy...@gmail.com
 wrote:

 Should doing a major compaction on those nodes lead to a restructuring
 of the SSTables? -- Beware of major compaction on SizeTiered: it will
 create 2 giant SSTables, and the expired/outdated/tombstone columns in
 these big files will never be cleaned, since the SSTables will never get a
 chance to be compacted again

 Essentially to reduce the fragmentation of small SSTables you can stay
 with SizeTiered compaction and play around with compaction properties (the
 thresholds) to make C* group a bunch of files each time it compacts so that
 the file number shrinks to a reasonable count

 Since you're using C* 2.1 and anti-compaction has been introduced, I
 hesitate to advise you to use Leveled compaction as a work-around to reduce
 SSTable count.

  Things are a little bit more complicated because of the incremental
 repair process (I don't know whether you're using incremental repair or not
 in production). The Dev blog says that Leveled compaction is performed only
 on repaired SSTables, the un-repaired ones still use SizeTiered, more
 details here:
 http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1

 Regards





 On Sun, Oct 26, 2014 at 9:44 PM, Jonathan Haddad j...@jonhaddad.com
 wrote:

 If the issue is related to I/O, you're going to want to determine if
 you're saturated.  Take a look at `iostat -dmx 1`; you'll see avgqu-sz
 (queue size) and svctm (service time). The higher those numbers
 are, the more overwhelmed your disk is.

 On Sun, Oct 26, 2014 at 12:01 PM, DuyHai Doan doanduy...@gmail.com
 wrote:
  Hello Maxime
 
  Increasing the flush writers won't help if your disk I/O is not
 keeping up.
 
  I've had a look into the log file, below are some remarks:
 
  1) There are a lot of SSTables on disk for some tables (events for
 example,
  but not only). I've seen that some compactions are taking up to 32
 SSTables
  (which corresponds to the default max value for SizeTiered
 compaction).
 
  2) There is a secondary index that I found suspicious :
 loc.loc_id_idx. As
  its name implies I have the impression that it's an index on the id
 of the
  loc which would lead to almost an 1-1 relationship between the
 indexed value
  and the original loc. Such index should be avoided because they do
 not
  perform well. If it's not an index on the loc_id, please disregard
 my remark
 
  3) There is a clear imbalance of SSTable count on some nodes. In the
 log, I
  saw:
 
  INFO  [STREAM-IN-/...20] 2014-10-25