Re: Is there a way to add a new node to a cluster but not sync old data?

2015-01-22 Thread Kai Wang
At last year's summit there was a presentation from Instaclustr:
https://www.instaclustr.com/meetups/presentation-by-ben-bromhead-at-cassandra-summit-2014-san-francisco/
It could be the solution you are looking for. However, I don't see the code
checked in or a JIRA created, so for now you'd better plan your capacity
carefully.

Re: Is there a way to add a new node to a cluster but not sync old data?

2015-01-22 Thread Ryan Svihla
Usually this is about tuning, and this isn't an uncommon situation for new
users.

Potential steps to take

1) Reduce stream throughput to a point that your cluster can handle.
This is probably your most important tool. The default throughput is 200 or
400 megabits per second, depending on version; go ahead and drop it lower
and lower. I've had to go as low as 15 on all nodes to get a single node
bootstrapped. Use nodetool to change this setting at runtime:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsSetStreamThroughput.html

2) Scale up. If you run out of disk space on nodes and can't compact
anymore, add more disk and change where the data is stored (make sure
your new disk is fast enough to keep up). If the problem is load, add more
CPU and RAM.

3) Do some root cause analysis. I can't tell you how many of these issues
come down to bad JVM tuning or bad Cassandra settings.
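
As a rough illustration of step 1, here is a minimal Python sketch that builds
and runs `nodetool setstreamthroughput` against every node. The host addresses
and the helper names (`throttle_commands`, `apply_throttle`) are made up for
the example, and it assumes nodetool is on the PATH and can reach each node
with `-h`. Treat it as a starting point, not a tested operations script.

```python
import subprocess
from typing import Callable, List, Sequence


def throttle_commands(hosts: Sequence[str], megabits: int) -> List[List[str]]:
    """Build one `nodetool setstreamthroughput` command per host."""
    return [
        ["nodetool", "-h", host, "setstreamthroughput", str(megabits)]
        for host in hosts
    ]


def apply_throttle(
    hosts: Sequence[str],
    megabits: int,
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> None:
    """Run the throttle command on every host; `run` is injectable for dry runs."""
    for cmd in throttle_commands(hosts, megabits):
        run(cmd)


if __name__ == "__main__":
    # Hypothetical addresses; print the commands instead of executing them.
    apply_throttle(["10.0.0.1", "10.0.0.2"], 15, run=lambda cmd: print(" ".join(cmd)))
```

Passing `run=print` style callables gives a dry run, so you can check the
commands before letting them touch a struggling cluster.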

-- 

Thanks,
Ryan Svihla


Re: Is there a way to add a new node to a cluster but not sync old data?

2015-01-21 Thread Yatong Zhang
Thanks for the reply. Bootstrapping the new node put a heavy burden on the
whole cluster and I don't know why. So that's the issue I actually want to
fix.

Re: Is there a way to add a new node to a cluster but not sync old data?

2015-01-21 Thread Eric Stevens
Yes, bootstrapping a new node will cause read loads on your existing nodes
- it is becoming the owner and replica of a whole new set of existing
data.  To do that it needs to know what data it's now responsible for, and
that's what bootstrapping is for.

If you're at the point where bootstrapping a new node is placing a
too-heavy burden on your existing nodes, you may be dangerously close to or
even past the tipping point where you ought to have already grown your
cluster.  You need to grow your cluster as soon as possible, and chances
are you're close to no longer being able to keep up with compaction (see
nodetool compactionstats, make sure pending tasks is < 5, preferably 0 or
1).  Once you're falling behind on compaction, it becomes difficult to
successfully bootstrap new nodes, and you're in a very tough spot.
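
To make the compaction check concrete, here is a small Python sketch that
pulls the pending task count out of `nodetool compactionstats` output and
compares it with the threshold of 5 mentioned above. It assumes the output
contains a line of the form `pending tasks: N`; the exact layout can differ
between versions, and the helper names are invented for the example.

```python
import re


def pending_compactions(output: str) -> int:
    """Extract the pending task count from compactionstats text."""
    match = re.search(r"pending tasks:\s*(\d+)", output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def keeping_up(output: str, limit: int = 5) -> bool:
    """True when pending compactions are below the limit."""
    return pending_compactions(output) < limit


if __name__ == "__main__":
    # Hypothetical sample text, not captured from a real cluster.
    sample = "pending tasks: 12\n"
    print(pending_compactions(sample), keeping_up(sample))
```

In practice you would feed it the text returned by something like
`subprocess.check_output(["nodetool", "compactionstats"])`, decoded to a
string, and run it on each node before attempting a bootstrap.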


Re: Is there a way to add a new node to a cluster but not sync old data?

2015-01-21 Thread Yatong Zhang
Yes, my cluster is almost full and there are lots of pending tasks. You
helped me a lot and thank you Eric~

Re: Is there a way to add a new node to a cluster but not sync old data?

2015-01-11 Thread Eric Stevens
Yes, but it won't do what I suspect you're hoping for.  If you disable
auto_bootstrap in cassandra.yaml the node will join the cluster and will
not stream any old data from existing nodes.

The cluster will now be in an inconsistent state.  If you bring enough
nodes online this way to violate your read consistency level (e.g. RF=3,
CL=Quorum, if you bring on 2 nodes this way), some of your queries will be
missing data that they ought to have returned.

There is no way to bring a new node online and have it be responsible just
for new data, and have no responsibility for old data.  It *will* be
responsible for old data, it just won't *know* about the old data it should
be responsible for.  Executing a repair will fix this, but only because the
existing nodes will stream all the missing data to the new node.  This will
create more pressure on your cluster than just normal bootstrapping would
have.

I can't think of any reason you'd want to do that unless you needed to grow
your cluster really quickly, and were ok with corrupting your old data.
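
The RF=3, CL=Quorum example can be spelled out with a little arithmetic. The
sketch below is a simplification: it assumes the data was present on every
old replica, that the un-bootstrapped nodes are replicas for the same token
range, and that no read repair has happened yet. The function names are made
up for the illustration.

```python
def quorum(rf: int) -> int:
    """Number of replicas a QUORUM operation must contact."""
    return rf // 2 + 1


def quorum_read_can_miss_data(rf: int, empty_replicas: int) -> bool:
    """True if some quorum can be formed entirely from empty replicas."""
    return empty_replicas >= quorum(rf)


if __name__ == "__main__":
    for empty in range(4):
        print(f"RF=3, empty replicas={empty}: "
              f"can miss data={quorum_read_can_miss_data(3, empty)}")
```

With RF=3 the quorum is 2, so one empty replica is survivable (every pair of
replicas still includes one that has the data), but two empty replicas can
answer a quorum read on their own and return nothing.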

On Sat, Jan 10, 2015 at 12:39 AM, Yatong Zhang bluefl...@gmail.com wrote:

 Hi there,

 I am using C* 2.0.10 and I was trying to add a new node to a
 cluster (actually replacing a dead node). But after adding the new node, some
 other nodes in the cluster came under a very heavy workload, which affected
 the performance of the whole cluster.
 So I am wondering: is there a way to add a new node so that it only
 takes new data?