Re: Larger mutations

2019-07-11 Thread Laxmikant Upadhyay
Hi,

First, look for ways to reduce the mutation size if possible (reducing batch
size or reducing blob size). If you can't, then make the change and test it.
More specifically, focus on memory usage and GC pauses. If it looks OK, then
you are good to go.
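
As a point of reference, the 16777216-byte limit in the error below is half of
the current 32MB commitlog_segment_size_in_mb; by default the maximum mutation
size is half the commit log segment size. A minimal sketch of the change, using
the 64MB value from the question (test before rolling it out):

    # cassandra.yaml
    # doubling the segment size roughly doubles the maximum mutation size
    # the commit log will accept (here, from 16MB to 32MB)
    commitlog_segment_size_in_mb: 64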

regards,
Laxmikant

On Wed, Jul 10, 2019 at 6:21 PM Muralikrishna Gutha 
wrote:

> We are running Cassandra 2.1.15 and started noticing the errors below; the app
> is writing larger mutations. Before we go the route of increasing
> `commitlog_segment_size_in_mb` from 32MB to 64MB, we wanted to understand whether
> there are any cons to increasing it. Do we need to tweak any other parameters as
> a result of this change?
>
>
> Caused by: java.lang.IllegalArgumentException: Mutation of 19019986
> bytes is too large for the maximum size of 16777216
> at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:221)
> ~[apache-cassandra-2.1.15.jar:2.1.15]
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:379)
> ~[apache-cassandra-2.1.15.jar:2.1.15]
>
> Thanks for your time...
>
> Thanks
> Murali Gutha
>


-- 

regards,
Laxmikant Upadhyay


Breaking up major compacted Sstable with TWCS

2019-07-11 Thread Leon Zaruvinsky
Hi,

We are switching a table to run using TWCS. However, after running the alter 
statement, we ran a major compaction without understanding the implications.

Now, while new sstables are properly being created according to the time 
window, there is a giant sstable sitting around waiting for expiration.

Is there a way we can break it up again?  Running the alter statement again 
doesn’t seem to be touching it.

Thanks,
Leon




Re: Splitting 2-datacenter cluster into two clusters

2019-07-11 Thread Voytek Jarnot
Thank you for the very in-depth reply. Thinking more about it, I think in
my case I'm safe with keeping the cluster name.  It actually took a ton of
firewall work to get these DCs talking to each other in the first place, so
I'm not too concerned about undoing that and having future accidental
contact.

On Thu, Jul 11, 2019 at 10:23 AM Jeff Jirsa  wrote:

> Let's talk about the challenges, then talk about the strategy we'll use to
> do this.
>
> The logic Cassandra uses to identify the rest of its cluster comes down
> to ~3 things:
> - Cluster name (in yaml and system.local)
> - Seeds in your seed provider (probably a list of IPs in the yaml)
> - The known peers in system.peers, which it will use to connect / discover
> on cassandra restarts (also used by clients to connect to the rest of the
> cluster, so it's important it's accurate)
>
> The cluster name is meant to be mostly immutable - we don't provide an
> obvious mechanism to change it, really, because if you change the yaml and
> restart, the database should (last I checked) fail to start up when the yaml
> doesn't match the data in system.local.
>
> Cassandra keeps a list of all of the other hosts it knows about in
> system.peers, and will attempt to reconnect to them as long as it's in
> system.peers or gossip.
>
> Most approaches to do this ignore the cluster name and rely on firewalls
> to separate the two DCs, then nodetool assassinate to get the IPs out of
> gossip, but ultimately the two clusters have the same name, and if they
> EVER get reminded of the old IP, they'll re-join each other, and you'll be
> unhappy. For that reason, we probably want to change the cluster name on
> one side or the other to make sure we protect ourselves.
>
> Strategy-wise, I'd pick one cluster that can take downtime. It won't take
> much, but it'll be more than zero. Then approach it with something like the
> following:
>
> - Firewall off the two clusters so they can't talk.
> - Figure out which cluster will keep the name (we'll call it 'old'), and
> one which will change the name ('new')
> - Push a yaml to old that removes all seeds in the 'new' dc
> - Push a yaml to new that removes all seeds in 'old' dc, and has a 'new'
> cluster name in it
> - Alter the schema in old to remove the 'new' dc in replication settings
> - Alter the schema in new to remove the 'old' dc in replication settings
> - In the 'new' hosts, change the cluster name in every single instance of
> system.local ( update system.local set cluster_name='new' where
> key='local'; ) and flush (nodetool flush) on every host
> - Restart the new hosts, they'll come up with a new cluster name, and at
> this point if the firewall is turned off, both clusters will TRY to talk to
> each other, but the different cluster names will prevent it.
> - At that point, you can nodetool removenode / nodetool assassinate the
> 'old' IPs in 'new' and the 'new' IPs in 'old'
> - Finally, check system.peers for any stray leftovers - there have been
> times when system.peers leaked data. Clean up anything that's wrong.
> - Then remove the firewall rules
>
> Obviously, you want to try this on a test cluster first.
>
>
>
> On Thu, Jul 11, 2019 at 8:03 AM Voytek Jarnot 
> wrote:
>
>> My google-fu is failing me this morning. I'm looking for any tips on
>> splitting a 2 DC cluster into two separate clusters. I see a lot of docs
>> about decommissioning a datacenter, but not much in the way of disconnecting
>> datacenters into individual clusters, while keeping each one as-is data-wise
>> (aside from replication factor, of course).
>>
>> Our setup is simple: two DCs (dc1 and dc2), two seed nodes (both in dc1;
>> yes, I know not the recommended config), one keyspace (besides the system
>> ones) replicated in both DCs. I'm trying to end up with two clusters with 1
>> DC in each.
>>
>> Would appreciate any input.
>>
>


Re: Splitting 2-datacenter cluster into two clusters

2019-07-11 Thread Jeff Jirsa
Let's talk about the challenges, then talk about the strategy we'll use to
do this.

The logic Cassandra uses to identify the rest of its cluster comes down to
~3 things:
- Cluster name (in yaml and system.local)
- Seeds in your seed provider (probably a list of IPs in the yaml)
- The known peers in system.peers, which it will use to connect / discover
on cassandra restarts (also used by clients to connect to the rest of the
cluster, so it's important it's accurate)

The cluster name is meant to be mostly immutable - we don't provide an
obvious mechanism to change it, really, because if you change the yaml and
restart, the database should (last I checked) fail to start up when the yaml
doesn't match the data in system.local.

Cassandra keeps a list of all of the other hosts it knows about in
system.peers, and will attempt to reconnect to them as long as it's in
system.peers or gossip.

Most approaches to do this ignore the cluster name and rely on firewalls to
separate the two DCs, then nodetool assassinate to get the IPs out of gossip,
but ultimately the two clusters have the same name, and if they EVER get
reminded of the old IP, they'll re-join each other, and you'll be unhappy.
For that reason, we probably want to change the cluster name on one side or
the other to make sure we protect ourselves.

Strategy-wise, I'd pick one cluster that can take downtime. It won't take
much, but it'll be more than zero. Then approach it with something like the
following:

- Firewall off the two clusters so they can't talk.
- Figure out which cluster will keep the name (we'll call it 'old'), and
one which will change the name ('new')
- Push a yaml to old that removes all seeds in the 'new' dc
- Push a yaml to new that removes all seeds in 'old' dc, and has a 'new'
cluster name in it
- Alter the schema in old to remove the 'new' dc in replication settings
- Alter the schema in new to remove the 'old' dc in replication settings
- In the 'new' hosts, change the cluster name in every single instance of
system.local ( update system.local set cluster_name='new' where
key='local'; ) and flush (nodetool flush) on every host
- Restart the new hosts, they'll come up with a new cluster name, and at
this point if the firewall is turned off, both clusters will TRY to talk to
each other, but the different cluster names will prevent it.
- At that point, you can nodetool removenode / nodetool assassinate the
'old' IPs in 'new' and the 'new' IPs in 'old'
- Finally, check system.peers for any stray leftovers - there have been
times when system.peers leaked data. Clean up anything that's wrong.
- Then remove the firewall rules

Obviously, you want to try this on a test cluster first.
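
Spelled out per host, the rename step from the list above looks roughly like
this (the name 'new' is just the placeholder used above; restart Cassandra
however it is managed on your hosts):

    cqlsh> UPDATE system.local SET cluster_name = 'new' WHERE key = 'local';
    $ nodetool flush
    # then restart the node with the updated yaml in place
    # (cluster_name: 'new', seeds limited to local-DC nodes)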



On Thu, Jul 11, 2019 at 8:03 AM Voytek Jarnot 
wrote:

> My google-fu is failing me this morning. I'm looking for any tips on
> splitting a 2 DC cluster into two separate clusters. I see a lot of docs
> about decommissioning a datacenter, but not much in the way of disconnecting
> datacenters into individual clusters, while keeping each one as-is data-wise
> (aside from replication factor, of course).
>
> Our setup is simple: two DCs (dc1 and dc2), two seed nodes (both in dc1;
> yes, I know not the recommended config), one keyspace (besides the system
> ones) replicated in both DCs. I'm trying to end up with two clusters with 1
> DC in each.
>
> Would appreciate any input.
>


Re: Splitting 2-datacenter cluster into two clusters

2019-07-11 Thread Voytek Jarnot
Premature send, apologies.

At minimum, I see the following needing to happen:

In dc2:
- update the cluster name in system.local
- in cassandra.yaml (a sketch follows below):
  - cluster_name: change to the new cluster name
  - seeds: change to point at a couple of local nodes

system_auth, system_distributed, system_traces, and my keyspace all need their
replication factors altered.
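
A minimal sketch of that cassandra.yaml change, with a hypothetical new cluster
name and illustrative local seed IPs:

    cluster_name: 'dc2_cluster'
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.2.0.11,10.2.0.12"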

I guess the main issues I see are related to the timing of things. Seems
like I need to make sure the DCs are disconnected into separate clusters
before doing an ALTER KEYSPACE (since the ALTER will differ between dc1 and
dc2). That, and getting the nodes in each current DC to "forget" about the
other DC.


On Thu, Jul 11, 2019 at 10:03 AM Voytek Jarnot 
wrote:

> My google-fu is failing me this morning. I'm looking for any tips on
> splitting a 2 DC cluster into two separate clusters. I see a lot of docs
> about decommissioning a datacenter, but not much in the way of disconnecting
> datacenters into individual clusters, while keeping each one as-is data-wise
> (aside from replication factor, of course).
>
> Our setup is simple: two DCs (dc1 and dc2), two seed nodes (both in dc1;
> yes, I know not the recommended config), one keyspace (besides the system
> ones) replicated in both DCs. I'm trying to end up with two clusters with 1
> DC in each.
>
> Would appreciate any input.
>


Re: Splitting 2-datacenter cluster into two clusters

2019-07-11 Thread Oleksandr Shulgin
On Thu, Jul 11, 2019 at 5:04 PM Voytek Jarnot 
wrote:

> My google-fu is failing me this morning. I'm looking for any tips on
> splitting a 2 DC cluster into two separate clusters. I see a lot of docs
> about decommissioning a datacenter, but not much in the way of disconnecting
> datacenters into individual clusters, while keeping each one as-is data-wise
> (aside from replication factor, of course).
>
> Our setup is simple: two DCs (dc1 and dc2), two seed nodes (both in dc1;
> yes, I know not the recommended config), one keyspace (besides the system
> ones) replicated in both DCs. I'm trying to end up with two clusters with 1
> DC in each.
>

The first step that comes to my mind would be this:

cql in dc1=> ALTER KEYSPACE data WITH ... {'dc1': RF};

cql in dc2=> ALTER KEYSPACE data WITH ... {'dc2': RF};
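
Spelled out, and assuming NetworkTopologyStrategy with a replication factor of
3 (both illustrative; keep whatever your keyspace uses today), that would look
something like:

cql in dc1=> ALTER KEYSPACE data
             WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

cql in dc2=> ALTER KEYSPACE data
             WITH replication = {'class': 'NetworkTopologyStrategy', 'dc2': 3};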

Then you probably want to update the seeds list of each DC to contain only
nodes from that local DC.  The next step is tricky, but it feels like you
will need to isolate the networks, so that nodes in DC1 see all nodes from
DC2 as DOWN and vice versa, and then assassinate the nodes from the remote
DC.

Testing in a lab first is probably a good idea. ;-)

-- 
Alex


Splitting 2-datacenter cluster into two clusters

2019-07-11 Thread Voytek Jarnot
My google-fu is failing me this morning. I'm looking for any tips on
splitting a 2 DC cluster into two separate clusters. I see a lot of docs
about decommissioning a datacenter, but not much in the way of disconnecting
datacenters into individual clusters, while keeping each one as-is data-wise
(aside from replication factor, of course).

Our setup is simple: two DCs (dc1 and dc2), two seed nodes (both in dc1;
yes, I know not the recommended config), one keyspace (besides the system
ones) replicated in both DCs. I'm trying to end up with two clusters with 1
DC in each.

Would appreciate any input.