Re: Is cleanup required if cluster topology changes

2023-05-09 Thread Jaydeep Chovatia
Another request to the community to see if this is feasible or not: Rather than waiting for CEP-21, can we do the necessary cleanup as part of regular compaction itself to avoid running *cleanup* manually? For now, we can control this through a flag, which is *false* by default. Whosoever wants to do the

Re: Is cleanup required if cluster topology changes

2023-05-09 Thread Bowen Song via user
Because an operator will need to check and ensure the schema is consistent across the cluster before running "nodetool cleanup". At the moment, it's the operator's responsibility to ensure bad things don't happen. On 09/05/2023 06:20, Jaydeep Chovatia wrote: One cl

Re: Is cleanup required if cluster topology changes

2023-05-08 Thread Jaydeep Chovatia
One clarification question, Jeff. AFAIK, *nodetool cleanup* also internally goes through the same compaction path as regular compaction. Then why do we have to wait for CEP-21 to clean up unowned data in the regular compaction path? Wouldn't it be as simple as regular compaction just i

Re: Is cleanup required if cluster topology changes

2023-05-05 Thread Jaydeep Chovatia
Thanks, Jeff, for the detailed steps and summary. We will keep the community (this thread) up to date on how it plays out in our fleet. Jaydeep On Fri, May 5, 2023 at 9:10 AM Jeff Jirsa wrote: > Lots of caveats on these suggestions, let me try to hit most of them. > > Cleanup in pa

Re: Is cleanup required if cluster topology changes

2023-05-05 Thread Jeff Jirsa
Lots of caveats on these suggestions, let me try to hit most of them. Cleanup in parallel is good and fine and common. Limit number of threads in cleanup if you're using lots of vnodes, so each node runs one at a time and not all nodes use all your cores at the same time. If a host is

Re: Is cleanup required if cluster topology changes

2023-05-05 Thread Jaydeep Chovatia
ble. > > Also, reducing number of vnodes per server can limit the number of servers > affected by replacing a single server, therefore reducing the amount of > time required to run "nodetool cleanup" if it is run sequentially. > > Finally, you may choose to run "node

Re: Is cleanup required if cluster topology changes

2023-05-05 Thread Bowen Song via user
table. Also, reducing number of vnodes per server can limit the number of servers affected by replacing a single server, therefore reducing the amount of time required to run "nodetool cleanup" if it is run sequentially. Finally, you may choose to run "nodetool cleanup" concur

Re: Is cleanup required if cluster topology changes

2023-05-05 Thread Runtian Liu
; will not result in a topology change, which means "nodetool cleanup" is not > needed after the operation is completed. > On 05/05/2023 05:24, Jaydeep Chovatia wrote: > > Thanks, Jeff! > But in our environment we replace nodes quite often for various > optimization purp

Re: Is cleanup required if cluster topology changes

2023-05-05 Thread Bowen Song via user
Have you thought of using "-Dcassandra.replace_address_first_boot=..." (or "-Dcassandra.replace_address=..." if you are using an older version)? This will not result in a topology change, which means "nodetool cleanup" is not needed after the operation is co
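A minimal sketch of that replace flow (the address 10.0.0.12 is illustrative; on package installs the flag typically goes into cassandra-env.sh on the replacement node before its first start):

    # Replacement node takes over the dead node's exact token ranges,
    # so the ring topology does not change and no cleanup is needed.
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.12"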

RE: Is cleanup required if cluster topology changes

2023-05-05 Thread Durity, Sean R via user
like 6-8 hours. (And many nodes were done much earlier than that.) I restrict clean-up to one compaction thread, but I double the compaction throughput for the duration of the cleanup. This protects against two large sstables being compacted at the same time and running out of disk space. Sean
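A rough sketch of that routine with stock nodetool (all values illustrative; the -j option needs 2.1.14 or later):

    nodetool setcompactionthroughput 128   # temporarily double the usual throughput cap
    nodetool cleanup -j 1                  # one cleanup compaction at a time
    nodetool setcompactionthroughput 64    # restore the normal cap afterwards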

Re: Is cleanup required if cluster topology changes

2023-05-05 Thread manish khandelwal
You can replace the node directly; why add a node and then decommission another node? Just replace the node with the new node and your topology remains the same, so there is no need to run the cleanup. On Fri, May 5, 2023 at 10:26 AM Jaydeep Chovatia wrote: > We use STCS, and our experience w

Re: Is cleanup required if cluster topology changes

2023-05-04 Thread Jaydeep Chovatia
We use STCS, and our experience with *cleanup* is that it takes a long time to run in a 100-node cluster. We would like to replace one node every day for various purposes in our fleet. If we run *cleanup* after each node replacement, then it might take, say, 15 days to complete, and that hinders

Re: Is cleanup required if cluster topology changes

2023-05-04 Thread Jeff Jirsa
You should 100% trigger cleanup each time or you’ll almost certainly resurrect data sooner or later. If you’re using leveled compaction it’s especially cheap. STCS and TWCS are worse, but if you’re really scaling that often, I’d be considering LCS and running cleanup just before or just after each

Re: Is cleanup required if cluster topology changes

2023-05-04 Thread Jaydeep Chovatia
run cleanup on 100 nodes after every replacement, then it could take forever. What is the recommendation until we get this fixed in Cassandra itself as part of compaction (w/o externally triggering *cleanup*)? Jaydeep On Thu, May 4, 2023 at 8:14 PM Jeff Jirsa wrote: > Cleanup is fast and ch

Re: Is cleanup required if cluster topology changes

2023-05-04 Thread Jeff Jirsa
Cleanup is fast and cheap and basically a no-op if you haven’t changed the ring. After Cassandra has transactional cluster metadata to make ring changes strongly consistent, Cassandra should do this in every compaction. But until then it’s left for operators to run when they’re sure the state of

Re: Is cleanup required if cluster topology changes

2023-05-04 Thread Jaydeep Chovatia
Isn't this considered a kind of *bug* in Cassandra because as we know *cleanup* is a lengthy and unreliable operation, so relying on the *cleanup* means higher chances of data resurrection? Do you think we should discard the unowned token-ranges as part of the regular compaction itself? Wha

Re: Is cleanup required if cluster topology changes

2023-05-04 Thread guo Maxwell
Compaction will just merge duplicate data and remove deleted data on this node. If you add or remove a node from the cluster, I think cleanup is needed. If cleanup failed, I think we should look into the reason. Runtian Liu wrote on Fri, May 5, 2023 at 06:37: > Hi all, > > Is cleanup the sole

Is cleanup required if cluster topology changes

2023-05-04 Thread Runtian Liu
Hi all, Is cleanup the sole method to remove data that does not belong to a specific node? In a cluster, where nodes are added or decommissioned from time to time, failure to run cleanup may lead to data resurrection issues, as deleted data may remain on the node that lost ownership of certain

RE: Cleanup

2023-02-17 Thread Durity, Sean R via user
Cleanup, by itself, uses all the compactors available. So, it is important to see if you have the disk space for multiple large cleanup compactions running at the same time. We have a utility to do cleanup more intelligently – it temporarily doubles compaction throughput, operates on a single

Re: Cleanup

2023-02-16 Thread Dipan Shah
it is altered via nodetool, is it altered until manually changed > or service restart, so must be manually put back? > > > > *From:* Aaron Ploetz > *Sent:* Thursday, February 16, 2023 4:50 PM > *To:* user@cassandra.apache.org > *Subject:* Re: Cleanup > > >

RE: Cleanup

2023-02-16 Thread Marc Hoppins
…and if it is altered via nodetool, is it altered until manually changed or service restart, so must be manually put back? From: Aaron Ploetz Sent: Thursday, February 16, 2023 4:50 PM To: user@cassandra.apache.org Subject: Re: Cleanup EXTERNAL So if I remember right, setting

Re: Cleanup

2023-02-16 Thread Aaron Ploetz
So if I remember right, setting compaction_throughput_mb_per_sec to zero effectively disables throttling, which means cleanup and compaction will run as fast as the instance will allow. For normal use, I'd recommend capping that at 8 or 16. Aaron On Thu, Feb 16, 2023 at 9:43 AM Marc Hoppins

RE: Cleanup

2023-02-16 Thread Marc Hoppins
compaction_throughput_mb_per_sec is 0 in cassandra.yaml. Is setting it in nodetool going to provide any increase? From: Durity, Sean R via user Sent: Thursday, February 16, 2023 4:20 PM To: user@cassandra.apache.org Subject: RE: Cleanup EXTERNAL Clean-up is constrained/throttled by
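For reference, the throttle can be read and changed at runtime like this; a value set via nodetool lasts only until the process restarts, after which the cassandra.yaml value applies again:

    nodetool getcompactionthroughput       # current cap in MB/s
    nodetool setcompactionthroughput 16    # throttle to 16 MB/s; 0 means unthrottled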

RE: Cleanup

2023-02-16 Thread Durity, Sean R via user
nodes in a DC at the same time. Think of it as compaction and consider your cluster performance/workload/timelines accordingly. Sean R. Durity From: manish khandelwal Sent: Thursday, February 16, 2023 5:05 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Cleanup There is no advantage of

Re: Cleanup

2023-02-16 Thread manish khandelwal
There is no advantage to running cleanup if no new nodes are introduced. So cleanup time should remain the same when adding new nodes. Cleanup is local to a node, so network bandwidth should have no effect on reducing cleanup time. Don't ignore cleanup, as it can leave your disks occupied without any

Cleanup

2023-02-16 Thread Marc Hoppins
Hulloa all, I read a thing re. adding new nodes where the recommendation was to run cleanup on the nodes after adding a new node to remove redundant token ranges. I timed this way back when we only had ~20G of data per node and it took approx. 5 mins per node. After adding a node on Tuesday

best setup of tombstones cleanup over a "wide" table (was: efficient delete over a "wide" table?)

2020-09-05 Thread Attila Wind
Thank you guys for the answers - I expected this but wanted to verify (who knows how smart Cassandra can be in the background! :-) ) @Jeff: unfortunately the records we will pick up for delete are not necessarily "neighbours" in terms of creation time so forming up contiguous ranges can not be

RE: Determine disc space that will be freed after expansion cleanup

2019-02-25 Thread Kenneth Brotman
@cassandra.apache.org Subject: Determine disc space that will be freed after expansion cleanup Hi Some Cassandra nodes could have rows that are associated with tokens that aren't owned by those nodes anymore as a result of expansion, this data will remain until a cleanup compaction is run. We would like to

Determine disc space that will be freed after expansion cleanup

2019-02-25 Thread Cameron Gandevia
Hi Some Cassandra nodes could have rows that are associated with tokens that aren't owned by those nodes anymore as a result of expansion, this data will remain until a cleanup compaction is run. We would like to know the best way to calculate the amount (or close to) of data th

Re: forgot to run nodetool cleanup

2019-02-14 Thread Oleksandr Shulgin
On Thu, Feb 14, 2019 at 4:39 PM Jeff Jirsa wrote: > > Wait, doesn't cleanup just rewrite every SSTable one by one? Why would compaction strategy matter? Do you mean that after cleanup STCS may pick some resulting tables to re-compact them due to the min/max size difference, which w

Re: forgot to run nodetool cleanup

2019-02-14 Thread Jeff Jirsa
> On Feb 14, 2019, at 12:19 AM, Oleksandr Shulgin > wrote: > >> On Wed, Feb 13, 2019 at 6:47 PM Jeff Jirsa wrote: >> Depending on how bad data resurrection is, you should run it for any host >> that loses a range. In vnodes, that's usually all hosts. >&

Re: forgot to run nodetool cleanup

2019-02-14 Thread shalom sagges
Cleanup is a great way to free up disk space. Just note you might run into https://issues.apache.org/jira/browse/CASSANDRA-9036 if you use a version older than 2.0.15. On Thu, Feb 14, 2019 at 10:20 AM Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Wed, Feb 13, 2019 a

Re: forgot to run nodetool cleanup

2019-02-14 Thread Oleksandr Shulgin
On Wed, Feb 13, 2019 at 6:47 PM Jeff Jirsa wrote: > Depending on how bad data resurrection is, you should run it for any host > that loses a range. In vnodes, that's usually all hosts. > > Cleanup with LCS is very cheap. Cleanup with STCS/TWCS is a bit more work. > Wait,

Re: forgot to run nodetool cleanup

2019-02-13 Thread Jeff Jirsa
nother host). > > I also believe, but don’t have time to prove, that enough new hosts can >> eventually give you a range back (moving it all the way around the ring) - >> less likely but probably possible. >> >> Easiest to just assume that any range movement may resurrect

Re: forgot to run nodetool cleanup

2019-02-13 Thread Oleksandr Shulgin
t have time to prove, that enough new hosts can > eventually give you a range back (moving it all the way around the ring) - > less likely but probably possible. > > Easiest to just assume that any range movement may resurrect data if you > haven’t run cleanup. > Does this me

Re: forgot to run nodetool cleanup

2019-02-13 Thread Jeff Jirsa
. Easiest to just assume that any range movement may resurrect data if you haven’t run cleanup. -- Jeff Jirsa > On Feb 13, 2019, at 12:34 AM, Oleksandr Shulgin > wrote: > >> On Wed, Feb 13, 2019 at 5:31 AM Jeff Jirsa wrote: > >> The most likely result of not runnin

Re: forgot to run nodetool cleanup

2019-02-13 Thread Oleksandr Shulgin
On Wed, Feb 13, 2019 at 5:31 AM Jeff Jirsa wrote: > The most likely result of not running cleanup is wasted disk space. > > The second most likely result is resurrecting deleted data if you do a > second range movement (expansion, shrink, etc). > > If this is bad for you, you

Re: forgot to run nodetool cleanup

2019-02-12 Thread Jeff Jirsa
The most likely result of not running cleanup is wasted disk space. The second most likely result is resurrecting deleted data if you do a second range movement (expansion, shrink, etc). If this is bad for you, you should run cleanup now. For many use cases, it’s a nonissue. If you know

forgot to run nodetool cleanup

2019-02-12 Thread onmstester onmstester
Hi, I should have run cleanup after adding a few nodes to my cluster about 2 months ago; the TTL is 6 months. What happens now? Should I worry about anything catastrophic? Should I run the cleanup now? Thanks in advance Sent using https://www.zoho.com/mail/

Re: Cleanup cluster after expansion?

2018-10-25 Thread Alain RODRIGUEZ
Hello, '*nodetool cleanup*' used to be mono-threaded (up to C*2.1), then used all the cores (C*2.1 - C*2.1.14), and is now something that can be controlled (C*2.1.14+): '*nodetool cleanup -j 2*' for example would use 2 compactors maximum (out of the number of concurrent_co

Re: Cleanup cluster after expansion?

2018-10-22 Thread Jeff Jirsa
added 16 new nodes to our 38-node cluster (now 54 nodes). What > would be the safest and most > efficient way of running a cleanup operation? I’ve experimented with running > cleanup on a single node and > nodetool just hangs, but that seems to be a known issue. > > Would some

Cleanup cluster after expansion?

2018-10-22 Thread Ian Spence
Environment: Cassandra 2.2.9, GNU/Linux CentOS 6 + 7. Two DCs, 3 RACs in DC1 and 6 in DC2. We recently added 16 new nodes to our 38-node cluster (now 54 nodes). What would be the safest and most efficient way of running a cleanup operation? I’ve experimented with running cleanup on a single

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-09 Thread Jonathan Haddad
Your example only really applies if someone is using a 20 node cluster at RF=1, something I've never seen, but I'm sure exists somewhere. Realistically, RF=3 using racks (or AWS regions) and 21 nodes, means you'll have 3 racks with 7 nodes per rack. Adding a single node is an unlikely operation, y

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Oleksandr Shulgin
On Sat, 8 Sep 2018, 19:00 Jeff Jirsa, wrote: > Virtual nodes accomplish two primary goals > > 1) it makes it easier to gradually add/remove capacity to your cluster by > distributing the new host capacity around the ring in smaller increments > > 2) it increases the number of sources for streamin

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread onmstester onmstester
Thanks Jeff, You mean that with RF=2, num_tokens = 256 and having less than 256 nodes i should not worry about data distribution? Sent using Zoho Mail On Sat, 08 Sep 2018 21:30:28 +0430 Jeff Jirsa wrote Virtual nodes accomplish two primary goals 1) it makes it easier to gradually add

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Jeff Jirsa
Virtual nodes accomplish two primary goals 1) it makes it easier to gradually add/remove capacity to your cluster by distributing the new host capacity around the ring in smaller increments 2) it increases the number of sources for streaming, which speeds up bootstrap and decommission Whether

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Jonathan Haddad
> I wonder why not setting it to all the way down to 1 then? What's the key difference once you have so few vnodes? 4 tokens lets you have balanced clusters when they're small and imposes very little overhead when they get big. Using multiple tokens lets multiple nodes stream data to the new nod

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Jonathan Haddad
Keep using whatever settings you've been using. I'd still use allocate tokens for keyspace but it probably won't make much of a difference with 256 tokens. On Sat, Sep 8, 2018 at 10:40 AM onmstester onmstester wrote: > Thanks Jon, > But i never concerned about num_tokens config before, because

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Oleksandr Shulgin
On Sat, 8 Sep 2018, 14:47 Jonathan Haddad, wrote: > 256 tokens is a pretty terrible default setting especially post 3.0. I > recommend folks use 4 tokens for new clusters, > I wonder why not setting it to all the way down to 1 then? What's the key difference once you have so few vnodes? with s

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread onmstester onmstester
Thanks Jon, But i was never concerned about the num_tokens config before, because no official cluster setup documents (on datastax: https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initSingleDS.html or other blogs) warned us beginners to be concerned about it. I always setup my clusters

Re: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-08 Thread Jonathan Haddad
256 tokens is a pretty terrible default setting especially post 3.0. I recommend folks use 4 tokens for new clusters, with some caveats. When you fire up a cluster, there's no way to make the initial tokens be distributed evenly, you'll get random ones. You'll want to set them explicitly using:
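A sketch of the cassandra.yaml settings this thread discusses (keyspace name illustrative; allocate_tokens_for_keyspace is available from Cassandra 3.0):

    # cassandra.yaml on new nodes
    num_tokens: 4
    allocate_tokens_for_keyspace: my_keyspace   # balance tokens against this keyspace's replication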

RE: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-07 Thread onmstester onmstester
Why not set the default vnode count to that recommendation in the Cassandra installation files? Sent using Zoho Mail On Tue, 04 Sep 2018 17:35:54 +0430 Durity, Sean R wrote Longer term, I agree with Oleksandr, the recommendation for number of vnodes is now much smaller than 256. I am

RE: nodetool cleanup - compaction remaining time

2018-09-07 Thread Steinmaurer, Thomas
I have created https://issues.apache.org/jira/browse/CASSANDRA-14701 Please adapt as needed. Thanks! Thomas From: Jeff Jirsa Sent: Thursday, 06 September 2018 07:52 To: cassandra Subject: Re: nodetool cleanup - compaction remaining time Probably worth a JIRA (especially if you can repro

RE: nodetool cleanup - compaction remaining time

2018-09-06 Thread Steinmaurer, Thomas
Alain, compaction throughput is set to 32. Regards, Thomas From: Alain RODRIGUEZ Sent: Thursday, 06 September 2018 11:50 To: user@cassandra.apache.org Subject: Re: nodetool cleanup - compaction remaining time Hello Thomas. Be aware that this behavior happens when the compaction

Re: nodetool cleanup - compaction remaining time

2018-09-06 Thread Alain RODRIGUEZ
ited). I believe the estimate uses the speed >> limit for calculation (which is often very much wrong anyway). >> > > As far as I can remember, if you have unthrottled compaction, then the > message is different: it says "n/a". The all zeroes you usually see when > you on

Re: nodetool cleanup - compaction remaining time

2018-09-06 Thread Oleksandr Shulgin
As far as I can remember, if you have unthrottled compaction, then the message is different: it says "n/a". The all zeroes you usually see when you only have Validation compactions, and apparently Cleanup works the same way, at least in the 2.1 version. https://github.

Re: nodetool cleanup - compaction remaining time

2018-09-06 Thread Alain RODRIGUEZ
Hello Thomas. Be aware that this behavior happens when the compaction throughput is set to *0* (unthrottled/unlimited). I believe the estimate uses the speed limit for calculation (which is often very much wrong anyway). I just meant to say, you might want to make sure that it's due to cl

Re: nodetool cleanup - compaction remaining time

2018-09-05 Thread Jeff Jirsa
Probably worth a JIRA (especially if you can repro in 3.0 or higher, since 2.1 is critical fixes only) On Wed, Sep 5, 2018 at 10:46 PM Steinmaurer, Thomas < thomas.steinmau...@dynatrace.com> wrote: > Hello, > > > > is it a known issue / limitation that cleanup compactions ar

nodetool cleanup - compaction remaining time

2018-09-05 Thread Steinmaurer, Thomas
Hello, is it a known issue / limitation that cleanup compactions aren't counted in the compaction remaining time? nodetool compactionstats -H pending tasks: 1 compaction type keyspace table completed total unit progress Cleanup XXX

RE: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-04 Thread Durity, Sean R
term, I agree with Oleksandr, the recommendation for number of vnodes is now much smaller than 256. I am using 8 or 16. Sean Durity From: Oleksandr Shulgin Sent: Monday, September 03, 2018 10:02 AM To: User Subject: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens On

Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-03 Thread Oleksandr Shulgin
then > if i add node E to the cluster immediately, the old data on A,B,C would be > also moved between nodes everytime? > Potentially, when you add node E it takes ownership of some of the data that D has. So you have to run cleanup on all (except the very last node you add) in the en

Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-03 Thread onmstester onmstester
would be also moved between nodes everytime? Sent using Zoho Mail On Mon, 03 Sep 2018 14:39:37 +0430  onmstester onmstester wrote Thanks Alex, So you suggest that i should not worry about this:  Failure to run this command (cleanup) after adding a node causes Cassandra to include the old

Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-03 Thread onmstester onmstester
Thanks Alex, So you suggest that i should not worry about this:  Failure to run this command (cleanup) after adding a node causes Cassandra to include the old data to rebalance the load on that node Would you kindly explain a little more? Sent using Zoho Mail It makes a lot of sense to run

Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-03 Thread Oleksandr Shulgin
e, although there is < 200GB on each > node, i will do so. > In the document mentioned that i should run nodetool cleanup after joining > a new node: > *Run* *nodetool cleanup* *on the source node and on neighboring nodes > that shared the same subrange after the new node is up

adding multiple node to a cluster, cleanup and num_tokens

2018-09-03 Thread onmstester onmstester
d that i should run nodetool cleanup after joining a new node:  Run nodetool cleanup on the source node and on neighboring nodes that shared the same subrange after the new node is up and running. Failure to run this command after adding a node causes Cassandra to include the old data to rebal

Re: Why nodetool cleanup should be run sequentially after node joined a cluster

2018-04-11 Thread Alain RODRIGUEZ
I confirm what Christophe said. I always ran them in parallel without any problem, really. Historically it was using only one compactor and the impact in my clusters has always been acceptable. Nonetheless, newer Cassandra versions allow multiple compactors to work in parallel during cleanup and

Re: Why nodetool cleanup should be run sequentially after node joined a cluster

2018-04-10 Thread Christophe Schmitz
Hi Mikhail, Nodetool cleanup can add a fair amount of extra load (mostly IO) on your Cassandra nodes. Therefore it is recommended to run it during lower cluster usage, and one node at a time, in order to limit the impact on your cluster. There are no technical limitations that would prevent you

Why nodetool cleanup should be run sequentially after node joined a cluster

2018-04-10 Thread Mikhail Tsaplin
Hi, In https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddNodeToCluster.html there is a recommendation: 6) After all new nodes are running, run nodetool cleanup <https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsCleanup.html> on each of the previously existing
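A minimal sketch of following that step one node at a time (hostnames illustrative):

    for host in node1 node2 node3; do
        ssh "$host" nodetool cleanup
    done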

Re: Please give input or approval of JIRA 14128 so we can continue document cleanup

2018-03-06 Thread Alain RODRIGUEZ
Hello Kenneth, I believe this belongs to the dev list. Please be mindful about this; I think it matters for you, as you will get faster answers, and for us, as we will be reading what we expect to find in each place, and save time. We all know here, as Cassandra users, how important it is to p

Please give input or approval of JIRA 14128 so we can continue document cleanup

2018-03-04 Thread Kenneth Brotman
Two months ago Kurt Greaves cleaned up the home page of the website, which currently has broken links and other issues. We need to get that JIRA wrapped up. Further improvements, scores of them, are coming. Get ready! Please take time soon to review the patch he submitted. https://issues.apache.

Re: Secondary Index Cleanup

2018-03-02 Thread malte
}, "cells" : [ ] }, ... normally i can find the key as an indexed field, but most of the keys in the dump do no longer exist in the parent CF. these keys are sometimes months old. (we have gc_grace_seconds set to 30 mins) if i use nodetool rebuild_index it does not help,

Re: Secondary Index Cleanup

2018-03-02 Thread Dikang Gu
; : { "tstamp" : "2017-10-30T16:49:37.160361Z" }, > "cells" : [ ] > }, > > ... > > normally i can find the key as an indexed field, but most of the keys in > the dump do no longer exist in the parent CF. > > these keys are someti

Secondary Index Cleanup

2018-03-02 Thread Malte Krüger
ometimes months old. (we have gc_grace_seconds set to 30 mins) if i use nodetool rebuild_index it does not help, but if i drop the index and recreate it, size goes down to several hundred MB! what is the reason the cleanup does not work automatically and how can i fix this? -Malte --

Re: Cleanup blocking snapshots - Options?

2018-01-31 Thread kurt greaves
another try now, and yes, with 2.1.18, this constantly happens. > Currently running nodetool cleanup on a single node in production with > disabled hourly snapshots. SSTables with > 100G involved here. Triggering > nodetool snapshot will result in being blocked. From an operational

RE: Cleanup blocking snapshots - Options?

2018-01-30 Thread Steinmaurer, Thomas
Hi Kurt, had another try now, and yes, with 2.1.18, this constantly happens. Currently running nodetool cleanup on a single node in production with disabled hourly snapshots. SSTables with > 100G involved here. Triggering nodetool snapshot will result in being blocked. From an operatio

Re: Cassandra 3.11 - nodetool cleanup - Compaction interrupted ...

2018-01-22 Thread kurt greaves
It's fine and intended behaviour. Upgradesstables also has the same effect. Basically cleanup operates on all SSTables on a node (for each table) and will cancel any in-progress compactions and instead run cleanup across them, as you can't have two different compactions including the

Cassandra 3.11 - nodetool cleanup - Compaction interrupted ...

2018-01-22 Thread Steinmaurer, Thomas
Hello, when triggering a "nodetool cleanup" with Cassandra 3.11, the nodetool call almost returns instantly and I see the following INFO log. INFO [CompactionExecutor:54] 2018-01-22 12:59:53,903 CompactionManager.java:1777 - Compaction interrupted: Compaction@fc9b0073-1008

Re: Cleanup blocking snapshots - Options?

2018-01-15 Thread Nicolas Guyomar
though On 15 January 2018 at 08:43, Steinmaurer, Thomas < thomas.steinmau...@dynatrace.com> wrote: > Hi Kurt, > > > > it was easily triggered with the mentioned combination (cleanup after > extending the cluster) a few months ago, thus I guess it will be the same > when

RE: Cleanup blocking snapshots - Options?

2018-01-14 Thread Steinmaurer, Thomas
Hi Kurt, it was easily triggered with the mentioned combination (cleanup after extending the cluster) a few months ago, thus I guess it will be the same when I re-try. Due to the issue we simply omitted running cleanup then, but as disk space is becoming some sort of bottle-neck again, we need

Re: Cleanup blocking snapshots - Options?

2018-01-14 Thread kurt greaves
ction and due to ( > https://issues.apache.org/jira/browse/CASSANDRA-11155) we can’t run > cleanup e.g. after extending the cluster without blocking our hourly > snapshots. > > > > What options do we have to get rid of partitions a node does not own > anymore? > > · Using

Cleanup blocking snapshots - Options?

2018-01-14 Thread Steinmaurer, Thomas
Hello, we are running 2.1.18 with vnodes in production and due to (https://issues.apache.org/jira/browse/CASSANDRA-11155) we can't run cleanup e.g. after extending the cluster without blocking our hourly snapshots. What options do we have to get rid of partitions a node does not own an

Re: run cleanup and rebuild simultaneously

2017-12-22 Thread Peng Xiao
Thanks Jeff -- Original -- From: Jeff Jirsa Date: Dec 23, 2017 09:28 To: user Subject: Re: run cleanup and rebuild simultaneously Should be fine, though it will increase disk usage in dc1 for a while - a reference to the cleaned-up sstables will

Re: run cleanup and rebuild simultaneously

2017-12-22 Thread Jeff Jirsa
iao <2535...@qq.com> wrote: > > Hi there, > Can we run nodetool cleanup in DC1,and run rebuild in DC2 against DC1 > simultaneously? > in C* 2.1.18 > > > Thanks, > Peng Xiao >

run cleanup and rebuild simultaneously

2017-12-22 Thread Peng Xiao
Hi there, Can we run nodetool cleanup in DC1, and run rebuild in DC2 against DC1 simultaneously? In C* 2.1.18 Thanks, Peng Xiao

Re: Safe to run cleanup before repair?

2017-11-12 Thread Joel Samuelsson
ing which nodes own what parts of the data. My concern is if a >> piece of data is now owned by say Node 1 and Node 3 but before the addition >> of new nodes only existed on Node 2 and a cleanup would then delete it >> permanently since Node 2 no longer owns it. Could this ever happen? >> >

Re: Safe to run cleanup before repair?

2017-11-12 Thread Jeff Jirsa
ay Node 1 and Node 3 but before the addition of new >> nodes only existed on Node 2 and a cleanup would then delete it permanently >> since Node 2 no longer owns it. Could this ever happen?

Re: Safe to run cleanup before repair?

2017-11-12 Thread kurt greaves
parts of the data. My concern is if a > piece of data is now owned by say Node 1 and Node 3 but before the addition > of new nodes only existed on Node 2 and a cleanup would then delete it > permanently since Node 2 no longer owns it. Could this ever happen? >

Re: Safe to run cleanup before repair?

2017-11-12 Thread Joel Samuelsson
which nodes own what parts of the data. My concern is if a piece of data is now owned by say Node 1 and Node 3 but before the addition of new nodes only existed on Node 2 and a cleanup would then delete it permanently since Node 2 no longer owns it. Could this ever happen?

Re: Safe to run cleanup before repair?

2017-11-12 Thread Jeff Jirsa
Cleanup, very simply, throws away data no longer owned by the instance because of range movements. Repair only repairs data owned by the instance (it ignores data that would be cleared by cleanup). I don't see any reason why you can't run cleanup before repair. On Sun, Nov 12, 2017

Safe to run cleanup before repair?

2017-11-12 Thread Joel Samuelsson
disk errors. Is it safe to run cleanup before I run the repair or might I lose data because of said inconsistencies?

RE: Re: nodetool cleanup in parallel

2017-09-26 Thread Steinmaurer, Thomas
Side-note: At least with 2.1 (or even later), be aware that you might run into the following issue: https://issues.apache.org/jira/browse/CASSANDRA-11155 We are doing cron-job based hourly snapshots in production and have tried to also run cleanup after extending a cluster from 6 to 9 nodes

Re: nodetool cleanup in parallel

2017-09-26 Thread Peng Xiao
Thanks Kurt. -- Original -- From: "kurt" Date: Wed, Sep 27, 2017 11:57 To: "User" Subject: Re: nodetool cleanup in parallel correct. you can run it in parallel across many nodes if you have capacity. generally see abou

Re: nodetool cleanup in parallel

2017-09-26 Thread kurt greaves
correct. you can run it in parallel across many nodes if you have capacity. generally see about a 10% CPU increase from cleanups which isn't a big deal if you have the capacity to handle it + the io. on that note on later versions you can specify -j to run multiple cleanup compactions a
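For example (the -j flag needs 2.1.14 or later; keyspace name illustrative):

    nodetool cleanup -j 2 my_keyspace   # at most two cleanup compactions at once on this node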

nodetool cleanup in parallel

2017-09-26 Thread Peng Xiao
hi, nodetool cleanup will only remove those keys which no longer belong to those nodes, so theoretically we can run nodetool cleanup in parallel, right? The document suggests we run this one by one, but it's too slow. Thanks, Peng Xiao

RE: Adding nodes and cleanup

2017-06-19 Thread ZAIDI, ASAD A
I think the token ranges that are clean/completed and potentially streamed down to the additional node won’t be cleaned again, so potentially you’ll need to run cleanup once again. You can stop cleanup, add the additional node and start cleanup over again so as to get nodes clean in a single shot

Adding nodes and cleanup

2017-06-19 Thread Mark Furlong
I have added a few nodes and now am running some cleanups. Can I add an additional node while these cleanups are running? What are the ramifications of doing this? Mark Furlong Sr. Database Administrator mfurl...@ancestry.com M: 801-859-7427 O: 801-705-7115 1300 W

Re: Nodetool cleanup doesn't work

2017-05-11 Thread Jai Bheemsen Rao Dhanwada
017 at 12:53 PM, Jai Bheemsen Rao Dhanwada < > jaibheem...@gmail.com> wrote: > >> Yes I have many keyspaces which are not spread across all the data >> centers(expected by design). >> In this case, is this the expected behavior cleanup will not work for all >> the

Re: Nodetool cleanup doesn't work

2017-05-11 Thread Jeff Jirsa
ces which are not spread across all the data > centers(expected by design). > In this case, is this the expected behavior cleanup will not work for all > the keyspaces(nodetool cleanup)? is it going to be fixed in the latest > versions? > > P.S: Thanks for the tip, I can workaroun

Re: Nodetool cleanup doesn't work

2017-05-11 Thread Jai Bheemsen Rao Dhanwada
Yes I have many keyspaces which are not spread across all the data centers (expected by design). In this case, is this the expected behavior, that cleanup will not work for all the keyspaces (nodetool cleanup)? Is it going to be fixed in the latest versions? P.S: Thanks for the tip, I can work around this

Re: Nodetool cleanup doesn't work

2017-05-11 Thread Jeff Jirsa
If you didn't explicitly remove a keyspace from one of your datacenters, the next most likely cause is that you have one keyspace that's NOT replicated to one of the datacenters. You can work around this by running 'nodetool cleanup ' on all of your other keyspaces individual
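A sketch of that per-keyspace workaround (keyspace names illustrative; list only the keyspaces that are replicated to this datacenter):

    for ks in ks_one ks_two ks_three; do
        nodetool cleanup "$ks"
    done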
