Couldn't another reason for doing cleanup sequentially be to avoid data
loss? If data is being streamed from a node during bootstrap and cleanup is
run too soon, couldn't you wind up in a situation with data loss if the new
node being bootstrapped goes down (permanently)?
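
For what it's worth, the one-node-at-a-time approach from the documentation could be scripted roughly like this. It is only a sketch: the host names are placeholders, the helper names are made up for illustration, and it assumes that `nodetool -h <host> cleanup` blocks until cleanup has finished on that node and exits non-zero on failure.

```python
import subprocess
from typing import Callable, Iterable, List


def nodetool_cleanup(host: str) -> int:
    """Run `nodetool -h <host> cleanup` and return its exit code.

    Assumption: the command does not return until cleanup is done on that node.
    """
    return subprocess.run(["nodetool", "-h", host, "cleanup"]).returncode


def cleanup_sequentially(
    hosts: Iterable[str],
    run: Callable[[str], int] = nodetool_cleanup,
) -> List[str]:
    """Clean up one node at a time, stopping at the first failure.

    Returns the list of hosts that completed cleanup, in order.
    """
    done: List[str] = []
    for host in hosts:
        exit_code = run(host)  # wait for this node before touching the next
        if exit_code != 0:
            raise RuntimeError(
                f"cleanup failed on {host} (exit code {exit_code}); stopping"
            )
        done.append(host)
    return done


# Hypothetical usage (host names are placeholders):
# cleanup_sequentially(["cass-node1", "cass-node2", "cass-node3"])
```

Stopping at the first failure is a deliberate choice here, so that a problem on one node is looked at before cleanup touches the next one.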


On Thu, Nov 28, 2013 at 8:59 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> > I hope I get this right :)
>
> Thanks for contributing :)
>
> > a repair will trigger a major compaction on your node which will take up a
> > lot of CPU and IO performance. It needs to do this to build up the data
> > structure that is used for the repair. After the compaction this is
> > streamed to the different nodes in order to repair them.
>
> It does not trigger a major compaction; that’s what we call running
> compaction on the command line, which compacts all SSTables into one big
> one.
>
> It will flush all the data to disk, which will create some additional
> compaction.
>
> The major concern is that it’s a disk IO intensive operation: it reads all
> the data and writes data to new SSTables (a one to one mapping). If you
> have all nodes doing this at the same time there may be some degraded
> performance. And as it’s all nodes it’s not possible for the Dynamic Snitch
> to avoid nodes if they are overloaded.
>
> Cleanup is less intensive than repair, but it’s still a good idea to
> stagger it. If you need to run it on all machines (or you have very
> powerful machines) it’s probably going to be OK.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 26/11/2013, at 5:14 am, Artur Kronenberg <
> artur.kronenb...@openmarket.com> wrote:
>
>  Hi Julien,
>
> I hope I get this right :)
>
> a repair will trigger a major compaction on your node which will take up a
> lot of CPU and IO performance. It needs to do this to build up the data
> structure that is used for the repair. After the compaction this is
> streamed to the different nodes in order to repair them.
>
> If you trigger this on every node simultaneously you basically take the
> performance away from your cluster. I would expect Cassandra still to
> function, just way slower than before. Triggering it node after node will
> leave your cluster with more resources to handle incoming requests.
>
>
> Cheers,
>
> Artur
> On 25/11/13 15:12, Julien Campan wrote:
>
>   Hi,
>
>  I'm working with Cassandra 1.2.2 and I have a question about nodetool
> cleanup.
>  In the documentation, it's written: "Wait for cleanup to complete on
> one node before doing the next".
>
>  I would like to know why we can't run several cleanups at the same
> time.
>
>
>  Thanks
>
>
>
>
>


-- 

- John
