Re: cassandra repair takes ages

2018-04-23 Thread Nuno Cervaens - Hoist Group - Portugal
Hi Carlos,

OK, thanks for the feedback and the URL; it's pretty clear now.

cheers,
nuno

On Dom, 2018-04-22 at 16:13 +0100, Carlos Rolo wrote:
> Hello,
> 
> I was just pointing out that that applies if you use QUORUM (or, in
> fact, ALL); since you're running ONE, this is a non-issue.
> 
> Regarding incremental repairs you can read here:
> http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
> 
> You can't run repair -pr simultaneously on every node. You can try to
> use a tool like Reaper to better manage and schedule repairs, but I
> doubt it will speed things up a lot.
> 
> Regards,
> 
> Carlos Juzarte Rolo
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>  
> Pythian - Love your data
> 
> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> linkedin.com/in/carlosjuzarterolo 
> Mobile: +351 918 918 100 
> www.pythian.com
> 
> On Sun, Apr 22, 2018 at 11:39 AM, Nuno Cervaens - Hoist Group -
> Portugal <nuno.cerva...@hoistgroup.com> wrote:
> > Hi Carlos,
> > 
> > Thanks for the reply.
> > Isn't the consistency level defined per session? All my sessions,
> > whether for reads or writes, default to ONE.
> > 
> > Moving to SSD is for sure an obvious improvement, but not possible
> > at the moment.
> > My goal is to really spend the lowest time possible on running a
> > repair throughout all the nodes.
> > Are there any more downsides to running nodetool repair -pr
> > simultaneously on each node, besides the CPU and memory overload?
> > Also, could someone clarify whether an incremental repair is safe?
> > 
> > thanks,
> > nuno
> > From: Carlos Rolo <r...@pythian.com>
> > Sent: Friday, April 20, 2018 4:55:21 PM
> > To: user@cassandra.apache.org
> > Subject: Re: cassandra repair takes ages
> >  
> > Changing the data drives to SSD would help speed up the repairs.
> > 
> > Also, don't run 3 nodes with RF 2. That makes QUORUM = ALL.
> > 
> > Regards,
> > 
> > Carlos Juzarte Rolo
> > Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
> >  
> > Pythian - Love your data
> > 
> > rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> > linkedin.com/in/carlosjuzarterolo 
> > Mobile: +351 918 918 100 
> > www.pythian.com
> > 
> > On Fri, Apr 20, 2018 at 4:42 PM, Nuno Cervaens - Hoist Group -
> > Portugal <nuno.cerva...@hoistgroup.com> wrote:
> > Hello,
> > 
> > I have a 3 node cluster with RF 2 and using STCS. I use SSDs for
> > commitlogs and HDDs for data. Apache Cassandra version is 3.11.2.
> > I basically have a huge keyspace ('newts' from opennms) and a big
> > keyspace ('opspanel'). Here's a summary of the 'du' output for one
> > node (which is more or less the same for each node):
> > 
> > 51G  ./data/opspanel
> > 776G ./data/newts/samples-00ae9420ea0711e5a39bbd7839a19930
> > 776G ./data/newts
> > 
> > My issue is that running a 'nodetool repair -pr' takes a day and a
> > half per node, and as I want to store daily snapshots (for the past
> > 7 days), I don't see how I can do this as repairs take too long.
> > For example, I see huge compactions and validations that take many
> > hours (compactionstats taken at different times):
> > 
> > id                                   compaction type              keyspace  table    completed     total         unit   progress
> > 7125eb20-446b-11e8-a57d-f36e88375e31 Compaction                   newts     samples  294177987449  835153786347  bytes  35,22%
> > 
> > id                                   compaction type              keyspace  table    completed     total         unit   progress
> > 6aa5ce51-4425-11e8-a7c1-572dede7e4d6 Anticompaction after repair  newts     samples  581839334815  599408876344  bytes  97,07%
> > 
> > id                                   compaction type              keyspace  table    completed     total         unit   progress
> > 69976700-43e2-11e8-a7c1-572dede7e4d6 Validation                   newts     samples  63249761990   826302170493  bytes  7,65%
> > 69973ff0-43e2-11e8-a7c1-572dede7e4d6 Validation                   newts     samples  102513762816  826302170600  bytes  12,41%
> > 
> > Is there something I can do to improve the situation?
> > 
> > Also, is an incremental repair (apparently nodetool's default) safe?
> > I see in the DataStax documentation that incremental repair should
> > not be used, only full repair. Can you please clarify?
> > 
> > Thanks for the feedback.
> > Nuno
> > 
> > 

Re: cassandra repair takes ages

2018-04-22 Thread Carlos Rolo
Hello,

I was just pointing out that that applies if you use QUORUM (or, in fact,
ALL); since you're running ONE, this is a non-issue.

Regarding incremental repairs you can read here:
http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
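
As a sketch, if you decide to stick with full repairs, you can ask for them
explicitly instead of relying on the 3.11 default (keyspace name taken from
your mail):

  # full (non-incremental) repair of this node's primary ranges only
  nodetool repair -full -pr newts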

You can't run repair -pr simultaneously on every node. You can try to use a
tool like Reaper to better manage and schedule repairs, but I doubt it will
speed things up a lot.
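
If you want to sequence the -pr runs by hand instead of using Reaper, a
minimal sketch (assuming passwordless ssh, and node1..node3 are placeholders
for your actual hostnames) would be something like:

  # placeholder hostnames; run the primary-range repairs one node after another
  for host in node1 node2 node3; do
      ssh "$host" nodetool repair -full -pr newts
  done

Reaper does essentially this kind of sequencing for you, plus splitting the
work into token subranges and retrying failed segments.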

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Sun, Apr 22, 2018 at 11:39 AM, Nuno Cervaens - Hoist Group - Portugal <
nuno.cerva...@hoistgroup.com> wrote:

> Hi Carlos,
>
>
> Thanks for the reply.
>
> Isn't the consistency level defined per session? All my sessions, whether
> for reads or writes, default to ONE.
>
>
> Moving to SSD is for sure an obvious improvement, but not possible at the
> moment.
>
> My goal is to really spend the lowest time possible on running a repair
> throughout all the nodes.
>
> Are there any more downsides to running nodetool repair -pr simultaneously
> on each node, besides the CPU and memory overload?
>
> Also, could someone clarify whether an incremental repair is safe?
>
>
> thanks,
>
> nuno
> --
> *From:* Carlos Rolo <r...@pythian.com>
> *Sent:* Friday, April 20, 2018 4:55:21 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: cassandra repair takes ages
>
> Changing the data drives to SSD would help speed up the repairs.
>
> Also, don't run 3 nodes with RF 2. That makes QUORUM = ALL.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant / Datastax Certified Architect / Cassandra MVP
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
> linkedin.com/in/carlosjuzarterolo
> Mobile: +351 918 918 100
> www.pythian.com
>
> On Fri, Apr 20, 2018 at 4:42 PM, Nuno Cervaens - Hoist Group - Portugal <
> nuno.cerva...@hoistgroup.com> wrote:
>
> Hello,
>
> I have a 3 node cluster with RF 2 and using STCS. I use SSDs for
> commitlogs and HDDs for data. Apache Cassandra version is 3.11.2.
> I basically have a huge keyspace ('newts' from opennms) and a big keyspace
> ('opspanel'). Here's a summary of the 'du' output for one node (which is
> more or less the same for each node):
>
> 51G ./data/opspanel
> 776G ./data/newts/samples-00ae9420ea0711e5a39bbd7839a19930
> 776G ./data/newts
>
> My issue is that running a 'nodetool repair -pr' takes a day and a half
> per node, and as I want to store daily snapshots (for the past 7 days), I
> don't see how I can do this as repairs take too long.
> For example, I see huge compactions and validations that take many hours
> (compactionstats taken at different times):
>
> id                                   compaction type              keyspace  table    completed     total         unit   progress
> 7125eb20-446b-11e8-a57d-f36e88375e31 Compaction                   newts     samples  294177987449  835153786347  bytes  35,22%
> 
> id                                   compaction type              keyspace  table    completed     total         unit   progress
> 6aa5ce51-4425-11e8-a7c1-572dede7e4d6 Anticompaction after repair  newts     samples  581839334815  599408876344  bytes  97,07%
> 
> id                                   compaction type              keyspace  table    completed     total         unit   progress
> 69976700-43e2-11e8-a7c1-572dede7e4d6 Validation                   newts     samples  63249761990   826302170493  bytes  7,65%
> 69973ff0-43e2-11e8-a7c1-572dede7e4d6 Validation                   newts     samples  102513762816  826302170600  bytes  12,41%
>
> Is there something I can do to improve the situation?
>
> Also, is an incremental repair (apparently nodetool's default) safe? I see
> in the DataStax documentation that incremental repair should not be used,
> only full repair. Can you please clarify?
>
> Thanks for the feedback.
> Nuno
>
>
>

Re: cassandra repair takes ages

2018-04-22 Thread Nuno Cervaens - Hoist Group - Portugal
Hi Carlos,


Thanks for the reply.

Isn't the consistency level defined per session? All my sessions, whether for
reads or writes, default to ONE.


Moving to SSD is for sure an obvious improvement, but not possible at the moment.

My goal is to really spend the lowest time possible on running a repair 
throughout all the nodes.

Are there any more downsides to running nodetool repair -pr simultaneously on
each node, besides the CPU and memory overload?

Also, could someone clarify whether an incremental repair is safe?


thanks,

nuno


From: Carlos Rolo <r...@pythian.com>
Sent: Friday, April 20, 2018 4:55:21 PM
To: user@cassandra.apache.org
Subject: Re: cassandra repair takes ages

Changing the data drives to SSD would help speed up the repairs.

Also, don't run 3 nodes with RF 2. That makes QUORUM = ALL.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Fri, Apr 20, 2018 at 4:42 PM, Nuno Cervaens - Hoist Group - Portugal
<nuno.cerva...@hoistgroup.com> wrote:
Hello,

I have a 3 node cluster with RF 2 and using STCS. I use SSDs for commitlogs and 
HDDs for data. Apache Cassandra version is 3.11.2.
I basically have a huge keyspace ('newts' from opennms) and a big keyspace 
('opspanel'). Here's a summary of the 'du' output for one node (which is more 
or less the same for each node):

51G ./data/opspanel
776G ./data/newts/samples-00ae9420ea0711e5a39bbd7839a19930
776G ./data/newts

My issue is that running a 'nodetool repair -pr' takes a day and a half per
node, and as I want to store daily snapshots (for the past 7 days), I don't
see how I can do this as repairs take too long.
For example, I see huge compactions and validations that take many hours
(compactionstats taken at different times):

id                                   compaction type              keyspace  table    completed     total         unit   progress
7125eb20-446b-11e8-a57d-f36e88375e31 Compaction                   newts     samples  294177987449  835153786347  bytes  35,22%

id                                   compaction type              keyspace  table    completed     total         unit   progress
6aa5ce51-4425-11e8-a7c1-572dede7e4d6 Anticompaction after repair  newts     samples  581839334815  599408876344  bytes  97,07%

id                                   compaction type              keyspace  table    completed     total         unit   progress
69976700-43e2-11e8-a7c1-572dede7e4d6 Validation                   newts     samples  63249761990   826302170493  bytes  7,65%
69973ff0-43e2-11e8-a7c1-572dede7e4d6 Validation                   newts     samples  102513762816  826302170600  bytes  12,41%

Is there something I can do to improve the situation?

Also, is an incremental repair (apparently nodetool's default) safe? I see in
the DataStax documentation that incremental repair should not be used, only
full repair. Can you please clarify?

Thanks for the feedback.
Nuno




Re: cassandra repair takes ages

2018-04-20 Thread Carlos Rolo
Changing the data drives to SSD would help speed up the repairs.

Also, don't run 3 nodes with RF 2. That makes QUORUM = ALL.
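
(For reference: quorum = floor(RF/2) + 1, so with RF 2 that is
floor(2/2) + 1 = 2, i.e. both replicas, which is why QUORUM behaves like ALL
here and cannot tolerate a single replica being down.)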

Regards,

Carlos Juzarte Rolo
Cassandra Consultant / Datastax Certified Architect / Cassandra MVP

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Skype: cjr2k3 | Linkedin:
linkedin.com/in/carlosjuzarterolo
Mobile: +351 918 918 100
www.pythian.com

On Fri, Apr 20, 2018 at 4:42 PM, Nuno Cervaens - Hoist Group - Portugal <
nuno.cerva...@hoistgroup.com> wrote:

> Hello,
>
> I have a 3 node cluster with RF 2 and using STCS. I use SSDs for
> commitlogs and HDDs for data. Apache Cassandra version is 3.11.2.
> I basically have a huge keyspace ('newts' from opennms) and a big keyspace
> ('opspanel'). Here's a summary of the 'du' output for one node (which is
> more or less the same for each node):
>
> 51G ./data/opspanel
> 776G ./data/newts/samples-00ae9420ea0711e5a39bbd7839a19930
> 776G ./data/newts
>
> My issue is that running a 'nodetool repair -pr' takes a day and a half
> per node, and as I want to store daily snapshots (for the past 7 days), I
> don't see how I can do this as repairs take too long.
> For example, I see huge compactions and validations that take many hours
> (compactionstats taken at different times):
>
> id                                   compaction type              keyspace  table    completed     total         unit   progress
> 7125eb20-446b-11e8-a57d-f36e88375e31 Compaction                   newts     samples  294177987449  835153786347  bytes  35,22%
> 
> id                                   compaction type              keyspace  table    completed     total         unit   progress
> 6aa5ce51-4425-11e8-a7c1-572dede7e4d6 Anticompaction after repair  newts     samples  581839334815  599408876344  bytes  97,07%
> 
> id                                   compaction type              keyspace  table    completed     total         unit   progress
> 69976700-43e2-11e8-a7c1-572dede7e4d6 Validation                   newts     samples  63249761990   826302170493  bytes  7,65%
> 69973ff0-43e2-11e8-a7c1-572dede7e4d6 Validation                   newts     samples  102513762816  826302170600  bytes  12,41%
>
> Is there something I can do to improve the situation?
>
> Also, is an incremental repair (apparently nodetool's default) safe? I see
> in the DataStax documentation that incremental repair should not be used,
> only full repair. Can you please clarify?
>
> Thanks for the feedback.
> Nuno
>


cassandra repair takes ages

2018-04-20 Thread Nuno Cervaens - Hoist Group - Portugal
Hello,

I have a 3 node cluster with RF 2 and using STCS. I use SSDs for commitlogs and 
HDDs for data. Apache Cassandra version is 3.11.2.
I basically have a huge keyspace ('newts' from opennms) and a big keyspace 
('opspanel'). Here's a summary of the 'du' output for one node (which is more 
or less the same for each node):

51G ./data/opspanel
776G ./data/newts/samples-00ae9420ea0711e5a39bbd7839a19930
776G ./data/newts

My issue is that running a 'nodetool repair -pr' takes a day and a half per
node, and as I want to store daily snapshots (for the past 7 days), I don't
see how I can do this as repairs take too long.
For example, I see huge compactions and validations that take many hours
(compactionstats taken at different times):

id                                   compaction type              keyspace  table    completed     total         unit   progress
7125eb20-446b-11e8-a57d-f36e88375e31 Compaction                   newts     samples  294177987449  835153786347  bytes  35,22%

id                                   compaction type              keyspace  table    completed     total         unit   progress
6aa5ce51-4425-11e8-a7c1-572dede7e4d6 Anticompaction after repair  newts     samples  581839334815  599408876344  bytes  97,07%

id                                   compaction type              keyspace  table    completed     total         unit   progress
69976700-43e2-11e8-a7c1-572dede7e4d6 Validation                   newts     samples  63249761990   826302170493  bytes  7,65%
69973ff0-43e2-11e8-a7c1-572dede7e4d6 Validation                   newts     samples  102513762816  826302170600  bytes  12,41%

Is there something I can do to improve the situation?

Also, is an incremental repair (apparently nodetool's default) safe? I see in
the DataStax documentation that incremental repair should not be used, only
full repair. Can you please clarify?

Thanks for the feedback.
Nuno