Re: Yet another repair solution

2019-02-20 Thread Marcus Olsson
A small update here.

There is now a 1.0.0 release of our repair scheduler solution
available.

https://github.com/ericsson/ecchronos

There is a binary distribution available at
https://search.maven.org/search?q=g:%22com.ericsson.bss.cassandra.ecchronos%22%20AND%20a:%22ecchronos-binary%22
if you want to try it out.

Best Regards
Marcus Olsson

On tis, 2018-09-11 at 13:35 +, Marcus Olsson wrote:
> Sure thing!
> 
> Up until now it has been running in an OSGi environment, so among
> other
> things I'm working towards both OSGi and a standalone application.
> 
> It's designed to be tightly coupled with a single instance, where it
> keeps track of the repair state and performs repair of tables for
> that
> node only.
> The current features include alarms, "pausing repairs", metrics,
> dynamic scheduling and "pluggability" for each of them (as well as
> some
> other components like connection management, lease management, etc).
> 
> The design is based on CASSANDRA-10070 with Cassandra (and LWT) as a
> default backend for the lease management. It utilizes the repair
> history from Cassandra to determine repair state of tables in order
> to
> prioritize and schedule them. This also means that a manual "nodetool
> repair" would be counted towards the repair state of the tables.
> 
> Best Regards
> Marcus Olsson
> 
> On tor, 2018-08-30 at 07:55 -0700, Dinesh Joshi wrote:
> > 
> > In the meanwhile, do you think you could highlight the features of
> > your repair solution / sidecar?
> > 
> > Dinesh
> > 
> > > 
> > > 
> > > On Aug 30, 2018, at 4:57 AM, Marcus Olsson <marcus.ols...@ericsson.com> wrote:
> > > 
> > > Great to see that there is an interest! As there currently are some
> > > internal dependencies etc. in place there is still some work to be
> > > done before we can publish it. I would expect this to take at least
> > > a few weeks, to try to set the correct expectations.
> > > 
> > > Best Regards
> > > Marcus Olsson
> > > 
> > > On tis, 2018-08-28 at 23:18 -0700, Vinay Chella wrote:
> > > I am excited to see that the community is working on solving the
> > > critical
> > > problems in C* operations (e.g., repair, backups etc.,) with
> > > different
> > > solutions. Of course, learnings from these systems are key to
> > > designing the
> > > robust solution which works for everyone.
> > > 
> > > 
> > > Thanks,
> > > Vinay Chella
> > > 
> > > 
> > > On Tue, Aug 28, 2018 at 1:23 PM Roopa <rtangir...@netflix.com.invalid>
> > > wrote:
> > > 
> > > 
> > > +1 interested in seeing and understanding another repair
> > > solution.
> > > 
> > > 
> > > On Aug 28, 2018, at 1:03 PM, Joseph Lynch <joe.e.ly...@gmail.com> wrote:
> > > 
> > > I'm pretty interested in seeing and understanding your solution!
> > > When we
> > > started on CASSANDRA-14346 reading your design documents and plan
> > > you
> > > sketched out in CASSANDRA-10070 were really helpful in improving
> > > our
> > > design. I'm particularly interested in how the Scheduler/Job/Task
> > > APIs
> > > turned out (we're working on something similar internally and
> > > would
> > > love
> > > to
> > > 
> > > compare notes and figure out the best way to implement that kind
> > > of
> > > abstraction)?
> > > 
> > > -Joey
> > > 
> > > 
> > > On Tue, Aug 28, 2018 at 6:34 AM Marcus Olsson <marcus.ols...@ericsson.com>
> > > wrote:
> > > 
> > > 
> > > Hi,
> > > 
> > > With the risk of stirring the repair/side-car topic even further I'd
> > > just
> > > 
> > > 
> > > like to mention that we have recently gotten approval to
> > > contribute
> > > our
> > > repair management side-car solution.
> > > It's based on the proposal in
> > > https://issues.apache.org/jira/browse/CASSANDRA-10070 as a
> > > standalone
> > > application sitting next to each instance.
> > > With the recent discussions in mind I'd just like to hear the
> > > thoughts
> > > from the community on this before we put in the effort of
> > > bringing
> > > our
> > > solution into open source.
> > > 
> > > Would there be an interest in having yet another repair solution in
> > > the discussion?
> > > 
> > > Best Regards
> > > Marcus Olsson
> > > 

Re: Yet another repair solution

2018-09-11 Thread Marcus Olsson
Sure thing!

Up until now it has been running in an OSGi environment, so among other
things I'm working towards both OSGi and a standalone application.

It's designed to be tightly coupled with a single instance, where it
keeps track of the repair state and performs repair of tables for that
node only.
The current features include alarms, "pausing repairs", metrics,
dynamic scheduling and "pluggability" for each of them (as well as some
other components like connection management, lease management, etc).
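
To make the "pluggability" part a bit more concrete, below is a minimal
sketch of how a scheduler/job/task split could look. The interface names
and signatures are only illustrative and not the actual API, but they
show the rough shape of the abstraction:

  import java.util.Collection;

  // Illustrative only - not the actual API.
  // A task is one unit of work (e.g. repair of one token range for a table).
  interface ScheduledTask
  {
      void execute() throws Exception;
  }

  // A job owns a set of tasks and a priority, so that the scheduler can
  // order jobs (e.g. by how long ago a table was last repaired).
  interface ScheduledJob
  {
      Collection<ScheduledTask> getTasks();
      long getPriority();
  }

  // The scheduler picks the highest-priority job whose resources (leases)
  // can be acquired and runs its tasks one at a time.
  interface Scheduler
  {
      void schedule(ScheduledJob job);
      void deschedule(ScheduledJob job);
  }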

The design is based on CASSANDRA-10070 with Cassandra (and LWT) as a
default backend for the lease management. It utilizes the repair
history from Cassandra to determine repair state of tables in order to
prioritize and schedule them. This also means that a manual "nodetool
repair" would be counted towards the repair state of the tables.

Best Regards
Marcus Olsson

On tor, 2018-08-30 at 07:55 -0700, Dinesh Joshi wrote:
> In the meanwhile, do you think you could highlight the features of
> your repair solution / sidecar?
> 
> Dinesh
> 
> > 
> > On Aug 30, 2018, at 4:57 AM, Marcus Olsson <marcus.ols...@ericsson.com> wrote:
> > 
> > Great to see that there is an interest! As there currently are some
> > internal dependencies etc. in place there is still some work to be
> > done before we can publish it. I would expect this to take at least
> > a few weeks, to try to set the correct expectations.
> > 
> > Best Regards
> > Marcus Olsson
> > 
> > On tis, 2018-08-28 at 23:18 -0700, Vinay Chella wrote:
> > I am excited to see that the community is working on solving the
> > critical
> > problems in C* operations (e.g., repair, backups etc.,) with
> > different
> > solutions. Of course, learnings from these systems are key to
> > designing the
> > robust solution which works for everyone.
> > 
> > 
> > Thanks,
> > Vinay Chella
> > 
> > 
> > On Tue, Aug 28, 2018 at 1:23 PM Roopa <rtangir...@netflix.com.invalid>
> > wrote:
> > 
> > 
> > +1 interested in seeing and understanding another repair solution.
> > 
> > 
> > On Aug 28, 2018, at 1:03 PM, Joseph Lynch <joe.e.ly...@gmail.com> wrote:
> > 
> > I'm pretty interested in seeing and understanding your solution!
> > When we
> > started on CASSANDRA-14346 reading your design documents and plan
> > you
> > sketched out in CASSANDRA-10070 were really helpful in improving
> > our
> > design. I'm particularly interested in how the Scheduler/Job/Task
> > APIs
> > turned out (we're working on something similar internally and would
> > love
> > to
> > 
> > compare notes and figure out the best way to implement that kind of
> > abstraction)?
> > 
> > -Joey
> > 
> > 
> > On Tue, Aug 28, 2018 at 6:34 AM Marcus Olsson <marcus.ols...@ericsson.com>
> > wrote:
> > 
> > 
> > Hi,
> > 
> > With the risk of stirring the repair/side-car topic even further I'd
> > just
> > 
> > 
> > like to mention that we have recently gotten approval to contribute
> > our
> > repair management side-car solution.
> > It's based on the proposal in
> > https://issues.apache.org/jira/browse/CASSANDRA-10070 as a
> > standalone
> > application sitting next to each instance.
> > With the recent discussions in mind I'd just like to hear the
> > thoughts
> > from the community on this before we put in the effort of bringing
> > our
> > solution into open source.
> > 
> > Would there be an interest in having yet another repair solution in
> > the discussion?
> > 
> > Best Regards
> > Marcus Olsson
> > 

Re: Yet another repair solution

2018-08-30 Thread Marcus Olsson
Great to see that there is an interest! As there currently are some internal 
dependencies etc. in place there is still some work to be done before we can 
publish it. I would expect this to take at least a few weeks, to try to set the
correct expectations.

Best Regards
Marcus Olsson

On tis, 2018-08-28 at 23:18 -0700, Vinay Chella wrote:
I am excited to see that the community is working on solving the critical
problems in C* operations (e.g., repair, backups etc.,) with different
solutions. Of course, learnings from these systems are key to designing the
robust solution which works for everyone.


Thanks,
Vinay Chella


On Tue, Aug 28, 2018 at 1:23 PM Roopa <rtangir...@netflix.com.invalid>
wrote:


+1 interested in seeing and understanding another repair solution.


On Aug 28, 2018, at 1:03 PM, Joseph Lynch <joe.e.ly...@gmail.com> wrote:

I'm pretty interested in seeing and understanding your solution! When we
started on CASSANDRA-14346 reading your design documents and plan you
sketched out in CASSANDRA-10070 were really helpful in improving our
design. I'm particularly interested in how the Scheduler/Job/Task APIs
turned out (we're working on something similar internally and would love
to

compare notes and figure out the best way to implement that kind of
abstraction)?

-Joey


On Tue, Aug 28, 2018 at 6:34 AM Marcus Olsson <marcus.ols...@ericsson.com>
wrote:


Hi,

With the risk of stirring the repair/side-car topic even further I'd
just


like to mention that we have recently gotten approval to contribute our
repair management side-car solution.
It's based on the proposal in
https://issues.apache.org/jira/browse/CASSANDRA-10070 as a standalone
application sitting next to each instance.
With the recent discussions in mind I'd just like to hear the thoughts
from the community on this before we put in the effort of bringing our
solution into open source.

Would there be an interest in having yet another repair solution in the
discussion?

Best Regards
Marcus Olsson





Yet another repair solution

2018-08-28 Thread Marcus Olsson
Hi,

With the risk of stirring the repair/side-car topic even further I'd just like 
to mention that we have recently gotten approval to contribute our repair 
management side-car solution.
It's based on the proposal in 
https://issues.apache.org/jira/browse/CASSANDRA-10070 as a standalone 
application sitting next to each instance.
With the recent discussions in mind I'd just like to hear the thoughts from the 
community on this before we put in the effort of bringing our solution into 
open source.

Would there be an interest in having yet another repair solution in the 
discussion?

Best Regards
Marcus Olsson


Re: JIRAs in Review

2018-08-22 Thread Marcus Olsson
Hi,

Hopefully it's not too late to add to this list.

https://issues.apache.org/jira/browse/CASSANDRA-14096 could use some
reviewing. This issue occurs if you have multiple tables in a keyspace
and perform a full repair; you could then end up storing a lot of
MerkleTrees in memory until the repair is completed.

The applied patch is for 3.0.x, but before creating patches for 4.0 I
thought it might be best to agree on the way forward (two solution
proposals have been provided).

Best Regards
Marcus Olsson

On fre, 2018-07-20 at 15:42 +1000, kurt greaves wrote:
> Cheers Dinesh, not too worried about that one in 4.0's case though as
> it's
> a bug but it will need a committer.
> 
> As for improvements for 4.0:
> https://issues.apache.org/jira/browse/CASSANDRA-13010 - More verbose
> nodetool compactionstats. Speaks for itself.
> https://issues.apache.org/jira/browse/CASSANDRA-10023 - Metrics for
> number
> of local reads/writes. For detecting when you're choosing wrong
> coordinators. Useful for those of us without access to clients.
> https://issues.apache.org/jira/browse/CASSANDRA-10789 - nodetool
> blacklist
> command to stop bad clients. Would be great for the sysadmin toolbox.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-13841 - Smarter
> nodetool
> rebuild. Kind of a bug but would be nice to get it in 4.0 *at least*.
> (I
> probably need to rebase this)
> https://issues.apache.org/jira/browse/CASSANDRA-14309 - Hint window
> persistence. Would be nice to get some thoughts on this.
> https://issues.apache.org/jira/browse/CASSANDRA-12783 - Batchlog
> refactor
> to better support MV's. Been meaning to get back to this one, but
> it's
> pretty much there except needs rebase and a bit more testing. Someone
> else
> to go over it and see if it makes sense would be useful.
> 
> May have traction? but worth keeping an eye on.
> https://issues.apache.org/jira/browse/CASSANDRA-14291 - Nodetool
> command to
> regenerate SSTable components. Mostly important for efficient
> summary/bloomfilter regeneration which doesn't exist apart from using
> upgradesstables. Other than that it's effectively upgradesstables but
> with
> a cleaner interface. Chris has started looking at this but would
> probably
> be nice to make sure it gets in before 4.0 seeing as we have no way
> to
> regenerate bloomfilter/summary without re-writing the entire SSTable
> ATM.
> 
> Other than that hoping to get
> https://issues.apache.org/jira/browse/CASSANDRA-10540 (RangeAwareCS)
> in. On
> Markus' plate ATM but I'm fairly sure its been decently reviewed.
> 
> On 19 July 2018 at 10:07, dinesh.jo...@yahoo.com.INVALID <
> dinesh.jo...@yahoo.com.invalid> wrote:
> 
> > 
> > Kurt was looking at some help with this ticket -
> > https://issues.apache.org/jira/browse/CASSANDRA-14525
> > Dinesh
> > 
> > On Tuesday, July 17, 2018, 12:35:25 PM PDT, sankalp kohli <
> > kohlisank...@gmail.com> wrote:
> > 
> >  Hi,
> > We are 7 weeks away from 4.0 freeze and there are ~150 JIRAs
> > waiting
> > for review. It is hard to know which ones should be prioritized as
> > some of
> > them could be not valid (fixes a 2.0 bug), some of them will have the
> > assignee
> > who no longer is active, etc.
> > 
> > If anyone is *not* getting traction on the JIRA to get it reviewed,
> > please
> > use this thread to send your JIRA number and optionally why it is
> > important.
> > 
> > Thanks,
> > Sankalp
> > 
> > 



Re: STCS in L0 behaviour

2016-12-02 Thread Marcus Olsson

Hi,

In reply to Dikang Gu:
For the run where we incorporated the change from CASSANDRA-11571 the 
stack trace was like this (from JMC):

Stack Trace                                                                                       Sample Count  Percentage (%)
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(int)                   229          11.983
- org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates()                            228          11.931
-- org.apache.cassandra.db.compaction.LeveledManifest.getCandidatesFor(int)                               221          11.565
--- org.apache.cassandra.db.compaction.LeveledManifest.overlappingWithBounds(SSTableReader, Map)          201          10.518
---- org.apache.cassandra.db.compaction.LeveledManifest.overlappingWithBounds(Token, Token, Map)          201          10.518
----- org.apache.cassandra.dht.Bounds.intersects(Bounds)                                                  141           7.378
----- java.util.HashSet.add(Object)                                                                        56           2.93


This is for one of the compaction executors during an interval of 1 
minute and 24 seconds, but we saw similar behavior for other compaction 
threads as well. The full flight recording was 10 minutes and was 
started at the same time as the repair. The interval was taken from the 
end of the recording where the number of sstables had increased. During 
this interval this compaction thread used ~10% of the total CPU.


I agree that optimally there shouldn't be many sstables in L0, and 
except for when repair is running we don't have that many.


---

In reply to Jeff Jirsa/Nate McCall:
I might have been unclear about the compaction order in my first email; 
I meant to say that there is a check for STCS right before L1+, but only 
if an L1+ compaction is possible. We used version 2.2.7 for the test run, 
so https://issues.apache.org/jira/browse/CASSANDRA-10979 should be 
included and should have reduced some of the backlog in L0.


Correct me if I'm wrong, but my interpretation of the scenario that 
Sylvain describes in 
https://issues.apache.org/jira/browse/CASSANDRA-5371 is that it occurs 
when you either almost constantly have 32+ SSTables in L0 or are close 
to it. My guess is that this could also apply to having constant load 
during a certain timespan. So when you get more than 32 sstables you 
start to do STCS, which in turn creates larger sstables that might span 
the whole of L1. Then, when these sstables should be promoted to L1, the 
whole of L1 gets re-written, which creates a larger backlog in L0. So 
the number of sstables keeps rising and triggers an STCS again, 
completing the circle. Based on this interpretation it seems to me that 
if the write pattern into L0 is "random" this might happen regardless of 
whether an STCS compaction has occurred or not.


If my interpretation is correct it might be better to choose a higher 
number of sstables before STCS starts in L0 and make it configurable. 
With reduced complexity it could look something like this (a rough 
sketch in code follows the list):

1. Perform STCS in L0 if we have above X(1000?) sstables in L0.
2. Check L1+
3. Check for L0->L1
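
Something like the following (simplified pseudo-Java, not the actual 
LeveledManifest code; the threshold name and value are just placeholders):

  // Simplified illustration of the proposed ordering above.
  class SimplifiedLcsSelection
  {
      // Hypothetical configurable threshold (step 1 above).
      static final int L0_STCS_FALLBACK_THRESHOLD = 1000;

      Runnable getNextBackgroundTask(int l0SstableCount)
      {
          // 1. If L0 is far behind, do STCS in L0 right away and skip the
          //    expensive L0 overlap check entirely.
          if (l0SstableCount > L0_STCS_FALLBACK_THRESHOLD)
              return stcsInL0();

          // 2. Check L1+ as today (which may still fall back to STCS in L0
          //    when L0 has more than 32 sstables).
          Runnable higherLevel = checkHigherLevels();
          if (higherLevel != null)
              return higherLevel;

          // 3. Finally look for an L0 -> L1 promotion (this is where the
          //    overlapping-sstable check happens) or STCS in L0.
          return l0ToL1OrStcs();
      }

      Runnable stcsInL0() { return () -> {}; }
      Runnable checkHigherLevels() { return null; }
      Runnable l0ToL1OrStcs() { return () -> {}; }
  }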

It should be possible to keep the current logic as well and only add a 
configurable check before (step 1) to avoid the overlapping check with 
larger backlogs. Another alternative might be 
https://issues.apache.org/jira/browse/CASSANDRA-7409 and allow 
overlapping sstables in more levels than L0. If it can quickly push 
sorted data to L1 it might remove the need for STCS in LCS. The 
previously mentioned potential cost of the overlapping check would still 
be there if we have a large backlog, but the approach might reduce the 
risk of getting into the situation. I'll try to get some time to run a 
test with CASSANDRA-7409 in our test cluster.


BR
Marcus O

On 11/28/2016 06:48 PM, Eric Evans wrote:

On Sat, Nov 26, 2016 at 6:30 PM, Dikang Gu  wrote:

Hi Marcus,

Do you have some stack trace to show that which function in the `
getNextBackgroundTask` is most expensive?

Yeah, I think having 15-20K sstables in L0 is very bad, in our heavy-write
cluster, I try my best to reduce the impact of repair, and keep number of
sstables in L0 < 100.

Thanks
Dikang.

On Thu, Nov 24, 2016 at 12:53 PM, Nate McCall  wrote:


The reason is described here:

https://issues.apache.org/jira/browse/CASSANDRA-5371?focusedCommentId=13621679&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13621679

/Marcus

"...a lot of the work you've done you will redo when you compact your now
bigger L0 sstable against L1."

^ Sylvain's hypothesis (next comment down) is actually something we see
occasionally in practice: having to re-write the contents of L1 too often
when large L0 SSTables are pulled in. Here is an example we took on a
system with pending compaction spikes that was seeing this specific issue
with four LCS-based tables:

https://gist.github.com/zznate/d22812551fa7a527d4c0d931f107c950

The significant part of this particular workload is a burst of heavy writes
from long-duration scheduled jobs.



--
Dikang






STCS in L0 behaviour

2016-11-23 Thread Marcus Olsson

Hi everyone,

TL;DR
Should LCS be changed to always prefer an STCS compaction in L0 if it's 
falling behind? Assuming that STCS in L0 is enabled.
Currently LCS seems to check if there is a possible L0->L1 compaction 
before checking if it's falling behind, which in our case used between 
15-30% of the compaction thread CPU.

TL;DR

So first some background:
We have an Apache Cassandra 2.2 cluster running with a high load. In that 
cluster there is a table with a moderate amount of writes per second 
that is using LeveledCompactionStrategy. The test was to run repair on 
that table while we monitored the cluster through JMC and with Flight 
Recordings enabled. This resulted in a large number of sstables for that 
table, which I assume others have experienced as well. In this case I 
think it was between 15-20k.


From the Flight Recording one thing we saw was that 15-30% of the CPU 
time in each of the compaction threads was spent on 
"getNextBackgroundTask()", which retrieves the next compaction job. With 
some further investigation this seems to mostly be when it's checking 
for overlap in L0 sstables before performing an L0->L1 compaction. There 
is a JIRA which seems to be related to this, 
https://issues.apache.org/jira/browse/CASSANDRA-11571, which we 
backported to 2.2 and tested. In our testing it seemed to improve the 
situation, but it was still using noticeable CPU.
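
To give a feeling for why the candidate search gets so expensive with a 
large L0, here is a back-of-the-envelope illustration (my own 
simplification, assuming the overlap check is roughly pairwise over the 
L0 sstables in the worst case):

  public class OverlapCheckCost
  {
      public static void main(String[] args)
      {
          // If each L0 candidate is checked for overlap against the other
          // L0 sstables, the number of bounds-intersection checks grows
          // quadratically with the size of L0.
          for (int sstables : new int[] { 100, 1_000, 15_000 })
          {
              long comparisons = (long) sstables * (sstables - 1) / 2;
              System.out.printf("%,d sstables in L0 -> ~%,d bounds comparisons per search%n",
                                sstables, comparisons);
          }
      }
  }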


My interpretation of the current logic of LCS is (if STCS in L0 is enabled):
1. Check each level (L1+)
 - If an L1+ compaction is needed, check if L0 is behind and do STCS if 
that's the case; otherwise do the L1+ compaction.
2. Check for L0 -> L1 compactions, and if none is needed/possible check for 
STCS in L0.


My proposal is to change this behavior to always check if L0 is far 
behind first and do an STCS compaction in that case. This would avoid the 
overlap check for L0 -> L1 compactions when L0 is behind, and I think it 
makes sense since we already prefer STCS to L1+ compactions. This would 
not solve the repair situation, but it would lower some of the impact 
that repair has on LCS.


As for which version this could go into, I think trunk would be enough since 
compaction is pluggable.






Re: Automatic scheduling execution of repair

2015-08-14 Thread Marcus Olsson

Added a JIRA for this where it can be discussed further:
https://issues.apache.org/jira/browse/CASSANDRA-10070

BR
Marcus Olsson

On 08/13/2015 05:15 PM, Jonathan Ellis wrote:

Now that we have LWT I think it could be self-coordinated.

On Thu, Aug 13, 2015 at 6:45 AM, Marcus Olsson marcus.ols...@ericsson.com
wrote:


Hi,

Scheduling and running repairs in a Cassandra cluster is most often a
required task, but this can be hard for new users and it also requires
a bit of manual configuration. There are good tools out there that can be
used to simplify things, but wouldn't this be a good feature to have inside
of Cassandra? To automatically schedule and run repairs, so that when you
start up your cluster it basically maintains itself in terms of normal
anti-entropy, with the possibility for manual configuration.

BR
Marcus Olsson








Automatic scheduling execution of repair

2015-08-13 Thread Marcus Olsson

Hi,

Scheduling and running repairs in a Cassandra cluster is most often a 
required task, but this can be hard for new users and it also 
requires a bit of manual configuration. There are good tools out there 
that can be used to simplify things, but wouldn't this be a good feature 
to have inside of Cassandra? To automatically schedule and run repairs, 
so that when you start up your cluster it basically maintains itself in 
terms of normal anti-entropy, with the possibility for manual configuration.


BR
Marcus Olsson


Re: Problem with upgrade to 2.1.3

2015-02-20 Thread Marcus Olsson
We encountered the same problem; I put in a comment on the JIRA that 
caused it:


https://issues.apache.org/jira/browse/CASSANDRA-8677

On 02/20/2015 09:38 AM, Jan Kesten wrote:

Hi,
I put this into a JIRA issue: CASSANDRA-8839
https://issues.apache.org/jira/browse/CASSANDRA-8839