Re: Repair Management

2017-05-18 Thread Cameron Zemek
Here is what I have done so far:
https://github.com/apache/cassandra/compare/trunk...instaclustr:repair_management

> I'm not sure what you mean by "coordinator repair commands". Do you mean
> full repairs?

By coordinator repair I meant the repair command issued from the coordinator
node, that is, the repair command from StorageService::repairAsync. Hopefully
the branch above shows what I mean.





On 19 May 2017 at 03:16, Blake Eggleston  wrote:

>> I am looking to improve monitoring and management of repairs (so far I
>> have a patch for adding ActiveRepairs to table/keyspace metrics) and came
>> across ActiveRepairServiceMBean, but this appears to be limited to
>> incremental repairs. Is there a reason for this?
>
> The incremental repair stuff was just the first set of jmx controls added
> to ActiveRepairService. ActiveRepairService is involved in all repairs
> though.
>
>> I was looking to add something very similar to this nodetool repair_admin
>> but it would work on co-ordinator repair commands.
>
> I'm not sure what you mean by "coordinator repair commands". Do you mean
> full repairs?
>
>> What is the purpose of the current repair_admin? If I wish to add the
>> above, should I rename the MBean to, say,
>> org.apache.cassandra.db:type=IncrementalRepairService and the nodetool
>> command to inc_repair_admin?
>
> nodetool help repair_admin says its purpose is to "list and fail
> incremental repair sessions". However, failing an incremental repair
> session doesn't cancel the validation/sync; it just releases the sstables
> that were involved in the repair back into the unrepaired data set. I
> don't see any reason why you couldn't add this functionality to the
> existing RepairService mbean. That said, before getting into mbean names,
> it's probably best to come up with a plan for cancelling validation and
> sync on each of the replicas involved in a given repair. As far as I know
> (though I may be wrong), that's not currently supported.
>
> On May 17, 2017 at 7:36:51 PM, Cameron Zemek (came...@instaclustr.com)
> wrote:
>
>> I am looking to improve monitoring and management of repairs (so far I
>> have a patch for adding ActiveRepairs to table/keyspace metrics) and came
>> across ActiveRepairServiceMBean, but this appears to be limited to
>> incremental repairs. Is there a reason for this?
>>
>> I was looking to add something very similar to this nodetool repair_admin
>> but it would work on co-ordinator repair commands.
>>
>> For example:
>> $ nodetool repair_admin --list
>> Repair#1 mykeyspace columnFamilies=colfamilya,colfamilyb; incremental=True; parallelism=parallel progress=5%
>>
>> $ nodetool repair_admin --terminate 1
>> Terminating repair command #1 (19f00c30-1390-11e7-bb50-ffb920a6d70f)
>>
>> $ nodetool repair_admin --terminate-all # calls ssProxy.forceTerminateAllRepairSessions()
>> Terminating all repair sessions
>> Terminated repair command #2 (64c44230-21aa-11e7-9ede-cd6eb64e3786)
>>
>> What is the purpose of the current repair_admin? If I wish to add the
>> above, should I rename the MBean to, say,
>> org.apache.cassandra.db:type=IncrementalRepairService and the nodetool
>> command to inc_repair_admin?
>
>


Re: Integrating vendor-specific code and developing plugins

2017-05-18 Thread Jeff Jirsa
On Thu, May 18, 2017 at 10:28 AM, Jeff Jirsa  wrote:

>
>
> On Mon, May 15, 2017 at 5:25 PM, Jeremiah D Jordan <
> jeremiah.jor...@gmail.com> wrote:
>
>>
>>
>> To me testable means that we can run the tests at the very least for
>> every release, but ideally they would be run more often than that.
>> Especially with the push to not release unless the test board is all
>> passing, we should not be releasing features that we don’t have a test
>> board for.  Ideally that means we have it in ASF CI.  If there is someone
>> that can commit to posting results of runs from an outside CI somewhere,
>> then I think that could work as well, but that gets pretty cumbersome if we
>> have to check 10 different CI dashboards at different locations before
>> every release.
>>
>
>
> It turns out there's a ppc64le jenkins slave @ asf, so I've set up
> https://builds.apache.org/view/A-D/view/Cassandra/job/cassandra-devbranch-ppc64le-testall/
> for testing.
>
> Like our other devbranch-testall builds, it takes a repo+branch as
> parameters, and runs unit tests. While the unit tests aren't passing, this
> platform should now be considered testable.
>
>
(Platform != device, though: the CAPI device obviously isn't there, so the
row cache implementation still doesn't have public testing.)


Re: Integrating vendor-specific code and developing plugins

2017-05-18 Thread Michael Kjellman
That’s epic Jeff. Very cool.

Sent from my iPhone

On May 18, 2017, at 10:28 AM, Jeff Jirsa wrote:

> On Mon, May 15, 2017 at 5:25 PM, Jeremiah D Jordan <
> jeremiah.jor...@gmail.com> wrote:
>
>> To me testable means that we can run the tests at the very least for every
>> release, but ideally they would be run more often than that.  Especially
>> with the push to not release unless the test board is all passing, we
>> should not be releasing features that we don’t have a test board for.
>> Ideally that means we have it in ASF CI.  If there is someone that can
>> commit to posting results of runs from an outside CI somewhere, then I
>> think that could work as well, but that gets pretty cumbersome if we have
>> to check 10 different CI dashboards at different locations before every
>> release.
>
> It turns out there's a ppc64le jenkins slave @ asf, so I've set up
> https://builds.apache.org/view/A-D/view/Cassandra/job/cassandra-devbranch-ppc64le-testall/
> for testing.
>
> Like our other devbranch-testall builds, it takes a repo+branch as
> parameters, and runs unit tests. While the unit tests aren't passing, this
> platform should now be considered testable.


Re: Integrating vendor-specific code and developing plugins

2017-05-18 Thread Jeff Jirsa
On Mon, May 15, 2017 at 5:25 PM, Jeremiah D Jordan <
jeremiah.jor...@gmail.com> wrote:

>
>
> To me testable means that we can run the tests at the very least for every
> release, but ideally they would be run more often than that.  Especially
> with the push to not release unless the test board is all passing, we
> should not be releasing features that we don’t have a test board for.
> Ideally that means we have it in ASF CI.  If there is someone that can
> commit to posting results of runs from an outside CI somewhere, then I
> think that could work as well, but that gets pretty cumbersome if we have
> to check 10 different CI dashboards at different locations before every
> release.
>


It turns out there's a ppc64le jenkins slave @ asf, so I've set up
https://builds.apache.org/view/A-D/view/Cassandra/job/cassandra-devbranch-ppc64le-testall/
for testing.

Like our other devbranch-testall builds, it takes a repo+branch as
parameters, and runs unit tests. While the unit tests aren't passing, this
platform should now be considered testable.


Re: Repair Management

2017-05-18 Thread Blake Eggleston
> I am looking to improve monitoring and management of repairs (so far I have
> a patch for adding ActiveRepairs to table/keyspace metrics) and came across
> ActiveRepairServiceMBean, but this appears to be limited to incremental
> repairs. Is there a reason for this?

The incremental repair stuff was just the first set of jmx controls added to
ActiveRepairService. ActiveRepairService is involved in all repairs though.

> I was looking to add something very similar to this nodetool repair_admin
> but it would work on co-ordinator repair commands.

I'm not sure what you mean by "coordinator repair commands". Do you mean full
repairs?

> What is the purpose of the current repair_admin? If I wish to add the above,
> should I rename the MBean to, say,
> org.apache.cassandra.db:type=IncrementalRepairService and the nodetool
> command to inc_repair_admin?

nodetool help repair_admin says its purpose is to "list and fail incremental
repair sessions". However, failing an incremental repair session doesn't
cancel the validation/sync; it just releases the sstables that were involved
in the repair back into the unrepaired data set. I don't see any reason why
you couldn't add this functionality to the existing RepairService mbean. That
said, before getting into mbean names, it's probably best to come up with a
plan for cancelling validation and sync on each of the replicas involved in a
given repair. As far as I know (though I may be wrong), that's not currently
supported.

On May 17, 2017 at 7:36:51 PM, Cameron Zemek (came...@instaclustr.com) wrote:

> I am looking to improve monitoring and management of repairs (so far I have
> a patch for adding ActiveRepairs to table/keyspace metrics) and came across
> ActiveRepairServiceMBean, but this appears to be limited to incremental
> repairs. Is there a reason for this?
>
> I was looking to add something very similar to this nodetool repair_admin
> but it would work on co-ordinator repair commands.
>
> For example:
> $ nodetool repair_admin --list
> Repair#1 mykeyspace columnFamilies=colfamilya,colfamilyb; incremental=True; parallelism=parallel progress=5%
>
> $ nodetool repair_admin --terminate 1
> Terminating repair command #1 (19f00c30-1390-11e7-bb50-ffb920a6d70f)
>
> $ nodetool repair_admin --terminate-all # calls ssProxy.forceTerminateAllRepairSessions()
> Terminating all repair sessions
> Terminated repair command #2 (64c44230-21aa-11e7-9ede-cd6eb64e3786)
>
> What is the purpose of the current repair_admin? If I wish to add the above,
> should I rename the MBean to, say,
> org.apache.cassandra.db:type=IncrementalRepairService and the nodetool
> command to inc_repair_admin?
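
For illustration only, the --list / --terminate / --terminate-all behaviour
proposed above could be modelled roughly as below. This is a minimal sketch in
Python, not Cassandra code: RepairCommand, RepairRegistry and all of their
methods are hypothetical names invented for this example, and the sketch only
drops coordinator-side bookkeeping. It does not attempt the harder part raised
in the reply, namely cancelling validation and sync on each replica.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepairCommand:
    """Hypothetical record of one repair command tracked on the coordinator."""
    cmd_id: int
    keyspace: str
    column_families: List[str]
    incremental: bool = True
    parallelism: str = "parallel"
    progress: int = 0  # percent complete
    session_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def describe(self) -> str:
        # Mirrors the one-line format shown in the proposed --list output.
        return (
            f"Repair#{self.cmd_id} {self.keyspace} "
            f"columnFamilies={','.join(self.column_families)}; "
            f"incremental={self.incremental}; "
            f"parallelism={self.parallelism} progress={self.progress}%"
        )


class RepairRegistry:
    """Hypothetical in-memory registry of active coordinator repair commands."""

    def __init__(self) -> None:
        self._next_id = 1
        self._active: Dict[int, RepairCommand] = {}

    def submit(self, keyspace: str, column_families: List[str]) -> RepairCommand:
        cmd = RepairCommand(self._next_id, keyspace, list(column_families))
        self._active[cmd.cmd_id] = cmd
        self._next_id += 1
        return cmd

    def list_commands(self) -> List[str]:
        return [cmd.describe() for cmd in self._active.values()]

    def terminate(self, cmd_id: int) -> str:
        # Raises KeyError for an unknown command id. Only the bookkeeping is
        # removed here; nothing is cancelled on any replica.
        cmd = self._active.pop(cmd_id)
        return f"Terminating repair command #{cmd.cmd_id} ({cmd.session_id})"

    def terminate_all(self) -> List[str]:
        messages = ["Terminating all repair sessions"]
        for cmd_id in sorted(self._active):  # sorted() copies the keys first
            cmd = self._active.pop(cmd_id)
            messages.append(
                f"Terminated repair command #{cmd.cmd_id} ({cmd.session_id})"
            )
        return messages


if __name__ == "__main__":
    registry = RepairRegistry()
    first = registry.submit("mykeyspace", ["colfamilya", "colfamilyb"])
    first.progress = 5
    registry.submit("mykeyspace", ["colfamilyc"])
    print("\n".join(registry.list_commands()))
    print(registry.terminate(1))
    print("\n".join(registry.terminate_all()))
```

Keeping the command number separate from the session UUID matches the example
output above, where a command is addressed as #1 but reported together with its
UUID.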