[jira] [Comment Edited] (CASSANDRA-12510) Disallow decommission when number of replicas will drop below configured RF

2017-01-25 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838217#comment-15838217
 ] 

Nick Bailey edited comment on CASSANDRA-12510 at 1/25/17 5:40 PM:
--

Removing the no argument version of decommission here was a breaking API change 
that we should have done in a major release (which tick tock makes weird but 
still). This is more of an informational comment than anything because I'm not 
sure it's worth fixing now, but just a reminder to keep an eye out for breaking 
JMX changes.

*edit: should have been done in a major release and probably marked as 
deprecated first.
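
For illustration, a minimal sketch of what the deprecate-first approach could have looked like on the MBean. The exact new signature and the delegation are assumptions here, not taken from the committed patch:

{code:java}
public interface StorageServiceMBean
{
    /** Assumed new form from CASSANDRA-12510: refuse to leave the ring if doing so
     *  would drop replica counts below the configured RF, unless force is true. */
    void decommission(boolean force) throws InterruptedException;

    /** Backwards-compatible shim: keep the old no-argument operation so existing
     *  JMX clients keep working, mark it deprecated, and only remove it in the next
     *  major release. The implementation would simply call decommission(false). */
    @Deprecated
    void decommission() throws InterruptedException;
}
{code}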


was (Author: nickmbailey):
Removing the no argument version of decommission here was a breaking API change 
that we should have done in a major release (which tick tock makes weird but 
still). This is more of an informational comment than anything because I'm not 
sure it's worth fixing now, but just a reminder to keep an eye out for breaking 
JMX changes.

> Disallow decommission when number of replicas will drop below configured RF
> ---
>
> Key: CASSANDRA-12510
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12510
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
> Environment: C* version 3.3
>Reporter: Atin Sood
>Assignee: Kurt Greaves
>Priority: Minor
>  Labels: lhf
> Fix For: 3.12
>
> Attachments: 12510-3.x.patch, 12510-3.x-v2.patch
>
>
> Steps to replicate:
> - Create a 3-node cluster in DC1 and create a keyspace test_keyspace with a 
> table test_table using replication strategy NetworkTopologyStrategy, DC1=3. 
> Populate some data into this table.
> - Add 5 more nodes to this cluster, but in DC2. Also, do not alter the 
> keyspace to add the new DC2 to the replication settings (this is intentional and the 
> reason why the bug shows up). So desc keyspace should still list 
> NetworkTopologyStrategy with DC1=3 as the RF.
> - As expected, this will now be an 8-node cluster with 3 nodes in DC1 and 5 in 
> DC2.
> - Now start decommissioning the nodes in DC1. Note that the decommission runs 
> fine on all 3 nodes, but since the new nodes are in DC2 and the RF for the 
> keyspace is restricted to DC1, the 5 new nodes won't get any data.
> - You will now end up with a 5-node cluster which has none of the data from the 
> 3 decommissioned nodes, resulting in data loss.
> I do understand that this problem could have been avoided by performing an 
> ALTER statement to add DC2 replication before adding the 5 nodes. But the fact 
> that decommission ran fine on the 3 nodes in DC1, without complaining that 
> there were no nodes to stream their data to, seems a little disconcerting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-12510) Disallow decommission when number of replicas will drop below configured RF

2017-01-25 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838217#comment-15838217
 ] 

Nick Bailey commented on CASSANDRA-12510:
-

Removing the no argument version of decommission here was a breaking API change 
that we should have done in a major release (which tick tock makes weird but 
still). This is more of an informational comment than anything because I'm not 
sure it's worth fixing now, but just a reminder to keep an eye out for breaking 
JMX changes.

> Disallow decommission when number of replicas will drop below configured RF
> ---
>
> Key: CASSANDRA-12510
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12510
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
> Environment: C* version 3.3
>Reporter: Atin Sood
>Assignee: Kurt Greaves
>Priority: Minor
>  Labels: lhf
> Fix For: 3.12
>
> Attachments: 12510-3.x.patch, 12510-3.x-v2.patch
>
>
> Steps to replicate:
> - Create a 3-node cluster in DC1 and create a keyspace test_keyspace with a 
> table test_table using replication strategy NetworkTopologyStrategy, DC1=3. 
> Populate some data into this table.
> - Add 5 more nodes to this cluster, but in DC2. Also, do not alter the 
> keyspace to add the new DC2 to the replication settings (this is intentional and the 
> reason why the bug shows up). So desc keyspace should still list 
> NetworkTopologyStrategy with DC1=3 as the RF.
> - As expected, this will now be an 8-node cluster with 3 nodes in DC1 and 5 in 
> DC2.
> - Now start decommissioning the nodes in DC1. Note that the decommission runs 
> fine on all 3 nodes, but since the new nodes are in DC2 and the RF for the 
> keyspace is restricted to DC1, the 5 new nodes won't get any data.
> - You will now end up with a 5-node cluster which has none of the data from the 
> 3 decommissioned nodes, resulting in data loss.
> I do understand that this problem could have been avoided by performing an 
> ALTER statement to add DC2 replication before adding the 5 nodes. But the fact 
> that decommission ran fine on the 3 nodes in DC1, without complaining that 
> there were no nodes to stream their data to, seems a little disconcerting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-12053) ONE != LOCAL_ONE for SimpleStrategy

2016-06-21 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-12053:
---

 Summary: ONE != LOCAL_ONE for SimpleStrategy
 Key: CASSANDRA-12053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12053
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey


Currently our consistency level code doesn't account for SimpleStrategy if you 
are using a topology enabled snitch. In a 2 dc cluster using GPFS and a 
keyspace using SimpleStrategy, you can get UnavailableException when all nodes 
are up simply based on which datacenter you query while using LOCAL_ONE or 
LOCAL_QUORUM consistency levels.
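
As a rough illustration of the failure mode (contact point, DC names, keyspace and table are all hypothetical), a client pinned to one datacenter can see UnavailableException even though every node is up, because SimpleStrategy places replicas without regard to datacenters:

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class LocalQuorumRepro
{
    public static void main(String[] args)
    {
        // Hypothetical two-DC cluster; test_keyspace uses SimpleStrategy with RF=3.
        try (Cluster cluster = Cluster.builder()
                                      .addContactPoint("10.0.0.1")
                                      .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
                                                                                      .withLocalDc("dc2")
                                                                                      .build())
                                      .build();
             Session session = cluster.connect("test_keyspace"))
        {
            SimpleStatement stmt = new SimpleStatement("SELECT * FROM test_table WHERE id = 1");
            stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            // SimpleStrategy ignores datacenters when placing replicas, so the
            // coordinator's "local" DC (dc2 here) may hold fewer than a quorum of
            // replicas and this query can fail with UnavailableException even
            // though every node in the cluster is up.
            session.execute(stmt);
        }
    }
}
{code}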



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11824) If repair fails no way to run repair again

2016-05-18 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289068#comment-15289068
 ] 

Nick Bailey commented on CASSANDRA-11824:
-

Hmm. Yeah, this could be the cause of CASSANDRA-11728. I remember seeing 
dropped message warnings in the logs during that test, which could be a similar 
situation to turning off gossip.

> If repair fails no way to run repair again
> --
>
> Key: CASSANDRA-11824
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11824
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Marcus Eriksson
>  Labels: fallout
> Fix For: 3.0.x
>
>
> I have a test that disables gossip and runs repair at the same time. 
> {quote}
> WARN  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 
> StorageService.java:384 - Stopping gossip by operator request
> INFO  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,775 
> Gossiper.java:1463 - Announcing shutdown
> INFO  [RMI TCP Connection(15)-54.67.121.105] 2016-05-17 16:57:21,776 
> StorageService.java:1999 - Node /172.31.31.1 state jump to shutdown
> INFO  [HANDSHAKE-/172.31.17.32] 2016-05-17 16:57:21,895 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.17.32
> INFO  [HANDSHAKE-/172.31.24.76] 2016-05-17 16:57:21,895 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.24.76
> INFO  [Thread-25] 2016-05-17 16:57:21,925 RepairRunnable.java:125 - Starting 
> repair command #1, repairing keyspace keyspace1 with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> INFO  [Thread-26] 2016-05-17 16:57:21,953 RepairRunnable.java:125 - Starting 
> repair command #2, repairing keyspace stresscql with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> INFO  [Thread-27] 2016-05-17 16:57:21,967 RepairRunnable.java:125 - Starting 
> repair command #3, repairing keyspace system_traces with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 2)
> {quote}
> This ends up failing:
> {quote}
> 16:54:44.844 INFO  serverGroup-node-1-574 - STDOUT: [2016-05-17 16:57:21,933] 
> Starting repair command #1, repairing keyspace keyspace1 with repair options 
> (parallelism: parallel, primary range: false, incremental: true, job threads: 
> 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 3)
> [2016-05-17 16:57:21,943] Did not get positive replies from all endpoints. 
> List of failed endpoint(s): [172.31.24.76, 172.31.17.32]
> [2016-05-17 16:57:21,945] null
> {quote}
> Subsequent calls to repair with all nodes up still fails:
> {quote}
> ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 
> CompactionManager.java:1193 - Cannot start multiple repair sessions over the 
> same sstables
> ERROR [ValidationExecutor:3] 2016-05-17 18:58:53,460 Validator.java:261 - 
> Failed creating a merkle tree for [repair 
> #66425f10-1c61-11e6-83b2-0b1fff7a067d on keyspace1/standard1, 
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11728) Incremental repair fails with vnodes+lcs+multi-dc

2016-05-06 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-11728:

Reproduced In: 2.1.12

> Incremental repair fails with vnodes+lcs+multi-dc
> -
>
> Key: CASSANDRA-11728
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11728
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
>
> Produced on 2.1.12
> We are seeing incremental repair fail with an error regarding creating 
> multiple repair sessions on overlapping sstables. This is happening in the 
> following setup
> * 6 nodes
> * 2 Datacenters
> * Vnodes enabled
> * Leveled compaction on the relevant tables
> When STCS is used instead, we don't hit an issue. This is slightly related to 
> https://issues.apache.org/jira/browse/CASSANDRA-11461, except in this case 
> OpsCenter repair service is running all repairs sequentially. Let me know 
> what other information we can provide. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11728) Incremental repair fails with vnodes+lcs+multi-dc

2016-05-06 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-11728:
---

 Summary: Incremental repair fails with vnodes+lcs+multi-dc
 Key: CASSANDRA-11728
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11728
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey


Produced on 2.1.12

We are seeing incremental repair fail with an error regarding creating multiple 
repair sessions on overlapping sstables. This is happening in the following 
setup

* 6 nodes
* 2 Datacenters
* Vnodes enabled
* Leveled compaction on the relevant tables

When STCS is used instead, we don't hit an issue. This is slightly related to 
https://issues.apache.org/jira/browse/CASSANDRA-11461, except in this case 
OpsCenter repair service is running all repairs sequentially. Let me know what 
other information we can provide. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8377) Coordinated Commitlog Replay

2016-05-05 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272648#comment-15272648
 ] 

Nick Bailey commented on CASSANDRA-8377:


Not sure if Chris has had time to do anything with this lately, but I don't 
think we should close this ticket if that's what you mean.

> Coordinated Commitlog Replay
> 
>
> Key: CASSANDRA-8377
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8377
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Nick Bailey
>Assignee: Chris Lohfink
> Fix For: 3.x
>
> Attachments: CASSANDRA-8377.txt
>
>
> Commit log archiving and replay can be used to support point in time restores 
> on a cluster. Unfortunately, at the moment that is only true when the 
> topology of the cluster is exactly the same as when the commitlogs were 
> archived. This is because commitlogs need to be replayed on a node that is a 
> replica for those writes.
> To support replaying commitlogs when the topology has changed we should have 
> a tool that replays the writes in a commitlog as if they were writes from a 
> client and will get coordinated to the correct replicas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10091) Integrated JMX authn & authz

2016-04-27 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260458#comment-15260458
 ] 

Nick Bailey commented on CASSANDRA-10091:
-

I'm curious how this would behave during a bootstrap operation with auth 
enabled. Would JMX be unavailable until the relevant auth data had been 
streamed to the system_auth keyspace?

> Integrated JMX authn & authz
> 
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 3.x
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently build as a premain, 
> however it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-3486) Node Tool command to stop repair

2016-04-27 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260388#comment-15260388
 ] 

Nick Bailey commented on CASSANDRA-3486:


bq. Do you think a blocking + timeout approach would be preferable?

Maybe. My goal in asking would be to know whether the repair needs to be canceled on 
other nodes or not. Right now you need to either run the abort on all 
nodes from the start, or run it on the coordinator and then check the participants 
to double-check that it succeeded there as well.

bq. I personally think we should go this route of making repair more stateful

I agree, especially with the upcoming coordinated repairs in C*

> Node Tool command to stop repair
> 
>
> Key: CASSANDRA-3486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3486
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
> Environment: JVM
>Reporter: Vijay
>Assignee: Paulo Motta
>Priority: Minor
>  Labels: repair
> Fix For: 2.1.x
>
> Attachments: 0001-stop-repair-3583.patch
>
>
> After CASSANDRA-1740, If the validation compaction is stopped then the repair 
> will hang. This ticket will allow users to kill the original repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-3486) Node Tool command to stop repair

2016-04-26 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258764#comment-15258764
 ] 

Nick Bailey commented on CASSANDRA-3486:


Some questions:

* If the abort is initiated on the coordinator can we return the 
success/failure of the attempt to abort on the participants as well? And vice 
versa?
* Similarly for the list of results when aborting all jobs.
* Can we make sure we are testing the case where for whatever reason a 
coordinator or participant receives an abort for a repair it doesn't know about?
* Since we are now tracking repairs by uuid like this, can we expose a progress 
API outside of the jmx notification process? An mbean for retrieving the 
progress/status of a repair job by uuid?
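
To make that last point concrete, a hypothetical MBean along these lines (names invented purely for illustration; this is not an existing Cassandra interface) would let tools poll repair state by UUID instead of relying solely on JMX notifications:

{code:java}
import java.util.Map;

/** Hypothetical interface, for illustration only; not an existing Cassandra MBean. */
public interface RepairStatusMBean
{
    /** Repairs this node currently knows about (as coordinator or participant), keyed by UUID string. */
    Map<String, String> getActiveRepairs();

    /** Progress/status for one repair, e.g. "VALIDATING 3/8 ranges", or null if the UUID is unknown. */
    String getRepairStatus(String repairId);

    /** Request an abort; returns true if the repair was known to this node. */
    boolean abortRepair(String repairId);
}
{code}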


> Node Tool command to stop repair
> 
>
> Key: CASSANDRA-3486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3486
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
> Environment: JVM
>Reporter: Vijay
>Assignee: Paulo Motta
>Priority: Minor
>  Labels: repair
> Fix For: 2.1.x
>
> Attachments: 0001-stop-repair-3583.patch
>
>
> After CASSANDRA-1740, If the validation compaction is stopped then the repair 
> will hang. This ticket will allow users to kill the original repair.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7190) Add schema to snapshot manifest

2016-04-13 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239926#comment-15239926
 ] 

Nick Bailey commented on CASSANDRA-7190:


Well, the schema attached to the last sstable is only "right" if an sstable 
has been flushed since the last schema change. Really, the point is that this 
ticket and 9587 are geared towards very different audiences, so we should treat 
them differently.

> Add schema to snapshot manifest
> ---
>
> Key: CASSANDRA-7190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7190
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
>Priority: Minor
>  Labels: lhf
>
> followup from CASSANDRA-6326



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11430) Add legacy notifications backward-support on deprecated repair methods

2016-04-07 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15230454#comment-15230454
 ] 

Nick Bailey commented on CASSANDRA-11430:
-

Yeah a quick test with nodetool from 2.1 appears to be working with this patch.

> Add legacy notifications backward-support on deprecated repair methods
> --
>
> Key: CASSANDRA-11430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
>Assignee: Paulo Motta
> Fix For: 3.x
>
>
> forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available 
> for older clients though. Unfortunately it sometimes hangs when you call it. 
> It looks like it completes fine but the notification to the client that the 
> operation is done is never sent. This is easiest to see by using nodetool 
> from 2.1 against a 3.x cluster.
> {noformat}
> [Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 OpsCenter
> [2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
> [Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 system_distributed
> ...
> ...
> {noformat}
> (I added the ellipses)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11461) Failed incremental repairs never cleared from pending list

2016-04-01 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222072#comment-15222072
 ] 

Nick Bailey commented on CASSANDRA-11461:
-

Hmm ok so we should prevent full incremental repairs from ever running on more 
than one node and never allow subrange incremental repairs?

> Failed incremental repairs never cleared from pending list
> --
>
> Key: CASSANDRA-11461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11461
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Adam Hattrell
>
> Set up a test cluster with 2 DCs, heavy use of LCS (not sure if that's 
> relevant).
> Kick off cassandra-stress against it.
> Kick off an automated incremental repair cycle.
> After a bit, a node starts flapping, which causes a few repairs to fail. This 
> is never cleared out of pending repairs - given the keyspace is replicated to 
> all nodes, it means they all have pending repairs that will never complete.
> Repairs are basically blocked at this point.
> Given we're using incremental repairs, you're now spammed with:
> "Cannot start multiple repair sessions over the same sstables"
> Cluster and logs are still available for review - message me for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11461) Failed incremental repairs never cleared from pending list

2016-03-31 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220014#comment-15220014
 ] 

Nick Bailey commented on CASSANDRA-11461:
-

We would need to reproduce the issue in OpsCenter with debug logging enabled to 
see how many parallel repairs OpsCenter is attempting. But I could easily see 
how you would hit this issue with vnodes enabled. We don't have any logic to 
prevent that from happening. 

> Failed incremental repairs never cleared from pending list
> --
>
> Key: CASSANDRA-11461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11461
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Adam Hattrell
>
> Set up a test cluster with 2 DCs, heavy use of LCS (not sure if that's 
> relevant).
> Kick off cassandra-stress against it.
> Kick off an automated incremental repair cycle.
> After a bit, a node starts flapping, which causes a few repairs to fail. This 
> is never cleared out of pending repairs - given the keyspace is replicated to 
> all nodes, it means they all have pending repairs that will never complete.
> Repairs are basically blocked at this point.
> Given we're using incremental repairs, you're now spammed with:
> "Cannot start multiple repair sessions over the same sstables"
> Cluster and logs are still available for review - message me for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11461) Failed incremental repairs never cleared from pending list

2016-03-31 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219907#comment-15219907
 ] 

Nick Bailey commented on CASSANDRA-11461:
-

Well, that depends. It starts by doing everything synchronously and tries to 
calculate throughput. Based on the throughput it calculates, it may try to run 
things in parallel if it thinks that's required to complete, but it prefers to run 
a single repair at a time. I'm not 100% certain, but I believe in the case 
where this was seen, OpsCenter was not running anything in parallel.

> Failed incremental repairs never cleared from pending list
> --
>
> Key: CASSANDRA-11461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11461
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Adam Hattrell
>
> Set up a test cluster with 2 DCs, heavy use of LCS (not sure if that's 
> relevant).
> Kick off cassandra-stress against it.
> Kick off an automated incremental repair cycle.
> After a bit, a node starts flapping, which causes a few repairs to fail. This 
> is never cleared out of pending repairs - given the keyspace is replicated to 
> all nodes, it means they all have pending repairs that will never complete.
> Repairs are basically blocked at this point.
> Given we're using incremental repairs, you're now spammed with:
> "Cannot start multiple repair sessions over the same sstables"
> Cluster and logs are still available for review - message me for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11461) Failed incremental repairs never cleared from pending list

2016-03-30 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219041#comment-15219041
 ] 

Nick Bailey commented on CASSANDRA-11461:
-

Yeah. So OpsCenter lets you configure some tables for incremental repair and 
some for normal subrange repair, which is what was happening in this case. So 
OpsCenter is doing:

* Break up the ring into small chunks for subrange repair
* Visit a node and repair a small range for all tables that are using subrange 
repair
* If any tables are configured for incremental repair, run an incremental 
repair on those tables
** By default this would do a full incremental repair on those tables, which is 
what was in use when this bug was hit
* Jump across the ring to a different node and repeat the above process.

It does all this in a single datacenter, since OpsCenter repairs are cross-DC.

That's at least the very high level overview.
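
Roughly, in sketch form (every helper below is a made-up stand-in for OpsCenter internals, included only to make the sequencing explicit):

{code:java}
import java.util.List;

/** Illustrative only: hypothetical stand-ins for the OpsCenter repair service internals. */
public abstract class RepairCycleSketch
{
    /** One small slice of the ring, e.g. a (startToken, endToken] pair. */
    public static final class Chunk
    {
        public final String start, end;
        public Chunk(String start, String end) { this.start = start; this.end = end; }
    }

    protected abstract String pickNodeFor(Chunk chunk);                              // jump around the ring between chunks
    protected abstract void repairSubrange(String node, Chunk chunk, List<String> tables);
    protected abstract void runIncrementalRepair(String node, List<String> tables);  // full incremental repair by default

    /** One pass over a single datacenter: subrange repair per chunk, then incremental repair if configured. */
    public void runCycle(List<Chunk> chunks, List<String> subrangeTables, List<String> incrementalTables)
    {
        for (Chunk chunk : chunks)
        {
            String node = pickNodeFor(chunk);
            repairSubrange(node, chunk, subrangeTables);
            if (!incrementalTables.isEmpty())
                runIncrementalRepair(node, incrementalTables);
        }
    }
}
{code}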

> Failed incremental repairs never cleared from pending list
> --
>
> Key: CASSANDRA-11461
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11461
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Adam Hattrell
>
> Set up a test cluster with 2 DCs, heavy use of LCS (not sure if that's 
> relevant).
> Kick off cassandra-stress against it.
> Kick off an automated incremental repair cycle.
> After a bit, a node starts flapping, which causes a few repairs to fail. This 
> is never cleared out of pending repairs - given the keyspace is replicated to 
> all nodes, it means they all have pending repairs that will never complete.
> Repairs are basically blocked at this point.
> Given we're using incremental repairs, you're now spammed with:
> "Cannot start multiple repair sessions over the same sstables"
> Cluster and logs are still available for review - message me for details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11430) forceRepairRangeAsync hangs sometimes

2016-03-29 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216715#comment-15216715
 ] 

Nick Bailey commented on CASSANDRA-11430:
-

This is happening because the progress notification mechanism has completely 
changed. The old method signatures were left in place but this is fairly 
pointless since old clients won't be able to understand the new progress/status 
reporting mechanism.

> forceRepairRangeAsync hangs sometimes
> -
>
> Key: CASSANDRA-11430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
> Fix For: 3.x
>
>
> forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available 
> for older clients though. Unfortunately it sometimes hangs when you call it. 
> It looks like it completes fine but the notification to the client that the 
> operation is done is never sent. This is easiest to see by using nodetool 
> from 2.1 against a 3.x cluster.
> {noformat}
> [Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 OpsCenter
> [2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
> [Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 system_distributed
> ...
> ...
> {noformat}
> (I added the ellipses)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-11430) forceRepairRangeAsync hangs sometimes

2016-03-29 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216715#comment-15216715
 ] 

Nick Bailey edited comment on CASSANDRA-11430 at 3/29/16 7:38 PM:
--

This is happening because the progress notification mechanism has completely 
changed. The old method signatures were left in place but this is fairly 
pointless since old clients won't be able to understand the new progress/status 
reporting mechanism.

Anyone have any idea on how much work it would be to pull back in the old 
progress reporting mechanism for the old method signatures? I'm guessing quite 
a bit.
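
A hedged sketch of why old clients appear to hang: a 2.1-era JMX listener only reacts to the legacy repair notifications and never sees the newer progress-style events, so it never observes completion. The notification type strings below are best-effort recollection, not verified constants:

{code:java}
import javax.management.Notification;
import javax.management.NotificationListener;

public class LegacyRepairListener implements NotificationListener
{
    private volatile boolean finished;

    @Override
    public void handleNotification(Notification notification, Object handback)
    {
        if ("repair".equals(notification.getType()))
        {
            // Legacy payload is an int[] of {command number, status ordinal};
            // simplified here - real nodetool inspects the ordinal before finishing.
            finished = true;
        }
        // "progress" notifications from newer servers fall through untouched,
        // which is exactly the hang described in this ticket.
    }

    public boolean isFinished()
    {
        return finished;
    }
}
{code}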


was (Author: nickmbailey):
This is happening because the progress notification mechanism has completely 
changed. The old method signatures were left in place but this is fairly 
pointless since old clients won't be able to understand the new progress/status 
reporting mechanism.

> forceRepairRangeAsync hangs sometimes
> -
>
> Key: CASSANDRA-11430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
> Fix For: 3.x
>
>
> forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available 
> for older clients though. Unfortunately it sometimes hangs when you call it. 
> It looks like it completes fine but the notification to the client that the 
> operation is done is never sent. This is easiest to see by using nodetool 
> from 2.1 against a 3.x cluster.
> {noformat}
> [Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 OpsCenter
> [2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
> [Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 system_distributed
> ...
> ...
> {noformat}
> (I added the ellipses)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11430) forceRepairRangeAsync hangs sometimes

2016-03-29 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216569#comment-15216569
 ] 

Nick Bailey commented on CASSANDRA-11430:
-

I was able to reproduce this with non system_distributed keyspaces.

> forceRepairRangeAsync hangs sometimes
> -
>
> Key: CASSANDRA-11430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
> Fix For: 3.x
>
>
> forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available 
> for older clients though. Unfortunately it sometimes hangs when you call it. 
> It looks like it completes fine but the notification to the client that the 
> operation is done is never sent. This is easiest to see by using nodetool 
> from 2.1 against a 3.x cluster.
> {noformat}
> [Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 OpsCenter
> [2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
> [Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 system_distributed
> ...
> ...
> {noformat}
> (I added the ellipses)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11430) forceRepairRangeAsync hangs sometimes

2016-03-29 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-11430:

Description: 
forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available for 
older clients though. Unfortunately it sometimes hangs when you call it. It 
looks like it completes fine but the notification to the client that the 
operation is done is never sent. This is easiest to see by using nodetool from 
2.1 against a 3.x cluster.

{noformat}
[Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair -st 
0 -et 1 OpsCenter
[2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
[Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
[Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
[Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair -st 
0 -et 1 system_distributed
...
...
{noformat}

(I added the ellipses)



  was:
forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available for 
older clients though. Unfortunately it hangs when you call it with the 
system_distributed table. It looks like it completes fine but the notification 
to the client that the operation is done is never sent. This is easiest to see 
by using nodetool from 2.1 against a 3.x cluster.

{noformat}
[Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair -st 
0 -et 1 OpsCenter
[2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
[Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
[Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
[Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair -st 
0 -et 1 system_distributed
...
...
{noformat}

(I added the ellipses)




> forceRepairRangeAsync hangs sometimes
> -
>
> Key: CASSANDRA-11430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
> Fix For: 3.x
>
>
> forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available 
> for older clients though. Unfortunately it sometimes hangs when you call it. 
> It looks like it completes fine but the notification to the client that the 
> operation is done is never sent. This is easiest to see by using nodetool 
> from 2.1 against a 3.x cluster.
> {noformat}
> [Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 OpsCenter
> [2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
> [Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 system_distributed
> ...
> ...
> {noformat}
> (I added the ellipses)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11430) forceRepairRangeAsync hangs sometimes

2016-03-29 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-11430:

Summary: forceRepairRangeAsync hangs sometimes  (was: forceRepairRangeAsync 
hangs on system_distributed keyspace.)

> forceRepairRangeAsync hangs sometimes
> -
>
> Key: CASSANDRA-11430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
> Fix For: 3.x
>
>
> forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available 
> for older clients though. Unfortunately it hangs when you call it with the 
> system_distributed table. It looks like it completes fine but the 
> notification to the client that the operation is done is never sent. This is 
> easiest to see by using nodetool from 2.1 against a 3.x cluster.
> {noformat}
> [Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 OpsCenter
> [2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
> [Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
> [Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair 
> -st 0 -et 1 system_distributed
> ...
> ...
> {noformat}
> (I added the ellipses)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11430) forceRepairRangeAsync hangs on system_distributed keyspace.

2016-03-24 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-11430:
---

 Summary: forceRepairRangeAsync hangs on system_distributed 
keyspace.
 Key: CASSANDRA-11430
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11430
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 3.x


forceRepairRangeAsync is deprecated in 2.2/3.x series. It's still available for 
older clients though. Unfortunately it hangs when you call it with the 
system_distributed table. It looks like it completes fine but the notification 
to the client that the operation is done is never sent. This is easiest to see 
by using nodetool from 2.1 against a 3.x cluster.

{noformat}
[Nicks-MacBook-Pro:16:06:21 cassandra-2.1] cassandra$ ./bin/nodetool repair -st 
0 -et 1 OpsCenter
[2016-03-24 16:06:50,165] Nothing to repair for keyspace 'OpsCenter'
[Nicks-MacBook-Pro:16:06:50 cassandra-2.1] cassandra$
[Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$
[Nicks-MacBook-Pro:16:06:55 cassandra-2.1] cassandra$ ./bin/nodetool repair -st 
0 -et 1 system_distributed
...
...
{noformat}

(I added the ellipses)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7190) Add schema to snapshot manifest

2016-03-23 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209016#comment-15209016
 ] 

Nick Bailey commented on CASSANDRA-7190:


[~iamaleksey] I don't think CASSANDRA-9587 completely duplicates this. The 
schema we care about here is the latest schema at the time of the snapshot, not 
the schema associated with every sstable in the snapshot. The schema can change 
from sstable to sstable and the changes may be forward compatible but not 
backward compatible.

> Add schema to snapshot manifest
> ---
>
> Key: CASSANDRA-7190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7190
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
>Priority: Minor
>
> followup from CASSANDRA-6326



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11239) Deprecated repair methods cause NPE

2016-02-25 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-11239:
---

 Summary: Deprecated repair methods cause NPE
 Key: CASSANDRA-11239
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11239
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
Assignee: Nick Bailey
 Fix For: 3.0.4, 3.4
 Attachments: 0001-Don-t-NPE-when-using-forceRepairRangeAsync.patch

The deprecated repair methods cause an NPE if you aren't doing local repairs. 
Attaching patch to fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11181) Add broadcast_rpc_address to system.local

2016-02-17 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-11181:
---

 Summary: Add broadcast_rpc_address to system.local
 Key: CASSANDRA-11181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11181
 Project: Cassandra
  Issue Type: Improvement
Reporter: Nick Bailey
 Fix For: 3.4


Right now it's impossible to get the broadcast_rpc_address of the node you are 
connected to via the drivers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2015-12-22 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068501#comment-15068501
 ] 

Nick Bailey commented on CASSANDRA-10907:
-

Fair enough on incremental backups. The only other thing I'd say is that if 
blocking on flushing is that big of an impact you might be close to IO capacity 
anyway. That said, I won't advocate for closing this ticket.

> Nodetool snapshot should provide an option to skip flushing
> ---
>
> Key: CASSANDRA-10907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>  Labels: lhf
>
> For some practical scenarios, it doesn't matter if the data is flushed to 
> disk before taking a snapshot. However, it's better to save some flushing 
> time to make snapshot process quick.
> As such, it will be a good idea to provide this option to snapshot command. 
> The wiring from nodetool to MBean to VerbHandler should be easy. 
> I can provide a patch if this makes sense.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2015-12-21 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066826#comment-15066826
 ] 

Nick Bailey commented on CASSANDRA-10907:
-

My only objection is that the behavior of what information is actually backed 
up is basically undefined. It's possible it's useful in some very specific use 
cases, but it also introduces potential traps when used incorrectly.

It sounds to me like you should be using incremental backups. When that is 
enabled a hardlink is created every time a memtable is flushed or an sstable 
streamed. You can then just watch that directory and ship the sstables off node 
on demand as they are created.
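
For example, a minimal sketch of the "watch the backups directory and ship files off node" idea. The directory path and the shipping step are placeholders; only the watch loop itself is meant to be taken literally:

{code:java}
import java.nio.file.*;

public class BackupShipper
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder path: with incremental backups enabled, flushed/streamed
        // sstables are hardlinked into a per-table "backups" directory.
        Path backups = Paths.get("/var/lib/cassandra/data/ks/table/backups");

        try (WatchService watcher = backups.getFileSystem().newWatchService())
        {
            backups.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true)
            {
                WatchKey key = watcher.take();                 // block until new files appear
                for (WatchEvent<?> event : key.pollEvents())
                {
                    Path created = backups.resolve((Path) event.context());
                    ship(created);                             // placeholder: copy off node, then delete
                }
                key.reset();
            }
        }
    }

    private static void ship(Path sstableComponent)
    {
        System.out.println("would ship " + sstableComponent);
    }
}
{code}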

> Nodetool snapshot should provide an option to skip flushing
> ---
>
> Key: CASSANDRA-10907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>  Labels: lhf
>
> For some practical scenarios, it doesn't matter if the data is flushed to 
> disk before taking a snapshot. However, it's better to save some flushing 
> time to make snapshot process quick.
> As such, it will be a good idea to provide this option to snapshot command. 
> The wiring from nodetool to MBean to VerbHandler should be easy. 
> I can provide a patch if this makes sense.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2015-12-21 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066715#comment-15066715
 ] 

Nick Bailey commented on CASSANDRA-10907:
-

Just wondering in what scenarios skipping flushing makes sense. It seems like any 
such scenario would be covered by the incremental backup option, which 
hardlinks every sstable as it's flushed.

> Nodetool snapshot should provide an option to skip flushing
> ---
>
> Key: CASSANDRA-10907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>  Labels: lhf
>
> For some practical scenarios, it doesn't matter if the data is flushed to 
> disk before taking a snapshot. However, it's better to save some flushing 
> time to make snapshot process quick.
> As such, it will be a good idea to provide this option to snapshot command. 
> The wiring from nodetool to MBean to VerbHandler should be easy. 
> I can provide a patch if this makes sense.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10757) Cluster migration with sstableloader requires significant compaction time

2015-11-30 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032986#comment-15032986
 ] 

Nick Bailey commented on CASSANDRA-10757:
-

You are seeing the effects of CASSANDRA-4756

> Cluster migration with sstableloader requires significant compaction time
> -
>
> Key: CASSANDRA-10757
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10757
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction, Streaming and Messaging
>Reporter: Juho Mäkinen
>Priority: Minor
>  Labels: sstableloader
>
> When sstableloader is used to migrate data from a cluster into another the 
> loading creates a lot more data and a lot more sstable files than what the 
> original cluster had.
> For example, in my case a 62-node cluster with 16 TiB of data in 8 sstables was 
> sstableloaded into another cluster with 36 nodes, and this resulted in 42 
> TiB of used data in a whopping 35 sstables.
> The sstableloading process itself was relatively fast (around 8 hours), but 
> as a result the destination cluster needs approximately two weeks of 
> compaction to reduce the number of sstables back to the original 
> levels. (The instances are c4.4xlarge in EC2, 16 cores each, with compaction 
> running on 14 cores. The EBS disks in each instance provide 9000 IOPS and max 
> 250 MiB/sec disk bandwidth.)
> Could the sstableloader process somehow be improved to make this kind of migration 
> less painful and faster?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication

2015-11-25 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027628#comment-15027628
 ] 

Nick Bailey commented on CASSANDRA-10091:
-

I'm definitely a fan of making it possible to reduce the number of auth schemes 
users have to set up. We should avoid breaking existing jmx clients and tools 
like you mentioned in CASSANDRA-10551 though.

[~beobal] you are proposing just getting authc done in this ticket and leaving 
authz controlled by the built-in file-based roles mechanism for now? That's 
probably fine, although we'll want to make sure we handle the edge cases 
appropriately. For example, if a user turns on auth via Cassandra but then 
doesn't specify a roles file on the filesystem as well. Or if there is a 
mismatch in the users defined in either. If those edge cases get hairy, I might 
personally prefer to wait until we can deliver it all.

> Align JMX authentication with internal authentication
> -
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Minor
> Fix For: 3.x
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently build as a premain, 
> however it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8794) AntiEntropySessions doesn't show up until after a repair

2015-11-23 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023281#comment-15023281
 ] 

Nick Bailey commented on CASSANDRA-8794:


If that mbean is gone in 3.0, is there still a way to get how many repairs are 
pending? We were specifically using that mbean to try to determine if repair 
was backing up on a node.

> AntiEntropySessions doesn't show up until after a repair
> 
>
> Key: CASSANDRA-8794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8794
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability
>Reporter: Peter Halliday
>Assignee: Yuki Morishita
>
> The metric AntiEntropySessions for internal thread pools doesn't actually 
> show up as an mbean until after a repair is run. This should actually be 
> displayed before. This also keeps any cluster that doesn't need repairing 
> from displaying stats for AntiEntropySessions. The lack of the mbean's 
> existence until after a repair will cause problems for various monitoring 
> tools.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10486) Expose tokens of bootstrapping nodes in JMX

2015-10-08 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-10486:
---

 Summary: Expose tokens of bootstrapping nodes in JMX
 Key: CASSANDRA-10486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10486
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
Priority: Minor


Currently you can get a list of bootstrapping nodes from JMX, but the only way 
to get the tokens of those bootstrapping nodes is to string parse info from the 
failure detector. This is fragile and can easily break when changes like 
https://issues.apache.org/jira/browse/CASSANDRA-10330 happen.

We should have a clean way of knowing the tokens of bootstrapping nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10486) Expose tokens of bootstrapping nodes in JMX

2015-10-08 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-10486:

Assignee: Brandon Williams

> Expose tokens of bootstrapping nodes in JMX
> ---
>
> Key: CASSANDRA-10486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10486
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Nick Bailey
>Assignee: Brandon Williams
>Priority: Minor
>
> Currently you can get a list of bootstrapping nodes from JMX, but the only 
> way to get the tokens of those bootstrapping nodes is to string parse info 
> from the failure detector. This is fragile and can easily break when changes 
> like https://issues.apache.org/jira/browse/CASSANDRA-10330 happen.
> We should have a clean way of knowing the tokens of bootstrapping nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers

2015-07-30 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648158#comment-14648158
 ] 

Nick Bailey commented on CASSANDRA-7066:


bq. If copying the files manually users will have to remove any partial 
transaction log files and their temporary files. 

Is it possible to have the refresh command and possibly node startup fail 
and/or throw an error if this step doesn't happen? Even with documentation 
around this step it seems like a pretty big trap for users who may be used to 
restoring by copying sstables.

 Simplify (and unify) cleanup of compaction leftovers
 

 Key: CASSANDRA-7066
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Stefania
Priority: Minor
  Labels: benedict-to-commit, compaction
 Fix For: 3.0 alpha 1

 Attachments: 7066.txt


 Currently we manage a list of in-progress compactions in a system table, 
 which we use to clean up incomplete compactions when we're done. The problem 
 with this is that 1) it's a bit clunky (and leaves us in positions where we 
 can unnecessarily clean up completed files, or conversely not clean up files 
 that have been superseded); and 2) it's only used for a regular compaction - 
 no other compaction types are guarded in the same way, so can result in 
 duplication if we fail before deleting the replacements.
 I'd like to see each sstable store in its metadata its direct ancestors, and 
 on startup we simply delete any sstables that occur in the union of all 
 ancestor sets. This way as soon as we finish writing we're capable of 
 cleaning up any leftovers, so we never get duplication. It's also much easier 
 to reason about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers

2015-07-29 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646618#comment-14646618
 ] 

Nick Bailey commented on CASSANDRA-7066:


Thanks for the ping Jonathan. There is a lot to follow and digest here so let 
me just try to bring up my concerns as someone working on OpsCenter. Those 
concerns should fairly well represent any other tools trying to do 
backup/restore or even a user trying to do it manually.

From what I have tried to read through, it sounds like most of the concerns 
here are around cases where files/directories are manipulated manually rather 
than through the provided tools. So hopefully I can safely be ignored :).

* The snapshot command should create a full backup of a keyspace/table on the 
node. The directories created from the snapshot should be all that is required 
to restore that keyspace/table on that node to the point in time that the 
snapshot was taken.
* A snapshot should be restorable either via the sstableloader tool or by 
manually copying the files from the snapshot in to place (given the same 
schema/topology). If copying the files into place manually, restarting the node 
or making an additional call to load the sstables may be required.
* When using the sstableloader tool I should be able to restore data taken from 
a snapshot regardless of what data exists on the node or is currently being 
written.

If we are all good on those points then I don't see any issues from my 
standpoint. [~jbellis] was there anything else you wanted to me to look at 
specifically?



 Simplify (and unify) cleanup of compaction leftovers
 

 Key: CASSANDRA-7066
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Stefania
Priority: Minor
  Labels: benedict-to-commit, compaction
 Fix For: 3.0 alpha 1

 Attachments: 7066.txt


 Currently we manage a list of in-progress compactions in a system table, 
 which we use to clean up incomplete compactions when we're done. The problem 
 with this is that 1) it's a bit clunky (and leaves us in positions where we 
 can unnecessarily clean up completed files, or conversely not clean up files 
 that have been superseded); and 2) it's only used for a regular compaction - 
 no other compaction types are guarded in the same way, so can result in 
 duplication if we fail before deleting the replacements.
 I'd like to see each sstable store in its metadata its direct ancestors, and 
 on startup we simply delete any sstables that occur in the union of all 
 ancestor sets. This way as soon as we finish writing we're capable of 
 cleaning up any leftovers, so we never get duplication. It's also much easier 
 to reason about.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9448) Metrics should use up to date nomenclature

2015-07-07 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616885#comment-14616885
 ] 

Nick Bailey commented on CASSANDRA-9448:


This will certainly break some monitoring tools, OpsCenter for sure. If the 
only solution that allows deprecating the old metrics is to duplicate them, 
then perhaps that is too much overhead and the monitoring tools out there will 
just have to take that hit, but it would be great to avoid. I also don't know 
of a way to 'alias' things with the metrics library though.
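
For context on why renames hurt, here is a minimal sketch of how an external tool typically reads one of these values over JMX. The domain/type strings follow the pre-rename "ColumnFamily" naming and the keyspace/table names are placeholders; treat the exact ObjectName as an assumption for illustration rather than the final post-rename form.

{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Monitoring tools hard-code ObjectName patterns like this, which is why
// renaming "ColumnFamily" to "Table" (or "rows" to "partitions") silently
// breaks dashboards unless the old names are kept or aliased.
public class MetricReadExample
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();

            // Old-style name (assumed for illustration); after a rename this
            // ObjectName simply stops matching and the tool reads nothing.
            ObjectName readLatency = new ObjectName(
                "org.apache.cassandra.metrics:type=ColumnFamily,keyspace=test_keyspace,scope=test_table,name=ReadLatency");

            Object count = mbs.getAttribute(readLatency, "Count");
            System.out.println("ReadLatency count = " + count);
        }
    }
}
{code}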

 Metrics should use up to date nomenclature
 --

 Key: CASSANDRA-9448
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9448
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Sam Tunnicliffe
Assignee: Stefania
  Labels: docs-impacting, jmx
 Fix For: 3.0 beta 1


 There are a number of exposed metrics that currently are named using the old 
 nomenclature of columnfamily and rows (meaning partitions).
 It would be good to audit all metrics and update any names to match what they 
 actually represent; we should probably do that in a single sweep to avoid a 
 confusing mixture of old and new terminology. 
 As we'd need to do this in a major release, I've initially set the fixver for 
 3.0 beta1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9436) Expose rpc_address and listen_address of each Cassandra node

2015-05-29 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565313#comment-14565313
 ] 

Nick Bailey commented on CASSANDRA-9436:


I think broadcast is probably what we really want; that's what shows up in the 
peers table for other nodes in the cluster. It can't hurt to put both in, I 
suppose.

 Expose rpc_address and listen_address of each Cassandra node
 

 Key: CASSANDRA-9436
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9436
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Piotr Kołaczkowski
Assignee: Carl Yeksigian

 When running Cassandra nodes with collocated Spark nodes and accessing such 
 cluster from remote, to get data-locality right, we need to tell Spark the 
 locations of the Cassandra nodes and they should match the addresses that 
 Spark nodes bind to. Therefore in cloud environments we need to use private 
 IPs for that. Unfortunately, the client which connects from remote would know 
 only the broadcast rpc_addresses which are different.
 Can we have the IP/hostname that every C* node binds to exposed in a system 
 table? 
 system.peers table contains that information, but it doesn't contain that 
 information for the local node.
 So can we have listen_address and rpc_address added to the system.local table?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9374) Remove thrift dependency in stress schema creation

2015-05-13 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542394#comment-14542394
 ] 

Nick Bailey commented on CASSANDRA-9374:


It seems fine that this doesn't block a beta release, but I'd say this needs to 
target an rc at least. Requiring users to turn on thrift and do a rolling 
restart just to run stress doesn't seem very friendly.

 Remove thrift dependency in stress schema creation
 --

 Key: CASSANDRA-9374
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9374
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Ryan McGuire
Assignee: T Jake Luciani
Priority: Minor
 Fix For: 2.2.x


 With CASSANDRA-9319 the thrift server is turned off by default, which makes 
 stress no longer work out of the box. Even though stress uses native CQL3 by 
 default, there is still some remaining piece that uses thrift for schema 
 creation.
 This is what you get by default now:
 {code}
 $ JAVA_HOME=~/fab/java ~/fab/stress/default/tools/bin/cassandra-stress write 
 n=1900 -rate threads=300 -node blade-11-4a,blade-11-3a,blade-11-2a
 Exception in thread main java.lang.RuntimeException: 
 org.apache.thrift.transport.TTransportException: java.net.ConnectException: 
 Connection refused
 at 
 org.apache.cassandra.stress.settings.StressSettings.getRawThriftClient(StressSettings.java:144)
 at 
 org.apache.cassandra.stress.settings.StressSettings.getRawThriftClient(StressSettings.java:110)
 at 
 org.apache.cassandra.stress.settings.SettingsSchema.createKeySpacesThrift(SettingsSchema.java:111)
 at 
 org.apache.cassandra.stress.settings.SettingsSchema.createKeySpaces(SettingsSchema.java:59)
 at 
 org.apache.cassandra.stress.settings.StressSettings.maybeCreateKeyspaces(StressSettings.java:205)
 at org.apache.cassandra.stress.StressAction.run(StressAction.java:55)
 at org.apache.cassandra.stress.Stress.main(Stress.java:109)
 Caused by: org.apache.thrift.transport.TTransportException: 
 java.net.ConnectException: Connection refused
 at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
 at 
 org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
 at 
 org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
 at 
 org.apache.cassandra.stress.settings.StressSettings.getRawThriftClient(StressSettings.java:124)
 ... 6 more
 Caused by: java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at 
 java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
 at 
 java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
 at 
 java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:579)
 at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
 ... 9 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9363) Expose vnode to directory assignment

2015-05-12 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-9363:
--

 Summary: Expose vnode to directory assignment
 Key: CASSANDRA-9363
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9363
 Project: Cassandra
  Issue Type: Improvement
Reporter: Nick Bailey
 Fix For: 3.x


CASSANDRA-6696 adds the feature of pinning vnodes to specific disks to improve 
things when JBOD is being used.

We also need a way to introspect which vnodes are assigned where. I'm not sure 
what the easiest/best way to expose that info is. JMX, a manifest file, or a 
system table could all be valid options.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9195) PITR commitlog replay only actually replays mutation every other time

2015-04-24 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511302#comment-14511302
 ] 

Nick Bailey commented on CASSANDRA-9195:


So you can replay archived commitlogs without specifying a list of keyspaces 
and column families, and it will restore everything. I think we should handle 
that case here as well.

 PITR commitlog replay only actually replays mutation every other time
 -

 Key: CASSANDRA-9195
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9195
 Project: Cassandra
  Issue Type: Bug
Reporter: Jon Moses
Assignee: Branimir Lambov
 Fix For: 2.1.5

 Attachments: 9195-2.1-v2.patch, 9195-v2.1.patch, loader.py


 Version: Cassandra 2.1.4.374 | DSE 4.7.0
 The main issue here is that the restore-cycle only replays the mutations
 every other try.  On the first try, it will restore the snapshot as expected
 and the cassandra system load will show that it's reading the mutations, but
 they do not actually get replayed, and at the end you're left with only the
 snapshot data (2k records).
 If you re-run the restore-cycle again, the commitlogs are replayed as 
 expected,
 and the data expected is present in the table (4k records, with a spot check 
 of 
 record 4500, as it's in the commitlog but not the snapshot).
 Then if you run the cycle again, it will fail.  Then again, and it will work. 
 The work/
 not work pattern continues.  Even re-running the commitlog replay a 2nd time, 
 without
 reloading the snapshot doesn't work
 The load process is:
 * Modify commitlog segment to 1mb
 * Archive to directory
 * create keyspace/table
 * insert base data
 * initial snapshot
 * write more data
 * capture timestamp
 * write more data
 * final snapshot
 * copy commitlogs to 2nd location
 * modify cassandra-env to replay only specified keyspace
 * modify commitlog properties to restore from 2nd location, with noted 
 timestamp
 The restore cycle is:
 * truncate table
 * sstableload snapshot
 * flush
 * output data status
 * restart to replay commitlogs
 * output data status
 
 See attached .py for a mostly automated reproduction scenario.  It expects 
 DSE (and I found it with DSE 4.7.0-1), rather than actual Cassandra, but 
 it's not using any DSE specific features.  The script looks for the configs 
 in the DSE locations, but they're set at the top, and there's only 2 places 
 where dse is restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9195) commitlog replay only actually replays mutation every other time

2015-04-23 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509273#comment-14509273
 ] 

Nick Bailey commented on CASSANDRA-9195:


[~jjordan] Ok, right, that makes sense. But as I mentioned, it appears we are 
only storing the latest truncate metadata, so if that is the behavior we expect 
out of the database then it is broken: we forget about any past truncates 
whenever we do a new one. It seems we should be writing truncate records to the 
commitlog and then processing those correctly during replay.

 commitlog replay only actually replays mutation every other time
 

 Key: CASSANDRA-9195
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9195
 Project: Cassandra
  Issue Type: Bug
Reporter: Jon Moses
Assignee: Branimir Lambov
Priority: Critical
 Fix For: 2.1.5

 Attachments: 9195-v2.1.patch, loader.py


 Version: Cassandra 2.1.4.374 | DSE 4.7.0
 The main issue here is that the restore-cycle only replays the mutations
 every other try.  On the first try, it will restore the snapshot as expected
 and the cassandra system load will show that it's reading the mutations, but
 they do not actually get replayed, and at the end you're left with only the
 snapshot data (2k records).
 If you re-run the restore-cycle again, the commitlogs are replayed as 
 expected,
 and the data expected is present in the table (4k records, with a spot check 
 of 
 record 4500, as it's in the commitlog but not the snapshot).
 Then if you run the cycle again, it will fail.  Then again, and it will work. 
 The work/
 not work pattern continues.  Even re-running the commitlog replay a 2nd time, 
 without
 reloading the snapshot doesn't work
 The load process is:
 * Modify commitlog segment to 1mb
 * Archive to directory
 * create keyspace/table
 * insert base data
 * initial snapshot
 * write more data
 * capture timestamp
 * write more data
 * final snapshot
 * copy commitlogs to 2nd location
 * modify cassandra-env to replay only specified keyspace
 * modify commitlog properties to restore from 2nd location, with noted 
 timestamp
 The restore cycle is:
 * truncate table
 * sstableload snapshot
 * flush
 * output data status
 * restart to replay commitlogs
 * output data status
 
 See attached .py for a mostly automated reproduction scenario.  It expects 
 DSE (and I found it with DSE 4.7.0-1), rather than actual Cassandra, but 
 it's not using any DSE specific features.  The script looks for the configs 
 in the DSE locations, but they're set at the top, and there's only 2 places 
 where dse is restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9195) commitlog replay only actually replays mutation every other time

2015-04-23 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509286#comment-14509286
 ] 

Nick Bailey commented on CASSANDRA-9195:


bq. Because commitlog is acting correctly, it should not replay truncated data 
regardless of where it comes from. In this particular case sstable loader is 
leaving the table in a bad state; rather than try to live with it I prefer to 
correct it at the origin.

bq. The PITR process includes a truncation step. What I am doing here is giving 
PITR the option to correctly continue after that step.

I don't think this is right. The attached reproduction script does include a 
truncation step, but that step is just there to clear the database before 
verifying that the restore works. It could be replaced with a drop and recreate 
(and then you wouldn't see this bug). But in any case, users may end up 
truncating, regretting it, and then trying to restore from commitlogs.

I don't think sstableloader is the right place for this because restoring from 
a snapshot is not required for point in time restore. You can simply archive 
every commitlog from the start rather than ever taking snapshots if you want. 
Then if you go and replay those commitlogs up to some time before the 
truncation, C* should recognize that the replay is strictly before any 
truncation took place and let things replay.

Also, I still think truncation records should exist in the commitlogs 
themselves, or there should at the very least be a historical list of 
truncations.

 commitlog replay only actually replays mutation every other time
 

 Key: CASSANDRA-9195
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9195
 Project: Cassandra
  Issue Type: Bug
Reporter: Jon Moses
Assignee: Branimir Lambov
Priority: Critical
 Fix For: 2.1.5

 Attachments: 9195-v2.1.patch, loader.py


 Version: Cassandra 2.1.4.374 | DSE 4.7.0
 The main issue here is that the restore-cycle only replays the mutations
 every other try.  On the first try, it will restore the snapshot as expected
 and the cassandra system load will show that it's reading the mutations, but
 they do not actually get replayed, and at the end you're left with only the
 snapshot data (2k records).
 If you re-run the restore-cycle again, the commitlogs are replayed as 
 expected,
 and the data expected is present in the table (4k records, with a spot check 
 of 
 record 4500, as it's in the commitlog but not the snapshot).
 Then if you run the cycle again, it will fail.  Then again, and it will work. 
 The work/
 not work pattern continues.  Even re-running the commitlog replay a 2nd time, 
 without
 reloading the snapshot doesn't work
 The load process is:
 * Modify commitlog segment to 1mb
 * Archive to directory
 * create keyspace/table
 * insert base data
 * initial snapshot
 * write more data
 * capture timestamp
 * write more data
 * final snapshot
 * copy commitlogs to 2nd location
 * modify cassandra-env to replay only specified keyspace
 * modify commitlog properties to restore from 2nd location, with noted 
 timestamp
 The restore cycle is:
 * truncate table
 * sstableload snapshot
 * flush
 * output data status
 * restart to replay commitlogs
 * output data status
 
 See attached .py for a mostly automated reproduction scenario.  It expects 
 DSE (and I found it with DSE 4.7.0-1), rather than actual Cassandra, but 
 it's not using any DSE specific features.  The script looks for the configs 
 in the DSE locations, but they're set at the top, and there's only 2 places 
 where dse is restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9195) commitlog replay only actually replays mutation every other time

2015-04-23 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509246#comment-14509246
 ] 

Nick Bailey commented on CASSANDRA-9195:


I see how the truncation record is necessary when replaying commitlogs for a 
normal node restart, but I'm trying to imagine the case where you would ever 
want that when replaying archived commitlogs. 

I guess if truncating tables is part of your application design? If that's the 
case we are trying to handle, then simply storing the last truncate in the 
system tables doesn't really help, since I'll still be able to restore across 
all the previous truncates.

To me it seems like the correct solution is for replaying of archived 
commitlogs to ignore the truncate metadata or clear it automatically for the 
user, although manually clearing it can be a workaround until that gets 
implemented.

 commitlog replay only actually replays mutation every other time
 

 Key: CASSANDRA-9195
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9195
 Project: Cassandra
  Issue Type: Bug
Reporter: Jon Moses
Assignee: Branimir Lambov
Priority: Critical
 Fix For: 2.1.5

 Attachments: 9195-v2.1.patch, loader.py


 Version: Cassandra 2.1.4.374 | DSE 4.7.0
 The main issue here is that the restore-cycle only replays the mutations
 every other try.  On the first try, it will restore the snapshot as expected
 and the cassandra system load will show that it's reading the mutations, but
 they do not actually get replayed, and at the end you're left with only the
 snapshot data (2k records).
 If you re-run the restore-cycle again, the commitlogs are replayed as 
 expected,
 and the data expected is present in the table (4k records, with a spot check 
 of 
 record 4500, as it's in the commitlog but not the snapshot).
 Then if you run the cycle again, it will fail.  Then again, and it will work. 
 The work/
 not work pattern continues.  Even re-running the commitlog replay a 2nd time, 
 without
 reloading the snapshot doesn't work
 The load process is:
 * Modify commitlog segment to 1mb
 * Archive to directory
 * create keyspace/table
 * insert base data
 * initial snapshot
 * write more data
 * capture timestamp
 * write more data
 * final snapshot
 * copy commitlogs to 2nd location
 * modify cassandra-env to replay only specified keyspace
 * modify commitlog properties to restore from 2nd location, with noted 
 timestamp
 The restore cycle is:
 * truncate table
 * sstableload snapshot
 * flush
 * output data status
 * restart to replay commitlogs
 * output data status
 
 See attached .py for a mostly automated reproduction scenario.  It expects 
 DSE (and I found it with DSE 4.7.0-1), rather than actual Cassandra, but 
 it's not using any DSE specific features.  The script looks for the configs 
 in the DSE locations, but they're set at the top, and there's only 2 places 
 where dse is restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9195) PITR commitlog replay only actually replays mutation every other time

2015-04-23 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509814#comment-14509814
 ] 

Nick Bailey commented on CASSANDRA-9195:


I don't think we need to worry about sstableloader. First, it isn't affected by 
the truncate metadata because it just streams the sstables directly to the 
nodes. Also, when you load a snapshot, you are loading data at a specific point 
in time rather than a range of time. Presumably when you go to load a snapshot 
you know that you want exactly the state of the node/cluster at that point in 
time. Any sort of behavior where truncating now and then loading a snapshot 
taken before the truncation, but not having any of that data present would be 
very strange IMO.

Given that we are going to fix it The Right Way in 3.0, I'd say just an 
automatic clear when replaying archived commitlogs is sufficient for 2.1. Can 
we also document on this ticket the workaround for older versions? Specifically 
what table/row/column needs to be wiped out before replaying archived 
commitlogs?

 PITR commitlog replay only actually replays mutation every other time
 -

 Key: CASSANDRA-9195
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9195
 Project: Cassandra
  Issue Type: Bug
Reporter: Jon Moses
Assignee: Branimir Lambov
 Fix For: 2.1.5

 Attachments: 9195-v2.1.patch, loader.py


 Version: Cassandra 2.1.4.374 | DSE 4.7.0
 The main issue here is that the restore-cycle only replays the mutations
 every other try.  On the first try, it will restore the snapshot as expected
 and the cassandra system load will show that it's reading the mutations, but
 they do not actually get replayed, and at the end you're left with only the
 snapshot data (2k records).
 If you re-run the restore-cycle again, the commitlogs are replayed as 
 expected,
 and the data expected is present in the table (4k records, with a spot check 
 of 
 record 4500, as it's in the commitlog but not the snapshot).
 Then if you run the cycle again, it will fail.  Then again, and it will work. 
 The work/
 not work pattern continues.  Even re-running the commitlog replay a 2nd time, 
 without
 reloading the snapshot doesn't work
 The load process is:
 * Modify commitlog segment to 1mb
 * Archive to directory
 * create keyspace/table
 * insert base data
 * initial snapshot
 * write more data
 * capture timestamp
 * write more data
 * final snapshot
 * copy commitlogs to 2nd location
 * modify cassandra-env to replay only specified keyspace
 * modify commitlog properties to restore from 2nd location, with noted 
 timestamp
 The restore cycle is:
 * truncate table
 * sstableload snapshot
 * flush
 * output data status
 * restart to replay commitlogs
 * output data status
 
 See attached .py for a mostly automated reproduction scenario.  It expects 
 DSE (and I found it with DSE 4.7.0-1), rather than actual Cassandra, but 
 it's not using any DSE specific features.  The script looks for the configs 
 in the DSE locations, but they're set at the top, and there's only 2 places 
 where dse is restarted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6696) Partition sstables by token range

2015-04-17 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500283#comment-14500283
 ] 

Nick Bailey commented on CASSANDRA-6696:


So I just want to mention that the current approach here isn't going to help us 
much with CASSANDRA-4756.

If you don't update your compaction strategy, sstables will contain data from 
many vnodes, so things aren't much different than now. If you do use the new 
compaction strategy, things are slightly better in that levels 1 and higher are 
split per vnode and you could deduplicate that data, but level 0 won't be, so 
you'll still be forced to overstream anything in level 0.

We may want to revisit a new approach to CASSANDRA-4756, specifically one that 
isn't compaction strategy specific.

 Partition sstables by token range
 -

 Key: CASSANDRA-6696
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6696
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: Marcus Eriksson
  Labels: compaction, correctness, dense-storage, performance
 Fix For: 3.0


 In JBOD, when someone gets a bad drive, the bad drive is replaced with a new 
 empty one and repair is run. 
 This can cause deleted data to come back in some cases. Also this is true for 
 corrupt sstables, in which case we delete the corrupt sstable and run repair. 
 Here is an example:
 Say we have 3 nodes A,B and C and RF=3 and GC grace=10days. 
 row=sankalp col=sankalp is written 20 days back and successfully went to all 
 three nodes. 
 Then a delete/tombstone was written successfully for the same row column 15 
 days back. 
 Since this tombstone is more than gc grace, it got compacted in Nodes A and B 
 since it got compacted with the actual data. So there is no trace of this row 
 column in node A and B.
 Now in node C, say the original data is in drive1 and tombstone is in drive2. 
 Compaction has not yet reclaimed the data and tombstone.  
 Drive2 becomes corrupt and was replaced with new empty drive. 
 Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp 
 has come back to life. 
 Now after replacing the drive we run repair. This data will be propagated to 
 all nodes. 
 Note: This is still a problem even if we run repair every gc grace. 
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6696) Partition sstables by token range

2015-04-17 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500329#comment-14500329
 ] 

Nick Bailey commented on CASSANDRA-6696:


I'd also like to mention that we should consider what the best way to expose 
this new information to operators is. Specifically, what vnodes are assigned to 
what disk? What vnode is an sstable responsible for? It should be possible to 
get that information without running sstablemetadata against every sstable file.

 Partition sstables by token range
 -

 Key: CASSANDRA-6696
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6696
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: sankalp kohli
Assignee: Marcus Eriksson
  Labels: compaction, correctness, dense-storage, performance
 Fix For: 3.0


 In JBOD, when someone gets a bad drive, the bad drive is replaced with a new 
 empty one and repair is run. 
 This can cause deleted data to come back in some cases. Also this is true for 
 corrupt sstables, in which case we delete the corrupt sstable and run repair. 
 Here is an example:
 Say we have 3 nodes A,B and C and RF=3 and GC grace=10days. 
 row=sankalp col=sankalp is written 20 days back and successfully went to all 
 three nodes. 
 Then a delete/tombstone was written successfully for the same row column 15 
 days back. 
 Since this tombstone is more than gc grace, it got compacted in Nodes A and B 
 since it got compacted with the actual data. So there is no trace of this row 
 column in node A and B.
 Now in node C, say the original data is in drive1 and tombstone is in drive2. 
 Compaction has not yet reclaimed the data and tombstone.  
 Drive2 becomes corrupt and was replaced with new empty drive. 
 Due to the replacement, the tombstone is now gone and row=sankalp col=sankalp 
 has come back to life. 
 Now after replacing the drive we run repair. This data will be propagated to 
 all nodes. 
 Note: This is still a problem even if we run repair every gc grace. 
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9179) Unable to point in time restore if table/cf has been recreated

2015-04-14 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494458#comment-14494458
 ] 

Nick Bailey commented on CASSANDRA-9179:


I'm not necessarily advocating for minor vs. major, but I'll also point out 
that this issue also affects the cloning use case for backup/restore. It will 
be difficult to do a PIT clone of a table to a different cluster (although it 
is already fairly difficult since all commitlogs would need to be replayed on 
all nodes in the case of a different topology).

And just in general, it's fine to say that a perfect user will never actually 
need to do this, but the reality is that eventually someone will want to. The 
moment of a restore, when you're trying to fix some issue and get an 
application running again, is not a great place to run into gotchas and 
complicated workarounds.

 Unable to point in time restore if table/cf has been recreated
 

 Key: CASSANDRA-9179
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9179
 Project: Cassandra
  Issue Type: Bug
Reporter: Jon Moses
Priority: Minor

 With Cassandra 2.1, and the addition of the CF UUID, the ability to do a 
 point in time restore by restoring a snapshot and replaying commitlogs is 
 lost if the table has been dropped and recreated.
 When the table is recreated, the cf_id changes, and the commitlog replay 
 mechanism skips the desired mutations as the cf_id no longer matches what's 
 present in the schema.
 There should exist a way to inform the replay that you want the mutations 
 replayed even if the cf_id doesn't match.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8348) allow takeColumnFamilySnapshot to take a list of ColumnFamilies

2015-04-14 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494200#comment-14494200
 ] 

Nick Bailey commented on CASSANDRA-8348:


+1 from me. Tested the patches out on 3.0 and 2.1

 allow takeColumnFamilySnapshot to take a list of ColumnFamilies
 ---

 Key: CASSANDRA-8348
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8348
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Halliday
Priority: Minor
 Fix For: 2.1.5

 Attachments: 8348_21.patch, 8348_trunk.patch, 8348_v2.patch, 
 Patch-8348.patch


 Within StorageServiceMBean.java the function takeSnapshot allows for a list 
 of keyspaces to snapshot.  However, the function takeColumnFamilySnapshot 
 only allows for a single ColumnFamily to snapshot.  This should allow for 
 multiple ColumnFamilies within the same Keyspace.
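
For reference, the single-table operation being discussed looks roughly like this from a JMX client's perspective. Host, keyspace, table and tag are placeholders, and this sketches the pre-patch API only, not the patched multi-table form.

{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch of calling the existing single-table snapshot operation over JMX.
// The patch under review extends this so several tables in one keyspace can
// be snapshotted with a single tag in one call.
public class TableSnapshotExample
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService = new ObjectName("org.apache.cassandra.db:type=StorageService");

            // takeColumnFamilySnapshot(keyspace, columnFamily, tag)
            mbs.invoke(storageService,
                       "takeColumnFamilySnapshot",
                       new Object[]{ "test_keyspace", "test_table", "pre-upgrade" },
                       new String[]{ "java.lang.String", "java.lang.String", "java.lang.String" });
        }
    }
}
{code}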



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8348) allow takeColumnFamilySnapshot to take a list of ColumnFamilies

2015-04-09 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487850#comment-14487850
 ] 

Nick Bailey commented on CASSANDRA-8348:


[~SachinJanani] ^

 allow takeColumnFamilySnapshot to take a list of ColumnFamilies
 ---

 Key: CASSANDRA-8348
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8348
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Halliday
Priority: Minor
 Fix For: 2.1.5

 Attachments: 8348_v2.patch, Patch-8348.patch


 Within StorageServiceMBean.java the function takeSnapshot allows for a list 
 of keyspaces to snapshot.  However, the function takeColumnFamilySnapshot 
 only allows for a single ColumnFamily to snapshot.  This should allow for 
 multiple ColumnFamilies within the same Keyspace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8348) allow takeColumnFamilySnapshot to take a list of ColumnFamilies

2015-04-07 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483359#comment-14483359
 ] 

Nick Bailey commented on CASSANDRA-8348:


bq. Also I think the imports should be managed properly and unnecessary imports 
should be avoided

I'm fine with it personally. I didn't see anything in the contributor 
guidelines so I think you are good to leave it.

bq. Regarding the other patch for 3.0 I cant find the branch for 3.0 version on 
github https://github.com/apache/cassandra

3.0 is trunk, you likely need an additional patch for the 2.1 branch. 
https://github.com/apache/cassandra/tree/cassandra-2.1

Other notes, 

* You added some tabs into this new patch. Whitespace should be spaces only.
* Your check if (splittedString.length == 2) will catch both cases: where a 
column family isn't passed in and where a secondary index is passed in. You can 
probably remove those additional checks in the if block and update the error 
message (a small sketch of that check follows below).

So nothing major to change, looks good to me besides that.
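
To spell out that review note with a small sketch: the variable name splittedString comes from the patch under review, but the surrounding class, method and message text are invented for illustration. Splitting on '.' yields length 1 when no table is given and length 3 for a secondary index, so a single length check rejects both.

{code}
// Illustrative version of the argument check discussed above.
public class SnapshotArgCheck
{
    static String[] parse(String columnFamilyArg)
    {
        String[] splittedString = columnFamilyArg.split("\\.");
        if (splittedString.length != 2)
            throw new IllegalArgumentException(
                "Expected <keyspace>.<columnfamily> (no secondary indexes): " + columnFamilyArg);
        return splittedString; // [keyspace, columnFamily]
    }

    public static void main(String[] args)
    {
        System.out.println(java.util.Arrays.toString(parse("test_keyspace.test_table"))); // ok
        try { parse("test_keyspace"); }                // no table given -> rejected
        catch (IllegalArgumentException e) { System.out.println(e.getMessage()); }
        try { parse("test_keyspace.test_table.idx"); } // secondary index -> rejected
        catch (IllegalArgumentException e) { System.out.println(e.getMessage()); }
    }
}
{code}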



 allow takeColumnFamilySnapshot to take a list of ColumnFamilies
 ---

 Key: CASSANDRA-8348
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8348
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Halliday
Priority: Minor
 Fix For: 2.1.5

 Attachments: 8348_v2.patch, Patch-8348.patch


 Within StorageServiceMBean.java the function takeSnapshot allows for a list 
 of keyspaces to snapshot.  However, the function takeColumnFamilySnapshot 
 only allows for a single ColumnFamily to snapshot.  This should allow for 
 multiple ColumnFamilies within the same Keyspace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8348) allow takeColumnFamilySnapshot to take a list of ColumnFamilies

2015-03-31 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388846#comment-14388846
 ] 

Nick Bailey commented on CASSANDRA-8348:


Unfortunately the patch doesn't apply to 2.1 anymore. Sorry for the delay, can 
you update the patch [~SachinJanani]?

The code looks pretty good to me except I think you call getValidKeyspace() on 
the same keyspace twice. Also it looks like your IDE auto expanded any '*' 
imports. I don't remember what the c* team's policy on that is.

 allow takeColumnFamilySnapshot to take a list of ColumnFamilies
 ---

 Key: CASSANDRA-8348
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8348
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Halliday
Priority: Minor
 Fix For: 2.1.4

 Attachments: Patch-8348.patch


 Within StorageServiceMBean.java the function takeSnapshot allows for a list 
 of keyspaces to snapshot.  However, the function takeColumnFamilySnapshot 
 only allows for a single ColumnFamily to snapshot.  This should allow for 
 multiple ColumnFamilies within the same Keyspace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8348) allow takeColumnFamilySnapshot to take a list of ColumnFamilies

2015-03-31 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388850#comment-14388850
 ] 

Nick Bailey commented on CASSANDRA-8348:


Also, we'll probably want a second patch against 3.0 since we changed 'column 
family' to 'table' everywhere.

 allow takeColumnFamilySnapshot to take a list of ColumnFamilies
 ---

 Key: CASSANDRA-8348
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8348
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Halliday
Priority: Minor
 Fix For: 2.1.4

 Attachments: Patch-8348.patch


 Within StorageServiceMBean.java the function takeSnapshot allows for a list 
 of keyspaces to snapshot.  However, the function takeColumnFamilySnapshot 
 only allows for a single ColumnFamily to snapshot.  This should allow for 
 multiple ColumnFamilies within the same Keyspace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8348) allow takeColumnFamilySnapshot to take a list of ColumnFamilies

2015-03-31 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388846#comment-14388846
 ] 

Nick Bailey edited comment on CASSANDRA-8348 at 3/31/15 5:04 PM:
-

Unfortunately the patch doesn't apply to 2.1 anymore. Sorry for the delay, can 
you update the patch [~SachinJanani]?

The code looks pretty good to me except I think you call getValidKeyspace() on 
the same keyspace twice. Also it looks like your IDE auto expanded any '\*' 
imports. I don't remember what the c\* team's policy on that is.


was (Author: nickmbailey):
Unfortunately the patch doesn't apply to 2.1 anymore. Sorry for the delay, can 
you update the patch [~SachinJanani]?

The code looks pretty good to me except I think you call getValidKeyspace() on 
the same keyspace twice. Also it looks like your IDE auto expanded any '*' 
imports. I don't remember what the c* team's policy on that is.

 allow takeColumnFamilySnapshot to take a list of ColumnFamilies
 ---

 Key: CASSANDRA-8348
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8348
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Halliday
Priority: Minor
 Fix For: 2.1.4

 Attachments: Patch-8348.patch


 Within StorageServiceMBean.java the function takeSnapshot allows for a list 
 of keyspaces to snapshot.  However, the function takeColumnFamilySnapshot 
 only allows for a single ColumnFamily to snapshot.  This should allow for 
 multiple ColumnFamilies within the same Keyspace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8499) Ensure SSTableWriter cleans up properly after failure

2015-03-26 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382090#comment-14382090
 ] 

Nick Bailey commented on CASSANDRA-8499:


So would this affect snapshot repairs? Potentially causing an eventual OOM 
after continually doing snapshot repairs on the cluster?

 Ensure SSTableWriter cleans up properly after failure
 -

 Key: CASSANDRA-8499
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8499
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.0.12, 2.1.3

 Attachments: 8499-20.txt, 8499-20v2, 8499-21.txt, 8499-21v2, 8499-21v3


 In 2.0 we do not free a bloom filter, in 2.1 we do not free a small piece of 
 offheap memory for writing compression metadata. In both we attempt to flush 
 the BF despite having encountered an exception, making the exception slow to 
 propagate.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9022) Node Cleanup deletes all its data after a new node joined the cluster

2015-03-23 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376881#comment-14376881
 ] 

Nick Bailey commented on CASSANDRA-9022:


So does this mean cleanup should never be run on a 2.1.0-2.1.3 cluster if 
nodes have been added to the cluster?

 Node Cleanup deletes all its data after a new node joined the cluster
 -

 Key: CASSANDRA-9022
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9022
 Project: Cassandra
  Issue Type: Bug
Reporter: Alan Boudreault
Assignee: Benedict
Priority: Critical
 Fix For: 2.1.4

 Attachments: 9022.txt, bisect.sh, results_cassandra_2.1.3.txt, 
 results_cassandra_2.1_branch.txt


 I tried to add a node to my cluster, and running cleanup deleted all the data 
 on a node. This leaves the cluster totally broken since all subsequent reads 
 fail to validate the data. Even a repair on the problematic node doesn't fix 
 the issue. I've attached the bisect script used and the output of the 
 procedure.
 Procedure to reproduce:
 {code}
 ccm stop && ccm remove
 ccm create -n 2 --install-dir=path/to/cassandra-2.1/branch demo
 ccm start
 ccm node1 stress -- write n=100 -schema replication\(factor=2\) -rate 
 threads=50
 ccm node1 nodetool status
 ccm add -i 127.0.0.3 -j 7400 node3 # no auto-boostrap
 ccm node3 start
 ccm node1 nodetool status
 ccm node3 repair
 ccm node3 nodetool status
 ccm node1 nodetool cleanup
 ccm node2 nodetool cleanup
 ccm node3 nodetool cleanup
 ccm node1 nodetool status
 ccm node1 repair
 ccm node1 stress -- read n=100 ## CRASH Data returned was not validated 
 ?!?
 {code}
 bisec script output:
 {code}
 $ git bisect start cassandra-2.1 cassandra-2.1.3
 $ git bisect run ~/dev/cstar/cleanup_issue/bisect.sh
 ...
 4b05b204acfa60ecad5672c7e6068eb47b21397a is the first bad commit
 commit 4b05b204acfa60ecad5672c7e6068eb47b21397a
 Author: Benedict Elliott Smith bened...@apache.org
 Date:   Wed Feb 11 15:49:43 2015 +
 Enforce SSTableReader.first/last
 
 patch by benedict; reviewed by yukim for CASSANDRA-8744
 :100644 100644 3f0463731e624cbe273dcb3951b2055fa5d9e1a2 
 b2f894eb22b9102d410f1eabeb3e11d26727fbd3 M  CHANGES.txt
 :04 04 51ac2a6cd39bd2377c2e1ed6693ef789ab65a26c 
 79fa2501f4155a64dca2bbdcc9e578008e4e425a M  src
 bisect run success
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5310) New authentication module does not work in multi datacenters in case of network outage

2015-03-12 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359589#comment-14359589
 ] 

Nick Bailey commented on CASSANDRA-5310:


Can we maybe add some additional error messaging around this? If we get an 
UnavailableException when looking up the 'cassandra' user, log that we have to 
use QUORUM for that user lookup? It's extremely confusing to run into this in 
the wild.

 New authentication module does not work in multi datacenters in case of 
 network outage
 -

 Key: CASSANDRA-5310
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5310
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.2.2
 Environment: Ubuntu 12.04
 Cluster of 16 nodes in 2 datacenters (8 nodes in each datacenter)
Reporter: jal
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 1.2.3

 Attachments: auth_fix_consistency.patch


 With 1.2.2, I am using the new authentication backend PasswordAuthenticator 
 with the authorizer CassandraAuthorizer
 In case of network outage, we are no more able to connect to Cassandra.
 Here is the error message we get when I want to connect through cqlsh:
 Traceback (most recent call last):
   File ./cqlsh, line 2262, in module
 main(*read_options(sys.argv[1:], os.environ))
   File ./cqlsh, line 2248, in main
 display_float_precision=options.float_precision)
   File ./cqlsh, line 483, in __init__
 cql_version=cqlver, transport=transport)
 File ./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/connection.py, line 
 143, in connect
   File ./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/connection.py, 
 line 59, in __init__
   File ./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/thrifteries.py, 
 line 157, in establish_connection
   File 
 ./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cassandra/Cassandra.py, 
 line 455, in login
   File 
 ./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cassandra/Cassandra.py, 
 line 476, in recv_login
 cql.cassandra.ttypes.AuthenticationException: 
 AuthenticationException(why='org.apache.cassandra.exceptions.UnavailableException:
  Cannot achieve consistency level QUORUM')



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7555) Support copy and link for commitlog archiving without forking the jvm

2015-03-04 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347166#comment-14347166
 ] 

Nick Bailey commented on CASSANDRA-7555:


Oh nice. Maybe we just split this into two tickets then. One for copy now and 
one for link that depends on 8771.

 Support copy and link for commitlog archiving without forking the jvm
 -

 Key: CASSANDRA-7555
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7555
 Project: Cassandra
  Issue Type: Improvement
Reporter: Nick Bailey
Assignee: Joshua McKenzie
Priority: Minor
 Fix For: 2.1.4


 Right now for commitlog archiving the user specifies a command to run and c* 
 forks the jvm to run that command. The most common operations will be either 
 copy or link (hard or soft). Since we can do all of these operations without 
 forking the jvm, which is very expensive, we should have special cases for 
 those.
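
A minimal sketch of what the in-process special cases could look like, built on the standard java.nio.file calls. The paths, class name and mode strings are placeholders; this is not the eventual implementation, just the kind of copy/hard-link/symlink operations such special-casing would likely use.

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// The three archive operations, done in-process instead of forking a shell:
// copy, hard link, and symbolic link of a commitlog segment into an archive dir.
public class ArchiveOps
{
    public static void archive(Path segment, Path archiveDir, String mode) throws IOException
    {
        Path target = archiveDir.resolve(segment.getFileName());
        switch (mode)
        {
            case "copy":
                Files.copy(segment, target, StandardCopyOption.REPLACE_EXISTING);
                break;
            case "hardlink":
                Files.createLink(target, segment);          // same filesystem only
                break;
            case "symlink":
                Files.createSymbolicLink(target, segment);  // breaks if the segment is recycled/deleted
                break;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) throws IOException
    {
        archive(Paths.get("/var/lib/cassandra/commitlog/CommitLog-4-123.log"),
                Paths.get("/backups/commitlog_archive"),
                "copy");
    }
}
{code}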



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7555) Support copy and link for commitlog archiving without forking the jvm

2015-03-04 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347152#comment-14347152
 ] 

Nick Bailey commented on CASSANDRA-7555:


So it seems that C* recycles commitlog segments rather than creating new ones, 
so I'm not sure either of the link options is actually viable; perhaps just the 
copy option.

 Support copy and link for commitlog archiving without forking the jvm
 -

 Key: CASSANDRA-7555
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7555
 Project: Cassandra
  Issue Type: Improvement
Reporter: Nick Bailey
Assignee: Joshua McKenzie
Priority: Minor
 Fix For: 2.1.4


 Right now for commitlog archiving the user specifies a command to run and c* 
 forks the jvm to run that command. The most common operations will be either 
 copy or link (hard or soft). Since we can do all of these operations without 
 forking the jvm, which is very expensive, we should have special cases for 
 those.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8879) Alter table on compact storage broken

2015-03-03 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345640#comment-14345640
 ] 

Nick Bailey commented on CASSANDRA-8879:


FWIW, that is essentially the case I was hitting. This was a thrift table that 
I know contains only ascii data, and rather than deal with hex/bytes I wanted 
to just update the schema. I can see the argument for not allowing this, since 
you could be shooting yourself in the foot if the actual data isn't the right 
type. On the other hand, the user-friendliness of having to alter my schema 
with thrift (in not completely obvious ways) leaves something to be desired as 
well. Either way, that's probably separate from the actual bug in this ticket 
(since it's broken going bytes to ascii or ascii to bytes).

 Alter table on compact storage broken
 -

 Key: CASSANDRA-8879
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8879
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
Assignee: Tyler Hobbs
 Fix For: 2.0.13

 Attachments: 8879-2.0.txt


 In 2.0 HEAD, alter table on compact storage tables seems to be broken. With 
 the following table definition, altering the column breaks cqlsh and 
 generates a stack trace in the log.
 {noformat}
 CREATE TABLE settings (
   key blob,
   column1 blob,
   value blob,
   PRIMARY KEY ((key), column1)
 ) WITH COMPACT STORAGE
 {noformat}
 {noformat}
 cqlsh:OpsCenter alter table settings ALTER column1 TYPE ascii ;
 TSocket read 0 bytes
 cqlsh:OpsCenter DESC TABLE settings;
 {noformat}
 {noformat}
 ERROR [Thrift:7] 2015-02-26 17:20:24,640 CassandraDaemon.java (line 199) 
 Exception in thread Thread[Thrift:7,5,main]
 java.lang.AssertionError
 ...at 
 org.apache.cassandra.cql3.statements.AlterTableStatement.announceMigration(AlterTableStatement.java:198)
 ...at 
 org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:79)
 ...at 
 org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:158)
 ...at 
 org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:175)
 ...at 
 org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1958)
 ...at 
 org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4486)
 ...at 
 org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4470)
 ...at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 ...at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 ...at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204)
 ...at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 ...at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 ...at java.lang.Thread.run(Thread.java:724)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8879) Alter table on compact storage broken

2015-02-27 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-8879:
--

 Summary: Alter table on compact storage broken
 Key: CASSANDRA-8879
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8879
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 2.0.13


In 2.0 HEAD, alter table on compact storage tables seems to be broken. With the 
following table definition, altering the column breaks cqlsh and generates a 
stack trace in the log.

{noformat}
CREATE TABLE settings (
  key blob,
  column1 blob,
  value blob,
  PRIMARY KEY ((key), column1)
) WITH COMPACT STORAGE
{noformat}
{noformat}
cqlsh:OpsCenter alter table settings ALTER column1 TYPE ascii ;
TSocket read 0 bytes
cqlsh:OpsCenter DESC TABLE settings;
{noformat}
{noformat}
ERROR [Thrift:7] 2015-02-26 17:20:24,640 CassandraDaemon.java (line 199) 
Exception in thread Thread[Thrift:7,5,main]
java.lang.AssertionError
...at 
org.apache.cassandra.cql3.statements.AlterTableStatement.announceMigration(AlterTableStatement.java:198)
...at 
org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:79)
...at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:158)
...at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:175)
...at 
org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1958)
...at 
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4486)
...at 
org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4470)
...at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
...at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
...at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204)
...at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
...at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
...at java.lang.Thread.run(Thread.java:724)
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8702) LatencyMetrics is reporting total latency in nanoseconds rather than microseconds

2015-01-29 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-8702:
---
Assignee: T Jake Luciani  (was: Nick Bailey)

 LatencyMetrics is reporting total latency in nanoseconds rather than 
 microseconds 
 --

 Key: CASSANDRA-8702
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8702
 Project: Cassandra
  Issue Type: Bug
Reporter: Mike Adamson
Assignee: T Jake Luciani
 Fix For: 3.0

 Attachments: 8702.txt


 I don't know if this is the desired behaviour but all the comments in the 
 code indicate that it should be reporting microseconds. 
 A single write shows the following:
 {code}
 WriteLatency
 
 Count: 1
 Min: 315.853
 Max: 379.022
 WriteTotalLatency
 -
 Count: 339667
 {code}
 In LatencyMetrics:
 {code}
 /** Total latency in micro sec */
 public final Counter totalLatency;
 {code}
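
For illustration, the mismatch is just a missing unit conversion on the write path. This is a simplified toy, not the actual LatencyMetrics code; the class, field and method names are invented, and only the nanosecond/microsecond relationship is taken from the report above.

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Toy version of the suspected bug: the total-latency counter is documented as
// microseconds, but the raw nanosecond measurement is added without conversion,
// so a single ~340 microsecond write shows up as ~340,000 in the counter.
public class LatencyCounterExample
{
    /** Total latency in micro sec (per the comment quoted above) */
    private final AtomicLong totalLatencyMicros = new AtomicLong();

    public void addNanoBuggy(long nanos)
    {
        totalLatencyMicros.addAndGet(nanos);                                 // what the report describes
    }

    public void addNanoFixed(long nanos)
    {
        totalLatencyMicros.addAndGet(TimeUnit.NANOSECONDS.toMicros(nanos));  // intended behaviour
    }

    public static void main(String[] args)
    {
        LatencyCounterExample m = new LatencyCounterExample();
        long oneWrite = 339_667;  // nanoseconds, i.e. ~340 microseconds
        m.addNanoBuggy(oneWrite);
        System.out.println("buggy total = " + m.totalLatencyMicros.get()); // 339667
        m.totalLatencyMicros.set(0);
        m.addNanoFixed(oneWrite);
        System.out.println("fixed total = " + m.totalLatencyMicros.get()); // 339
    }
}
{code}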



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-8702) LatencyMetrics is reporting total latency in nanoseconds rather than microseconds

2015-01-29 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey reassigned CASSANDRA-8702:
--

Assignee: Nick Bailey  (was: T Jake Luciani)

 LatencyMetrics is reporting total latency in nanoseconds rather than 
 microseconds 
 --

 Key: CASSANDRA-8702
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8702
 Project: Cassandra
  Issue Type: Bug
Reporter: Mike Adamson
Assignee: Nick Bailey
 Fix For: 3.0

 Attachments: 8702.txt


 I don't know if this is the desired behaviour but all the comments in the 
 code indicate that it should be reporting microseconds. 
 A single write shows the following:
 {code}
 WriteLatency
 
 Count: 1
 Min: 315.853
 Max: 379.022
 WriteTotalLatency
 -
 Count: 339667
 {code}
 In LatencyMetrics:
 {code}
 /** Total latency in micro sec */
 public final Counter totalLatency;
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7560) 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession

2015-01-23 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289291#comment-14289291
 ] 

Nick Bailey commented on CASSANDRA-7560:


Is this a bug only for snapshot repair?

 'nodetool repair -pr' leads to indefinitely hanging AntiEntropySession
 --

 Key: CASSANDRA-7560
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7560
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vladimir Avram
Assignee: Yuki Morishita
 Fix For: 2.0.10

 Attachments: 0001-backport-CASSANDRA-6747.patch, 
 0001-partial-backport-3569.patch, cassandra_daemon.log, 
 cassandra_daemon_rep1.log, cassandra_daemon_rep2.log, nodetool_command.log


 Running {{nodetool repair -pr}} will sometimes hang on one of the resulting 
 AntiEntropySessions.
 The system logs will show the repair command starting
 {noformat}
  INFO [Thread-3079] 2014-07-15 02:22:56,514 StorageService.java (line 2569) 
 Starting repair command #1, repairing 256 ranges for keyspace x
 {noformat}
 You can then see a few AntiEntropySessions completing with:
 {noformat}
 INFO [AntiEntropySessions:2] 2014-07-15 02:28:12,766 RepairSession.java (line 
 282) [repair #eefb3c30-0bc6-11e4-83f7-a378978d0c49] session completed 
 successfully
 {noformat}
 Finally we reach an AntiEntropySession at some point that hangs just before 
 requesting the merkle trees for the next column family in line for repair. So 
 we first see the previous CF being finished and the whole repair sessions 
 hangs here with no visible progress or errors on this or any of the related 
 nodes.
 {noformat}
 INFO [AntiEntropyStage:1] 2014-07-15 02:38:20,325 RepairSession.java (line 
 221) [repair #8f85c1b0-0bc8-11e4-83f7-a378978d0c49] previous_cf is fully 
 synced
 {noformat}
 Notes:
 * Single DC 6 node cluster with an average load of 86 GB per node.
 * This appears to be random; it does not always happen on the same CF or on 
 the same session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-8602) ArithmethicException: Divide by zero in agent (cassandra)

2015-01-12 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey resolved CASSANDRA-8602.

Resolution: Invalid

This issue tracker is just for Apache Cassandra and doesn't cover OpsCenter 
issues, so I'm going to close this here.

The best way to report issues like this for OpsCenter is via the feedback form 
in the OpsCenter interface.

For this particular bug, we are tracking this internally and it should be fixed 
in the next major release of OpsCenter. In the meantime, the bug should be 
mostly harmless, except for the alarming logging. Thanks for the report though.

 ArithmethicException: Divide by zero in agent (cassandra)
 -

 Key: CASSANDRA-8602
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8602
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Catalin Alexandru Zamfir

 We got the following exception and no data is currently showing on the graphs 
 in OpsCenter. From the datastax-agent logs:
 {code}
 ERROR [jmx-metrics-2] 2015-01-11 03:55:00,000 Error getting CF metrics
 java.lang.ArithmeticException: Divide by zero
 at clojure.lang.Numbers.divide(Numbers.java:156)
 at opsagent.rollup$transform_value.invoke(rollup.clj:43)
 at opsagent.rollup$add_value.invoke(rollup.clj:132)
 at opsagent.rollup$add_value.invoke(rollup.clj:150)
 at opsagent.rollup$add_value.invoke(rollup.clj:150)
 at opsagent.rollup$process_keypair$fn__701.invoke(rollup.clj:211)
 at 
 opsagent.cache$update_cache_value_default$fn__481$fn__482.invoke(cache.clj:23)
 at clojure.lang.AFn.applyToHelper(AFn.java:161)
 at clojure.lang.AFn.applyTo(AFn.java:151)
 at clojure.lang.Ref.alter(Ref.java:174)
 at clojure.core$alter.doInvoke(core.clj:2244)
 at clojure.lang.RestFn.invoke(RestFn.java:425)
 at 
 opsagent.cache$update_cache_value_default$fn__481.invoke(cache.clj:23)
 at clojure.lang.AFn.call(AFn.java:18)
 at clojure.lang.LockingTransaction.run(LockingTransaction.java:263)
 at 
 clojure.lang.LockingTransaction.runInTransaction(LockingTransaction.java:231)
 at opsagent.cache$update_cache_value_default.invoke(cache.clj:22)
 at opsagent.rollup$process_keypair.invoke(rollup.clj:211)
 at opsagent.rollup$process_metric_map.invoke(rollup.clj:217)
 at 
 opsagent.metrics.jmx$start_jmx_metric_collection$send_metrics__5266.invoke(jmx.clj:200)
 at opsagent.metrics.jmx$cf_metric_helper.invoke(jmx.clj:92)
 at opsagent.metrics.jmx$start_pool$fn__5238.invoke(jmx.clj:148)
 at clojure.lang.AFn.run(AFn.java:24)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 ERROR [jmx-metrics-2] 2015-01-11 14:27:00,000 Error getting CF metrics
 java.lang.ArithmeticException: Divide by zero
 at clojure.lang.Numbers.divide(Numbers.java:156)
 at opsagent.rollup$transform_value.invoke(rollup.clj:43)
 at opsagent.rollup$add_value.invoke(rollup.clj:132)
 at opsagent.rollup$add_value.invoke(rollup.clj:150)
 at opsagent.rollup$add_value.invoke(rollup.clj:150)
 at opsagent.rollup$process_keypair$fn__701.invoke(rollup.clj:211)
 at 
 opsagent.cache$update_cache_value_default$fn__481$fn__482.invoke(cache.clj:23)
 at clojure.lang.AFn.applyToHelper(AFn.java:161)
 at clojure.lang.AFn.applyTo(AFn.java:151)
 at clojure.lang.Ref.alter(Ref.java:174)
 at clojure.core$alter.doInvoke(core.clj:2244)
 at clojure.lang.RestFn.invoke(RestFn.java:425)
 at 
 opsagent.cache$update_cache_value_default$fn__481.invoke(cache.clj:23)
 at clojure.lang.AFn.call(AFn.java:18)
 at clojure.lang.LockingTransaction.run(LockingTransaction.java:263)
 at 
 clojure.lang.LockingTransaction.runInTransaction(LockingTransaction.java:231)
 at opsagent.cache$update_cache_value_default.invoke(cache.clj:22)
 at opsagent.rollup$process_keypair.invoke(rollup.clj:211)
 at opsagent.rollup$process_metric_map.invoke(rollup.clj:217)
 at 
 

[jira] [Commented] (CASSANDRA-8076) Expose an mbean method to poll for repair job status

2014-12-17 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250829#comment-14250829
 ] 

Nick Bailey commented on CASSANDRA-8076:


Can we differentiate between a running repair and an invalid repair number? 
Both are represented by -1, and they are pretty different things. Right now, if 
C* restarts while repair X is running and the client doesn't recognize the 
restart, it would forever check the status of repair X, assuming it is still 
running.
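
To make the distinction concrete, something along these lines would work (the 
interface, method name and status values below are hypothetical, not the API 
from the attached patch):
{code}
// Hypothetical sketch only -- names and values are illustrative, not the
// attached patch's API. The point is that "still running" and "unknown
// command id" should not both collapse to the same value.
public interface RepairStatusMBean
{
    int STATUS_UNKNOWN  = -2; // command id never seen, e.g. the node restarted
    int STATUS_RUNNING  = -1;
    int STATUS_FINISHED = 0;
    int STATUS_FAILED   = 1;

    // poll with the command id returned by forceRepairAsync
    int getRepairStatus(int commandId);
}
{code}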

 Expose an mbean method to poll for repair job status
 

 Key: CASSANDRA-8076
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8076
 Project: Cassandra
  Issue Type: Improvement
Reporter: Philip S Doctor
Assignee: Yuki Morishita
 Fix For: 2.0.12

 Attachments: 8076-2.0.txt


 Given the int reply-id from forceRepairAsync, allow a client to request the 
 status of this ID via jmx.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8056) nodetool snapshot keyspace -cf table -t sametagname does not work on multiple tables of the same keyspace

2014-12-01 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230060#comment-14230060
 ] 

Nick Bailey commented on CASSANDRA-8056:


Since we are doing this here, is CASSANDRA-8348 now a duplicate?

 nodetool snapshot keyspace -cf table -t sametagname does not work on 
 multiple tables of the same keyspace
 --

 Key: CASSANDRA-8056
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8056
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
 Environment: Cassandra 2.0.6 debian wheezy and squeeze
Reporter: Esha Pathak
Priority: Trivial
  Labels: lhf
 Fix For: 2.0.12

 Attachments: CASSANDRA-8056.txt


 scenario
 keyspace thing has tables : thing:user , thing:object, thing:user_details
 steps to reproduce :
 1. nodetool snapshot thing --column-family user --tag tagname
   Requested creating snapshot for: thing and table: user
   Snapshot directory: tagname
 2.nodetool snapshot thing --column-family object --tag tagname
 Requested creating snapshot for: thing and table: object
 Exception in thread "main" java.io.IOException: Snapshot tagname already 
 exists.
   at 
 org.apache.cassandra.service.StorageService.takeColumnFamilySnapshot(StorageService.java:2274)
   at sun.reflect.GeneratedMethodAccessor129.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
   at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
   at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
   at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
   at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
   at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
   at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
   at 
 com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
   at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
   at 
 javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
   at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
   at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
   at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
   at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
   at sun.rmi.transport.Transport$1.run(Transport.java:177)
   at sun.rmi.transport.Transport$1.run(Transport.java:174)
   at java.security.AccessController.doPrivileged(Native Method)
   at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
   at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
   at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
   at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8377) Coordinated Commitlog Replay

2014-11-26 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-8377:
--

 Summary: Coordinated Commitlog Replay
 Key: CASSANDRA-8377
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8377
 Project: Cassandra
  Issue Type: New Feature
Reporter: Nick Bailey
 Fix For: 3.0


Commit log archiving and replay can be used to support point in time restores 
on a cluster. Unfortunately, at the moment that is only true when the topology 
of the cluster is exactly the same as when the commitlogs were archived. This 
is because commitlogs need to be replayed on a node that is a replica for those 
writes.

To support replaying commitlogs when the topology has changed we should have a 
tool that replays the writes in a commitlog as if they were writes from a 
client and will get coordinated to the correct replicas.
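
A rough sketch of what such a tool could look like, assuming some way to decode 
archived segments into individual writes (the ReplayableWrite type is a 
placeholder, and the DataStax java driver is used only for illustration):
{code}
// Conceptual sketch only: decode archived commitlog segments and re-issue each
// write through an ordinary client session, so the coordinator routes it to
// whichever nodes are replicas under the *current* topology.
// ReplayableWrite is a placeholder, not an existing API.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;

public class CoordinatedReplay
{
    interface ReplayableWrite { Statement toCqlStatement(); }

    public static void replay(Iterable<ReplayableWrite> writes)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            for (ReplayableWrite w : writes)
                session.execute(w.toCqlStatement());   // normal coordinated write path
        }
    }
}
{code}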



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8338) Simplify Token Selection

2014-11-24 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14223366#comment-14223366
 ] 

Nick Bailey commented on CASSANDRA-8338:


It might be worth putting this in a different file than cassandra.yaml. It's 
already confusing that some options in there (initial_token, num_tokens) only 
matter the very first time a node starts up. I'm not sure if we should be 
adding more. Also we should make sure we convey that this only helps when the 
entire cluster is being set up for the first time, not when adding nodes.

Lastly, this will need to incorporate rack information as well if we want it to 
work correctly when not everything is in the same rack.

 Simplify Token Selection
 

 Key: CASSANDRA-8338
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8338
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Reporter: Joaquin Casares
Assignee: Jeremiah Jordan
Priority: Trivial
  Labels: lhf

 When creating provisioning scripts, especially when running tools like Chef, 
 each node is launched individually. When not using vnodes your initial setup 
 will always be unbalanced unless you handle token assignment within your 
 scripts. 
 I spoke to someone recently who was using this in production and his 
 operations team wasn't too pleased that they had to use OpsCenter as an extra 
 step for rebalancing. Instead, we should provide this functionality out of 
 the box for new clusters.
 Instead, could we have the following options below the initial_token section?
 {CODE}
 # datacenter_index: 0
 # node_index: 0
 # datacenter_size: 1
 {CODE}
 The above configuration options, when uncommented, would do the math of:
 {CODE}
 token = node_index * (range / datacenter_size) + (datacenter_index * 100) 
 + start_of_range
 {CODE}
 This means that users don't have to repeatedly implement the initial_token 
 selection code nor know the range and offsets of their partitioner.
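
For illustration, the quoted formula worked out against Murmur3Partitioner's 
token range (a sketch under the proposal's assumptions, not shipped code):
{code}
// Sketch of the proposed arithmetic, assuming Murmur3Partitioner's range of
// [-2^63, 2^63). Names mirror the proposed yaml options; nothing here is
// committed code.
import java.math.BigInteger;

public class TokenMath
{
    private static final BigInteger RANGE = BigInteger.ONE.shiftLeft(64);        // 2^64 tokens
    private static final BigInteger START = BigInteger.valueOf(Long.MIN_VALUE);  // start_of_range

    static BigInteger initialToken(int nodeIndex, int datacenterIndex, int datacenterSize)
    {
        BigInteger step = RANGE.divide(BigInteger.valueOf(datacenterSize));
        return step.multiply(BigInteger.valueOf(nodeIndex))
                   .add(BigInteger.valueOf(100L * datacenterIndex))
                   .add(START);
    }

    public static void main(String[] args)
    {
        // e.g. node_index 1 of a 3-node datacenter with datacenter_index 0
        System.out.println(initialToken(1, 0, 3));
    }
}
{code}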



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8348) allow takeColumnFamilySnapshot to take a list of ColumnFamilies

2014-11-24 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-8348:
---
Fix Version/s: 2.1.3
   3.0

 allow takeColumnFamilySnapshot to take a list of ColumnFamilies
 ---

 Key: CASSANDRA-8348
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8348
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Halliday
Priority: Minor
 Fix For: 3.0, 2.1.3


 Within StorageServiceMBean.java the function takeSnapshot allows for a list 
 of keyspaces to snapshot.  However, the function takeColumnFamilySnapshot 
 only allows for a single ColumnFamily to snapshot.  This should allow for 
 multiple ColumnFamilies within the same Keyspace.
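
One possible shape for that change, purely as a sketch (the varargs signature 
below is an assumption, not a committed API):
{code}
// Illustrative sketch of how the MBean could be extended; the second signature
// is an assumption for discussion, not the final API.
public interface ColumnFamilySnapshotMBean
{
    // existing: one keyspace, one column family, one tag
    void takeColumnFamilySnapshot(String keyspaceName, String columnFamilyName, String tag)
        throws java.io.IOException;

    // proposed: one tag covering several column families of the same keyspace
    void takeColumnFamilySnapshot(String keyspaceName, String tag, String... columnFamilyNames)
        throws java.io.IOException;
}
{code}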



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8348) allow takeColumnFamilySnapshot to take a list of ColumnFamilies

2014-11-24 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219742#comment-14219742
 ] 

Nick Bailey edited comment on CASSANDRA-8348 at 11/25/14 2:26 AM:
--

It may make sense to include a method that takes a list of ks.cf pairs to 
snapshot as well.


was (Author: nickmbailey):
It make make sense to include a method that takes a list of ks.cf pairs to 
snapshot as well.

 allow takeColumnFamilySnapshot to take a list of ColumnFamilies
 ---

 Key: CASSANDRA-8348
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8348
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Halliday
Priority: Minor
 Fix For: 3.0, 2.1.3


 Within StorageServiceMBean.java the function takeSnapshot allows for a list 
 of keyspaces to snapshot.  However, the function takeColumnFamilySnapshot 
 only allows for a single ColumnFamily to snapshot.  This should allow for 
 multiple ColumnFamilies within the same Keyspace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8348) allow takeColumnFamilySnapshot to take a list of ColumnFamilies

2014-11-20 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219742#comment-14219742
 ] 

Nick Bailey commented on CASSANDRA-8348:


It may make sense to include a method that takes a list of ks.cf pairs to 
snapshot as well.

 allow takeColumnFamilySnapshot to take a list of ColumnFamilies
 ---

 Key: CASSANDRA-8348
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8348
 Project: Cassandra
  Issue Type: Improvement
Reporter: Peter Halliday
Priority: Minor

 Within StorageServiceMBean.java the function takeSnapshot allows for a list 
 of keyspaces to snapshot.  However, the function takeColumnFamilySnapshot 
 only allows for a single ColumnFamily to snapshot.  This should allow for 
 multiple ColumnFamilies within the same Keyspace.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8327) snapshots taken before repair are not cleared if snapshot fails

2014-11-17 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14214947#comment-14214947
 ] 

Nick Bailey commented on CASSANDRA-8327:


I think that ideally, c* would use a specific directory for snapshots created 
by repair. Beyond the case where the snapshot fails for some reason, c* can 
also be restarted while a repair is ongoing and leave these directories behind. 
By using a dedicated directory, the c* process can simply clean it up when the 
process starts.
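
Roughly, the startup cleanup could look like the sketch below (the 
snapshots/repair path is hypothetical; no such dedicated directory exists 
today):
{code}
// Sketch of the startup cleanup suggested above, assuming repair snapshots
// were kept under a dedicated directory per table; the "snapshots/repair"
// path is hypothetical, not a directory Cassandra uses today.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class RepairSnapshotCleanup
{
    static void purgeOnStartup(Path tableDataDir) throws IOException
    {
        Path repairSnapshots = tableDataDir.resolve("snapshots").resolve("repair");
        if (!Files.isDirectory(repairSnapshots))
            return;
        try (Stream<Path> paths = Files.walk(repairSnapshots))
        {
            // delete children before their parent directories
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }
}
{code}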

 snapshots taken before repair are not cleared if snapshot fails
 ---

 Key: CASSANDRA-8327
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8327
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: cassandra 2.0.10.71
Reporter: MASSIMO CELLI
Priority: Minor

 running repair service the following directory was created for the snapshots:
 drwxr-xr-x 2 cassandra cassandra 36864 Nov 5 07:47 
 073d16e0-64c0-11e4-8e9a-7b3d4674c508 
 but the system.log reports the following error which suggests the snapshot 
 failed:
 ERROR [RMI TCP Connection(3251)-10.150.27.78] 2014-11-05 07:47:55,734 
 StorageService.java (line 2599) Repair session 
 073d16e0-64c0-11e4-8e9a-7b3d4674c508 for range 
 (7530018576963469312,7566047373982433280] failed with error 
 java.io.IOException: Failed during snapshot creation. 
 java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
 java.io.IOException: Failed during snapshot creation.  ERROR 
 [AntiEntropySessions:3312] 2014-11-05 07:47:55,731 RepairSession.java (line 
 288) [repair #073d16e0-64c0-11e4-8e9a-7b3d4674c508] session completed with 
 the following error java.io.IOException: Failed during snapshot creation.
 The problem is that the directories for snapshots that fail are just left on 
 disk and don't get cleaned up; they must be removed manually, which is not 
 ideal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8005) Server-side DESCRIBE

2014-09-30 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153369#comment-14153369
 ] 

Nick Bailey commented on CASSANDRA-8005:


bq. Support writing out schemas where it makes sense.

bq. I don't see how this is better done by this feature than by the existing 
KeyspaceMetadata.exportAsString method the java driver already provided.

So I think the point here is to have the cassandra server itself do this. We 
already have CASSANDRA-7190 for example. We could dump relevant info from the 
schema tables so that a client could consume that but I like the idea of a 
CREATE string a lot more.
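
For reference, this is roughly what clients can already do through the java 
driver's metadata API (a sketch assuming the 2.x driver; the keyspace name is 
just an example), which is the client-side duplication a server-side DESCRIBE 
would remove:
{code}
// Minimal sketch using the DataStax java driver's existing metadata API
// (2.x-era names assumed): the driver rebuilds CREATE statements client-side.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.KeyspaceMetadata;

public class DumpSchema
{
    public static void main(String[] args)
    {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build())
        {
            KeyspaceMetadata ks = cluster.getMetadata().getKeyspace("system_traces");
            // emits the keyspace definition plus all of its tables as CQL
            System.out.println(ks.exportAsString());
        }
    }
}
{code}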

 Server-side DESCRIBE
 

 Key: CASSANDRA-8005
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8005
 Project: Cassandra
  Issue Type: New Feature
  Components: API
Reporter: Tyler Hobbs
Priority: Minor
  Labels: cql3
 Fix For: 3.0


 The various {{DESCRIBE}} commands are currently implemented by cqlsh, and 
 nearly identical implementations exist in many drivers.  There are several 
 motivations for making {{DESCRIBE}} part of the CQL language:
 * Eliminate the (fairly complex) duplicate implementations across drivers and 
 cqlsh
 * Get closer to allowing drivers to not have to fetch the schema tables. 
 (Minor changes to prepared statements are also needed.)
 * Have instantaneous support for new schema features in cqlsh.  (You 
 currently have to update the bundled python driver.)
 * Support writing out schemas where it makes sense.  One good example of this 
 is backups.  You need to restore the schema before restoring data in the case 
 of total loss, so it makes sense to write out the schema alongside snapshots.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8005) Server-side DESCRIBE

2014-09-30 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153847#comment-14153847
 ] 

Nick Bailey commented on CASSANDRA-8005:


bq. Why is every driver implementing describe in the first place?

I would guess because there is user demand for the feature, but I suppose it 
could be an unused feature in the drivers. I'm not sure what the dividing 
characteristic between cqlsh and a driver is that puts the responsibility into 
cqlsh land and not driver land.

I still say CASSANDRA-7190 presents a good reason for having this server side. 
I suppose you could dump the relevant rows from the schema tables along with 
the backup, but then you are putting the burden of recreating the create 
statement on any C* backup tools (OpsCenter or otherwise).

 Server-side DESCRIBE
 

 Key: CASSANDRA-8005
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8005
 Project: Cassandra
  Issue Type: New Feature
  Components: API
Reporter: Tyler Hobbs
Priority: Minor
  Labels: cql3
 Fix For: 3.0


 The various {{DESCRIBE}} commands are currently implemented by cqlsh, and 
 nearly identical implementations exist in many drivers.  There are several 
 motivations for making {{DESCRIBE}} part of the CQL language:
 * Eliminate the (fairly complex) duplicate implementations across drivers and 
 cqlsh
 * Get closer to allowing drivers to not have to fetch the schema tables. 
 (Minor changes to prepared statements are also needed.)
 * Have instantaneous support for new schema features in cqlsh.  (You 
 currently have to update the bundled python driver.)
 * Support writing out schemas where it makes sense.  One good example of this 
 is backups.  You need to restore the schema before restoring data in the case 
 of total loss, so it makes sense to write out the schema alongside snapshots.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8005) Server-side DESCRIBE

2014-09-30 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14154208#comment-14154208
 ] 

Nick Bailey commented on CASSANDRA-8005:


I agree that backup tools need to be a bit more intelligent: they need to 
detect conflicts between the existing schema and the schema of the data in the 
backup. I'm not convinced that means they have to know how to inspect the 
schema tables, though, especially since we are already talking about 
reformatting those tables in CASSANDRA-6717. With a create statement, a backup 
tool has a common format to parse and compare across versions of Cassandra; we 
already know that won't be true for the schema tables, and I doubt we want to 
commit to guaranteeing that their format stays somewhat consistent.

What are the downsides of server-side DESC support?

 Server-side DESCRIBE
 

 Key: CASSANDRA-8005
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8005
 Project: Cassandra
  Issue Type: New Feature
  Components: API
Reporter: Tyler Hobbs
Priority: Minor
  Labels: cql3
 Fix For: 3.0


 The various {{DESCRIBE}} commands are currently implemented by cqlsh, and 
 nearly identical implementations exist in many drivers.  There are several 
 motivations for making {{DESCRIBE}} part of the CQL language:
 * Eliminate the (fairly complex) duplicate implementations across drivers and 
 cqlsh
 * Get closer to allowing drivers to not have to fetch the schema tables. 
 (Minor changes to prepared statements are also needed.)
 * Have instantaneous support for new schema features in cqlsh.  (You 
 currently have to update the bundled python driver.)
 * Support writing out schemas where it makes sense.  One good example of this 
 is backups.  You need to restore the schema before restoring data in the case 
 of total loss, so it makes sense to write out the schema alongside snapshots.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5657) remove deprecated metrics

2014-09-26 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14149588#comment-14149588
 ] 

Nick Bailey commented on CASSANDRA-5657:


Yeah if the metric is just a simple counter then a recent* method exposing it 
isn't really necessary. You can just track the difference between calls. The 
more complicated metrics like latency (total recent latency / total recent 
requests) are better exposed by the 1/5/15 minute rate metrics you mentioned.

That said, last I checked there were a few recent* metrics that didn't have a 
1/5/15 minute rate equivalent. For example Key/RowCacheRecentHitRate.
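
To make the first point concrete, a monitoring client can derive its own 
'recent' value from a plain counter by differencing successive polls, along 
these lines (a generic sketch, not tied to any particular mbean):
{code}
// Generic sketch: derive an events/second rate from a monotonically
// increasing counter by differencing successive JMX polls.
public class CounterRate
{
    private long lastValue = -1;
    private long lastTimeMillis;

    // call once per poll with the counter's current value
    public double update(long currentValue, long nowMillis)
    {
        double rate = 0.0;
        if (lastValue >= 0 && nowMillis > lastTimeMillis)
            rate = (currentValue - lastValue) * 1000.0 / (nowMillis - lastTimeMillis);
        lastValue = currentValue;
        lastTimeMillis = nowMillis;
        return rate;
    }
}
{code}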

 remove deprecated metrics
 -

 Key: CASSANDRA-5657
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5657
 Project: Cassandra
  Issue Type: Task
  Components: Tools
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
  Labels: technical_debt
 Fix For: 3.0

 Attachments: 5657.txt






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-7995) sstablerepairedset should take more than one sstable as an argument

2014-09-23 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-7995:
--

 Summary: sstablerepairedset should take more than one sstable as 
an argument
 Key: CASSANDRA-7995
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7995
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 2.1.1


Given that a c* node can have a number of sstables in the 10s (100s?) of 
thousands, sstablerepairedset should be taking a list of sstables to mark as 
repaired rather than a single sstable.

Running any command 10s of thousands of times isn't really good let alone one 
that spins up a jvm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7995) sstablerepairedset should take more than one sstable as an argument

2014-09-23 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-7995:
---
Description: 
Given that a c* node can have a number of sstables in the 10s (100s?) of 
thousands of sstables on it, sstablerepairedset should be taking a list of 
sstables to mark as repaired rather than a single sstable.

Running any command 10s of thousands of times isn't really good let alone one 
that spins up a jvm.

  was:
Given that a c* node can a number of sstables in the 10s (100s?) of thousands 
of sstables on it, sstablerepairedset should be taking a list of sstables to 
mark as repaired rather than a single sstable.

Running any command 10s of thousands of times isn't really good let alone one 
that spins up a jvm.


 sstablerepairedset should take more than one sstable as an argument
 ---

 Key: CASSANDRA-7995
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7995
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
  Labels: lhf
 Fix For: 2.1.1


 Given that a c* node can have a number of sstables in the 10s (100s?) of 
 thousands of sstables on it, sstablerepairedset should be taking a list of 
 sstables to mark as repaired rather than a single sstable.
 Running any command 10s of thousands of times isn't really good let alone one 
 that spins up a jvm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7952) DataStax Agent Null Pointer Exception

2014-09-18 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139118#comment-14139118
 ] 

Nick Bailey commented on CASSANDRA-7952:


[~harisekhon] the best place to report bugs like that is the feedback form in 
OpsCenter itself. Can you email me some additional info to nick @ datastax? 
Specifically what operating system and version you are on and which jvm you are 
using? I can file an internal Jira from there.

 DataStax Agent Null Pointer Exception
 -

 Key: CASSANDRA-7952
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7952
 Project: Cassandra
  Issue Type: Bug
 Environment: DSE 4.5.1, DataStax OpsCenter  Agent 5.0.0
Reporter: Hari Sekhon
Priority: Minor

 I've got a Null Pointer Exception in my DataStax OpsCenter Agent log, and 
 it's not reporting in to the OpsCenter. Here is the log
 {code}
  INFO [StompConnection receiver] 2014-09-17 13:01:15,992 New JMX connection 
 (127.0.0.1:7199)
  INFO [Jetty] 2014-09-17 13:01:16,019 Jetty server started
  INFO [Initialization] 2014-09-17 13:01:16,031 Using x.x.x.x as the cassandra 
 broadcast address
  INFO [StompConnection receiver] 2014-09-17 13:01:16,032 Starting up agent 
 collection.
  INFO [Initialization] 2014-09-17 13:01:16,162 agent RPC address is  x.x.x.x
  INFO [StompConnection receiver] 2014-09-17 13:01:16,162 agent RPC address is 
  x.x.x.x
  INFO [Initialization] 2014-09-17 13:01:16,162 agent RPC broadcast address is 
  x.x.x.x
  INFO [StompConnection receiver] 2014-09-17 13:01:16,162 agent RPC broadcast 
 address is  x.x.x.x
  INFO [StompConnection receiver] 2014-09-17 13:01:16,163 Starting OS metric 
 collectors (Linux)
  INFO [Initialization] 2014-09-17 13:01:16,166 Clearing ssl.truststore
  INFO [Initialization] 2014-09-17 13:01:16,166 Clearing 
 ssl.truststore.password
  INFO [Initialization] 2014-09-17 13:01:16,167 Setting ssl.store.type to JKS
  INFO [Initialization] 2014-09-17 13:01:16,167 Clearing 
 kerberos.service.principal.name
  INFO [Initialization] 2014-09-17 13:01:16,167 Clearing kerberos.principal
  INFO [Initialization] 2014-09-17 13:01:16,167 Setting 
 kerberos.useTicketCache to true
  INFO [Initialization] 2014-09-17 13:01:16,167 Clearing kerberos.ticketCache
  INFO [Initialization] 2014-09-17 13:01:16,168 Setting kerberos.useKeyTab to 
 true
  INFO [Initialization] 2014-09-17 13:01:16,168 Clearing kerberos.keyTab
  INFO [Initialization] 2014-09-17 13:01:16,168 Setting kerberos.renewTGT to 
 true
  INFO [Initialization] 2014-09-17 13:01:16,168 Setting kerberos.debug to false
  INFO [StompConnection receiver] 2014-09-17 13:01:16,171 Starting Cassandra 
 JMX metric collectors
  INFO [thrift-init] 2014-09-17 13:01:16,171 Connecting to Cassandra cluster: 
 x.x.x.x (port 9160)
  INFO [StompConnection receiver] 2014-09-17 13:01:16,187 New JMX connection 
 (127.0.0.1:7199)
  INFO [thrift-init] 2014-09-17 13:01:16,189 Downed Host Retry service started 
 with queue size -1 and retry delay 10s
  INFO [thrift-init] 2014-09-17 13:01:16,192 Registering JMX 
 me.prettyprint.cassandra.service_Agent 
 Cluster:ServiceType=hector,MonitorType=hector
  INFO [pdp-loader] 2014-09-17 13:01:16,231 in execute with client 
 org.apache.cassandra.thrift.Cassandra$Client@7a22c094
  INFO [pdp-loader] 2014-09-17 13:01:16,237 Attempting to load stored metric 
 values.
  INFO [thrift-init] 2014-09-17 13:01:16,240 Connected to Cassandra cluster: 
 PoC
  INFO [thrift-init] 2014-09-17 13:01:16,240 in execute with client 
 org.apache.cassandra.thrift.Cassandra$Client@7a22c094
  INFO [thrift-init] 2014-09-17 13:01:16,240 Using partitioner: 
 org.apache.cassandra.dht.Murmur3Partitioner
  INFO [jmx-metrics-1] 2014-09-17 13:01:21,181 New JMX connection 
 (127.0.0.1:7199)
 ERROR [StompConnection receiver] 2014-09-17 13:01:24,376 Failed to collect 
 machine info
 java.lang.NullPointerException
 at clojure.lang.Numbers.ops(Numbers.java:942)
 at clojure.lang.Numbers.divide(Numbers.java:157)
 at 
 opsagent.nodedetails.machine_info$get_machine_info.invoke(machine_info.clj:76)
 at 
 opsagent.nodedetails$get_static_properties$fn__4313.invoke(nodedetails.clj:161)
 at 
 opsagent.nodedetails$get_static_properties.invoke(nodedetails.clj:160)
 at 
 opsagent.nodedetails$get_longtime_values$fn__4426.invoke(nodedetails.clj:227)
 at 
 opsagent.nodedetails$get_longtime_values.invoke(nodedetails.clj:226)
 at 
 opsagent.nodedetails$send_all_nodedetails$fn__.invoke(nodedetails.clj:245)
 at opsagent.jmx$jmx_wrap.doInvoke(jmx.clj:111)
 at clojure.lang.RestFn.invoke(RestFn.java:410)
 at 
 opsagent.nodedetails$send_all_nodedetails.invoke(nodedetails.clj:241)
 at opsagent.opsagent$post_interface_startup.doInvoke(opsagent.clj:125)
 at 

[jira] [Created] (CASSANDRA-7779) Add option to sstableloader to only stream to the local dc

2014-08-15 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-7779:
--

 Summary: Add option to sstableloader to only stream to the local dc
 Key: CASSANDRA-7779
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7779
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Nick Bailey
 Fix For: 1.2.19, 2.0.10, 2.1.1


This is meant to be a potential workaround for CASSANDRA-4756. Due to that 
ticket, trying to load a cluster-wide snapshot via sstableloader will 
potentially stream an enormous amount of data. In a 3-datacenter cluster with 
rf=3 in each datacenter, 81 copies of the data would be streamed: the snapshots 
hold 9 copies of each row and each copy gets streamed to all 9 replicas. Once 
we have per-range sstables we can optimize sstableloader to merge data and only 
stream one copy, but until then we need a workaround. By only streaming to the 
local datacenter, each datacenter's 3 snapshot copies go only to its own 3 
replicas, so we end up with 9 copies of the data rather than 81.

This could potentially be achieved by the option to ignore certain nodes that 
already exists in sstableloader, but in the case of vnodes and topology changes 
in the cluster, this could require specifying every node in the cluster as 
'ignored' on the command line which could be problematic. This is just a 
shortcut to avoid that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7555) Support copy and link for commitlog archiving without forking the jvm

2014-07-16 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-7555:
--

 Summary: Support copy and link for commitlog archiving without 
forking the jvm
 Key: CASSANDRA-7555
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7555
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 2.1.1


Right now for commitlog archiving the user specifies a command to run and c* 
forks the jvm to run that command. The most common operations will be either 
copy or link (hard or soft). Since we can do all of these operations without 
forking the jvm, which is very expensive, we should have special cases for 
those.
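
Those special cases could plausibly be handled in-process with java.nio.file, 
along the lines of this sketch (paths and the enum are illustrative, not an 
actual implementation):
{code}
// Sketch only: copy, hard-link or symlink an archived commitlog segment
// without forking a child process. Paths are examples.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ArchiveSegment
{
    enum Mode { COPY, HARD_LINK, SYM_LINK }

    static void archive(Path segment, Path archiveDir, Mode mode) throws IOException
    {
        Path target = archiveDir.resolve(segment.getFileName());
        switch (mode)
        {
            case COPY:      Files.copy(segment, target, StandardCopyOption.REPLACE_EXISTING); break;
            case HARD_LINK: Files.createLink(target, segment); break;
            case SYM_LINK:  Files.createSymbolicLink(target, segment); break;
        }
    }

    public static void main(String[] args) throws IOException
    {
        archive(Paths.get("/var/lib/cassandra/commitlog/CommitLog-4-1.log"),
                Paths.get("/backup/commitlog_archive"), Mode.HARD_LINK);
    }
}
{code}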



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7232) Enable live replay of commit logs

2014-07-16 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063717#comment-14063717
 ] 

Nick Bailey commented on CASSANDRA-7232:


My initial thought is that specifying a more dynamic location will be more 
tool-friendly. A location specified on startup of the jvm will probably be 
created and owned by the c* process, which presents potential problems for 
external tools; we already see this with snapshots. Also, requiring a restart 
of the c* process to change that location seems pretty heavyweight.

Also, should the command accept a 'start time' as well? 

 Enable live replay of commit logs
 -

 Key: CASSANDRA-7232
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7232
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Patrick McFadin
Assignee: Lyuben Todorov
Priority: Minor
 Fix For: 2.0.10

 Attachments: 
 0001-Expose-CommitLog-recover-to-JMX-add-nodetool-cmd-for.patch, 
 0001-TRUNK-JMX-and-nodetool-cmd-for-commitlog-replay.patch


 Replaying commit logs takes a restart but restoring sstables can be an online 
 operation with refresh. In order to restore a point-in-time without a 
 restart, the node needs to live replay the commit logs from JMX and a 
 nodetool command.
 nodetool refreshcommitlogs keyspace table



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6230) Write hints to a file instead of a table

2014-07-11 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058932#comment-14058932
 ] 

Nick Bailey commented on CASSANDRA-6230:


It would be great if we could expose a metric indicating when hints are 
'expired'. Potentially even which nodes hints were expired for.
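
Something as simple as a per-endpoint counter would cover it, for example (a 
sketch using the Metrics library with made-up metric names):
{code}
// Illustrative only -- metric names are made up. A counter per target
// endpoint, bumped whenever a hint is dropped because its TTL elapsed.
import com.codahale.metrics.Counter;
import com.codahale.metrics.MetricRegistry;
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HintExpiryMetrics
{
    private final MetricRegistry registry = new MetricRegistry();
    private final Map<InetAddress, Counter> expiredPerEndpoint = new ConcurrentHashMap<>();

    public void onHintExpired(InetAddress target)
    {
        expiredPerEndpoint
            .computeIfAbsent(target, ep -> registry.counter("HintsExpired." + ep.getHostAddress()))
            .inc();
    }
}
{code}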

 Write hints to a file instead of a table
 

 Key: CASSANDRA-6230
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6230
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jonathan Ellis
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 3.0


 Writing to a file would have less overhead on both hint creation and replay.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7478) StorageService.getJoiningNodes returns duplicate ips

2014-06-30 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-7478:
--

 Summary: StorageService.getJoiningNodes returns duplicate ips
 Key: CASSANDRA-7478
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7478
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 1.2.18, 2.0.10, 2.1.0


If a node is bootstrapping with vnodes enabled, getJoiningNodes will return the 
same ip N times where N is the number of vnodes. Looks like we just need to 
convert the list to a set before we stringify it.
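
The fix is probably on the order of this sketch (illustrative only, not the 
actual patch):
{code}
// Illustrative sketch of the suggested fix: de-duplicate the per-token
// endpoint list before turning it into strings.
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class JoiningNodes
{
    // with vnodes, the same endpoint appears once per bootstrapping token
    static List<String> stringify(List<InetAddress> endpointsPerToken)
    {
        Set<InetAddress> unique = new LinkedHashSet<>(endpointsPerToken);
        List<String> result = new ArrayList<>();
        for (InetAddress endpoint : unique)
            result.add(endpoint.getHostAddress());
        return result;
    }
}
{code}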



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7317) Repair range validation and calculation is off

2014-06-04 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017659#comment-14017659
 ] 

Nick Bailey commented on CASSANDRA-7317:


Well we at least have some sort of special logic for the case where a keyspace 
doesn't exist in both datacenters, since -pr in that case repaired 2 ranges. I 
do think the right solution is to make TokenMetadata KS/DC aware at least for 
these cases.

 Repair range validation and calculation is off
 --

 Key: CASSANDRA-7317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7317
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
Assignee: Yuki Morishita
 Fix For: 2.0.9

 Attachments: Untitled Diagram(1).png


 From what I can tell the calculation (using the -pr option) and validation of 
 tokens for repairing ranges is broken. Or at least should be improved. Using 
 an example with ccm:
 Nodetool ring:
 {noformat}
 Datacenter: dc1
 ==
 Address    Rack  Status  State   Load        Owns    Token
   -10
 127.0.0.1  r1  Up Normal  188.96 KB   50.00%  
 -9223372036854775808
 127.0.0.2  r1  Up Normal  194.77 KB   50.00%  -10
 Datacenter: dc2
 ==
 Address    Rack  Status  State   Load        Owns    Token
   0
 127.0.0.4  r1  Up Normal  160.58 KB   0.00%   
 -9223372036854775798
 127.0.0.3  r1  Up Normal  139.46 KB   0.00%   0
 {noformat}
 Schema:
 {noformat}
 CREATE KEYSPACE system_traces WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'dc2': '2',
   'dc1': '2'
 };
 {noformat}
 Repair -pr:
 {noformat}
 [Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -pr system_traces
 [2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:02,207] Repair session f984d290-e6d9-11e3-9edc-5f8011daec21 
 for range (0,-9223372036854775808] finished
 [2014-05-28 21:36:02,207] Repair command #12 finished
 [Nicks-MacBook-Pro:21:36:02 cassandra-2.0] cassandra$ bin/nodetool -p 7200 
 repair -pr system_traces
 [2014-05-28 21:36:14,086] Starting repair command #1, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:14,406] Repair session 00bd45b0-e6da-11e3-98fc-5f8011daec21 
 for range (-9223372036854775798,-10] finished
 [2014-05-28 21:36:14,406] Repair command #1 finished
 {noformat}
 Note that repairing both nodes in dc1, leaves very small ranges unrepaired. 
 For example (-10,0]. Repairing the 'primary range' in dc2 will repair those 
 small ranges. Maybe that is the behavior we want but it seems 
 counterintuitive.
 The behavior when manually trying to repair the full range of 127.0.0.1 
 definitely needs improvement though.
 Repair command:
 {noformat}
 [Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -st -10 -et -9223372036854775808 system_traces
 [2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Repair command #17 finished
 [Nicks-MacBook-Pro:21:50:56 cassandra-2.0] cassandra$ echo $?
 1
 {noformat}
 system.log:
 {noformat}
 ERROR [Thread-96] 2014-05-28 21:40:05,921 StorageService.java (line 2621) 
 Repair session failed:
 java.lang.IllegalArgumentException: Requested range intersects a local range 
 but is not fully contained in one; this would lead to imprecise repair
 {noformat}
 * The actual output of the repair command doesn't really indicate that there 
 was an issue. Although the command does return with a non zero exit status.
 * The error here is invisible if you are using the synchronous jmx repair 
 api. It will appear as though the repair completed successfully.
 * Personally, I believe that should be a valid repair command. For the 
 system_traces keyspace, 127.0.0.1 is responsible for this range (and I would 
 argue the 'primary range' of the node).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7317) Repair range validation and calculation is off

2014-06-04 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018294#comment-14018294
 ] 

Nick Bailey commented on CASSANDRA-7317:


If -pr is meant to be used that way, I don't think it is communicated very 
well. If that is the case, we should be clearer in the documentation, and I 
think we would still need to fix combining -pr and -local; repairing just the 
local dc while using the -pr flag should be a supported use case. Personally it 
seems to me like the best way to fix that is to make the definition of -pr 
dc-aware, but perhaps not.

My bigger issue is the validation of the range when specifying a range, so I 
agree that should be fixed. 

 Repair range validation and calculation is off
 --

 Key: CASSANDRA-7317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7317
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
Assignee: Yuki Morishita
 Fix For: 2.0.9

 Attachments: Untitled Diagram(1).png


 From what I can tell the calculation (using the -pr option) and validation of 
 tokens for repairing ranges is broken. Or at least should be improved. Using 
 an example with ccm:
 Nodetool ring:
 {noformat}
 Datacenter: dc1
 ==
 Address    Rack  Status  State   Load        Owns    Token
   -10
 127.0.0.1  r1  Up Normal  188.96 KB   50.00%  
 -9223372036854775808
 127.0.0.2  r1  Up Normal  194.77 KB   50.00%  -10
 Datacenter: dc2
 ==
 Address    Rack  Status  State   Load        Owns    Token
   0
 127.0.0.4  r1  Up Normal  160.58 KB   0.00%   
 -9223372036854775798
 127.0.0.3  r1  Up Normal  139.46 KB   0.00%   0
 {noformat}
 Schema:
 {noformat}
 CREATE KEYSPACE system_traces WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'dc2': '2',
   'dc1': '2'
 };
 {noformat}
 Repair -pr:
 {noformat}
 [Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -pr system_traces
 [2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:02,207] Repair session f984d290-e6d9-11e3-9edc-5f8011daec21 
 for range (0,-9223372036854775808] finished
 [2014-05-28 21:36:02,207] Repair command #12 finished
 [Nicks-MacBook-Pro:21:36:02 cassandra-2.0] cassandra$ bin/nodetool -p 7200 
 repair -pr system_traces
 [2014-05-28 21:36:14,086] Starting repair command #1, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:14,406] Repair session 00bd45b0-e6da-11e3-98fc-5f8011daec21 
 for range (-9223372036854775798,-10] finished
 [2014-05-28 21:36:14,406] Repair command #1 finished
 {noformat}
 Note that repairing both nodes in dc1, leaves very small ranges unrepaired. 
 For example (-10,0]. Repairing the 'primary range' in dc2 will repair those 
 small ranges. Maybe that is the behavior we want but it seems 
 counterintuitive.
 The behavior when manually trying to repair the full range of 127.0.0.1 
 definitely needs improvement though.
 Repair command:
 {noformat}
 [Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -st -10 -et -9223372036854775808 system_traces
 [2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Repair command #17 finished
 [Nicks-MacBook-Pro:21:50:56 cassandra-2.0] cassandra$ echo $?
 1
 {noformat}
 system.log:
 {noformat}
 ERROR [Thread-96] 2014-05-28 21:40:05,921 StorageService.java (line 2621) 
 Repair session failed:
 java.lang.IllegalArgumentException: Requested range intersects a local range 
 but is not fully contained in one; this would lead to imprecise repair
 {noformat}
 * The actual output of the repair command doesn't really indicate that there 
 was an issue. Although the command does return with a non zero exit status.
 * The error here is invisible if you are using the synchronous jmx repair 
 api. It will appear as though the repair completed successfully.
 * Personally, I believe that should be a valid repair command. For the 
 system_traces keyspace, 127.0.0.1 is responsible for this range (and I would 
 argue the 'primary range' of the node).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7317) Repair range validation and calculation is off

2014-06-03 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016258#comment-14016258
 ] 

Nick Bailey commented on CASSANDRA-7317:


Another thing that was brought up on the mailing list (at least incidentally) 
is that the current implementation basically makes the '-local' flag and the 
'-pr' flag incompatible.

 Repair range validation and calculation is off
 --

 Key: CASSANDRA-7317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7317
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 2.0.9

 Attachments: Untitled Diagram(1).png


 From what I can tell the calculation (using the -pr option) and validation of 
 tokens for repairing ranges is broken. Or at least should be improved. Using 
 an example with ccm:
 Nodetool ring:
 {noformat}
 Datacenter: dc1
 ==
 Address    Rack  Status  State   Load        Owns    Token
   -10
 127.0.0.1  r1  Up Normal  188.96 KB   50.00%  
 -9223372036854775808
 127.0.0.2  r1  Up Normal  194.77 KB   50.00%  -10
 Datacenter: dc2
 ==
 Address    Rack  Status  State   Load        Owns    Token
   0
 127.0.0.4  r1  Up Normal  160.58 KB   0.00%   
 -9223372036854775798
 127.0.0.3  r1  Up Normal  139.46 KB   0.00%   0
 {noformat}
 Schema:
 {noformat}
 CREATE KEYSPACE system_traces WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'dc2': '2',
   'dc1': '2'
 };
 {noformat}
 Repair -pr:
 {noformat}
 [Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -pr system_traces
 [2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:02,207] Repair session f984d290-e6d9-11e3-9edc-5f8011daec21 
 for range (0,-9223372036854775808] finished
 [2014-05-28 21:36:02,207] Repair command #12 finished
 [Nicks-MacBook-Pro:21:36:02 cassandra-2.0] cassandra$ bin/nodetool -p 7200 
 repair -pr system_traces
 [2014-05-28 21:36:14,086] Starting repair command #1, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:14,406] Repair session 00bd45b0-e6da-11e3-98fc-5f8011daec21 
 for range (-9223372036854775798,-10] finished
 [2014-05-28 21:36:14,406] Repair command #1 finished
 {noformat}
 Note that repairing both nodes in dc1, leaves very small ranges unrepaired. 
 For example (-10,0]. Repairing the 'primary range' in dc2 will repair those 
 small ranges. Maybe that is the behavior we want but it seems 
 counterintuitive.
 The behavior when manually trying to repair the full range of 127.0.0.1 
 definitely needs improvement though.
 Repair command:
 {noformat}
 [Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -st -10 -et -9223372036854775808 system_traces
 [2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Repair command #17 finished
 [Nicks-MacBook-Pro:21:50:56 cassandra-2.0] cassandra$ echo $?
 1
 {noformat}
 system.log:
 {noformat}
 ERROR [Thread-96] 2014-05-28 21:40:05,921 StorageService.java (line 2621) 
 Repair session failed:
 java.lang.IllegalArgumentException: Requested range intersects a local range 
 but is not fully contained in one; this would lead to imprecise repair
 {noformat}
 * The actual output of the repair command doesn't really indicate that there 
 was an issue. Although the command does return with a non zero exit status.
 * The error here is invisible if you are using the synchronous jmx repair 
 api. It will appear as though the repair completed successfully.
 * Personally, I believe that should be a valid repair command. For the 
 system_traces keyspace, 127.0.0.1 is responsible for this range (and I would 
 argue the 'primary range' of the node).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7336) RepairTask doesn't send a correct message in a JMX notifcation in case of IllegalArgumentException

2014-06-03 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016995#comment-14016995
 ] 

Nick Bailey commented on CASSANDRA-7336:


Is there a way to get this to propagate to the synchronous jmx api as well?

 RepairTask doesn't send a correct message in a JMX notifcation in case of 
 IllegalArgumentException
 --

 Key: CASSANDRA-7336
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7336
 Project: Cassandra
  Issue Type: Bug
Reporter: Mikhail Stepura
Assignee: Mikhail Stepura
Priority: Minor
 Fix For: 1.2.17, 2.0.9

 Attachments: CASSANDRA-1.2-7336.patch


 From CASSANDRA-7317:
  The behavior when manually trying to repair the full range of 127.0.0.1 
 definitely needs improvement though.
 Repair command:
 {noformat}
 [Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -st -10 -et -9223372036854775808 system_traces
 [2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Repair command #17 finished
 [Nicks-MacBook-Pro:21:50:56 cassandra-2.0] cassandra$ echo $?
 1
 {noformat}
 system.log:
 {noformat}
 ERROR [Thread-96] 2014-05-28 21:40:05,921 StorageService.java (line 2621) 
 Repair session failed:
 java.lang.IllegalArgumentException: Requested range intersects a local range 
 but is not fully contained in one; this would lead to imprecise repair
 {noformat}
 * The actual output of the repair command doesn't really indicate that there 
 was an issue. Although the command does return with a non zero exit status.
 * The error here is invisible if you are using the synchronous jmx repair 
 api. It will appear as though the repair completed successfully.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7317) Repair range validation and calculation is off

2014-05-29 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012961#comment-14012961
 ] 

Nick Bailey commented on CASSANDRA-7317:


Well there are only two nodes, so this is the wraparound case. If we were 
repairing node 2 it would be (-9223372036854775808, -10].

 Repair range validation and calculation is off
 --

 Key: CASSANDRA-7317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7317
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 2.0.9


 From what I can tell the calculation (using the -pr option) and validation of 
 tokens for repairing ranges is broken. Or at least should be improved. Using 
 an example with ccm:
 Nodetool ring:
 {noformat}
 Datacenter: dc1
 ==
 Address    Rack  Status  State   Load        Owns    Token
   -10
 127.0.0.1  r1  Up Normal  188.96 KB   50.00%  
 -9223372036854775808
 127.0.0.2  r1  Up Normal  194.77 KB   50.00%  -10
 Datacenter: dc2
 ==
 Address    Rack  Status  State   Load        Owns    Token
   0
 127.0.0.4  r1  Up Normal  160.58 KB   0.00%   
 -9223372036854775798
 127.0.0.3  r1  Up Normal  139.46 KB   0.00%   0
 {noformat}
 Schema:
 {noformat}
 CREATE KEYSPACE system_traces WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'dc2': '2',
   'dc1': '2'
 };
 {noformat}
 Repair -pr:
 {noformat}
 [Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -pr system_traces
 [2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:02,207] Repair session f984d290-e6d9-11e3-9edc-5f8011daec21 
 for range (0,-9223372036854775808] finished
 [2014-05-28 21:36:02,207] Repair command #12 finished
 [Nicks-MacBook-Pro:21:36:02 cassandra-2.0] cassandra$ bin/nodetool -p 7200 
 repair -pr system_traces
 [2014-05-28 21:36:14,086] Starting repair command #1, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:14,406] Repair session 00bd45b0-e6da-11e3-98fc-5f8011daec21 
 for range (-9223372036854775798,-10] finished
 [2014-05-28 21:36:14,406] Repair command #1 finished
 {noformat}
 Note that repairing both nodes in dc1, leaves very small ranges unrepaired. 
 For example (-10,0]. Repairing the 'primary range' in dc2 will repair those 
 small ranges. Maybe that is the behavior we want but it seems 
 counterintuitive.
 The behavior when manually trying to repair the full range of 127.0.0.1 
 definitely needs improvement though.
 Repair command:
 {noformat}
 [Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -st -10 -et -9223372036854775808 system_traces
 [2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Repair command #17 finished
 [Nicks-MacBook-Pro:21:50:56 cassandra-2.0] cassandra$ echo $?
 1
 {noformat}
 system.log:
 {noformat}
 ERROR [Thread-96] 2014-05-28 21:40:05,921 StorageService.java (line 2621) 
 Repair session failed:
 java.lang.IllegalArgumentException: Requested range intersects a local range 
 but is not fully contained in one; this would lead to imprecise repair
 {noformat}
 * The actual output of the repair command doesn't really indicate that there 
 was an issue. Although the command does return with a non zero exit status.
 * The error here is invisible if you are using the synchronous jmx repair 
 api. It will appear as though the repair completed successfully.
 * Personally, I believe that should be a valid repair command. For the 
 system_traces keyspace, 127.0.0.1 is responsible for this range (and I would 
 argue the 'primary range' of the node).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7317) Repair range validation and calculation is off

2014-05-29 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013260#comment-14013260
 ] 

Nick Bailey commented on CASSANDRA-7317:


Perhaps it is a difference in mental models, but let me try to explain my 
reasoning. In general, I think datacenters should be considered as separate 
rings. Replication is configured separately for each datacenter, tokens are 
balanced separately in each datacenter (when not using vnodes), and consistency 
levels can be specified with specific datacenter requirements.

To somewhat further illustrate my point, Cassandra agrees with me when you 
modify the schema of the system_traces keyspace to only exist in dc 1:

{noformat}
[Nicks-MacBook-Pro:22:26:41 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
repair -pr system_traces
[2014-05-29 22:26:46,958] Starting repair command #3, repairing 2 ranges for 
keyspace system_traces
[2014-05-29 22:26:47,148] Repair session 3ae1cce0-e7aa-11e3-aaee-5f8011daec21 
for range (0,-9223372036854775808] finished
[2014-05-29 22:26:47,149] Repair session 3afc80d0-e7aa-11e3-aaee-5f8011daec21 
for range (-10,0] finished
[2014-05-29 22:26:47,149] Repair command #3 finished
[Nicks-MacBook-Pro:22:26:47 cassandra-2.0] cassandra$ bin/nodetool -p 7300 
repair -pr system_traces
[2014-05-29 22:26:54,907] Nothing to repair for keyspace 'system_traces'
[Nicks-MacBook-Pro:22:34:55 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
repair -st -9223372036854775808 -et 10 system_traces
[2014-05-29 22:35:02,604] Starting repair command #6, repairing 1 ranges for 
keyspace system_traces
[2014-05-29 22:35:02,604] Starting repair command #6, repairing 1 ranges for 
keyspace system_traces
[2014-05-29 22:35:02,604] Repair command #6 finished
{noformat}

Repairing the 'primary range' of node1 actually repairs two ranges (although 
those two ranges are really just one). Repairing the primary range of node3 
does nothing. And asking C* to repair the entire range that it just repaired as 
two separate ranges still fails.

 Repair range validation and calculation is off
 --

 Key: CASSANDRA-7317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7317
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 2.0.9

 Attachments: Untitled Diagram(1).png


 From what I can tell the calculation (using the -pr option) and validation of 
 tokens for repairing ranges is broken. Or at least should be improved. Using 
 an example with ccm:
 Nodetool ring:
 {noformat}
 Datacenter: dc1
 ==
 Address    Rack  Status  State   Load       Owns    Token
                                                      -10
 127.0.0.1  r1    Up      Normal  188.96 KB  50.00%  -9223372036854775808
 127.0.0.2  r1    Up      Normal  194.77 KB  50.00%  -10
 Datacenter: dc2
 ==
 Address    Rack  Status  State   Load       Owns    Token
                                                      0
 127.0.0.4  r1    Up      Normal  160.58 KB  0.00%   -9223372036854775798
 127.0.0.3  r1    Up      Normal  139.46 KB  0.00%   0
 {noformat}
 Schema:
 {noformat}
 CREATE KEYSPACE system_traces WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'dc2': '2',
   'dc1': '2'
 };
 {noformat}
 Repair -pr:
 {noformat}
 [Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -pr system_traces
 [2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:02,207] Repair session f984d290-e6d9-11e3-9edc-5f8011daec21 
 for range (0,-9223372036854775808] finished
 [2014-05-28 21:36:02,207] Repair command #12 finished
 [Nicks-MacBook-Pro:21:36:02 cassandra-2.0] cassandra$ bin/nodetool -p 7200 
 repair -pr system_traces
 [2014-05-28 21:36:14,086] Starting repair command #1, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:14,406] Repair session 00bd45b0-e6da-11e3-98fc-5f8011daec21 
 for range (-9223372036854775798,-10] finished
 [2014-05-28 21:36:14,406] Repair command #1 finished
 {noformat}
 Note that repairing both nodes in dc1 leaves very small ranges unrepaired, 
 for example (-10,0]. Repairing the 'primary range' in dc2 will repair those 
 small ranges. Maybe that is the behavior we want but it seems 
 counterintuitive.
 The behavior when manually trying to repair the full range of 127.0.0.1 
 definitely needs improvement though.
 Repair command:
 {noformat}
 [Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -st -10 -et -9223372036854775808 system_traces
 [2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges 

[jira] [Commented] (CASSANDRA-7317) Repair range validation and calculation is off

2014-05-29 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013262#comment-14013262
 ] 

Nick Bailey commented on CASSANDRA-7317:


Even if we were to decide that this is the correct behavior, it gets hard to 
reason about as you increase the number of tokens or datacenters. If we expand 
the above example to three datacenters, and the keyspace exists in dc1 and dc2 
but not dc3, which ranges is dc1 the primary for, and which is dc2 the primary 
for? Somebody has to be responsible for the ranges dc3 'owns'.
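
(For illustration, the kind of schema I mean would look something like the 
following; the keyspace name is made up:)
{noformat}
CREATE KEYSPACE two_dc_only WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': '2',
  'dc2': '2'
};
-- dc3 nodes still own token ranges but hold no replicas for this keyspace,
-- so some node in dc1 or dc2 has to treat those ranges as its 'primary'
-- or they are never repaired with -pr.
{noformat}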

 Repair range validation and calculation is off
 --

 Key: CASSANDRA-7317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7317
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 2.0.9

 Attachments: Untitled Diagram(1).png


 From what I can tell the calculation (using the -pr option) and validation of 
 tokens for repairing ranges is broken. Or at least should be improved. Using 
 an example with ccm:
 Nodetool ring:
 {noformat}
 Datacenter: dc1
 ==
 Address    Rack  Status  State   Load       Owns    Token
                                                      -10
 127.0.0.1  r1    Up      Normal  188.96 KB  50.00%  -9223372036854775808
 127.0.0.2  r1    Up      Normal  194.77 KB  50.00%  -10
 Datacenter: dc2
 ==
 Address    Rack  Status  State   Load       Owns    Token
                                                      0
 127.0.0.4  r1    Up      Normal  160.58 KB  0.00%   -9223372036854775798
 127.0.0.3  r1    Up      Normal  139.46 KB  0.00%   0
 {noformat}
 Schema:
 {noformat}
 CREATE KEYSPACE system_traces WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'dc2': '2',
   'dc1': '2'
 };
 {noformat}
 Repair -pr:
 {noformat}
 [Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -pr system_traces
 [2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:02,207] Repair session f984d290-e6d9-11e3-9edc-5f8011daec21 
 for range (0,-9223372036854775808] finished
 [2014-05-28 21:36:02,207] Repair command #12 finished
 [Nicks-MacBook-Pro:21:36:02 cassandra-2.0] cassandra$ bin/nodetool -p 7200 
 repair -pr system_traces
 [2014-05-28 21:36:14,086] Starting repair command #1, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:36:14,406] Repair session 00bd45b0-e6da-11e3-98fc-5f8011daec21 
 for range (-9223372036854775798,-10] finished
 [2014-05-28 21:36:14,406] Repair command #1 finished
 {noformat}
 Note that repairing both nodes in dc1 leaves very small ranges unrepaired, 
 for example (-10,0]. Repairing the 'primary range' in dc2 will repair those 
 small ranges. Maybe that is the behavior we want but it seems 
 counterintuitive.
 The behavior when manually trying to repair the full range of 127.0.0.1 
 definitely needs improvement though.
 Repair command:
 {noformat}
 [Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
 repair -st -10 -et -9223372036854775808 system_traces
 [2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Starting repair command #17, repairing 1 ranges for 
 keyspace system_traces
 [2014-05-28 21:50:55,804] Repair command #17 finished
 [Nicks-MacBook-Pro:21:50:56 cassandra-2.0] cassandra$ echo $?
 1
 {noformat}
 system.log:
 {noformat}
 ERROR [Thread-96] 2014-05-28 21:40:05,921 StorageService.java (line 2621) 
 Repair session failed:
 java.lang.IllegalArgumentException: Requested range intersects a local range 
 but is not fully contained in one; this would lead to imprecise repair
 {noformat}
 * The actual output of the repair command doesn't really indicate that there 
 was an issue, although the command does return a non-zero exit status.
 * The error here is invisible if you are using the synchronous JMX repair 
 API. It will appear as though the repair completed successfully.
 * Personally, I believe that should be a valid repair command. For the 
 system_traces keyspace, 127.0.0.1 is responsible for this range (and I would 
 argue it is the 'primary range' of the node).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7317) Repair range validation and calculation is off

2014-05-28 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-7317:
--

 Summary: Repair range validation and calculation is off
 Key: CASSANDRA-7317
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7317
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 1.2.17, 2.0.8, 2.1 rc1


From what I can tell the calculation (using the -pr option) and validation of 
tokens for repairing ranges is broken. Or at least should be improved. Using 
an example with ccm:

Nodetool ring:

{noformat}
Datacenter: dc1
==
Address    Rack  Status  State   Load       Owns    Token
                                                     -10
127.0.0.1  r1    Up      Normal  188.96 KB  50.00%  -9223372036854775808
127.0.0.2  r1    Up      Normal  194.77 KB  50.00%  -10

Datacenter: dc2
==
Address    Rack  Status  State   Load       Owns    Token
                                                     0
127.0.0.4  r1    Up      Normal  160.58 KB  0.00%   -9223372036854775798
127.0.0.3  r1    Up      Normal  139.46 KB  0.00%   0
{noformat}

Schema:

{noformat}
CREATE KEYSPACE system_traces WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc2': '2',
  'dc1': '2'
};
{noformat}

Repair -pr:
{noformat}
[Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
repair -pr system_traces
[2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges for 
keyspace system_traces
[2014-05-28 21:36:02,207] Repair session f984d290-e6d9-11e3-9edc-5f8011daec21 
for range (0,-9223372036854775808] finished
[2014-05-28 21:36:02,207] Repair command #12 finished
[Nicks-MacBook-Pro:21:36:02 cassandra-2.0] cassandra$ bin/nodetool -p 7200 
repair -pr system_traces
[2014-05-28 21:36:14,086] Starting repair command #1, repairing 1 ranges for 
keyspace system_traces
[2014-05-28 21:36:14,406] Repair session 00bd45b0-e6da-11e3-98fc-5f8011daec21 
for range (-9223372036854775798,-10] finished
[2014-05-28 21:36:14,406] Repair command #1 finished
{noformat}

Note that repairing both nodes in dc1 leaves very small ranges unrepaired, for 
example (-10,0]. Repairing the 'primary range' in dc2 will repair those small 
ranges. Maybe that is the behavior we want but it seems counterintuitive.
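
(For illustration, covering one of those leftover ranges explicitly would mean 
a subrange repair along these lines, run from a dc1 node; this is only a sketch 
of the workaround, not a command shown in the transcript above:)
{noformat}
bin/nodetool -p 7100 repair -st -10 -et 0 system_traces
{noformat}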

The behavior when manually trying to repair the full range of 127.0.0.1 
definitely needs improvement though.

Repair command:
{noformat}
[Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
repair -st -10 -et -9223372036854775808 system_traces
[2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges for 
keyspace system_traces
[2014-05-28 21:50:55,804] Starting repair command #17, repairing 1 ranges for 
keyspace system_traces
[2014-05-28 21:50:55,804] Repair command #17 finished
[Nicks-MacBook-Pro:21:50:56 cassandra-2.0] cassandra$ echo $?
1
{noformat}

system.log:
{noformat}
ERROR [Thread-96] 2014-05-28 21:40:05,921 StorageService.java (line 2621) 
Repair session failed:
java.lang.IllegalArgumentException: Requested range intersects a local range 
but is not fully contained in one; this would lead to imprecise repair
{noformat}

* The actual output of the repair command doesn't really indicate that there 
was an issue, although the command does return a non-zero exit status.
* The error here is invisible if you are using the synchronous JMX repair API. 
It will appear as though the repair completed successfully.
* Personally, I believe that should be a valid repair command. For the 
system_traces keyspace, 127.0.0.1 is responsible for this range (and I would 
argue it is the 'primary range' of the node).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-7317) Repair range validation and calculation is off

2014-05-28 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-7317:
---

Description: 
From what I can tell the calculation (using the -pr option) and validation of 
tokens for repairing ranges is broken. Or at least should be improved. Using 
an example with ccm:

Nodetool ring:

{noformat}
Datacenter: dc1
==
Address    Rack  Status  State   Load       Owns    Token
                                                     -10
127.0.0.1  r1    Up      Normal  188.96 KB  50.00%  -9223372036854775808
127.0.0.2  r1    Up      Normal  194.77 KB  50.00%  -10

Datacenter: dc2
==
Address    Rack  Status  State   Load       Owns    Token
                                                     0
127.0.0.4  r1    Up      Normal  160.58 KB  0.00%   -9223372036854775798
127.0.0.3  r1    Up      Normal  139.46 KB  0.00%   0
{noformat}

Schema:

{noformat}
CREATE KEYSPACE system_traces WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc2': '2',
  'dc1': '2'
};
{noformat}

Repair -pr:
{noformat}
[Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
repair -pr system_traces
[2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges for 
keyspace system_traces
[2014-05-28 21:36:02,207] Repair session f984d290-e6d9-11e3-9edc-5f8011daec21 
for range (0,-9223372036854775808] finished
[2014-05-28 21:36:02,207] Repair command #12 finished
[Nicks-MacBook-Pro:21:36:02 cassandra-2.0] cassandra$ bin/nodetool -p 7200 
repair -pr system_traces
[2014-05-28 21:36:14,086] Starting repair command #1, repairing 1 ranges for 
keyspace system_traces
[2014-05-28 21:36:14,406] Repair session 00bd45b0-e6da-11e3-98fc-5f8011daec21 
for range (-9223372036854775798,-10] finished
[2014-05-28 21:36:14,406] Repair command #1 finished
{noformat}

Note that repairing both nodes in dc1 leaves very small ranges unrepaired, for 
example (-10,0]. Repairing the 'primary range' in dc2 will repair those small 
ranges. Maybe that is the behavior we want but it seems counterintuitive.

The behavior when manually trying to repair the full range of 127.0.0.1 
definitely needs improvement though.

Repair command:
{noformat}
[Nicks-MacBook-Pro:21:50:44 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
repair -st -10 -et -9223372036854775808 system_traces
[2014-05-28 21:50:55,803] Starting repair command #17, repairing 1 ranges for 
keyspace system_traces
[2014-05-28 21:50:55,804] Starting repair command #17, repairing 1 ranges for 
keyspace system_traces
[2014-05-28 21:50:55,804] Repair command #17 finished
[Nicks-MacBook-Pro:21:50:56 cassandra-2.0] cassandra$ echo $?
1
{noformat}

system.log:
{noformat}
ERROR [Thread-96] 2014-05-28 21:40:05,921 StorageService.java (line 2621) 
Repair session failed:
java.lang.IllegalArgumentException: Requested range intersects a local range 
but is not fully contained in one; this would lead to imprecise repair
{noformat}

* The actual output of the repair command doesn't really indicate that there 
was an issue, although the command does return a non-zero exit status.
* The error here is invisible if you are using the synchronous JMX repair API. 
It will appear as though the repair completed successfully.
* Personally, I believe that should be a valid repair command. For the 
system_traces keyspace, 127.0.0.1 is responsible for this range (and I would 
argue it is the 'primary range' of the node).

  was:
From what I can tell the calculation (using the -pr option) and validation of 
tokens for repairing ranges is broken. Or at least should be improved. Using 
an example with ccm:

Nodetool ring:

{noformat}
Datacenter: dc1
==
Address    Rack  Status  State   Load       Owns    Token
                                                     -10
127.0.0.1  r1    Up      Normal  188.96 KB  50.00%  -9223372036854775808
127.0.0.2  r1    Up      Normal  194.77 KB  50.00%  -10

Datacenter: dc2
==
Address    Rack  Status  State   Load       Owns    Token
                                                     0
127.0.0.4  r1    Up      Normal  160.58 KB  0.00%   -9223372036854775798
127.0.0.3  r1    Up      Normal  139.46 KB  0.00%   0
{noformat}

Schema:

{noformat}
CREATE KEYSPACE system_traces WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc2': '2',
  'dc1': '2'
};
{noformat}

Repair -pr:
{noformat}
[Nicks-MacBook-Pro:21:35:58 cassandra-2.0] cassandra$ bin/nodetool -p 7100 
repair -pr system_traces
[2014-05-28 21:36:01,977] Starting repair command #12, repairing 1 ranges 

[jira] [Commented] (CASSANDRA-7206) UDT - allow null / non-existant attributes

2014-05-23 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007418#comment-14007418
 ] 

Nick Bailey commented on CASSANDRA-7206:


Does setting a UDT field to null save the space for that field on disk?

 UDT - allow null / non-existant attributes
 --

 Key: CASSANDRA-7206
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7206
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Robert Stupp
Assignee: Sylvain Lebresne
 Fix For: 2.1 rc1

 Attachments: 7206.txt


 C* 2.1 CQL user-defined types are really nice and useful, but they lack the 
 ability to omit attributes or to set them to null.
 It would be great to be able to create UDT instances with some attributes 
 missing.
 Also, changing the UDT definition (for example: {{alter type add new_attr}}) 
 will break running applications that rely on the previous definition of the 
 UDT.
 For example:
 {code}
 CREATE TYPE foo (
   attr_one text,
   attr_two int );
 CREATE TABLE bar (
   id int PRIMARY KEY,
   comp foo );
 {code}
 {code}
 INSERT INTO bar (id, comp) VALUES (1, {attr_one: 'cassandra', attr_two: 2});
 {code}
 works
 {code}
 INSERT INTO bar (id, comp) VALUES (1, {attr_one: 'cassandra'});
 {code}
 does not work
 {code}
 ALTER TYPE foo ADD attr_three timestamp;
 {code}
 {code}
 INSERT INTO bar (id, comp) VALUES (1, {attr_one: 'cassandra', attr_two: 2});
 {code}
 will no longer work (missing attribute)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CASSANDRA-7294) FileCache metrics incorrectly named

2014-05-23 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-7294:
---

Attachment: 7294.txt

 FileCache metrics incorrectly named
 ---

 Key: CASSANDRA-7294
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7294
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 2.0.8

 Attachments: 7294.txt






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CASSANDRA-7294) FileCache metrics incorrectly named

2014-05-23 Thread Nick Bailey (JIRA)
Nick Bailey created CASSANDRA-7294:
--

 Summary: FileCache metrics incorrectly named
 Key: CASSANDRA-7294
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7294
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
 Fix For: 2.0.8
 Attachments: 7294.txt





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7294) FileCache metrics incorrectly named

2014-05-23 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007491#comment-14007491
 ] 

Nick Bailey commented on CASSANDRA-7294:


Patch to name things correctly.

 FileCache metrics incorrectly named
 ---

 Key: CASSANDRA-7294
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7294
 Project: Cassandra
  Issue Type: Bug
Reporter: Nick Bailey
Assignee: Nick Bailey
 Fix For: 2.0.8

 Attachments: 7294.txt






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7273) expose global ColumnFamily metrics

2014-05-20 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004001#comment-14004001
 ] 

Nick Bailey commented on CASSANDRA-7273:


Related: CASSANDRA-6539

 expose global ColumnFamily metrics
 --

 Key: CASSANDRA-7273
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7273
 Project: Cassandra
  Issue Type: New Feature
Reporter: Richard Wagner
Priority: Minor

 It would be very useful to have cassandra expose ColumnFamily metrics that 
 span all column families. A general purpose cassandra monitoring system built 
 up around the current ColumnFamily metrics really only has a couple of 
 choices right now: publish metrics for all column families or fetch metrics 
 for all column families, aggregate them and then publish the aggregated 
 metrics. The first can be quite expensive for the downstream monitoring 
 system and the second is a piece of work that it seems is better pushed into 
 cassandra itself.
 Perhaps these global ColumnFamily metrics could be published under a name of:
 org.apache.cassandra.metrics:type=(ColumnFamily|IndexColumnFamily),name=(Metric
  name)
 (Same as existing ColumnFamily metrics, but with no keyspace or scope.)
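
 (To make the second option above concrete, aggregating per-column-family 
 metrics on the client side, here is a rough shell sketch; the cfstats line 
 format is an assumption about 2.0-era output, and the point is only to show 
 the work that a global metric would absorb:)
 {noformat}
 # Sum live on-disk space across every column family reported by cfstats.
 bin/nodetool cfstats | awk -F': *' '/Space used \(live\)/ {sum += $2} END {print sum, "bytes live, all column families"}'
 {noformat}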



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Issue Comment Deleted] (CASSANDRA-7273) expose global ColumnFamily metrics

2014-05-20 Thread Nick Bailey (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Bailey updated CASSANDRA-7273:
---

Comment: was deleted

(was: Related: CASSANDRA-6539)

 expose global ColumnFamily metrics
 --

 Key: CASSANDRA-7273
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7273
 Project: Cassandra
  Issue Type: New Feature
Reporter: Richard Wagner
Priority: Minor

 It would be very useful to have cassandra expose ColumnFamily metrics that 
 span all column families. A general purpose cassandra monitoring system built 
 up around the current ColumnFamily metrics really only has a couple of 
 choices right now: publish metrics for all column families or fetch metrics 
 for all column families, aggregate them and then publish the aggregated 
 metrics. The first can be quite expensive for the downstream monitoring 
 system and the second is a piece of work that it seems is better pushed into 
 cassandra itself.
 Perhaps these global ColumnFamily metrics could be published under a name of:
 org.apache.cassandra.metrics:type=(ColumnFamily|IndexColumnFamily),name=(Metric
  name)
 (Same as existing ColumnFamily metrics, but with no keyspace or scope.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7136) Change default paths to ~ instead of /var

2014-05-02 Thread Nick Bailey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988379#comment-13988379
 ] 

Nick Bailey commented on CASSANDRA-7136:


Can we make sure that any dynamically generated paths are exposed via JMX for 
monitoring/management tools? 

 Change default paths to ~ instead of /var
 -

 Key: CASSANDRA-7136
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7136
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Albert P Tobey
 Fix For: 2.1.0


 Defaulting to /var makes it more difficult for both multi-user systems and 
 people unfamiliar with the command line.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

