[jira] [Commented] (CASSANDRA-13079) Repair doesn't work after several replication factor changes

2017-02-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861972#comment-15861972
 ] 

Marcus Eriksson commented on CASSANDRA-13079:
----------------------------------------------

[~pauloricardomg] do you have a plan for this? If not, I made a small patch 
that just outputs a warning if someone increases the RF: 
https://github.com/krummas/cassandra/commits/marcuse/13079

> Repair doesn't work after several replication factor changes
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13079
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13079
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Debian
>            Reporter: Vladimir Yudovin
>            Assignee: Paulo Motta
>            Priority: Critical
>
> Scenario:
> Start a two-node cluster.
> Create a keyspace with replication factor *one*:
> CREATE KEYSPACE rep WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> CREATE TABLE rep.data (str text PRIMARY KEY);
> INSERT INTO rep.data (str) VALUES ('qwerty');
> Run *nodetool flush* on all nodes. Table files are created on one of them.
> Change the replication factor to *two*:
> ALTER KEYSPACE rep WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> Run repair, then *nodetool flush* on all nodes. Table files are created on 
> all nodes.
> Change the replication factor back to *one*:
> ALTER KEYSPACE rep WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> Then run *nodetool cleanup*; data files remain only on the initial node.
> Change the replication factor to *two* again:
> ALTER KEYSPACE rep WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 2};
> Run repair, then *nodetool flush* on all nodes. No data files appear on the 
> second node (though they were expected, as after the first repair/flush).
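
A minimal end-to-end sketch of the steps above, assuming a two-node cluster where cqlsh and nodetool are pointed at the appropriate nodes (hosts and credentials omitted):

{code}
# Create the keyspace at RF=1 and write one row.
cqlsh -e "CREATE KEYSPACE rep WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};"
cqlsh -e "CREATE TABLE rep.data (str text PRIMARY KEY);"
cqlsh -e "INSERT INTO rep.data (str) VALUES ('qwerty');"
nodetool flush rep      # run on every node; data files appear on one node only

# Raise RF to 2, repair (incremental by default since 2.2), flush.
cqlsh -e "ALTER KEYSPACE rep WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};"
nodetool repair rep
nodetool flush rep      # run on every node; data files now appear on both nodes

# Drop RF back to 1; cleanup removes the second node's copy.
cqlsh -e "ALTER KEYSPACE rep WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};"
nodetool cleanup rep    # run on every node

# Raise RF to 2 again and repair: nothing is streamed this time.
cqlsh -e "ALTER KEYSPACE rep WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};"
nodetool repair rep
nodetool flush rep      # run on every node; the second node stays empty
{code}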





[jira] [Commented] (CASSANDRA-13079) Repair doesn't work after several replication factor changes

2017-02-09 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859959#comment-15859959
 ] 

Marcus Eriksson commented on CASSANDRA-13079:
----------------------------------------------

The problem is that if we automatically mark all sstables as unrepaired, all 
previously repaired sstables will potentially move to L0 in the unrepaired 
compaction strategy. This would cause a lot of compactions across the cluster, 
and that would probably be even more surprising to users than the fact that 
they have to run repair --full.
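
For context, the repaired flag that incremental repair relies on is recorded per sstable and can be inspected offline with the sstablemetadata tool; a sketch, with an illustrative data path (the exact output wording varies by version):

{code}
# Print the repairedAt marker of each sstable of rep.data
# ("Repaired at: 0" means the sstable is marked unrepaired).
for f in /var/lib/cassandra/data/rep/data-*/*-Data.db; do
  echo "== $f"
  sstablemetadata "$f" | grep -i repaired
done
{code}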






[jira] [Commented] (CASSANDRA-13079) Repair doesn't work after several replication factor changes

2017-01-02 Thread Vladimir Yudovin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794367#comment-15794367
 ] 

Vladimir Yudovin commented on CASSANDRA-13079:
----------------------------------------------

"Increased" should also include adding new DC (no matter existing or new) for 
replication , even if current factor for other DC is decreased, so total sum if 
unchanged or even decreased.

May be for simplicity we can reset repair state on any replication factor or 
class change. It's not often operation, besides maybe system_auth, but it's 
usually small keyspace.
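
For example (hypothetical DC names), the total replica count can stay the same while a whole new replica appears that incremental repair will never backfill:

{code}
# Total RF stays at 2, but the DC2 replica now owns data it never received.
cqlsh -e "ALTER KEYSPACE rep WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2};"
cqlsh -e "ALTER KEYSPACE rep WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 1};"
{code}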






[jira] [Commented] (CASSANDRA-13079) Repair doesn't work after several replication factor changes

2017-01-02 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793609#comment-15793609
 ] 

Jeff Jirsa commented on CASSANDRA-13079:
----------------------------------------------

{quote}And this raises a question - shouldn't a replication factor change also 
reset the repair state for this keyspace?{quote}

If the replication factor is increased, it seems like it should, in fact, reset 
the repair state for that keyspace. The fact that we don't is probably a bug.

{quote}
I think it would be a good idea for this type of scenario to reset the repair 
state when replication is altered, but I'm not sure that holds in every case.
{quote}

The principle of least astonishment applies here - a user running repair should 
expect all data to be repaired, and a user who adds a new DC and then runs 
repair will see a lot of data streamed. That is not something that SHOULD 
surprise them, and they can work around it if they choose. The fact that 
incremental (the default) repair does nothing when you change from rf=1 to 
rf=2 is more surprising and more dangerous than extra streaming, so I imagine 
we should treat that as the more important problem.
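
To make the danger concrete (a sketch against the reporter's rep.data table): after going from rf=1 to rf=2 and running only an incremental repair, the second replica still holds nothing, so a read at a weak consistency level can be served entirely by the empty node:

{code}
# May legitimately return zero rows if the coordinator picks the empty replica.
cqlsh -e "CONSISTENCY ONE; SELECT * FROM rep.data WHERE str = 'qwerty';"
{code}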








[jira] [Commented] (CASSANDRA-13079) Repair doesn't work after several replication factor changes

2016-12-28 Thread Marcus Olsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783345#comment-15783345
 ] 

Marcus Olsson commented on CASSANDRA-13079:
----------------------------------------------

Good to hear that --full did the trick.

I think it would be a good idea for this type of scenario to reset the repair 
state when replication is altered, but I'm not sure that holds in every case.

When adding a new data center, I believe the recommended approach is to 
disable auto bootstrap, change the replication factor once the full data 
center is up and running, and then run "nodetool rebuild" to stream the data 
over from an existing data center. In that scenario, large amounts of data 
could get marked as unrepaired and would have to be repaired again, causing 
unnecessary load on the cluster.
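
A sketch of that sequence, with illustrative DC names and replica counts (the exact values depend on the deployment):

{code}
# 1. Start every node in the new DC with auto_bootstrap: false in cassandra.yaml.
# 2. Once the whole DC is up, add it to the keyspace's replication:
cqlsh -e "ALTER KEYSPACE rep WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2};"
# 3. On each node in the new DC, stream the data from an existing DC:
nodetool rebuild DC1
{code}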

Another scenario is reducing the replication factor. In that case I don't 
think there would be a need to alter the repair state, since there should only 
be fewer replicas; to me this scenario seems easier to cover than the multi-DC 
one.

Unless I'm missing something with the multi-DC scenario, I'd say this would 
need to be implemented as an optional feature to avoid complexity (both 
operational and in code), but I'm not sure how that would be done or how 
feasible it is currently. Perhaps by adding some metadata to the 
schema-altering statements?






[jira] [Commented] (CASSANDRA-13079) Repair doesn't work after several replication factor changes

2016-12-27 Thread Vladimir Yudovin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780332#comment-15780332
 ] 

Vladimir Yudovin commented on CASSANDRA-13079:
----------------------------------------------

Well, indeed, --full helps.

And this raises a question - shouldn't a replication factor change also reset 
the repair state for this keyspace?






[jira] [Commented] (CASSANDRA-13079) Repair doesn't work after several replication factor changes

2016-12-27 Thread Marcus Olsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780296#comment-15780296
 ] 

Marcus Olsson commented on CASSANDRA-13079:
----------------------------------------------

How are you running repair? If repair is run without the '--full' flag, I 
believe this is the expected behaviour, since incremental repair has been the 
default since version 2.2. Incremental repair basically does not re-repair 
data that has previously been repaired, which could explain the situation 
above.
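
Concretely, the two variants look like this (using the keyspace from the report above):

{code}
nodetool repair rep          # incremental: skips sstables already marked repaired
nodetool repair --full rep   # full: repairs everything, ignoring the repaired state
{code}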



