[jira] [Commented] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily

2017-03-23 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938617#comment-15938617
 ] 

Paulo Motta commented on CASSANDRA-13226:
-

I think the idea behind flushing on stream was to send the most up-to-date data
during bootstrap/rebuild/decommission/replace. That reasoning doesn't apply to
repair, since you end up overstreaming non-validated data, as pointed out by
[~brstgt]. In any case, this minor improvement belongs in a separate ticket,
since this ticket is already closed.
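
To make the distinction concrete, here is a minimal Java sketch of a
per-operation flush policy. The names `FlushPolicy`, `StreamKind` and
`shouldFlushBeforeTransfer` are hypothetical and invented for illustration;
this is not Cassandra's actual API, only an encoding of the reasoning above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model, not Cassandra code: which stream operations flush memtables first.
final class FlushPolicy {
    enum StreamKind { BOOTSTRAP, REBUILD, DECOMMISSION, REPLACE, REPAIR }

    // Operations that hand data over to another node want the most up-to-date view.
    private static final Set<StreamKind> FLUSH_FIRST =
        EnumSet.of(StreamKind.BOOTSTRAP, StreamKind.REBUILD,
                   StreamKind.DECOMMISSION, StreamKind.REPLACE);

    static boolean shouldFlushBeforeTransfer(StreamKind kind) {
        // REPAIR is deliberately absent: a flush at stream time would only
        // add data that validation never compared.
        return FLUSH_FIRST.contains(kind);
    }
}
```

Under this assumption, only `StreamKind.REPAIR` skips the flush; every other
kind listed keeps the existing behaviour.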

> StreamPlan for incremental repairs flushing memtables unnecessarily
> ---
>
> Key: CASSANDRA-13226
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13226
> Project: Cassandra
> Issue Type: Bug
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Priority: Minor
> Fix For: 4.0
>
>
> Since incremental repairs are run against a fixed dataset, there's no need to 
> flush memtables when streaming for them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily

2017-03-23 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938577#comment-15938577
 ] 

Benjamin Roth commented on CASSANDRA-13226:
---

That does not make sense to me. Why should more be streamed than was requested?
That sounds like a waste of resources. Streaming more than a repair requires
assumes that the system is still creating inconsistent data during the repair.



[jira] [Commented] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily

2017-03-23 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938562#comment-15938562
 ] 

Blake Eggleston commented on CASSANDRA-13226:
-

[~brstgt] I think the idea behind flushing on stream for full repairs is that
you'll stream data even more recent than what existed when the merkle tree was
generated, and there's really no harm in doing that.



[jira] [Commented] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily

2017-02-28 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887961#comment-15887961
 ] 

Benjamin Roth commented on CASSANDRA-13226:
---

Sorry for so many comments, just one more thought:

Flushes can be optimized quite easily: execute a flush only if the memtable
contains mutations for the requested range, OR if the memtable exceeds a
certain size (in which case flush without scanning it, so that the check stays
cheap). I implemented this just for fun some months ago but never created a
ticket for it.
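
A rough Java sketch of that check, for illustration only. It is not the
implementation mentioned above and not Cassandra code: `RangeAwareFlush`,
`needsFlush` and the `MAX_ENTRIES_TO_SCAN` cutoff are hypothetical, tokens are
modelled as plain `long` values, and the requested range is assumed to be a
single non-wrapping `(left, right]` interval.

```java
import java.util.List;

// Hypothetical sketch, not Cassandra code: decide whether a stream request needs a flush.
final class RangeAwareFlush {
    // Assumed cutoff: above this many entries, flush without scanning the memtable.
    static final int MAX_ENTRIES_TO_SCAN = 10_000;

    /** Returns true if a flush is needed for the requested range (left, right]. */
    static boolean needsFlush(List<Long> memtableTokens, long left, long right) {
        if (memtableTokens.size() > MAX_ENTRIES_TO_SCAN) {
            return true; // too large to scan cheaply, so just flush
        }
        for (long token : memtableTokens) {
            if (token > left && token <= right) {
                return true; // at least one mutation falls in the requested range
            }
        }
        return false; // nothing relevant in memory, skip the flush
    }
}
```

The size cutoff bounds the cost of the scan: a small memtable is checked
exactly, a large one is flushed unconditionally, which is what happens today
anyway.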



[jira] [Commented] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily

2017-02-28 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887945#comment-15887945
 ] 

Benjamin Roth commented on CASSANDRA-13226:
---

I am referring to this "stacktrace":

RepairMessageVerbHandler.doVerb (case VALIDATION_REQUEST)
CompactionManager.instance.submitValidation(store, validator) 
CompactionManager.doValidationCompaction
=> StorageService.instance.forceKeyspaceFlush

After that, merkle trees are calculated, and streams are triggered based on
them. That's why all data that is eligible for transfer has already been
flushed.

Also, avoiding the flush locally is only half of the fix. Streams REQUESTED by
a stream plan also cause a flush on the sender side. But that sender has also
already validated (and so flushed) the requested data.

Maybe I missed something, but from what I can see, a REPAIR stream never
requires a flush.
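
The argument can be illustrated with a toy model in Java. This is a sketch
under strong simplifying assumptions, not Cassandra code: `RepairFlushModel`
and all of its methods are hypothetical, rows are plain strings, and
validation is reduced to "flush, then read what is on disk", following the
call chain above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Cassandra code: a table with one memtable and a list of flushed SSTables.
final class RepairFlushModel {
    private final List<String> memtable = new ArrayList<>();
    private final List<List<String>> sstables = new ArrayList<>();

    void write(String row) { memtable.add(row); }

    // Moves the memtable contents into a new SSTable; a no-op when the memtable is empty.
    void flush() {
        if (memtable.isEmpty()) return;
        sstables.add(new ArrayList<>(memtable));
        memtable.clear();
    }

    // Validation flushes first, then reads only SSTables.
    List<String> validate() {
        flush();
        return onDisk();
    }

    // Streaming reads SSTables only; flushFirst models the extra flush under discussion.
    List<String> stream(boolean flushFirst) {
        if (flushFirst) flush();
        return onDisk();
    }

    private List<String> onDisk() {
        List<String> rows = new ArrayList<>();
        for (List<String> sstable : sstables) rows.addAll(sstable);
        return rows;
    }
}
```

If a write arrives between `validate()` and streaming, `stream(false)` returns
exactly the validated rows, while `stream(true)` also returns the later write,
that is, data the merkle trees never compared.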



[jira] [Commented] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily

2017-02-28 Thread Benjamin Roth (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15887654#comment-15887654
 ] 

Benjamin Roth commented on CASSANDRA-13226:
---

Isn't this also true for non-incremental repairs?
Merkle tree calculation also triggers a flush, and any repair begins with a
merkle tree. So there is no need to flush, because the inconsistent dataset to
be streamed for repair is always contained in SSTables that the merkle tree
calculation has already flushed.



[jira] [Commented] (CASSANDRA-13226) StreamPlan for incremental repairs flushing memtables unnecessarily

2017-02-16 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871301#comment-15871301
 ] 

Marcus Eriksson commented on CASSANDRA-13226:
-

+1 - maybe just add a comment explaining why we don't need to flush on
incremental repairs?
