[jira] [Commented] (CASSANDRA-11548) Anticompaction not removing old sstables

2016-04-20 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251053#comment-15251053
 ] 

Ruoran Wang commented on CASSANDRA-11548:
-

Thank you [~pauloricardomg]. May I ask about the release schedule for 2.1.14?

> Anticompaction not removing old sstables
> 
>
> Key: CASSANDRA-11548
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11548
> Project: Cassandra
>  Issue Type: Bug
> Environment: 2.1.13
>Reporter: Ruoran Wang
>Assignee: Ruoran Wang
> Fix For: 2.1.14
>
> Attachments: 0001-cassandra-2.1.13-potential-fix.patch
>
>
> 1. 12/29/15 https://issues.apache.org/jira/browse/CASSANDRA-10831
> Moved markCompactedSSTablesReplaced out of the loop ```for (SSTableReader 
> sstable : repairedSSTables)```
> 2. 1/18/16 https://issues.apache.org/jira/browse/CASSANDRA-10829
> Added unmarkCompacting into the loop. ```for (SSTableReader sstable : 
> repairedSSTables)```
> I think the effect of the changes above might cause markCompactedSSTablesReplaced 
> to fail on this assertion in DataTracker.java:
> {noformat}
>assert newSSTables.size() + newShadowed.size() == newSSTablesSize :
> String.format("Expecting new size of %d, got %d while 
> replacing %s by %s in %s",
>   newSSTablesSize, newSSTables.size() + 
> newShadowed.size(), oldSSTables, replacements, this);
> {noformat}
> Since CASSANDRA-10831 moved that call out, this AssertionError won't be caught, 
> leaving the old sstables not removed. (This might then cause a row-out-of-order 
> error when doing incremental repair if there are un-repaired L1 sstables.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11599) When there are a large number of small repaired L0 sstables, compaction is very slow

2016-04-18 Thread Ruoran Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruoran Wang updated CASSANDRA-11599:

Description: 
This is on a 6-node 2.1.13 cluster with leveled compaction strategy. This happens 
when/after running incremental repair, like '/usr/bin/nodetool repair -pr -par 
-local -inc -- KEYSPACE'.
 
Initially, I found missing metrics when there is heavy compaction going on 
(https://issues.apache.org/jira/browse/CASSANDRA-9625), because 
WrappingCompactionStrategy is blocked. 
Then I saw a case where compaction got stuck (progress becomes dramatically 
slow). There are 29k sstables after inc repair, and I noticed tons of sstables 
that are only 200+ bytes and contain just 1 key. Again, this is because 
WrappingCompactionStrategy is blocked.
 
My guess is: with 8 compaction_executors and tons of small repaired L0 
sstables, the first thread is able to get some (likely 32) sstables to compact. 
If this task covers a large range of tokens, the other 7 threads will iterate 
through the sstables trying to find what can be compacted in the meantime 
(which locks WrappingCompactionStrategy when calling 
LeveledManifest.getCandidatesFor), but fail in the end, since their sstable 
candidates intersect with what is being compacted by the 1st thread. From a 
series of thread dumps, I noticed the thread that is doing work always gets 
blocked by the other 7 threads.
 
1. I tried to separate an inc-repair into 4 token ranges, which helped keep the 
sstable count down. That seems to be working.

2. Another fix I tried is replacing ageSortedSSTables with a new method 
"keyCountSortedSSTables" that returns small sstables first (at 
org/apache/cassandra/db/compaction/LeveledManifest.java:586). Since there will 
be 32 very small sstables, the condition 
'SSTableReader.getTotalBytes(candidates) > maxSSTableSizeInBytes' won't be met, 
and the compaction will merge those 32 very small sstables. This helps prevent 
the first compaction job from working on a set of sstables that covers a wide 
range. (A toy sketch of this idea follows below.)

I can provide more info if needed.
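
A minimal, self-contained sketch of the idea in item 2 (illustration only: 
TinySSTable and all the numbers are made up, not Cassandra code; the real method 
would sort the SSTableReader candidates in LeveledManifest, e.g. by 
estimatedKeys(), instead of by age):
{noformat}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy stand-in for an sstable. The real change would operate on SSTableReader.
class TinySSTable
{
    final String name;
    final long keyCount;
    final long onDiskBytes;

    TinySSTable(String name, long keyCount, long onDiskBytes)
    {
        this.name = name;
        this.keyCount = keyCount;
        this.onDiskBytes = onDiskBytes;
    }
}

public class KeyCountSortedDemo
{
    // "keyCountSortedSSTables": smallest tables (by key count) first, so a batch of
    // 32 tiny repaired L0 tables stays below maxSSTableSizeInBytes and compacts quickly.
    static List<TinySSTable> keyCountSorted(List<TinySSTable> sstables)
    {
        List<TinySSTable> sorted = new ArrayList<TinySSTable>(sstables);
        Collections.sort(sorted, new Comparator<TinySSTable>()
        {
            public int compare(TinySSTable a, TinySSTable b)
            {
                return Long.compare(a.keyCount, b.keyCount);
            }
        });
        return sorted;
    }

    public static void main(String[] args)
    {
        long maxSSTableSizeInBytes = 160L * 1024 * 1024;

        List<TinySSTable> l0 = new ArrayList<TinySSTable>();
        l0.add(new TinySSTable("big", 1000000, 200L * 1024 * 1024)); // one wide-range table
        for (int i = 0; i < 100; i++)
            l0.add(new TinySSTable("tiny-" + i, 1, 250));            // ~250-byte, 1-key tables

        List<TinySSTable> candidates = keyCountSorted(l0).subList(0, 32);
        long totalBytes = 0;
        for (TinySSTable t : candidates)
            totalBytes += t.onDiskBytes;

        // Smallest-first candidates stay far below the size cap, so the
        // 'getTotalBytes(candidates) > maxSSTableSizeInBytes' branch is not taken.
        System.out.println("candidate bytes = " + totalBytes
                           + ", over cap = " + (totalBytes > maxSSTableSizeInBytes));
    }
}
{noformat}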

  was:
This is on a 6-node 2.1.13 cluster with leveled compaction strategy. This happens 
when/after running incremental repair, like '/usr/bin/nodetool repair -pr -par 
-local -inc -- KEYSPACE'.
 
Initially, I found missing metrics when there is heavy compaction going on 
(https://issues.apache.org/jira/browse/CASSANDRA-9625), because 
WrappingCompactionStrategy is blocked. 
Then I saw a case where compaction got stuck (progress becomes dramatically 
slow). There are 29k sstables after inc repair, and I noticed tons of sstables 
that are only 200+ bytes and contain just 1 key. Again, this is because 
WrappingCompactionStrategy is blocked.
 
My guess is: with 8 compaction_executors and tons of small repaired L0 
sstables, the first thread is able to get some (likely 32) sstables to compact. 
If this task covers a large range of tokens, the other 7 threads will iterate 
through the sstables trying to find what can be compacted in the meantime, but 
fail in the end, since their sstable candidates intersect with what is being 
compacted by the 1st thread. From a series of thread dumps, I noticed the 
thread that is doing work always gets blocked by the other 7 threads.
 
1. I tried to separate an inc-repair into 4 token ranges, which helped keep the 
sstable count down. That seems to be working.

2. Another fix I tried is replacing ageSortedSSTables with a new method 
"keyCountSortedSSTables" that returns small sstables first (at 
org/apache/cassandra/db/compaction/LeveledManifest.java:586). Since there will 
be 32 very small sstables, the condition 
'SSTableReader.getTotalBytes(candidates) > maxSSTableSizeInBytes' won't be met, 
and the compaction will merge those 32 very small sstables. This helps prevent 
the first compaction job from working on a set of sstables that covers a wide 
range.

I can provide more info if needed.


> When there are a large number of small repaired L0 sstables, compaction is 
> very slow
> 
>
> Key: CASSANDRA-11599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11599
> Project: Cassandra
>  Issue Type: Bug
> Environment: 2.1.13
>Reporter: Ruoran Wang
>
> This is on a 6-node 2.1.13 cluster with leveled compaction strategy. This 
> happens when/after running incremental repair, like '/usr/bin/nodetool repair 
> -pr -par -local -inc -- KEYSPACE'.
>  
> Initially, I found missing metrics when there is heavy compaction going on 
> (https://issues.apache.org/jira/browse/CASSANDRA-9625), because 
> WrappingCompactionStrategy is blocked. 
> Then I saw a case where compaction got stuck (progress becomes dramatically 
> slow). There are 29k sstables after inc repair where I 

[jira] [Updated] (CASSANDRA-11599) When there are a large number of small repaired L0 sstables, compaction is very slow

2016-04-18 Thread Ruoran Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruoran Wang updated CASSANDRA-11599:

Description: 
This is on a 6-node 2.1.13 cluster with leveled compaction strategy. This happens 
when/after running incremental repair, like '/usr/bin/nodetool repair -pr -par 
-local -inc -- KEYSPACE'.
 
Initially, I found missing metrics when there is heavy compaction going on 
(https://issues.apache.org/jira/browse/CASSANDRA-9625), because 
WrappingCompactionStrategy is blocked. 
Then I saw a case where compaction got stuck (progress becomes dramatically 
slow). There are 29k sstables after inc repair, and I noticed tons of sstables 
that are only 200+ bytes and contain just 1 key. Again, this is because 
WrappingCompactionStrategy is blocked.
 
My guess is: with 8 compaction_executors and tons of small repaired L0 
sstables, the first thread is able to get some (likely 32) sstables to compact. 
If this task covers a large range of tokens, the other 7 threads will iterate 
through the sstables trying to find what can be compacted in the meantime, but 
fail in the end, since their sstable candidates intersect with what is being 
compacted by the 1st thread. From a series of thread dumps, I noticed the 
thread that is doing work always gets blocked by the other 7 threads.
 
1. I tried to separate an inc-repair into 4 token ranges, which helped keep the 
sstable count down. That seems to be working.

2. Another fix I tried is replacing ageSortedSSTables with a new method 
"keyCountSortedSSTables" that returns small sstables first (at 
org/apache/cassandra/db/compaction/LeveledManifest.java:586). Since there will 
be 32 very small sstables, the condition 
'SSTableReader.getTotalBytes(candidates) > maxSSTableSizeInBytes' won't be met, 
and the compaction will merge those 32 very small sstables. This helps prevent 
the first compaction job from working on a set of sstables that covers a wide 
range.

I can provide more info if needed.

  was:
This is on a 6-node 2.1.13 cluster with leveled compaction strategy. This happens 
when/after running incremental repair, like '/usr/bin/nodetool repair -pr -par 
-local -inc -- KEYSPACE'.
 
Initially, I found missing metrics when there is heavy compaction going on, 
because WrappingCompactionStrategy is blocked. 
Then I saw a case where compaction got stuck (progress becomes dramatically 
slow). There are 29k sstables after inc repair, and I noticed tons of sstables 
that are only 200+ bytes and contain just 1 key. Again, this is because 
WrappingCompactionStrategy is blocked.
 
My guess is: with 8 compaction_executors and tons of small repaired L0 
sstables, the first thread is able to get some (likely 32) sstables to compact. 
If this task covers a large range of tokens, the other 7 threads will iterate 
through the sstables trying to find what can be compacted in the meantime, but 
fail in the end, since their sstable candidates intersect with what is being 
compacted by the 1st thread. From a series of thread dumps, I noticed the 
thread that is doing work always gets blocked by the other 7 threads.
 
1. I tried to separate an inc-repair into 4 token ranges, which helped keep the 
sstable count down. That seems to be working.

2. Another fix I tried is replacing ageSortedSSTables with a new method 
"keyCountSortedSSTables" that returns small sstables first (at 
org/apache/cassandra/db/compaction/LeveledManifest.java:586). Since there will 
be 32 very small sstables, the condition 
'SSTableReader.getTotalBytes(candidates) > maxSSTableSizeInBytes' won't be met, 
and the compaction will merge those 32 very small sstables. This helps prevent 
the first compaction job from working on a set of sstables that covers a wide 
range.

I can provide more info if needed.


> When there are a large number of small repaired L0 sstables, compaction is 
> very slow
> 
>
> Key: CASSANDRA-11599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11599
> Project: Cassandra
>  Issue Type: Bug
> Environment: 2.1.13
>Reporter: Ruoran Wang
>
> This is on a 6-node 2.1.13 cluster with leveled compaction strategy. This 
> happens when/after running incremental repair, like '/usr/bin/nodetool repair 
> -pr -par -local -inc -- KEYSPACE'.
>  
> Initially, I found missing metrics when there is heavy compaction going on 
> (https://issues.apache.org/jira/browse/CASSANDRA-9625), because 
> WrappingCompactionStrategy is blocked. 
> Then I saw a case where compaction got stuck (progress becomes dramatically 
> slow). There are 29k sstables after inc repair, and I noticed tons of 
> sstables that are only 200+ bytes and contain just 1 key. Again, this is 
> because WrappingCompactionStrategy is blocked.
>  
> My guess is: with 

[jira] [Updated] (CASSANDRA-11599) When there are a large number of small repaired L0 sstables, compaction is very slow

2016-04-18 Thread Ruoran Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruoran Wang updated CASSANDRA-11599:

Description: 
This is on a 6-node 2.1.13 cluster with leveled compaction strategy. This happens 
when/after running incremental repair, like '/usr/bin/nodetool repair -pr -par 
-local -inc -- KEYSPACE'.
 
Initially, I found missing metrics when there is heavy compaction going on, 
because WrappingCompactionStrategy is blocked. 
Then I saw a case where compaction got stuck (progress becomes dramatically 
slow). There are 29k sstables after inc repair, and I noticed tons of sstables 
that are only 200+ bytes and contain just 1 key. Again, this is because 
WrappingCompactionStrategy is blocked.
 
My guess is: with 8 compaction_executors and tons of small repaired L0 
sstables, the first thread is able to get some (likely 32) sstables to compact. 
If this task covers a large range of tokens, the other 7 threads will iterate 
through the sstables trying to find what can be compacted in the meantime, but 
fail in the end, since their sstable candidates intersect with what is being 
compacted by the 1st thread. From a series of thread dumps, I noticed the 
thread that is doing work always gets blocked by the other 7 threads.
 
1. I tried to separate an inc-repair into 4 token ranges, which helped keep the 
sstable count down. That seems to be working.

2. Another fix I tried is replacing ageSortedSSTables with a new method 
"keyCountSortedSSTables" that returns small sstables first (at 
org/apache/cassandra/db/compaction/LeveledManifest.java:586). Since there will 
be 32 very small sstables, the condition 
'SSTableReader.getTotalBytes(candidates) > maxSSTableSizeInBytes' won't be met, 
and the compaction will merge those 32 very small sstables. This helps prevent 
the first compaction job from working on a set of sstables that covers a wide 
range.

I can provide more info if needed.

  was:
This is on a 6-node 2.1.13 cluster with leveled compaction strategy.
 
Initially, I found missing metrics when there is heavy compaction going on, 
because WrappingCompactionStrategy is blocked. 
Then I saw a case where compaction got stuck (progress becomes dramatically 
slow). There are 29k sstables after inc repair, and I noticed tons of sstables 
that are only 200+ bytes and contain just 1 key. Again, this is because 
WrappingCompactionStrategy is blocked.
 
My guess is: with 8 compaction_executors and tons of small repaired L0 
sstables, the first thread is able to get some (likely 32) sstables to compact. 
If this task covers a large range of tokens, the other 7 threads will iterate 
through the sstables trying to find what can be compacted in the meantime, but 
fail in the end, since their sstable candidates intersect with what is being 
compacted by the 1st thread. From a series of thread dumps, I noticed the 
thread that is doing work always gets blocked by the other 7 threads.
 
1. I tried to separate an inc-repair into 4 token ranges, which helped keep the 
sstable count down. That seems to be working.

2. Another fix I tried is replacing ageSortedSSTables with a new method 
"keyCountSortedSSTables" that returns small sstables first (at 
org/apache/cassandra/db/compaction/LeveledManifest.java:586). Since there will 
be 32 very small sstables, the condition 
'SSTableReader.getTotalBytes(candidates) > maxSSTableSizeInBytes' won't be met, 
and the compaction will merge those 32 very small sstables. This helps prevent 
the first compaction job from working on a set of sstables that covers a wide 
range.

I can provide more info if needed.


> When there are a large number of small repaired L0 sstables, compaction is 
> very slow
> 
>
> Key: CASSANDRA-11599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11599
> Project: Cassandra
>  Issue Type: Bug
> Environment: 2.1.13
>Reporter: Ruoran Wang
>
> This is on a 6-node 2.1.13 cluster with leveled compaction strategy. This 
> happens when/after running incremental repair, like '/usr/bin/nodetool repair 
> -pr -par -local -inc -- KEYSPACE'.
>  
> Initially, I found missing metrics when there is heavy compaction going on, 
> because WrappingCompactionStrategy is blocked. 
> Then I saw a case where compaction got stuck (progress becomes dramatically 
> slow). There are 29k sstables after inc repair, and I noticed tons of 
> sstables that are only 200+ bytes and contain just 1 key. Again, this is 
> because WrappingCompactionStrategy is blocked.
>  
> My guess is: with 8 compaction_executors and tons of small repaired L0 
> sstables, the first thread is able to get some (likely 32) sstables to 
> compact. If this task covers a large range of tokens, the other 7 
> threads will iterate 

[jira] [Created] (CASSANDRA-11599) When there are a large number of small repaired L0 sstables, compaction is very slow

2016-04-18 Thread Ruoran Wang (JIRA)
Ruoran Wang created CASSANDRA-11599:
---

 Summary: When there are a large number of small repaired L0 
sstables, compaction is very slow
 Key: CASSANDRA-11599
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11599
 Project: Cassandra
  Issue Type: Bug
 Environment: 2.1.13
Reporter: Ruoran Wang


This is on a 6-node 2.1.13 cluster with leveled compaction strategy.
 
Initially, I found missing metrics when there is heavy compaction going on, 
because WrappingCompactionStrategy is blocked. 
Then I saw a case where compaction got stuck (progress becomes dramatically 
slow). There are 29k sstables after inc repair, and I noticed tons of sstables 
that are only 200+ bytes and contain just 1 key. Again, this is because 
WrappingCompactionStrategy is blocked.
 
My guess is: with 8 compaction_executors and tons of small repaired L0 
sstables, the first thread is able to get some (likely 32) sstables to compact. 
If this task covers a large range of tokens, the other 7 threads will iterate 
through the sstables trying to find what can be compacted in the meantime, but 
fail in the end, since their sstable candidates intersect with what is being 
compacted by the 1st thread. From a series of thread dumps, I noticed the 
thread that is doing work always gets blocked by the other 7 threads.
 
1. I tried to separate an inc-repair into 4 token ranges, which helped keep the 
sstable count down. That seems to be working. (See the example below.)

2. Another fix I tried is replacing ageSortedSSTables with a new method 
"keyCountSortedSSTables" that returns small sstables first (at 
org/apache/cassandra/db/compaction/LeveledManifest.java:586). Since there will 
be 32 very small sstables, the condition 
'SSTableReader.getTotalBytes(candidates) > maxSSTableSizeInBytes' won't be met, 
and the compaction will merge those 32 very small sstables. This helps prevent 
the first compaction job from working on a set of sstables that covers a wide 
range.

I can provide more info if needed.
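
One way the splitting in item 1 might look (illustration only: the token values 
are placeholders that would come from the node's primary range, e.g. via 
nodetool ring; -pr is dropped because explicit -st/-et bounds are given):
{noformat}
# repair the node's primary range in 4 consecutive subranges instead of one -pr run
nodetool repair -par -local -inc -st <token0> -et <token1> -- KEYSPACE
nodetool repair -par -local -inc -st <token1> -et <token2> -- KEYSPACE
nodetool repair -par -local -inc -st <token2> -et <token3> -- KEYSPACE
nodetool repair -par -local -inc -st <token3> -et <token4> -- KEYSPACE
{noformat}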



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-9625) GraphiteReporter not reporting

2016-04-13 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239684#comment-15239684
 ] 

Ruoran Wang edited comment on CASSANDRA-9625 at 4/13/16 5:49 PM:
-

I tried the following dumb fix: I applied a similar change to 
ColumnFamilyMetrics, where 
cfs.getCompactionStrategy().getEstimatedRemainingTasks(); is called. 
I hard-coded it to return 21 when getEstimatedRemainingTasks takes too long. 
The graph 
(https://issues.apache.org/jira/secure/attachment/12798541/Screen%20Shot%202016-04-13%20at%2010.40.58%20AM.png)
shows that when it's busy, pendingCompaction shows 21, but now the graphite-reporter 
will continue to collect other metrics instead of being blocked.

{noformat}
diff --git a/src/java/org/apache/cassandra/metrics/CompactionMetrics.java b/src/java/org/apache/cassandra/metrics/CompactionMetrics.java
index f7a99e1..e2ac22b 100644
--- a/src/java/org/apache/cassandra/metrics/CompactionMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/CompactionMetrics.java
@@ -18,8 +18,13 @@
 package org.apache.cassandra.metrics;
 
 import java.util.*;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
 import java.util.concurrent.ThreadPoolExecutor;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
 
 import com.yammer.metrics.Metrics;
 import com.yammer.metrics.core.Counter;
@@ -31,12 +36,17 @@ import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.db.compaction.CompactionInfo;
 import org.apache.cassandra.db.compaction.CompactionManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 /**
  * Metrics for compaction.
  */
 public class CompactionMetrics implements CompactionManager.CompactionExecutorStatsCollector
 {
+
+    private static final Logger logger = LoggerFactory.getLogger(CompactionMetrics.class);
+
     public static final MetricNameFactory factory = new DefaultNameFactory("Compaction");
 
     // a synchronized identity set of running tasks to their compaction info
@@ -57,15 +67,36 @@ public class CompactionMetrics implements CompactionManager.CompactionExecutorStatsCollector
         {
             public Integer value()
             {
-                int n = 0;
-                // add estimate number of compactions need to be done
-                for (String keyspaceName : Schema.instance.getKeyspaces())
-                {
-                    for (ColumnFamilyStore cfs : Keyspace.open(keyspaceName).getColumnFamilyStores())
-                        n += cfs.getCompactionStrategy().getEstimatedRemainingTasks();
+                // The collector thread is likely to be blocked by compactions
+                // This is a quick fix to avoid losing metrics
+                ExecutorService executor = Executors.newSingleThreadExecutor();
+
+                final Future<Integer> future = executor.submit(new Callable<Integer>() {
+                    @Override
+                    public Integer call() throws Exception {
+                        int n = 0;
+                        // add estimate number of compactions need to be done
+                        for (String keyspaceName : Schema.instance.getKeyspaces())
+                        {
+                            for (ColumnFamilyStore cfs : Keyspace.open(keyspaceName).getColumnFamilyStores())
+                                n += cfs.getCompactionStrategy().getEstimatedRemainingTasks();
+                        }
+                        // add number of currently running compactions
+                        return n + compactions.size();
+                    }
+                });
+
+                try {
+                    return future.get(20, TimeUnit.SECONDS);
+                } catch (TimeoutException e) {
+                    future.cancel(true);
+                    logger.error("Skipping PendingTasks because some cfs is busy");
+                } catch (Exception othere) {
+                    logger.error("Skipping PendingTasks because an unexpected exception", othere);
                 }
-                // add number of currently running compactions
-                return n + compactions.size();
+
+                executor.shutdownNow();
+                return 21;
             }
         });
         completedTasks = Metrics.newGauge(factory.createMetricName("CompletedTasks"), new Gauge<Long>()
{noformat}


was (Author: ruoranwang):
I tried the following dumb fix: I applied a similar change to 
ColumnFamilyMetrics, where 
cfs.getCompactionStrategy().getEstimatedRemainingTasks(); is called. 
I hard-coded it to return 21 when getEstimatedRemainingTasks takes too long. 
The graph shows that when it's busy, pendingCompaction shows 21, but now the 
graphite-reporter will continue to collect other metrics instead 

[jira] [Updated] (CASSANDRA-9625) GraphiteReporter not reporting

2016-04-13 Thread Ruoran Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruoran Wang updated CASSANDRA-9625:
---
Attachment: Screen Shot 2016-04-13 at 10.40.58 AM.png

I tried the following dumb fix: I applied a similar change to 
ColumnFamilyMetrics, where 
cfs.getCompactionStrategy().getEstimatedRemainingTasks(); is called. 
I hard-coded it to return 21 when getEstimatedRemainingTasks takes too long. 
The graph shows that when it's busy, pendingCompaction shows 21, but now the 
graphite-reporter will continue to collect other metrics instead of being blocked.

{noformat}
diff --git a/src/java/org/apache/cassandra/metrics/CompactionMetrics.java b/src/java/org/apache/cassandra/metrics/CompactionMetrics.java
index f7a99e1..e2ac22b 100644
--- a/src/java/org/apache/cassandra/metrics/CompactionMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/CompactionMetrics.java
@@ -18,8 +18,13 @@
 package org.apache.cassandra.metrics;
 
 import java.util.*;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
 import java.util.concurrent.ThreadPoolExecutor;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
 
 import com.yammer.metrics.Metrics;
 import com.yammer.metrics.core.Counter;
@@ -31,12 +36,17 @@ import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.db.compaction.CompactionInfo;
 import org.apache.cassandra.db.compaction.CompactionManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 /**
  * Metrics for compaction.
  */
 public class CompactionMetrics implements CompactionManager.CompactionExecutorStatsCollector
 {
+
+    private static final Logger logger = LoggerFactory.getLogger(CompactionMetrics.class);
+
     public static final MetricNameFactory factory = new DefaultNameFactory("Compaction");
 
     // a synchronized identity set of running tasks to their compaction info
@@ -57,15 +67,36 @@ public class CompactionMetrics implements CompactionManager.CompactionExecutorStatsCollector
         {
             public Integer value()
             {
-                int n = 0;
-                // add estimate number of compactions need to be done
-                for (String keyspaceName : Schema.instance.getKeyspaces())
-                {
-                    for (ColumnFamilyStore cfs : Keyspace.open(keyspaceName).getColumnFamilyStores())
-                        n += cfs.getCompactionStrategy().getEstimatedRemainingTasks();
+                // The collector thread is likely to be blocked by compactions
+                // This is a quick fix to avoid losing metrics
+                ExecutorService executor = Executors.newSingleThreadExecutor();
+
+                final Future<Integer> future = executor.submit(new Callable<Integer>() {
+                    @Override
+                    public Integer call() throws Exception {
+                        int n = 0;
+                        // add estimate number of compactions need to be done
+                        for (String keyspaceName : Schema.instance.getKeyspaces())
+                        {
+                            for (ColumnFamilyStore cfs : Keyspace.open(keyspaceName).getColumnFamilyStores())
+                                n += cfs.getCompactionStrategy().getEstimatedRemainingTasks();
+                        }
+                        // add number of currently running compactions
+                        return n + compactions.size();
+                    }
+                });
+
+                try {
+                    return future.get(20, TimeUnit.SECONDS);
+                } catch (TimeoutException e) {
+                    future.cancel(true);
+                    logger.error("Skipping PendingTasks because some cfs is busy");
+                } catch (Exception othere) {
+                    logger.error("Skipping PendingTasks because an unexpected exception", othere);
                 }
-                // add number of currently running compactions
-                return n + compactions.size();
+
+                executor.shutdownNow();
+                return 21;
             }
         });
         completedTasks = Metrics.newGauge(factory.createMetricName("CompletedTasks"), new Gauge<Long>()
{noformat}

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>Reporter: Eric Evans
>Assignee: T Jake Luciani
> Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, 
> thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite 

[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-04-13 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239639#comment-15239639
 ] 

Ruoran Wang commented on CASSANDRA-9935:


[~pauloricardomg] I have some news. I mentioned earlier that the two sstables 
returned from getsstables --hex-format always show one is the other's 
ancestor. So I looked at anticompaction, and I think it's the old sstable not 
being removed due to a race condition. CASSANDRA-10831 moved 
'markCompactedSSTablesReplaced' out of a try/catch clause. 
{noformat}
cfs.getDataTracker().markCompactedSSTablesReplaced(successfullyAntiCompactedSSTables, anticompactedSSTables, OperationType.ANTICOMPACTION);
{noformat}
When I added a try/catch around this, I found an AssertionError when the 
anticompaction process tries to remove old sstables.
{noformat}
java.lang.AssertionError: Expecting new size of 95, got 96 while replacing XXX by XXX
{noformat}
That is thrown from org.apache.cassandra.db.DataTracker.View#replace.

So I think this could be caused by unmarkCompacting being called before 
markCompactedSSTablesReplaced. Yesterday I created another ticket for 2.1.13; I 
also attached my proposed patch there.
https://issues.apache.org/jira/browse/CASSANDRA-11548

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: 9935.patch, db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range 
> (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range 
> (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range 
> (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range 
> (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range 
> (8166489034383821955,8168408930184216281] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range 
> (6084602890817326921,6088328703025510057] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range 
> (-781874602493000830,-781745173070807746] finished
> {code}
> but a 

[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException

2016-04-13 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239639#comment-15239639
 ] 

Ruoran Wang edited comment on CASSANDRA-9935 at 4/13/16 5:12 PM:
-

[~pauloricardomg] I have some news. I mentioned earlier that the two sstables 
returned from getsstables --hex-format always show one is the other's 
ancestor. So I looked at anticompaction, and I think it's the old sstable not 
being removed due to a race condition. CASSANDRA-10831 moved 
'markCompactedSSTablesReplaced' out of a try/catch clause. 
{noformat}
cfs.getDataTracker().markCompactedSSTablesReplaced(successfullyAntiCompactedSSTables, anticompactedSSTables, OperationType.ANTICOMPACTION);
{noformat}
When I added a try/catch around this, I found an AssertionError when the 
anticompaction process tries to remove old sstables.
{noformat}
java.lang.AssertionError: Expecting new size of 95, got 96 while replacing XXX by XXX
{noformat}
That is thrown from org.apache.cassandra.db.DataTracker.View#replace.

So I think this could be caused by unmarkCompacting being called before 
markCompactedSSTablesReplaced. Yesterday I created another ticket for 2.1.13; I 
also attached my proposed patch there.
https://issues.apache.org/jira/browse/CASSANDRA-11548


was (Author: ruoranwang):
[~pauloricardomg] I have some news. I mentioned earlier that the two sstables 
returned from getsstables --hex-format always show one is the other's 
ancestor. So I looked at anticompaction, and I think it's the old sstable not 
being removed due to a race condition. CASSANDRA-10831 moved 
'markCompactedSSTablesReplaced' out of a try/catch clause. 
{noformat}
cfs.getDataTracker().markCompactedSSTablesReplaced(successfullyAntiCompactedSSTables, anticompactedSSTables, OperationType.ANTICOMPACTION);
{noformat}
When I added a try/catch around this, I found an AssertionError when the 
anticompaction process tries to remove old sstables.
{noformat}
java.lang.AssertionError: Expecting new size of 95, got 96 while replacing XXX by XXX
{noformat}
That is thrown from org.apache.cassandra.db.DataTracker.View#replace.

So I think this could be caused by unmarkCompacting being called before 
markCompactedSSTablesReplaced. Yesterday I created another ticket for 2.1.13; I 
also attached my proposed patch there.
https://issues.apache.org/jira/browse/CASSANDRA-11548

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: 9935.patch, db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range 
> (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 

[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException

2016-04-13 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239639#comment-15239639
 ] 

Ruoran Wang edited comment on CASSANDRA-9935 at 4/13/16 5:13 PM:
-

[~pauloricardomg] I have some news. I mentioned earlier that the two sstables 
returned from getsstables --hex-format always show one is the other's 
ancestor. So I looked at anticompaction, and I think it's the old sstable not 
being removed due to a race condition. CASSANDRA-10831 moved 
'markCompactedSSTablesReplaced' out of a try/catch clause. 
{noformat}
cfs.getDataTracker().markCompactedSSTablesReplaced(successfullyAntiCompactedSSTables, anticompactedSSTables, OperationType.ANTICOMPACTION);
{noformat}
When I added a try/catch around this, I found an AssertionError when the 
anticompaction process tries to remove old sstables.
{noformat}
java.lang.AssertionError: Expecting new size of 95, got 96 while replacing XXX by XXX
{noformat}
That is thrown from org.apache.cassandra.db.DataTracker.View#replace.

So I think this could be caused by unmarkCompacting being called before 
markCompactedSSTablesReplaced. Yesterday I created another ticket for 2.1.13; I 
also attached my proposed patch there.
https://issues.apache.org/jira/browse/CASSANDRA-11548


was (Author: ruoranwang):
[~pauloricardomg] I have some news. I mentioned earlier that the two sstables 
returned from getsstables --hex-format always show one is the other's 
ancestor. So I looked at anticompaction, and I think it's the old sstable not 
being removed due to a race condition. CASSANDRA-10831 moved 
'markCompactedSSTablesReplaced' out of a try/catch clause. 
{noformat}
cfs.getDataTracker().markCompactedSSTablesReplaced(successfullyAntiCompactedSSTables, anticompactedSSTables, OperationType.ANTICOMPACTION);
{noformat}
When I added a try/catch around this, I found an AssertionError when the 
anticompaction process tries to remove old sstables.
{noformat}
java.lang.AssertionError: Expecting new size of 95, got 96 while replacing XXX by XXX
{noformat}
That is thrown from org.apache.cassandra.db.DataTracker.View#replace.

So I think this could be caused by unmarkCompacting being called before 
markCompactedSSTablesReplaced. Yesterday I created another ticket for 2.1.13; I 
also attached my proposed patch there.
https://issues.apache.org/jira/browse/CASSANDRA-11548

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: 9935.patch, db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range 
> (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 

[jira] [Updated] (CASSANDRA-11548) Anticompaction not removing old sstables

2016-04-11 Thread Ruoran Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruoran Wang updated CASSANDRA-11548:

Attachment: 0001-cassandra-2.1.13-potential-fix.patch

I only tried a unit test for this. Still trying to figure out dtest.

> Anticompaction not removing old sstables
> 
>
> Key: CASSANDRA-11548
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11548
> Project: Cassandra
>  Issue Type: Bug
> Environment: 2.1.13
>Reporter: Ruoran Wang
> Attachments: 0001-cassandra-2.1.13-potential-fix.patch
>
>
> 1. 12/29/15 https://issues.apache.org/jira/browse/CASSANDRA-10831
> Moved markCompactedSSTablesReplaced out of the loop ```for (SSTableReader 
> sstable : repairedSSTables)```
> 2. 1/18/16 https://issues.apache.org/jira/browse/CASSANDRA-10829
> Added unmarkCompacting into the loop. ```for (SSTableReader sstable : 
> repairedSSTables)```
> I think the effect of the changes above might cause markCompactedSSTablesReplaced 
> to fail on this assertion in DataTracker.java:
> {noformat}
>assert newSSTables.size() + newShadowed.size() == newSSTablesSize :
> String.format("Expecting new size of %d, got %d while 
> replacing %s by %s in %s",
>   newSSTablesSize, newSSTables.size() + 
> newShadowed.size(), oldSSTables, replacements, this);
> {noformat}
> Since CASSANDRA-10831 moved that call out, this AssertionError won't be caught, 
> leaving the old sstables not removed. (This might then cause a row-out-of-order 
> error when doing incremental repair if there are un-repaired L1 sstables.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11548) Anticompaction not removing old sstables

2016-04-11 Thread Ruoran Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruoran Wang updated CASSANDRA-11548:

Description: 
1. 12/29/15 https://issues.apache.org/jira/browse/CASSANDRA-10831
Moved markCompactedSSTablesReplaced out of the loop ```for (SSTableReader 
sstable : repairedSSTables)```

2. 1/18/16 https://issues.apache.org/jira/browse/CASSANDRA-10829
Added unmarkCompacting into the loop. ```for (SSTableReader sstable : 
repairedSSTables)```

I think the effect of the changes above might cause markCompactedSSTablesReplaced 
to fail on this assertion in DataTracker.java:

{noformat}
    assert newSSTables.size() + newShadowed.size() == newSSTablesSize :
        String.format("Expecting new size of %d, got %d while replacing %s by %s in %s",
                      newSSTablesSize, newSSTables.size() + newShadowed.size(), oldSSTables, replacements, this);
{noformat}

Since CASSANDRA-10831 moved that call out, this AssertionError won't be caught, 
leaving the old sstables not removed. (This might then cause a row-out-of-order 
error when doing incremental repair if there are un-repaired L1 sstables.)

  was:
1. 12/29/15 https://issues.apache.org/jira/browse/CASSANDRA-10831
Moved markCompactedSSTablesReplaced out of the loop ```for (SSTableReader 
sstable : repairedSSTables)```

2. 1/18/16 https://issues.apache.org/jira/browse/CASSANDRA-10829
Added unmarkCompacting into the loop. ```for (SSTableReader sstable : 
repairedSSTables)```

I think the effect of the changes above might cause markCompactedSSTablesReplaced 
to fail on this assertion in DataTracker.java:

```
assert newSSTables.size() + newShadowed.size() == newSSTablesSize :
    String.format("Expecting new size of %d, got %d while replacing %s by %s in %s",
                  newSSTablesSize, newSSTables.size() + newShadowed.size(), oldSSTables, replacements, this);
```

Since CASSANDRA-10831 moved that call out, this AssertionError won't be caught, 
leaving the old sstables not removed. (This might then cause a row-out-of-order 
error when doing incremental repair if there are un-repaired L1 sstables.)


> Anticompaction not removing old sstables
> 
>
> Key: CASSANDRA-11548
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11548
> Project: Cassandra
>  Issue Type: Bug
> Environment: 2.1.13
>Reporter: Ruoran Wang
>
> 1. 12/29/15 https://issues.apache.org/jira/browse/CASSANDRA-10831
> Moved markCompactedSSTablesReplaced out of the loop ```for (SSTableReader 
> sstable : repairedSSTables)```
> 2. 1/18/16 https://issues.apache.org/jira/browse/CASSANDRA-10829
> Added unmarkCompacting into the loop. ```for (SSTableReader sstable : 
> repairedSSTables)```
> I think the effect of the changes above might cause markCompactedSSTablesReplaced 
> to fail on this assertion in DataTracker.java:
> {noformat}
>assert newSSTables.size() + newShadowed.size() == newSSTablesSize :
> String.format("Expecting new size of %d, got %d while 
> replacing %s by %s in %s",
>   newSSTablesSize, newSSTables.size() + 
> newShadowed.size(), oldSSTables, replacements, this);
> {noformat}
> Since CASSANDRA-10831 moved that call out, this AssertionError won't be caught, 
> leaving the old sstables not removed. (This might then cause a row-out-of-order 
> error when doing incremental repair if there are un-repaired L1 sstables.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-11548) Anticompaction not removing old sstables

2016-04-11 Thread Ruoran Wang (JIRA)
Ruoran Wang created CASSANDRA-11548:
---

 Summary: Anticompaction not removing old sstables
 Key: CASSANDRA-11548
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11548
 Project: Cassandra
  Issue Type: Bug
 Environment: 2.1.13
Reporter: Ruoran Wang


1. 12/29/15 https://issues.apache.org/jira/browse/CASSANDRA-10831
Moved markCompactedSSTablesReplaced out of the loop ```for (SSTableReader 
sstable : repairedSSTables)```

2. 1/18/16 https://issues.apache.org/jira/browse/CASSANDRA-10829
Added unmarkCompacting into the loop. ```for (SSTableReader sstable : 
repairedSSTables)```

I think the effect of the changes above might cause markCompactedSSTablesReplaced 
to fail on this assertion in DataTracker.java:

```
assert newSSTables.size() + newShadowed.size() == newSSTablesSize :
    String.format("Expecting new size of %d, got %d while replacing %s by %s in %s",
                  newSSTablesSize, newSSTables.size() + newShadowed.size(), oldSSTables, replacements, this);
```

Since CASSANDRA-10831 moved that call out, this AssertionError won't be caught, 
leaving the old sstables not removed. (This might then cause a row-out-of-order 
error when doing incremental repair if there are un-repaired L1 sstables.)
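
A self-contained toy model of that size invariant (illustration only, not 
Cassandra code): if one of the "old" sstables is already missing from the live 
set by the time the replacement runs (possible when it is unmarked compacting 
before markCompactedSSTablesReplaced, as described above), the expected-size 
arithmetic no longer matches and the assert fires.
{noformat}
import java.util.HashSet;
import java.util.Set;

// Toy model of the size check in DataTracker.View#replace (all names are made up).
public class ReplaceInvariantDemo
{
    static Set<String> replace(Set<String> live, Set<String> oldTables, Set<String> replacements)
    {
        Set<String> next = new HashSet<String>(live);
        next.removeAll(oldTables);
        next.addAll(replacements);

        // Same arithmetic as the assert quoted above:
        // expected new size = current size - |oldTables| + |replacements|
        int expected = live.size() - oldTables.size() + replacements.size();
        assert next.size() == expected :
            String.format("Expecting new size of %d, got %d while replacing %s by %s",
                          expected, next.size(), oldTables, replacements);
        return next;
    }

    public static void main(String[] args)
    {
        Set<String> live = new HashSet<String>();
        live.add("sstable-1");
        live.add("sstable-2");

        Set<String> oldTables = new HashSet<String>();
        oldTables.add("sstable-1");
        oldTables.add("sstable-3"); // not live any more, e.g. compacted away by another
                                    // thread after being unmarked compacting too early

        Set<String> replacements = new HashSet<String>();
        replacements.add("sstable-1-anticompacted");

        // Run with 'java -ea ReplaceInvariantDemo': throws
        // java.lang.AssertionError: Expecting new size of 1, got 2 ...
        replace(live, oldTables, replacements);
    }
}
{noformat}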



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-31 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220322#comment-15220322
 ] 

Ruoran Wang commented on CASSANDRA-9935:


That's not an ideal fix. I noticed that in LeveledCompactionStrategy getScanners, 
it uses SSTableScanner for L0. However, the sstables that have the issue are at 
L1; I need to figure out why that happens.

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range 
> (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range 
> (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range 
> (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range 
> (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range 
> (8166489034383821955,8168408930184216281] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range 
> (6084602890817326921,6088328703025510057] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range 
> (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above I see (at least two times in attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - 
> Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range 
> (5765414319217852786,5781018794516851576] failed with error 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, 
> (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, 
> (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> at java.util.concurrent.FutureTask.report(FutureTask.java:122) 
> [na:1.7.0_80]
> at 

[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-30 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219158#comment-15219158
 ] 

Ruoran Wang edited comment on CASSANDRA-9935 at 3/31/16 1:22 AM:
-

[~pauloricardomg] I am able to download the sstables to my local machine and 
step through the code (I'm using 2.1.13). Here are some things I found interesting:
- Whenever the row-key-out-of-order error shows up, I can find two sstables, 
say A and B, where B is a subset of A. The average cell size is 93.
- When stepping through the code, I found the unrepairedScanners in 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy#getScanners are 
of type LeveledScanner. 
- I wonder why unrepaired in WrappingCompactionStrategy is set the same way as 
repaired 
(org.apache.cassandra.db.compaction.WrappingCompactionStrategy#setStrategy), 
and there are assert statements checking 'assert 
repaired.getClass().equals(unrepaired.getClass())'. From the documentation for 
incremental repair, my understanding is that unrepaired sstables should be using 
SizeTieredCompactionStrategy.

I tried the following fix locally and it worked; gonna test it on prod 
machines. I would appreciate some help here to make sure my theory is not off 
track.

{noformat}
diff --git a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
index 77ca404..498a939 100644
--- a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
@@ -261,13 +261,18 @@ public abstract class AbstractCompactionStrategy
         });
     }
 
+    public ScannerList getScanners(Collection<SSTableReader> sstables, Range<Token> range)
+    {
+        return getDefaultScanners(sstables, range);
+    }
+
     /**
      * Returns a list of KeyScanners given sstables and a range on which to scan.
      * The default implementation simply grab one SSTableScanner per-sstable, but overriding this method
      * allow for a more memory efficient solution if we know the sstable don't overlap (see
      * LeveledCompactionStrategy for instance).
      */
-    public ScannerList getScanners(Collection<SSTableReader> sstables, Range<Token> range)
+    public ScannerList getDefaultScanners(Collection<SSTableReader> sstables, Range<Token> range)
     {
         RateLimiter limiter = CompactionManager.instance.getRateLimiter();
         ArrayList<ISSTableScanner> scanners = new ArrayList<ISSTableScanner>();
diff --git a/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java
index 71a6bc1..f398067 100644
--- a/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java
@@ -404,7 +404,7 @@ public final class WrappingCompactionStrategy extends AbstractCompactionStrategy
             else
                 unrepairedSSTables.add(sstable);
         ScannerList repairedScanners = repaired.getScanners(repairedSSTables, range);
-        ScannerList unrepairedScanners = unrepaired.getScanners(unrepairedSSTables, range);
+        ScannerList unrepairedScanners = unrepaired.getDefaultScanners(unrepairedSSTables, range);
         List<ISSTableScanner> scanners = new ArrayList<>(repairedScanners.scanners.size() + unrepairedScanners.scanners.size());
         scanners.addAll(repairedScanners.scanners);
         scanners.addAll(unrepairedScanners.scanners);
{noformat}
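
To make the overlap assumption concrete, here is a minimal sketch of the 
difference between the two scanner paths (the classes below are hypothetical 
stand-ins, not the real Cassandra types): the default path wraps every sstable 
in its own scanner and assumes nothing about overlap, while the leveled path 
merges sstables per level and is only correct if those sstables really don't 
overlap, which is not guaranteed for the unrepaired set right after 
anticompaction.

{noformat}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the real Cassandra classes.
public class ScannerSketch
{
    interface SSTable { int level(); }
    interface Scanner {}

    // One scanner per sstable; no assumption about key ranges.
    static class PerSSTableScanner implements Scanner
    {
        final SSTable sstable;
        PerSSTableScanner(SSTable sstable) { this.sstable = sstable; }
    }

    // Only yields rows in key order if the merged sstables do not overlap.
    static class MergedLevelScanner implements Scanner
    {
        final List<SSTable> assumedNonOverlapping;
        MergedLevelScanner(List<SSTable> sstables) { this.assumedNonOverlapping = sstables; }
    }

    // "Default" behaviour: one scanner per sstable.
    static List<Scanner> defaultScanners(Collection<SSTable> sstables)
    {
        List<Scanner> scanners = new ArrayList<>();
        for (SSTable s : sstables)
            scanners.add(new PerSSTableScanner(s));
        return scanners;
    }

    // "Leveled" behaviour: L0 is scanned per sstable, higher levels are merged
    // on the assumption that sstables within a level do not overlap.
    static List<Scanner> leveledScanners(Collection<SSTable> sstables)
    {
        List<Scanner> scanners = new ArrayList<>();
        List<SSTable> merged = new ArrayList<>();
        for (SSTable s : sstables)
        {
            if (s.level() == 0)
                scanners.add(new PerSSTableScanner(s));
            else
                merged.add(s);
        }
        if (!merged.isEmpty())
            scanners.add(new MergedLevelScanner(merged));
        return scanners;
    }
}
{noformat}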


was (Author: ruoranwang):
[~pauloricardomg] I am able to download the sstables to my local machine and 
step through the code. Here are things I found interesting, 
- Whenever the row key out of order error shows up, I can find two sstables, 
say A and B, where B is the subset of A. The average cell size is 93.
- when stepping through the code, I found the unrepairedScanners in 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy#getScanners are 
of type LeveledScanner. 
- I wonder why unrepaired in WrappingCompactionStrategy is set the same way as 
repaired 
(org.apache.cassandra.db.compaction.WrappingCompactionStrategy#setStrategy), 
and there are assert statements checking  assert 
repaired.getClass().equals(unrepaired.getClass()). From the documentation for 
incremental, my understanding is that unrepaired sstables should be using 
SizeTieredCompactionStrategy.

I tried the following fix locally and it worked, gonna test it on prod 
machines. I would appreciated some help here to make sure my theory is not off 
the track.

{noformat}
diff --git 
a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java 
b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
index 77ca404..498a939 100644
--- 
a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
+++ 

[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-30 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219158#comment-15219158
 ] 

Ruoran Wang commented on CASSANDRA-9935:


[~pauloricardomg] I was able to download the sstables to my local machine and 
step through the code. Here are the things I found interesting:
- Whenever the row key out of order error shows up, I can find two sstables, 
say A and B, where B is a subset of A. The average cell size is 93.
- When stepping through the code, I found that the unrepairedScanners in 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy#getScanners are 
of type LeveledScanner.
- I wonder why unrepaired in WrappingCompactionStrategy is set the same way as 
repaired (org.apache.cassandra.db.compaction.WrappingCompactionStrategy#setStrategy), 
and there is an assert statement checking repaired.getClass().equals(unrepaired.getClass()). 
From the documentation for incremental repair, my understanding is that 
unrepaired sstables should be using SizeTieredCompactionStrategy.

I tried the following fix locally and it worked; I'm going to test it on prod 
machines. I would appreciate some help here to make sure my theory is not off 
track.

```
diff --git a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
index 77ca404..498a939 100644
--- a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
@@ -261,13 +261,18 @@ public abstract class AbstractCompactionStrategy
         });
     }
 
+    public ScannerList getScanners(Collection<SSTableReader> sstables, Range<Token> range)
+    {
+        return getDefaultScanners(sstables, range);
+    }
+
     /**
      * Returns a list of KeyScanners given sstables and a range on which to scan.
      * The default implementation simply grab one SSTableScanner per-sstable, but overriding this method
      * allow for a more memory efficient solution if we know the sstable don't overlap (see
      * LeveledCompactionStrategy for instance).
      */
-    public ScannerList getScanners(Collection<SSTableReader> sstables, Range<Token> range)
+    public ScannerList getDefaultScanners(Collection<SSTableReader> sstables, Range<Token> range)
     {
         RateLimiter limiter = CompactionManager.instance.getRateLimiter();
         ArrayList<ISSTableScanner> scanners = new ArrayList<ISSTableScanner>();
diff --git a/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java
index 71a6bc1..f398067 100644
--- a/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java
@@ -404,7 +404,7 @@ public final class WrappingCompactionStrategy extends AbstractCompactionStrategy
             else
                 unrepairedSSTables.add(sstable);
         ScannerList repairedScanners = repaired.getScanners(repairedSSTables, range);
-        ScannerList unrepairedScanners = unrepaired.getScanners(unrepairedSSTables, range);
+        ScannerList unrepairedScanners = unrepaired.getDefaultScanners(unrepairedSSTables, range);
         List<ISSTableScanner> scanners = new ArrayList<>(repairedScanners.scanners.size() + unrepairedScanners.scanners.size());
         scanners.addAll(repairedScanners.scanners);
         scanners.addAll(unrepairedScanners.scanners);
```

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range 

[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-30 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219158#comment-15219158
 ] 

Ruoran Wang edited comment on CASSANDRA-9935 at 3/31/16 1:21 AM:
-

[~pauloricardomg] I was able to download the sstables to my local machine and 
step through the code. Here are the things I found interesting:
- Whenever the row key out of order error shows up, I can find two sstables, 
say A and B, where B is a subset of A. The average cell size is 93.
- When stepping through the code, I found that the unrepairedScanners in 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy#getScanners are 
of type LeveledScanner.
- I wonder why unrepaired in WrappingCompactionStrategy is set the same way as 
repaired (org.apache.cassandra.db.compaction.WrappingCompactionStrategy#setStrategy), 
and there is an assert statement checking repaired.getClass().equals(unrepaired.getClass()). 
From the documentation for incremental repair, my understanding is that 
unrepaired sstables should be using SizeTieredCompactionStrategy.

I tried the following fix locally and it worked; I'm going to test it on prod 
machines. I would appreciate some help here to make sure my theory is not off 
track.

{noformat}
diff --git a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
index 77ca404..498a939 100644
--- a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
@@ -261,13 +261,18 @@ public abstract class AbstractCompactionStrategy
         });
     }
 
+    public ScannerList getScanners(Collection<SSTableReader> sstables, Range<Token> range)
+    {
+        return getDefaultScanners(sstables, range);
+    }
+
     /**
      * Returns a list of KeyScanners given sstables and a range on which to scan.
      * The default implementation simply grab one SSTableScanner per-sstable, but overriding this method
      * allow for a more memory efficient solution if we know the sstable don't overlap (see
      * LeveledCompactionStrategy for instance).
      */
-    public ScannerList getScanners(Collection<SSTableReader> sstables, Range<Token> range)
+    public ScannerList getDefaultScanners(Collection<SSTableReader> sstables, Range<Token> range)
     {
         RateLimiter limiter = CompactionManager.instance.getRateLimiter();
         ArrayList<ISSTableScanner> scanners = new ArrayList<ISSTableScanner>();
diff --git a/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java
index 71a6bc1..f398067 100644
--- a/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/WrappingCompactionStrategy.java
@@ -404,7 +404,7 @@ public final class WrappingCompactionStrategy extends AbstractCompactionStrategy
             else
                 unrepairedSSTables.add(sstable);
         ScannerList repairedScanners = repaired.getScanners(repairedSSTables, range);
-        ScannerList unrepairedScanners = unrepaired.getScanners(unrepairedSSTables, range);
+        ScannerList unrepairedScanners = unrepaired.getDefaultScanners(unrepairedSSTables, range);
         List<ISSTableScanner> scanners = new ArrayList<>(repairedScanners.scanners.size() + unrepairedScanners.scanners.size());
         scanners.addAll(repairedScanners.scanners);
         scanners.addAll(unrepairedScanners.scanners);
{noformat}


was (Author: ruoranwang):
[~pauloricardomg] I am able to download the sstables to my local machine and 
step through the code. Here are things I found interesting, 
- Whenever the row key out of order error shows up, I can find two sstables, 
say A and B, where B is the subset of A. The average cell size is 93.
- when stepping through the code, I found the unrepairedScanners in 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy#getScanners are 
of type LeveledScanner. 
- I wonder why unrepaired in WrappingCompactionStrategy is set the same way as 
repaired 
(org.apache.cassandra.db.compaction.WrappingCompactionStrategy#setStrategy), 
and there are assert statements checking  assert 
repaired.getClass().equals(unrepaired.getClass()). From the documentation for 
incremental, my understanding is that unrepaired sstables should be using 
SizeTieredCompactionStrategy.

I tried the following fix locally and it worked, gonna test it on prod 
machines. I would appreciated some help here to make sure my theory is not off 
the track.

```
diff --git 
a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java 
b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
index 77ca404..498a939 100644
--- 
a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
+++ 

[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-23 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209712#comment-15209712
 ] 

Ruoran Wang commented on CASSANDRA-9935:


Using your nodetool getsstables --hex-format, I found that all those hex keys 
show up in two sstables.
I checked those two sstables and found that two entries in each sstable are 
exactly the same.

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range 
> (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range 
> (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range 
> (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range 
> (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range 
> (8166489034383821955,8168408930184216281] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range 
> (6084602890817326921,6088328703025510057] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range 
> (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above I see (at least two times in attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - 
> Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range 
> (5765414319217852786,5781018794516851576] failed with error 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, 
> (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, 
> (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> at java.util.concurrent.FutureTask.report(FutureTask.java:122) 
> [na:1.7.0_80]
> at 

[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-22 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207380#comment-15207380
 ] 

Ruoran Wang commented on CASSANDRA-9935:


Yes, I am able to reproduce it with a new keyspace.
{noformat}
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '3'}  AND durable_writes = true;

CREATE TABLE test.ui_by_modification (
bucket int,
modified_hour timestamp,
user_id bigint,
challenge_id uuid,
created timestamp,
creator_user_id bigint,
type int,
PRIMARY KEY ((bucket, modified_hour), user_id, challenge_id)
) WITH CLUSTERING ORDER BY (user_id ASC, challenge_id ASC)
AND bloom_filter_fp_chance = 0.1
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 604800
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
{noformat}

Then I am generating data using
{noformat}
Long creatorId = (long) random.nextInt(1);

UUID uuid = UUID_GENERATOR.generate();
int type = random.nextInt(10);
getIdCache().put(creatorId, uuid);

Date date = DateTime.now(DateTimeZone.UTC).toDate();

try {
    runQuery(
        "insert into test.ui_by_modification(bucket, modified_hour, user_id, challenge_id, created, creator_user_id, type) VALUES (?, ?, ?, ?, ?, ?, ?)",
        new Random().nextInt(1024), date, creatorId, UUID_GENERATOR.generate(), date, creatorId, type
    );
} catch (Exception e) {
    log.error("error", e);
}
{noformat}

I insert ~200 rows per second. Then I start the first round of incremental 
repairs, repair -pr -par --in-local-dc -inc -- test, on the 6 nodes in the 
cluster. I then waited ~1.5 hours and ran the same incremental repair again, 
and I got the same error.
I think there is a correlation between the composite partition key and this 
error.
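
To make the "composite partition key" point concrete, here is a small sketch of 
how a (bucket, modified_hour) key would be laid out on disk, assuming the 
standard CompositeType encoding of <2-byte length><component bytes><0x00> per 
component (the helper below is hypothetical, written just for illustration). 
When two different keys end up with the same token, ordering falls back to 
comparing exactly these raw bytes.

{noformat}
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only.
public class CompositeKeySketch
{
    // Serializes a (bucket int, modified_hour timestamp) composite partition key
    // as <2-byte length><component bytes><0x00> for each component.
    static ByteBuffer compositeKey(int bucket, long modifiedHourMillis)
    {
        ByteBuffer out = ByteBuffer.allocate(2 + 4 + 1 + 2 + 8 + 1);
        out.putShort((short) 4).putInt(bucket).put((byte) 0);              // int component
        out.putShort((short) 8).putLong(modifiedHourMillis).put((byte) 0); // timestamp component
        out.flip();
        return out;
    }
}
{noformat}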

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range 
> (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair 

[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-20 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198847#comment-15198847
 ] 

Ruoran Wang commented on CASSANDRA-9935:


I cleaned the data in the cluster, then generated some data, and the error was 
reproduced again.

Here is part of those out-of-order errors. The first part of DecoratedKey is 
the token. In org.apache.cassandra.repair.Validator#add there is an assertion 
{noformat}assert lastKey == null || lastKey.compareTo(row.key) < 0 : "row " + row.key + " received out of order wrt " + lastKey;{noformat}, 
which makes sure that, if lastKey is not null, lastKey is smaller than the 
current key. The compare method compares tokens first (and only if the tokens 
of the two keys are equal does it compare the key bytes), and all the following 
failures (including the other failures posted above) are caused by 
lastKey.token >= currentKey.token. Not sure why that's happening.

{noformat}
java.lang.AssertionError: row DecoratedKey(-8369102073622366180, 
000400010801538239650100) received out of order wrt 
DecoratedKey(-8357216522748296009, 000408015382acbe8d00) column 
statsnull
java.lang.AssertionError: row DecoratedKey(-6257362846602517264, 
000408015382949a0500) received out of order wrt 
DecoratedKey(-6236290075537674781, 0004000108015382a27e9600) column 
statsnull
java.lang.AssertionError: row DecoratedKey(2478458424628257677, 
000400010801538271539a00) received out of order wrt 
DecoratedKey(2490779404447159202, 0004000108015382662a5000) column 
statsnull
java.lang.AssertionError: row DecoratedKey(8880802316577320376, 
0004000108015382821cf300) received out of order wrt 
DecoratedKey(8881355423529151128, 00040001080153829533b900) column 
statsnull
java.lang.AssertionError: row DecoratedKey(-1344138391208803679, 
00040001080153828d23dc00) received out of order wrt 
DecoratedKey(-1339348872117800450, 00040001080153829e30ea00) column 
statsnull
java.lang.AssertionError: row DecoratedKey(-3057182277536415874, 
00040801538286c44600) received out of order wrt 
DecoratedKey(-3053575924537508805, 00040801538294cb5a00) column 
statsnull
java.lang.AssertionError: row DecoratedKey(5646261254263909895, 
0004080153824e3a4f00) received out of order wrt 
DecoratedKey(5658365860829244661, 0004080153827dd3a600) column 
statsnull
{noformat}
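
As a reminder of the ordering this assertion relies on, here is a minimal 
sketch (hypothetical types, not the real DecoratedKey) of the comparison 
described above: the token is compared first, and the raw key bytes only break 
ties when two keys share the same token.

{noformat}
// Hypothetical stand-in for the key ordering described above.
public class KeyOrderSketch
{
    static final class Key
    {
        final long token;   // e.g. a Murmur3 token
        final byte[] bytes; // the raw partition key bytes

        Key(long token, byte[] bytes) { this.token = token; this.bytes = bytes; }

        int compareTo(Key other)
        {
            int byToken = Long.compare(token, other.token);
            if (byToken != 0)
                return byToken;                         // tokens differ: order by token
            return compareUnsigned(bytes, other.bytes); // tie-break on key bytes
        }
    }

    // Lexicographic, unsigned byte comparison.
    static int compareUnsigned(byte[] a, byte[] b)
    {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++)
        {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0)
                return cmp;
        }
        return Integer.compare(a.length, b.length);
    }
}
{noformat}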

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 

[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-19 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198847#comment-15198847
 ] 

Ruoran Wang edited comment on CASSANDRA-9935 at 3/17/16 6:56 AM:
-

I cleaned the data in the cluster, then generated some data, and the error was 
reproduced again.

Here is part of those out-of-order errors. The first part of DecoratedKey is 
the token. In org.apache.cassandra.repair.Validator#add there is an assertion 
{noformat}assert lastKey == null || lastKey.compareTo(row.key) < 0 : "row " + row.key + " received out of order wrt " + lastKey;{noformat}, 
which makes sure that, if lastKey is not null, lastKey is smaller than the 
current key. The compare method compares tokens first (and only if the tokens 
of the two keys are equal does it compare the key bytes), and all the following 
failures (including the other failures posted above) are caused by 
lastKey.token >= currentKey.token. Not sure why that's happening.

{noformat}
java.lang.AssertionError: row DecoratedKey(-8369102073622366180, 
000400010801538239650100) received out of order wrt 
DecoratedKey(-8357216522748296009, 000408015382acbe8d00) column 
statsnull
java.lang.AssertionError: row DecoratedKey(-6257362846602517264, 
000408015382949a0500) received out of order wrt 
DecoratedKey(-6236290075537674781, 0004000108015382a27e9600) column 
statsnull
java.lang.AssertionError: row DecoratedKey(2478458424628257677, 
000400010801538271539a00) received out of order wrt 
DecoratedKey(2490779404447159202, 0004000108015382662a5000) column 
statsnull
java.lang.AssertionError: row DecoratedKey(8880802316577320376, 
0004000108015382821cf300) received out of order wrt 
DecoratedKey(8881355423529151128, 00040001080153829533b900) column 
statsnull
java.lang.AssertionError: row DecoratedKey(-1344138391208803679, 
00040001080153828d23dc00) received out of order wrt 
DecoratedKey(-1339348872117800450, 00040001080153829e30ea00) column 
statsnull
java.lang.AssertionError: row DecoratedKey(-3057182277536415874, 
00040801538286c44600) received out of order wrt 
DecoratedKey(-3053575924537508805, 00040801538294cb5a00) column 
statsnull
java.lang.AssertionError: row DecoratedKey(5646261254263909895, 
0004080153824e3a4f00) received out of order wrt 
DecoratedKey(5658365860829244661, 0004080153827dd3a600) column 
statsnull
{noformat}

Another piece of information that might be useful: the partition key for that 
column family can collide; I'm not sure if that's the cause.

PRIMARY KEY ((bucket, modified_minute), id)



was (Author: ruoranwang):
I cleaned the data in the cluseter, then generated some data and the error is 
reproduced again.

Here are part of those out of order errors. The first part of DecoratedKey is 
the Toke. In org.apache.cassandra.repair.Validator#add, there is an assertion 
{noformat}  assert lastKey == null || lastKey.compareTo(row.key) < 0 : "row " + 
row.key + " received out of order wrt " + lastKey; {noformat}, which is trying 
to make sure if the lastKey is not null lastKey should be smaller than 
currentkey. The compare method will try to compare token (if tokens of those 
two keys are equal, compare the byte), and all those following failures 
(including other failures posted above) are caused by lastKey.token >= 
currentKey.token. Not sure why that's happening.

{noformat}
java.lang.AssertionError: row DecoratedKey(-8369102073622366180, 
000400010801538239650100) received out of order wrt 
DecoratedKey(-8357216522748296009, 000408015382acbe8d00) column 
statsnull
java.lang.AssertionError: row DecoratedKey(-6257362846602517264, 
000408015382949a0500) received out of order wrt 
DecoratedKey(-6236290075537674781, 0004000108015382a27e9600) column 
statsnull
java.lang.AssertionError: row DecoratedKey(2478458424628257677, 
000400010801538271539a00) received out of order wrt 
DecoratedKey(2490779404447159202, 0004000108015382662a5000) column 
statsnull
java.lang.AssertionError: row DecoratedKey(8880802316577320376, 
0004000108015382821cf300) received out of order wrt 
DecoratedKey(8881355423529151128, 00040001080153829533b900) column 
statsnull
java.lang.AssertionError: row DecoratedKey(-1344138391208803679, 
00040001080153828d23dc00) received out of order wrt 
DecoratedKey(-1339348872117800450, 00040001080153829e30ea00) column 
statsnull
java.lang.AssertionError: row DecoratedKey(-3057182277536415874, 
00040801538286c44600) received out of order wrt 
DecoratedKey(-3053575924537508805, 00040801538294cb5a00) column 
statsnull
java.lang.AssertionError: row DecoratedKey(5646261254263909895, 

[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-19 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15198666#comment-15198666
 ] 

Ruoran Wang commented on CASSANDRA-9935:


I stopped Cassandra, then did the offline scrub. It's not fixed yet. 

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range 
> (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range 
> (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range 
> (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range 
> (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range 
> (8166489034383821955,8168408930184216281] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range 
> (6084602890817326921,6088328703025510057] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range 
> (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above I see (at least two times in attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - 
> Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range 
> (5765414319217852786,5781018794516851576] failed with error 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, 
> (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, 
> (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> at java.util.concurrent.FutureTask.report(FutureTask.java:122) 
> [na:1.7.0_80]
> at java.util.concurrent.FutureTask.get(FutureTask.java:188) 
> [na:1.7.0_80]
> at 
> 

[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-16 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196949#comment-15196949
 ] 

Ruoran Wang edited comment on CASSANDRA-9935 at 3/16/16 8:08 AM:
-

I did an offline scrub but that doesn't seem to help; the error showed up again.

Btw, here is another observation: the second of two consecutive repairs worked 
on one of the failing partition ranges (-st 5646258101641427476 -et 
5658366818450316790). No scrub was applied in between, but there was a 
Cassandra restart on the node printing the key out-of-order message.

{noformat}
$ nodetool repair -pr -par -inc -st 5646258101641427476  -et 
5658366818450316790 -- KEYSPACE COLUM_FAMILY
[2016-03-16 06:57:54,519] Starting repair command #1, repairing 1 ranges for 
keyspace KEYSPACE (parallelism=PARALLEL, full=false)
[2016-03-16 06:57:56,101] Repair session 685850f0-eb44-11e5-88ab-ffeee0307673 
for range (5646258101641427476,5658366818450316790] failed with error 
org.apache.cassandra.exceptions.RepairException: [repair 
#685850f0-eb44-11e5-88ab-ffeee0307673 on KEYSPACE/COLUM_FAMILY, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.57.198.217
[2016-03-16 06:57:56,110] Repair command #1 finished
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:294)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)
{noformat}

{noformat}
$ nodetool repair -pr -par -inc -st 5646258101641427476  -et 
5658366818450316790 -- KEYSPACE COLUM_FAMILY
[2016-03-16 07:06:16,557] Starting repair command #2, repairing 1 ranges for 
keyspace KEYSPACE (parallelism=PARALLEL, full=false)
[2016-03-16 07:06:20,879] Repair session 9393b5b0-eb45-11e5-88ab-ffeee0307673 
for range (5646258101641427476,5658366818450316790] finished
[2016-03-16 07:08:32,581] Repair command #2 finished
{noformat}


was (Author: ruoranwang):
I did a offline scrub but that doesn't seem to help, the error showed up again.
But the second of two consecutive repairs worked on one of the failing 
partition ranges (-st 5646258101641427476  -et 5658366818450316790). No scrub 
applied in between.

{noformat}
$ nodetool repair -pr -par -inc -st 5646258101641427476  -et 
5658366818450316790 -- KEYSPACE COLUM_FAMILY
[2016-03-16 06:57:54,519] Starting repair command #1, repairing 1 ranges for 
keyspace KEYSPACE (parallelism=PARALLEL, full=false)
[2016-03-16 06:57:56,101] Repair session 685850f0-eb44-11e5-88ab-ffeee0307673 
for range (5646258101641427476,5658366818450316790] failed with error 
org.apache.cassandra.exceptions.RepairException: [repair 
#685850f0-eb44-11e5-88ab-ffeee0307673 on KEYSPACE/COLUM_FAMILY, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.57.198.217
[2016-03-16 06:57:56,110] Repair command #1 finished
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:294)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)
{noformat}

{noformat}
$ nodetool repair -pr -par -inc -st 5646258101641427476  -et 
5658366818450316790 -- KEYSPACE COLUM_FAMILY
[2016-03-16 07:06:16,557] Starting repair command #2, repairing 1 ranges for 
keyspace KEYSPACE (parallelism=PARALLEL, full=false)
[2016-03-16 07:06:20,879] Repair session 9393b5b0-eb45-11e5-88ab-ffeee0307673 
for range (5646258101641427476,5658366818450316790] finished
[2016-03-16 07:08:32,581] Repair command #2 finished
{noformat}

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] 

[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-16 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196949#comment-15196949
 ] 

Ruoran Wang commented on CASSANDRA-9935:


I did an offline scrub but that doesn't seem to help; the error showed up again.
But the second of two consecutive repairs worked on one of the failing 
partition ranges (-st 5646258101641427476 -et 5658366818450316790). No scrub 
was applied in between.

{noformat}
$ nodetool repair -pr -par -inc -st 5646258101641427476  -et 
5658366818450316790 -- KEYSPACE COLUM_FAMILY
[2016-03-16 06:57:54,519] Starting repair command #1, repairing 1 ranges for 
keyspace KEYSPACE (parallelism=PARALLEL, full=false)
[2016-03-16 06:57:56,101] Repair session 685850f0-eb44-11e5-88ab-ffeee0307673 
for range (5646258101641427476,5658366818450316790] failed with error 
org.apache.cassandra.exceptions.RepairException: [repair 
#685850f0-eb44-11e5-88ab-ffeee0307673 on KEYSPACE/COLUM_FAMILY, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.57.198.217
[2016-03-16 06:57:56,110] Repair command #1 finished
error: nodetool failed, check server logs
-- StackTrace --
java.lang.RuntimeException: nodetool failed, check server logs
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:294)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)
{noformat}

{noformat}
$ nodetool repair -pr -par -inc -st 5646258101641427476  -et 
5658366818450316790 -- KEYSPACE COLUM_FAMILY
[2016-03-16 07:06:16,557] Starting repair command #2, repairing 1 ranges for 
keyspace KEYSPACE (parallelism=PARALLEL, full=false)
[2016-03-16 07:06:20,879] Repair session 9393b5b0-eb45-11e5-88ab-ffeee0307673 
for range (5646258101641427476,5658366818450316790] finished
[2016-03-16 07:08:32,581] Repair command #2 finished
{noformat}

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range 
> (806371695398849,8065203836608925992] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range 
> (-5474076923322749342,-5468600594078911162] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range 
> (-8631877858109464676,-8624040066373718932] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - 
> Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range 
> (-5372806541854279315,-5369354119480076785] finished
> INFO  [Thread-173887] 2015-07-29 

[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-14 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194696#comment-15194696
 ] 

Ruoran Wang edited comment on CASSANDRA-9935 at 3/15/16 3:51 AM:
-

[~pauloricardomg] Here is the most recent error message, followed by the 
sstables involved and their metadata.

{noformat}
ERROR [ValidationExecutor:8] 2016-03-15 03:19:25,473 Validator.java:245 - 
Failed creating a merkle tree for [repair #b82c4cf0-ea5c-11e5-8b54-71e192c0496a 
on KEYSPACE/COLUM_FAMILY, (8825693858844788422,8825705737822637605]], 
/10.57.198.67 (see log for details)
ERROR [ValidationExecutor:8] 2016-03-15 03:19:25,474 CassandraDaemon.java:229 - 
Exception in thread Thread[ValidationExecutor:8,1,main]
java.lang.AssertionError: row DecoratedKey(8825694477039867191, 
000403b708015363e13ed200) received out of order wrt 
DecoratedKey(8825705587125016582, 0004004208015363141ed900)
at org.apache.cassandra.repair.Validator.add(Validator.java:126) 
~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1051)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:89)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:662)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_66]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_66]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_66]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
{noformat}

{noformat}
nodetool getsstables --hex-format -- KEYSPACE COLUM_FAMILY 
000403b708015363e13ed200
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59389-Data.db
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59225-Data.db
{noformat}


{noformat}
nodetool getsstables --hex-format -- KEYSPACE COLUM_FAMILY 
0004004208015363141ed900
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59389-Data.db
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59225-Data.db
{noformat}


{noformat}
SSTable: 
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59225
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.10
Minimum timestamp: 1457647152189000
Maximum timestamp: 1457683010045000
SSTable max local deletion time: 1458287810
Compression ratio: 0.2804368699432709
Estimated droppable tombstones: 0.1136631298580633
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1457685762291, position=384)
{noformat}


{noformat}
SSTable: 
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59389
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.10
Minimum timestamp: 1457647152172001
Maximum timestamp: 1458009746854000
SSTable max local deletion time: 1458614546
Compression ratio: 0.2809352366738701
Estimated droppable tombstones: 0.11049303066041988
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1457995474961, position=24034207)
{noformat}



was (Author: ruoranwang):
{noformat}
ERROR [ValidationExecutor:8] 2016-03-15 03:19:25,473 Validator.java:245 - 
Failed creating a merkle tree for [repair #b82c4cf0-ea5c-11e5-8b54-71e192c0496a 
on KEYSPACE/COLUM_FAMILY, (8825693858844788422,8825705737822637605]], 
/10.57.198.67 (see log for details)
ERROR [ValidationExecutor:8] 2016-03-15 03:19:25,474 CassandraDaemon.java:229 - 
Exception in thread Thread[ValidationExecutor:8,1,main]
java.lang.AssertionError: row DecoratedKey(8825694477039867191, 
000403b708015363e13ed200) received out of order wrt 
DecoratedKey(8825705587125016582, 0004004208015363141ed900)
at org.apache.cassandra.repair.Validator.add(Validator.java:126) 
~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1051)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:89)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:662)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_66]
at 

[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-14 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194696#comment-15194696
 ] 

Ruoran Wang commented on CASSANDRA-9935:


{noformat}
ERROR [ValidationExecutor:8] 2016-03-15 03:19:25,473 Validator.java:245 - 
Failed creating a merkle tree for [repair #b82c4cf0-ea5c-11e5-8b54-71e192c0496a 
on KEYSPACE/COLUM_FAMILY, (8825693858844788422,8825705737822637605]], 
/10.57.198.67 (see log for details)
ERROR [ValidationExecutor:8] 2016-03-15 03:19:25,474 CassandraDaemon.java:229 - 
Exception in thread Thread[ValidationExecutor:8,1,main]
java.lang.AssertionError: row DecoratedKey(8825694477039867191, 
000403b708015363e13ed200) received out of order wrt 
DecoratedKey(8825705587125016582, 0004004208015363141ed900)
at org.apache.cassandra.repair.Validator.add(Validator.java:126) 
~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1051)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:89)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:662)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_66]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_66]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_66]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
{noformat}

{noformat}
getsstables --hex-format -- KEYSPACE COLUM_FAMILY 
000403b708015363e13ed200
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59389-Data.db
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59225-Data.db
{noformat}


{noformat}
nodetool getsstables --hex-format -- KEYSPACE COLUM_FAMILY 
0004004208015363141ed900
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59389-Data.db
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59225-Data.db
{noformat}


{noformat}
SSTable: 
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59225
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.10
Minimum timestamp: 1457647152189000
Maximum timestamp: 1457683010045000
SSTable max local deletion time: 1458287810
Compression ratio: 0.2804368699432709
Estimated droppable tombstones: 0.1136631298580633
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1457685762291, position=384)
{noformat}


{noformat}
SSTable: 
/var/lib/cassandra/data/KEYSPACE/COLUM_FAMILY-d0500b80d14a11e5a42361571269f00d/KEYSPACE-COLUM_FAMILY-ka-59389
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.10
Minimum timestamp: 1457647152172001
Maximum timestamp: 1458009746854000
SSTable max local deletion time: 1458614546
Compression ratio: 0.2809352366738701
Estimated droppable tombstones: 0.11049303066041988
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1457995474961, position=24034207)
{noformat}


> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> 

[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-14 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194431#comment-15194431
 ] 

Ruoran Wang commented on CASSANDRA-9935:


[~pauloricardomg] thanks, that patch works. I am reproducing the error and will 
post the result when it shows up again.

Btw, I noticed those two failing column families have a high sstable count at 
level 1. The following output is the per-level sstable count for the 6 nodes we 
have (the n/limit figures are sketched below). In each group of four, the top 
two are the column families that had the issue and the bottom two are two 
normal ones. I noticed this last Friday and the level 1 count still hasn't 
dropped today. I don't see any pending compactions. (This is a performance 
testing cluster and I stopped reads and writes last Friday.)

{noformat}
SSTables in each level: [2, 20/10, 88, 0, 0, 0, 0, 0, 0]
SSTables in each level: [0, 20/10, 103/100, 90, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 39, 0, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 58, 0, 0, 0, 0, 0, 0]
 
SSTables in each level: [50/4, 20/10, 85, 0, 0, 0, 0, 0, 0]
SSTables in each level: [1, 18/10, 108/100, 81, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 35, 0, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 59, 0, 0, 0, 0, 0, 0]
 
SSTables in each level: [1, 22/10, 97, 0, 0, 0, 0, 0, 0]
SSTables in each level: [0, 18/10, 107/100, 91, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 43, 0, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 67, 0, 0, 0, 0, 0, 0]
 
SSTables in each level: [1, 20/10, 91, 0, 0, 0, 0, 0, 0]
SSTables in each level: [1, 20/10, 108/100, 102, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 37, 0, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 61, 0, 0, 0, 0, 0, 0]
 
SSTables in each level: [1, 21/10, 95, 0, 0, 0, 0, 0, 0]
SSTables in each level: [1, 18/10, 114/100, 84, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 41, 0, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 67, 0, 0, 0, 0, 0, 0]

SSTables in each level: [1, 20/10, 88, 0, 0, 0, 0, 0, 0]
SSTables in each level: [1, 20/10, 110/100, 151, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 37, 0, 0, 0, 0, 0, 0]
SSTables in each level: [2, 10, 56, 0, 0, 0, 0, 0, 0]
{noformat}
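
For reference, the n/limit figures above line up with the usual LCS sizing 
rule. A rough sketch of the per-level sstable targets, assuming the default 
fanout of 10 and treating L0 specially (this is an approximation, not the 
actual manifest code):

{noformat}
// Rough sketch of the per-level sstable targets behind the "n/limit" figures above.
public class LcsLevelSketch
{
    static long maxSSTablesForLevel(int level)
    {
        // L0 holds freshly flushed sstables and is kept small; higher levels grow by 10x.
        return level == 0 ? 4 : (long) Math.pow(10, level); // 4, 10, 100, 1000, ...
    }

    public static void main(String[] args)
    {
        for (int level = 0; level <= 3; level++)
            System.out.println("L" + level + " target: " + maxSSTablesForLevel(level));
    }
}
{noformat}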

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde 
> for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde 
> for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde 
> for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde 
> for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde 
> for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde 
> for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range 
> (-7695808664784761779,-7693529816291585568] finished
> INFO  [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - 
> Repair 

[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting

2016-03-14 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194091#comment-15194091
 ] 

Ruoran Wang commented on CASSANDRA-9625:


Here are the thread-dump results. The first one is from when the reporter was 
still working; the second is from when the reporter had stopped. A simplified 
sketch of the blocking pattern follows the second dump.

{noformat}
"metrics-graphite-reporter-thread-1" #574 daemon prio=5 os_prio=0 
tid=0x7fae39b21800 nid=0x4940 waiting on condition [0x7fa57191]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x7fa67d7972d0> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}


{noformat}
"metrics-graphite-reporter-thread-1" #555 daemon prio=5 os_prio=0 
tid=0x7fdf4e7f7800 nid=0xe43 waiting for monitor entry [0x7fd6bb86b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getEstimatedRemainingTasks(WrappingCompactionStrategy.java:162)
- waiting to lock <0x7fd72ced3e38> (a 
org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
at 
org.apache.cassandra.metrics.ColumnFamilyMetrics$13.value(ColumnFamilyMetrics.java:357)
at 
org.apache.cassandra.metrics.ColumnFamilyMetrics$13.value(ColumnFamilyMetrics.java:354)
at 
org.apache.cassandra.metrics.ColumnFamilyMetrics$33.value(ColumnFamilyMetrics.java:662)
at 
org.apache.cassandra.metrics.ColumnFamilyMetrics$33.value(ColumnFamilyMetrics.java:656)
at 
com.yammer.metrics.reporting.GraphiteReporter.processGauge(GraphiteReporter.java:304)
at 
com.yammer.metrics.reporting.GraphiteReporter.processGauge(GraphiteReporter.java:26)
at com.yammer.metrics.core.Gauge.processWith(Gauge.java:28)
at 
com.yammer.metrics.reporting.GraphiteReporter.printRegularMetrics(GraphiteReporter.java:247)
at 
com.yammer.metrics.reporting.GraphiteReporter.run(GraphiteReporter.java:213)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
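
The second dump shows the reporter blocked while evaluating a gauge that 
synchronizes on WrappingCompactionStrategy. Below is a minimal, self-contained 
sketch of that pattern (plain-Java stand-ins, not the Cassandra or Yammer metrics 
code): one thread holds the monitor for a long time while a scheduled "reporter" 
reads a value through a synchronized method on the same monitor, so the reporter 
stalls and no further samples go out.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical reproduction of the blocking pattern; all names are stand-ins.
public class BlockedReporterSketch {
    private static final Object strategyLock = new Object(); // plays WrappingCompactionStrategy

    // Plays the synchronized getEstimatedRemainingTasks() gauge.
    static int estimatedRemainingTasks() {
        synchronized (strategyLock) {
            return 0;
        }
    }

    public static void main(String[] args) {
        // A long-running "compaction" holds the monitor.
        new Thread(() -> {
            synchronized (strategyLock) {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException ignored) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "fake-compaction").start();

        // The "reporter" ticks on a schedule but blocks inside the gauge above,
        // which is the BLOCKED state seen for metrics-graphite-reporter-thread-1.
        ScheduledExecutorService reporter = Executors.newSingleThreadScheduledExecutor();
        reporter.scheduleAtFixedRate(
                () -> System.out.println("pending compactions = " + estimatedRemainingTasks()),
                1, 10, TimeUnit.SECONDS);
    }
}
{code}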


> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>Reporter: Eric Evans
>Assignee: T Jake Luciani
> Attachments: metrics.yaml, thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops 
> working.  The usual startup is logged, and one batch of samples is sent, but 
> the reporting interval comes and goes, and no other samples are ever sent.  
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment 
> on 2.1.6; We are able to reproduce this on all 6 of production nodes, but not 
> on a 3 node (otherwise identical) staging cluster (maybe it takes a certain 
> level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-10 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190304#comment-15190304
 ] 

Ruoran Wang commented on CASSANDRA-9935:


I am getting the following error when running nodetool getsstables. I also tried 
the first number in DecoratedKey(2774747040849866654, 
0004019b08015348847eb200) and got the same error. Those are independent 
tables. A small illustration of why the parse fails follows the stack trace.
{noformat}
error: For input string: "000402bf08015362933f0b00"
-- StackTrace --
java.lang.NumberFormatException: For input string: 
"000402bf08015362933f0b00"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at 
org.apache.cassandra.db.marshal.Int32Type.fromString(Int32Type.java:58)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.fromString(AbstractCompositeType.java:242)
at 
org.apache.cassandra.db.ColumnFamilyStore.getSSTablesForKey(ColumnFamilyStore.java:1980)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1471)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1312)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1404)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:832)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$95(TCPTransport.java:683)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
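
If I read the trace right, the key string is passed to the Int32 component's 
fromString, which ends up in Integer.parseInt, and the hex blob form of the key 
cannot parse as a decimal int. A trivial standalone illustration (not the actual 
Cassandra code path):

{code}
// Standalone illustration: parsing the hex blob as a decimal int fails with
// the same NumberFormatException message shown above.
public class KeyParseDemo {
    public static void main(String[] args) {
        String hexBlob = "000402bf08015362933f0b00";
        try {
            Integer.parseInt(hexBlob);
        } catch (NumberFormatException e) {
            System.out.println(e); // java.lang.NumberFormatException: For input string: "..."
        }
    }
}
{code}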

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
>  Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
>Reporter: mlowicki
>Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, 
> db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, 
> system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade 
> to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 

[jira] [Comment Edited] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-10 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188891#comment-15188891
 ] 

Ruoran Wang edited comment on CASSANDRA-9935 at 3/10/16 8:15 AM:
-

We are running 2.1.13: 1 DC, 6 nodes, LCS, replication factor 3. We've done a 
full repair on the cluster and used sstablerepairedset to mark all of those 
sstables as repaired.

However, when we run an incremental repair (nodetool repair --in-local-dc -par -pr 
-inc KEYSPACE), we got the same error log from the repairing node and the same 
DecoratedKey from the node that is sending the merkle tree to the repairing node.
We tried scrubbing the failing keyspace/column_family and restarting (first on the 
failing node, then on all nodes), but we are still occasionally getting the repair 
failures, so we haven't been able to run incremental repair on our cluster.

{noformat}
ERROR [Thread-46463] 2016-03-06 06:02:34,632 StorageService.java:3050 - Repair 
session 01e9f1b0-e361-11e5-9531-ffeee0307673 for range 
(5646258101641427476,5658366818450316790] failed with error 
org.apache.cassandra.exceptions.RepairException: [repair 
#01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
org.apache.cassandra.exceptions.RepairException: [repair 
#01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
at java.util.concurrent.FutureTask.report(FutureTask.java:122) 
[na:1.8.0_66]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) 
[na:1.8.0_66]
at 
org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:3041)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
[apache-cassandra-2.1.13.jar:2.1.13]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_66]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[na:1.8.0_66]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
Caused by: java.lang.RuntimeException: 
org.apache.cassandra.exceptions.RepairException: [repair 
#01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
at com.google.common.base.Throwables.propagate(Throwables.java:160) 
~[guava-16.0.jar:na]
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) 
[apache-cassandra-2.1.13.jar:2.1.13]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_66]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[na:1.8.0_66]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_66]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[na:1.8.0_66]
... 1 common frames omitted
Caused by: org.apache.cassandra.exceptions.RepairException: [repair 
#01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:415)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) 
~[apache-cassandra-2.1.13.jar:2.1.13]
... 3 common frames omitted
{noformat}
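
For reference, the validation failure below is the "received out of order" 
assertion in Validator.add: keys streamed into the merkle-tree validator must 
arrive in strictly increasing order. A minimal standalone sketch of that invariant 
(illustrative only, using the raw tokens from the log instead of real 
DecoratedKeys):

{code}
import java.util.Arrays;
import java.util.List;

// Illustrative only: a simplified version of the ordering check that trips in
// Validator.add, fed the two tokens from the AssertionError below.
public class OrderCheckSketch {
    public static void main(String[] args) {
        List<Long> tokens = Arrays.asList(2774747040849866654L, 2769066505137675224L);
        Long last = null;
        for (Long token : tokens) {
            if (last != null && last.compareTo(token) >= 0) {
                throw new AssertionError("row " + token + " received out of order wrt " + last);
            }
            last = token;
        }
    }
}
{code}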

{noformat}
ERROR [ValidationExecutor:205] 2016-03-07 18:47:15,009 Validator.java:245 - 
Failed creating a merkle tree for [repair #02132fa0-e495-11e5-80cd-61571269f00d 
on challenges/message_by_modification, 
(2769065886542373503,2774747608185850009]], /10.57.198.15 (see log for details)
ERROR [ValidationExecutor:205] 2016-03-07 18:47:15,011 CassandraDaemon.java:229 
- Exception in thread Thread[ValidationExecutor:205,1,main]
java.lang.AssertionError: row DecoratedKey(2769066505137675224, 
0004002e080153441a3ef000) received out of order wrt 
DecoratedKey(2774747040849866654, 0004019b08015348847eb200)
at org.apache.cassandra.repair.Validator.add(Validator.java:126) 
~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1051)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:89)
 

[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException

2016-03-10 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188891#comment-15188891
 ] 

Ruoran Wang commented on CASSANDRA-9935:


We are running 1 DC, 6 nodes, LCS, replication factor 3. We've done a full repair 
on the cluster and used sstablerepairedset to mark all of those sstables as 
repaired.

However, when we run an incremental repair (nodetool repair --in-local-dc -par -pr 
-inc KEYSPACE), we got the same error log from the repairing node and the same 
DecoratedKey from the node that is sending the merkle tree to the repairing node.
We tried scrubbing the failing keyspace/column_family and restarting (first on the 
failing node, then on all nodes), but we are still occasionally getting the repair 
failures, so we haven't been able to run incremental repair on our cluster.

{noformat}
ERROR [Thread-46463] 2016-03-06 06:02:34,632 StorageService.java:3050 - Repair 
session 01e9f1b0-e361-11e5-9531-ffeee0307673 for range 
(5646258101641427476,5658366818450316790] failed with error 
org.apache.cassandra.exceptions.RepairException: [repair 
#01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
org.apache.cassandra.exceptions.RepairException: [repair 
#01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
at java.util.concurrent.FutureTask.report(FutureTask.java:122) 
[na:1.8.0_66]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) 
[na:1.8.0_66]
at 
org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:3041)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
[apache-cassandra-2.1.13.jar:2.1.13]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_66]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[na:1.8.0_66]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
Caused by: java.lang.RuntimeException: 
org.apache.cassandra.exceptions.RepairException: [repair 
#01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
at com.google.common.base.Throwables.propagate(Throwables.java:160) 
~[guava-16.0.jar:na]
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) 
[apache-cassandra-2.1.13.jar:2.1.13]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_66]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[na:1.8.0_66]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_66]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[na:1.8.0_66]
... 1 common frames omitted
Caused by: org.apache.cassandra.exceptions.RepairException: [repair 
#01e9f1b0-e361-11e5-9531-ffeee0307673 on challenges/message, 
(5646258101641427476,5658366818450316790]] Validation failed in /10.125.218.156
at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:415)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) 
~[apache-cassandra-2.1.13.jar:2.1.13]
... 3 common frames omitted
{noformat}

{noformat}
ERROR [ValidationExecutor:205] 2016-03-07 18:47:15,009 Validator.java:245 - 
Failed creating a merkle tree for [repair #02132fa0-e495-11e5-80cd-61571269f00d 
on challenges/message_by_modification, 
(2769065886542373503,2774747608185850009]], /10.57.198.15 (see log for details)
ERROR [ValidationExecutor:205] 2016-03-07 18:47:15,011 CassandraDaemon.java:229 
- Exception in thread Thread[ValidationExecutor:205,1,main]
java.lang.AssertionError: row DecoratedKey(2769066505137675224, 
0004002e080153441a3ef000) received out of order wrt 
DecoratedKey(2774747040849866654, 0004019b08015348847eb200)
at org.apache.cassandra.repair.Validator.add(Validator.java:126) 
~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1051)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:89)
 ~[apache-cassandra-2.1.13.jar:2.1.13]
at 

[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting

2016-03-08 Thread Ruoran Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186531#comment-15186531
 ] 

Ruoran Wang commented on CASSANDRA-9625:


I just upgraded to 2.1.13 last week; then, when testing full repair, one node 
stopped reporting. Later, when testing incremental repair, two more nodes stopped 
reporting.

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>Reporter: Eric Evans
>Assignee: T Jake Luciani
> Attachments: metrics.yaml, thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops 
> working.  The usual startup is logged, and one batch of samples is sent, but 
> the reporting interval comes and goes, and no other samples are ever sent.  
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment 
> on 2.1.6; We are able to reproduce this on all 6 of production nodes, but not 
> on a 3 node (otherwise identical) staging cluster (maybe it takes a certain 
> level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)