[jira] [Commented] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-28 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645469#comment-14645469
 ] 

Marcus Eriksson commented on CASSANDRA-9914:


[~rstrickland] could you try with the patch in CASSANDRA-9662?

> Millions of fake pending compaction tasks + high CPU
> 
>
> Key: CASSANDRA-9914
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: CentOS
>Reporter: Robbie Strickland
>Assignee: Marcus Eriksson
> Fix For: 2.1.x
>
> Attachments: cass_high_cpu.png, high_pending_compactions.txt
>
>
> We have a 3-node test cluster (initially running 2.1.8) with *zero traffic* 
> and about 10GB of data on each node.  It's showing millions of pending 
> compaction tasks (but no actual work in progress), and the CPUs are pegged on 
> all three nodes.  The task count goes down rapidly, but then jumps back up 
> again seconds later.  All tables are set to STCS.  The issue persists after 
> restart, but takes a few minutes before it becomes a problem.  SSTable counts 
> are below 10 for every table.  We're also seeing 20s Old Gen GC pauses about 
> every 2-3 mins.
> This started happening after bulk loading some old data.  We started seeing 
> very long GC pauses (sometimes 30 min or more) that would bring down the 
> nodes.  We then truncated this table, which resulted in the current behavior. 
>  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
> but we observed the same behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9899) If compaction is disabled in schema, you can't enable a single node through nodetool

2015-07-28 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645509#comment-14645509
 ] 

Marcus Eriksson commented on CASSANDRA-9899:


Right, so the disabledWithJMX check is to make sure we don't enable ourselves 
if someone has disabled with nodetool and then changes something else in the 
compaction options.

If we were enabled in schema and someone disables with nodetool, we always stay 
disabled after any alter that changes compaction options. This means that if 
you have enabled: true in schema and do nodetool disable, you can't enable it 
again with a schema change setting enabled: true (as that is the value the 
schema already has). This can be fixed, of course; do you think we should always 
be able to enable with a schema change?
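
For illustration, the interaction described above boils down to something like this (the field and method names are assumptions, not the actual 2.1 code):

{code}
// disabledWithJMX records that an operator disabled compaction via nodetool/JMX,
// independently of the schema's enabled flag.
public synchronized void maybeReload(boolean enabledInSchema)
{
    if (disabledWithJMX || !enabledInSchema)
        disable();  // stays disabled after any ALTER that touches compaction options
    else
        enable();
    // The catch: if the schema already has enabled=true, re-issuing an ALTER
    // with 'enabled': 'true' is a no-op, so no schema change clears disabledWithJMX.
}
{code}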

> If compaction is disabled in schema, you can't enable a single node through 
> nodetool
> 
>
> Key: CASSANDRA-9899
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9899
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
> Fix For: 2.1.x
>
>
> If you disable compaction in the schema through alter table, and then try to 
> enable compaction on just one node with "nodetool enableautocompaction" it 
> doesn't work.  WrappingCompactionStrategy needs to pass through the enable to 
> the wrapped strategies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9914) Millions of fake pending compaction tasks + high CPU

2015-07-29 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9914:
---
Fix Version/s: (was: 2.1.x)

> Millions of fake pending compaction tasks + high CPU
> 
>
> Key: CASSANDRA-9914
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9914
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: CentOS
>Reporter: Robbie Strickland
>Assignee: Marcus Eriksson
> Attachments: cass_high_cpu.png, high_pending_compactions.txt
>
>
> We have a 3-node test cluster (initially running 2.1.8) with *zero traffic* 
> and about 10GB of data on each node.  It's showing millions of pending 
> compaction tasks (but no actual work in progress), and the CPUs are pegged on 
> all three nodes.  The task count goes down rapidly, but then jumps back up 
> again seconds later.  All tables are set to STCS.  The issue persists after 
> restart, but takes a few minutes before it becomes a problem.  SSTable counts 
> are below 10 for every table.  We're also seeing 20s Old Gen GC pauses about 
> every 2-3 mins.
> This started happening after bulk loading some old data.  We started seeing 
> very long GC pauses (sometimes 30 min or more) that would bring down the 
> nodes.  We then truncated this table, which resulted in the current behavior. 
>  We attempted to roll back our cluster to 2.1.7 patched with CASSANDRA-9637, 
> but we observed the same behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6434) Repair-aware gc grace period

2015-07-30 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647402#comment-14647402
 ] 

Marcus Eriksson commented on CASSANDRA-6434:


bq. Couldn't we use a non-optimal (but much simpler and hopefully good enough) 
solution for that? Typically, unrepaired sstables should, by design, be fairly 
recent. So what if we said that we only purge a tombstone if it's older than 
gcGrace and its localDeletionTime is older than the oldest localDeletion in 
the unrepaired sstables used by the query?

That is a much better idea, first stab at that here: 
https://github.com/krummas/cassandra/commits/marcuse/6434-trunk-2
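
For reference, the quoted rule reduces to a check along these lines (illustrative names, not the actual patch):

{code}
// A tombstone is purgeable only if it has passed gc_grace AND is older than the
// oldest localDeletionTime among the unrepaired sstables touched by the query.
static boolean canPurge(int tombstoneLocalDeletionTime,
                        int gcBefore,
                        int minLocalDeletionTimeInUnrepaired)
{
    return tombstoneLocalDeletionTime < gcBefore
        && tombstoneLocalDeletionTime < minLocalDeletionTimeInUnrepaired;
}
{code}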

> Repair-aware gc grace period 
> -
>
> Key: CASSANDRA-6434
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6434
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: sankalp kohli
>Assignee: Marcus Eriksson
> Fix For: 3.0 beta 1
>
>
> Since the reason for gcgs is to ensure that we don't purge tombstones until 
> every replica has been notified, it's redundant in a world where we're 
> tracking repair times per sstable (and repairing frequently), i.e., a world 
> where we default to incremental repair a la CASSANDRA-5351.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6434) Repair-aware gc grace period

2015-07-30 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647494#comment-14647494
 ] 

Marcus Eriksson commented on CASSANDRA-6434:


bq. the option probably belongs to compaction params, not as a standalone thing.
good point, fixed and pushed

> Repair-aware gc grace period 
> -
>
> Key: CASSANDRA-6434
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6434
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: sankalp kohli
>Assignee: Marcus Eriksson
> Fix For: 3.0 beta 1
>
>
> Since the reason for gcgs is to ensure that we don't purge tombstones until 
> every replica has been notified, it's redundant in a world where we're 
> tracking repair times per sstable (and repairing frequently), i.e., a world 
> where we default to incremental repair a la CASSANDRA-5351.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9266) Repair failed with error Already repairing SSTableReader(path='/path/to/keyspace/column_family/keyspace-column_family--Data.db'), can not continue.

2015-07-30 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647545#comment-14647545
 ] 

Marcus Eriksson commented on CASSANDRA-9266:


[~kha...@ncbi.nlm.nih.gov] any updates?

> Repair failed with error Already repairing 
> SSTableReader(path='/path/to/keyspace/column_family/keyspace-column_family--Data.db'),
>  can not continue.
> ---
>
> Key: CASSANDRA-9266
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9266
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 2.1.3
>Reporter: Razi Khaja
>Assignee: Marcus Eriksson
> Fix For: 2.1.x
>
> Attachments: cassandra_system.log.partial.cass320.CASSANDRA-9266.txt, 
> cassandra_system.log.partial.cass321.CASSANDRA-9266.txt, 
> cassandra_system.log.partial.cass322.CASSANDRA-9266.txt, 
> cassandra_system.log.partial.cass323.CASSANDRA-9266.txt
>
>
> I am running incremental parallel repair using the following command:
> {code}
> nodetool --host my_hostname repair --incremental --in-local-dc --parallel
> {code}
> and get the following error:
> {code}
> Repair failed with error Already repairing 
> SSTableReader(path='/path/to/keyspace/column_family/keyspace-column_family--Data.db'),
>  can not continue.
> {code}
> I have 3 data centers, each with 4 nodes. Neither incremental or full repair 
> is running on any of my other nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9142) DC Local repair or -hosts should only be allowed with -full repair

2015-07-30 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647548#comment-14647548
 ] 

Marcus Eriksson commented on CASSANDRA-9142:


since we missed 2.2.0, I will need to make sure we don't send the flag to nodes 
that don't support it; stand by for a new patch

> DC Local repair or -hosts should only be allowed with -full repair
> --
>
> Key: CASSANDRA-9142
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9142
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: sankalp kohli
>Assignee: Marcus Eriksson
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: trunk_9142.txt
>
>
> We should not let users mix incremental repair with dc local repair or -host 
> or any repair which does not include all replicas. 
> This will currently cause sstables on some replicas to be marked as repaired. 
> The next incremental repair will not work on the same set of data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9142) DC Local repair or -hosts should only be allowed with -full repair

2015-07-31 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648936#comment-14648936
 ] 

Marcus Eriksson commented on CASSANDRA-9142:


Pushed a new commit to 
https://github.com/krummas/cassandra/tree/marcuse/9142-2.2 which introduces a 
new message type to indicate if the repair is global. This was the simplest way 
to keep it backwards compatible. We should probably go back to using a flag in 
3.0.
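
A rough sketch of the version gating this implies (the message variables here are placeholders, not the actual patch):

{code}
// Only peers on a messaging version that knows the new type receive it; older
// nodes keep getting the legacy prepare message, which they treat as global.
if (MessagingService.instance().getVersion(endpoint) >= MessagingService.VERSION_22)
    MessagingService.instance().sendOneWay(globalPrepareMessage, endpoint);
else
    MessagingService.instance().sendOneWay(legacyPrepareMessage, endpoint);
{code}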

> DC Local repair or -hosts should only be allowed with -full repair
> --
>
> Key: CASSANDRA-9142
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9142
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: sankalp kohli
>Assignee: Marcus Eriksson
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: trunk_9142.txt
>
>
> We should not let users mix incremental repair with dc local repair or -host 
> or any repair which does not include all replicas. 
> This will currently cause sstables on some replicas to be marked as repaired. 
> The next incremental repair will not work on the same set of data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9959) Expected bloom filter size should not be an int

2015-08-03 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-9959:
--

 Summary: Expected bloom filter size should not be an int
 Key: CASSANDRA-9959
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9959
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
 Fix For: 2.0.x


We cast the expected number of rows in scrub and cleanup to an int. Seems to 
have been this way since 0.7 days. Patch here: 
https://github.com/krummas/cassandra/commits/marcuse/dontcastscrubcleanup
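
To make the failure mode concrete, a standalone illustration (not the Cassandra code itself):

{code}
public class CastDemo
{
    public static void main(String[] args)
    {
        long estimatedKeys = 3_000_000_000L; // plausible row count during scrub/cleanup
        int truncated = (int) estimatedKeys; // silently overflows
        System.out.println(truncated);       // prints -1294967296
        // A bloom filter sized from this value is useless, so the expected
        // row count needs to stay a long all the way through.
    }
}
{code}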



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9959) Expected bloom filter size should not be an int

2015-08-03 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9959:
---
Reviewer: Jason Brown  (was: Yuki Morishita)

> Expected bloom filter size should not be an int
> ---
>
> Key: CASSANDRA-9959
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9959
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 2.0.x
>
>
> We cast the expected number of rows in scrub and cleanup to an int. Seems to 
> have been this way since 0.7 days. Patch here: 
> https://github.com/krummas/cassandra/commits/marcuse/dontcastscrubcleanup



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9963) Compaction not starting for new tables

2015-08-03 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9963:
---
Attachment: 0001-dont-use-isEnabled-since-that-checks-isActive.patch

isEnabled() checks isActive, which is false when creating a new table
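
Roughly, the fix looks like this (a sketch based on the patch title; pickBackgroundTask is a placeholder for the existing logic):

{code}
public AbstractCompactionTask getNextBackgroundTask(int gcBefore)
{
    // isEnabled() is effectively "enabled && isActive", and isActive is only
    // set once the strategy has started, so a freshly created table never
    // schedules its first compaction. Checking the explicit flag avoids that:
    if (!enabled)   // instead of: if (!isEnabled())
        return null;
    return pickBackgroundTask(gcBefore);  // placeholder for the existing logic
}
{code}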

> Compaction not starting for new tables
> --
>
> Key: CASSANDRA-9963
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9963
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
> Fix For: 2.1.x
>
> Attachments: 0001-dont-use-isEnabled-since-that-checks-isActive.patch
>
>
> Something committed since 2.1.8 broke cassandra-2.1 HEAD
> {noformat}
> create keyspace test with replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> create table test.stcs ( a int PRIMARY KEY , b int);
> {noformat}
> repeat  more than 4 times:
> {noformat}
> insert into test.stcs (a, b) VALUES ( 1, 1);
> nodetool flush test stcs
> ls /test/stcs-*
> {noformat}
> See a bunch of sstables where STCS should have kicked in and compacted them 
> down some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9963) Compaction not starting for new tables

2015-08-03 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9963:
---
Reviewer: Jeremiah Jordan

> Compaction not starting for new tables
> --
>
> Key: CASSANDRA-9963
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9963
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
> Fix For: 2.1.x
>
> Attachments: 0001-dont-use-isEnabled-since-that-checks-isActive.patch
>
>
> Something committed since 2.1.8 broke cassandra-2.1 HEAD
> {noformat}
> create keyspace test with replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> create table test.stcs ( a int PRIMARY KEY , b int);
> {noformat}
> repeat  more than 4 times:
> {noformat}
> insert into test.stcs (a, b) VALUES ( 1, 1);
> nodetool flush test stcs
> ls /test/stcs-*
> {noformat}
> See a bunch of sstables where STCS should have kicked in and compacted them 
> down some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9963) Compaction not starting for new tables

2015-08-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652032#comment-14652032
 ] 

Marcus Eriksson commented on CASSANDRA-9963:


yeah, will fix

> Compaction not starting for new tables
> --
>
> Key: CASSANDRA-9963
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9963
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
> Fix For: 2.1.x
>
> Attachments: 0001-dont-use-isEnabled-since-that-checks-isActive.patch
>
>
> Something committed since 2.1.8 broke cassandra-2.1 HEAD
> {noformat}
> create keyspace test with replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> create table test.stcs ( a int PRIMARY KEY , b int);
> {noformat}
> repeat  more than 4 times:
> {noformat}
> insert into test.stcs (a, b) VALUES ( 1, 1);
> nodetool flush test stcs
> ls /test/stcs-*
> {noformat}
> See a bunch of sstables where STCS should have kicked in and compacted them 
> down some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9998) LEAK DETECTED with snapshot/sequential repairs

2015-08-06 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-9998:
--

 Summary: LEAK DETECTED with snapshot/sequential repairs
 Key: CASSANDRA-9998
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9998
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson


http://cassci.datastax.com/job/cassandra-2.1_dtest/lastCompletedBuild/testReport/repair_test/TestRepair/simple_sequential_repair_test/

does not happen if I add -par to the test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9998) LEAK DETECTED with snapshot/sequential repairs

2015-08-06 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9998:
---
Fix Version/s: 2.2.x
               2.1.x
               3.x

> LEAK DETECTED with snapshot/sequential repairs
> --
>
> Key: CASSANDRA-9998
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9998
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> http://cassci.datastax.com/job/cassandra-2.1_dtest/lastCompletedBuild/testReport/repair_test/TestRepair/simple_sequential_repair_test/
> does not happen if I add -par to the test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9965) ColumnFamilyStore.setCompactionStrategyClass() is (somewhat) broken

2015-08-06 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660682#comment-14660682
 ] 

Marcus Eriksson commented on CASSANDRA-9965:


pushed with nits fixed:
https://github.com/krummas/cassandra/commits/marcuse/9965
https://github.com/krummas/cassandra/commits/marcuse/9965-2.2
https://github.com/krummas/cassandra/commits/marcuse/9965-3.0
https://github.com/krummas/cassandra/commits/marcuse/9965-trunk

note that for 3.0+ I removed the deprecated methods and added a jmx method to 
get the globally configured compaction strategy, so that users can compare and 
see if the local one is changed.

tests (soon):
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9965-dtest/
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9965-2.2-dtest/
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9965-3.0-dtest/
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9965-trunk-dtest/

http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9965-testall/
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9965-2.2-testall/
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9965-3.0-testall/
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9965-trunk-testall/



> ColumnFamilyStore.setCompactionStrategyClass() is (somewhat) broken
> ---
>
> Key: CASSANDRA-9965
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9965
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Aleksey Yeschenko
>Assignee: Marcus Eriksson
>Priority: Minor
> Fix For: 2.1.x, 2.2.x, 3.0.0 rc1
>
>
> {{ColumnFamilyStore.setCompactionStrategyClass()}} should get the same 
> treatment wrt JMX/schema switches that {{enabled}} got in CASSANDRA-9899.
> It should also not alter the {{CFMetaData}} object directly, ever. Only DDL 
> statements should be allowed to do that.
> CASSANDRA-9712 will temporarily throw UOE for that call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10015) Create tool to debug why expired sstables are not getting dropped

2015-08-07 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-10015:
---

 Summary: Create tool to debug why expired sstables are not getting 
dropped
 Key: CASSANDRA-10015
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10015
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
 Fix For: 3.x, 2.1.x, 2.0.x, 2.2.x


Sometimes fully expired sstables are not getting dropped, and it is a real pain 
to manually find out why.

A tool that outputs which sstables block expired ones (by holding data older 
than the newest tombstone in an expired sstable) would save a lot of time.
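
The core check such a tool needs is small. A sketch of the blocking condition, assuming the usual min/max timestamp accessors on SSTableReader:

{code}
// An sstable "blocks" a fully expired sstable if it holds data older than the
// expired sstable's newest tombstone: dropping the expired sstable would then
// resurrect that older, still-shadowed data.
static boolean blocks(SSTableReader candidate, SSTableReader fullyExpired)
{
    return candidate.getMinTimestamp() <= fullyExpired.getMaxTimestamp();
}
{code}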



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10015) Create tool to debug why expired sstables are not getting dropped

2015-08-07 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662010#comment-14662010
 ] 

Marcus Eriksson commented on CASSANDRA-10015:
-

https://github.com/krummas/cassandra/commits/marcuse/check_for_expired_blockers 
has the patch; I will write up a dtest that runs the tool as well

> Create tool to debug why expired sstables are not getting dropped
> -
>
> Key: CASSANDRA-10015
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10015
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.0.x, 2.2.x
>
>
> Sometimes fully expired sstables are not getting dropped, and it is a real 
> pain to manually find out why.
> A tool that outputs which sstables block expired ones (by holding data older 
> than the newest tombstone in an expired sstable) would save a lot of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9623) Added column does not sort as the last column

2015-08-07 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662290#comment-14662290
 ] 

Marcus Eriksson commented on CASSANDRA-9623:


Could you post full logs for the node (or at least a while before and after the 
exception)?

> Added column does not sort as the last column
> -
>
> Key: CASSANDRA-9623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9623
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcin Pietraszek
>Assignee: Marcus Eriksson
> Fix For: 2.0.x
>
>
> After adding new machines to existing cluster running cleanup one of the 
> tables ends with:
> {noformat}
> ERROR [CompactionExecutor:1015] 2015-06-19 11:24:05,038 CassandraDaemon.java 
> (line 199) Exception in thread Thread[CompactionExecutor:1015,1,main]
> java.lang.AssertionError: Added column does not sort as the last column
> at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:116)
> at 
> org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:121)
> at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:155)
> at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:186)
> at 
> org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:98)
> at 
> org.apache.cassandra.db.compaction.PrecompactedRow.(PrecompactedRow.java:85)
> at 
> org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
> at 
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
> at 
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:161)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We're using patched 2.0.13-190ef4f



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6434) Repair-aware gc grace period

2015-08-07 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662320#comment-14662320
 ] 

Marcus Eriksson commented on CASSANDRA-6434:


rebased on 3.0 here: 
https://github.com/krummas/cassandra/commits/marcuse/6434-3.0

> Repair-aware gc grace period 
> -
>
> Key: CASSANDRA-6434
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6434
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: sankalp kohli
>Assignee: Marcus Eriksson
> Fix For: 3.0 beta 1
>
>
> Since the reason for gcgs is to ensure that we don't purge tombstones until 
> every replica has been notified, it's redundant in a world where we're 
> tracking repair times per sstable (and repairing frequently), i.e., a world 
> where we default to incremental repair a la CASSANDRA-5351.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10015) Create tool to debug why expired sstables are not getting dropped

2015-08-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679812#comment-14679812
 ] 

Marcus Eriksson commented on CASSANDRA-10015:
-

bq. Do we need a Windows script and why do we have some sstable* scripts in bin 
and some in tools/bin?
I will add one in 2.1 as that is where we started supporting Windows
bq. In the documentation at the top of SSTableExpiredBlocker I would reference 
the ticket number and perhaps explicitly spell out how an sstable can block 
expired sstables, rather than just limiting it to "cover anything in other 
sstables". Something like what's mentioned in this ticket description would 
maybe be a little clearer.
done
bq. I would replace columnfamily with table, at least in the usage and error 
messages.
fixed, will rename variables in 3.0+

dtest:
https://github.com/krummas/cassandra-dtest/commits/marcuse/10015
which requires this ccm patch:
https://github.com/krummas/ccm/commits/marcuse/10015
and nits fixed here:
https://github.com/krummas/cassandra/commits/marcuse/10015



> Create tool to debug why expired sstables are not getting dropped
> -
>
> Key: CASSANDRA-10015
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10015
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.0.x, 2.2.x
>
>
> Sometimes fully expired sstables are not getting dropped, and it is a real 
> pain to manually find out why.
> A tool that outputs which sstables block expired ones (by holding data older 
> than the newest tombstone in an expired sstable) would save a lot of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10015) Create tool to debug why expired sstables are not getting dropped

2015-08-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679812#comment-14679812
 ] 

Marcus Eriksson edited comment on CASSANDRA-10015 at 8/10/15 8:59 AM:
--

bq. Do we need a Windows script and why do we have some sstable* scripts in bin 
and some in tools/bin?
I will add one in 2.1 as that is where we started supporting Windows. And, 
regarding the bin/ and tools/bin stuff: CASSANDRA-7160
bq. In the documentation at the top of SSTableExpiredBlocker I would reference 
the ticket number and perhaps explicitly spell out how an sstable can block 
expired sstables, rather than just limiting it to "cover anything in other 
sstables". Something like what's mentioned in this ticket description would 
maybe be a little clearer.
done
bq. I would replace columnfamily with table, at least in the usage and error 
messages.
fixed, will rename variables in 3.0+

dtest:
https://github.com/krummas/cassandra-dtest/commits/marcuse/10015
which requires this ccm patch:
https://github.com/krummas/ccm/commits/marcuse/10015
and nits fixed here:
https://github.com/krummas/cassandra/commits/marcuse/10015




was (Author: krummas):
bq. Do we need a Windows script and why do we have some sstable* scripts in bin 
and some in tools/bin?
I will add one in 2.1 as that is where we started supporting Windows
bq. In the documentation at the top of SSTableExpiredBlocker I would reference 
the ticket number and perhaps explicitly spell out how an sstable can block 
expired sstables, rather than just limiting it to "cover anything in other 
sstables". Something like what's mentioned in this ticket description would 
maybe be a little clearer.
done
bq. I would replace columnfamily with table, at least in the usage and error 
messages.
fixed, will rename variables in 3.0+

dtest:
https://github.com/krummas/cassandra-dtest/commits/marcuse/10015
which requires this ccm patch:
https://github.com/krummas/ccm/commits/marcuse/10015
and nits fixed here:
https://github.com/krummas/cassandra/commits/marcuse/10015



> Create tool to debug why expired sstables are not getting dropped
> -
>
> Key: CASSANDRA-10015
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10015
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.0.x, 2.2.x
>
>
> Sometimes fully expired sstables are not getting dropped, and it is a real 
> pain to manually find out why.
> A tool that outputs which sstables block expired ones (by holding data older 
> than the newest tombstone in an expired sstable) would save a lot of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-9266) Repair failed with error Already repairing SSTableReader(path='/path/to/keyspace/column_family/keyspace-column_family--Data.db'), can not continue.

2015-08-10 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson resolved CASSANDRA-9266.

   Resolution: Cannot Reproduce
Fix Version/s: (was: 2.1.x)

resolving due to lack of updates in 1+ months

> Repair failed with error Already repairing 
> SSTableReader(path='/path/to/keyspace/column_family/keyspace-column_family--Data.db'),
>  can not continue.
> ---
>
> Key: CASSANDRA-9266
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9266
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Cassandra 2.1.3
>Reporter: Razi Khaja
>Assignee: Marcus Eriksson
> Attachments: cassandra_system.log.partial.cass320.CASSANDRA-9266.txt, 
> cassandra_system.log.partial.cass321.CASSANDRA-9266.txt, 
> cassandra_system.log.partial.cass322.CASSANDRA-9266.txt, 
> cassandra_system.log.partial.cass323.CASSANDRA-9266.txt
>
>
> I am running incremental parallel repair using the following command:
> {code}
> nodetool --host my_hostname repair --incremental --in-local-dc --parallel
> {code}
> and get the following error:
> {code}
> Repair failed with error Already repairing 
> SSTableReader(path='/path/to/keyspace/column_family/keyspace-column_family--Data.db'),
>  can not continue.
> {code}
> I have 3 data centers, each with 4 nodes. Neither incremental or full repair 
> is running on any of my other nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-9997) Document removal of cold_reads_to_omit in 2.2 and 3.0 NEWS.txt

2015-08-10 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson resolved CASSANDRA-9997.

   Resolution: Fixed
Fix Version/s: (was: 3.0.0 rc1)
               3.0 beta 1

we don't expect create table scripts to be backwards compatible between 2.1 and 
2.2; ninja-added the NEWS.txt change as 4f14c85471a6ee443429dd3da06d9cc8f56feeca

> Document removal of cold_reads_to_omit in 2.2 and 3.0 NEWS.txt
> --
>
> Key: CASSANDRA-9997
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9997
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Marcus Eriksson
>Priority: Minor
> Fix For: 2.2.1, 3.0 beta 1
>
>
> Shouldn’t 2.2 be backwards compatible with 2.1? 
> The removal of the cold_reads_to_omit parameter in 
> SizeTieredCompactionStrategy breaks backwards compatibility; wouldn't it be 
> better to keep the same behaviour as in later 2.1 versions, where the parameter 
> is there but doesn’t do anything? The cold_reads_to_omit parameter should be 
> removed in 3.0.
> This could be fixed by applying the patch in CASSANDRA-9203 on the 2.2 branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9577) Cassandra not performing GC on stale SStables after compaction

2015-08-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1467#comment-1467
 ] 

Marcus Eriksson commented on CASSANDRA-9577:


So, it seems individual nodes end up in this state randomly, and the only way 
to get out of it is by decommissioning the problematic node? I have to say that 
this makes it a bit hard to reproduce/fix.

> Cassandra not performing GC on stale SStables after compaction
> --
>
> Key: CASSANDRA-9577
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9577
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: 2.0.12.200 / DSE 4.6.1.
>Reporter: Jeff Ferland
>Assignee: Marcus Eriksson
>
>   Space used (live), bytes:   878681716067
>   Space used (total), bytes: 2227857083852
> jbf@ip-10-0-2-98:/ebs/cassandra/data/trends/trends$ sudo lsof *-Data.db 
> COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
> java4473 cassandra  446r   REG   0,26  17582559172 39241 
> trends-trends-jb-144864-Data.db
> java4473 cassandra  448r   REG   0,26 62040962 37431 
> trends-trends-jb-144731-Data.db
> java4473 cassandra  449r   REG   0,26 829935047545 21150 
> trends-trends-jb-143581-Data.db
> java4473 cassandra  452r   REG   0,26  8980406 39503 
> trends-trends-jb-144882-Data.db
> java4473 cassandra  454r   REG   0,26  8980406 39503 
> trends-trends-jb-144882-Data.db
> java4473 cassandra  462r   REG   0,26  9487703 39542 
> trends-trends-jb-144883-Data.db
> java4473 cassandra  463r   REG   0,26 36158226 39629 
> trends-trends-jb-144889-Data.db
> java4473 cassandra  468r   REG   0,26105693505 39447 
> trends-trends-jb-144881-Data.db
> java4473 cassandra  530r   REG   0,26  17582559172 39241 
> trends-trends-jb-144864-Data.db
> java4473 cassandra  535r   REG   0,26105693505 39447 
> trends-trends-jb-144881-Data.db
> java4473 cassandra  542r   REG   0,26  9487703 39542 
> trends-trends-jb-144883-Data.db
> java4473 cassandra  553u   REG   0,26   6431729821 39556 
> trends-trends-tmp-jb-144884-Data.db
> jbf@ip-10-0-2-98:/ebs/cassandra/data/trends/trends$ ls *-Data.db
> trends-trends-jb-142631-Data.db  trends-trends-jb-143562-Data.db  
> trends-trends-jb-143581-Data.db  trends-trends-jb-144731-Data.db  
> trends-trends-jb-144883-Data.db
> trends-trends-jb-142633-Data.db  trends-trends-jb-143563-Data.db  
> trends-trends-jb-144530-Data.db  trends-trends-jb-144864-Data.db  
> trends-trends-jb-144889-Data.db
> trends-trends-jb-143026-Data.db  trends-trends-jb-143564-Data.db  
> trends-trends-jb-144551-Data.db  trends-trends-jb-144881-Data.db  
> trends-trends-tmp-jb-144884-Data.db
> trends-trends-jb-143533-Data.db  trends-trends-jb-143578-Data.db  
> trends-trends-jb-144552-Data.db  trends-trends-jb-144882-Data.db
> jbf@ip-10-0-2-98:/ebs/cassandra/data/trends/trends$ cd -
> /mnt/cassandra/data/trends/trends
> jbf@ip-10-0-2-98:/mnt/cassandra/data/trends/trends$ sudo lsof * 
> jbf@ip-10-0-2-98:/mnt/cassandra/data/trends/trends$ ls *-Data.db
> trends-trends-jb-124502-Data.db  trends-trends-jb-141113-Data.db  
> trends-trends-jb-141377-Data.db  trends-trends-jb-141846-Data.db  
> trends-trends-jb-144890-Data.db
> trends-trends-jb-125457-Data.db  trends-trends-jb-141123-Data.db  
> trends-trends-jb-141391-Data.db  trends-trends-jb-141871-Data.db  
> trends-trends-jb-41121-Data.db
> trends-trends-jb-130016-Data.db  trends-trends-jb-141137-Data.db  
> trends-trends-jb-141538-Data.db  trends-trends-jb-141883-Data.db  
> trends-trends.trends_date_idx-jb-2100-Data.db
> trends-trends-jb-139563-Data.db  trends-trends-jb-141358-Data.db  
> trends-trends-jb-141806-Data.db  trends-trends-jb-142033-Data.db
> trends-trends-jb-141102-Data.db  trends-trends-jb-141363-Data.db  
> trends-trends-jb-141829-Data.db  trends-trends-jb-144553-Data.db
> Compaction started  INFO [CompactionExecutor:6661] 2015-06-05 14:02:36,515 
> CompactionTask.java (line 120) Compacting 
> [SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-124502-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141358-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141883-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141846-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141871-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141391-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-139563-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-125457-Data.db'

[jira] [Assigned] (CASSANDRA-9882) DTCS (maybe other strategies) can block flushing when there are lots of sstables

2015-08-11 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson reassigned CASSANDRA-9882:
--

Assignee: Marcus Eriksson

> DTCS (maybe other strategies) can block flushing when there are lots of 
> sstables
> 
>
> Key: CASSANDRA-9882
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9882
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
>  Labels: dtcs
> Fix For: 2.1.x, 2.2.x
>
>
> MemtableFlushWriter tasks can get blocked by Compaction 
> getNextBackgroundTask.  This is in a wonky cluster with 200k sstables in the 
> CF, but seems bad for flushing to be blocked by getNextBackgroundTask when we 
> are trying to make these new "smart" strategies that may take some time to 
> calculate what to do.
> {noformat}
> "MemtableFlushWriter:21" daemon prio=10 tid=0x7ff7ad965000 nid=0x6693 
> waiting for monitor entry [0x7ff78a667000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - <0x000743b3ac38> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "MemtableFlushWriter:19" daemon prio=10 tid=0x7ff7ac57a000 nid=0x649b 
> waiting for monitor entry [0x7ff78b8ee000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "CompactionExecutor:14" daemon prio=10 tid=0x7ff7ad359800 nid=0x4d59 
> runnable [0x7fecce3ea000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.io.sstable.SSTableReader.equals(SSTableReader.java:628)
>   at 
> com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:206)
>   at 
> com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:220)
>   at 
> com.google.common.collect.ImmutableSet.access$000(ImmutableSet.java:74)
>   at 
> com.google.common.collect.ImmutableSet$Builder.build(ImmutableSet.java:531)
>   at com.google.common.collect.Sets$1.immutableCopy(Sets.java:606)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.getOverlappingSSTables(ColumnFamilyStore.java:1352)
>   at 
> org.apache.cassandra.db.compaction.DateTieredCompactionStrategy.getNextBackground

[jira] [Commented] (CASSANDRA-9882) DTCS (maybe other strategies) can block flushing when there are lots of sstables

2015-08-11 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682314#comment-14682314
 ] 

Marcus Eriksson commented on CASSANDRA-9882:


Pushed another commit to the repo that fixes the unit test failure: 
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9882-testall/
 hopefully this will trigger a proper dtest run as well. It also sets the 
lastExpiredCheck time after the calculation to always have 10 minutes of 
non-calculation 
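
For illustration, the throttling described above amounts to something like this (field and method names are placeholders; computeFullyExpired() stands in for the expensive calculation):

{code}
private long lastExpiredCheck;
private Set<SSTableReader> expired = Collections.emptySet();

private Set<SSTableReader> maybeRecomputeExpired()
{
    if (System.currentTimeMillis() - lastExpiredCheck > TimeUnit.MINUTES.toMillis(10))
    {
        expired = computeFullyExpired();  // the expensive overlap calculation
        // Stamp AFTER the calculation: however long it takes, there are always
        // at least 10 minutes of non-calculation between two runs.
        lastExpiredCheck = System.currentTimeMillis();
    }
    return expired;
}
{code}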

> DTCS (maybe other strategies) can block flushing when there are lots of 
> sstables
> 
>
> Key: CASSANDRA-9882
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9882
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
>  Labels: dtcs
> Fix For: 2.1.x, 2.2.x
>
>
> MemtableFlushWriter tasks can get blocked by Compaction 
> getNextBackgroundTask.  This is in a wonky cluster with 200k sstables in the 
> CF, but seems bad for flushing to be blocked by getNextBackgroundTask when we 
> are trying to make these new "smart" strategies that may take some time to 
> calculate what to do.
> {noformat}
> "MemtableFlushWriter:21" daemon prio=10 tid=0x7ff7ad965000 nid=0x6693 
> waiting for monitor entry [0x7ff78a667000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - <0x000743b3ac38> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "MemtableFlushWriter:19" daemon prio=10 tid=0x7ff7ac57a000 nid=0x649b 
> waiting for monitor entry [0x7ff78b8ee000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "CompactionExecutor:14" daemon prio=10 tid=0x7ff7ad359800 nid=0x4d59 
> runnable [0x7fecce3ea000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.io.sstable.SSTableReader.equals(SSTableReader.java:628)
>   at 
> com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:206)
>   at 
> com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:220)
>   at 
> com.google.common.collect.ImmutableSet.access$000(ImmutableSet.java:74)
>   at 
> com.google.co

[jira] [Comment Edited] (CASSANDRA-9882) DTCS (maybe other strategies) can block flushing when there are lots of sstables

2015-08-11 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682314#comment-14682314
 ] 

Marcus Eriksson edited comment on CASSANDRA-9882 at 8/11/15 7:06 PM:
-

Pushed another commit to the repo that fixes the unit test failure: 
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9882-testall/
 hopefully this will trigger a proper dtest run as well: 
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9882-dtest/.
 It also sets the lastExpiredCheck time after the calculation to always have 10 
minutes of non-calculation 


was (Author: krummas):
Pushed another commit to the repo that fixes the unit test failure: 
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9882-testall/
 hopefully this will trigger a proper dtest run as well. It also sets the 
lastExpiredCheck time after the calculation to always have 10 minutes of 
non-calculation 

> DTCS (maybe other strategies) can block flushing when there are lots of 
> sstables
> 
>
> Key: CASSANDRA-9882
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9882
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
>  Labels: dtcs
> Fix For: 2.1.x, 2.2.x
>
>
> MemtableFlushWriter tasks can get blocked by Compaction 
> getNextBackgroundTask.  This is in a wonky cluster with 200k sstables in the 
> CF, but seems bad for flushing to be blocked by getNextBackgroundTask when we 
> are trying to make these new "smart" strategies that may take some time to 
> calculate what to do.
> {noformat}
> "MemtableFlushWriter:21" daemon prio=10 tid=0x7ff7ad965000 nid=0x6693 
> waiting for monitor entry [0x7ff78a667000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - <0x000743b3ac38> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "MemtableFlushWriter:19" daemon prio=10 tid=0x7ff7ac57a000 nid=0x649b 
> waiting for monitor entry [0x7ff78b8ee000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "CompactionExecutor:14" daemon prio=10 tid=0x

[jira] [Commented] (CASSANDRA-9882) DTCS (maybe other strategies) can block flushing when there are lots of sstables

2015-08-12 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693305#comment-14693305
 ] 

Marcus Eriksson commented on CASSANDRA-9882:


I'm not sure this is the way forward; the only(?) time we loop here is if we 
fail to mark sstables as compacting, and that should be very rare. If it is not 
very rare, this patch could stall compactions for 5 minutes. Perhaps we could 
move the looping outside of the synchronized block though? Something like this 
might work: 
https://github.com/krummas/cassandra/blob/79191d5d04d29e386855a04bfc080c49fe5a97b4/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java#L109-L157

I also really think we need to do what is in my patch above, as calculating the 
expired sstables can take a very long time with many sstables (it does not 
really matter whether we loop if that method takes minutes to call).
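
For concreteness, the restructuring suggested above would look roughly like this (a sketch only; selectCandidates, markCompacting and createCompactionTask are placeholder names, not the actual methods):

{code}
public AbstractCompactionTask getNextBackgroundTask(int gcBefore)
{
    while (true)
    {
        List<SSTableReader> candidates;
        synchronized (this)                  // hold the lock only to pick candidates
        {
            candidates = selectCandidates(gcBefore);
            if (candidates == null)
                return null;                 // nothing to compact right now
        }
        // Marking sstables as compacting can fail if another thread grabbed one
        // of them first; retrying OUTSIDE the synchronized block means flush
        // notifications are no longer blocked while we retry.
        if (markCompacting(candidates))
            return createCompactionTask(candidates);
    }
}
{code}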

> DTCS (maybe other strategies) can block flushing when there are lots of 
> sstables
> 
>
> Key: CASSANDRA-9882
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9882
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
>  Labels: dtcs
> Fix For: 2.1.x, 2.2.x
>
>
> MemtableFlushWriter tasks can get blocked by Compaction 
> getNextBackgroundTask.  This is in a wonky cluster with 200k sstables in the 
> CF, but seems bad for flushing to be blocked by getNextBackgroundTask when we 
> are trying to make these new "smart" strategies that may take some time to 
> calculate what to do.
> {noformat}
> "MemtableFlushWriter:21" daemon prio=10 tid=0x7ff7ad965000 nid=0x6693 
> waiting for monitor entry [0x7ff78a667000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - <0x000743b3ac38> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "MemtableFlushWriter:19" daemon prio=10 tid=0x7ff7ac57a000 nid=0x649b 
> waiting for monitor entry [0x7ff78b8ee000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "CompactionExecutor:14" daemon prio=10 tid=0x7ff7ad359800 nid=0x4d59 
> runnable [0x7fecce3ea000]
>jav

[jira] [Commented] (CASSANDRA-9882) DTCS (maybe other strategies) can block flushing when there are lots of sstables

2015-08-13 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695094#comment-14695094
 ] 

Marcus Eriksson commented on CASSANDRA-9882:


Been running some experiments with 10k sstables, and the majority of the time 
(95%+) is spent calculating overlapping sstables for getFullyExpiredSSTables.

Patch here: 
https://github.com/krummas/cassandra/commits/marcuse/9882-overlapping - it does 
the following:
* breaks the looping out into non-synchronized methods
* only checks for expired sstables every 10 minutes
* and the scary part: normalizes the intervals searched in 
getOverlappingSSTables - if we give getOverlappingSSTables many overlapping 
sstables to search (which we likely do in the DTCS case), this greatly reduces 
the number of times we search the interval tree.
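
As a rough illustration of the normalization step (a hedged, self-contained 
sketch using plain long token bounds; the real code operates on sstable 
intervals), merging overlapping ranges up front means N mutually overlapping 
sstables cost one interval-tree search instead of N:

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hedged sketch, not the actual patch: sort ranges by start, then merge any
// range that overlaps the previously merged one, so the interval tree is
// queried once per merged range rather than once per sstable.
final class RangeNormalizer
{
    static final class Range
    {
        final long min, max;
        Range(long min, long max) { this.min = min; this.max = max; }
    }

    static List<Range> normalize(List<Range> ranges)
    {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(r -> r.min));
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted)
        {
            Range last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && r.min <= last.max)
                merged.set(merged.size() - 1, new Range(last.min, Math.max(last.max, r.max)));
            else
                merged.add(r);
        }
        return merged;
    }
}
{code}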

> DTCS (maybe other strategies) can block flushing when there are lots of 
> sstables
> 
>
> Key: CASSANDRA-9882
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9882
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
>  Labels: dtcs
> Fix For: 2.1.x, 2.2.x
>
>
> MemtableFlushWriter tasks can get blocked by Compaction 
> getNextBackgroundTask.  This is in a wonky cluster with 200k sstables in the 
> CF, but seems bad for flushing to be blocked by getNextBackgroundTask when we 
> are trying to make these new "smart" strategies that may take some time to 
> calculate what to do.
> {noformat}
> "MemtableFlushWriter:21" daemon prio=10 tid=0x7ff7ad965000 nid=0x6693 
> waiting for monitor entry [0x7ff78a667000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - <0x000743b3ac38> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "MemtableFlushWriter:19" daemon prio=10 tid=0x7ff7ac57a000 nid=0x649b 
> waiting for monitor entry [0x7ff78b8ee000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "CompactionExecutor:14" daemon prio=10 tid=0x7ff7ad359800 nid=0x4d59 
> runnable [0x7fecce3ea000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.io.sstable.

[jira] [Commented] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned

2015-08-14 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696884#comment-14696884
 ] 

Marcus Eriksson commented on CASSANDRA-10068:
-

bq. java.lang.RuntimeException: Trying to get the view natural endpoint on a 
non-data replica
This is due to the fact that while we are decommissioning, the leaving node is 
still in TokenMetadata, so the nodes receiving the rows don't think they should 
own them. Patch that solves this here: 
https://github.com/krummas/cassandra/commits/marcuse/10068 
DTest here: https://github.com/krummas/cassandra-dtest/commits/marcuse/10068
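
For illustration, the failure mode boils down to a check like the following (a 
hedged, self-contained sketch; the real logic lives in 
MaterializedViewUtils.getViewNaturalEndpoint, and accepting pending ownership 
is the idea described above, not necessarily the literal patch):

{code}
import java.net.InetAddress;
import java.util.Set;

// Hedged sketch: while a decommission is in flight the leaving node is still
// in TokenMetadata, so a node that just received the rows may only be a
// *pending* replica rather than a *natural* one. Checking pending ownership
// too avoids the spurious "non-data replica" error.
final class ViewEndpointCheck
{
    static void validateReplica(InetAddress local, Set<InetAddress> natural, Set<InetAddress> pending)
    {
        if (!natural.contains(local) && !pending.contains(local))
            throw new RuntimeException("Trying to get the view natural endpoint on a non-data replica");
    }
}
{code}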

[~jkni] I doubt this is related to the other errors you are seeing, so I will 
keep looking into those, but could you rerun the test just to make sure it is 
not related?

> Batchlog replay fails with exception after a node is decommissioned
> ---
>
> Key: CASSANDRA-10068
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10068
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Marcus Eriksson
> Fix For: 3.0.0 rc1
>
> Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> At the conclusion of the test, a batchlog replay is initiated through 
> nodetool and hits the following assertion due to a missing host ID: 
> https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197
> A nodetool status on the node with failed batchlog replay shows the following 
> entry for the decommissioned node:
> DN  10.0.0.5  ?  256  ?   null
>   rack1
> On the unaffected nodes, there is no entry for the decommissioned node as 
> expected.
> There are occasional hits of the same assertions for logs in other nodes; it 
> looks like the issue might occasionally resolve itself, but one node seems to 
> have the errant null entry indefinitely.
> In logs for the nodes, this possibly unrelated exception also appears:
> java.lang.RuntimeException: Trying to get the view natural endpoint on a 
> non-data replica
>   at 
> org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
>  ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]
> I have a running cluster with the issue on my machine; it is also repeatable.
> Nothing stands out in the logs of the decommissioned node (n4) for me. The 
> logs of each node in the cluster are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9117) LEAK DETECTED during repair, startup

2015-08-14 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697301#comment-14697301
 ] 

Marcus Eriksson commented on CASSANDRA-9117:


[~victortrac] yours is probably CASSANDRA-9998
[~sebastian.este...@datastax.com] could you open a new ticket for that?

> LEAK DETECTED during repair, startup
> 
>
> Key: CASSANDRA-9117
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9117
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Tyler Hobbs
>Assignee: Marcus Eriksson
> Fix For: 2.2.0 beta 1
>
> Attachments: 
> 0001-dont-initialize-writer-before-checking-if-iter-is-em.patch, node1.log, 
> node2.log.gz
>
>
> When running the 
> {{incremental_repair_test.TestIncRepair.multiple_repair_test}} dtest, the 
> following error logs show up:
> {noformat}
> ERROR [Reference-Reaper:1] 2015-04-03 15:48:25,491 Ref.java:181 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@83f047e) to class 
> org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1631580268:Memory@[7f354800bdc0..7f354800bde8)
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-04-03 15:48:25,493 Ref.java:181 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@50bc8f67) to class 
> org.apache.cassandra.io.util.SafeMemory$MemoryTidy@191552666:Memory@[7f354800ba90..7f354800bdb0)
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-04-03 15:48:25,493 Ref.java:181 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@7fd10877) to class 
> org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1954741807:Memory@[7f3548101190..7f3548101194)
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-04-03 15:48:25,494 Ref.java:181 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@578550ac) to class 
> org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@1903393047:[[OffHeapBitSet]]
>  was not released before the reference was garbage collected
> {noformat}
> The test is being run against trunk (commit {{1dff098e}}).  I've attached a 
> DEBUG-level log from the test run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9130) reduce default dtcs max_sstable_age

2015-08-16 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9130:
---
Assignee: Jeff Jirsa  (was: Marcus Eriksson)

> reduce default dtcs max_sstable_age
> ---
>
> Key: CASSANDRA-9130
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9130
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jonathan Ellis
>Assignee: Jeff Jirsa
>Priority: Minor
> Fix For: 3.x, 2.1.x, 2.0.x
>
>
> Now that CASSANDRA-9056 is fixed it should be safe to reduce the default age 
> and increase performance correspondingly.  [~jshook] suggests that two weeks 
> may be appropriate, or we could make it dynamic based on gcgs (since that's 
> the window past which we should expect repair to not introduce fragmentation 
> anymore).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (CASSANDRA-9882) DTCS (maybe other strategies) can block flushing when there are lots of sstables

2015-08-17 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson reopened CASSANDRA-9882:


Missed a dtest failure; reopening to fix.

The reason is that we now only check for expired sstables every 10 minutes: 
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-marcuse-9882-2.2-dtest/lastCompletedBuild/testReport/compaction_test/TestCompaction_with_DateTieredCompactionStrategy/dtcs_deletion_test/

> DTCS (maybe other strategies) can block flushing when there are lots of 
> sstables
> 
>
> Key: CASSANDRA-9882
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9882
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
>  Labels: dtcs
> Fix For: 2.1.9, 2.0.17, 2.2.1, 3.0 beta 1
>
>
> MemtableFlushWriter tasks can get blocked by Compaction 
> getNextBackgroundTask.  This is in a wonky cluster with 200k sstables in the 
> CF, but seems bad for flushing to be blocked by getNextBackgroundTask when we 
> are trying to make these new "smart" strategies that may take some time to 
> calculate what to do.
> {noformat}
> "MemtableFlushWriter:21" daemon prio=10 tid=0x7ff7ad965000 nid=0x6693 
> waiting for monitor entry [0x7ff78a667000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - <0x000743b3ac38> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "MemtableFlushWriter:19" daemon prio=10 tid=0x7ff7ac57a000 nid=0x649b 
> waiting for monitor entry [0x7ff78b8ee000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "CompactionExecutor:14" daemon prio=10 tid=0x7ff7ad359800 nid=0x4d59 
> runnable [0x7fecce3ea000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.io.sstable.SSTableReader.equals(SSTableReader.java:628)
>   at 
> com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:206)
>   at 
> com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:220)
>   at 
> com.google.common.collect.ImmutableSet.access$000(ImmutableSet.java:74)
>   at 
> com.google.common.collect.ImmutableSet$Builder.build(Immutabl

[jira] [Commented] (CASSANDRA-9882) DTCS (maybe other strategies) can block flushing when there are lots of sstables

2015-08-17 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699148#comment-14699148
 ] 

Marcus Eriksson commented on CASSANDRA-9882:


Fix here: https://github.com/krummas/cassandra/commits/marcuse/9882-fix - adds 
a compaction parameter for DTCS that sets the check frequency, so it can be 
overridden in tests.

dtest fix: https://github.com/krummas/cassandra-dtest/commits/marcuse/9882-fix
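
For illustration, the gating such a parameter enables looks roughly like this 
(a hedged sketch; the class, field, and option names are assumptions, not the 
committed code):

{code}
// Hedged sketch: the expensive fully-expired scan only runs when the
// configured interval has elapsed; tests can shrink the interval so deletion
// behaviour stays observable.
final class ExpiredCheckGate
{
    private final long frequencyMillis;
    private volatile long lastCheck;

    ExpiredCheckGate(long frequencySeconds)
    {
        this.frequencyMillis = frequencySeconds * 1000L;
    }

    boolean shouldCheckForExpired()
    {
        long now = System.currentTimeMillis();
        if (now - lastCheck >= frequencyMillis)
        {
            lastCheck = now; // benign race: at worst two threads both scan once
            return true;
        }
        return false;
    }
}
{code}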

> DTCS (maybe other strategies) can block flushing when there are lots of 
> sstables
> 
>
> Key: CASSANDRA-9882
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9882
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
>  Labels: dtcs
> Fix For: 2.1.9, 2.0.17, 2.2.1, 3.0 beta 1
>
>
> MemtableFlushWriter tasks can get blocked by Compaction 
> getNextBackgroundTask.  This is in a wonky cluster with 200k sstables in the 
> CF, but seems bad for flushing to be blocked by getNextBackgroundTask when we 
> are trying to make these new "smart" strategies that may take some time to 
> calculate what to do.
> {noformat}
> "MemtableFlushWriter:21" daemon prio=10 tid=0x7ff7ad965000 nid=0x6693 
> waiting for monitor entry [0x7ff78a667000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - <0x000743b3ac38> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "MemtableFlushWriter:19" daemon prio=10 tid=0x7ff7ac57a000 nid=0x649b 
> waiting for monitor entry [0x7ff78b8ee000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "CompactionExecutor:14" daemon prio=10 tid=0x7ff7ad359800 nid=0x4d59 
> runnable [0x7fecce3ea000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.io.sstable.SSTableReader.equals(SSTableReader.java:628)
>   at 
> com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:206)
>   at 
> com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:220)
>   at 
> com.google.common.collect.ImmutableSet.access$000(ImmutableSet.java:74)
>   at 
> com.google.common.collect.ImmutableSet$Builder.build(

[jira] [Commented] (CASSANDRA-9623) Added column does not sort as the last column

2015-08-17 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699210#comment-14699210
 ] 

Marcus Eriksson commented on CASSANDRA-9623:


[~forsberg] Is it on different nodes? Have you tried scrubbing the data?

> Added column does not sort as the last column
> -
>
> Key: CASSANDRA-9623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9623
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcin Pietraszek
>Assignee: Marcus Eriksson
> Fix For: 2.0.x
>
>
> After adding new machines to existing cluster running cleanup one of the 
> tables ends with:
> {noformat}
> ERROR [CompactionExecutor:1015] 2015-06-19 11:24:05,038 CassandraDaemon.java 
> (line 199) Exception in thread Thread[CompactionExecutor:1015,1,main]
> java.lang.AssertionError: Added column does not sort as the last column
> at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:116)
> at 
> org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:121)
> at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:155)
> at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:186)
> at 
> org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:98)
> at 
> org.apache.cassandra.db.compaction.PrecompactedRow.(PrecompactedRow.java:85)
> at 
> org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
> at 
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
> at 
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:161)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We're using patched 2.0.13-190ef4f



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned

2015-08-17 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699214#comment-14699214
 ] 

Marcus Eriksson commented on CASSANDRA-10068:
-

[~jkni] how do I run this Jepsen test locally?

> Batchlog replay fails with exception after a node is decommissioned
> ---
>
> Key: CASSANDRA-10068
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10068
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Marcus Eriksson
> Fix For: 3.0.0 rc1
>
> Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> At the conclusion of the test, a batchlog replay is initiated through 
> nodetool and hits the following assertion due to a missing host ID: 
> https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197
> A nodetool status on the node with failed batchlog replay shows the following 
> entry for the decommissioned node:
> DN  10.0.0.5  ?  256  ?   null
>   rack1
> On the unaffected nodes, there is no entry for the decommissioned node as 
> expected.
> There are occasional hits of the same assertions for logs in other nodes; it 
> looks like the issue might occasionally resolve itself, but one node seems to 
> have the errant null entry indefinitely.
> In logs for the nodes, this possibly unrelated exception also appears:
> java.lang.RuntimeException: Trying to get the view natural endpoint on a 
> non-data replica
>   at 
> org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
>  ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]
> I have a running cluster with the issue on my machine; it is also repeatable.
> Nothing stands out in the logs of the decommissioned node (n4) for me. The 
> logs of each node in the cluster are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8671) Give compaction strategy more control over where sstables are created, including for flushing and streaming.

2015-08-17 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699389#comment-14699389
 ] 

Marcus Eriksson commented on CASSANDRA-8671:


Could you rebase and push a branch so we get cassci results?

> Give compaction strategy more control over where sstables are created, 
> including for flushing and streaming.
> 
>
> Key: CASSANDRA-8671
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8671
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
> Fix For: 3.x
>
> Attachments: 
> 0001-C8671-creating-sstable-writers-for-flush-and-stream-.patch, 
> 8671-giving-compaction-strategies-more-control-over.txt
>
>
> This would enable routing different partitions to different disks based on 
> some user defined parameters.
> My initial take on how to do this would be to make an interface from 
> SSTableWriter, and have a table's compaction strategy do all SSTableWriter 
> instantiation. Compaction strategies could then implement their own 
> SSTableWriter implementations (which basically wrap one or more normal 
> sstablewriters) for compaction, flushing, and streaming. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10072) "Replica(s) failed to execute read" on simple select on stress-created table with >1 nodes

2015-08-18 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700898#comment-14700898
 ] 

Marcus Eriksson commented on CASSANDRA-10072:
-

The script works on my machine (see below).

Are you running the correct python driver? (iirc 3.0 requires the 
cassandra-testing branch of python-driver.) If you are and it still fails, you 
will need to attach more logs etc.


{code}
oss/cassandra [trunk●] » sh repro.sh



trunk, 1 nodes

select-failure does not appear to be a valid cluster (use ccm list to view 
valid clusters)
http://git-wip-us.apache.org/repos/asf/cassandra.git git:trunk
Cloning Cassandra...
Cloning Cassandra (from local cache)
Checking out requested branch (trunk)
Compiling Cassandra trunk ...
Current cluster is now: select-failure
Failed to connect over JMX; not collecting these stats
[Row(key='0P37709P21', 
C0='\x85\x18/\xdf\xe7\xf8c\x06\xcdX\xce\x81\xaaS\xcc\xa2L\x198A\xd6\xae\x8a\x91djL0M\xd9\xf4x\x92\xf9',
 
C1='\x1f\\\x9f\x8c0\xa6,\xe4\x81^\x93m\xe8\x14QN\xa3>\xd4\xd8_\xa3?\x90"-X\xa5\xa0\xa4\x9b8\xd2\xfe',
 
C2='"U\x10\x93+\xd8\x81+F\x10\x81vS\xb7\x9c\x86U,\x99\xae\xfb\x17\x7fL\xef\xa4\x85\n\x919\xdbinn',
 
C3='\xd7\xe0\x99\xa5\x9d\xa5Y\xe09`\x0fn,\x0e\xde\x94\xba\xda\x8c\xfe]\x8dO\xf2mh\xffY};?h\xf2\xb4',
 
C4='\xfb\xdd\x9c\xec\x02O\xcb\xdeC\x83\x99g\x04u\xaa\x89\x00\xd8\x0e\x0e\xd3\xd0\xc31-\x9dJ\xe7\x92A!Mt\xc3')]

trunk, 2 nodes

http://git-wip-us.apache.org/repos/asf/cassandra.git git:trunk
Fetching Cassandra updates...
Current cluster is now: select-failure
Failed to connect over JMX; not collecting these stats
[Row(key='0P37709P21', 
C0='\x85\x18/\xdf\xe7\xf8c\x06\xcdX\xce\x81\xaaS\xcc\xa2L\x198A\xd6\xae\x8a\x91djL0M\xd9\xf4x\x92\xf9',
 
C1='\x1f\\\x9f\x8c0\xa6,\xe4\x81^\x93m\xe8\x14QN\xa3>\xd4\xd8_\xa3?\x90"-X\xa5\xa0\xa4\x9b8\xd2\xfe',
 
C2='"U\x10\x93+\xd8\x81+F\x10\x81vS\xb7\x9c\x86U,\x99\xae\xfb\x17\x7fL\xef\xa4\x85\n\x919\xdbinn',
 
C3='\xd7\xe0\x99\xa5\x9d\xa5Y\xe09`\x0fn,\x0e\xde\x94\xba\xda\x8c\xfe]\x8dO\xf2mh\xffY};?h\xf2\xb4',
 
C4='\xfb\xdd\x9c\xec\x02O\xcb\xdeC\x83\x99g\x04u\xaa\x89\x00\xd8\x0e\x0e\xd3\xd0\xc31-\x9dJ\xe7\x92A!Mt\xc3')]

cassandra-2.2, 1 nodes

http://git-wip-us.apache.org/repos/asf/cassandra.git git:cassandra-2.2
Fetching Cassandra updates...
Cloning Cassandra (from local cache)
Checking out requested branch (cassandra-2.2)
Compiling Cassandra cassandra-2.2 ...
Current cluster is now: select-failure
Failed to connect over JMX; not collecting these stats
[Row(key='0P37709P21', 
C0='\x85\x18/\xdf\xe7\xf8c\x06\xcdX\xce\x81\xaaS\xcc\xa2L\x198A\xd6\xae\x8a\x91djL0M\xd9\xf4x\x92\xf9',
 
C1='\x1f\\\x9f\x8c0\xa6,\xe4\x81^\x93m\xe8\x14QN\xa3>\xd4\xd8_\xa3?\x90"-X\xa5\xa0\xa4\x9b8\xd2\xfe',
 
C2='"U\x10\x93+\xd8\x81+F\x10\x81vS\xb7\x9c\x86U,\x99\xae\xfb\x17\x7fL\xef\xa4\x85\n\x919\xdbinn',
 
C3='\xd7\xe0\x99\xa5\x9d\xa5Y\xe09`\x0fn,\x0e\xde\x94\xba\xda\x8c\xfe]\x8dO\xf2mh\xffY};?h\xf2\xb4',
 
C4='\xfb\xdd\x9c\xec\x02O\xcb\xdeC\x83\x99g\x04u\xaa\x89\x00\xd8\x0e\x0e\xd3\xd0\xc31-\x9dJ\xe7\x92A!Mt\xc3')]

cassandra-2.2, 2 nodes

http://git-wip-us.apache.org/repos/asf/cassandra.git git:cassandra-2.2
Fetching Cassandra updates...
Current cluster is now: select-failure
Failed to connect over JMX; not collecting these stats
[Row(key='0P37709P21', 
C0='\x85\x18/\xdf\xe7\xf8c\x06\xcdX\xce\x81\xaaS\xcc\xa2L\x198A\xd6\xae\x8a\x91djL0M\xd9\xf4x\x92\xf9',
 
C1='\x1f\\\x9f\x8c0\xa6,\xe4\x81^\x93m\xe8\x14QN\xa3>\xd4\xd8_\xa3?\x90"-X\xa5\xa0\xa4\x9b8\xd2\xfe',
 
C2='"U\x10\x93+\xd8\x81+F\x10\x81vS\xb7\x9c\x86U,\x99\xae\xfb\x17\x7fL\xef\xa4\x85\n\x919\xdbinn',
 
C3='\xd7\xe0\x99\xa5\x9d\xa5Y\xe09`\x0fn,\x0e\xde\x94\xba\xda\x8c\xfe]\x8dO\xf2mh\xffY};?h\xf2\xb4',
 
C4='\xfb\xdd\x9c\xec\x02O\xcb\xdeC\x83\x99g\x04u\xaa\x89\x00\xd8\x0e\x0e\xd3\xd0\xc31-\x9dJ\xe7\x92A!Mt\xc3')]

cassandra-3.0, 1 nodes

http://git-wip-us.apache.org/repos/asf/cassandra.git git:cassandra-3.0
Fetching Cassandra updates...
Cloning Cassandra (from local cache)
Checking out requested branch (cassandra-3.0)
Compiling Cassandra cassandra-3.0 ...
Current cluster is now: select-failure
Failed to connect over JMX; not collecting these stats
[Row(key='0P37709P21', 
C0='\x85\x18/\xdf\xe7\xf8c\x06\xcdX\xce\x81\xaaS\xcc\xa2L\x198A\xd6\xae\x8a\x91djL0M\xd9\xf4x\x92\xf9',
 
C1='\x1f\\\x9f\x8c0\xa6,\xe4\x81^\x93m\xe8\x14QN\xa3>\xd4\xd8_\xa3?\x90"-X\xa5\xa0\xa4\x9b8\xd2\xfe',
 
C2='"U\x10\x93+\xd8\x81+F\x10\x81vS\xb7\x9c\x86U,\x9

[jira] [Commented] (CASSANDRA-8671) Give compaction strategy more control over where sstables are created, including for flushing and streaming.

2015-08-19 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702962#comment-14702962
 ] 

Marcus Eriksson commented on CASSANDRA-8671:


Pushed a few updates to the branch here: 
https://github.com/krummas/cassandra/commits/blake/8671-2 - biggest thing is 
probably that I removed the CompactionAwareWriter interface and refactored the 
API a bit.

Is this still enough for you?

tests here:
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-blake-8671-2-dtest/
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-blake-8671-2-testall/

> Give compaction strategy more control over where sstables are created, 
> including for flushing and streaming.
> 
>
> Key: CASSANDRA-8671
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8671
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
> Fix For: 3.x
>
> Attachments: 
> 0001-C8671-creating-sstable-writers-for-flush-and-stream-.patch, 
> 8671-giving-compaction-strategies-more-control-over.txt
>
>
> This would enable routing different partitions to different disks based on 
> some user defined parameters.
> My initial take on how to do this would be to make an interface from 
> SSTableWriter, and have a table's compaction strategy do all SSTableWriter 
> instantiation. Compaction strategies could then implement their own 
> SSTableWriter implementations (which basically wrap one or more normal 
> sstablewriters) for compaction, flushing, and streaming. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8671) Give compaction strategy more control over where sstables are created, including for flushing and streaming.

2015-08-19 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702962#comment-14702962
 ] 

Marcus Eriksson edited comment on CASSANDRA-8671 at 8/19/15 1:00 PM:
-

Pushed a few updates to the branch here: 
https://github.com/krummas/cassandra/commits/blake/8671-2 - biggest thing is 
probably that I removed the CompactionAwareWriter interface and refactored the 
API a bit.

Is this still enough for you?

tests here:
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-blake-8671-2-dtest/
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-blake-8671-2-testall/

Edit: btw, the changes in CQLSSTableWriter - I guess they are because we now 
need a column family when writing? Could you add a comment?


was (Author: krummas):
Pushed a few updates to the branch here: 
https://github.com/krummas/cassandra/commits/blake/8671-2 - biggest thing is 
probably that I removed the CompactionAwareWriter interface and refactored the 
API a bit.

Is this still enough for you?

tests here:
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-blake-8671-2-dtest/
http://cassci.datastax.com/view/Dev/view/krummas/job/krummas-blake-8671-2-testall/

> Give compaction strategy more control over where sstables are created, 
> including for flushing and streaming.
> 
>
> Key: CASSANDRA-8671
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8671
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
> Fix For: 3.x
>
> Attachments: 
> 0001-C8671-creating-sstable-writers-for-flush-and-stream-.patch, 
> 8671-giving-compaction-strategies-more-control-over.txt
>
>
> This would enable routing different partitions to different disks based on 
> some user defined parameters.
> My initial take on how to do this would be to make an interface from 
> SSTableWriter, and have a table's compaction strategy do all SSTableWriter 
> instantiation. Compaction strategies could then implement their own 
> SSTableWriter implementations (which basically wrap one or more normal 
> sstablewriters) for compaction, flushing, and streaming. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10115) Windows utest 2.2: LeveledCompactionStrategyTest.testGrouperLevels flaky

2015-08-19 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702978#comment-14702978
 ] 

Marcus Eriksson commented on CASSANDRA-10115:
-

+1

> Windows utest 2.2: LeveledCompactionStrategyTest.testGrouperLevels flaky
> 
>
> Key: CASSANDRA-10115
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10115
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joshua McKenzie
>Assignee: Joshua McKenzie
>  Labels: Windows
> Fix For: 2.2.x
>
> Attachments: 10115_v1.txt
>
>
> {noformat}
> junit.framework.AssertionFailedError
>   at 
> org.apache.cassandra.db.compaction.LeveledCompactionStrategyTest.testGrouperLevels(LeveledCompactionStrategyTest.java:131)
> {noformat}
> [Test is flaky on 
> Windows|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_utest_win32/lastCompletedBuild/testReport/org.apache.cassandra.db.compaction/LeveledCompactionStrategyTest/testGrouperLevels/history/]
> [Test is consistent on 
> linux|http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_utest/lastCompletedBuild/testReport/org.apache.cassandra.db.compaction/LeveledCompactionStrategyTest/testGrouperLevels/history/]
> Doesn't repro locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9882) DTCS (maybe other strategies) can block flushing when there are lots of sstables

2015-08-19 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703088#comment-14703088
 ] 

Marcus Eriksson commented on CASSANDRA-9882:


bq. Should expired_sstable_check_frequency_seconds be added to cqlsh completion?
I intentionally left it out, as it is something users should never tweak. No 
strong feelings though, so if you want it in, I can add it.

> DTCS (maybe other strategies) can block flushing when there are lots of 
> sstables
> 
>
> Key: CASSANDRA-9882
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9882
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeremiah Jordan
>Assignee: Marcus Eriksson
>  Labels: dtcs
> Fix For: 2.1.9, 2.0.17, 2.2.1, 3.0 beta 1
>
>
> MemtableFlushWriter tasks can get blocked by Compaction 
> getNextBackgroundTask.  This is in a wonky cluster with 200k sstables in the 
> CF, but seems bad for flushing to be blocked by getNextBackgroundTask when we 
> are trying to make these new "smart" strategies that may take some time to 
> calculate what to do.
> {noformat}
> "MemtableFlushWriter:21" daemon prio=10 tid=0x7ff7ad965000 nid=0x6693 
> waiting for monitor entry [0x7ff78a667000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - <0x000743b3ac38> (a 
> java.util.concurrent.ThreadPoolExecutor$Worker)
> "MemtableFlushWriter:19" daemon prio=10 tid=0x7ff7ac57a000 nid=0x649b 
> waiting for monitor entry [0x7ff78b8ee000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.handleNotification(WrappingCompactionStrategy.java:237)
>   - waiting to lock <0x0006fcdbbf60> (a 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy)
>   at org.apache.cassandra.db.DataTracker.notifyAdded(DataTracker.java:518)
>   at 
> org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:178)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1475)
>   at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "CompactionExecutor:14" daemon prio=10 tid=0x7ff7ad359800 nid=0x4d59 
> runnable [0x7fecce3ea000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.io.sstable.SSTableReader.equals(SSTableReader.java:628)
>   at 
> com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:206)
>   at 
> com.google.common.collect.ImmutableSet.construct(ImmutableSet.java:220)
>   at 
> com.google.common.collect.ImmutableSet.access$000(ImmutableSet.java:74)
>   at 
> com.google.common.collect.ImmutableSet$Builder.build(ImmutableSet.java:531)
>   at com.goo

[jira] [Updated] (CASSANDRA-9623) Added column does not sort as the last column

2015-08-20 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9623:
---
Fix Version/s: (was: 2.0.x)

> Added column does not sort as the last column
> -
>
> Key: CASSANDRA-9623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9623
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcin Pietraszek
>Assignee: Marcus Eriksson
> Attachments: cassandra_log.txt
>
>
> After adding new machines to existing cluster running cleanup one of the 
> tables ends with:
> {noformat}
> ERROR [CompactionExecutor:1015] 2015-06-19 11:24:05,038 CassandraDaemon.java 
> (line 199) Exception in thread Thread[CompactionExecutor:1015,1,main]
> java.lang.AssertionError: Added column does not sort as the last column
> at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:116)
> at 
> org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:121)
> at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:155)
> at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:186)
> at 
> org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:98)
> at 
> org.apache.cassandra.db.compaction.PrecompactedRow.(PrecompactedRow.java:85)
> at 
> org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
> at 
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
> at 
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:161)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We're using patched 2.0.13-190ef4f



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-9623) Added column does not sort as the last column

2015-08-20 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson resolved CASSANDRA-9623.

Resolution: Duplicate

Looks like the exceptions are not happening during cleanup; they are happening 
during regular compaction. We have fixed a few issues regarding overlap with 
LCS since 2.0.13, so I would recommend upgrading to the latest 2.0 to make sure 
this is actually a new bug.

If it keeps happening, please reopen this ticket.

> Added column does not sort as the last column
> -
>
> Key: CASSANDRA-9623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9623
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcin Pietraszek
>Assignee: Marcus Eriksson
> Fix For: 2.0.x
>
> Attachments: cassandra_log.txt
>
>
> After adding new machines to existing cluster running cleanup one of the 
> tables ends with:
> {noformat}
> ERROR [CompactionExecutor:1015] 2015-06-19 11:24:05,038 CassandraDaemon.java 
> (line 199) Exception in thread Thread[CompactionExecutor:1015,1,main]
> java.lang.AssertionError: Added column does not sort as the last column
> at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:116)
> at 
> org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:121)
> at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:155)
> at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:186)
> at 
> org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:98)
> at 
> org.apache.cassandra.db.compaction.PrecompactedRow.(PrecompactedRow.java:85)
> at 
> org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
> at 
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
> at 
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:161)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We're using patched 2.0.13-190ef4f



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8671) Give compaction strategy more control over where sstables are created, including for flushing and streaming.

2015-08-20 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704729#comment-14704729
 ] 

Marcus Eriksson commented on CASSANDRA-8671:


This is basically ready to commit now, just two comments:

* A bit worried about the CQLSSTableWriter change to create the cfs if it does 
not exist - could we instead create a default SSTableMultiWriter for the 
SSTableTxnWriter and avoid creating the CFS instance? (Something like this: 
https://github.com/krummas/cassandra/commit/acb133e99d464aba73f14a405e9ca7115fd24500)
* setInitialDirectories in ColumnFamilyStore is unused - is it needed?

> Give compaction strategy more control over where sstables are created, 
> including for flushing and streaming.
> 
>
> Key: CASSANDRA-8671
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8671
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
> Fix For: 3.x
>
> Attachments: 
> 0001-C8671-creating-sstable-writers-for-flush-and-stream-.patch, 
> 8671-giving-compaction-strategies-more-control-over.txt
>
>
> This would enable routing different partitions to different disks based on 
> some user defined parameters.
> My initial take on how to do this would be to make an interface from 
> SSTableWriter, and have a table's compaction strategy do all SSTableWriter 
> instantiation. Compaction strategies could then implement their own 
> SSTableWriter implementations (which basically wrap one or more normal 
> sstablewriters) for compaction, flushing, and streaming. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8199) CQL Spec needs to be updated with DateTieredCompactionStrategy

2015-08-20 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-8199:
---
Reviewer: Yuki Morishita  (was: Michaël Figuière)

> CQL Spec needs to be updated with DateTieredCompactionStrategy
> --
>
> Key: CASSANDRA-8199
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8199
> Project: Cassandra
>  Issue Type: Task
>Reporter: Michaël Figuière
>Assignee: Marcus Eriksson
>Priority: Minor
>  Labels: dtcs
> Attachments: 0001-update-docs.patch
>
>
> The {{CREATE TABLE}} section of the CQL Specification isn't up to date for 
> the latest {{DateTieredCompactionStrategy}} that has been added in 2.0.11 and 
> 2.1.1. We need to cover all its options just like it's done for the other 
> strategies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9142) DC Local repair or -hosts should only be allowed with -full repair

2015-08-20 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704766#comment-14704766
 ] 

Marcus Eriksson commented on CASSANDRA-9142:


Ping on this, [~kohlisankalp] - do you have time to review, or should I find 
someone else?

> DC Local repair or -hosts should only be allowed with -full repair
> --
>
> Key: CASSANDRA-9142
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9142
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: sankalp kohli
>Assignee: Marcus Eriksson
>Priority: Minor
> Fix For: 2.2.x
>
> Attachments: trunk_9142.txt
>
>
> We should not let users mix incremental repair with dc local repair or -host 
> or any repair which does not include all replicas. 
> This will currently cause stables on some replicas to be marked as repaired. 
> The next incremental repair will not work on same set of data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows

2015-08-20 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704767#comment-14704767
 ] 

Marcus Eriksson commented on CASSANDRA-9045:


Ping [~r0mant] - any updates? Is this still happening?

> Deleted columns are resurrected after repair in wide rows
> -
>
> Key: CASSANDRA-9045
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Roman Tkachenko
>Assignee: Marcus Eriksson
>Priority: Critical
> Fix For: 2.0.x
>
> Attachments: 9045-debug-tracing.txt, another.txt, 
> apache-cassandra-2.0.13-SNAPSHOT.jar, cqlsh.txt, debug.txt, inconsistency.txt
>
>
> Hey guys,
> After almost a week of researching the issue and trying out multiple things 
> with (almost) no luck I was suggested (on the user@cass list) to file a 
> report here.
> h5. Setup
> Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
> it goes away)
> Multi datacenter 12+6 nodes cluster.
> h5. Schema
> {code}
> cqlsh> describe keyspace blackbook;
> CREATE KEYSPACE blackbook WITH replication = {
>   'class': 'NetworkTopologyStrategy',
>   'IAD': '3',
>   'ORD': '3'
> };
> USE blackbook;
> CREATE TABLE bounces (
>   domainid text,
>   address text,
>   message text,
>   "timestamp" bigint,
>   PRIMARY KEY (domainid, address)
> ) WITH
>   bloom_filter_fp_chance=0.10 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.10 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.00 AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'LeveledCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> {code}
> h5. Use case
> Each row (defined by a domainid) can have many many columns (bounce entries) 
> so rows can get pretty wide. In practice, most of the rows are not that big 
> but some of them contain hundreds of thousands and even millions of columns.
> Columns are not TTL'ed but can be deleted using the following CQL3 statement:
> {code}
> delete from bounces where domainid = 'domain.com' and address = 
> 'al...@example.com';
> {code}
> All queries are performed using LOCAL_QUORUM CL.
> h5. Problem
> We weren't very diligent about running repairs on the cluster initially, but 
> shortly after we started doing it we noticed that some of the previously 
> deleted columns (bounce entries) are there again, as if tombstones have 
> disappeared.
> I have run this test multiple times via cqlsh, on the row of the customer who 
> originally reported the issue:
> * delete an entry
> * verify it's not returned even with CL=ALL
> * run repair on nodes that own this row's key
> * the columns reappear and are returned even with CL=ALL
> I tried the same test on another row with much less data and everything was 
> correctly deleted and didn't reappear after repair.
> h5. Other steps I've taken so far
> Made sure NTP is running on all servers and clocks are synchronized.
> Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
> keyspace) on all nodes, then changed it back to the default 10 days again. 
> Didn't help.
> Performed one more test. Updated one of the resurrected columns, then deleted 
> it and ran repair again. This time the updated version of the column 
> reappeared.
> Finally, I noticed these log entries for the row in question:
> {code}
> INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
> CompactionController.java (line 192) Compacting large row 
> blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
> {code}
> Figuring it may be related, I bumped "in_memory_compaction_limit_in_mb" to 
> 512MB so the row fits into it, deleted the entry and ran repair once again. 
> The log entry for this row was gone and the columns didn't reappear.
> We have a lot of rows much larger than 512MB, so we can't increase this 
> parameter forever, if that is the issue.
> Please let me know if you need more information on the case or if I can run 
> more experiments.
> Thanks!
> Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10138) Millions of compaction tasks on empty DB

2015-08-20 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704808#comment-14704808
 ] 

Marcus Eriksson commented on CASSANDRA-10138:
-

the many pending compaction tasks were fixed in CASSANDRA-9662

The other issues I cannot explain though, could you attach logs and more 
details about your nodes?

> Millions of compaction tasks on empty DB
> 
>
> Key: CASSANDRA-10138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10138
> Project: Cassandra
>  Issue Type: Bug
> Environment: CentOS 6.5 and Cassandra 2.1.8
>Reporter: A Markov
>
> A fresh installation of Cassandra 2.1.8 with no data in the database except 
> system tables becomes unresponsive about 5-10 minutes after start.
> Initially the problem was discovered on an empty cluster of 12 nodes because 
> of a schema creation error - the script was exiting by timeout with an error. 
> Analysis of the log files showed that nodes were constantly reported as DOWN 
> and then, after some period of time, UP. That was reported for multiple nodes.
> The system.log file showed that nodes constantly perform GC, and while doing 
> that all cores of the system were 100% busy, which caused nodes to disconnect 
> after some time.
> Further analysis with nodetool (tpstats option) showed us that just 10 
> minutes after a clean node restart, the node had completed more than 47M 
> compaction tasks and had more than 12M pending. Here is an example of the 
> output:
> nodetool tpstats
> Pool Name                    Active   Pending  Completed   Blocked  All time blocked
> CounterMutationStage              0         0          0         0                 0
> ReadStage                         0         0          0         0                 0
> RequestResponseStage              0         0          0         0                 0
> MutationStage                     0         0        257         0                 0
> ReadRepairStage                   0         0          0         0                 0
> GossipStage                       0         0          0         0                 0
> CacheCleanupExecutor              0         0          0         0                 0
> MigrationStage                    0         0          0         0                 0
> ValidationExecutor                0         0          0         0                 0
> Sampler                           0         0          0         0                 0
> MemtableReclaimMemory             0         0          8         0                 0
> InternalResponseStage             0         0          0         0                 0
> AntiEntropyStage                  0         0          0         0                 0
> MiscStage                         0         0          0         0                 0
> CommitLogArchiver                 0         0          0         0                 0
> MemtableFlushWriter               0         0          8         0                 0
> PendingRangeCalculator            0         0          1         0                 0
> MemtablePostFlush                 0         0         44         0                 0
> CompactionExecutor                0  12996398   47578625         0                 0
> AntiEntropySessions               0         0          0         0                 0
> HintedHandoff                     0         1          2         0                 0
> I am repeating myself, but that was on a TOTALLY EMPTY DB 10 minutes after 
> Cassandra was started.
> I was able to repeatedly reproduce the same issue and behaviour with a single 
> Cassandra instance. The issue persisted after I did a full Cassandra wipe 
> and reinstall from the repository.
> I discovered that the issue dissipates if I execute
> nodetool disableautocompaction
> in that case the system quickly (in a matter of 20-30 seconds) goes through 
> all pending tasks and becomes idle. If I enable autocompaction again, in about 
> 1 minute it jumps to millions of pending tasks again.
> I verified it on the same server with Cassandra 2.1.6 and the issue was not 
> present.
> Log files do not show any ERROR messages. There were only warnings about GC 
> events that were taking too long.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9623) Added column does not sort as the last column

2015-08-20 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704821#comment-14704821
 ] 

Marcus Eriksson commented on CASSANDRA-9623:


probably not, so either this is a duplicate of some LCS ticket or CASSANDRA-9450

> Added column does not sort as the last column
> -
>
> Key: CASSANDRA-9623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9623
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcin Pietraszek
>Assignee: Marcus Eriksson
> Attachments: cassandra_log.txt
>
>
> After adding new machines to an existing cluster and running cleanup, one of 
> the tables ends up with:
> {noformat}
> ERROR [CompactionExecutor:1015] 2015-06-19 11:24:05,038 CassandraDaemon.java 
> (line 199) Exception in thread Thread[CompactionExecutor:1015,1,main]
> java.lang.AssertionError: Added column does not sort as the last column
> at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.addColumn(ArrayBackedSortedColumns.java:116)
> at 
> org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:121)
> at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:155)
> at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:186)
> at 
> org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:98)
> at 
> org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:85)
> at 
> org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:196)
> at 
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:74)
> at 
> org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:55)
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
> at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
> at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
> at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:161)
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> We're using patched 2.0.13-190ef4f



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9126) java.lang.RuntimeException: Last written key DecoratedKey >= current key DecoratedKey

2015-08-22 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708010#comment-14708010
 ] 

Marcus Eriksson commented on CASSANDRA-9126:


[~saprykin] could you post full logs (i.e., from restart until the exception 
occurs)? A nodetool cfstats for the affected column family would also be good

[~dkblinux98] in 2.0.9 it is most likely an LCS overlap issue - upgrade to the 
latest 2.0.x to fix it

> java.lang.RuntimeException: Last written key DecoratedKey >= current key 
> DecoratedKey
> -
>
> Key: CASSANDRA-9126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: srinivasu gottipati
>Priority: Critical
> Fix For: 2.0.x
>
>
> Cassandra version: 2.0.14
> Getting the following exceptions while trying to compact (I see this issue 
> was raised in earlier versions and marked as closed; however, it still appears 
> in 2.0.14). In our case, compaction is not succeeding and keeps failing 
> with this error:
> {code}java.lang.RuntimeException: Last written key 
> DecoratedKey(3462767860784856708, 
> 354038323137333038305f3330325f31355f474d4543454f) >= current key 
> DecoratedKey(3462334604624154281, 
> 354036333036353334315f3336315f31355f474d4543454f) writing into {code}
> ...
> Stacktrace:{code}
>   at 
> org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
>   at 
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:167)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> Any help is greatly appreciated



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9126) java.lang.RuntimeException: Last written key DecoratedKey >= current key DecoratedKey

2015-08-22 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708016#comment-14708016
 ] 

Marcus Eriksson commented on CASSANDRA-9126:


[~saprykin] could you post your schema as well?

> java.lang.RuntimeException: Last written key DecoratedKey >= current key 
> DecoratedKey
> -
>
> Key: CASSANDRA-9126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: srinivasu gottipati
>Priority: Critical
> Fix For: 2.0.x
>
>
> Cassandra version: 2.0.14
> Getting the following exceptions while trying to compact (I see this issue 
> was raised in earlier versions and marked as closed; however, it still appears 
> in 2.0.14). In our case, compaction is not succeeding and keeps failing 
> with this error:
> {code}java.lang.RuntimeException: Last written key 
> DecoratedKey(3462767860784856708, 
> 354038323137333038305f3330325f31355f474d4543454f) >= current key 
> DecoratedKey(3462334604624154281, 
> 354036333036353334315f3336315f31355f474d4543454f) writing into {code}
> ...
> Stacktrace:{code}
>   at 
> org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
>   at 
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:167)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> Any help is greatly appreciated



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9126) java.lang.RuntimeException: Last written key DecoratedKey >= current key DecoratedKey

2015-08-22 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708020#comment-14708020
 ] 

Marcus Eriksson commented on CASSANDRA-9126:


that log makes no sense - did you remove the INFO lines?

> java.lang.RuntimeException: Last written key DecoratedKey >= current key 
> DecoratedKey
> -
>
> Key: CASSANDRA-9126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: srinivasu gottipati
>Priority: Critical
> Fix For: 2.0.x
>
> Attachments: cassandra-system.log
>
>
> Cassandra version: 2.0.14
> Getting the following exceptions while trying to compact (I see this issue 
> was raised in earlier versions and marked as closed; however, it still appears 
> in 2.0.14). In our case, compaction is not succeeding and keeps failing 
> with this error:
> {code}java.lang.RuntimeException: Last written key 
> DecoratedKey(3462767860784856708, 
> 354038323137333038305f3330325f31355f474d4543454f) >= current key 
> DecoratedKey(3462334604624154281, 
> 354036333036353334315f3336315f31355f474d4543454f) writing into {code}
> ...
> Stacktrace:{code}
>   at 
> org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
>   at 
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:167)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> Any help is greatly appreciated



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9126) java.lang.RuntimeException: Last written key DecoratedKey >= current key DecoratedKey

2015-08-22 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708025#comment-14708025
 ] 

Marcus Eriksson commented on CASSANDRA-9126:


yes, that would be helpful

> java.lang.RuntimeException: Last written key DecoratedKey >= current key 
> DecoratedKey
> -
>
> Key: CASSANDRA-9126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: srinivasu gottipati
>Priority: Critical
> Fix For: 2.0.x
>
> Attachments: cassandra-system.log
>
>
> Cassandra version: 2.0.14
> Getting the following exceptions while trying to compact (I see this issue 
> was raised in earlier versions and marked as closed; however, it still appears 
> in 2.0.14). In our case, compaction is not succeeding and keeps failing 
> with this error:
> {code}java.lang.RuntimeException: Last written key 
> DecoratedKey(3462767860784856708, 
> 354038323137333038305f3330325f31355f474d4543454f) >= current key 
> DecoratedKey(3462334604624154281, 
> 354036333036353334315f3336315f31355f474d4543454f) writing into {code}
> ...
> Stacktrace:{code}
>   at 
> org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
>   at 
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:167)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> Any help is greatly appreciated



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10132) sstablerepairedset throws exception while loading metadata

2015-08-24 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14709227#comment-14709227
 ] 

Marcus Eriksson commented on CASSANDRA-10132:
-

+1

> sstablerepairedset throws exception while loading metadata
> --
>
> Key: CASSANDRA-10132
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10132
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
> Fix For: 3.0.0 rc1
>
>
> {{sstablerepairedset}} displays exception trying to load schema through 
> DatabaseDescriptor.
> {code}
> $ ./tools/bin/sstablerepairedset --really-set --is-repaired 
> ~/.ccm/3.0/node1/data/keyspace1/standard1-2c0b226046aa11e596f58106a0d438e8/ma-1-big-Data.db
> 14:42:36.714 [main] DEBUG o.a.c.i.s.m.MetadataSerializer - Mutating 
> /home/yuki/.ccm/3.0/node1/data/keyspace1/standard1-2c0b226046aa11e596f58106a0d438e8/ma-1-big-Statistics.db
>  to repairedAt time 1440013248000
> 14:42:36.721 [main] DEBUG o.a.c.i.s.m.MetadataSerializer - Load metadata for 
> /home/yuki/.ccm/3.0/node1/data/keyspace1/standard1-2c0b226046aa11e596f58106a0d438e8/ma-1-big
> Exception in thread "main" java.lang.ExceptionInInitializerError
> at 
> org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:123)
> at 
> org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:110)
> at 
> org.apache.cassandra.utils.memory.BufferPool.<clinit>(BufferPool.java:51)
> at 
> org.apache.cassandra.io.util.RandomAccessReader.allocateBuffer(RandomAccessReader.java:76)
> at 
> org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
> at 
> org.apache.cassandra.io.util.RandomAccessReader$RandomAccessReaderWithChannel.<init>(RandomAccessReader.java:89)
> at 
> org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:108)
> at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:91)
> at 
> org.apache.cassandra.io.sstable.metadata.MetadataSerializer.mutateRepairedAt(MetadataSerializer.java:143)
> at 
> org.apache.cassandra.tools.SSTableRepairedAtSetter.main(SSTableRepairedAtSetter.java:86)
> Caused by: org.apache.cassandra.exceptions.ConfigurationException: Expecting 
> URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the 
> file with [file:///] for local files and [file://<server>/] for remote files. 
> If you are executing this from an external tool, it needs to set 
> Config.setClientMode(true) to avoid loading configuration.
> at 
> org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:78)
> at 
> org.apache.cassandra.config.YamlConfigurationLoader.<clinit>(YamlConfigurationLoader.java:92)
> ... 10 more
> {code}
> MetadataSerializer uses RandomAccessReader, which allocates its buffer through 
> BufferPool. BufferPool gets its settings from DatabaseDescriptor, and that 
> won't work in an offline tool.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-9577) Cassandra not performing GC on stale SStables after compaction

2015-08-25 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson resolved CASSANDRA-9577.

Resolution: Fixed

we are closing out 2.0; if this is still an issue in 2.1+, please reopen

> Cassandra not performing GC on stale SStables after compaction
> --
>
> Key: CASSANDRA-9577
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9577
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: 2.0.12.200 / DSE 4.6.1.
>Reporter: Jeff Ferland
>Assignee: Marcus Eriksson
>
>   Space used (live), bytes:   878681716067
>   Space used (total), bytes: 2227857083852
> jbf@ip-10-0-2-98:/ebs/cassandra/data/trends/trends$ sudo lsof *-Data.db 
> COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
> java4473 cassandra  446r   REG   0,26  17582559172 39241 
> trends-trends-jb-144864-Data.db
> java4473 cassandra  448r   REG   0,26 62040962 37431 
> trends-trends-jb-144731-Data.db
> java4473 cassandra  449r   REG   0,26 829935047545 21150 
> trends-trends-jb-143581-Data.db
> java4473 cassandra  452r   REG   0,26  8980406 39503 
> trends-trends-jb-144882-Data.db
> java4473 cassandra  454r   REG   0,26  8980406 39503 
> trends-trends-jb-144882-Data.db
> java4473 cassandra  462r   REG   0,26  9487703 39542 
> trends-trends-jb-144883-Data.db
> java4473 cassandra  463r   REG   0,26 36158226 39629 
> trends-trends-jb-144889-Data.db
> java4473 cassandra  468r   REG   0,26105693505 39447 
> trends-trends-jb-144881-Data.db
> java4473 cassandra  530r   REG   0,26  17582559172 39241 
> trends-trends-jb-144864-Data.db
> java4473 cassandra  535r   REG   0,26105693505 39447 
> trends-trends-jb-144881-Data.db
> java4473 cassandra  542r   REG   0,26  9487703 39542 
> trends-trends-jb-144883-Data.db
> java4473 cassandra  553u   REG   0,26   6431729821 39556 
> trends-trends-tmp-jb-144884-Data.db
> jbf@ip-10-0-2-98:/ebs/cassandra/data/trends/trends$ ls *-Data.db
> trends-trends-jb-142631-Data.db  trends-trends-jb-143562-Data.db  
> trends-trends-jb-143581-Data.db  trends-trends-jb-144731-Data.db  
> trends-trends-jb-144883-Data.db
> trends-trends-jb-142633-Data.db  trends-trends-jb-143563-Data.db  
> trends-trends-jb-144530-Data.db  trends-trends-jb-144864-Data.db  
> trends-trends-jb-144889-Data.db
> trends-trends-jb-143026-Data.db  trends-trends-jb-143564-Data.db  
> trends-trends-jb-144551-Data.db  trends-trends-jb-144881-Data.db  
> trends-trends-tmp-jb-144884-Data.db
> trends-trends-jb-143533-Data.db  trends-trends-jb-143578-Data.db  
> trends-trends-jb-144552-Data.db  trends-trends-jb-144882-Data.db
> jbf@ip-10-0-2-98:/ebs/cassandra/data/trends/trends$ cd -
> /mnt/cassandra/data/trends/trends
> jbf@ip-10-0-2-98:/mnt/cassandra/data/trends/trends$ sudo lsof * 
> jbf@ip-10-0-2-98:/mnt/cassandra/data/trends/trends$ ls *-Data.db
> trends-trends-jb-124502-Data.db  trends-trends-jb-141113-Data.db  
> trends-trends-jb-141377-Data.db  trends-trends-jb-141846-Data.db  
> trends-trends-jb-144890-Data.db
> trends-trends-jb-125457-Data.db  trends-trends-jb-141123-Data.db  
> trends-trends-jb-141391-Data.db  trends-trends-jb-141871-Data.db  
> trends-trends-jb-41121-Data.db
> trends-trends-jb-130016-Data.db  trends-trends-jb-141137-Data.db  
> trends-trends-jb-141538-Data.db  trends-trends-jb-141883-Data.db  
> trends-trends.trends_date_idx-jb-2100-Data.db
> trends-trends-jb-139563-Data.db  trends-trends-jb-141358-Data.db  
> trends-trends-jb-141806-Data.db  trends-trends-jb-142033-Data.db
> trends-trends-jb-141102-Data.db  trends-trends-jb-141363-Data.db  
> trends-trends-jb-141829-Data.db  trends-trends-jb-144553-Data.db
> Compaction started  INFO [CompactionExecutor:6661] 2015-06-05 14:02:36,515 
> CompactionTask.java (line 120) Compacting 
> [SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-124502-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141358-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141883-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141846-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141871-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141391-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-139563-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-125457-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trends-trends-jb-141806-Data.db'),
>  
> SSTableReader(path='/mnt/cassandra/data/trends/trends/trend

[jira] [Commented] (CASSANDRA-10057) RepairMessageVerbHandler.java:95 - Cannot start multiple repair sessions over the same sstables

2015-08-25 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711136#comment-14711136
 ] 

Marcus Eriksson commented on CASSANDRA-10057:
-

Can't we send the failure response in the existing catch block? That would make 
sure we fail properly during all repair messages

> RepairMessageVerbHandler.java:95 - Cannot start multiple repair sessions over 
> the same sstables
> ---
>
> Key: CASSANDRA-10057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10057
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Amazon Linux: 3.14.48-33.39.amzn1.x86_64
> java version "1.7.0_85"
> OpenJDK Runtime Environment (amzn-2.6.1.3.61.amzn1-x86_64 u85-b01)
> OpenJDK 64-Bit Server VM (build 24.85-b03, mixed mode)
> Cassandra RPM: cassandra22-2.2.0-1.noarch
>Reporter: Victor Trac
>Assignee: Yuki Morishita
> Fix For: 2.2.x
>
>
> I bootstrapped a DC2 by restoring the snapshots from DC1 into equivalent 
> nodes in DC2. Everything comes up just fine, but when I tried to run a 
> {code}repair -dcpar -j4{code} in DC2, I got this error:
> {code}
> [root@cassandra-i-2677cce3 ~]# nodetool repair -dcpar -j4
> [2015-08-12 15:56:05,682] Nothing to repair for keyspace 'system_auth'
> [2015-08-12 15:56:05,949] Starting repair command #4, repairing keyspace 
> crawl with repair options (parallelism: dc_parallel, primary range: false, 
> incremental: true, job threads: 4, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 2275)
> [2015-08-12 15:59:33,050] Repair session 1b8d7810-410b-11e5-b71c-71288cf05b1d 
> for range (-1630840392403060839,-1622173360499444177] finished (progress: 0%)
> [2015-08-12 15:59:33,284] Repair session 1b92a830-410b-11e5-b71c-71288cf05b1d 
> for range (-2766833977081486018,-2766120936176524808] failed with error Could 
> not create snapshot at /10.20.144.15 (progress: 0%)
> [2015-08-12 15:59:35,543] Repair session 1b8fe910-410b-11e5-b71c-71288cf05b1d 
> for range (5127720400742928658,5138864412691114632] finished (progress: 0%)
> [2015-08-12 15:59:36,040] Repair session 1b960390-410b-11e5-b71c-71288cf05b1d 
> for range (749871306972906628,751065038788146229] failed with error Could not 
> create snapshot at /10.20.144.15 (progress: 0%)
> [2015-08-12 15:59:36,454] Repair session 1b9455e0-410b-11e5-b71c-71288cf05b1d 
> for range (-8769666365699147423,-8767955202550789015] finished (progress: 0%)
> [2015-08-12 15:59:38,765] Repair session 1b97b140-410b-11e5-b71c-71288cf05b1d 
> for range (-4434580467371714601,-4433394767535421669] finished (progress: 0%)
> [2015-08-12 15:59:41,520] Repair session 1b99d420-410b-11e5-b71c-71288cf05b1d 
> for range (-1085112943862424751,-1083156277882030877] finished (progress: 0%)
> [2015-08-12 15:59:43,806] Repair session 1b9da4b0-410b-11e5-b71c-71288cf05b1d 
> for range (2125359121242932804,2126816999370470831] failed with error Could 
> not create snapshot at /10.20.144.15 (progress: 0%)
> [2015-08-12 15:59:43,874] Repair session 1b9ba8e0-410b-11e5-b71c-71288cf05b1d 
> for range (-7469857353178912795,-7459624955099554284] finished (progress: 0%)
> [2015-08-12 15:59:48,384] Repair session 1b9fa080-410b-11e5-b71c-71288cf05b1d 
> for range (-8005238987831093686,-8005057803798566519] finished (progress: 0%)
> [2015-08-12 15:59:48,392] Repair session 1ba17540-410b-11e5-b71c-71288cf05b1d 
> for range (7291056720707652994,7292508243124389877] failed with error Could 
> not create snapshot at /10.20.144.15 (progress: 0%)
> {code}
> It seems like now that all 4 threads ran into an error, the repair process 
> just sits forever.
> Looking at 10.20.144.15, I see this:
> {code}
> ERROR [AntiEntropyStage:2] 2015-08-12 15:59:35,965 
> RepairMessageVerbHandler.java:95 - Cannot start multiple repair sessions over 
> the same sstables
> ERROR [AntiEntropyStage:2] 2015-08-12 15:59:35,966 
> RepairMessageVerbHandler.java:153 - Got error, removing parent repair session
> ERROR [AntiEntropyStage:2] 2015-08-12 15:59:35,966 CassandraDaemon.java:182 - 
> Exception in thread Thread[AntiEntropyStage:2,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Cannot start multiple 
> repair sessions over the same sstables
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:156)
>  ~[apache-cassandra-2.2.0.jar:2.2.0]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) 
> ~[apache-cassandra-2.2.0.jar:2.2.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_85]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_85]
> at java

[jira] [Commented] (CASSANDRA-10057) RepairMessageVerbHandler.java:95 - Cannot start multiple repair sessions over the same sstables

2015-08-31 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723092#comment-14723092
 ] 

Marcus Eriksson commented on CASSANDRA-10057:
-

ok, +1 on the current patch then

> RepairMessageVerbHandler.java:95 - Cannot start multiple repair sessions over 
> the same sstables
> ---
>
> Key: CASSANDRA-10057
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10057
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Amazon Linux: 3.14.48-33.39.amzn1.x86_64
> java version "1.7.0_85"
> OpenJDK Runtime Environment (amzn-2.6.1.3.61.amzn1-x86_64 u85-b01)
> OpenJDK 64-Bit Server VM (build 24.85-b03, mixed mode)
> Cassandra RPM: cassandra22-2.2.0-1.noarch
>Reporter: Victor Trac
>Assignee: Yuki Morishita
> Fix For: 2.2.x
>
>
> I bootstrapped a DC2 by restoring the snapshots from DC1 into equivalent 
> nodes in DC2. Everything comes up just fine, but when I tried to run a 
> {code}repair -dcpar -j4{code} in DC2, I got this error:
> {code}
> [root@cassandra-i-2677cce3 ~]# nodetool repair -dcpar -j4
> [2015-08-12 15:56:05,682] Nothing to repair for keyspace 'system_auth'
> [2015-08-12 15:56:05,949] Starting repair command #4, repairing keyspace 
> crawl with repair options (parallelism: dc_parallel, primary range: false, 
> incremental: true, job threads: 4, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 2275)
> [2015-08-12 15:59:33,050] Repair session 1b8d7810-410b-11e5-b71c-71288cf05b1d 
> for range (-1630840392403060839,-1622173360499444177] finished (progress: 0%)
> [2015-08-12 15:59:33,284] Repair session 1b92a830-410b-11e5-b71c-71288cf05b1d 
> for range (-2766833977081486018,-2766120936176524808] failed with error Could 
> not create snapshot at /10.20.144.15 (progress: 0%)
> [2015-08-12 15:59:35,543] Repair session 1b8fe910-410b-11e5-b71c-71288cf05b1d 
> for range (5127720400742928658,5138864412691114632] finished (progress: 0%)
> [2015-08-12 15:59:36,040] Repair session 1b960390-410b-11e5-b71c-71288cf05b1d 
> for range (749871306972906628,751065038788146229] failed with error Could not 
> create snapshot at /10.20.144.15 (progress: 0%)
> [2015-08-12 15:59:36,454] Repair session 1b9455e0-410b-11e5-b71c-71288cf05b1d 
> for range (-8769666365699147423,-8767955202550789015] finished (progress: 0%)
> [2015-08-12 15:59:38,765] Repair session 1b97b140-410b-11e5-b71c-71288cf05b1d 
> for range (-4434580467371714601,-4433394767535421669] finished (progress: 0%)
> [2015-08-12 15:59:41,520] Repair session 1b99d420-410b-11e5-b71c-71288cf05b1d 
> for range (-1085112943862424751,-1083156277882030877] finished (progress: 0%)
> [2015-08-12 15:59:43,806] Repair session 1b9da4b0-410b-11e5-b71c-71288cf05b1d 
> for range (2125359121242932804,2126816999370470831] failed with error Could 
> not create snapshot at /10.20.144.15 (progress: 0%)
> [2015-08-12 15:59:43,874] Repair session 1b9ba8e0-410b-11e5-b71c-71288cf05b1d 
> for range (-7469857353178912795,-7459624955099554284] finished (progress: 0%)
> [2015-08-12 15:59:48,384] Repair session 1b9fa080-410b-11e5-b71c-71288cf05b1d 
> for range (-8005238987831093686,-8005057803798566519] finished (progress: 0%)
> [2015-08-12 15:59:48,392] Repair session 1ba17540-410b-11e5-b71c-71288cf05b1d 
> for range (7291056720707652994,7292508243124389877] failed with error Could 
> not create snapshot at /10.20.144.15 (progress: 0%)
> {code}
> It seems like now that all 4 threads ran into an error, the repair process 
> just sits forever.
> Looking at 10.20.144.15, I see this:
> {code}
> ERROR [AntiEntropyStage:2] 2015-08-12 15:59:35,965 
> RepairMessageVerbHandler.java:95 - Cannot start multiple repair sessions over 
> the same sstables
> ERROR [AntiEntropyStage:2] 2015-08-12 15:59:35,966 
> RepairMessageVerbHandler.java:153 - Got error, removing parent repair session
> ERROR [AntiEntropyStage:2] 2015-08-12 15:59:35,966 CassandraDaemon.java:182 - 
> Exception in thread Thread[AntiEntropyStage:2,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Cannot start multiple 
> repair sessions over the same sstables
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:156)
>  ~[apache-cassandra-2.2.0.jar:2.2.0]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) 
> ~[apache-cassandra-2.2.0.jar:2.2.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_85]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_85]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_85]
> Caused by: java.lang.RuntimeException: Cannot 

[jira] [Commented] (CASSANDRA-10198) 3.0 hints should be streamed on decomission

2015-09-01 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725406#comment-14725406
 ] 

Marcus Eriksson commented on CASSANDRA-10198:
-

branch here: https://github.com/krummas/cassandra/commits/marcuse/10198

will mark Patch Available once CI has run

> 3.0 hints should be streamed on decomission
> ---
>
> Key: CASSANDRA-10198
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10198
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Aleksey Yeschenko
>Assignee: Marcus Eriksson
> Fix For: 3.0.0 rc1
>
>
> CASSANDRA-6230 added all the necessary pieces in the initial release, but 
> streaming itself didn't make it in time.
> Now that hints are stored in flat files, we cannot just stream hints 
> sstables. Instead we need to handoff hints files.
> Essentially we need to rewrite {{StorageService::streamHints}} to be 
> CASSANDRA-6230 aware.
> {{HintMessage}} and {{HintVerbHandler}} can already handle hints targeted for 
> other nodes (see javadoc for both, it's documented reasonably).
> {{HintsDispatcher}} also takes hostId as an argument, and can stream any 
> hints to any nodes.
> The building blocks are all there - we just need 
> {{StorageService::streamHints}} to pick the optimal candidate for each file 
> and use {{HintsDispatcher}} to stream the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10198) 3.0 hints should be streamed on decomission

2015-09-01 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-10198:

Reviewer: Aleksey Yeschenko

> 3.0 hints should be streamed on decomission
> ---
>
> Key: CASSANDRA-10198
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10198
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Aleksey Yeschenko
>Assignee: Marcus Eriksson
> Fix For: 3.0.0 rc1
>
>
> CASSANDRA-6230 added all the necessary pieces in the initial release, but 
> streaming itself didn't make it in time.
> Now that hints are stored in flat files, we cannot just stream hints 
> sstables. Instead we need to handoff hints files.
> Essentially we need to rewrite {{StorageService::streamHints}} to be 
> CASSANDRA-6230 aware.
> {{HintMessage}} and {{HintVerbHandler}} can already handle hints targeted for 
> other nodes (see javadoc for both, it's documented reasonably).
> {{HintsDispatcher}} also takes hostId as an argument, and can stream any 
> hints to any nodes.
> The building blocks are all there - we just need 
> {{StorageService::streamHints}} to pick the optimal candidate for each file 
> and use {{HintsDispatcher}} to stream the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10198) 3.0 hints should be streamed on decomission

2015-09-01 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725435#comment-14725435
 ] 

Marcus Eriksson edited comment on CASSANDRA-10198 at 9/1/15 2:02 PM:
-

a dtest is pushed 
[here|https://github.com/krummas/cassandra-dtest/commits/marcuse/10198] as 
well; it fails for other reasons sometimes during decom, I will look into that 
further


was (Author: krummas):
a dtest is pushed 
[here|https://github.com/krummas/cassandra-dtest/commits/marcuse/10198] as 
well; it fails sometimes during decom, I will look into that further

> 3.0 hints should be streamed on decomission
> ---
>
> Key: CASSANDRA-10198
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10198
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Aleksey Yeschenko
>Assignee: Marcus Eriksson
> Fix For: 3.0.0 rc1
>
>
> CASSANDRA-6230 added all the necessary pieces in the initial release, but 
> streaming itself didn't make it in time.
> Now that hints are stored in flat files, we cannot just stream hints 
> sstables. Instead we need to handoff hints files.
> Essentially we need to rewrite {{StorageService::streamHints}} to be 
> CASSANDRA-6230 aware.
> {{HintMessage}} and {{HintVerbHandler}} can already handle hints targeted for 
> other nodes (see javadoc for both, it's documented reasonably).
> {{HintsDispatcher}} also takes hostId as an argument, and can stream any 
> hints to any nodes.
> The building blocks are all there - we just need 
> {{StorageService::streamHints}} to pick the optimal candidate for each file 
> and use {{HintsDispatcher}} to stream the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10198) 3.0 hints should be streamed on decomission

2015-09-01 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725435#comment-14725435
 ] 

Marcus Eriksson commented on CASSANDRA-10198:
-

a dtest is pushed 
[here|https://github.com/krummas/cassandra-dtest/commits/marcuse/10198] as 
well; it fails sometimes during decom, I will look into that further

> 3.0 hints should be streamed on decomission
> ---
>
> Key: CASSANDRA-10198
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10198
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Aleksey Yeschenko
>Assignee: Marcus Eriksson
> Fix For: 3.0.0 rc1
>
>
> CASSANDRA-6230 added all the necessary pieces in the initial release, but 
> streaming itself didn't make it in time.
> Now that hints are stored in flat files, we cannot just stream hints 
> sstables. Instead we need to handoff hints files.
> Essentially we need to rewrite {{StorageService::streamHints}} to be 
> CASSANDRA-6230 aware.
> {{HintMessage}} and {{HintVerbHandler}} can already handle hints targeted for 
> other nodes (see javadoc for both, it's documented reasonably).
> {{HintsDispatcher}} also takes hostId as an argument, and can stream any 
> hints to any nodes.
> The building blocks are all there - we just need 
> {{StorageService::streamHints}} to pick the optimal candidate for each file 
> and use {{HintsDispatcher}} to stream the files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10218) Remove unnecessary use of streams in IndexTransactions

2015-09-01 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725456#comment-14725456
 ] 

Marcus Eriksson commented on CASSANDRA-10218:
-

+1

> Remove unnecessary use of streams in IndexTransactions
> --
>
> Key: CASSANDRA-10218
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10218
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
> Fix For: 3.0.0 rc1
>
>
> As noted by [~benedict]  in CASSANDRA-9459 
> ([link|https://issues.apache.org/jira/browse/CASSANDRA-9459?focusedCommentId=14708330&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14708330]),
> the three implementations of {{IndexTransaction}} in 
> {{SecondaryIndexManager}} all use streams unnecessarily and wastefully, which 
> we should fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10193) Improve Rows.diff and Rows.merge efficiency; downgrade Row.columns() to Collection

2015-09-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729052#comment-14729052
 ] 

Marcus Eriksson commented on CASSANDRA-10193:
-

bq. we should consider if we want an expiry date on the assertion
Agree - update the comment above it, and perhaps add a TODO item to remove it 
eventually

First read through, looks good, pushed a commit 
[here|https://github.com/krummas/cassandra/commits/bes/10193] that:
* removes unused imports
* removes a few unused parameters in MemtableAllocator#rowBuilder (note: if you 
foresee that we need those when implementing NativeAllocator, ignore this)
* removes unused Column parameter in a couple of places

Could you rebase and I'll have a second pass over this?

> Improve Rows.diff and Rows.merge efficiency; downgrade Row.columns() to 
> Collection
> 
>
> Key: CASSANDRA-10193
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10193
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0.0 rc1
>
>
> There's not really a lot of reason to store a Columns instance in each row. 
> Retaining it introduces extra costs on every row merge, in both consumed CPU 
> time and heap. 
> While working on CASSANDRA-10045 it became apparent this would be very easy 
> to remove, however to avoid scope creep I have filed this as a follow up 
> ticket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10253) Incremental repairs not working as expected with DTCS

2015-09-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729057#comment-14729057
 ] 

Marcus Eriksson commented on CASSANDRA-10253:
-

Are you using vnodes? Was this working properly before you started using 
incremental backups?

> Incremental repairs not working as expected with DTCS
> -
>
> Key: CASSANDRA-10253
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10253
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Pre-prod
>Reporter: vijay
>Assignee: Marcus Eriksson
> Fix For: 2.1.x
>
> Attachments: systemfiles 2.zip
>
>
> Hi,
> We are ingesting 6 million records every 15 minutes into one DTCS table and 
> relying on Cassandra for purging the data. The table schema is given below.
> Issue 1: we expect that an sstable created on day d1 will not be compacted 
> after d1, however we are not seeing this; I do see some data being purged at 
> random intervals.
> Issue 2: when we run incremental repair using "nodetool repair keyspace table 
> -inc -pr", each sstable splits into multiple smaller sstables, increasing the 
> total storage. This behavior is the same when running repairs on any node and 
> any number of times.
> There are mutation drops in the cluster.
> Table:
> {code}
> CREATE TABLE TableA (
> F1 text,
> F2 int,
> createts bigint,
> stats blob,
> PRIMARY KEY ((F1,F2), createts)
> ) WITH CLUSTERING ORDER BY (createts DESC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '12', 'max_sstable_age_days': '1', 
> 'base_time_seconds': '50', 'class': 
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.0
> AND default_time_to_live = 93600
> AND gc_grace_seconds = 3600
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
> {code}
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10253) Incremental repairs not working as expected with DTCS

2015-09-03 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729196#comment-14729196
 ] 

Marcus Eriksson commented on CASSANDRA-10253:
-

could you post the tools/bin/sstablemetadata output for all sstables in the 
table you are running incremental repair on?
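
Something like this should collect it (a rough sketch - the data directory and 
the <keyspace>/<table> names below are placeholders, adjust them for your 
setup):
{code}
# Dump the metadata for every sstable in the table:
for f in /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db; do
    echo "=== $f"
    tools/bin/sstablemetadata "$f"
done
{code}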

> Incremental repairs not working as expected with DTCS
> -
>
> Key: CASSANDRA-10253
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10253
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Pre-prod
>Reporter: vijay
>Assignee: Marcus Eriksson
> Fix For: 2.1.x
>
> Attachments: systemfiles 2.zip
>
>
> Hi,
> We are ingesting 6 million records every 15 minutes into one DTCS table and 
> relying on Cassandra for purging the data. The table schema is given below.
> Issue 1: we expect that an sstable created on day d1 will not be compacted 
> after d1, however we are not seeing this; I do see some data being purged at 
> random intervals.
> Issue 2: when we run incremental repair using "nodetool repair keyspace table 
> -inc -pr", each sstable splits into multiple smaller sstables, increasing the 
> total storage. This behavior is the same when running repairs on any node and 
> any number of times.
> There are mutation drops in the cluster.
> Table:
> {code}
> CREATE TABLE TableA (
> F1 text,
> F2 int,
> createts bigint,
> stats blob,
> PRIMARY KEY ((F1,F2), createts)
> ) WITH CLUSTERING ORDER BY (createts DESC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '12', 'max_sstable_age_days': '1', 
> 'base_time_seconds': '50', 'class': 
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.0
> AND default_time_to_live = 93600
> AND gc_grace_seconds = 3600
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
> {code}
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10253) Incremental repairs not working as expected with DTCS

2015-09-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730412#comment-14730412
 ] 

Marcus Eriksson commented on CASSANDRA-10253:
-

The sstablemetadata output actually looks good from an incremental repair 
standpoint:
||repaired || unrepaired ||
|2737|471|
|3052|437|
|2796|450|
|2746|456|
|3273|317|
|2572|384|

This means that instead of repairing ~3000 sstables per node per repair 
session, you are only repairing ~500, so the impact of each repair is a lot 
smaller with incremental repair. There are general problems with vnodes and 
repair though (CASSANDRA-5220), and those problems are aggravated with DTCS 
(CASSANDRA-9644).
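
For reference, this is roughly how those counts can be produced (a minimal 
sketch, assuming the default data directory layout - adjust the 
<keyspace>/<table> placeholders; {{sstablemetadata}} prints "Repaired at: 0" 
for unrepaired sstables and a nonzero timestamp for repaired ones):
{code}
# Tally repaired vs unrepaired sstables from the "Repaired at" field:
for f in /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db; do
    tools/bin/sstablemetadata "$f" | grep "Repaired at"
done | awk '{ if ($3 == 0) unrep++; else rep++ }
            END { print "repaired:", rep+0, "unrepaired:", unrep+0 }'
{code}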

I have to say I don't understand what the problem is in your issue 1 above 
though, could you elaborate?

> Incremental repairs not working as expected with DTCS
> -
>
> Key: CASSANDRA-10253
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10253
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Pre-prod
>Reporter: vijay
>Assignee: Marcus Eriksson
> Fix For: 2.1.x
>
> Attachments: sstablemetadata-cluster-logs.zip, systemfiles 2.zip
>
>
> Hi,
> We are ingesting 6 million records every 15 minutes into one DTCS table and 
> relying on Cassandra for purging the data. The table schema is given below.
> Issue 1: we expect that an sstable created on day d1 will not be compacted 
> after d1, however we are not seeing this; I do see some data being purged at 
> random intervals.
> Issue 2: when we run incremental repair using "nodetool repair keyspace table 
> -inc -pr", each sstable splits into multiple smaller sstables, increasing the 
> total storage. This behavior is the same when running repairs on any node and 
> any number of times.
> There are mutation drops in the cluster.
> Table:
> {code}
> CREATE TABLE TableA (
> F1 text,
> F2 int,
> createts bigint,
> stats blob,
> PRIMARY KEY ((F1,F2), createts)
> ) WITH CLUSTERING ORDER BY (createts DESC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '12', 'max_sstable_age_days': '1', 
> 'base_time_seconds': '50', 'class': 
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.0
> AND default_time_to_live = 93600
> AND gc_grace_seconds = 3600
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
> {code}
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10265) Properly deserialize PREPARE_GLOBAL_MESSAGE and update NEWS.txt about repair defaults in 2.2

2015-09-04 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-10265:
---

 Summary: Properly deserialize PREPARE_GLOBAL_MESSAGE and update 
NEWS.txt about repair defaults in 2.2
 Key: CASSANDRA-10265
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10265
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
 Fix For: 2.2.x


We deserialize all PREPARE_MESSAGE as non-global, meaning we don't do any 
anticompaction after incremental repairs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10265) Properly deserialize PREPARE_GLOBAL_MESSAGE and update NEWS.txt about repair defaults in 2.2

2015-09-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730495#comment-14730495
 ] 

Marcus Eriksson commented on CASSANDRA-10265:
-

Patch [here|https://github.com/krummas/cassandra/commits/marcuse/10265]

> Properly deserialize PREPARE_GLOBAL_MESSAGE and update NEWS.txt about repair 
> defaults in 2.2
> 
>
> Key: CASSANDRA-10265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10265
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 2.2.x
>
>
> We deserialize all PREPARE_MESSAGE as non-global, meaning we don't do any 
> anticompaction after incremental repairs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10193) Improve Rows.diff and Rows.merge efficiency; downgrade Row.columns() to Collection

2015-09-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730716#comment-14730716
 ] 

Marcus Eriksson commented on CASSANDRA-10193:
-

code looks good, it would be nice with some explicit tests for {{Rows.merge}} 
and {{Rows.diff}} though - I'm sure we have test coverage of them, but it would 
be nice to have direct tests here

> Improve Rows.diff and Rows.merge efficiency; downgrade Row.columns() to 
> Collection
> 
>
> Key: CASSANDRA-10193
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10193
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0.0 rc1
>
>
> There's not really a lot of reason to store a Columns instance in each row. 
> Retaining it introduces extra costs on every row merge, in both consumed CPU 
> time and heap. 
> While working on CASSANDRA-10045 it became apparent this would be very easy 
> to remove, however to avoid scope creep I have filed this as a follow up 
> ticket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10193) Improve Rows.diff and Rows.merge efficiency; downgrade Row.columns() to Collection

2015-09-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730722#comment-14730722
 ] 

Marcus Eriksson commented on CASSANDRA-10193:
-

OK, +1

> Improve Rows.diff and Rows.merge efficiency; downgrade Row.columns() to 
> Collection
> 
>
> Key: CASSANDRA-10193
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10193
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 3.0.0 rc1
>
>
> There's not really a lot of reason to store a Columns instance in each row. 
> Retaining it introduces extra costs on every row merge, in both consumed CPU 
> time and heap. 
> While working on CASSANDRA-10045 it became apparent this would be very easy 
> to remove, however to avoid scope creep I have filed this as a follow up 
> ticket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10265) Properly deserialize PREPARE_GLOBAL_MESSAGE and update NEWS.txt about repair defaults in 2.2

2015-09-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730924#comment-14730924
 ] 

Marcus Eriksson commented on CASSANDRA-10265:
-

I have no idea how 
[this|https://github.com/riptano/cassandra-dtest/blob/master/incremental_repair_test.py#L26]
 is not catching it, but I'll have a look

> Properly deserialize PREPARE_GLOBAL_MESSAGE and update NEWS.txt about repair 
> defaults in 2.2
> 
>
> Key: CASSANDRA-10265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10265
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 2.2.x
>
>
> We deserialize all PREPARE_MESSAGE as non-global, meaning we don't do any 
> anticompaction after incremental repairs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10265) Properly deserialize PREPARE_GLOBAL_MESSAGE and update NEWS.txt about repair defaults in 2.2

2015-09-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730938#comment-14730938
 ] 

Marcus Eriksson commented on CASSANDRA-10265:
-

test was broken, fix pushed here: 
https://github.com/krummas/cassandra-dtest/commit/cc4bc8ea14cb3563cf1613eb990673d2622cac26

> Properly deserialize PREPARE_GLOBAL_MESSAGE and update NEWS.txt about repair 
> defaults in 2.2
> 
>
> Key: CASSANDRA-10265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10265
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 2.2.x
>
>
> We deserialize all PREPARE_MESSAGE as non-global, meaning we don't do any 
> anticompaction after incremental repairs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10268) Improve incremental repair tests

2015-09-04 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-10268:
---

 Summary: Improve incremental repair tests
 Key: CASSANDRA-10268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10268
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson


Incremental repairs were broken for a while due to CASSANDRA-10265 - and none 
of the tests in incremental_repair_tests.py caught that. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10265) Properly deserialize PREPARE_GLOBAL_MESSAGE and update NEWS.txt about repair defaults in 2.2

2015-09-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730961#comment-14730961
 ] 

Marcus Eriksson commented on CASSANDRA-10265:
-

also filed CASSANDRA-10268

> Properly deserialize PREPARE_GLOBAL_MESSAGE and update NEWS.txt about repair 
> defaults in 2.2
> 
>
> Key: CASSANDRA-10265
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10265
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 2.2.x
>
>
> We deserialize all PREPARE_MESSAGE as non-global, meaning we don't do any 
> anticompaction after incremental repairs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10276) With DTCS, do STCS in windows if more than max_threshold sstables

2015-09-07 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-10276:

Fix Version/s: 2.2.x

> With DTCS, do STCS in windows if more than max_threshold sstables
> -
>
> Key: CASSANDRA-10276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10276
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> To avoid constant recompaction of files in DTCS windows, we should do STCS of 
> those files.
> Patch here: https://github.com/krummas/cassandra/commits/marcuse/dtcs_stcs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10276) With DTCS, do STCS in windows if more than max_threshold sstables

2015-09-07 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-10276:
---

 Summary: With DTCS, do STCS in windows if more than max_threshold 
sstables
 Key: CASSANDRA-10276
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10276
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson


To avoid constant recompaction of files in DTCS windows, we should do STCS of 
those files.

Patch here: https://github.com/krummas/cassandra/commits/marcuse/dtcs_stcs
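
For illustration, the within-window fallback can be sketched like this. Note
that picking the smallest files is a simplification of real STCS, which
buckets similarly-sized files; the point is only that an over-full window
gets a bounded compaction instead of being rewritten whole.

{code}
import java.util.*;

// Rough standalone sketch, not the linked patch: when one DTCS window holds
// more than maxThreshold sstables, compact the maxThreshold smallest ones
// instead of repeatedly recompacting the entire window.
public class WindowStcsFallback
{
    static List<Long> pickWithinWindow(List<Long> sstableSizes, int maxThreshold)
    {
        if (sstableSizes.size() <= maxThreshold)
            return new ArrayList<Long>(sstableSizes); // small window: take it all

        List<Long> sorted = new ArrayList<Long>(sstableSizes);
        Collections.sort(sorted);                     // smallest first
        return new ArrayList<Long>(sorted.subList(0, maxThreshold));
    }
}
{code}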



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10276) With DTCS, do STCS in windows if more than max_threshold sstables

2015-09-07 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-10276:

Description: 
To avoid constant recompaction of files in big ( > max threshold) DTCS windows, 
we should do STCS of those files.

Patch here: https://github.com/krummas/cassandra/commits/marcuse/dtcs_stcs

  was:
To avoid constant recompaction of files in DTCS windows, we should do STCS of 
those files.

Patch here: https://github.com/krummas/cassandra/commits/marcuse/dtcs_stcs


> With DTCS, do STCS in windows if more than max_threshold sstables
> -
>
> Key: CASSANDRA-10276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10276
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> To avoid constant recompaction of files in big ( > max threshold) DTCS 
> windows, we should do STCS of those files.
> Patch here: https://github.com/krummas/cassandra/commits/marcuse/dtcs_stcs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10280) Make DTCS work well with old data

2015-09-07 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-10280:
---

 Summary: Make DTCS work well with old data
 Key: CASSANDRA-10280
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10280
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
 Fix For: 3.x, 2.1.x, 2.2.x


Operational tasks become incredibly expensive if you keep around a long 
timespan of data with DTCS - with default settings and 1 year of data, the 
oldest window covers about 180 days. Bootstrapping a node with vnodes with this 
data layout will force cassandra to compact very many sstables in this window.

We should probably put a cap on how big the biggest windows can get. We could 
probably default this to something sane based on max_sstable_age (ie, say we 
can reasonably handle 1000 sstables per node, then we can make calculate how 
big the windows should be to allow that)
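
As a back-of-envelope check of the 180-day figure (assuming the DTCS defaults
base_time_seconds=3600 and min_threshold=4, and simplifying the tiering to
exactly min_threshold windows per tier before they merge into the next one):

{code}
// Prints roughly 170 days - the window a year-old sstable lands in with
// default settings. Defaults and tiering are assumptions of this sketch.
public class OldestWindow
{
    public static void main(String[] args)
    {
        long window = 3600L;               // base_time_seconds: one hour
        long covered = 0;
        long year = 365L * 24 * 3600;
        while (covered + window < year)
        {
            covered += 4 * window;         // a full tier of four windows
            window *= 4;                   // then windows merge into the next tier
        }
        System.out.printf("oldest window: %.1f days%n", window / 86400.0);
    }
}
{code}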



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-9597) DTCS should consider file SIZE in addition to time windowing

2015-09-07 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson resolved CASSANDRA-9597.

Resolution: Duplicate

> DTCS should consider file SIZE in addition to time windowing
> 
>
> Key: CASSANDRA-9597
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9597
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Priority: Minor
>  Labels: dtcs
>
> DTCS seems to work well for the typical use case - writing data in perfect 
> time order, compacting recent files, and ignoring older files.
> However, there are "normal" operational actions where DTCS will fall behind 
> and is unlikely to recover.
> An example of this is streaming operations (for example, bootstrap or loading 
> data into a cluster using sstableloader), where lots (tens of thousands) of 
> very small sstables can be created spanning multiple time buckets. In these 
> case, even if max_sstable_age_days is extended to allow the older incoming 
> files to be compacted, the selection logic is likely to re-compact large 
> files with fewer small files over and over, rather than prioritizing 
> selection of max_threshold smallest files to decrease the number of candidate 
> sstables as quickly as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting

2015-09-07 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson resolved CASSANDRA-8371.

Resolution: Duplicate

Closing this as duplicate, hoping that this will be resolved in CASSANDRA-9644

> DateTieredCompactionStrategy is always compacting 
> --
>
> Key: CASSANDRA-8371
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: mck
>Assignee: Björn Hegerfors
>  Labels: compaction, dtcs, performance
> Attachments: java_gc_counts_rate-month.png, 
> read-latency-recommenders-adview.png, read-latency.png, 
> sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png
>
>
> Running 2.0.11 and having switched a table to 
> [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that 
> disk IO and gc count increase, along with the number of reads happening in 
> the "compaction" hump of cfhistograms.
> Data, and generally performance, looks good, but compactions are always 
> happening, and pending compactions are building up.
> The schema for this is 
> {code}CREATE TABLE search (
>   loginid text,
>   searchid timeuuid,
>   description text,
>   searchkey text,
>   searchurl text,
>   PRIMARY KEY ((loginid), searchid)
> );{code}
> We're sitting on about 82G (per replica) across 6 nodes in 4 DCs.
> CQL executed against this keyspace, and traffic patterns, can be seen in 
> slides 7+8 of https://prezi.com/b9-aj6p2esft/
> Attached are sstables-per-read and read-latency graphs from cfhistograms, and 
> screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44), 
> to DTCS (week ~46).
> These screenshots are also found in the prezi on slides 9-11.
> [~pmcfadin], [~Bj0rn], 
> Can this be a consequence of occasional deleted rows, as is described under 
> (3) in the description of CASSANDRA-6602 ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-10253) Incremental repairs not working as expected with DTCS

2015-09-07 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson resolved CASSANDRA-10253.
-
   Resolution: Duplicate
Fix Version/s: (was: 2.1.x)

Closing this as a duplicate of CASSANDRA-9644 - hoping all these issues will be 
fixed there

> Incremental repairs not working as expected with DTCS
> -
>
> Key: CASSANDRA-10253
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10253
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Pre-prod
>Reporter: vijay
>Assignee: Marcus Eriksson
>  Labels: dtcs
> Attachments: sstablemetadata-cluster-logs.zip, systemfiles 2.zip
>
>
> Hi,
> we are ingesting 6 million records every 15 mins into one DTCS table and 
> relying on Cassandra for purging the data. Table schema given below.
> Issue 1: we are expecting that an sstable created on day d1 will not be 
> compacted after d1, however we are not seeing this; we do see some data 
> being purged at random intervals.
> Issue 2: when we run incremental repair using "nodetool repair keyspace table 
> -inc -pr" each sstable splits into multiple smaller sstables, increasing the 
> total storage. This behavior is the same running repairs on any node and any 
> number of times.
> There are mutation drops in the cluster.
> Table:
> {code}
> CREATE TABLE TableA (
> F1 text,
> F2 int,
> createts bigint,
> stats blob,
> PRIMARY KEY ((F1,F2), createts)
> ) WITH CLUSTERING ORDER BY (createts DESC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '12', 'max_sstable_age_days': '1', 
> 'base_time_seconds': '50', 'class': 
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.0
> AND default_time_to_live = 93600
> AND gc_grace_seconds = 3600
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
> {code}
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10274) Assertion Errors when interrupting Cleanup

2015-09-07 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-10274:

Assignee: Benedict
Reviewer: Marcus Eriksson

> Assertion Errors when interrupting Cleanup
> --
>
> Key: CASSANDRA-10274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10274
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeff Jirsa
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.x
>
>
> Exceptions encountered after Interrupting cleanup on 2.1.7 - logging due to 
> nature. Seen on 2 different nodes. May be related to CASSANDRA-10260 . 
> {code}
> INFO  [CompactionExecutor:5836] 2015-09-06 11:33:39,630 
> CompactionManager.java:1286 - Compaction interrupted: 
> Cleanup@74bffc10-0fbe-11e5-a5ce-37a8b36fe285(keyspace, table, 
> 44698243387/139158
> 624314)bytes
> INFO  [CompactionExecutor:5838] 2015-09-06 11:33:39,638 
> CompactionManager.java:1286 - Compaction interrupted: 
> Cleanup@74bffc10-0fbe-11e5-a5ce-37a8b36fe285(keyspace, table, 
> 37886026781/133123
> 638379)bytes
> INFO  [CompactionExecutor:5836] 2015-09-06 11:33:39,639 
> CompactionManager.java:749 - Cleaning up 
> SSTableReader(path='/mnt/cassandra/data/keyspace/table-74bffc100fbe11e5a5ce37a8b36fe285/keyspace-table-26598-Data.db')
> ERROR [CompactionExecutor:5838] 2015-09-06 11:33:39,639 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[CompactionExecutor:5838,1,main]
> java.lang.AssertionError: Memory was freed
> at 
> org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97) 
> ~[apache-cassandra-2.1.7.jar:2.1.7]
> at org.apache.cassandra.io.util.Memory.getInt(Memory.java:281) 
> ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummary.getPositionInSummary(IndexSummary.java:139)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummary.getKey(IndexSummary.java:144) 
> ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummary.binarySearch(IndexSummary.java:113)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.SSTableReader.getIndexScanPosition(SSTableReader.java:1183)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.SSTableReader.firstKeyBeyond(SSTableReader.java:1667)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.needsCleanup(CompactionManager.java:693)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.doCleanupOne(CompactionManager.java:734)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:94)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:389)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:285)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_51]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
> {code}
> And: 
> {code}
> ERROR [IndexSummaryManager:1] 2015-09-06 09:32:23,346 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[IndexSummaryManager:1,1,main]
> java.lang.AssertionError: null
> at 
> org.apache.cassandra.io.sstable.SSTableReader.setReplacedBy(SSTableReader.java:955)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.SSTableReader.cloneAndReplace(SSTableReader.java:1002)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.SSTableReader.cloneWithNewSummarySamplingLevel(SSTableReader.java:1105)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummaryManager.adjustSamplingLevels(IndexSummaryManager.java:421)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:299)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:238)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:139)
>  ~[apache-cassandra-2.1.7.jar:2

[jira] [Commented] (CASSANDRA-10274) Assertion Errors when interrupting Cleanup

2015-09-07 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733924#comment-14733924
 ] 

Marcus Eriksson commented on CASSANDRA-10274:
-

+1

> Assertion Errors when interrupting Cleanup
> --
>
> Key: CASSANDRA-10274
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10274
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jeff Jirsa
>Assignee: Benedict
>Priority: Critical
> Fix For: 2.1.x
>
>
> Exceptions encountered after Interrupting cleanup on 2.1.7 - logging due to 
> nature. Seen on 2 different nodes. May be related to CASSANDRA-10260 . 
> {code}
> INFO  [CompactionExecutor:5836] 2015-09-06 11:33:39,630 
> CompactionManager.java:1286 - Compaction interrupted: 
> Cleanup@74bffc10-0fbe-11e5-a5ce-37a8b36fe285(keyspace, table, 
> 44698243387/139158
> 624314)bytes
> INFO  [CompactionExecutor:5838] 2015-09-06 11:33:39,638 
> CompactionManager.java:1286 - Compaction interrupted: 
> Cleanup@74bffc10-0fbe-11e5-a5ce-37a8b36fe285(keyspace, table, 
> 37886026781/133123
> 638379)bytes
> INFO  [CompactionExecutor:5836] 2015-09-06 11:33:39,639 
> CompactionManager.java:749 - Cleaning up 
> SSTableReader(path='/mnt/cassandra/data/keyspace/table-74bffc100fbe11e5a5ce37a8b36fe285/keyspace-table-26598-Data.db')
> ERROR [CompactionExecutor:5838] 2015-09-06 11:33:39,639 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[CompactionExecutor:5838,1,main]
> java.lang.AssertionError: Memory was freed
> at 
> org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97) 
> ~[apache-cassandra-2.1.7.jar:2.1.7]
> at org.apache.cassandra.io.util.Memory.getInt(Memory.java:281) 
> ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummary.getPositionInSummary(IndexSummary.java:139)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummary.getKey(IndexSummary.java:144) 
> ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummary.binarySearch(IndexSummary.java:113)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.SSTableReader.getIndexScanPosition(SSTableReader.java:1183)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.SSTableReader.firstKeyBeyond(SSTableReader.java:1667)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.needsCleanup(CompactionManager.java:693)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.doCleanupOne(CompactionManager.java:734)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:94)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:389)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:285)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_51]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
> {code}
> And: 
> {code}
> ERROR [IndexSummaryManager:1] 2015-09-06 09:32:23,346 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[IndexSummaryManager:1,1,main]
> java.lang.AssertionError: null
> at 
> org.apache.cassandra.io.sstable.SSTableReader.setReplacedBy(SSTableReader.java:955)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.SSTableReader.cloneAndReplace(SSTableReader.java:1002)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.SSTableReader.cloneWithNewSummarySamplingLevel(SSTableReader.java:1105)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummaryManager.adjustSamplingLevels(IndexSummaryManager.java:421)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:299)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:238)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
> at 
> org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:139)
>  ~[apache-cassandra-2.1

[jira] [Updated] (CASSANDRA-10280) Make DTCS work well with old data

2015-09-07 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-10280:

Description: 
Operational tasks become incredibly expensive if you keep around a long 
timespan of data with DTCS - with default settings and 1 year of data, the 
oldest window covers about 180 days. Bootstrapping a node with vnodes with this 
data layout will force cassandra to compact very many sstables in this window.

We should probably put a cap on how big the biggest windows can get. We could 
probably default this to something sane based on max_sstable_age (ie, say we 
can reasonably handle 1000 sstables per node, then we can calculate how big the 
windows should be to allow that)

  was:
Operational tasks become incredibly expensive if you keep around a long 
timespan of data with DTCS - with default settings and 1 year of data, the 
oldest window covers about 180 days. Bootstrapping a node with vnodes with this 
data layout will force cassandra to compact very many sstables in this window.

We should probably put a cap on how big the biggest windows can get. We could 
probably default this to something sane based on max_sstable_age (ie, say we 
can reasonably handle 1000 sstables per node, then we can make calculate how 
big the windows should be to allow that)


> Make DTCS work well with old data
> -
>
> Key: CASSANDRA-10280
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10280
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> Operational tasks become incredibly expensive if you keep around a long 
> timespan of data with DTCS - with default settings and 1 year of data, the 
> oldest window covers about 180 days. Bootstrapping a node with vnodes with 
> this data layout will force cassandra to compact very many sstables in this 
> window.
> We should probably put a cap on how big the biggest windows can get. We could 
> probably default this to something sane based on max_sstable_age (ie, say we 
> can reasonably handle 1000 sstables per node, then we can calculate how big 
> the windows should be to allow that)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10276) With DTCS, do STCS in windows if more than max_threshold sstables

2015-09-07 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-10276:

Reviewer: Jeff Jirsa

> With DTCS, do STCS in windows if more than max_threshold sstables
> -
>
> Key: CASSANDRA-10276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10276
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> To avoid constant recompaction of files in big ( > max threshold) DTCS 
> windows, we should do STCS of those files.
> Patch here: https://github.com/krummas/cassandra/commits/marcuse/dtcs_stcs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10276) With DTCS, do STCS in windows if more than max_threshold sstables

2015-09-08 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734488#comment-14734488
 ] 

Marcus Eriksson commented on CASSANDRA-10276:
-

Thanks for the review, pushed another commit to the repo in the description 
that fixes your comment and cleans up the code a bit

> With DTCS, do STCS in windows if more than max_threshold sstables
> -
>
> Key: CASSANDRA-10276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10276
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> To avoid constant recompaction of files in big ( > max threshold) DTCS 
> windows, we should do STCS of those files.
> Patch here: https://github.com/krummas/cassandra/commits/marcuse/dtcs_stcs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-10270) Cassandra stops compacting

2015-09-08 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson reassigned CASSANDRA-10270:
---

Assignee: Marcus Eriksson

> Cassandra stops compacting
> --
>
> Key: CASSANDRA-10270
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10270
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: linux (google cloud click-to-deploy, default settings, 3 
> nodes)
>Reporter: Adam Bliss
>Assignee: Marcus Eriksson
> Fix For: 2.2.x
>
> Attachments: system.log.gz, system.txt.gz
>
>
> My cassandra cluster won't keep compacting. I notice that if I restart, it 
> does compact for a while, but after a time it stops. As a result, after 
> adding a bunch of rows, I ended up with about 1000 sstables per node.
> I'll attach more logs in a minute, but it seems like this might be the most 
> relevant part:
> {noformat}
> INFO  [CompactionExecutor:1] 2015-09-04 14:22:55,796 
> CompactionManager.java:1433 - Compaction interrupted: 
> Compaction@fff9bcd0-3b1f-11e5-8df6-33158d7bf3bf(megacrawl2, ranks_by_domain, 
> 812501702/7543091905)bytes
> DEBUG [CompactionExecutor:1] 2015-09-04 14:22:55,797 
> CompactionManager.java:1437 - Full interruption stack trace:
> org.apache.cassandra.db.compaction.CompactionInterruptedException: Compaction 
> interrupted: Compaction@fff9bcd0-3b1f-11e5-8df6-33158d7bf3bf(megacrawl2, 
> ranks_by_
> domain, 812501702/7543091905)bytes
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:180)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]  
>   
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[na:1.7.0_79]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> ~[na:1.7.0_79]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_79]   
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> DEBUG [CompactionExecutor:1] 2015-09-04 14:22:55,797 
> CompactionManager.java:222 - Checking system.local
>  DEBUG [CompactionExecutor:1] 2015-09-04 
> 14:22:55,797 SizeTieredCompactionStrategy.java:85 - Compaction buckets are 
> [[BigTableReader(path='/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-83-big-Data.db'),
>  
> BigTableReader(path='/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-81-big-Data.db'),
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10270) Cassandra stops compacting

2015-09-08 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735244#comment-14735244
 ] 

Marcus Eriksson commented on CASSANDRA-10270:
-

[~abliss] could you post more details about your setup? schema, type of storage 
etc?

> Cassandra stops compacting
> --
>
> Key: CASSANDRA-10270
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10270
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: linux (google cloud click-to-deploy, default settings, 3 
> nodes)
>Reporter: Adam Bliss
>Assignee: Marcus Eriksson
> Fix For: 2.2.x
>
> Attachments: system.log.gz, system.txt.gz
>
>
> My cassandra cluster won't keep compacting. I notice that if I restart, it 
> does compact for a while, but after a time it stops. As a result, after 
> adding a bunch of rows, I ended up with about 1000 sstables per node.
> I'll attach more logs in a minute, but it seems like this might be the most 
> relevant part:
> {noformat}
> INFO  [CompactionExecutor:1] 2015-09-04 14:22:55,796 
> CompactionManager.java:1433 - Compaction interrupted: 
> Compaction@fff9bcd0-3b1f-11e5-8df6-33158d7bf3bf(megacrawl2, ranks_by_domain, 
> 812501702/7543091905)bytes
> DEBUG [CompactionExecutor:1] 2015-09-04 14:22:55,797 
> CompactionManager.java:1437 - Full interruption stack trace:
> org.apache.cassandra.db.compaction.CompactionInterruptedException: Compaction 
> interrupted: Compaction@fff9bcd0-3b1f-11e5-8df6-33158d7bf3bf(megacrawl2, 
> ranks_by_
> domain, 812501702/7543091905)bytes
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:180)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]  
>   
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[na:1.7.0_79]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> ~[na:1.7.0_79]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_79]   
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> DEBUG [CompactionExecutor:1] 2015-09-04 14:22:55,797 
> CompactionManager.java:222 - Checking system.local
>  DEBUG [CompactionExecutor:1] 2015-09-04 
> 14:22:55,797 SizeTieredCompactionStrategy.java:85 - Compaction buckets are 
> [[BigTableReader(path='/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-83-big-Data.db'),
>  
> BigTableReader(path='/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-81-big-Data.db'),
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10270) Cassandra stops compacting

2015-09-08 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-10270:

Reproduced In: 2.2.1, 2.2.0  (was: 2.2.0, 2.2.1)
 Priority: Critical  (was: Major)

bumping to Critical; I've been unable to reproduce this so far

> Cassandra stops compacting
> --
>
> Key: CASSANDRA-10270
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10270
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: linux (google cloud click-to-deploy, default settings, 3 
> nodes)
>Reporter: Adam Bliss
>Assignee: Marcus Eriksson
>Priority: Critical
> Fix For: 2.2.x
>
> Attachments: system.log.gz, system.txt.gz
>
>
> My cassandra cluster won't keep compacting. I notice that if I restart, it 
> does compact for a while, but after a time it stops. As a result, after 
> adding a bunch of rows, I ended up with about 1000 sstables per node.
> I'll attach more logs in a minute, but it seems like this might be the most 
> relevant part:
> {noformat}
> INFO  [CompactionExecutor:1] 2015-09-04 14:22:55,796 
> CompactionManager.java:1433 - Compaction interrupted: 
> Compaction@fff9bcd0-3b1f-11e5-8df6-33158d7bf3bf(megacrawl2, ranks_by_domain, 
> 812501702/7543091905)bytes
> DEBUG [CompactionExecutor:1] 2015-09-04 14:22:55,797 
> CompactionManager.java:1437 - Full interruption stack trace:
> org.apache.cassandra.db.compaction.CompactionInterruptedException: Compaction 
> interrupted: Compaction@fff9bcd0-3b1f-11e5-8df6-33158d7bf3bf(megacrawl2, 
> ranks_by_
> domain, 812501702/7543091905)bytes
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:180)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]  
>   
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[na:1.7.0_79]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> ~[na:1.7.0_79]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_79]   
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> DEBUG [CompactionExecutor:1] 2015-09-04 14:22:55,797 
> CompactionManager.java:222 - Checking system.local
>  DEBUG [CompactionExecutor:1] 2015-09-04 
> 14:22:55,797 SizeTieredCompactionStrategy.java:85 - Compaction buckets are 
> [[BigTableReader(path='/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-83-big-Data.db'),
>  
> BigTableReader(path='/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-81-big-Data.db'),
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10270) Cassandra stops compacting

2015-09-09 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736700#comment-14736700
 ] 

Marcus Eriksson commented on CASSANDRA-10270:
-

Ok, this happens when cassandra decides to redistribute index summaries, by 
default every 60 minutes. Working on a fix, but in the meantime this can be 
avoided by setting {{index_summary_resize_interval_in_minutes}} to -1 in 
cassandra.yaml to disable this feature.

> Cassandra stops compacting
> --
>
> Key: CASSANDRA-10270
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10270
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: linux (google cloud click-to-deploy, default settings, 3 
> nodes)
>Reporter: Adam Bliss
>Assignee: Marcus Eriksson
>Priority: Critical
> Fix For: 2.2.x
>
> Attachments: system.log.gz, system.txt.gz
>
>
> My cassandra cluster won't keep compacting. I notice that if I restart, it 
> does compact for a while, but after a time it stops. As a result, after 
> adding a bunch of rows, I ended up with about 1000 sstables per node.
> I'll attach more logs in a minute, but it seems like this might be the most 
> relevant part:
> {noformat}
> INFO  [CompactionExecutor:1] 2015-09-04 14:22:55,796 
> CompactionManager.java:1433 - Compaction interrupted: 
> Compaction@fff9bcd0-3b1f-11e5-8df6-33158d7bf3bf(megacrawl2, ranks_by_domain, 
> 812501702/7543091905)bytes
> DEBUG [CompactionExecutor:1] 2015-09-04 14:22:55,797 
> CompactionManager.java:1437 - Full interruption stack trace:
> org.apache.cassandra.db.compaction.CompactionInterruptedException: Compaction 
> interrupted: Compaction@fff9bcd0-3b1f-11e5-8df6-33158d7bf3bf(megacrawl2, 
> ranks_by_
> domain, 812501702/7543091905)bytes
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:180)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]at 
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:236)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]  
>   
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[na:1.7.0_79]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> ~[na:1.7.0_79]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_79]   
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_79]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> DEBUG [CompactionExecutor:1] 2015-09-04 14:22:55,797 
> CompactionManager.java:222 - Checking system.local
>  DEBUG [CompactionExecutor:1] 2015-09-04 
> 14:22:55,797 SizeTieredCompactionStrategy.java:85 - Compaction buckets are 
> [[BigTableReader(path='/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-83-big-Data.db'),
>  
> BigTableReader(path='/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/la-81-big-Data.db'),
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10195) TWCS experiments and improvement proposals

2015-09-09 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738246#comment-14738246
 ] 

Marcus Eriksson commented on CASSANDRA-10195:
-

[~philipthompson] could you run the same tests on 
https://github.com/krummas/cassandra/commits/marcuse/9644 ? It puts a limit on 
the size of the windows and does STCS within the windows.

> TWCS experiments and improvement proposals
> --
>
> Key: CASSANDRA-10195
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10195
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Antti Nissinen
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 20150814_1027_compaction_hierarchy.txt, 
> node0_20150727_1250_time_graph.txt, node0_20150810_1017_time_graph.txt, 
> node0_20150812_1531_time_graph.txt, node0_20150813_0835_time_graph.txt, 
> node0_20150814_1054_time_graph.txt, node1_20150727_1250_time_graph.txt, 
> node1_20150810_1017_time_graph.txt, node1_20150812_1531_time_graph.txt, 
> node1_20150813_0835_time_graph.txt, node1_20150814_1054_time_graph.txt, 
> node2_20150727_1250_time_graph.txt, node2_20150810_1017_time_graph.txt, 
> node2_20150812_1531_time_graph.txt, node2_20150813_0835_time_graph.txt, 
> node2_20150814_1054_time_graph.txt, sstable_count_figure1.png, 
> sstable_count_figure2.png
>
>
> This JIRA item describes experiments with DateTieredCompactionStrategy (DTCS) 
> and TimeWindowCompactionStrategy (TWCS) and proposes modifications to the 
> TWCS. In a test system several crashes were caused intentionally (and 
> unintentionally) and repair operations were executed, leading to a flood of 
> small SSTables. The target was to be able to compact those files and release 
> the disk space taken up by duplicate data. The setup is as follows:
> - Three nodes
> - DateTieredCompactionStrategy, max_sstable_age_days = 5
> - Cassandra 2.1.2
> The setup and data format have been documented in detail here 
> https://issues.apache.org/jira/browse/CASSANDRA-9644.
> The test was started by dumping a few days' worth of data to the database for 
> 100 000 signals. Time graphs of SSTables from different nodes indicate that 
> DTCS has been working as expected and SSTables are nicely ordered time-wise.
> See files:
> node0_20150727_1250_time_graph.txt
> node1_20150727_1250_time_graph.txt
> node2_20150727_1250_time_graph.txt
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens  OwnsHost ID 
>   Rack
> UN  139.66.43.170  188.87 GB  256 ?   
> dfc29863-c935-4909-9d7f-c59a47eda03d  rack1
> UN  139.66.43.169  198.37 GB  256 ?   
> 12e7628b-7f05-48f6-b7e4-35a82010021a  rack1
> UN  139.66.43.168  191.88 GB  256 ?   
> 26088392-f803-4d59-9073-c75f857fb332  rack1
> All nodes crashed due to power failure (known beforehand) and repair 
> operations were started for each node one at a time. Below is the behavior 
> of SSTable count on different nodes. New data was dumped simultaneously with 
> the repair operations.
> SEE FIGURE: sstable_count_figure1.png
> Vertical lines indicate the following events.
> 1) Cluster was down due to power shutdown and was restarted. At the first 
> vertical line the repair operation (nodetool repair -pr) was started for the 
> first node
> 2) Repair of the second node was started after the first node 
> was successfully repaired.
> 3) Repair of the third node was started
> 4) Third repair operation was finished
> 5) One of the nodes crashed (unknown reason in OS level)
> 6) Repair operation (nodetool repair -pr) was started for the first node
> 7) Repair operation for the second node was started
> 8) Repair operation for the third node was started
> 9) Repair operations finished
> These repair operations led to a huge amount of small SSTables covering 
> the whole time span of the data. The compaction horizon of DTCS was limited 
> to 5 days (max_sstable_age_days) due to the size of the SSTables on disk. 
> Therefore, the small SSTables won't be compacted. Below are the time graphs 
> from SSTables after the second round of repairs.
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens  OwnsHost ID 
>   Rack
> UN  xx.xx.xx.170  663.61 GB  256 ?   
> dfc29863-c935-4909-9d7f-c59a47eda03d  rack1
> UN  xx.xx.xx.169  763.52 GB  256 ?   
> 12e7628b-7f05-48f6-b7e4-35a82010021a  rack1
> UN  xx.xx.xx.168  651.59 GB  256 ?   
> 26088392-f803-4d59-9073-c75f857fb332  rack1
> See files:
> node0_20150810_1017_time_graph.txt
> node1_20150810_1017_time_graph.txt
> node2_20150810_1017_time_graph.txt
> To get rid of the SStables the TimeWindowCompactionStrategy was taken into 
> use. Window si

[jira] [Updated] (CASSANDRA-10299) Issue with sstable selection when anti-compacting

2015-09-10 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-10299:

Reviewer: Marcus Eriksson

> Issue with sstable selection when anti-compacting
> -
>
> Key: CASSANDRA-10299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10299
> Project: Cassandra
>  Issue Type: Bug
> Environment: 4 node Cassandra 2.1.9 cluster (256 vnodes)
>Reporter: Marcus Olsson
>Assignee: Marcus Olsson
> Attachments: sstable-selection-for-anticompaction-2.1.patch
>
>
> While running some tests with incremental repair we ran into some issues with 
> some data being repaired over and over again. The repairs were scheduled to 
> run every two hours on a different node. So e.g.
> {noformat}
> node1 would repair on hours 0,  8, 16
> node2 would repair on hours 2, 10, 18
> node3 would repair on hours 4, 12, 20
> node4 would repair on hours 6, 14, 22
> {noformat}
> The data being repaired over and over were in a table with static data, so 
> it should've only been required to run repair once for that table. This table 
> generated ~700 small sstables per repair, and when I checked one node had 
> several thousand sstables in that table alone.
> The repair command used on each node was:
> {noformat}
> repair -inc -par
> {noformat}
> So after stopping all clients and waiting for compactions to finish I ran 
> sstablemetadata on the tables and saw that one table wasn't repaired. After 
> checking the logs I saw something like this:
> {noformat}
> SSTable ..-ka-X-Data.db (..) will be anticompacted on range (..)
> ...
> SSTable ..-ka-X-Data.db (..) does not intersect repaired range (..), not 
> touching repairedAt.
> {noformat}
> So I checked the code and there seems to be an issue when one of the repaired 
> ranges does not intersect the sstable range. In that case it just removes the 
> sstable from the anticompaction regardless of whether any other repaired range 
> intersects with it.
> Attaching a patch for 2.1 that solves this; working on a dtest for it. Will 
> create patches for 2.2 and 3.0 as well.
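
The described selection bug can be modelled standalone like this (illustrative
shapes, not the attached patch): an sstable must stay in the anticompaction
set if any repaired range intersects it, but the buggy loop drops it as soon
as one range fails to intersect.

{code}
import java.util.*;

// Ranges are [left, right) token intervals encoded as two-element arrays.
public class AnticompactionSelection
{
    static boolean intersects(long[] a, long[] b)
    {
        return a[0] < b[1] && b[0] < a[1];
    }

    // Buggy shape: the first non-intersecting range evicts the sstable.
    static boolean keepBuggy(long[] sstableRange, List<long[]> repairedRanges)
    {
        for (long[] r : repairedRanges)
            if (!intersects(r, sstableRange))
                return false;
        return true;
    }

    // Fixed shape: one intersecting range is enough to keep it.
    static boolean keepFixed(long[] sstableRange, List<long[]> repairedRanges)
    {
        for (long[] r : repairedRanges)
            if (intersects(r, sstableRange))
                return true;
        return false;
    }
}
{code}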



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10299) Issue with sstable selection when anti-compacting

2015-09-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738509#comment-14738509
 ] 

Marcus Eriksson commented on CASSANDRA-10299:
-

nice catch, looks good to me, just letting CI run before committing

> Issue with sstable selection when anti-compacting
> -
>
> Key: CASSANDRA-10299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10299
> Project: Cassandra
>  Issue Type: Bug
> Environment: 4 node Cassandra 2.1.9 cluster (256 vnodes)
>Reporter: Marcus Olsson
>Assignee: Marcus Olsson
> Attachments: sstable-selection-for-anticompaction-2.1.patch
>
>
> While running some tests with incremental repair we ran into some issues with 
> some data being repaired over and over again. The repairs were scheduled to 
> run every two hours on a different node. So e.g.
> {noformat}
> node1 would repair on hours 0,  8, 16
> node2 would repair on hours 2, 10, 18
> node3 would repair on hours 4, 12, 20
> node4 would repair on hours 6, 14, 22
> {noformat}
> The data being repaired over and over were in a table with static data, so 
> it should've only been required to run repair once for that table. This table 
> generated ~700 small sstables per repair, and when I checked one node had 
> several thousand sstables in that table alone.
> The repair command used on each node was:
> {noformat}
> repair -inc -par
> {noformat}
> So after stopping all clients and waiting for compactions to finish I ran 
> sstablemetadata on the tables and saw that one table wasn't repaired. After 
> checking the logs I saw something like this:
> {noformat}
> SSTable ..-ka-X-Data.db (..) will be anticompacted on range (..)
> ...
> SSTable ..-ka-X-Data.db (..) does not intersect repaired range (..), not 
> touching repairedAt.
> {noformat}
> So I checked the code and there seems to be an issue when one of the repaired 
> ranges does not intersect the sstable range. In that case it just removes the 
> sstable from the anticompaction regardless of whether any other repaired range 
> intersects with it.
> Attaching a patch for 2.1 that solves this; working on a dtest for it. Will 
> create patches for 2.2 and 3.0 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-10299) Issue with sstable selection when anti-compacting

2015-09-10 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson resolved CASSANDRA-10299.
-
Resolution: Fixed

committed, thanks!

> Issue with sstable selection when anti-compacting
> -
>
> Key: CASSANDRA-10299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10299
> Project: Cassandra
>  Issue Type: Bug
> Environment: 4 node Cassandra 2.1.9 cluster (256 vnodes)
>Reporter: Marcus Olsson
>Assignee: Marcus Olsson
> Attachments: CASSANDRA-10299-2.2.patch, CASSANDRA-10299-3.0.patch, 
> sstable-selection-for-anticompaction-2.1.patch
>
>
> While running some tests with incremental repair we ran into some issues with 
> some data being repaired over and over again. The repairs were scheduled to 
> run every two hours on a different node. So e.g.
> {noformat}
> node1 would repair on hours 0,  8, 16
> node2 would repair on hours 2, 10, 18
> node3 would repair on hours 4, 12, 20
> node4 would repair on hours 6, 14, 22
> {noformat}
> The data being repaired over and over were in a table with static data, so 
> it should've only been required to run repair once for that table. This table 
> generated ~700 small sstables per repair, and when I checked one node had 
> several thousand sstables in that table alone.
> The repair command used on each node was:
> {noformat}
> repair -inc -par
> {noformat}
> So after stopping all clients and waiting for compactions to finish I ran 
> sstablemetadata on the tables and saw that one table wasn't repaired. After 
> checking the logs I saw something like this:
> {noformat}
> SSTable ..-ka-X-Data.db (..) will be anticompacted on range (..)
> ...
> SSTable ..-ka-X-Data.db (..) does not intersect repaired range (..), not 
> touching repairedAt.
> {noformat}
> So I checked the code and there seems to be an issue when one of the repaired 
> ranges does not intersect the sstable range. In that case it just removes the 
> sstable from the anticompaction regardless of whether any other repaired range 
> intersects with it.
> Attaching a patch for 2.1 that solves this; working on a dtest for it. Will 
> create patches for 2.2 and 3.0 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10299) Issue with sstable selection when anti-compacting

2015-09-10 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-10299:

Fix Version/s: 2.2.2
   2.1.10
   3.0.0 rc1

> Issue with sstable selection when anti-compacting
> -
>
> Key: CASSANDRA-10299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10299
> Project: Cassandra
>  Issue Type: Bug
> Environment: 4 node Cassandra 2.1.9 cluster (256 vnodes)
>Reporter: Marcus Olsson
>Assignee: Marcus Olsson
> Fix For: 3.0.0 rc1, 2.1.10, 2.2.2
>
> Attachments: CASSANDRA-10299-2.2.patch, CASSANDRA-10299-3.0.patch, 
> sstable-selection-for-anticompaction-2.1.patch
>
>
> While running some tests with incremental repair we ran into some issues with 
> some data being repaired over and over again. The repairs were scheduled to 
> run every two hours on a different node. So e.g.
> {noformat}
> node1 would repair on hours 0,  8, 16
> node2 would repair on hours 2, 10, 18
> node3 would repair on hours 4, 12, 20
> node4 would repair on hours 6, 14, 22
> {noformat}
> The data being repaired over and over were in a table with static data, so 
> it should've only been required to run repair once for that table. This table 
> generated ~700 small sstables per repair, and when I checked one node had 
> several thousand sstables in that table alone.
> The repair command used on each node was:
> {noformat}
> repair -inc -par
> {noformat}
> So after stopping all clients and waiting for compactions to finish I ran 
> sstablemetadata on the tables and saw that one table wasn't repaired. After 
> checking the logs I saw something like this:
> {noformat}
> SSTable ..-ka-X-Data.db (..) will be anticompacted on range (..)
> ...
> SSTable ..-ka-X-Data.db (..) does not intersect repaired range (..), not 
> touching repairedAt.
> {noformat}
> So I checked the code and there seems to be an issue when one of the repaired 
> ranges does not intersect the sstable range. In that case it just removes the 
> sstable from the anticompaction regardless of whether any other repaired range 
> intersects with it.
> Attaching a patch for 2.1 that solves this; working on a dtest for it. Will 
> create patches for 2.2 and 3.0 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10198) 3.0 hints should be streamed on decomission

2015-09-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738772#comment-14738772
 ] 

Marcus Eriksson commented on CASSANDRA-10198:
-

and dtest is [fixed | 
https://github.com/krummas/cassandra-dtest/commits/marcuse/10198]

> 3.0 hints should be streamed on decomission
> ---
>
> Key: CASSANDRA-10198
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10198
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Aleksey Yeschenko
>Assignee: Marcus Eriksson
> Fix For: 3.0.0 rc1
>
>
> CASSANDRA-6230 added all the necessary pieces in the initial release, but 
> streaming itself didn't make it in time.
> Now that hints are stored in flat files, we cannot just stream hints 
> sstables. Instead we need to handoff hints files.
> Essentially we need to rewrite {{StorageService::streamHints}} to be 
> CASSANDRA-6230 aware.
> {{HintMessage}} and {{HintVerbHandler}} can already handle hints targeted for 
> other nodes (see javadoc for both, it's documented reasonably).
> {{HintsDispatcher}} also takes hostId as an argument, and can stream any 
> hints to any nodes.
> The building blocks are all there - we just need 
> {{StorageService::streamHints}} to pick the optimal candidate for each file 
> and use {{HintsDispatcher}} to stream the files.
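
A hedged sketch of the shape that rewrite could take. {{HintsDescriptor}} and {{HintsDispatcher}} exist per the description above, but the helpers below ({{hintsFilesOnDisk}}, {{pickBestEndpoint}}, {{dispatchFileTo}}) are purely illustrative, not the actual 3.0 API:

{noformat}
// Illustrative Java sketch of a CASSANDRA-6230-aware StorageService::streamHints.
for (HintsDescriptor descriptor : hintsFilesOnDisk())  // each flat hints file on disk
{
    UUID hostId = descriptor.hostId;                   // node the hints were originally written for
    InetAddress target = pickBestEndpoint(hostId);     // the owner if alive, else another suitable candidate
    dispatchFileTo(descriptor, target, hostId);        // HintsDispatcher can stream any hints to any node
}
{noformat}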



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10195) TWCS experiments and improvement proposals

2015-09-10 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740276#comment-14740276
 ] 

Marcus Eriksson commented on CASSANDRA-10195:
-

[~philipthompson] defaults should be good

> TWCS experiments and improvement proposals
> --
>
> Key: CASSANDRA-10195
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10195
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Antti Nissinen
> Fix For: 2.1.x, 2.2.x
>
> Attachments: 20150814_1027_compaction_hierarchy.txt, 
> node0_20150727_1250_time_graph.txt, node0_20150810_1017_time_graph.txt, 
> node0_20150812_1531_time_graph.txt, node0_20150813_0835_time_graph.txt, 
> node0_20150814_1054_time_graph.txt, node1_20150727_1250_time_graph.txt, 
> node1_20150810_1017_time_graph.txt, node1_20150812_1531_time_graph.txt, 
> node1_20150813_0835_time_graph.txt, node1_20150814_1054_time_graph.txt, 
> node2_20150727_1250_time_graph.txt, node2_20150810_1017_time_graph.txt, 
> node2_20150812_1531_time_graph.txt, node2_20150813_0835_time_graph.txt, 
> node2_20150814_1054_time_graph.txt, sstable_count_figure1.png, 
> sstable_count_figure2.png
>
>
> This JIRA item describes experiments with DateTieredCompactionStrategy (DTCS)
> and TimeWindowCompactionStrategy (TWCS) and proposes modifications to the
> TWCS. In a test system several crashes were caused intentionally (and
> unintentionally) and repair operations were executed, leading to a flood of
> small SSTables. The target was to be able to compact those files and release
> the disk space reserved by duplicate data. The setup is as follows:
> - Three nodes
> - DateTieredCompactionStrategy, max_sstable_age_days = 5
> - Cassandra 2.1.2
> The setup and data format have been documented in detail here:
> https://issues.apache.org/jira/browse/CASSANDRA-9644.
> The test was started by dumping a few days' worth of data to the database for
> 100 000 signals. Time graphs of SSTables from the different nodes indicate
> that DTCS has been working as expected and the SSTables are nicely ordered in
> time.
> See files:
> node0_20150727_1250_time_graph.txt
> node1_20150727_1250_time_graph.txt
> node2_20150727_1250_time_graph.txt
> {noformat}
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns  Host ID                               Rack
> UN  139.66.43.170  188.87 GB  256     ?     dfc29863-c935-4909-9d7f-c59a47eda03d  rack1
> UN  139.66.43.169  198.37 GB  256     ?     12e7628b-7f05-48f6-b7e4-35a82010021a  rack1
> UN  139.66.43.168  191.88 GB  256     ?     26088392-f803-4d59-9073-c75f857fb332  rack1
> {noformat}
> All nodes crashed due to a power failure (known about beforehand) and repair
> operations were started for each node, one at a time. Below is the behavior
> of the SSTable count on the different nodes. New data was dumped
> simultaneously with the repair operations.
> SEE FIGURE: sstable_count_figure1.png
> Vertical lines indicate following events.
> 1) Cluster was down due to power shutdown and was restarted. At the first 
> vertical line the repair operation (nodetool repair -pr) was started for the 
> first node
> 2) The repair operation for the second node was started after the first node
> was successfully repaired.
> 3) The repair operation for the third node was started
> 4) The third repair operation finished
> 5) One of the nodes crashed (unknown reason in OS level)
> 6) Repair operation (nodetool repair -pr) was started for the first node
> 7) Repair operation for the second node was started
> 8) Repair operation for the third node was started
> 9) Repair operations finished
> These repair operations led to a huge number of small SSTables covering the
> whole time span of the data. The compaction horizon of DTCS was limited to
> 5 days (max_sstable_age_days) due to the size of the SSTables on disk.
> Therefore, the small SSTables outside that horizon won't be compacted. Below
> are the time graphs of the SSTables after the second round of repairs.
> {noformat}
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address       Load       Tokens  Owns  Host ID                               Rack
> UN  xx.xx.xx.170  663.61 GB  256     ?     dfc29863-c935-4909-9d7f-c59a47eda03d  rack1
> UN  xx.xx.xx.169  763.52 GB  256     ?     12e7628b-7f05-48f6-b7e4-35a82010021a  rack1
> UN  xx.xx.xx.168  651.59 GB  256     ?     26088392-f803-4d59-9073-c75f857fb332  rack1
> {noformat}
> See files:
> node0_20150810_1017_time_graph.txt
> node1_20150810_1017_time_graph.txt
> node2_20150810_1017_time_graph.txt
> To get rid of the SSTables, the TimeWindowCompactionStrategy was taken into
> use. The window size was set to 5 days and the Cassandra version was updated
> to 2.1.8. The figure below shows the behavior of the SSTable count. TWCS was
> taken into use on 10.8.
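
For reference, a hedged sketch of what switching a table to TWCS with 5-day windows could look like in CQL ({{ks.signals}} is a placeholder, and the fully qualified class name depends on how the pre-3.0 external TWCS jar is deployed):

{noformat}
ALTER TABLE ks.signals WITH compaction = {
    'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '5'
};
{noformat}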

[jira] [Comment Edited] (CASSANDRA-10172) Hint compaction isn't actually disabled

2015-09-12 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742097#comment-14742097
 ] 

Marcus Eriksson edited comment on CASSANDRA-10172 at 9/12/15 3:36 PM:
--

Without knowing full context about this issue, we submit user defined 
compactions in HintedHandoffManager, so the {{'enabled' : 'false'}} compaction 
parameter is not relevant here


was (Author: krummas):
Without knowing full context, we submit user defined compactions in 
HintedHandoffManager, so the {{'enabled' : 'false'}} compaction parameter is 
not relevant here

> Hint compaction isn't actually disabled
> ---
>
> Key: CASSANDRA-10172
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10172
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Aleksey Yeschenko
> Fix For: 2.2.x
>
> Attachments: system (3).log
>
>
> 3 node cluster, 100M writes.
> Test Start: 00:00:00
> Node 1 Killed: 00:05:48
> Node 2 Killed: 00:13:33
> Node 1 Started: 00:24:20
> Node 2 Started: 00:32:23
> Test Done: 00:38:33
> Node 1 hints replay finished: 00:56:16
> Node 2 hints replay finished: 01:00:16
> Node 3 hints replay finished: 02:08:00
> Log attached.  Note that lots of compaction happens on system.hints before 
> handoff begins.
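
Per the comment above, hint compactions are submitted as user defined compactions, which go straight to the compaction manager and are not gated by the strategy's {{'enabled'}} option. A hedged one-line illustration (the exact call site and arguments in {{HintedHandoffManager}} may differ):

{noformat}
// Illustrative Java sketch: user defined compactions bypass the compaction
// strategy's 'enabled' flag, which is why system.hints keeps compacting.
CompactionManager.instance.submitUserDefined(hintStore, descriptors, gcBefore);
{noformat}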



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

