date:20150402


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393337#comment-14393337
 ] 

Roman Tkachenko commented on CASSANDRA-9045:


Okay, thanks for letting me know!

Nope, there were no compactions around that time. I ran repair on this node 
earlier this morning but the queries were performed some time after it was done.

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-6559) cqlsh should warn about ALLOW FILTERING


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-6559:
---
Reviewer: Tyler Hobbs

 cqlsh should warn about ALLOW FILTERING
 ---

 Key: CASSANDRA-6559
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6559
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Tupshin Harper
Priority: Minor
  Labels: cqlsh
 Fix For: 2.0.15

 Attachments: CASSANDRA-6559.txt


 ALLOW FILTERING can be a convenience for preliminary exploration of your 
 data, and can be useful for batch jobs, but it is such an anti-pattern for 
 regular production queries, that cqlsh should provie an explicit warn 
 ingwhenever such a query is performed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-04-02 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393517#comment-14393517
 ] 

Benedict commented on CASSANDRA-8915:
-

Suggestion: convert each Candidate into a linked-list of colliding iterators, 
so that consume() becomes guaranteed O(1). In the event of many equal items, 
this would incur only linear costs, and only on push down, rather than 
logarithmic costs on both push down and advance. This would particularly help 
the partition level (sstable/memtable) merge, as we are likely to encounter the 
same DecoratedKey many times.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (CASSANDRA-9107) More accurate row count estimates

2015-04-02 Thread Chris Lohfink (JIRA)

Chris Lohfink created CASSANDRA-9107:


 Summary: More accurate row count estimates
 Key: CASSANDRA-9107
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9107
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Lohfink
 Attachments: 9107-cassandra2-1.patch

Currently the estimated row count from cfstats is the sum of the number of rows 
in all the sstables. This becomes very inaccurate with wide rows or heavily 
updated datasets since the same partition would exist in many sstables.  In 
example:

{code}
create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 1};

create TABLE wide (key text PRIMARY KEY , value text) WITH compaction = 
{'class': 'SizeTieredCompactionStrategy', 'min_threshold': 30, 
'max_threshold': 100} ;
---

insert INTO wide (key, value) VALUES ('key', 'value');
// flush
// cfstats output: Number of keys (estimate): 1  (128 in older version from 
index)

insert INTO wide (key, value) VALUES ('key', 'value');
// flush
// cfstats output: Number of keys (estimate): 2  (256 in older version from 
index)

... etc
{code}

previously it used the index but it still did it per sstable and summed them up 
which became inaccurate as there are more sstables (just by much worse). With 
new versions of sstables we can merge the cardinalities to resolve this with a 
slight hit to accuracy in the case of every sstable having completely unique 
partitions.

Furthermore I think it would be pretty minimal effort to include the number of 
rows in the memtables to this count. We wont have the cardinality merging 
between memtables and sstables but I would consider that a relatively minor 
negative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (CASSANDRA-9107) More accurate row count estimates

2015-04-02 Thread Chris Lohfink (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink reassigned CASSANDRA-9107:


Assignee: Chris Lohfink

 More accurate row count estimates
 -

 Key: CASSANDRA-9107
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9107
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Lohfink
Assignee: Chris Lohfink
 Attachments: 9107-cassandra2-1.patch


 Currently the estimated row count from cfstats is the sum of the number of 
 rows in all the sstables. This becomes very inaccurate with wide rows or 
 heavily updated datasets since the same partition would exist in many 
 sstables.  In example:
 {code}
 create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
 'replication_factor': 1};
 create TABLE wide (key text PRIMARY KEY , value text) WITH compaction = 
 {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 30, 
 'max_threshold': 100} ;
 ---
 insert INTO wide (key, value) VALUES ('key', 'value');
 // flush
 // cfstats output: Number of keys (estimate): 1  (128 in older version from 
 index)
 insert INTO wide (key, value) VALUES ('key', 'value');
 // flush
 // cfstats output: Number of keys (estimate): 2  (256 in older version from 
 index)
 ... etc
 {code}
 previously it used the index but it still did it per sstable and summed them 
 up which became inaccurate as there are more sstables (just by much worse). 
 With new versions of sstables we can merge the cardinalities to resolve this 
 with a slight hit to accuracy in the case of every sstable having completely 
 unique partitions.
 Furthermore I think it would be pretty minimal effort to include the number 
 of rows in the memtables to this count. We wont have the cardinality merging 
 between memtables and sstables but I would consider that a relatively minor 
 negative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Issue Comment Deleted] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-9045:
---
Comment: was deleted

(was: [~r0mant] we're currently working on reproducing the issue.  Thanks for 
the additional info!  That's pretty odd.  I presume that there were no 
compactions for that table on 173.203.37.151 around the time of those queries?)

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393277#comment-14393277
 ] 

Tyler Hobbs commented on CASSANDRA-9045:


[~r0mant] we're currently working on reproducing the issue.  Thanks for the 
additional info!  That's pretty odd.  I presume that there were no compactions 
for that table on 173.203.37.151 around the time of those queries?

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393278#comment-14393278
 ] 

Tyler Hobbs commented on CASSANDRA-9045:


[~r0mant] we're currently working on reproducing the issue.  Thanks for the 
additional info!  That's pretty odd.  I presume that there were no compactions 
for that table on 173.203.37.151 around the time of those queries?

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-02 Thread Aleksey Yeschenko (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393360#comment-14393360
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

You probably just have schema left from running 2.1-head.

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.5

 Attachments: 7688.txt


 Currently you can't implement something similar to describe_splits_ex purely 
 from the a native protocol driver.  
 https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily 
 getting ownership information to a client in the java-driver.  But you still 
 need the data sizing part to get splits of a given size.  We should add the 
 sizing information to a system table so that native clients can get to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-8238) NPE in SizeTieredCompactionStrategy.filterColdSSTables


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-8238:
---
Fix Version/s: 2.0.15

 NPE in SizeTieredCompactionStrategy.filterColdSSTables
 --

 Key: CASSANDRA-8238
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8238
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Tyler Hobbs
Assignee: Marcus Eriksson
 Fix For: 2.0.15, 2.1.5

 Attachments: 0001-assert-that-readMeter-is-not-null.patch, 
 0001-dont-always-set-client-mode-for-sstable-loader.patch


 {noformat}
 ERROR [CompactionExecutor:15] 2014-10-31 15:28:32,318 
 CassandraDaemon.java:153 - Exception in thread 
 Thread[CompactionExecutor:15,1,main]
 java.lang.NullPointerException: null
 at 
 org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.filterColdSSTables(SizeTieredCompactionStrategy.java:181)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundSSTables(SizeTieredCompactionStrategy.java:83)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy.getNextBackgroundTask(SizeTieredCompactionStrategy.java:267)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:226)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_72]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_72]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_72]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_72]
 at java.lang.Thread.run(Thread.java:745) [na:1.7.0_72]
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393552#comment-14393552
 ] 

Tyler Hobbs commented on CASSANDRA-9045:


[~r0mant] since we're having no luck reproducing the issue, would you be 
willing to deploy a patched version of 2.0.13 with additional tracing entries 
if we create a patch?

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393568#comment-14393568
 ] 

Roman Tkachenko commented on CASSANDRA-9045:


Also, what is the digest mismatch that I'm getting in some tracing query 
logs? Can it be the reason of this weird behavior?

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Tkachenko updated CASSANDRA-9045:
---
Attachment: inconsistency.txt

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (CASSANDRA-9105) JMX APIs appear untested

Ariel Weisberg created CASSANDRA-9105:
-

 Summary: JMX APIs appear untested
 Key: CASSANDRA-9105
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9105
 Project: Cassandra
  Issue Type: Test
Reporter: Ariel Weisberg


Anything supported via JMX is part of the public API of the database.

Node tool uses JMX but doesn't seem to have its own unit test and the dtest 
nodetool_test.py is pretty sparse.

For values returned by JMX for the purposes of reporting we should test as best 
we can that we are getting real values end to end. Occasionally metrics end 
up with no values, or values in the wrong units.

For commands going the other direction they should be exercised. There is 
probably a lot of coverage of commands since they may be used when testing the 
features those commands are a part of so no need for duplication there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-7557) User permissions for UDFs

2015-04-02 Thread Sam Tunnicliffe (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-7557:
---
Reviewer: Tyler Hobbs

 User permissions for UDFs
 -

 Key: CASSANDRA-7557
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7557
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: Tyler Hobbs
Assignee: Sam Tunnicliffe
  Labels: client-impacting, cql, udf
 Fix For: 3.0


 We probably want some new permissions for user defined functions.  Most 
 RDBMSes split function permissions roughly into {{EXECUTE}} and 
 {{CREATE}}/{{ALTER}}/{{DROP}} permissions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-9092) Nodes in DC2 die during and after huge write workload


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-9092:
---
Assignee: Sam Tunnicliffe

 Nodes in DC2 die during and after huge write workload
 -

 Key: CASSANDRA-9092
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9092
 Project: Cassandra
  Issue Type: Bug
 Environment: CentOS 6.2 64-bit, Cassandra 2.1.2, 
 java version 1.7.0_71
 Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
 Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
Reporter: Sergey Maznichenko
Assignee: Sam Tunnicliffe
 Fix For: 2.1.5

 Attachments: cassandra_crash1.txt


 Hello,
 We have Cassandra 2.1.2 with 8 nodes, 4 in DC1 and 4 in DC2.
 Node is VM 8 CPU, 32GB RAM
 During significant workload (loading several millions blobs ~3.5MB each), 1 
 node in DC2 stops and after some time next 2 nodes in DC2 also stops.
 Now, 2 of nodes in DC2 do not work and stops after 5-10 minutes after start. 
 I see many files in system.hints table and error appears in 2-3 minutes after 
 starting system.hints auto compaction.
 Stops, means ERROR [CompactionExecutor:1] 2015-04-01 23:33:44,456 
 CassandraDaemon.java:153 - Exception in thread 
 Thread[CompactionExecutor:1,1,main]
 java.lang.OutOfMemoryError: Java heap space
 ERROR [HintedHandoff:1] 2015-04-01 23:33:44,456 CassandraDaemon.java:153 - 
 Exception in thread Thread[HintedHandoff:1,1,main]
 java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
 java.lang.OutOfMemoryError: Java heap space
 Full errors listing attached in cassandra_crash1.txt
 The problem exists only in DC2. We have 1GbE between DC1 and DC2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Issue Comment Deleted] (CASSANDRA-8991) CQL3 DropIndexStatement should expose getColumnFamily like the CQL2 version does.

2015-04-02 Thread Ulises Cervino Beresi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ulises Cervino Beresi updated CASSANDRA-8991:
-
Comment: was deleted

(was: Can this patch be backported to 2.0 please?)

 CQL3 DropIndexStatement should expose getColumnFamily like the CQL2 version 
 does.
 -

 Key: CASSANDRA-8991
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8991
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Ulises Cervino Beresi
Assignee: Ulises Cervino Beresi
Priority: Minor
 Fix For: 2.0.15, 2.1.5

 Attachments: CASSANDRA-2.0.13-8991.txt


 CQL3 DropIndexStatement should expose getColumnFamily like the CQL2 version 
 does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[Cassandra Wiki] Update of ThirdPartySupport by AlekseyYeschenko

2015-04-02 Thread Apache Wiki

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ThirdPartySupport page has been changed by AlekseyYeschenko:
https://wiki.apache.org/cassandra/ThirdPartySupport?action=diffrev1=44rev2=45

  
  == Companies that employ Apache Cassandra Committers: ==
  
- 
{{http://www.datastax.com/wp-content/themes/datastax-2014-08/images/common/logo.png}}
 [[http://datastax.com|Datastax]], the commercial leader in Apache Cassandra™ 
offers products and services that make it easy for customers to build, deploy 
and operate elastically scalable and cloud-optimized applications and data 
services. [[http://datastax.com|DataStax]] has over 100 customers, including 
leaders such as Netflix, Cisco, Rackspace, HP, Constant Contact and 
[[http://www.datastax.com/cassandrausers|more]], and spanning verticals 
including web, financial services, telecommunications, logistics and government.
+ {{ https://upload.wikimedia.org/wikipedia/en/d/d3/Datastax_Logo.png }} 
[[http://datastax.com|Datastax]], the commercial leader in Apache Cassandra™ 
offers products and services that make it easy for customers to build, deploy 
and operate elastically scalable and cloud-optimized applications and data 
services. [[http://datastax.com|DataStax]] has over 100 customers, including 
leaders such as Netflix, Cisco, Rackspace, HP, Constant Contact and 
[[http://www.datastax.com/cassandrausers|more]], and spanning verticals 
including web, financial services, telecommunications, logistics and government.
  
  == Other companies: ==

[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393596#comment-14393596
 ] 

Roman Tkachenko commented on CASSANDRA-9045:


We can try that. Would you be able to provide a binary that I could use as a 
drop-in replacement? Also, will I need to replace it on all nodes in the 
cluster?

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[Cassandra Wiki] Update of ThirdPartySupport by AlekseyYeschenko

2015-04-02 Thread Apache Wiki

Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The ThirdPartySupport page has been changed by AlekseyYeschenko:
https://wiki.apache.org/cassandra/ThirdPartySupport?action=diffrev1=43rev2=44

  
  == Companies that employ Apache Cassandra Committers: ==
  
- {{http://www.datastax.com/wp-content/themes/datastax-custom/images/logo.png}} 
[[http://datastax.com|Datastax]], the commercial leader in Apache Cassandra™ 
offers products and services that make it easy for customers to build, deploy 
and operate elastically scalable and cloud-optimized applications and data 
services. [[http://datastax.com|DataStax]] has over 100 customers, including 
leaders such as Netflix, Cisco, Rackspace, HP, Constant Contact and 
[[http://www.datastax.com/cassandrausers|more]], and spanning verticals 
including web, financial services, telecommunications, logistics and government.
+ 
{{http://www.datastax.com/wp-content/themes/datastax-2014-08/images/common/logo.png}}
 [[http://datastax.com|Datastax]], the commercial leader in Apache Cassandra™ 
offers products and services that make it easy for customers to build, deploy 
and operate elastically scalable and cloud-optimized applications and data 
services. [[http://datastax.com|DataStax]] has over 100 customers, including 
leaders such as Netflix, Cisco, Rackspace, HP, Constant Contact and 
[[http://www.datastax.com/cassandrausers|more]], and spanning verticals 
including web, financial services, telecommunications, logistics and government.
  
  == Other companies: ==
- 
- {{http://www.acunu.com/uploads/1/1/5/5/11559475/1335714080.png}} 
[[http://www.acunu.com|Acunu]] are world experts in Apache Cassandra and 
beyond. Some of the most challenging Cassandra deployments already rely on 
Acunu's technology, training and support. With a focus real time applications, 
Acunu makes it easy to build Cassandra based real-time Big Data solutions that 
derive instant answers from event streams and deliver fresh insight
  
  {{ http://www.urimagnation.com/wp-content/themes/v2.0/images/logo.jpg}}
  [[http://www.urimagnation.com | URimagination]] , a group of peers, highly 
qualified, dedicated and motivated professionals, and with huge passion for 
open source technologies and projects such Apache Hadoop Big Data Platform, 
OpenSuse, Casandra and more, offers services to enhance performance of your 
applications and databases at quite a low cost. Our emphasis is to provide 
remote database services and thereafter-remote database monitoring of your 
business program. We provide competent services in Oracle, PostgreSQL, NoSQL, 
Hbase, MongoDB, DB2, and many more for commercial and government institutions, 
check our services[[http://www.urimagnation.com/services | services]]

[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393367#comment-14393367
 ] 

Philip Thompson commented on CASSANDRA-7688:


I'm using ccm, so the data dirs are being created from scratch

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.5

 Attachments: 7688.txt


 Currently you can't implement something similar to describe_splits_ex purely 
 from the a native protocol driver.  
 https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily 
 getting ownership information to a client in the java-driver.  But you still 
 need the data sizing part to get splits of a given size.  We should add the 
 sizing information to a system table so that native clients can get to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (CASSANDRA-9106) disable secondary indexes by default

2015-04-02 Thread Jon Haddad (JIRA)

Jon Haddad created CASSANDRA-9106:
-

 Summary: disable secondary indexes by default
 Key: CASSANDRA-9106
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9106
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jon Haddad


This feature is misused constantly.  Can we disable it by default, and provide 
a yaml config to explicitly enable it?  Along with a massive warning about how 
they aren't there for performance, maybe with a link to documentation that 
explains why?  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-02 Thread JIRA


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393296#comment-14393296
 ] 

Piotr Kołaczkowski commented on CASSANDRA-7688:
---

Why is this ticket marked as fixed in 2.1.5, if I can see this working in just 
released 2.1.4?

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.5

 Attachments: 7688.txt


 Currently you can't implement something similar to describe_splits_ex purely 
 from the a native protocol driver.  
 https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily 
 getting ownership information to a client in the java-driver.  But you still 
 need the data sizing part to get splits of a given size.  We should add the 
 sizing information to a system table so that native clients can get to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (CASSANDRA-9104) Unit test failures, trunk + Windows

2015-04-02 Thread Joshua McKenzie (JIRA)

Joshua McKenzie created CASSANDRA-9104:
--

 Summary: Unit test failures, trunk + Windows
 Key: CASSANDRA-9104
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9104
 Project: Cassandra
  Issue Type: Test
Reporter: Joshua McKenzie
Assignee: Joshua McKenzie
 Fix For: 3.0


Variety of different test failures have cropped up over the past 2-3 weeks:

h6. org.apache.cassandra.cql3.UFTest FAILED (timeout)
h6. org.apache.cassandra.db.KeyCacheTest FAILED
{noformat}
   expected:4 but was:2
   junit.framework.AssertionFailedError: expected:4 but was:2
   at 
org.apache.cassandra.db.KeyCacheTest.assertKeyCacheSize(KeyCacheTest.java:221)
   at org.apache.cassandra.db.KeyCacheTest.testKeyCache(KeyCacheTest.java:181)
{noformat}

h6. RecoveryManagerTest:
{noformat}
   org.apache.cassandra.db.RecoveryManagerTest FAILED
   org.apache.cassandra.db.RecoveryManager2Test FAILED
   org.apache.cassandra.db.RecoveryManager3Test FAILED
   org.apache.cassandra.db.RecoveryManagerTruncateTest FAILED
   All are the following:
  java.nio.file.AccessDeniedException: 
build\test\cassandra\commitlog;0\CommitLog-5-1427995105229.log
  FSWriteError in 
build\test\cassandra\commitlog;0\CommitLog-5-1427995105229.log
 at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:128)
 at 
org.apache.cassandra.db.commitlog.CommitLogSegmentManager.recycleSegment(CommitLogSegmentManager.java:360)
 at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:156)
 at 
org.apache.cassandra.db.RecoveryManagerTest.testNothingToRecover(RecoveryManagerTest.java:75)
  Caused by: java.nio.file.AccessDeniedException: 
build\test\cassandra\commitlog;0\CommitLog-5-1427995105229.log
 at 
sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
 at 
sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
 at 
sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
 at 
sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:269)
 at 
sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
 at java.nio.file.Files.delete(Files.java:1079)
 at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:124)
{noformat}

h6. testScrubCorruptedCounterRow(org.apache.cassandra.db.ScrubTest):  FAILED
{noformat}
Expecting new size of 1, got 2 while replacing 
[BigTableReader(path='C:\src\refCassandra\build\test\cassandra\data;0\Keyspace1\Counter1-deab62b2d95c11e489c6e117fe147c1d\la-1-big-Data.db')]
 by 
[BigTableReader(path='C:\src\refCassandra\build\test\cassandra\data;0\Keyspace1\Counter1-deab62b2d95c11e489c6e117fe147c1d\la-1-big-Data.db')]
 in View(pending_count=0, 
sstables=[BigTableReader(path='C:\src\refCassandra\build\test\cassandra\data;0\Keyspace1\Counter1-deab62b2d95c11e489c6e117fe147c1d\la-3-big-Data.db')],
 compacting=[])
junit.framework.AssertionFailedError: Expecting new size of 1, got 2 while 
replacing 
[BigTableReader(path='C:\src\refCassandra\build\test\cassandra\data;0\Keyspace1\Counter1-deab62b2d95c11e489c6e117fe147c1d\la-1-big-Data.db')]
 by 
[BigTableReader(path='C:\src\refCassandra\build\test\cassandra\data;0\Keyspace1\Counter1-deab62b2d95c11e489c6e117fe147c1d\la-1-big-Data.db')]
 in View(pending_count=0, 
sstables=[BigTableReader(path='C:\src\refCassandra\build\test\cassandra\data;0\Keyspace1\Counter1-deab62b2d95c11e489c6e117fe147c1d\la-3-big-Data.db')],
 compacting=[])
   at org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:767)
   at org.apache.cassandra.db.DataTracker.replaceReaders(DataTracker.java:408)
   at 
org.apache.cassandra.db.DataTracker.replaceWithNewInstances(DataTracker.java:312)
   at 
org.apache.cassandra.io.sstable.SSTableRewriter.moveStarts(SSTableRewriter.java:341)
   at 
org.apache.cassandra.io.sstable.SSTableRewriter.abort(SSTableRewriter.java:202)
   at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:277)
   at 
org.apache.cassandra.db.ScrubTest.testScrubCorruptedCounterRow(ScrubTest.java:152)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9037) Terminal UDFs evaluated at prepare time throw protocol version error


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393255#comment-14393255
 ] 

Tyler Hobbs commented on CASSANDRA-9037:


I don't think we need to defer for collection args, only return types.  The 
get() method for collection Terminals (e.g. {{Lists.Value.get()}}) serializes 
the collection based on the requested protocol version (the highest version 
supported by the driver, in this case).  The function will also execute 
assuming that same protocol version, so it will have no problems deserializing 
the collection.  The reason that the return type specifically is problematic is 
that the function results may need further processing (such as another function 
call) or reserialization (if the protocol version is less than 3).  Basically, 
the serialization format for the results has to match the protocol version in 
use.

Also, don't forget to handle tuple and UDT return types specially.  If they 
contain collections, those collections will be serialized with the format for 
whatever protocol version we execute the function with.

 Terminal UDFs evaluated at prepare time throw protocol version error
 

 Key: CASSANDRA-9037
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9037
 Project: Cassandra
  Issue Type: Bug
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
 Fix For: 3.0


 When a pure function with only terminal arguments (or with no arguments) is 
 used in a where clause, it's executed at prepare time and 
 {{Server.CURRENT_VERSION}} passed as the protocol version for serialization 
 purposes. For native functions, this isn't a problem, but UDFs use classes in 
 the bundled java-driver-core jar for (de)serialization of args and return 
 values. When {{Server.CURRENT_VERSION}} is greater than the highest version 
 supported by the bundled java driver the execution fails with the following 
 exception:
 {noformat}
 ERROR [SharedPool-Worker-1] 2015-03-24 18:10:59,391 QueryMessage.java:132 - 
 Unexpected error during query
 org.apache.cassandra.exceptions.FunctionExecutionException: execution of 
 'ks.overloaded[text]' failed: java.lang.IllegalArgumentException: No protocol 
 version matching integer version 4
 at 
 org.apache.cassandra.exceptions.FunctionExecutionException.create(FunctionExecutionException.java:35)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.udf.gen.Cksoverloaded_1.execute(Cksoverloaded_1.java)
  ~[na:na]
 at 
 org.apache.cassandra.cql3.functions.FunctionCall.executeInternal(FunctionCall.java:78)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.functions.FunctionCall.access$200(FunctionCall.java:34)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.functions.FunctionCall$Raw.execute(FunctionCall.java:176)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.functions.FunctionCall$Raw.prepare(FunctionCall.java:161)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.SingleColumnRelation.toTerm(SingleColumnRelation.java:108)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.SingleColumnRelation.newEQRestriction(SingleColumnRelation.java:143)
  ~[main/:na]
 at org.apache.cassandra.cql3.Relation.toRestriction(Relation.java:127) 
 ~[main/:na]
 at 
 org.apache.cassandra.cql3.restrictions.StatementRestrictions.init(StatementRestrictions.java:126)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepareRestrictions(SelectStatement.java:787)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:740)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.QueryProcessor.getStatement(QueryProcessor.java:488)
  ~[main/:na]
 at 
 org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:252) 
 ~[main/:na]
 at 
 org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:246) 
 ~[main/:na]
 at 
 org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119)
  ~[main/:na]
 at 
 org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:475)
  [main/:na]
 at 
 org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:371)
  [main/:na]
 at 
 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
  [netty-all-4.0.23.Final.jar:4.0.23.Final]
 at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
  [netty-all-4.0.23.Final.jar:4.0.23.Final]
 at 
 io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
  [netty-all-4.0.23.Final.jar:4.0.23.Final]
 at 
 io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)

[jira] [Commented] (CASSANDRA-8991) CQL3 DropIndexStatement should expose getColumnFamily like the CQL2 version does.

2015-04-02 Thread Ulises Cervino Beresi (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393311#comment-14393311
 ] 

Ulises Cervino Beresi commented on CASSANDRA-8991:
--

Can this patch be backported to 2.0 please?

 CQL3 DropIndexStatement should expose getColumnFamily like the CQL2 version 
 does.
 -

 Key: CASSANDRA-8991
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8991
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Ulises Cervino Beresi
Assignee: Ulises Cervino Beresi
Priority: Minor
 Fix For: 2.0.15, 2.1.5

 Attachments: CASSANDRA-2.0.13-8991.txt


 CQL3 DropIndexStatement should expose getColumnFamily like the CQL2 version 
 does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-8991) CQL3 DropIndexStatement should expose getColumnFamily like the CQL2 version does.

2015-04-02 Thread Aleksey Yeschenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-8991:
-
Fix Version/s: 2.0.15

 CQL3 DropIndexStatement should expose getColumnFamily like the CQL2 version 
 does.
 -

 Key: CASSANDRA-8991
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8991
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Ulises Cervino Beresi
Assignee: Ulises Cervino Beresi
Priority: Minor
 Fix For: 2.0.15, 2.1.5

 Attachments: CASSANDRA-2.0.13-8991.txt


 CQL3 DropIndexStatement should expose getColumnFamily like the CQL2 version 
 does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table

2015-04-02 Thread Aleksey Yeschenko (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393375#comment-14393375
 ] 

Aleksey Yeschenko commented on CASSANDRA-7688:
--

Must be magic then.

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.5

 Attachments: 7688.txt


 Currently you can't implement something similar to describe_splits_ex purely 
 from the a native protocol driver.  
 https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily 
 getting ownership information to a client in the java-driver.  But you still 
 need the data sizing part to get splits of a given size.  We should add the 
 sizing information to a system table so that native clients can get to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-9107) More accurate row count estimates

2015-04-02 Thread Chris Lohfink (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-9107:
-
Attachment: 9107-cassandra2-1.patch

 More accurate row count estimates
 -

 Key: CASSANDRA-9107
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9107
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Lohfink
Assignee: Chris Lohfink
 Attachments: 9107-cassandra2-1.patch


 Currently the estimated row count from cfstats is the sum of the number of 
 rows in all the sstables. This becomes very inaccurate with wide rows or 
 heavily updated datasets since the same partition would exist in many 
 sstables.  In example:
 {code}
 create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
 'replication_factor': 1};
 create TABLE wide (key text PRIMARY KEY , value text) WITH compaction = 
 {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 30, 
 'max_threshold': 100} ;
 ---
 insert INTO wide (key, value) VALUES ('key', 'value');
 // flush
 // cfstats output: Number of keys (estimate): 1  (128 in older version from 
 index)
 insert INTO wide (key, value) VALUES ('key', 'value');
 // flush
 // cfstats output: Number of keys (estimate): 2  (256 in older version from 
 index)
 ... etc
 {code}
 previously it used the index but it still did it per sstable and summed them 
 up which became inaccurate as there are more sstables (just by much worse). 
 With new versions of sstables we can merge the cardinalities to resolve this 
 with a slight hit to accuracy in the case of every sstable having completely 
 unique partitions.
 Furthermore I think it would be pretty minimal effort to include the number 
 of rows in the memtables to this count. We wont have the cardinality merging 
 between memtables and sstables but I would consider that a relatively minor 
 negative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393231#comment-14393231
 ] 

Roman Tkachenko commented on CASSANDRA-9045:


Also, check out the attached inconsistency.txt file. A request was issued 
twice with a difference of several seconds and the same server says Read 1 
live and 0 tombstoned cells on the first run and Read 0 live and 3 tombstoned 
cells on the second run. How could that happen? I'm also getting inconsistent 
results (for this record and some others) even with LOCAL_QUORUM, when using 
our production app: one query returns the record, the next one does not.

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9105) JMX APIs appear untested


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393304#comment-14393304
 ] 

Tyler Hobbs commented on CASSANDRA-9105:


I recently added a JMX utility module to the dtests that should make this 
easier to cover: 
https://github.com/riptano/cassandra-dtest/blob/master/jmxutils.py

 JMX APIs appear untested
 

 Key: CASSANDRA-9105
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9105
 Project: Cassandra
  Issue Type: Test
Reporter: Ariel Weisberg

 Anything supported via JMX is part of the public API of the database.
 Node tool uses JMX but doesn't seem to have its own unit test and the dtest 
 nodetool_test.py is pretty sparse.
 For values returned by JMX for the purposes of reporting we should test as 
 best we can that we are getting real values end to end. Occasionally 
 metrics end up with no values, or values in the wrong units.
 For commands going the other direction they should be exercised. There is 
 probably a lot of coverage of commands since they may be used when testing 
 the features those commands are a part of so no need for duplication there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7688) Add data sizing to a system table


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393352#comment-14393352
 ] 

Philip Thompson commented on CASSANDRA-7688:


I see the system.size_estimates table in 2.1.4, but I don't see it being 
populated. Are you?

 Add data sizing to a system table
 -

 Key: CASSANDRA-7688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7688
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jeremiah Jordan
Assignee: Aleksey Yeschenko
 Fix For: 2.1.5

 Attachments: 7688.txt


 Currently you can't implement something similar to describe_splits_ex purely 
 from the a native protocol driver.  
 https://datastax-oss.atlassian.net/browse/JAVA-312 is open to expose easily 
 getting ownership information to a client in the java-driver.  But you still 
 need the data sizing part to get splits of a given size.  We should add the 
 sizing information to a system table so that native clients can get to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7017) allow per-partition LIMIT clause in cql

2015-04-02 Thread William Price (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393503#comment-14393503
 ] 

William Price commented on CASSANDRA-7017:
--

Going for symmetry with the existing {{CREATE TABLE ... WITH CLUSTERING ORDER 
BY ...}} syntax, what about:

{code}
SELECT * FROM scores (WITH)? CLUSTERING LIMIT 3
{code}

 allow per-partition LIMIT clause in cql
 ---

 Key: CASSANDRA-7017
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7017
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Halliday
Assignee: Dan Burkert
  Labels: cql
 Fix For: 3.0

 Attachments: 0001-CASSANDRA-7017.patch


 somewhat related to static columns (#6561) and slicing (#4851), it is 
 desirable to apply a LIMIT on a per-partition rather than per-query basis, 
 such as to retrieve the top (most recent, etc) N clustered values for each 
 partition key, e.g.
 -- for each league, keep a ranked list of users
 create table scores (league text, score int, player text, primary key(league, 
 score, player) );
 -- get the top 3 teams in each league:
 select * from scores staticlimit 3;
 this currently requires issuing one query per partition key, which is tedious 
 if all the key partition key values are known and impossible if they aren't.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393564#comment-14393564
 ] 

Roman Tkachenko commented on CASSANDRA-9045:


Yeah, I understand 1 vs 3. What I'm confused about is live vs tombstoned 
because there were no deletes for this record within this 20 seconds interval.

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (CASSANDRA-9108) SSL appears to be untested

Ariel Weisberg created CASSANDRA-9108:
-

 Summary: SSL appears to be untested
 Key: CASSANDRA-9108
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9108
 Project: Cassandra
  Issue Type: Test
Reporter: Ariel Weisberg


Need to test that you can set up a cluster with SSL node - node and client - 
node. A dtest for this would be really useful to make sure it's always working 
at a basic level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-9109) Repair appears to have some of untested behaviors

[
https://issues.apache.org/jira/browse/CASSANDRA-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ariel Weisberg updated CASSANDRA-9109:
--
Description:
There is AntiCompactionTest and a few single process unit tests, but they
aren't very convincing. Looking at the docs to nodetool it looks like there are
a few different ways that repair could operate that aren't explored. dtest wise
there is repair_test and incremental_repair test which do give some useful
coverage, but don't do everything.

It's also the kind of thing you might like to see tested with some concurrent
load to catch interactions with everything else moving about, but a dtest may
not be the right place to do that.

was:
There is AntiCompactionTest and a few single process unit tests, but they
aren't very convincing. Looking at the docs to nodetool it looks like there are
a few different ways that repair could operate that aren't explored.

It's also the kind of thing you might like to see tested with some concurrent
load to catch interactions with everything else moving about, but a dtest may
not be the right place to do that.

Repair appears to have some of untested behaviors
-

Key: CASSANDRA-9109
URL: https://issues.apache.org/jira/browse/CASSANDRA-9109
Project: Cassandra
Issue Type: Test
Reporter: Ariel Weisberg

There is AntiCompactionTest and a few single process unit tests, but they
aren't very convincing. Looking at the docs to nodetool it looks like there
are a few different ways that repair could operate that aren't explored.
dtest wise there is repair_test and incremental_repair test which do give
some useful coverage, but don't do everything.
It's also the kind of thing you might like to see tested with some concurrent
load to catch interactions with everything else moving about, but a dtest may
not be the right place to do that.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (CASSANDRA-9110) Bounded/RingBuffer CQL Collections

2015-04-02 Thread Jim Plush (JIRA)

Jim Plush created CASSANDRA-9110:


 Summary: Bounded/RingBuffer CQL Collections
 Key: CASSANDRA-9110
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9110
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jim Plush
Priority: Minor


Feature Request:
I've had frequent use cases for bounded and RingBuffer based collections. 

For example: 
I want to store the first 100 times I've see this thing.
I want to store the last 100 times I've seen this thing.

Currently that means having to do application level READ/WRITE operations and 
we like to keep some of our high scale apps to write only where possible. 

While probably expensive for exactly N items an approximation should be good 
enough for most applications. Where N in our example could be 100 or 102, or 
even make that tunable on the type or table. 

For the RingBuffer example, consider I only want to store the last N login 
attempts for a user. Once N+1 comes in it issues a delete for the oldest one in 
the collection. 

A potential implementation idea, given the rowkey would live on a single node 
would be to have an LRU based counter cache (tunable in the yaml settings in 
MB) that keeps a current count of how many items are already in the collection 
for that rowkey. If  than bound, toss.


something akin to:
CREATE TABLE users (
  user_id text PRIMARY KEY,
  first_name text,
  first_logins settext, 100, oldest
  last_logins settext, 100, newest
);





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-9110) Bounded/RingBuffer CQL Collections

2015-04-02 Thread Jim Plush (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jim Plush updated CASSANDRA-9110:
-
Description:
Feature Request:
I've had frequent use cases for bounded and RingBuffer based collections.

For example:
I want to store the first 100 times I've see this thing.
I want to store the last 100 times I've seen this thing.

Currently that means having to do application level READ/WRITE operations and
we like to keep some of our high scale apps to write only where possible.

While probably expensive for exactly N items an approximation should be good
enough for most applications. Where N in our example could be 100 or 102, or
even make that tunable on the type or table.

For the RingBuffer example, consider I only want to store the last N login
attempts for a user. Once N+1 comes in it issues a delete for the oldest one in
the collection.

A potential implementation idea, given the rowkey would live on a single node
would be to have an LRU based counter cache (tunable in the yaml settings in
MB) that keeps a current count of how many items are already in the collection
for that rowkey. If than bound, toss. It could also be a compaction type
thing where it stores all the data then at compaction time it filters out the
data that's out of bounds as long as the CQL returns the right bounds.

something akin to:
CREATE TABLE users (
user_id text PRIMARY KEY,
first_name text,
first_logins settext, 100, oldest
last_logins settext, 100, newest
);

was:
Feature Request:
I've had frequent use cases for bounded and RingBuffer based collections.

For example:
I want to store the first 100 times I've see this thing.
I want to store the last 100 times I've seen this thing.

Currently that means having to do application level READ/WRITE operations and
we like to keep some of our high scale apps to write only where possible.

While probably expensive for exactly N items an approximation should be good
enough for most applications. Where N in our example could be 100 or 102, or
even make that tunable on the type or table.

For the RingBuffer example, consider I only want to store the last N login
attempts for a user. Once N+1 comes in it issues a delete for the oldest one in
the collection.

something akin to:
CREATE TABLE users (
user_id text PRIMARY KEY,
first_name text,
first_logins settext, 100, oldest
last_logins settext, 100, newest
);

Bounded/RingBuffer CQL Collections
--

Key: CASSANDRA-9110
URL: https://issues.apache.org/jira/browse/CASSANDRA-9110
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Jim Plush
Priority: Minor

Feature Request:
I've had frequent use cases for bounded and RingBuffer based collections.
For example:
I want to store the first 100 times I've see this thing.
I want to store the last 100 times I've seen this thing.
Currently that means having to do application level READ/WRITE operations and
we like to keep some of our high scale apps to write only where possible.
While probably expensive for exactly N items an approximation should be good
enough for most applications. Where N in our example could be 100 or 102, or
even make that tunable on the type or table.
For the RingBuffer example, consider I only want to store the last N login
attempts for a user. Once N+1 comes in it issues a delete for the oldest one
in the collection.
A potential implementation idea, given the rowkey would live on a single node
would be to have an LRU based counter cache (tunable in the yaml settings in
MB) that keeps a current count of how many items are already in the
collection for that rowkey. If than bound, toss. It could also be a
compaction type thing where it stores all the data then at compaction time it
filters out the data that's out of bounds as long as the CQL returns the
right bounds.
something akin to:
CREATE TABLE users (
user_id text PRIMARY KEY,
first_name text,
first_logins settext, 100, oldest
last_logins settext, 100, newest
);

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Tkachenko updated CASSANDRA-9045:
---
Attachment: debug.txt

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: 9045-debug-tracing.txt, 
 apache-cassandra-2.0.13-SNAPSHOT.jar, cqlsh.txt, debug.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9097) Repeated incremental nodetool repair results in failed repairs due to running anticompaction

2015-04-02 Thread Yuki Morishita (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393989#comment-14393989
 ] 

Yuki Morishita commented on CASSANDRA-9097:
---

The problem is that the repair coordinator does not wait anticompaction to 
finish on other nodes.
We can change the behavior to wait until coordinator receives notification from 
other replica, but doing so can be a problem between the nodes in different 
minor version.

We definitely need to fix this in 3.0, though let me see what would be the 
right solution for 2.1.x.

 Repeated incremental nodetool repair results in failed repairs due to running 
 anticompaction
 

 Key: CASSANDRA-9097
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9097
 Project: Cassandra
  Issue Type: Bug
Reporter: Gustav Munkby
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 2.1.5


 I'm trying to synchronize incremental repairs over multiple nodes in a 
 Cassandra cluster, and it does not seem to easily achievable.
 In principle, the process iterates through the nodes of the cluster and 
 performs `nodetool -h $NODE repair --incremental`, but that sometimes fails 
 on subsequent nodes. The reason for failing seems to be that the repair 
 returns as soon as the repair and the _local_ anticompaction has completed, 
 but does not guarantee that remote anticompactions are complete. If I 
 subsequently try to issue another repair command, they fail to start (and 
 terminate with failure after about one minute). It usually isn't a problem, 
 as the local anticompaction typically involves as much (or more) data as the 
 remote ones, but sometimes not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (CASSANDRA-9109) Repair appears to have a lot of untested behaviors

Ariel Weisberg created CASSANDRA-9109:
-

 Summary: Repair appears to have a lot of untested behaviors
 Key: CASSANDRA-9109
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9109
 Project: Cassandra
  Issue Type: Test
Reporter: Ariel Weisberg


There is AntiCompactionTest and a few single process unit tests, but they 
aren't very convincing. Looking at the docs to nodetool it looks like there are 
a few different ways that repair could operate that aren't explored.

It's also the kind of thing you might like to see tested with some concurrent 
load to catch interactions with everything else moving about, but a dtest may 
not be the right place to do that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-9110) Bounded/RingBuffer CQL Collections

2015-04-02 Thread Jim Plush (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jim Plush updated CASSANDRA-9110:
-
Description:
Feature Request:
I've had frequent use cases for bounded and RingBuffer based collections.

For example:
I want to store the first 100 times I've see this thing.
I want to store the last 100 times I've seen this thing.

Currently that means having to do application level READ/WRITE operations and
we like to keep some of our high scale apps to write only where possible.

While probably expensive for exactly N items an approximation should be good
enough for most applications. Where N in our example could be 100 or 102, or
even make that tunable on the type or table.

For the RingBuffer example, consider I only want to store the last N login
attempts for a user. Once N+1 comes in it issues a delete for the oldest one in
the collection, or waits until compaction to drop the overflow data as long as
the CQL returns the right bounds.

something akin to:
CREATE TABLE users (
user_id text PRIMARY KEY,
first_name text,
first_logins settext, 100, oldest
last_logins settext, 100, newest
);

was:
Feature Request:
I've had frequent use cases for bounded and RingBuffer based collections.

For example:
I want to store the first 100 times I've see this thing.
I want to store the last 100 times I've seen this thing.

Currently that means having to do application level READ/WRITE operations and
we like to keep some of our high scale apps to write only where possible.

While probably expensive for exactly N items an approximation should be good
enough for most applications. Where N in our example could be 100 or 102, or
even make that tunable on the type or table.

For the RingBuffer example, consider I only want to store the last N login
attempts for a user. Once N+1 comes in it issues a delete for the oldest one in
the collection.

something akin to:
CREATE TABLE users (
user_id text PRIMARY KEY,
first_name text,
first_logins settext, 100, oldest
last_logins settext, 100, newest
);

Bounded/RingBuffer CQL Collections
--

Key: CASSANDRA-9110
URL: https://issues.apache.org/jira/browse/CASSANDRA-9110
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Jim Plush
Priority: Minor

Feature Request:
I've had frequent use cases for bounded and RingBuffer based collections.
For example:
I want to store the first 100 times I've see this thing.
I want to store the last 100 times I've seen this thing.
Currently that means having to do application level READ/WRITE operations and
we like to keep some of our high scale apps to write only where possible.
While probably expensive for exactly N items an approximation should be good
enough for most applications. Where N in our example could be 100 or 102, or
even make that tunable on the type or table.
For the RingBuffer example, consider I only want to store the last N login
attempts for a user. Once N+1 comes in it issues a delete for the oldest one
in the collection, or waits until compaction to drop the overflow data as
long as the CQL returns the right bounds.
A potential implementation idea, given the rowkey would live on a single node
would be to have an LRU based counter cache (tunable in the yaml settings in
MB) that keeps a current count of how many items are already in the
collection for that rowkey. If than bound, toss.
something akin to:
CREATE TABLE users (
user_id text PRIMARY KEY,
first_name text,
first_logins settext, 100, oldest
last_logins settext, 100, newest
);

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (CASSANDRA-9053) Convert dtests that use cassandra-cli to Thrift API


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs reassigned CASSANDRA-9053:
--

Assignee: Tyler Hobbs

 Convert dtests that use cassandra-cli to Thrift API
 ---

 Key: CASSANDRA-9053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9053
 Project: Cassandra
  Issue Type: Test
  Components: Tests
Reporter: Tyler Hobbs
Assignee: Tyler Hobbs
 Fix For: 3.0


 The following dtests need to be changed to use the Thrift API directly 
 instead of going through cassandra-cli:
 * {{cql_tests.TestCQL.cql3_insert_thrift_test}}
 * {{cql_tests.TestCQL.rename_test}}
 * {{super_column_cache_test.TestSCCache.sc_with_row_cache_test}}
 * {{upgrade_supercolumns_test.TestSCUpgrade.upgrade_with_index_creation_test}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Resolved] (CASSANDRA-9053) Convert dtests that use cassandra-cli to Thrift API


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs resolved CASSANDRA-9053.

Resolution: Fixed

All of the dtests that used the CLI now go through the Thrift API directly.

 Convert dtests that use cassandra-cli to Thrift API
 ---

 Key: CASSANDRA-9053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9053
 Project: Cassandra
  Issue Type: Test
  Components: Tests
Reporter: Tyler Hobbs
Assignee: Tyler Hobbs
 Fix For: 3.0


 The following dtests need to be changed to use the Thrift API directly 
 instead of going through cassandra-cli:
 * {{cql_tests.TestCQL.cql3_insert_thrift_test}}
 * {{cql_tests.TestCQL.rename_test}}
 * {{super_column_cache_test.TestSCCache.sc_with_row_cache_test}}
 * {{upgrade_supercolumns_test.TestSCUpgrade.upgrade_with_index_creation_test}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-9053) Convert dtests that use cassandra-cli to Thrift API


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-9053:
---
Description: 
The following dtests need to be changed to use the Thrift API directly instead 
of going through cassandra-cli:
* {{cql_tests.TestCQL.cql3_insert_thrift_test}}
* {{cql_tests.TestCQL.rename_test}}
* {{super_column_cache_test.TestSCCache.sc_with_row_cache_test}}
* {{upgrade_supercolumns_test.TestSCUpgrade.upgrade_with_index_creation_test}}

  was:
The following dtests need to be changed to use the Thrift API directly instead 
of going through cassandra-cli:
* {{global_row_key_cache_test.TestGlobalRowKeyCache.functional_test}}
* {{cql_tests.TestCQL.cql3_insert_thrift_test}}
* {{cql_tests.TestCQL.rename_test}}
* {{super_column_cache_test.TestSCCache.sc_with_row_cache_test}}
* {{upgrade_supercolumns_test.TestSCUpgrade.upgrade_with_index_creation_test}}


 Convert dtests that use cassandra-cli to Thrift API
 ---

 Key: CASSANDRA-9053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9053
 Project: Cassandra
  Issue Type: Test
  Components: Tests
Reporter: Tyler Hobbs
 Fix For: 3.0


 The following dtests need to be changed to use the Thrift API directly 
 instead of going through cassandra-cli:
 * {{cql_tests.TestCQL.cql3_insert_thrift_test}}
 * {{cql_tests.TestCQL.rename_test}}
 * {{super_column_cache_test.TestSCCache.sc_with_row_cache_test}}
 * {{upgrade_supercolumns_test.TestSCUpgrade.upgrade_with_index_creation_test}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9082) sstableloader error on trunk due to loading read meter


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393784#comment-14393784
 ] 

Tyler Hobbs commented on CASSANDRA-9082:


+1

 sstableloader error on trunk due to loading read meter
 --

 Key: CASSANDRA-9082
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9082
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Tyler Hobbs
Assignee: Benedict
 Fix For: 3.0, 2.1.5


 If you try to run sstableloader on trunk, you'll get an error like the 
 following:
 {noformat}
 Exception: sstableloader command 
 '/tmp/dtest-p5eSr3/test/node1/bin/sstableloader -d 127.0.0.1 
 /tmp/tmpzd5CCh/ks/cf' failed; exit status: 1'; stdout: Established connection 
 to initial hosts
 Opening sstables and calculating sections to stream
 ; stderr: null
 java.lang.AssertionError
 org.apache.cassandra.exceptions.ConfigurationException
   at 
 org.apache.cassandra.locator.AbstractReplicationStrategy.createInternal(AbstractReplicationStrategy.java:249)
   at 
 org.apache.cassandra.locator.AbstractReplicationStrategy.createReplicationStrategy(AbstractReplicationStrategy.java:264)
   at 
 org.apache.cassandra.db.Keyspace.createReplicationStrategy(Keyspace.java:279)
   at org.apache.cassandra.db.Keyspace.init(Keyspace.java:267)
   at org.apache.cassandra.db.Keyspace.open(Keyspace.java:115)
   at org.apache.cassandra.db.Keyspace.open(Keyspace.java:92)
   at 
 org.apache.cassandra.cql3.restrictions.StatementRestrictions.init(StatementRestrictions.java:128)
   at 
 org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepareRestrictions(SelectStatement.java:788)
   at 
 org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:741)
   at 
 org.apache.cassandra.cql3.QueryProcessor.getStatement(QueryProcessor.java:488)
   at 
 org.apache.cassandra.cql3.QueryProcessor.parseStatement(QueryProcessor.java:266)
   at 
 org.apache.cassandra.cql3.QueryProcessor.prepareInternal(QueryProcessor.java:300)
   at 
 org.apache.cassandra.cql3.QueryProcessor.executeInternal(QueryProcessor.java:308)
   at 
 org.apache.cassandra.db.SystemKeyspace.getSSTableReadMeter(SystemKeyspace.java:899)
   at 
 org.apache.cassandra.io.sstable.format.SSTableReader$GlobalTidy.init(SSTableReader.java:1973)
   at 
 org.apache.cassandra.io.sstable.format.SSTableReader$GlobalTidy.get(SSTableReader.java:2012)
   at 
 org.apache.cassandra.io.sstable.format.SSTableReader$DescriptorTypeTidy.init(SSTableReader.java:1890)
   at 
 org.apache.cassandra.io.sstable.format.SSTableReader$DescriptorTypeTidy.get(SSTableReader.java:1926)
   at 
 org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.setup(SSTableReader.java:1809)
   at 
 org.apache.cassandra.io.sstable.format.SSTableReader.setup(SSTableReader.java:1754)
   at 
 org.apache.cassandra.io.sstable.format.SSTableReader.openForBatch(SSTableReader.java:398)
   at 
 org.apache.cassandra.io.sstable.SSTableLoader$1.accept(SSTableLoader.java:117)
   at java.io.File.list(File.java:1155)
   at 
 org.apache.cassandra.io.sstable.SSTableLoader.openSSTables(SSTableLoader.java:78)
   at 
 org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:162)
   at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106)
 Caused by: java.lang.AssertionError
   at 
 org.apache.cassandra.locator.AbstractReplicationStrategy.init(AbstractReplicationStrategy.java:66)
   at 
 org.apache.cassandra.locator.LocalStrategy.init(LocalStrategy.java:36)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at 
 org.apache.cassandra.locator.AbstractReplicationStrategy.createInternal(AbstractReplicationStrategy.java:244)
   ... 25 more
 {noformat}
 At first glance, it looks like the SSTableReader is trying to load the read 
 meter even though it shouldn't (or doesn't need to).  Assigning to Benedict 
 since this seems to be most related to SSTableReader management.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393812#comment-14393812
 ] 

Tyler Hobbs commented on CASSANDRA-9045:


Thanks! What repair operation did you run in between the queries?

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: 9045-debug-tracing.txt, 
 apache-cassandra-2.0.13-SNAPSHOT.jar, cqlsh.txt, debug.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393824#comment-14393824
 ] 

Roman Tkachenko commented on CASSANDRA-9045:


I ran nodetool repair -pr blackbook bounces on a couple of nodes.

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: 9045-debug-tracing.txt, 
 apache-cassandra-2.0.13-SNAPSHOT.jar, cqlsh.txt, debug.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9045) Deleted columns are resurrected after repair in wide rows


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393627#comment-14393627
 ] 

Tyler Hobbs commented on CASSANDRA-9045:


bq. Would you be able to provide a binary that I could use as a drop-in 
replacement?

I can do that if you need me to, although I can also provide simple 
instructions for building the jar.

bq. Also, will I need to replace it on all nodes in the cluster?

One node would be enough if you can reproduce the behavior like 
{{inconsistency.txt}} on a key that it's a replica for.

 Deleted columns are resurrected after repair in wide rows
 -

 Key: CASSANDRA-9045
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9045
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Roman Tkachenko
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.0.15

 Attachments: cqlsh.txt, inconsistency.txt


 Hey guys,
 After almost a week of researching the issue and trying out multiple things 
 with (almost) no luck I was suggested (on the user@cass list) to file a 
 report here.
 h5. Setup
 Cassandra 2.0.13 (we had the issue with 2.0.10 as well and upgraded to see if 
 it goes away)
 Multi datacenter 12+6 nodes cluster.
 h5. Schema
 {code}
 cqlsh describe keyspace blackbook;
 CREATE KEYSPACE blackbook WITH replication = {
   'class': 'NetworkTopologyStrategy',
   'IAD': '3',
   'ORD': '3'
 };
 USE blackbook;
 CREATE TABLE bounces (
   domainid text,
   address text,
   message text,
   timestamp bigint,
   PRIMARY KEY (domainid, address)
 ) WITH
   bloom_filter_fp_chance=0.10 AND
   caching='KEYS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.10 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.00 AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'LeveledCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};
 {code}
 h5. Use case
 Each row (defined by a domainid) can have many many columns (bounce entries) 
 so rows can get pretty wide. In practice, most of the rows are not that big 
 but some of them contain hundreds of thousands and even millions of columns.
 Columns are not TTL'ed but can be deleted using the following CQL3 statement:
 {code}
 delete from bounces where domainid = 'domain.com' and address = 
 'al...@example.com';
 {code}
 All queries are performed using LOCAL_QUORUM CL.
 h5. Problem
 We weren't very diligent about running repairs on the cluster initially, but 
 shorty after we started doing it we noticed that some of previously deleted 
 columns (bounce entries) are there again, as if tombstones have disappeared.
 I have run this test multiple times via cqlsh, on the row of the customer who 
 originally reported the issue:
 * delete an entry
 * verify it's not returned even with CL=ALL
 * run repair on nodes that own this row's key
 * the columns reappear and are returned even with CL=ALL
 I tried the same test on another row with much less data and everything was 
 correctly deleted and didn't reappear after repair.
 h5. Other steps I've taken so far
 Made sure NTP is running on all servers and clocks are synchronized.
 Increased gc_grace_seconds to 100 days, ran full repair (on the affected 
 keyspace) on all nodes, then changed it back to the default 10 days again. 
 Didn't help.
 Performed one more test. Updated one of the resurrected columns, then deleted 
 it and ran repair again. This time the updated version of the column 
 reappeared.
 Finally, I noticed these log entries for the row in question:
 {code}
 INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936 
 CompactionController.java (line 192) Compacting large row 
 blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally
 {code}
 Figuring it may be related I bumped in_memory_compaction_limit_in_mb to 
 512MB so the row fits into it, deleted the entry and ran repair once again. 
 The log entry for this row was gone and the columns didn't reappear.
 We have a lot of rows much larger than 512MB so can't increase this 
 parameters forever, if that is the issue.
 Please let me know if you need more information on the case or if I can run 
 more experiments.
 Thanks!
 Roman



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-9109) Repair appears to have some of untested behaviors