[jira] [Commented] (CASSANDRA-8530) Query on a secondary index creates huge CPU spike + unable to trace
[ https://issues.apache.org/jira/browse/CASSANDRA-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790594#comment-14790594 ]

Jonathan Ellis commented on CASSANDRA-8530:
---

Is this the same as CASSANDRA-10050?

> Query on a secondary index creates huge CPU spike + unable to trace
> ---
>
> Key: CASSANDRA-8530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8530
> Project: Cassandra
> Issue Type: Bug
> Components: API, Core
> Environment: CentOs 6.5 / Cassandra 2.1.2
> Reporter: Pavel Baranov
>
> After upgrading Cassandra from 2.0.10 to 2.1.2 we are having all kinds of
> issues, especially with performance.
> java version "1.7.0_65"
> Table creation:
> {noformat}
> tweets> desc table tweets;
> CREATE TABLE tweets.tweets (
>     uname text,
>     tweet_id bigint,
>     tweet text,
>     tweet_date timestamp,
>     tweet_date_only text,
>     uid bigint,
>     PRIMARY KEY (uname, tweet_id)
> ) WITH CLUSTERING ORDER BY (tweet_id ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '10', 'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
>     AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.0
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.1
>     AND speculative_retry = '99.0PERCENTILE';
> CREATE INDEX tweets_tweet_date_only_idx ON tweets.tweets (tweet_date_only);
> CREATE INDEX tweets_uid ON tweets.tweets (uid);
> {noformat}
> With Cassandra 2.0.10 this query:
> {noformat}
> select uname from tweets where uid = 636732672 limit 1;
> {noformat}
> did not have any issues. After upgrade, I can see the CPU spikes and load avg
> goes from ~1 to ~13, especially if I execute the query over and over again.
> Doing "tracing on" does not work and just returns:
> "Statement trace did not complete within 10 seconds"
> I've done:
> nodetool upgradesstables
> recreated indexes

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8530) Query on a secondary index creates huge CPU spike + unable to trace
[ https://issues.apache.org/jira/browse/CASSANDRA-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745573#comment-14745573 ]

Pavel Baranov commented on CASSANDRA-8530:
---

Tom, we decided to restructure our tables and not use secondary indexes at all (based on feedback from DataStax). It's been working great so far.

- Pavel
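Pavel doesn't say what the restructured schema looks like. The following is a hedged sketch only. One common way to drop the `uid` secondary index is a second, query-specific table partitioned by `uid`, which the application writes alongside `tweets.tweets`. The table name `tweets_by_uid`, the user name and the tweet ID are hypothetical. This is neither Pavel's actual schema nor DataStax's specific advice.

{noformat}
-- Hypothetical query table: partitioned by uid, so looking up a user's
-- tweets is a single-partition read rather than a secondary-index query
-- that may have to be fanned out across many token ranges and nodes.
CREATE TABLE tweets.tweets_by_uid (
    uid bigint,
    tweet_id bigint,
    uname text,
    PRIMARY KEY (uid, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id ASC);

-- The application writes every tweet to both tables. A logged batch ensures
-- that if one insert is applied, the other eventually is too
-- (hypothetical values).
BEGIN BATCH
    INSERT INTO tweets.tweets (uname, tweet_id, uid) VALUES ('some_user', 1, 636732672);
    INSERT INTO tweets.tweets_by_uid (uid, tweet_id, uname) VALUES (636732672, 1, 'some_user');
APPLY BATCH;

-- The original secondary-index query then becomes:
SELECT uname FROM tweets.tweets_by_uid WHERE uid = 636732672 LIMIT 1;
{noformat}

With this layout, the monthly deletes by primary key would also have to be issued against both tables.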
[jira] [Commented] (CASSANDRA-8530) Query on a secondary index creates huge CPU spike + unable to trace
[ https://issues.apache.org/jira/browse/CASSANDRA-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745552#comment-14745552 ]

Tom van den Berge commented on CASSANDRA-8530:
---

Pavel, I've been having similar problems since I started using vnodes, but I'm not sure whether vnodes are the cause. Are you using vnodes? Did you manage to find a solution or workaround for this problem?

Tom
[jira] [Commented] (CASSANDRA-8530) Query on a secondary index creates huge CPU spike + unable to trace
[ https://issues.apache.org/jira/browse/CASSANDRA-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255898#comment-14255898 ]

Pavel Baranov commented on CASSANDRA-8530:
---

bq. What I get from this is that the problem was not triggered by the upgrade, but the upgrade did not fix it

Correct.

As far as "performance over time" goes: we run this script on a monthly basis, so I can't speak to day-to-day behavior. Month to month, though, all of those queries ran in the subsecond range. One more thing to note: the script is multithreaded, and since the issue appeared I can't even run it with a single thread.
[jira] [Commented] (CASSANDRA-8530) Query on a secondary index creates huge CPU spike + unable to trace
[ https://issues.apache.org/jira/browse/CASSANDRA-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255892#comment-14255892 ]

Sylvain Lebresne commented on CASSANDRA-8530:
---

bq. 3. until recently we were using a python script that does very simple selects \[...\] All of a sudden, same queries and same process started giving the following message \[...\] and accompanied by huge cpu spikes (load avg went from ~2 to about ~15)

bq. 4. We figured maybe it's time to upgrade, so we did to 2.1.2 - did not help.

What I get from this is that the problem was *not* triggered by the upgrade, but the upgrade did not fix it. Unfortunately, that makes it a bit harder to narrow down. Are you sure this really happened "all of a sudden"? Could query performance have degraded gradually, with you only noticing once the queries started returning the error message? In other words, do you have monitoring that shows how fast those queries were before they started timing out?
[jira] [Commented] (CASSANDRA-8530) Query on a secondary index creates huge CPU spike + unable to trace
[ https://issues.apache.org/jira/browse/CASSANDRA-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255884#comment-14255884 ]

Pavel Baranov commented on CASSANDRA-8530:
---

Providing a self-contained script would be tough right now. I can describe what we've done so far:

1. The tweets table was created under Cassandra 2.0.10.
2. There are currently 6 nodes in the cluster. Total storage usage is 3.3TB, with an average data size of 401GB per node.
3. Until recently we were using a Python script that does very simple selects (on the secondary index, like the one I posted in the original ticket). It also does deletes based on the primary key, once a month and on a small volume, so nothing intense. All of a sudden, the same queries and the same process started returning the following message:
{noformat}
code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
{noformat}
The errors were accompanied by huge CPU spikes (load avg went from ~2 to ~15).
4. We figured it might be time to upgrade, so we upgraded to 2.1.2. That did not help.
5. After upgrading, the secondary indexes didn't work at all, so I ran "nodetool upgradesstables". That fixed the indexes, but the original issue remained.
6. Tried dropping and recreating the indexes. That didn't help.
7. Did a couple of rolling cluster restarts. That didn't help.
8. I'm unable to see trace info with "tracing on". However, tracing info does appear for queries on the primary key or without a WHERE clause at all.
9. Insertions are still coming in and do not trigger any CPU spikes. The only thing that shows up in system.log is:
{noformat}
WARN [SharedPool-Worker-1] 2014-12-22 08:07:54,953 BatchStatement.java:255 - Batch of prepared statements for [tweets.tweets] is of size 35434, exceeding specified threshold of 5120 by 30314.
{noformat}
That warning doesn't even show up after running the "bad" queries, so I don't know where else to look for abnormalities.

Please let me know if there is anything I can do or provide. Thank you!
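The following is a hedged suggestion for item 8 and the "Statement trace did not complete within 10 seconds" message.

- cqlsh reads trace data back from the `system_traces` keyspace, and trace rows are written asynchronously.
- A trace that cqlsh gave up waiting for may therefore still be partially recorded there. The message presumably means the session's `duration` had not been filled in yet.
- Trace rows expire after a TTL (24 hours by default in 2.x, as far as I know), so these queries need to run soon after the slow query.

The sketch assumes the 2.1 column layout of `system_traces.sessions` and `system_traces.events`. The all-zero UUID is a placeholder.

{noformat}
-- List recorded trace sessions. Rows come back in token order, not by
-- time, so use started_at to find the session for the slow query.
-- A null duration presumably means the session never finished
-- (or its final update was never written).
SELECT session_id, coordinator, started_at, duration, request
FROM system_traces.sessions
LIMIT 20;

-- Fetch the individual trace events for one session. The all-zero UUID
-- is a placeholder: substitute a session_id from the query above.
SELECT event_id, source, source_elapsed, thread, activity
FROM system_traces.events
WHERE session_id = 00000000-0000-0000-0000-000000000000;
{noformat}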
[jira] [Commented] (CASSANDRA-8530) Query on a secondary index creates huge CPU spike + unable to trace
[ https://issues.apache.org/jira/browse/CASSANDRA-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255578#comment-14255578 ]

Sylvain Lebresne commented on CASSANDRA-8530:
---

We'll have a look, but note that this is a lot more likely to be looked at and fixed quickly if you provide full reproduction steps. For instance, a self-contained script that reproduces the problem on a brand new cluster would be really good. Of course, if you can't reproduce it on a test cluster, that's useful information anyway.
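As a starting point for the kind of self-contained reproduction Sylvain asks for, here is a hedged cqlsh sketch. It recreates the reporter's table and both indexes on a throwaway test cluster, with most table options left at their defaults. A few things are assumptions:

- The `SimpleStrategy` replication settings.
- The sample row.
- The file name `repro.cql`, which is hypothetical.

A single row will almost certainly not reproduce the CPU spike. A real script would need to load data at a volume and `uid` distribution comparable to production before running the query.

{noformat}
-- Hypothetical reproduction skeleton; save as e.g. repro.cql and run with
-- cqlsh -f repro.cql against a throwaway cluster.
CREATE KEYSPACE tweets
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- Same columns, key and clustering order as the reporter's table; most
-- table options are left at their defaults.
CREATE TABLE tweets.tweets (
    uname text,
    tweet_id bigint,
    tweet text,
    tweet_date timestamp,
    tweet_date_only text,
    uid bigint,
    PRIMARY KEY (uname, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id ASC);

CREATE INDEX tweets_tweet_date_only_idx ON tweets.tweets (tweet_date_only);
CREATE INDEX tweets_uid ON tweets.tweets (uid);

-- Placeholder data: a real reproduction would load many rows with many
-- distinct uids here.
INSERT INTO tweets.tweets (uname, tweet_id, tweet, tweet_date_only, uid)
VALUES ('user1', 1, 'hello', '2014-12-22', 636732672);

-- TRACING is a cqlsh command, not CQL; it enables tracing for the
-- statements that follow in this cqlsh session.
TRACING ON;
SELECT uname FROM tweets.tweets WHERE uid = 636732672 LIMIT 1;
{noformat}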