[jira] [Commented] (CASSANDRA-11122) SASI does not find term when indexing non-ascii character
[ https://issues.apache.org/jira/browse/CASSANDRA-11122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134141#comment-15134141 ]

Pierre Laporte commented on CASSANDRA-11122:
--------------------------------------------

Following the script Duyhai provided gave the same outcome: "Object" is not returned in the first script. This has been run against a fresh clone+build of https://github.com/xedin/cassandra/ (branch {{CASSANDRA-11067}})

> SASI does not find term when indexing non-ascii character
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-11122
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11122
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: Cassandra 3.4 SNAPSHOT
>            Reporter: DOAN DuyHai
>        Attachments: CASSANDRA-11122.patch
>
> I built the snapshot version taken from here:
> https://github.com/xedin/cassandra/tree/CASSANDRA-11067
>
> I created a tiny musical dataset with non-ascii characters (*cyrillic* actually) and created a SASI index on the artist name.
>
> SASI can find rows for the cyrillic name but strangely fails to index a normal ascii name (_'Object'_).
> {code:sql}
> CREATE KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
>
> CREATE TABLE music.albums (
>     title text PRIMARY KEY,
>     artist text
> );
>
> INSERT INTO music.albums(artist,title) VALUES('Object','The Reflecting Skin');
> INSERT INTO music.albums(artist,title) VALUES('Hayden','Mild and Hazy');
> INSERT INTO music.albums(artist,title) VALUES('Самое Большое Простое Число','СБПЧ Оркестр');
>
> CREATE custom INDEX on music.albums(artist) USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {
>     'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>     'case_sensitive': 'false'
> };
>
> SELECT * FROM music.albums;
>
>  title               | artist
> ---------------------+-----------------------------
>  The Reflecting Skin | Object
>  Mild and Hazy       | Hayden
>  СБПЧ Оркестр        | Самое Большое Простое Число
>
> (3 rows)
>
> SELECT * FROM music.albums WHERE artist='Самое Большое Простое Число';
>
>  title        | artist
> --------------+-----------------------------
>  СБПЧ Оркестр | Самое Большое Простое Число
>
> (1 rows)
>
> SELECT * FROM music.albums WHERE artist='Hayden';
>
>  title         | artist
> ---------------+--------
>  Mild and Hazy | Hayden
>
> (1 rows)
>
> SELECT * FROM music.albums WHERE artist='Object';
>
>  title | artist
> -------+--------
>
> (0 rows)
>
> SELECT * FROM music.albums WHERE artist like 'Ob%';
>
>  title | artist
> -------+--------
>
> (0 rows)
> {code}
>
> Strangely enough, after cleaning all the data and re-inserting without the russian artist with the cyrillic name, SASI does find _'Object'_ ...
> {code:sql}
> DROP INDEX albums_artist_idx;
> TRUNCATE TABLE albums;
>
> INSERT INTO albums(artist,title) VALUES('Object','The Reflecting Skin');
> INSERT INTO albums(artist,title) VALUES('Hayden','Mild and Hazy');
>
> CREATE custom INDEX on music.albums(artist) USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {
>     'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
>     'case_sensitive': 'false'
> };
>
> SELECT * FROM music.albums;
>
>  title               | artist
> ---------------------+--------
>  The Reflecting Skin | Object
>  Mild and Hazy       | Hayden
>
> (2 rows)
>
> SELECT * FROM music.albums WHERE artist='Object';
>
>  title               | artist
> ---------------------+--------
>  The Reflecting Skin | Object
>
> (1 rows)
>
> SELECT * FROM music.albums WHERE artist LIKE 'Ob%';
>
>  title               | artist
> ---------------------+--------
>  The Reflecting Skin | Object
>
> (1 rows)
> {code}
>
> The behaviour is quite inconsistent. I could understand SASI refusing to index cyrillic characters, or issuing an exception when encountering non-ascii characters (because we did not specify the locale), but it is very surprising that the indexing fails for normal ascii terms like _Object_.
>
> Could it be that SASI starts indexing the artist names following the table's token range order (hash of title) and stops indexing after encountering the cyrillic name?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
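A side note on the closing hypothesis above: whatever order the rows are visited in, the analyzed terms end up in a byte-comparable dictionary, and in raw unsigned UTF-8 byte order every ASCII term sorts before any Cyrillic one (ASCII bytes are below 0x80, Cyrillic UTF-8 lead bytes are 0xD0/0xD1). A JDK-only sketch of that comparison ({{compareUtf8}} is a hypothetical helper for illustration, not SASI code):

```java
import java.nio.charset.StandardCharsets;

public class TermOrder {
    // Compare two strings the way a byte-ordered term dictionary would:
    // unsigned, byte by byte, over their UTF-8 encodings.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int cmp = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        // ASCII 'o' is 0x6F; the UTF-8 encoding of Cyrillic 'с' starts with 0xD1,
        // so the ASCII term sorts strictly before the Cyrillic one.
        System.out.println(compareUtf8("object", "самое большое простое число") < 0); // true
    }
}
```

If this holds, an indexer that stopped early in *term* order could not have skipped 'Object' (it sorts first), which is consistent with the reporter's suspicion that the failure follows token (hash-of-title) order instead.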
[jira] [Commented] (CASSANDRA-8285) OOME in Cassandra 2.0.11
[ https://issues.apache.org/jira/browse/CASSANDRA-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246984#comment-14246984 ]

Pierre Laporte commented on CASSANDRA-8285:
--------------------------------------------

The ruby duration tests have passed; the issue has not been seen on C* 2.0.12 (HEAD + 8285-v2.txt). Note that [~kishkaru]'s previous tests were using two tester machines, while this one only used one, which reduces the load C* had to handle. Still, the issue has not been seen during the 3-day test.

> OOME in Cassandra 2.0.11
> ------------------------
>
>                 Key: CASSANDRA-8285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8285
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.0.11 + java-driver 2.0.8-SNAPSHOT
>                      Cassandra 2.0.11 + ruby-driver 1.0-beta
>            Reporter: Pierre Laporte
>            Assignee: Aleksey Yeschenko
>        Attachments: 8285-v2.txt, 8285.txt, OOME_node_system.log, gc-1416849312.log.gz, gc.log.gz, heap-usage-after-gc-zoom.png, heap-usage-after-gc.png, system.log.gz
>
> We ran drivers 3-day endurance tests against Cassandra 2.0.11 and C* crashed with an OOME. This happened both with ruby-driver 1.0-beta and java-driver 2.0.8-snapshot.
>
> Attached are:
> | OOME_node_system.log | The system.log of one Cassandra node that crashed |
> | gc.log.gz | The GC log on the same node |
> | heap-usage-after-gc.png | The heap occupancy evolution after every GC cycle |
> | heap-usage-after-gc-zoom.png | A focus on when things start to go wrong |
>
> Workload: Our test executes 5 CQL statements (select, insert, select, delete, select) for a given unique id, during 3 days, using multiple threads. There is no change in the workload during the test.
>
> Symptoms: In the attached log, it seems something starts in Cassandra between 2014-11-06 10:29:22 and 2014-11-06 10:45:32. This causes an allocation that fills the heap. We eventually get stuck in a Full GC storm and get an OOME in the logs.
>
> I have run the java-driver tests against Cassandra 1.2.19 and 2.1.1. The error does not occur. It seems specific to 2.0.11.
[jira] [Commented] (CASSANDRA-8285) OOME in Cassandra 2.0.11
[ https://issues.apache.org/jira/browse/CASSANDRA-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238059#comment-14238059 ]

Pierre Laporte commented on CASSANDRA-8285:
--------------------------------------------

Sure. Is the patch already applied to branch cassandra-2.0, or should I apply it manually?

> OOME in Cassandra 2.0.11
> ------------------------
>
>                 Key: CASSANDRA-8285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8285
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.0.11 + java-driver 2.0.8-SNAPSHOT
>                      Cassandra 2.0.11 + ruby-driver 1.0-beta
>            Reporter: Pierre Laporte
>            Assignee: Aleksey Yeschenko
>        Attachments: 8285-v2.txt, 8285.txt, OOME_node_system.log, gc-1416849312.log.gz, gc.log.gz, heap-usage-after-gc-zoom.png, heap-usage-after-gc.png, system.log.gz
[jira] [Commented] (CASSANDRA-8150) Revaluate Default JVM tuning parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233202#comment-14233202 ]

Pierre Laporte commented on CASSANDRA-8150:
--------------------------------------------

[~mstump] By any chance, have you collected Cassandra gc logs for various scenarios? That would be really valuable to find the right values.

I ran a test of java-driver against a C* instance on a GCE n1-standard-1 server (1 vCPU, 3.75 GB RAM). The young generation size was 100 MB (80 MB for Eden, 10 MB for each survivor) and the old generation size was 2.4 GB. I had the following:
* Average allocation rate: 352 MB/s (outliers above 600 MB/s)
* 4.5 DefNew cycles per second
* 1 CMS cycle every 10 minutes

Therefore, during the test, Cassandra was promoting objects at a rate of 3.8 MB/s. I think the size of Eden could be determined mostly by the allocation rate and the DefNew/ParNew frequency we want to achieve. Here, for instance, I would rather have had a bigger young generation, to get ~1 DefNew cycle/s.

I did not enable {{-XX:+PrintTenuringDistribution}}, so I do not know whether the objects were prematurely promoted. That would have given pointers on survivor sizing as well. Do you have any gc logs with that flag?

> Revaluate Default JVM tuning parameters
> ---------------------------------------
>
>                 Key: CASSANDRA-8150
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8150
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Config
>            Reporter: Matt Stump
>            Assignee: Brandon Williams
>        Attachments: upload.png
>
> It's been found that the old twitter recommendation of 100m per core up to 800m is harmful and should no longer be used. Instead the formula used should be 1/3 or 1/4 max heap with a max of 2G. 1/3 or 1/4 is debatable and I'm open to suggestions. If I were to hazard a guess, 1/3 is probably better for releases greater than 2.1.
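The Eden-sizing arithmetic in the comment above can be written out explicitly. This is a sketch using only the figures quoted in the comment, plus the ticket's proposed 1/4-of-heap-capped-at-2G young-generation rule; the helper names are illustrative, not anything from the Cassandra codebase:

```java
public class GcSizing {
    // Expected minor-GC frequency: Eden fills allocationRate/edenSize times per second.
    static double defNewPerSecond(double allocMBps, double edenMB) {
        return allocMBps / edenMB;
    }

    // Eden size needed to hit a target minor-GC frequency.
    static double edenForTarget(double allocMBps, double targetCyclesPerSec) {
        return allocMBps / targetCyclesPerSec;
    }

    // The ticket's proposed young-generation rule: 1/4 (or 1/3) of max heap, capped at 2G.
    static double youngGenMB(double maxHeapMB) {
        return Math.min(maxHeapMB / 4.0, 2048.0);
    }

    public static void main(String[] args) {
        // Figures from the comment: 352 MB/s allocation into an 80 MB Eden.
        System.out.println(defNewPerSecond(352, 80)); // 4.4 -- matches the ~4.5 cycles/s observed
        System.out.println(edenForTarget(352, 1.0));  // 352.0 MB Eden for ~1 DefNew cycle/s
        System.out.println(youngGenMB(8192));         // 2048.0 -- an 8G heap hits the 2G cap
    }
}
```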
[jira] [Commented] (CASSANDRA-8285) OOME in Cassandra 2.0.11
[ https://issues.apache.org/jira/browse/CASSANDRA-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223083#comment-14223083 ]

Pierre Laporte commented on CASSANDRA-8285:
--------------------------------------------

I have the issue after ~1.5 days on the endurance test of java-driver 2.1.3 against 2.0.12. Please find the associated heap dump [here|https://drive.google.com/open?id=0BxvGkaXP3ayeOElqY1ZNQTlBNTg&authuser=1]

> OOME in Cassandra 2.0.11
> ------------------------
>
>                 Key: CASSANDRA-8285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8285
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.0.11 + java-driver 2.0.8-SNAPSHOT
>                      Cassandra 2.0.11 + ruby-driver 1.0-beta
>            Reporter: Pierre Laporte
>            Assignee: Aleksey Yeschenko
>        Attachments: OOME_node_system.log, gc.log.gz, heap-usage-after-gc-zoom.png, heap-usage-after-gc.png
[jira] [Updated] (CASSANDRA-8285) OOME in Cassandra 2.0.11
[ https://issues.apache.org/jira/browse/CASSANDRA-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Laporte updated CASSANDRA-8285:
--------------------------------------
    Attachment: system.log.gz
                gc-1416849312.log.gz

I just reproduced the issue on my machine against Cassandra 2.1.2.

*Howto*

Create a 3-node C* cluster:
{code}
ccm create -n 3 -v 2.1.2 -b -s -i 127.0.0. cassandra-2.1
{code}

Insert/delete a lot of rows inside a single table. I was actually trying to reproduce the TombstoneOverwhelmingException but got an OOME instead.

{code}
import static com.datastax.driver.core.querybuilder.QueryBuilder.eq;

import java.io.IOException;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.policies.LoadBalancingPolicy;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.querybuilder.Batch;
import com.datastax.driver.core.querybuilder.QueryBuilder;

public class CassandraTest implements AutoCloseable {
    public static final String KEYSPACE = "TombstonesOverwhelming";
    private Cluster cluster;
    protected Session session;

    public CassandraTest() {
        this(new RoundRobinPolicy());
    }

    public CassandraTest(LoadBalancingPolicy loadBalancingPolicy) {
        System.out.println("Creating builder...");
        cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(loadBalancingPolicy)
                .build();
        for (Host host : cluster.getMetadata().getAllHosts()) {
            System.out.println("Found host " + host.getAddress() + " in DC " + host.getDatacenter());
        }
        session = cluster.connect();
    }

    private void executeQuietly(String query) {
        try {
            execute(query);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private ResultSet execute(String query) {
        return session.execute(query);
    }

    private ResultSet execute(Statement statement) {
        return session.execute(statement);
    }

    @Override
    public void close() throws IOException {
        cluster.close();
    }

    public static void main(String... args) throws Exception {
        try (CassandraTest test = new CassandraTest()) {
            test.executeQuietly("DROP KEYSPACE IF EXISTS " + KEYSPACE);
            test.execute("CREATE KEYSPACE " + KEYSPACE
                    + " WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }");
            test.execute("USE " + KEYSPACE);
            test.execute("CREATE TABLE useful (run int, iteration int, copy int, PRIMARY KEY (run, iteration, copy))");
            System.out.println("Press ENTER to start the test");
            System.in.read();
            for (int run = 0; run < 1_000_000; run++) {
                System.out.printf("Starting run % 7d... ", run);
                System.out.print("Inserting...");
                for (int iteration = 0; iteration < 1_000_000; iteration++) {
                    Batch batch = QueryBuilder.batch();
                    batch.setConsistencyLevel(ConsistencyLevel.QUORUM);
                    for (int copy = 0; copy < 100; copy++) {
                        batch.add(QueryBuilder.insertInto("useful")
                                .value("run", run).value("iteration", iteration).value("copy", copy));
                    }
                    test.execute(batch);
                }
                System.out.println("Deleting...");
                for (int iteration = 0; iteration < 1_000_000; iteration++) {
                    Batch batch = QueryBuilder.batch();
                    batch.setConsistencyLevel(ConsistencyLevel.QUORUM);
                    for (int copy = 0; copy < 100; copy++) {
                        batch.add(QueryBuilder.delete().from("useful")
                                .where(eq("run", run)).and(eq("iteration", iteration)).and(eq("copy", copy)));
                    }
                    test.execute(batch);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
{code}

It took ~50 minutes before two instances OOME'd. Please find attached the gc log and the system log. If needed, I can upload a heap dump too. Hope that helps.

> OOME in Cassandra 2.0.11
> ------------------------
>
>                 Key: CASSANDRA-8285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8285
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.0.11 + java-driver 2.0.8-SNAPSHOT
>                      Cassandra 2.0.11 + ruby-driver 1.0-beta
>            Reporter: Pierre Laporte
>            Assignee: Aleksey Yeschenko
>        Attachments: OOME_node_system.log, gc-1416849312.log.gz, gc.log.gz, heap-usage-after-gc-zoom.png, heap-usage-after-gc.png, system.log.gz
[jira] [Comment Edited] (CASSANDRA-8285) OOME in Cassandra 2.0.11
[ https://issues.apache.org/jira/browse/CASSANDRA-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223321#comment-14223321 ]

Pierre Laporte edited comment on CASSANDRA-8285 at 11/24/14 7:09 PM:
---------------------------------------------------------------------

I just reproduced the issue on my machine against Cassandra 2.1.2.

*Howto*

Create a 3-node C* cluster:
{code}
ccm create -n 3 -v 2.1.2 -b -s -i 127.0.0. cassandra-2.1
{code}

Insert/delete a lot of rows inside a single table, using the {{CassandraTest}} program from the previous update. I was actually trying to reproduce the TombstoneOverwhelmingException but got an OOME instead.

It took ~50 minutes before two instances OOME'd. Please find attached the gc log (gc-1416849312.log.gz) and the system log (system.log.gz). If needed, I can upload a heap dump too. Hope that helps.
[jira] [Commented] (CASSANDRA-8365) CamelCase name is used as index name instead of lowercase
[ https://issues.apache.org/jira/browse/CASSANDRA-8365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223328#comment-14223328 ]

Pierre Laporte commented on CASSANDRA-8365:
--------------------------------------------

[~philipthompson] I am using 2.1.2

> CamelCase name is used as index name instead of lowercase
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8365
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8365
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Pierre Laporte
>            Priority: Minor
>              Labels: cqlsh
>
> In cqlsh, when I execute a CREATE INDEX FooBar ... statement, the CamelCase name is used as the index name, even though it is unquoted. Trying to quote the index name results in a syntax error.
>
> However, when I try to delete the index, I have to quote the index name, otherwise I get an invalid-query error telling me that the index (lowercase) does not exist. This seems inconsistent. Shouldn't the index name be lowercased before the index is created?
>
> Here is the code to reproduce the issue:
> {code}
> cqlsh:schemabuilderit> CREATE TABLE IndexTest (a int primary key, b int);
> cqlsh:schemabuilderit> CREATE INDEX FooBar on indextest (b);
> cqlsh:schemabuilderit> DESCRIBE TABLE indextest ;
>
> CREATE TABLE schemabuilderit.indextest (
>     a int PRIMARY KEY,
>     b int
> );
> CREATE INDEX FooBar ON schemabuilderit.indextest (b);
>
> cqlsh:schemabuilderit> DROP INDEX FooBar;
> code=2200 [Invalid query] message=Index 'foobar' could not be found in any of the tables of keyspace 'schemabuilderit'
> {code}
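For reference, the behaviour the report expects: CQL folds an unquoted identifier to lowercase, while a double-quoted identifier keeps its exact case. A minimal sketch of that folding rule ({{fold}} is a hypothetical helper for illustration, not cqlsh or server code):

```java
import java.util.Locale;

public class CqlIdentifiers {
    // Fold a CQL identifier: a double-quoted identifier keeps its case
    // (with "" unescaped to "), an unquoted one is lowercased.
    static String fold(String id) {
        if (id.length() >= 2 && id.startsWith("\"") && id.endsWith("\"")) {
            return id.substring(1, id.length() - 1).replace("\"\"", "\"");
        }
        return id.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(fold("FooBar"));     // foobar -- what unquoted DROP INDEX FooBar resolves to
        System.out.println(fold("\"FooBar\"")); // FooBar -- what the quoted form preserves
    }
}
```

Under this rule, the unquoted {{CREATE INDEX FooBar}} and {{DROP INDEX FooBar}} should both resolve to {{foobar}}; the inconsistency in the report is that the creation path apparently kept the CamelCase form.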
[jira] [Created] (CASSANDRA-8355) NPE when passing wrong argument in ALTER TABLE statement
Pierre Laporte created CASSANDRA-8355:
------------------------------------------

             Summary: NPE when passing wrong argument in ALTER TABLE statement
                 Key: CASSANDRA-8355
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8355
             Project: Cassandra
          Issue Type: Bug
         Environment: Cassandra 2.1.2
            Reporter: Pierre Laporte
            Priority: Minor

When I tried to change the caching strategy of a table, I provided a wrong argument {{'rows_per_partition' : ALL}} with unquoted ALL. Cassandra returned a SyntaxError, which is good, but it seems it was because of a NullPointerException.

*Howto*
{code}
CREATE TABLE foo (k int primary key);
ALTER TABLE foo WITH caching = {'keys' : 'all', 'rows_per_partition' : ALL};
{code}

*Output*
{code}
ErrorMessage code=2000 [Syntax error in CQL query] message=Failed parsing statement: [ALTER TABLE foo WITH caching = {'keys' : 'all', 'rows_per_partition' : ALL};] reason: NullPointerException null
{code}
[jira] [Created] (CASSANDRA-8365) CamelCase name is used as index name instead of lowercase
Pierre Laporte created CASSANDRA-8365:
------------------------------------------

             Summary: CamelCase name is used as index name instead of lowercase
                 Key: CASSANDRA-8365
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8365
             Project: Cassandra
          Issue Type: Bug
            Reporter: Pierre Laporte
            Priority: Minor
[jira] [Commented] (CASSANDRA-8285) OOME in Cassandra 2.0.11
[ https://issues.apache.org/jira/browse/CASSANDRA-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208169#comment-14208169 ]

Pierre Laporte commented on CASSANDRA-8285:
--------------------------------------------

[~jbellis] Please find a new gc log, system log and heap dump [here|https://drive.google.com/a/datastax.com/folderview?id=0BxvGkaXP3ayeNV83Nm1nSUNEcDQ&usp=sharing]

Those 3 files come from the same instance, which crashed after a couple of hours. The heap dump was triggered by {{-XX:+HeapDumpOnOutOfMemoryError}}. Hope that helps.

> OOME in Cassandra 2.0.11
> ------------------------
>
>                 Key: CASSANDRA-8285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8285
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.0.11 + java-driver 2.0.8-SNAPSHOT
>                      Cassandra 2.0.11 + ruby-driver 1.0-beta
>            Reporter: Pierre Laporte
>            Assignee: Russ Hatch
>        Attachments: OOME_node_system.log, gc.log.gz, heap-usage-after-gc-zoom.png, heap-usage-after-gc.png
[jira] [Created] (CASSANDRA-8285) OOME in Cassandra 2.0.11
Pierre Laporte created CASSANDRA-8285:
------------------------------------------

             Summary: OOME in Cassandra 2.0.11
                 Key: CASSANDRA-8285
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8285
             Project: Cassandra
          Issue Type: Bug
         Environment: Cassandra 2.0.11 + java-driver 2.0.8-SNAPSHOT
                      Cassandra 2.0.11 + ruby-driver 1.0-beta
            Reporter: Pierre Laporte
        Attachments: OOME_node_system.log, gc.log.gz, heap-usage-after-gc-zoom.png, heap-usage-after-gc.png

We ran drivers 3-day endurance tests against Cassandra 2.0.11 and C* crashed with an OOME. This happened both with ruby-driver 1.0-beta and java-driver 2.0.8-snapshot.

Attached are:
| OOME_node_system.log | The system.log of one Cassandra node that crashed |
| gc.log.gz | The GC log on the same node |
| heap-usage-after-gc.png | The heap occupancy evolution after every GC cycle |
| heap-usage-after-gc-zoom.png | A focus on when things start to go wrong |

Workload: Our test executes 5 CQL statements (select, insert, select, delete, select) for a given unique id, during 3 days, using multiple threads. There is no change in the workload during the test.

Symptoms: In the attached log, it seems something starts in Cassandra between 2014-11-06 10:29:22 and 2014-11-06 10:45:32. This causes an allocation that fills the heap. We eventually get stuck in a Full GC storm and get an OOME in the logs.

I have run the java-driver tests against Cassandra 1.2.19 and 2.1.1. The error does not occur. It seems specific to 2.0.10.
[jira] [Updated] (CASSANDRA-8285) OOME in Cassandra 2.0.11
[ https://issues.apache.org/jira/browse/CASSANDRA-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Laporte updated CASSANDRA-8285:
--------------------------------------
    Description:

We ran drivers 3-day endurance tests against Cassandra 2.0.11 and C* crashed with an OOME. This happened both with ruby-driver 1.0-beta and java-driver 2.0.8-snapshot.

Attached are:
| OOME_node_system.log | The system.log of one Cassandra node that crashed |
| gc.log.gz | The GC log on the same node |
| heap-usage-after-gc.png | The heap occupancy evolution after every GC cycle |
| heap-usage-after-gc-zoom.png | A focus on when things start to go wrong |

Workload: Our test executes 5 CQL statements (select, insert, select, delete, select) for a given unique id, during 3 days, using multiple threads. There is no change in the workload during the test.

Symptoms: In the attached log, it seems something starts in Cassandra between 2014-11-06 10:29:22 and 2014-11-06 10:45:32. This causes an allocation that fills the heap. We eventually get stuck in a Full GC storm and get an OOME in the logs.

I have run the java-driver tests against Cassandra 1.2.19 and 2.1.1. The error does not occur. It seems specific to 2.0.11.
[jira] [Updated] (CASSANDRA-8276) Unusable prepared statement with 65k parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Laporte updated CASSANDRA-8276:
--------------------------------------
    Summary: Unusable prepared statement with 65k parameters  (was: Prepared statement unavailable)

> Unusable prepared statement with 65k parameters
> -----------------------------------------------
>
>                 Key: CASSANDRA-8276
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8276
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.0.10
>                      Java driver 2.0.8-SNAPSHOT
>            Reporter: Pierre Laporte
>
> We had an issue ([JAVA-515|https://datastax-oss.atlassian.net/browse/JAVA-515]) in the java-driver when the number of parameters in a statement is greater than the supported limit (65k).
>
> I added a limit-test to verify that prepared statements with 65535 parameters were accepted by the driver, but ran into an issue on the Cassandra side. Basically, the test runs forever, because the driver receives an inconsistent answer from Cassandra: when we prepare the statement, C* answers that it is correctly prepared, but when we try to execute it, we receive an {{UNPREPARED}} answer.
>
> [Here is the code|https://github.com/datastax/java-driver/blob/JAVA-515/driver-core/src/test/java/com/datastax/driver/core/PreparedStatementTest.java#L448] to reproduce the issue.
[jira] [Created] (CASSANDRA-8276) Prepared statement unavailable
Pierre Laporte created CASSANDRA-8276:
Summary: Prepared statement unavailable
Key: CASSANDRA-8276
URL: https://issues.apache.org/jira/browse/CASSANDRA-8276
Project: Cassandra
Issue Type: Bug
Environment: Cassandra 2.0.10, Java driver 2.0.8-SNAPSHOT
Reporter: Pierre Laporte

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test
[ https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095698#comment-14095698 ] Pierre Laporte commented on CASSANDRA-7743:

[~tjake] Sure, I just started a new test with this option.

Possible C* OOM issue during long running test

Key: CASSANDRA-7743
URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
Fix For: 2.1.0
[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test
[ https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093965#comment-14093965 ] Pierre Laporte commented on CASSANDRA-7743:

[~benedict] Actually, the nodes are running with memtable_allocation_type: heap_buffers.

[~jbellis] The test failed on the bigger instance too. I just realized that setting -XX:MaxDirectMemorySize=-1 is useless since it is the default value. Now I am doubting -1 really means unlimited... Restarting a new one with -XX:MaxDirectMemorySize=1G to see if things change.

Possible C* OOM issue during long running test

Key: CASSANDRA-7743
URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
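Regarding the -XX:MaxDirectMemorySize experiments: the buffers in the reported OOME are direct (off-heap) allocations, reserved through java.nio.Bits and capped by -XX:MaxDirectMemorySize rather than by the -Xmx heap size, which is why a mostly-free heap does not prevent this error. A minimal illustration of the distinction:

```java
import java.nio.ByteBuffer;

// Direct buffers are allocated off-heap via java.nio.Bits.reserveMemory
// (the first frame in the reported stacktrace); their total is capped by
// -XX:MaxDirectMemorySize, while plain heap buffers count against -Xmx.
public class DirectBufferDemo {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024);         // on-heap
        ByteBuffer direct = ByteBuffer.allocateDirect(1024); // off-heap
        System.out.println(heap.isDirect());   // false
        System.out.println(direct.isDirect()); // true
    }
}
```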
[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test
[ https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094030#comment-14094030 ] Pierre Laporte commented on CASSANDRA-7743:

Sure, I have uploaded one here: https://drive.google.com/file/d/0BxvGkaXP3ayeMDlRTWJ2MVhvT0E/edit?usp=sharing

Possible C* OOM issue during long running test

Key: CASSANDRA-7743
URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte
[jira] [Created] (CASSANDRA-7743) Possible C* OOM issue during long running test
Pierre Laporte created CASSANDRA-7743:
Summary: Possible C* OOM issue during long running test
Key: CASSANDRA-7743
URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte

During a long running test, we ended up with a lot of java.lang.OutOfMemoryError: Direct buffer memory errors on the Cassandra instances. Here is an example stacktrace from system.log:

{code}
ERROR [SharedPool-Worker-1] 2014-08-11 11:09:34,610 ErrorMessage.java:218 - Unexpected exception during request
java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_25]
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[na:1.7.0_25]
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_25]
	at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:434) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:98) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:251) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:112) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:507) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:464) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:378) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:350) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.20.Final.jar:4.0.20.Final]
	at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
{code}

The test consisted of a 3-node cluster of n1-standard-1 GCE instances (1 vCPU, 3.75 GB RAM) running cassandra-2.1.0-rc5, and a n1-standard-2 instance running the test. After ~2.5 days, several requests start to fail and we see the previous stacktraces in the system.log file. The output from the linux 'free' and 'meminfo' commands suggests that there is still memory available.

{code}
$ free -m
             total       used       free     shared    buffers     cached
Mem:          3702       3532        169          0        161        854
-/+ buffers/cache:       2516       1185
Swap:            0          0          0

$ head -n 4 /proc/meminfo
MemTotal:        3791292 kB
MemFree:          173568 kB
Buffers:          165608 kB
Cached:           874752 kB
{code}

These errors do not affect all the queries we run. The cluster is still responsive but is unable to display tracing information using cqlsh:

{code}
$ ./bin/nodetool --host 10.240.137.253 status duration_test
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.240.98.27    925.17 KB  256     100.0%            41314169-eff5-465f-85ea-d501fd8f9c5e  RAC1
UN  10.240.137.253  1.1 MB     256     100.0%            c706f5f9-c5f3-4d5e-95e9-a8903823827e  RAC1
UN  10.240.72.183   896.57 KB  256     100.0%            15735c4d-98d4-4ea4-a305-7ab2d92f65fc  RAC1

$ echo 'tracing on; select count(*) from duration_test.ints;' | ./bin/cqlsh 10.240.137.253
Now tracing requests.

 count
-------
  9486

(1 rows)

Statement trace did not complete within 10 seconds
{code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
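One way to read the `free -m` figures in the report above: most of the "used" memory is reclaimable page cache and buffers, so roughly 1.2 GB was still allocatable, which supports the conclusion that system memory was not exhausted. A quick check of the arithmetic (values in MB as printed by `free`; totals differ by ~1 MB due to rounding):

```java
// Sanity-check the `free -m` arithmetic from the report (values in MB).
public class FreeOutputCheck {
    public static void main(String[] args) {
        int used = 3532, free = 169, buffers = 161, cached = 854;
        // "-/+ buffers/cache" used: subtract reclaimable buffers and page cache.
        int usedMinusCache = used - buffers - cached;
        // "-/+ buffers/cache" free: what could actually be allocated.
        int availPlusCache = free + buffers + cached;
        System.out.println(usedMinusCache); // 2517 (report shows 2516; rounding)
        System.out.println(availPlusCache); // 1184 (report shows 1185; rounding)
    }
}
```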
[jira] [Commented] (CASSANDRA-7743) Possible C* OOM issue during long running test
[ https://issues.apache.org/jira/browse/CASSANDRA-7743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093010#comment-14093010 ] Pierre Laporte commented on CASSANDRA-7743:

[~enigmacurry] Eclipse MAT shows 300k instances of java.nio.ByteBuffer[] but retaining only ~26MB. It only accounts for in-heap data.

[~jbellis] Ok, I am going to start two new tests: one on n1-standard-1 with -XX:MaxDirectMemorySize=-1 and another one on n1-standard-2 without this setting.

Possible C* OOM issue during long running test

Key: CASSANDRA-7743
URL: https://issues.apache.org/jira/browse/CASSANDRA-7743
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Google Compute Engine, n1-standard-1
Reporter: Pierre Laporte