Compressing data types
Hi! Just wondering why this doesn't already exist: wouldn't it make sense to have decorating data types that compress (gzip, snappy) other data types (esp. UTF8Type, AsciiType) transparently? -tcn
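In the meantime this is easy to approximate on the client side; a minimal sketch of the decorating idea using gzip from the JDK (the wrapper class and its API are mine, not a Cassandra type — a real decorating data type would do the same inside the server's marshalling layer):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    // Client-side stand-in for a "compressing UTF8Type": gzip the string
    // on the way in, gunzip on the way out.
    public class GzipUtf8Codec
    {
        public static byte[] compress(String value) throws IOException
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(bytes))
            {
                gzip.write(value.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        }

        public static String decompress(byte[] stored) throws IOException
        {
            try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(stored)))
            {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = gzip.read(buf)) != -1)
                    out.write(buf, 0, n);
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }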
Re: sstable2json2sstable bug with json data stored
On 6/15/11 17:41, Timo Nentwig wrote:
> (json can likely be boiled down even more...)

Any JSON (well, probably anything with quotes...) breaks it:

{ "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] }

[default@foo] set transactions['test']['data']='{"foo":"bar"}';

I feared that storing data in a readable fashion would be a fateful idea.

https://issues.apache.org/jira/browse/CASSANDRA-2780
Re: sstable2json2sstable bug with json data stored
On 6/16/11 10:06, Sasha Dolgy wrote:
> The JSON you are showing below is an export from cassandra?

Yes. Just posted the solution: https://issues.apache.org/jira/browse/CASSANDRA-2780?focusedCommentId=13050274&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13050274

Guess this could simply be done in the quote() method.

> { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] }
>
> Does this work?
>
> { "74657374": [["data", "{\"foo\":\"bar\"}", 1308209845388000]] }
>
> -sd
>
> On Thu, Jun 16, 2011 at 9:49 AM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>> On 6/15/11 17:41, Timo Nentwig wrote:
>>> (json can likely be boiled down even more...)
>> Any JSON (well, probably anything with quotes...) breaks it:
>> { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] }
>> [default@foo] set transactions['test']['data']='{"foo":"bar"}';
>> I feared that storing data in a readable fashion would be a fateful idea.
>> https://issues.apache.org/jira/browse/CASSANDRA-2780
Re: sstable2json2sstable bug with json data stored
On 6/16/11 10:12, Timo Nentwig wrote:
> On 6/16/11 10:06, Sasha Dolgy wrote:
>> The JSON you are showing below is an export from cassandra?
> Yes. Just posted the solution: https://issues.apache.org/jira/browse/CASSANDRA-2780?focusedCommentId=13050274&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13050274
> Guess this could simply be done in the quote() method.

Hm, is this the way it's supposed to be?

[default@foo] set transactions['test']['data']='{"foo":"bar"}';
Value inserted.
[default@foo] get transactions['test']['data'];
=> (column=data, value={"foo":"bar"}, timestamp=1308214517443000)
[default@foo] set transactions['test']['data']='{\"foo\":\"bar\"}';
Value inserted.
[default@foo] get transactions['test']['data'];
=> (column=data, value={"foo":"bar"}, timestamp=1308214532484000)

Otherwise here's a regex that cares about existing backslashes:

private static String quote(final String val)
{
    return String.format("\"%s\"", val.replaceAll("(?<!\\\\)\"", "\\\\\""));
}

> { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] }
>
> Does this work?
>
> { "74657374": [["data", "{\"foo\":\"bar\"}", 1308209845388000]] }
>
> -sd
>
> On Thu, Jun 16, 2011 at 9:49 AM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>> On 6/15/11 17:41, Timo Nentwig wrote:
>>> (json can likely be boiled down even more...)
>> Any JSON (well, probably anything with quotes...) breaks it:
>> { "74657374": [["data", "{"foo":"bar"}", 1308209845388000]] }
>> [default@foo] set transactions['test']['data']='{"foo":"bar"}';
>> I feared that storing data in a readable fashion would be a fateful idea.
>> https://issues.apache.org/jira/browse/CASSANDRA-2780
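A quick check of the reconstructed quote() regex above (the method body follows the mail; the test harness around it is added here for illustration):

    public class QuoteTest
    {
        // Escapes double quotes that are not already preceded by a backslash,
        // then wraps the whole value in quotes.
        private static String quote(final String val)
        {
            return String.format("\"%s\"", val.replaceAll("(?<!\\\\)\"", "\\\\\""));
        }

        public static void main(String[] args)
        {
            // {"foo":"bar"}     -> "{\"foo\":\"bar\"}"
            System.out.println(quote("{\"foo\":\"bar\"}"));
            // {\"foo\":\"bar\"} -> unchanged apart from the outer quotes,
            // because the lookbehind skips quotes that are already escaped
            System.out.println(quote("{\\\"foo\\\":\\\"bar\\\"}"));
        }
    }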
sstable2json2sstable bug with json data stored
Hi! Couldn't google anybody having experienced this yet, so I do (0.8):

{
  "foo": {
    "foo": {
      "foo": "bar",
      "foo": "bar",
      "foo": "bar",
      "foo": "",
      "foo": "bar",
      "foo": "bar",
      "id": 123456
    }
  },
  "foo": null
}

(json can likely be boiled down even more...)

[default@foo] set transactions['test']['data']='{"foo":{"foo":{"foo":"bar","foo":"bar","foo":"bar","foo":"","foo":"bar","foo":"bar","id":123456}},"foo":null}';

$ ./sstable2json /var/lib/cassandra/data/foo/transactions-g-1-Data.db > /tmp/foo
$ cat /tmp/foo
{ "74657374": [["data", "{"foo":{"foo":{"foo":"bar","foo":"bar","foo":"bar","foo":"","foo":"bar","foo":"bar","id":123456}},"foo":null}", 1308152085301000]] }

$ ./json2sstable -s -c transactions -K foo /tmp/json /tmp/ss-g-1-Data.db
Counting keys to import, please wait... (NOTE: to skip this use -n <num_keys>)
org.codehaus.jackson.JsonParseException: Unexpected character ('f' (code 102)): was expecting comma to separate ARRAY entries
 at [Source: /tmp/json; line: 2, column: 27]
	at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:929)
	at org.codehaus.jackson.impl.JsonParserBase._reportError(JsonParserBase.java:632)
	at org.codehaus.jackson.impl.JsonParserBase._reportUnexpectedChar(JsonParserBase.java:565)
	at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:128)
	at org.codehaus.jackson.impl.JsonParserBase.skipChildren(JsonParserBase.java:263)
	at org.apache.cassandra.tools.SSTableImport.importSorted(SSTableImport.java:328)
	at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:252)
	at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:476)
ERROR: Unexpected character ('f' (code 102)): was expecting comma to separate ARRAY entries
 at [Source: /tmp/json; line: 2, column: 27]

create column family transactions
  with comparator = AsciiType
  and key_validation_class = AsciiType
  and default_validation_class = UTF8Type
  and keys_cached = 0
  and rows_cached = 0
  and column_metadata = [
    { column_name : uuid, validation_class : LexicalUUIDType, index_name : uuid_idx, index_type : 0 },
    { column_name : session_id, validation_class : LexicalUUIDType, index_name : session_id_idx, index_type : 0 },
    { column_name : guid, validation_class : LexicalUUIDType, index_name : guid_idx, index_type : 0 },
    { column_name : timestamp, validation_class : LongType },
    { column_name : completed, validation_class : BytesType },
    { column_name : user_id, validation_class : LongType }];
CQL/JDBC: Cannot locate cassandra.yaml
$ CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks
2011-06-05 16:21:54,452 INFO [main] org.apache.cassandra.cql.jdbc.Connection - Connected to localhost:9160
2011-06-05 16:21:54,517 ERROR [main] org.apache.cassandra.config.DatabaseDescriptor - Fatal configuration error
org.apache.cassandra.config.ConfigurationException: Cannot locate cassandra.yaml
	at org.apache.cassandra.config.DatabaseDescriptor.getStorageConfigURL(DatabaseDescriptor.java:111)
	at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:121)
	at org.apache.cassandra.config.CFMetaData.fromThrift(CFMetaData.java:642)
	at org.apache.cassandra.cql.jdbc.ColumnDecoder.<init>(ColumnDecoder.java:61)
	at org.apache.cassandra.cql.jdbc.Connection.execute(Connection.java:142)
	at org.apache.cassandra.cql.jdbc.Connection.execute(Connection.java:124)
	at org.apache.cassandra.cql.jdbc.CassandraConnection.<init>(CassandraConnection.java:83)
	at org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:86)
	at org.clapper.sqlshell.DatabaseConnector.connectJDBC(connector.scala:249)
	at org.clapper.sqlshell.DatabaseConnector.connect(connector.scala:175)
	at org.clapper.sqlshell.SQLShell.<init>(SQLShell.scala:168)
	at org.clapper.sqlshell.tool.Tool$.main(tool.scala:96)
	at org.clapper.sqlshell.tool.Tool.main(tool.scala)
Cannot locate cassandra.yaml
Fatal configuration error; unable to start server. See log for stacktrace.

And BTW how do I specify no user/password? Looking at the code I maybe could type a plain slash but that's kind of silly.
Re: CQL/JDBC: Cannot locate cassandra.yaml
On 6/5/11 16:26, Timo Nentwig wrote:
> $ CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks
> 2011-06-05 16:21:54,452 INFO [main] org.apache.cassandra.cql.jdbc.Connection - Connected to localhost:9160
> 2011-06-05 16:21:54,517 ERROR [main] org.apache.cassandra.config.DatabaseDescriptor - Fatal configuration error
> org.apache.cassandra.config.ConfigurationException: Cannot locate cassandra.yaml

Hmm, worked around that by setting -Dcassandra.config (hmm, the client needs the server's config...?).

2011-06-05 16:35:20,960 INFO [main] org.apache.cassandra.config.DatabaseDescriptor - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
Exception in thread "main" org.apache.cassandra.cql.jdbc.DriverResolverException
	at org.apache.cassandra.cql.jdbc.CassandraConnection.<init>(CassandraConnection.java:107)
	at org.apache.cassandra.cql.jdbc.CassandraDriver.connect(CassandraDriver.java:86)
	at org.clapper.sqlshell.DatabaseConnector.connectJDBC(connector.scala:249)
	at org.clapper.sqlshell.DatabaseConnector.connect(connector.scala:175)
	at org.clapper.sqlshell.SQLShell.<init>(SQLShell.scala:168)
	at org.clapper.sqlshell.tool.Tool$.main(tool.scala:96)
	at org.clapper.sqlshell.tool.Tool.main(tool.scala)

Not very verbose :-\ May have something to do with my l/p being just / for AllowAll.

> And BTW how do I specify no user/password? Looking at the code I maybe could type a plain slash but that's kind of silly.
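For the record, the workaround mentioned above amounts to pointing the JDBC client at a cassandra.yaml via a system property. Something along these lines, assuming the sqlshell launcher passes JAVA_OPTS through to the JVM and with a hypothetical local path to the yaml:

    $ JAVA_OPTS="-Dcassandra.config=file:///home/dev/cassandra/conf/cassandra.yaml" \
      CLASSPATH=~/sqlshell/lib/ ~/sqlshell/bin/sqlshell \
      org.apache.cassandra.cql.jdbc.CassandraDriver,jdbc:cassandra:foo/bar@localhost:9160/ks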
Quorum + Token range confusion
Hi! 5 nodes, replication factor of 2, fifth node down. As long as I write a single column with hector or pelops, it works. With 2 columns it fails because there are supposedly too few servers to reach quorum. Confusing. If I decommission the fifth node with nodetool, quorum works again and I can shut down another (the fourth) node as expected. The ring was manually balanced as described here: http://wiki.apache.org/cassandra/Operations#Load_balancing. Don't get it. Can somebody please explain? thx tcn
Re: Quorum + Token range confusion
On 5/25/11 13:45, Watanabe Maki wrote:
> I think I don't get your situation yet, but if you use RF=2, CL=QUORUM is identical with CL=ALL. Does it explain your experience?

If it was CL=ALL, it would explain it; however, it does not explain why it works when I decommission one node. RF=2 means that data is put on 2 nodes in total (not 3), right?
Re: Quorum + Token range confusion
On 5/25/11 14:08, Timo Nentwig wrote:
> On 5/25/11 13:45, Watanabe Maki wrote:
>> I think I don't get your situation yet, but if you use RF=2, CL=QUORUM is identical with CL=ALL. Does it explain your experience?
> If it was CL=ALL, it would explain it; however, it does not explain why it works when I decommission one node. RF=2 means that data is put on 2 nodes in total (not 3), right?

Just self-diagnosed dementia: it's 2 out of the number of replicas, not the number of nodes. - now I get it! :)

http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Quorum-killing-1-out-of-3-server-kills-the-cluster-td5819523.html
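To spell out the arithmetic that keeps tripping people up in this archive: quorum is computed from the replication factor, not from the cluster size. A minimal sketch of the RF/2 + 1 formula quoted later in the quorum threads (integer division):

    public class QuorumMath
    {
        // Quorum is a majority of replicas: floor(RF / 2) + 1.
        static int quorum(int replicationFactor)
        {
            return replicationFactor / 2 + 1;
        }

        public static void main(String[] args)
        {
            System.out.println(quorum(2)); // 2 -> QUORUM behaves like ALL: no replica may be down
            System.out.println(quorum(3)); // 2 -> one replica may be down
            System.out.println(quorum(5)); // 3 -> two replicas may be down
        }
    }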
Re: suggestion: sstable2json to ignore TTL
On Apr 27, 2011, at 16:59, Timo Nentwig wrote:
> On Apr 27, 2011, at 16:52, Edward Capriolo wrote:
>> The method being private is not a deal-breaker. While not good software engineering practice, you can copy and paste the code and rename the class SSTable2MyJson or whatever.
> Sure I can do this but I'd like to have it just available in the distro (incl. a script I can simply call).

Just read that the json format changed in 0.8. That's why I don't want to copy code... unfortunately nobody cares about CASSANDRA-2582, so if I can do anything to help pls. tell me. Should I post this rather on the dev mailing list?
Re: suggestion: sstable2json to ignore TTL
On Apr 27, 2011, at 17:10, Edward Capriolo wrote: I would think most people who watch dev watch this list. http://wiki.apache.org/cassandra/HowToContribute So, here it is: https://issues.apache.org/jira/browse/CASSANDRA-2582
suggestion: sstable2json to ignore TTL
Hi! What about a simple option for sstable2json to not print out the expiration TTL + LocalDeletionTime (maybe even ignore isMarkedForDelete)? I want to move old data from a live cluster (with TTL) to an archive cluster (data does not expire there). BTW is there a smarter way to do this? Actually I'd like to dump only new data (i.e. data after a certain timestamp).
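Until such a --nottl knob exists, one way to approximate it is to post-process the exported JSON before feeding it to json2sstable. A rough sketch with Jackson 1.x (the same library the tools use); note the assumed export layout — rows mapping to column arrays of the shape [name, value, timestamp, ...trailing expiration/deletion fields] — is an assumption about the 0.7-era format, so verify against your own dump first:

    import java.io.File;
    import java.util.Iterator;

    import org.codehaus.jackson.JsonNode;
    import org.codehaus.jackson.map.ObjectMapper;
    import org.codehaus.jackson.node.ArrayNode;

    // Truncates every exported column to [name, value, timestamp], dropping
    // whatever trailing fields (TTL, localDeletionTime, delete markers) follow.
    public class StripTtl
    {
        public static void main(String[] args) throws Exception
        {
            ObjectMapper mapper = new ObjectMapper();
            JsonNode root = mapper.readTree(new File(args[0]));
            Iterator<JsonNode> rows = root.getElements();
            while (rows.hasNext())
            {
                for (JsonNode column : rows.next())
                {
                    ArrayNode col = (ArrayNode) column;
                    while (col.size() > 3)
                        col.remove(col.size() - 1); // drop expiration/deletion fields
                }
            }
            mapper.writeValue(new File(args[1]), root);
        }
    }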
Re: suggestion: sstable2json to ignore TTL
On Apr 27, 2011, at 15:58, Edward Capriolo wrote:
> Hacking a separate copy of SSTable2json is trivial. Just look for the section of the code that writes the data and change what it writes.

If I did. The method's private...

> you can make it a knob --nottl then it could be included in Cassandra core if the project thinks it has global appeal.

That's why I was posting this suggestion :)
Re: suggestion: sstable2json to ignore TTL
On Apr 27, 2011, at 16:52, Edward Capriolo wrote:
> The method being private is not a deal-breaker. While not good software engineering practice, you can copy and paste the code and rename the class SSTable2MyJson or whatever.

Sure I can do this but I'd like to have it just available in the distro (incl. a script I can simply call). Should I post this rather on the dev mailing list?
cassandra-cli (output) broken for super columns
This is not what it's supposed to be like, is it?

[default@foo] get foo['page-field'];
=> (super_column=20110208,
     (column=82f4c650-2d53-11e0-a08b-58b035f3f60d, value=msg1, timestamp=1297159430471000)
     (column=82f4c650-2d53-11e0-a08b-58b035f3f60e, value=msg2, timestamp=1297159437423000)
     (column=82f4c650-2d53-11e0-a08b-58b035f3f60f, value=msg3, timestamp=1297159439855000))
Returned 1 results.

[default@foo] get foo['page-field']['20110208'];
, value=msg1, timestamp=1297159430471000)
=> (column=???P-S???X?5??, value=msg2, timestamp=1297159437423000)
=> (column=???P-S???X?5??, value=msg3, timestamp=1297159439855000)
Returned 3 results.

[default@foo] get foo['page-field']['20110208'][82f4c650-2d53-11e0-a08b-58b035f3f60d];
, value=msg1, timestamp=1297159430471000)

[default@foo] get foo['page-field']['20110208'][82f4c650-2d53-11e0-a08b-58b035f3f60e];
=> (column=???P-S???X?5??, value=msg2, timestamp=1297159437423000)

- name: foo
  column_type: Super
  compare_with: AsciiType
  compare_subcolumns_with: TimeUUIDType
  default_validation_class: AsciiType
Re: cassandra-cli (output) broken for super columns
On Feb 8, 2011, at 13:41, Stephen Connolly wrote:
> On 8 February 2011 10:38, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>> This is not what it's supposed to be like, is it?

Looks alright:

[default@foo] get foo['page-field'];
=> (super_column=20110208,
     (column=82f4c650-2d53-11e0-a08b-58b035f3f60d, value=msg1, timestamp=1297159430471000)
     (column=82f4c650-2d53-11e0-a08b-58b035f3f60e, value=msg2, timestamp=1297159437423000)
     (column=82f4c650-2d53-11e0-a08b-58b035f3f60f, value=msg3, timestamp=1297159439855000))
Returned 1 results.

The first half of column 1 is missing and the UUID is not printed correctly anymore:

[default@foo] get foo['page-field']['20110208'];
, value=msg1, timestamp=1297159430471000)
=> (column=???P-S???X?5??, value=msg2, timestamp=1297159437423000)
=> (column=???P-S???X?5??, value=msg3, timestamp=1297159439855000)
Returned 3 results.

Still prints only half of the column:

[default@foo] get foo['page-field']['20110208'][82f4c650-2d53-11e0-a08b-58b035f3f60d];
, value=msg1, timestamp=1297159430471000)

Applies only to the first column?!

[default@foo] get foo['page-field']['20110208'][82f4c650-2d53-11e0-a08b-58b035f3f60e];
=> (column=???P-S???X?5??, value=msg2, timestamp=1297159437423000)

- name: foo
  column_type: Super
  compare_with: AsciiType
  compare_subcolumns_with: TimeUUIDType
  default_validation_class: AsciiType

> Is it the ?'s that you are complaining about or is it something else? If it is the ?'s, have you got a mismatch between the character encoding in your shell and UTF-8?

Nope. See above :) Esp. that the first column isn't printed completely.
Re: Multiple indexes - how does Cassandra handle these internally?
On Jan 21, 2011, at 13:55, buddhasystem wrote:
> if I use multiple secondary indexes in the query, what will Cassandra do? Some examples say it will index on first EQ and then loop on others. Does it ever do a proper index product to avoid inner loops?

Just asked the same question on the hector-dev group a few minutes ago. It does indeed seem to be the case that cassandra only uses 1 index. At least this would make sense in narrowing down the issues I have, where "get foo where col1=cond1 and col2=cond2" works while flipping the conditions to "get foo where col2=cond2 and col1=cond1" returns no results anymore. Unfortunately nobody around here seems to care...
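What the examples describe — one index lookup on a single EQ clause, then a filtering loop over the remaining clauses — looks roughly like the sketch below. This is a conceptual illustration, not Cassandra's actual code; all names here are invented:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Predicate;

    // "Index on first EQ, then loop on the others": one clause is answered
    // by an index scan, the rest are applied as post-filters to each
    // candidate row. No index intersection ("index product") happens.
    class SingleIndexQuery<Row>
    {
        interface Index<Row>
        {
            Iterable<Row> lookup(Object value);
        }

        Iterable<Row> execute(Index<Row> primaryEqIndex, Object eqValue,
                              List<Predicate<Row>> remainingClauses)
        {
            List<Row> result = new ArrayList<>();
            for (Row candidate : primaryEqIndex.lookup(eqValue)) // the only index used
            {
                boolean matches = true;
                for (Predicate<Row> clause : remainingClauses)   // inner filtering loop
                    matches &= clause.test(candidate);
                if (matches)
                    result.add(candidate);
            }
            return result;
        }
    }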
Re: Multiple indexes - how does Cassandra handle these internally?
On Jan 21, 2011, at 16:46, Maxim Potekhin wrote:
> But Timo, this is even more mysterious! If both conditions are met, at least something must be returned in the second query. Have you tried this in CLI? That would allow you to at least alleviate client concerns.

I did this on the CLI only so far. So value comparison on the index seems to be done differently than in the nested loop... or something. Don't know; I don't know the code base well enough to debug this down to the very bottom either. But it's actually only a CF with 2 cols (AsciiType and IntegerType) and a command in the CLI, so it's not too time-consuming to reproduce.
Re: cassandra-cli: where a and b (works) vs. where b and a (doesn't)
On Jan 18, 2011, at 18:53, Nate McCall wrote:
> When doing mixed types on slicing operations, you should use ByteArraySerializer and handle the conversions by hand. We have an issue open for making this more graceful.

Pls. have a look at http://groups.google.com/group/hector-dev/browse_thread/thread/8c21b2e33cbacc3b

Somebody should look into this.
cassandra-cli: where a and b (works) vs. where b and a (doesn't)
I put a secondary index on rc (IntegerType) and user_agent (AsciiType). Don't understand this behaviour at all, can somebody explain?

[default@tracking] get crawler where user_agent=foo and rc=200;
0 Row Returned.

[default@tracking] get crawler where rc=200 and user_agent=foo;
---
RowKey: -??2
=> (column=rc, value=200, timestamp=1295347760933000)
=> (column=url, value=http://www/0, timestamp=1295347760933000)
=> (column=user_agent, value=foo, timestamp=1295347760915000)
1 Row Returned.

[default@tracking] get crawler where rc>199 and user_agent=foo;
0 Row Returned.

[default@tracking] get crawler where user_agent=foo;
---
RowKey: -??7
=> (column=rc, value=207, timestamp=1295347760935000)
=> (column=url, value=http://www/8, timestamp=1295347760933000)
=> (column=user_agent, value=foo, timestamp=1295347760917000)
---
RowKey: -??8
=> (column=rc, value=209, timestamp=1295347760935000)
=> (column=url, value=http://www/9, timestamp=1295347760933000)
=> (column=user_agent, value=foo, timestamp=1295347760916000)
---
RowKey: -??5
=> (column=rc, value=201, timestamp=1295347760937000)
=> (column=url, value=http://www/2, timestamp=1295347760933000)
=> (column=user_agent, value=foo, timestamp=1295347760916000)
---
RowKey: -??6
=> (column=rc, value=205, timestamp=1295347760935000)
=> (column=url, value=http://www/5, timestamp=1295347760933000)
=> (column=user_agent, value=foo, timestamp=1295347760917000)
---
RowKey: -??2
=> (column=rc, value=200, timestamp=1295347760933000)
=> (column=url, value=http://www/0, timestamp=1295347760933000)
=> (column=user_agent, value=foo, timestamp=1295347760915000)
5 Rows Returned.
Re: cassandra-cli: where a and b (works) vs. where b and a (doesn't)
On Jan 18, 2011, at 12:02, Aaron Morton wrote:
> Does wrapping foo in single quotes help?

No.

> Also, does this help http://www.datastax.com/blog/whats-new-cassandra-07-secondary-indexes

Actually this doesn't even compile because addGtExpression expects a String type (?!).

StringSerializer ss = StringSerializer.get();
IndexedSlicesQuery<String, String, String> indexedSlicesQuery = HFactory.createIndexedSlicesQuery(keyspace, ss, ss, ss);
indexedSlicesQuery.setColumnNames("full_name", "birth_date", "state");
indexedSlicesQuery.addGtExpression("birth_date", 1970L);
indexedSlicesQuery.addEqualsExpression("state", "UT");
indexedSlicesQuery.setColumnFamily("users");
indexedSlicesQuery.setStartKey("");
QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();

> Aaron
>
> On 18/01/2011, at 11:54 PM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>> I put a secondary index on rc (IntegerType) and user_agent (AsciiType). Don't understand this behaviour at all, can somebody explain?
>> [default@tracking] get crawler where user_agent=foo and rc=200;
>> 0 Row Returned.
>> [default@tracking] get crawler where rc=200 and user_agent=foo;
>> ---
>> RowKey: -??2
>> => (column=rc, value=200, timestamp=1295347760933000)
>> => (column=url, value=http://www/0, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760915000)
>> 1 Row Returned.
>> [default@tracking] get crawler where rc>199 and user_agent=foo;
>> 0 Row Returned.
>> [default@tracking] get crawler where user_agent=foo;
>> ---
>> RowKey: -??7
>> => (column=rc, value=207, timestamp=1295347760935000)
>> => (column=url, value=http://www/8, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760917000)
>> ---
>> RowKey: -??8
>> => (column=rc, value=209, timestamp=1295347760935000)
>> => (column=url, value=http://www/9, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760916000)
>> ---
>> RowKey: -??5
>> => (column=rc, value=201, timestamp=1295347760937000)
>> => (column=url, value=http://www/2, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760916000)
>> ---
>> RowKey: -??6
>> => (column=rc, value=205, timestamp=1295347760935000)
>> => (column=url, value=http://www/5, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760917000)
>> ---
>> RowKey: -??2
>> => (column=rc, value=200, timestamp=1295347760933000)
>> => (column=url, value=http://www/0, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760915000)
>> 5 Rows Returned.
Re: cassandra-cli: where a and b (works) vs. where b and a (doesn't)
On Jan 18, 2011, at 12:05, Timo Nentwig wrote:
> On Jan 18, 2011, at 12:02, Aaron Morton wrote:
>> Does wrapping foo in single quotes help?
> No.
>> Also, does this help http://www.datastax.com/blog/whats-new-cassandra-07-secondary-indexes
> Actually this doesn't even compile because addGtExpression expects a String type (?!).

This works as expected:

.addInsertion(now, cf, createColumn("rc", "a", SS, StringSerializer.get())).execute();

while this doesn't:

.addInsertion(now, cf, createColumn("rc", 97, SS, IntegerSerializer.get())).execute();

The only difference is that the IntegerSerializer pads the byte array with zeros. Shouldn't matter (?). But it does. I dumped both versions to JSON and reimported them[*]. Same behavior. Then I manually removed the trailing six zeros from the IntegerSerializer version and retried. Same behavior.

[*] BTW when reimporting the JSON data the secondary indices are not being recreated. I had to remove the system keyspace and reimport the schema in order to trigger that...

> StringSerializer ss = StringSerializer.get();
> IndexedSlicesQuery<String, String, String> indexedSlicesQuery = HFactory.createIndexedSlicesQuery(keyspace, ss, ss, ss);
> indexedSlicesQuery.setColumnNames("full_name", "birth_date", "state");
> indexedSlicesQuery.addGtExpression("birth_date", 1970L);
> indexedSlicesQuery.addEqualsExpression("state", "UT");
> indexedSlicesQuery.setColumnFamily("users");
> indexedSlicesQuery.setStartKey("");
> QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
>
> Aaron
>
> On 18/01/2011, at 11:54 PM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>> I put a secondary index on rc (IntegerType) and user_agent (AsciiType). Don't understand this behaviour at all, can somebody explain?
>> [default@tracking] get crawler where user_agent=foo and rc=200;
>> 0 Row Returned.
>> [default@tracking] get crawler where rc=200 and user_agent=foo;
>> ---
>> RowKey: -??2
>> => (column=rc, value=200, timestamp=1295347760933000)
>> => (column=url, value=http://www/0, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760915000)
>> 1 Row Returned.
>> [default@tracking] get crawler where rc>199 and user_agent=foo;
>> 0 Row Returned.
>> [default@tracking] get crawler where user_agent=foo;
>> ---
>> RowKey: -??7
>> => (column=rc, value=207, timestamp=1295347760935000)
>> => (column=url, value=http://www/8, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760917000)
>> ---
>> RowKey: -??8
>> => (column=rc, value=209, timestamp=1295347760935000)
>> => (column=url, value=http://www/9, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760916000)
>> ---
>> RowKey: -??5
>> => (column=rc, value=201, timestamp=1295347760937000)
>> => (column=url, value=http://www/2, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760916000)
>> ---
>> RowKey: -??6
>> => (column=rc, value=205, timestamp=1295347760935000)
>> => (column=url, value=http://www/5, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760917000)
>> ---
>> RowKey: -??2
>> => (column=rc, value=200, timestamp=1295347760933000)
>> => (column=url, value=http://www/0, timestamp=1295347760933000)
>> => (column=user_agent, value=foo, timestamp=1295347760915000)
>> 5 Rows Returned.
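To make the "pads with zeros" observation concrete: a 4-byte integer serializes to big-endian bytes, so the int 97 does not produce the same bytes as the ASCII string "a" (0x61). A standalone sketch using plain java.nio, independent of Hector's serializer classes:

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    // Why createColumn(..., "a", StringSerializer) and
    // createColumn(..., 97, IntegerSerializer) are not interchangeable:
    // the string "a" is the single byte 0x61, while the int 97 is the
    // 4-byte big-endian value 0x00000061.
    public class PaddingDemo
    {
        public static void main(String[] args)
        {
            byte[] asString = "a".getBytes(StandardCharsets.US_ASCII);
            byte[] asInt = ByteBuffer.allocate(4).putInt(97).array();

            System.out.println(Arrays.toString(asString)); // [97]
            System.out.println(Arrays.toString(asInt));    // [0, 0, 0, 97]
        }
    }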
Re: Adding secondary index: java.lang.ArithmeticException: / by zero
On Dec 23, 2010, at 12:34, Timo Nentwig wrote:
> On Dec 23, 2010, at 9:34, Timo Nentwig wrote:
>> I was about to add a secondary index (which apparently failed) to existing data. When I restarted the node it crashed (!) with:
> It crashed because it ran out of heap space (2G). So I increased to 3.5G but after a while it's caught in full GC again. The node holds some 50G of data. So, if 3.5G isn't sufficient, how do I determine how much memory it's going to need and, more important, how can I cancel such indexing if I don't have enough memory available? Or did I hit some bug and should simply wait for rc3?

The good news: with 3.5G it didn't crash, yet. The bad news: the node has been full GCing for over 1 day now... Any advice what to do with it?
Adding secondary index: java.lang.ArithmeticException: / by zero
I was about to add a secondary index (which apparently failed) to existing data. When I restarted the node it crashed (!) with:

INFO 09:21:36,510 Opening /var/lib/cassandra/data/test/tracking.6b6579-tmp-e-1
ERROR 09:21:36,512 Exception encountered during startup.
java.lang.ArithmeticException: / by zero
	at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233)
	at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:284)
	at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
	at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:225)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:448)
	at org.apache.cassandra.db.ColumnFamilyStore.addIndex(ColumnFamilyStore.java:305)
	at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:246)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:448)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:436)
	at org.apache.cassandra.db.Table.initCf(Table.java:360)
	at org.apache.cassandra.db.Table.<init>(Table.java:290)
	at org.apache.cassandra.db.Table.open(Table.java:107)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138)
	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
Exception encountered during startup.
java.lang.ArithmeticException: / by zero
	at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:233)
	at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:284)
	at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:200)
	at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:225)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:448)
	at org.apache.cassandra.db.ColumnFamilyStore.addIndex(ColumnFamilyStore.java:305)
	at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:246)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:448)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:436)
	at org.apache.cassandra.db.Table.initCf(Table.java:360)
	at org.apache.cassandra.db.Table.<init>(Table.java:290)
	at org.apache.cassandra.db.Table.open(Table.java:107)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:138)
	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)

So, I deleted the file, which lets cassandra start up again (and it starts building the secondary index all over). Since 0.7rc2 was too unstable I'm on a SNAPSHOT from Dec 17.
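The file deleted above is the leftover temporary sstable from the aborted index build (the "Opening ... tracking.6b6579-tmp-e-1" line). If you need to hunt such leftovers down across a data directory, something like this should do (the path is the one from this thread):

    # List leftover temporary sstable components from an aborted index build;
    # their descriptor prefix matches the "Opening ...-tmp-e-1" log line above.
    $ find /var/lib/cassandra/data -name '*-tmp-*'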
Re: Adding secondary index: java.lang.ArithmeticException: / by zero
On Dec 23, 2010, at 9:34, Timo Nentwig wrote:
> I was about to add a secondary index (which apparently failed) to existing data. When I restarted the node it crashed (!) with:

It crashed because it ran out of heap space (2G). So I increased to 3.5G but after a while it's caught in full GC again. The node holds some 50G of data. So, if 3.5G isn't sufficient, how do I determine how much memory it's going to need and, more important, how can I cancel such indexing if I don't have enough memory available? Or did I hit some bug and should simply wait for rc3?

> INFO 09:21:36,510 Opening /var/lib/cassandra/data/test/tracking.6b6579-tmp-e-1
> ERROR 09:21:36,512 Exception encountered during startup.
> java.lang.ArithmeticException: / by zero
> ...
> So, I deleted the file, which lets cassandra start up again (and it starts building the secondary index all over). Since 0.7rc2 was too unstable I'm on a SNAPSHOT from Dec 17.
Re: java.io.IOException: No space left on device
On Dec 22, 2010, at 16:20, Peter Schuller wrote:
>> And the data could be more evenly balanced, obviously. However the node fails to start up due to lacking disk space (instead of starting up and denying further writes, it appears to try to process the [6.6G!] commit logs). So, I cannot perform any actions on it anymore, like re-balancing the ring or reading old data from it and rotating it somewhere else. So, what to do now?
> So even given deletion of obsolete sstables on start-up, it goes out of disk just from the commit log replay of only 6 gig? Sounds like you're very, very full.

Answer:

$ time cassandra -f
 INFO 16:30:09,486 Heap size: 2143158272/2143158272
log4j:ERROR Failed to flush writer,
java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:260)
	at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
	at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
	at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
	at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
	at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
	at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59)
	at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324)
	at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
	at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
	at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
	at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
	at org.apache.log4j.Category.callAppenders(Category.java:206)
	at org.apache.log4j.Category.forcedLog(Category.java:391)
	at org.apache.log4j.Category.log(Category.java:856)
	at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:347)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:73)
	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
 INFO 16:30:09,495 JNA not found. Native methods will be disabled.
 INFO 16:30:09,504 Loading settings from file:/home/dev/cassandra.git/conf/cassandra.yaml
 INFO 16:30:09,774 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO 16:30:09,849 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1293031809849.log
ERROR 16:30:09,853 Exception encountered during startup.
java.io.IOError: java.io.IOException: No space left on device
	at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:59)
	at org.apache.cassandra.db.commitlog.CommitLog.<init>(CommitLog.java:113)
	at org.apache.cassandra.db.commitlog.CommitLog.<clinit>(CommitLog.java:83)
	at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:347)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:76)
	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:216)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.write(Native Method)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:180)
	at org.apache.cassandra.db.commitlog.CommitLogHeader$CommitLogHeaderSerializer.serialize(CommitLogHeader.java:157)
	at org.apache.cassandra.db.commitlog.CommitLogHeader.writeCommitLogHeader(CommitLogHeader.java:121)
	at org.apache.cassandra.db.commitlog.CommitLogSegment.writeHeader(CommitLogSegment.java:70)
	at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:55)
	... 7 more
Exception encountered during startup.
java.io.IOError: java.io.IOException: No space left on device
	at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:59)
	at org.apache.cassandra.db.commitlog.CommitLog.<init>(CommitLog.java:113)
	at org.apache.cassandra.db.commitlog.CommitLog.<clinit>(CommitLog.java:83)
	at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:347)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:76)
	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
	at
Re: java.io.IOException: No space left on device
On Dec 22, 2010, at 16:20, Peter Schuller wrote:
> In any case: Monitoring disk-space is very, very important.

So, why doesn't cassandra monitor it itself and stop accepting writes if it runs out of space?
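The check itself would be cheap. A sketch of the kind of guard being asked for, using only the standard library — the class, the threshold, and the idea of rejecting writes at this layer are illustrative assumptions, not anything Cassandra does:

    import java.io.File;

    // Refuse writes when the data volume drops below a free-space floor.
    public class DiskSpaceGuard
    {
        private final File dataDir;
        private final long minFreeBytes;

        public DiskSpaceGuard(File dataDir, long minFreeBytes)
        {
            this.dataDir = dataDir;
            this.minFreeBytes = minFreeBytes;
        }

        public void checkBeforeWrite()
        {
            if (dataDir.getUsableSpace() < minFreeBytes)
                throw new IllegalStateException(
                    "Refusing write: only " + dataDir.getUsableSpace()
                    + " bytes free on " + dataDir);
        }
    }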
Re: Memory leak with Sun Java 1.6 ?
On Dec 12, 2010, at 17:21, Jonathan Ellis wrote:
> http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors

I can rule out the first 3. I was running cassandra with default settings, i.e. 1GB heap and 256M memtable. So, with 3 memtables + 1GB the JVM should run with 1.75G (although http://wiki.apache.org/cassandra/MemtableThresholds suggests increasing heap size only gently). Did so. A 4GB machine with a 2GB 64bit-JVM seemed to run stable for quite some time but then also crashed with OOM.

Looking at the heap dump it's always the same: all memory nearly always bound in CompactionExecutor (ColumnFamilyStore/ConcurrentSkipListMap, respectively). Somebody else recently had a similar problem (bottom line: more heap - which is okay, but I'd like to understand why):

http://www.mail-archive.com/user@cassandra.apache.org/msg07516.html

This is my only CF currently in use (via JMX):

- column_families:
  - column_type: Standard
    comment: tracking column family
    compare_with: org.apache.cassandra.db.marshal.UTF8Type
    default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
    gc_grace_seconds: 864000
    key_cache_save_period_in_seconds: 3600
    keys_cached: 20.0
    max_compaction_threshold: 32
    memtable_flush_after_mins: 60
    min_compaction_threshold: 4
    name: tracking
    read_repair_chance: 1.0
    row_cache_save_period_in_seconds: 0
    rows_cached: 0.0
  name: test
  replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
  replication_factor: 3

In addition... actually there is plenty of free memory on the heap (?):

3605.478: [GC 3605.478: [ParNew Desired survivor size 2162688 bytes, new threshold 1 (max 1) - age 1: 416112 bytes, 416112 total : 16887K->553K(38336K), 0.0209550 secs]3605.499: [CMS: 1145267K->447565K(2054592K), 1.9143630 secs] 1161938K->447565K(2092928K), [CMS Perm : 18186K->18158K(30472K)], 1.9355340 secs] [Times: user=1.95 sys=0.00, real=1.94 secs]
3607.414: [Full GC 3607.414: [CMS: 447565K->447453K(2054592K), 1.9694960 secs] 447565K->447453K(2092928K), [CMS Perm : 18158K->18025K(30472K)], 1.9696450 secs] [Times: user=1.92 sys=0.00, real=1.97 secs]
Total time for which application threads were stopped: 3.9070380 seconds
Total time for which application threads were stopped: 7.3388640 seconds
Total time for which application threads were stopped: 0.0560610 seconds
3616.931: [GC 3616.931: [ParNew Desired survivor size 2162688 bytes, new threshold 1 (max 1) - age 1: 474264 bytes, 474264 total : 34112K->747K(38336K), 0.0098680 secs] 481565K->448201K(2092928K), 0.0099690 secs] [Times: user=0.00 sys=0.00, real=0.01 secs]
Total time for which application threads were stopped: 0.0108670 seconds
3617.035: [GC 3617.035: [ParNew Desired survivor size 2162688 bytes, new threshold 1 (max 1) - age 1: 63040 bytes, 63040 total : 34859K->440K(38336K), 0.0065950 secs] 482313K->448455K(2092928K), 0.0066880 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Total time for which application threads were stopped: 0.0075850 seconds
3617.133: [GC 3617.133: [ParNew Desired survivor size 2162688 bytes, new threshold 1 (max 1) - age 1: 23016 bytes, 23016 total : 34552K->121K(38336K), 0.0042920 secs] 482567K->448193K(2092928K), 0.0043650 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
Total time for which application threads were stopped: 0.0049630 seconds
3617.228: [GC 3617.228: [ParNew Desired survivor size 2162688 bytes, new threshold 1 (max 1) - age 1: 16992 bytes, 16992 total : 34233K->34K(38336K), 0.0043180 secs] 482305K->448122K(2092928K), 0.0043910 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Total time for which application threads were stopped: 0.0049150 seconds
3617.323: [GC 3617.323: [ParNew Desired survivor size 2162688 bytes, new threshold 1 (max 1) - age 1: 18456 bytes, 18456 total : 34146K->29K(38336K), 0.0038930 secs] 482234K->448127K(2092928K), 0.0039810 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
Total time for which application threads were stopped: 0.0055390 seconds
Heap
 par new generation   total 38336K, used 17865K [0x00077ae0, 0x00077d79, 0x00077d79)
  eden space 34112K,  52% used [0x00077ae0, 0x00077bf6afb0, 0x00077cf5)
  from space 4224K,   0% used [0x00077cf5, 0x00077cf57720, 0x00077d37)
  to   space 4224K,   0% used [0x00077d37, 0x00077d37, 0x00077d79)
 concurrent mark-sweep generation total 2054592K, used 448097K [0x00077d79, 0x0007fae0, 0x0007fae0)
 concurrent-mark-sweep perm gen total 30472K, used 18125K [0x0007fae0, 0x0007fcbc2000, 0x0008)

> On Sun, Dec 12, 2010 at 9:52 AM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>> On Dec 10, 2010, at 19:37, Peter Schuller wrote:
>>> To cargo cult it: Are you running a modern JVM? (Not e.g. openjdk b17 in lenny or some
Re: Memory leak with Sun Java 1.6 ?
On Dec 14, 2010, at 15:31, Timo Nentwig wrote:
> On Dec 14, 2010, at 14:41, Jonathan Ellis wrote:
>> This is the "A row has grown too large" section from that troubleshooting guide.
> Why? This is what a typical row (?) looks like:
>
> [defa...@test] list tracking limit 1;
> ---
> RowKey: 123
> => (column=key, value=foo, timestamp=1292238005886000)
> => (column=value, value=bar, timestamp=1292238005886000)
>
> I'm importing the data from a RDBMS so I can say for sure that none of the 2 columns will contain more than 255 chars.

Started all over again. Deleted /var/lib/cassandra/*, started 4 nodes (2x2G + 2x1G heap, actual RAM twice as much, swap off) and my stress test: the first node (1GB) crashed after 212s (!). It had written roughly 200k rows like the one above (the data wasn't even flushed to disk!).

INFO [main] 2010-12-14 15:58:23,938 CassandraDaemon.java (line 119) Listening for thrift clients...
INFO [GossipStage:1] 2010-12-14 15:58:24,410 Gossiper.java (line 569) InetAddress /192.168.68.80 is now UP
INFO [HintedHandoff:1] 2010-12-14 15:58:24,410 HintedHandOffManager.java (line 191) Started hinted handoff for endpoint /192.168.68.80
INFO [HintedHandoff:1] 2010-12-14 15:58:24,412 HintedHandOffManager.java (line 247) Finished hinted handoff of 0 rows to endpoint /192.168.68.80
INFO [GossipStage:1] 2010-12-14 15:58:26,335 Gossiper.java (line 577) Node /192.168.68.69 is now part of the cluster
INFO [GossipStage:1] 2010-12-14 15:58:27,319 Gossiper.java (line 569) InetAddress /192.168.68.69 is now UP
INFO [HintedHandoff:1] 2010-12-14 15:58:27,319 HintedHandOffManager.java (line 191) Started hinted handoff for endpoint /192.168.68.69
INFO [HintedHandoff:1] 2010-12-14 15:58:27,320 HintedHandOffManager.java (line 247) Finished hinted handoff of 0 rows to endpoint /192.168.68.69
INFO [GossipStage:1] 2010-12-14 15:58:29,446 Gossiper.java (line 577) Node /192.168.68.70 is now part of the cluster
INFO [GossipStage:1] 2010-12-14 15:58:29,620 Gossiper.java (line 569) InetAddress /192.168.68.70 is now UP
INFO [HintedHandoff:1] 2010-12-14 15:58:29,621 HintedHandOffManager.java (line 191) Started hinted handoff for endpoint /192.168.68.70
INFO [HintedHandoff:1] 2010-12-14 15:58:29,621 HintedHandOffManager.java (line 247) Finished hinted handoff of 0 rows to endpoint /192.168.68.70
INFO [MigrationStage:1] 2010-12-14 16:01:16,535 ColumnFamilyStore.java (line 639) switching in a fresh Memtable for Migrations at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1292338703483.log', position=15059)
INFO [MigrationStage:1] 2010-12-14 16:01:16,536 ColumnFamilyStore.java (line 943) Enqueuing flush of memtable-migrati...@768974140(9092 bytes, 1 operations)
INFO [MigrationStage:1] 2010-12-14 16:01:16,536 ColumnFamilyStore.java (line 639) switching in a fresh Memtable for Schema at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1292338703483.log', position=15059)
INFO [MigrationStage:1] 2010-12-14 16:01:16,536 ColumnFamilyStore.java (line 943) Enqueuing flush of memtable-sch...@591783334(3765 bytes, 3 operations)
INFO [FlushWriter:1] 2010-12-14 16:01:16,541 Memtable.java (line 155) Writing memtable-migrati...@768974140(9092 bytes, 1 operations)
INFO [FlushWriter:1] 2010-12-14 16:01:16,633 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/system/Migrations-e-1-Data.db (9225 bytes)
INFO [FlushWriter:1] 2010-12-14 16:01:16,634 Memtable.java (line 155) Writing memtable-sch...@591783334(3765 bytes, 3 operations)
INFO [FlushWriter:1] 2010-12-14 16:01:16,706 Memtable.java (line 162) Completed flushing /var/lib/cassandra/data/system/Schema-e-1-Data.db (4053 bytes)
INFO [Create index Indexed1.626972746864617465] 2010-12-14 16:01:16,718 ColumnFamilyStore.java (line 325) Creating index org.apache.cassandra.db.ta...@354d581b.indexed1.626972746864617465
INFO [MigrationStage:1] 2010-12-14 16:01:16,725 ColumnFamilyStore.java (line 639) switching in a fresh Memtable for Migrations at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1292338703483.log', position=27559)
INFO [MigrationStage:1] 2010-12-14 16:01:16,726 ColumnFamilyStore.java (line 943) Enqueuing flush of memtable-migrati...@1922556837(7973 bytes, 1 operations)
INFO [Create index Indexed1.626972746864617465] 2010-12-14 16:01:16,729 ColumnFamilyStore.java (line 339) Index Indexed1.626972746864617465 complete
INFO [MigrationStage:1] 2010-12-14 16:01:16,730 ColumnFamilyStore.java (line 639) switching in a fresh Memtable for Schema at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1292338703483.log', position=27559)
INFO [FlushWriter:1] 2010-12-14 16:01:16,730 Memtable.java (line 155) Writing memtable-migrati...@1922556837(7973 bytes, 1 operations)
INFO [MigrationStage:1] 2010-12-14 16:01:16,730 ColumnFamilyStore.java (line 943) Enqueuing flush of memtable-sch...@1373806697(4029 bytes, 4 operations)
INFO [Create index Indexed1.626972746864617465
Re: Memory leak with Sun Java 1.6 ?
On Dec 14, 2010, at 19:38, Peter Schuller wrote:
> For debugging purposes you may want to switch Cassandra to standard IO mode instead of mmap. This will have a performance penalty, but the virtual/resident sizes won't be polluted with mmap():ed data.

Already did so. It *seems* to run more stable, but it's still far off from being stable. I actually already put 100 million rows into a local cassandra instance (on OSX [and on rc1], not xen'ed Linux), so this is unlikely to be a cassandra Java code problem but rather something native code/platform related.

> In general, unless you're hitting something particularly strange or just a bug in Cassandra, you shouldn't be randomly getting OOM:s unless you are truly using that heap space. What do you mean by "always bound in compactionexecutor" - by what method did you determine this to be the case?

Heap dumps, analyzed with MAT (http://www.eclipse.org/mat/).

> There should be no magic need for CPU. Unless you are severely taxing it in terms of very high write load or similar, an out-of-the-box configured cassandra should be needing limited amounts of memory. Did you run with default memtable thresholds (memtable_throughput_in_mb i

Yes

>> This is my only CF currently in use (via JMX):
>> - column_families:
>>   - column_type: Standard
>>     comment: tracking column family
>>     compare_with: org.apache.cassandra.db.marshal.UTF8Type
>>     default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
>>     gc_grace_seconds: 864000
>>     key_cache_save_period_in_seconds: 3600
>>     keys_cached: 20.0
>>     max_compaction_threshold: 32
>>     memtable_flush_after_mins: 60
>>     min_compaction_threshold: 4
>>     name: tracking
>>     read_repair_chance: 1.0
>>     row_cache_save_period_in_seconds: 0
>>     rows_cached: 0.0
>>   name: test
>>   replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
>>   replication_factor: 3
> This is the only column family being used?

Currently, for testing, yes.

>> In addition... actually there is plenty of free memory on the heap (?):
>> 3605.478: [GC 3605.478: [ParNew Desired survivor size 2162688 bytes, new threshold 1 (max 1) - age 1: 416112 bytes, 416112 total : 16887K->553K(38336K), 0.0209550 secs]3605.499: [CMS: 1145267K->447565K(2054592K), 1.9143630 secs] 1161938K->447565K(2092928K), [CMS Perm : 18186K->18158K(30472K)], 1.9355340 secs] [Times: user=1.95 sys=0.00, real=1.94 secs]
>> 3607.414: [Full GC 3607.414: [CMS: 447565K->447453K(2054592K), 1.9694960 secs] 447565K->447453K(2092928K), [CMS Perm : 18158K->18025K(30472K)], 1.9696450 secs] [Times: user=1.92 sys=0.00, real=1.97 secs]
> 1.9 seconds to do [CMS: 1145267K->447565K(2054592K) is completely abnormal if that represents a pause (but not if it's just concurrent mark/sweep time). I don't quite recognize the format of this log... I'm suddenly unsure what this log output is coming from. A normal -XX:+PrintGC and -XX:+PrintGCDetails should yield stuff like:

I just uncommented the GC JVM_OPTS from the shipped cassandra start script and use Sun JVM 1.6.0_23. Hmm, but those GC tuning options are also uncommented. I'll comment them again and try again.
Re: Memory leak with Sun Java 1.6 ?
On Dec 10, 2010, at 19:37, Peter Schuller wrote:
> To cargo cult it: Are you running a modern JVM? (Not e.g. openjdk b17 in lenny or some such.) If it is a JVM issue, ensuring you're using a reasonably recent JVM is probably much easier than to start tracking it down...

I had OOM problems with OpenJDK, switched to Sun/Oracle's recent 1.6.0_23 and... still have the same problem :-\ The stack trace always looks the same:

java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:329)
	at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:261)
	at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
	at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
	at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
	at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:120)
	at org.apache.cassandra.db.RowMutationSerializer.defreezeTheMaps(RowMutation.java:383)
	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:393)
	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:351)
	at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:52)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:636)

I'm writing from 1 client with 50 threads to a cluster of 4 machines (with hector). With QUORUM and ONE, 2 machines quite reliably will soon die with OOM. What may cause this? Won't cassandra block/reject writes when a memtable is full and being flushed to disk? Or does it just keep growing and, if flushing to disk isn't fast enough, run out of memory?
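Whatever the server-side answer, one way to keep a 50-thread writer from drowning a small cluster is to bound the number of in-flight mutations on the client. A sketch of that idea with a plain semaphore; the Runnable stands in for whatever Hector mutation the client actually performs:

    import java.util.concurrent.Semaphore;

    // Client-side backpressure: cap in-flight writes so a fast producer
    // can't outrun the cluster's flush/compaction capacity indefinitely.
    public class ThrottledWriter
    {
        private final Semaphore inFlight;

        public ThrottledWriter(int maxInFlight)
        {
            this.inFlight = new Semaphore(maxInFlight);
        }

        // write is a placeholder for the actual Hector mutator call
        public void submit(Runnable write) throws InterruptedException
        {
            inFlight.acquire(); // blocks when the cap is reached
            try
            {
                write.run();
            }
            finally
            {
                inFlight.release();
            }
        }
    }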
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
On Dec 9, 2010, at 18:50, Tyler Hobbs wrote:
> If you switch your writes to CL ONE when a failure occurs, you might as well use ONE for all writes. ONE and QUORUM behave the same when all nodes are working correctly.

That's finally a precise statement! :) I was wondering what "to at least 1 replica's commit log" is supposed to actually mean: http://wiki.apache.org/cassandra/API

Does quorum mean that data is replicated to q nodes or to at least q nodes? I just added another blank machine to my cluster. Nothing happened, as expected (I had stopped writing to the cluster), but after I ran nodetool repair it held more data than all the other nodes. So it copied data from the other nodes to this one? I assumed that data is replicated to q nodes, not to all; is quorum 'only' about consistency and not about saving storage space?

> - Tyler
>
> On Thu, Dec 9, 2010 at 11:26 AM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>> On Dec 9, 2010, at 17:55, Sylvain Lebresne wrote:
>>>> I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only if both fail do I actually lose data. But apparently this is not how it works...
>>> Sure, the data that N1 holds is also on another node and you won't lose it by only losing N1. But when you do a quorum query, you are saying to Cassandra "Please please would you fail my request if you can't get a response from 2 nodes". So if only 1 node holding the data is up at the moment of the query then Cassandra, which is a very polite software, does what you asked and fails.
>> And my application would fall back to ONE. Quorum writes will also fail, so I would also use ONE so that the app stays up. What would I have to do to make the data redistribute when the broken node is up again? Simply call nodetool repair on it?
>>> If you want Cassandra to send you an answer with only one node up, use CL=ONE (as said by David).
>>> On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne <sylv...@yakaz.com> wrote:
>>>> It's 2 out of the number of replicas, not the number of nodes. At RF=2, you have 2 replicas. And since quorum is also 2 with that replication factor, you cannot lose a node, otherwise some query will end up as UnavailableException. Again, this is not related to the total number of nodes. Even with 200 nodes, if you use RF=2, you will have some queries that fail (although much less than what you are probably seeing).
>>>> On Thu, Dec 9, 2010 at 5:00 PM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>>>>> On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
>>>>>> Quorum is really only useful when RF > 2, since for a quorum to succeed RF/2+1 replicas must be available.
>>>>> 2/2+1 == 2 and I killed 1 of 3, so... don't get it.
>>>>>> This means for RF = 2, consistency levels QUORUM and ALL yield the same result.
>>>>>> /d
>>>>>> On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>>>>>>> Hi! I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum for writes. But when I shut down one of them, UnavailableExceptions are thrown. Why is that? Isn't that the sense of quorum and a fault-tolerant DB, that it continues with the remaining 2 nodes and redistributes the data to the broken one as soon as it's up again? What may I be doing wrong? thx tcn
Quorum: killing 1 out of 3 server kills the cluster (?)
Hi! I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum for writes. But when I shut down one of them, UnavailableExceptions are thrown. Why is that? Isn't that the sense of quorum and a fault-tolerant DB, that it continues with the remaining 2 nodes and redistributes the data to the broken one as soon as it's up again? What may I be doing wrong? thx tcn
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
> Quorum is really only useful when RF > 2, since for a quorum to succeed RF/2+1 replicas must be available.

2/2+1 == 2 and I killed 1 of 3, so... don't get it.

> This means for RF = 2, consistency levels QUORUM and ALL yield the same result.
> /d
>
> On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>> Hi! I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum for writes. But when I shut down one of them, UnavailableExceptions are thrown. Why is that? Isn't that the sense of quorum and a fault-tolerant DB, that it continues with the remaining 2 nodes and redistributes the data to the broken one as soon as it's up again? What may I be doing wrong? thx tcn
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
On Dec 9, 2010, at 17:39, David Boxenhorn wrote:
> In other words, if you want to use QUORUM, you need to set RF=3. (I know because I had exactly the same problem.)

I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only if both fail do I actually lose data. But apparently this is not how it works...

> On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne <sylv...@yakaz.com> wrote:
>> It's 2 out of the number of replicas, not the number of nodes. At RF=2, you have 2 replicas. And since quorum is also 2 with that replication factor, you cannot lose a node, otherwise some query will end up as UnavailableException. Again, this is not related to the total number of nodes. Even with 200 nodes, if you use RF=2, you will have some queries that fail (although much less than what you are probably seeing).
>> On Thu, Dec 9, 2010 at 5:00 PM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>>> On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
>>>> Quorum is really only useful when RF > 2, since for a quorum to succeed RF/2+1 replicas must be available.
>>> 2/2+1 == 2 and I killed 1 of 3, so... don't get it.
>>>> This means for RF = 2, consistency levels QUORUM and ALL yield the same result.
>>>> /d
>>>> On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>>>>> Hi! I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum for writes. But when I shut down one of them, UnavailableExceptions are thrown. Why is that? Isn't that the sense of quorum and a fault-tolerant DB, that it continues with the remaining 2 nodes and redistributes the data to the broken one as soon as it's up again? What may I be doing wrong? thx tcn
Re: Quorum: killing 1 out of 3 server kills the cluster (?)
On Dec 9, 2010, at 17:55, Sylvain Lebresne wrote:
>> I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only if both fail do I actually lose data. But apparently this is not how it works...
> Sure, the data that N1 holds is also on another node and you won't lose it by only losing N1. But when you do a quorum query, you are saying to Cassandra "Please please would you fail my request if you can't get a response from 2 nodes". So if only 1 node holding the data is up at the moment of the query then Cassandra, which is a very polite software, does what you asked and fails.

And my application would fall back to ONE. Quorum writes will also fail, so I would also use ONE so that the app stays up. What would I have to do to make the data redistribute when the broken node is up again? Simply call nodetool repair on it?

> If you want Cassandra to send you an answer with only one node up, use CL=ONE (as said by David).
>
> On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne <sylv...@yakaz.com> wrote:
>> It's 2 out of the number of replicas, not the number of nodes. At RF=2, you have 2 replicas. And since quorum is also 2 with that replication factor, you cannot lose a node, otherwise some query will end up as UnavailableException. Again, this is not related to the total number of nodes. Even with 200 nodes, if you use RF=2, you will have some queries that fail (although much less than what you are probably seeing).
>> On Thu, Dec 9, 2010 at 5:00 PM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>>> On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
>>>> Quorum is really only useful when RF > 2, since for a quorum to succeed RF/2+1 replicas must be available.
>>> 2/2+1 == 2 and I killed 1 of 3, so... don't get it.
>>>> This means for RF = 2, consistency levels QUORUM and ALL yield the same result.
>>>> /d
>>>> On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig <timo.nent...@toptarif.de> wrote:
>>>>> Hi! I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum for writes. But when I shut down one of them, UnavailableExceptions are thrown. Why is that? Isn't that the sense of quorum and a fault-tolerant DB, that it continues with the remaining 2 nodes and redistributes the data to the broken one as soon as it's up again? What may I be doing wrong? thx tcn