RE: old data / tombstones are not deleted after ttl
Short question afterwards: I have read in the documentation that after a major compaction, minor compactions are no longer automatically triggered. Does this mean that I have to run nodetool compact regularly? Or is there a way to get back to automatic minor compactions?

Thx, Br,

Matthias Zeilinger
Production Operation – Shared Services
P: +43 (0) 50 858-31185 M: +43 (0) 664 85-34459
E: matthias.zeilin...@bwinparty.com
bwin.party services (Austria) GmbH
Marxergasse 1B, A-1030 Vienna
www.bwinparty.com

-----Original Message-----
From: Matthias Zeilinger [mailto:matthias.zeilin...@bwinparty.com]
Sent: Dienstag, 05. März 2013 08:03
To: user@cassandra.apache.org
Subject: RE: old data / tombstones are not deleted after ttl

Yes, it was a major compaction. I know it's not a great solution, but I needed something to get rid of the old data, because I ran out of disk space.

Br, Matthias Zeilinger

-----Original Message-----
From: Michal Michalski [mailto:mich...@opera.com]
Sent: Dienstag, 05. März 2013 07:47
To: user@cassandra.apache.org
Subject: Re: old data / tombstones are not deleted after ttl

Was it a major compaction? I ask because it's definitely a solution that had to work, but it's also a solution that - in general - probably no-one here would suggest you use.

M.

On 05.03.2013 07:08, Matthias Zeilinger wrote:

Hi, I have done a manual compaction via nodetool and this worked. But thx for the explanation of why it wasn't compacted.

Br, Matthias Zeilinger

From: Bryan Talbot [mailto:btal...@aeriagames.com]
Sent: Montag, 04. März 2013 23:36
To: user@cassandra.apache.org
Subject: Re: old data / tombstones are not deleted after ttl

Those older files won't be included in a compaction until there are min_compaction_threshold (4) files of that size. When you get another SSTable -Data.db file that is about 12-18GB, you'll have 4 and they will be compacted together into one new file. At that time, if there are any rows with only tombstones that are all older than gc_grace, the row will be removed (assuming the row exists exclusively in the 4 input SSTables). Columns with data that is more than TTL seconds old will be written with a tombstone. If the row does have column values in SSTables that are not being compacted, the row will not be removed.

-Bryan

On Sun, Mar 3, 2013 at 11:07 PM, Matthias Zeilinger <matthias.zeilin...@bwinparty.com> wrote:

Hi, I'm running Cassandra 1.1.5 and have the following issue. I'm using a 10-day TTL on my CF. I can see a lot of tombstones in there, but they aren't deleted after compaction. I have tried a nodetool cleanup and also a restart of Cassandra, but nothing happened.

total 61G
drwxr-xr-x  2 cassandra dba  20K Mar  4 06:35 .
drwxr-xr-x 10 cassandra dba 4.0K Dec 10 13:05 ..
-rw-r--r-- 1 cassandra dba  15M Dec 15 22:04 whatever-he-1398-CompressionInfo.db
-rw-r--r-- 1 cassandra dba  19G Dec 15 22:04 whatever-he-1398-Data.db
-rw-r--r-- 1 cassandra dba  15M Dec 15 22:04 whatever-he-1398-Filter.db
-rw-r--r-- 1 cassandra dba 357M Dec 15 22:04 whatever-he-1398-Index.db
-rw-r--r-- 1 cassandra dba 4.3K Dec 15 22:04 whatever-he-1398-Statistics.db
-rw-r--r-- 1 cassandra dba 9.5M Feb  6 15:45 whatever-he-5464-CompressionInfo.db
-rw-r--r-- 1 cassandra dba  12G Feb  6 15:45 whatever-he-5464-Data.db
-rw-r--r-- 1 cassandra dba  48M Feb  6 15:45 whatever-he-5464-Filter.db
-rw-r--r-- 1 cassandra dba 736M Feb  6 15:45 whatever-he-5464-Index.db
-rw-r--r-- 1 cassandra dba 4.3K Feb  6 15:45 whatever-he-5464-Statistics.db
-rw-r--r-- 1 cassandra dba 9.7M Feb 21 19:13 whatever-he-6829-CompressionInfo.db
-rw-r--r-- 1 cassandra dba  12G Feb 21 19:13 whatever-he-6829-Data.db
-rw-r--r-- 1 cassandra dba  47M Feb 21 19:13 whatever-he-6829-Filter.db
-rw-r--r-- 1 cassandra dba 792M Feb 21 19:13 whatever-he-6829-Index.db
-rw-r--r-- 1 cassandra dba 4.3K Feb 21 19:13 whatever-he-6829-Statistics.db
-rw-r--r-- 1 cassandra dba 3.7M Mar  1 10:46 whatever-he-7578-CompressionInfo.db
-rw-r--r-- 1 cassandra dba 4.3G Mar  1 10:46 whatever-he-7578-Data.db
-rw-r--r-- 1 cassandra dba  12M Mar  1 10:46 whatever-he-7578-Filter.db
-rw-r--r-- 1 cassandra dba 274M Mar  1 10:46 whatever-he-7578-Index.db
-rw-r--r-- 1 cassandra dba 4.3K Mar  1 10:46 whatever-he-7578-Statistics.db
-rw-r--r-- 1 cassandra dba 3.6M Mar  1 11:21 whatever-he-7582-CompressionInfo.db
-rw-r--r-- 1 cassandra dba 4.3G Mar  1 11:21
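Bryan's size-tiered explanation above suggests a gentler alternative to a major compaction: lowering min_compaction_threshold so the existing similar-sized files qualify for a minor compaction sooner. A minimal sketch, assuming placeholder keyspace/CF names and a 1.1-era nodetool on the path:

```shell
# Sketch only; "mykeyspace" and "whatever" are placeholder names.
# Lower the size-tiered minimum so the three ~12-18GB files qualify:
nodetool -h localhost setcompactionthreshold mykeyspace whatever 2 32

# Watch the resulting minor compaction progress:
nodetool -h localhost compactionstats
```

Unlike nodetool compact, this keeps the normal minor-compaction cycle intact; remember to restore the threshold afterwards.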
Replacing dead node when num_tokens is used
Hello,

While trying out Cassandra, I read about the steps necessary to replace a dead node. In my test cluster I used a setup with num_tokens instead of initial_token. How do I replace a dead node in this scenario?

Thanks, Jan
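One approach, sketched below, uses the replace-address startup flag from the 1.2 line (the same line that introduced num_tokens); this flag's availability and exact name should be verified against the documentation of the version in use, and the IP is a placeholder:

```shell
# On a fresh replacement node with the same num_tokens value in
# cassandra.yaml as the dead node, start Cassandra pointing the
# replace flag at the dead node's IP (10.0.0.42 is a placeholder):
cassandra -Dcassandra.replace_address=10.0.0.42
```

The replacement node then bootstraps the dead node's token ranges rather than picking new random tokens.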
Re: old data / tombstones are not deleted after ttl
You could consider enabling leveled compaction:
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

On Tue, Mar 5, 2013 at 9:46 AM, Matthias Zeilinger <matthias.zeilin...@bwinparty.com> wrote:

Short question afterwards: I have read in the documentation that after a major compaction, minor compactions are no longer automatically triggered. Does this mean that I have to run nodetool compact regularly? Or is there a way to get back to automatic minor compactions?

Thx, Br, Matthias Zeilinger
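For the 1.1.x cluster in this thread, switching the CF to leveled compaction could be sketched from cassandra-cli roughly as below (keyspace and column family names are placeholders, and the 1.1-era CLI option syntax should be checked against the version in use):

```shell
cassandra-cli -h localhost <<'EOF'
use mykeyspace;
update column family whatever
  with compaction_strategy = 'LeveledCompactionStrategy'
  and compaction_strategy_options = {sstable_size_in_mb: 10};
EOF
```

Leveled compaction sidesteps the "wait for 4 similar-sized files" behavior of size-tiered compaction, at the cost of more compaction I/O.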
Re: Cassandra instead of memcached
Check out http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

Netflix used Cassandra with SSDs and were able to drop their memcached layer. Mind you, they were not using it purely as an in-memory KV store.

Ben
Instaclustr | www.instaclustr.com | @instaclustr

On 05/03/2013, at 4:33 PM, Drew Kutcharian <d...@venarc.com> wrote:

Hi Guys,

I'm thinking about using Cassandra as an in-memory key/value store instead of memcached for a new project (just to get rid of a dependency if possible). I was thinking about setting the replication factor to 1, enabling the off-heap row cache, and setting gc_grace_seconds to zero for the CF that will be used for the key/value store. Has anyone tried this? Any comments?

Thanks, Drew
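Drew's proposed setup could be sketched as below (a sketch with placeholder keyspace/table names against a 1.2-era cqlsh; RF=1, per-CF caching, and gc_grace_seconds=0 are taken from the proposal, not offered as a recommendation - row cache capacity itself is sized in cassandra.yaml):

```shell
cqlsh <<'EOF'
CREATE KEYSPACE kvcache
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE kvcache.entries (
  key   text PRIMARY KEY,
  value blob
) WITH caching = 'ROWS_ONLY'
  AND gc_grace_seconds = 0;
EOF
```

Note that with RF=1, losing a node means losing that node's share of the "cache", which is where the comparison with memcached's failure behavior starts to matter.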
Re: Trying to identify the problem with CQL ...
Without looking into details too closely, I'd say you're probably hitting https://issues.apache.org/jira/browse/CASSANDRA-5292 (since you use NTS + PropertyFileSnitch + a DC name in caps). Long story short, CREATE KEYSPACE interprets your DC-TORONTO as dc-toronto, which then probably doesn't match what you have in your property file. This will be fixed in 1.2.3. In the meantime, a workaround would be to use the cassandra-cli to create/update your keyspace definition.

--
Sylvain

On Tue, Mar 5, 2013 at 11:24 AM, Gabriel Ciuloaica <gciuloa...@gmail.com> wrote:

Hello,

I'm trying to find out what the problem is and where it is located. I have a 3-node Cassandra cluster (1.2.1), RF=3. I have a keyspace and a CF defined as follows (using PropertyFileSnitch):

CREATE KEYSPACE backend WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC-TORONTO': '3'
};

USE backend;

CREATE TABLE avatars (
  id bigint PRIMARY KEY,
  avatar blob,
  image_type text
) WITH bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

Status of the cluster:

Datacenter: DC-TORONTO
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load      Tokens  Owns   Host ID                               Rack
UN  10.11.1.109  44.98 MB  256     46.8%  726689df-edc3-49a0-b680-370953994a8c  RAC2
UN  10.11.1.200  6.57 MB   64      10.3%  d6d700d4-28aa-4722-b215-a6a7d304b8e7  RAC3
UN  10.11.1.108  54.32 MB  256     42.8%  73cd86a9-4efb-4407-9fe8-9a1b3a277af7  RAC1

I'm trying to read my writes, by using CQL (datastax-java-driver), using LOCAL_QUORUM for reads and writes. For some reason, some of the writes are lost. Not sure if it is a driver issue or a Cassandra issue.

Digging further, using the cqlsh client (1.2.1), I found a strange situation:

select count(*) from avatars;

 count
-------
   226

select id from avatars;

 id
---------
     314
     396
      19
...

77 rows in result

select id, image_type from avatars;

 id      | image_type
---------+------------
     332 |        png
     314 |        png
     396 |       jpeg
      19 |        png
 1250014 |       jpeg
...

226 rows in result.

I do not understand why for the second select I'm able to retrieve just a part of the rows and not all of them. Not sure if this is related to the initial problem. Any help is really appreciated.

Thanks, Gabi
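Sylvain's suggested workaround might look like the sketch below: cassandra-cli of that era does not lowercase the datacenter name the way the affected CQL versions do. The strategy_options syntax should still be verified against the CLI version in use:

```shell
cassandra-cli -h localhost <<'EOF'
update keyspace backend
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {DC-TORONTO: 3};
EOF
```

After the update, the replication mapping should reference DC-TORONTO exactly as spelled in cassandra-topology.properties.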
Re: Ghost nodes
Any clue on this?

2013/2/25 Alain RODRIGUEZ <arodr...@gmail.com>

Hi,

I am having issues after decommissioning 3 nodes, one by one, of my 1.1.6 C* cluster (RF=3).

On the c.164 node, which was added a week after removing the 3 nodes, gossipinfo shows:

/a.135 RPC_ADDRESS:0.0.0.0 STATUS:NORMAL,127605887595351923798765477786913079296 RELEASE_VERSION:1.1.6 RACK:1b SCHEMA:49aee81e-7c46-31bd-8e4b-dfd07d74d94c DC:eu-west LOAD:3.40954135223E11
/b.173 RPC_ADDRESS:0.0.0.0 STATUS:NORMAL,0 RELEASE_VERSION:1.1.6 RACK:1b SCHEMA:49aee81e-7c46-31bd-8e4b-dfd07d74d94c DC:eu-west LOAD:3.32757832183E11
/c.164 RPC_ADDRESS:0.0.0.0 STATUS:NORMAL,85070591730234615865843651857942052864 RELEASE_VERSION:1.1.6 RACK:1b SCHEMA:49aee81e-7c46-31bd-8e4b-dfd07d74d94c DC:eu-west LOAD:2.93726484252E11
/d.6 RPC_ADDRESS:0.0.0.0 STATUS:NORMAL,42535295865117307932921825928971026432 RELEASE_VERSION:1.1.6 RACK:1b SCHEMA:49aee81e-7c46-31bd-8e4b-dfd07d74d94c DC:eu-west LOAD:2.85020693654E11

On the 3 other nodes I see this:

/a.135 RPC_ADDRESS:0.0.0.0 SCHEMA:49aee81e-7c46-31bd-8e4b-dfd07d74d94c RELEASE_VERSION:1.1.6 RACK:1b STATUS:NORMAL,127605887595351923798765477786913079296 DC:eu-west LOAD:3.40974023487E11
/10.64.167.32 RPC_ADDRESS:0.0.0.0 SCHEMA:d9adcce3-09ed-3e7f-a6a3-147d4283ed15 RELEASE_VERSION:1.1.6 RACK:1b STATUS:LEFT,28356863910078203714492389662765613056,1359823927010 DC:eu-west LOAD:1.47947624544E11
/10.250.202.154 RPC_ADDRESS:0.0.0.0 SCHEMA:d9adcce3-09ed-3e7f-a6a3-147d4283ed15 RELEASE_VERSION:1.1.6 RACK:1b STATUS:LEFT,85070591730234615865843651857942052863,1359808901882 DC:eu-west LOAD:1.45049060742E11
/b.173 RPC_ADDRESS:0.0.0.0 SCHEMA:49aee81e-7c46-31bd-8e4b-dfd07d74d94c RELEASE_VERSION:1.1.6 RACK:1b STATUS:NORMAL,0 DC:eu-west LOAD:3.32760540235E11
/c.164 RPC_ADDRESS:0.0.0.0 SCHEMA:49aee81e-7c46-31bd-8e4b-dfd07d74d94c RELEASE_VERSION:1.1.6 RACK:1b LOAD:2.93751485625E11 DC:eu-west STATUS:NORMAL,85070591730234615865843651857942052864
/10.64.103.228 RPC_ADDRESS:0.0.0.0 SCHEMA:d9adcce3-09ed-3e7f-a6a3-147d4283ed15 RELEASE_VERSION:1.1.6 RACK:1b STATUS:LEFT,141784319550391032739561396922763706367,1359893766266 DC:eu-west LOAD:2.46247802646E11
/d.6 RPC_ADDRESS:0.0.0.0 SCHEMA:49aee81e-7c46-31bd-8e4b-dfd07d74d94c RELEASE_VERSION:1.1.6 RACK:1b STATUS:NORMAL,42535295865117307932921825928971026432 DC:eu-west LOAD:2.85042986093E11

Since I removed these 3 nodes (marked as LEFT) at least 3 weeks ago, shouldn't gossip have removed them entirely by now?

The c.164 node, which doesn't show the nodes that left the ring in gossipinfo, is logging the following every minute:

...
INFO [GossipStage:1] 2013-02-25 10:18:56,269 Gossiper.java (line 830) InetAddress /10.64.167.32 is now dead.
INFO [GossipStage:1] 2013-02-25 10:18:56,283 Gossiper.java (line 830) InetAddress /10.250.202.154 is now dead.
INFO [GossipStage:1] 2013-02-25 10:18:56,297 Gossiper.java (line 830) InetAddress /10.64.103.228 is now dead.
INFO [GossipStage:1] 2013-02-25 10:19:57,700 Gossiper.java (line 830) InetAddress /10.64.167.32 is now dead.
INFO [GossipStage:1] 2013-02-25 10:19:57,721 Gossiper.java (line 830) InetAddress /10.250.202.154 is now dead.
INFO [GossipStage:1] 2013-02-25 10:19:57,742 Gossiper.java (line 830) InetAddress /10.64.103.228 is now dead.
INFO [GossipStage:1] 2013-02-25 10:20:58,722 Gossiper.java (line 830) InetAddress /10.64.167.32 is now dead.
INFO [GossipStage:1] 2013-02-25 10:20:58,739 Gossiper.java (line 830) InetAddress /10.250.202.154 is now dead.
INFO [GossipStage:1] 2013-02-25 10:20:58,754 Gossiper.java (line 830) InetAddress /10.64.103.228 is now dead.
...

All this looks a bit weird to me. Is this normal?

Alain
Re: Trying to identify the problem with CQL ...
Hi Sylvain, thanks for the fast answer.

I have updated the keyspace definition and cassandra-topology.properties on all 3 nodes and restarted each node. Both problems are still reproducible: I'm not able to read my writes, and the selects show the same data as in my previous email.

For write and read I'm using:

private static final String WRITE_STATEMENT =
        "INSERT INTO avatars (id, image_type, avatar) VALUES (?,?,?);";
private static final String READ_STATEMENT =
        "SELECT avatar, image_type FROM avatars WHERE id=?;";

I'm using the java-driver (1.0.0-beta1) with prepared statements, sync calls.

Write snippet:

Session session;
try {
    session = cassandraSession.getSession();
    BoundStatement stmt = session.prepare(WRITE_STATEMENT)
            .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM).bind();
    stmt.enableTracing();
    stmt.setLong("id", accountId);
    stmt.setString("image_type", image.getType());
    stmt.setBytes("avatar", ByteBuffer.wrap(image.getBytes()));
    ResultSet result = session.execute(stmt);
    LOG.info("UPLOAD COORDINATOR: {}", result.getQueryTrace()
            .getCoordinator().getCanonicalHostName());
} catch (NoHostAvailableException e) {
    LOG.error("Could not prepare the statement.", e);
    throw new StorageUnavailableException(e);
} finally {
    cassandraSession.releaseSession();
}

Read snippet:

Session session = null;
byte[] imageBytes = null;
String imageType = "png";
try {
    session = cassandraSession.getSession();
    BoundStatement stmt = session.prepare(READ_STATEMENT)
            .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM).bind();
    stmt.setLong("id", accountId);
    ResultSet result = session.execute(stmt);
    Iterator<Row> it = result.iterator();
    ByteBuffer avatar = null;
    while (it.hasNext()) {
        Row row = it.next();
        avatar = row.getBytes("avatar");
        imageType = row.getString("image_type");
    }
    if (avatar == null) {
        throw new AvatarNotFoundException("Avatar hasn't been found");
    }
    int length = avatar.remaining();
    imageBytes = new byte[length];
    avatar.get(imageBytes, 0, length);
} catch (NoHostAvailableException e) {
    LOG.error("Could not prepare the statement.", e);
    throw new StorageUnavailableException(e);
} finally {
    cassandraSession.releaseSession();
}

Let me know what other information is needed.

Thanks, Gabi

On 3/5/13 12:52 PM, Sylvain Lebresne wrote:

Without looking into details too closely, I'd say you're probably hitting https://issues.apache.org/jira/browse/CASSANDRA-5292 (since you use NTS + PropertyFileSnitch + a DC name in caps). [...]
Re: CQL query issue
Thank you, I am able to solve this one. If I try:

SELECT * FROM CompositeUser WHERE userId='mevivs' LIMIT 100 ALLOW FILTERING

it works. Somehow I got confused by http://www.datastax.com/docs/1.2/cql_cli/cql/SELECT, which states the syntax as:

SELECT select_expression
FROM keyspace_name.table_name
WHERE clause AND clause ...
ALLOW FILTERING
LIMIT n
ORDER BY compound_key_2 ASC | DESC

Is this an issue?

-Vivek

On Tue, Mar 5, 2013 at 5:21 PM, Vivek Mishra <mishra.v...@gmail.com> wrote:

Hi,

I am trying to execute a CQL3 query as:

SELECT * FROM CompositeUser WHERE userId='mevivs' ALLOW FILTERING LIMIT 100

and getting the error below:

Caused by: InvalidRequestException(why:line 1:70 missing EOF at 'LIMIT')
    at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:37849)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1562)
    at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1547)

Is there something incorrect in the syntax?
Re: Ghost nodes
"try assassinate from the jmx? http://nartax.com/2012/09/assassinate-cassandra-node/"

I finally used this solution... It always solves the problems of ghost nodes :D. Last time I had unreachable nodes while describing the cluster in the CLI (as described in the link) and I used the JMX unsafeAssassinateEndpoint. This time it was a bit different, since the schema was good and in sync between all the nodes, but this function solved the new issue too.

Thanks for the answer, even if I was also hoping to understand what happened and not just assassinate the problem; at least my prod is now ok.

Alain

2013/3/5 Jason Wee <peich...@gmail.com>

try assassinate from the jmx? http://nartax.com/2012/09/assassinate-cassandra-node/

or try cassandra -Dcassandra.load_ring_state=false
http://www.datastax.com/docs/1.0/references/cassandra#options

On Tue, Mar 5, 2013 at 6:54 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

Any clue on this?
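The unsafeAssassinateEndpoint operation mentioned in this thread lives on the org.apache.cassandra.net:type=Gossiper MBean; invoking it from a scriptable JMX client such as jmxterm might look like the sketch below (the jar path and target IP are placeholders, and jmxterm's exact command syntax should be checked against its documentation):

```shell
java -jar jmxterm.jar -l localhost:7199 <<'EOF'
bean org.apache.cassandra.net:type=Gossiper
run unsafeAssassinateEndpoint 10.64.167.32
EOF
```

As the "unsafe" prefix suggests, this forcibly evicts the endpoint from gossip without streaming any data, so it is a last resort for ghost nodes, not a replacement for decommission or removetoken.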
Re: CQL query issue
Somebody in the group, please confirm whether this is an issue or something that needs rectifying in the SELECT syntax documentation.

-Vivek

On Tue, Mar 5, 2013 at 5:31 PM, Vivek Mishra <mishra.v...@gmail.com> wrote:

Thank you, I am able to solve this one. If I try:

SELECT * FROM CompositeUser WHERE userId='mevivs' LIMIT 100 ALLOW FILTERING

it works. Somehow I got confused by http://www.datastax.com/docs/1.2/cql_cli/cql/SELECT, which states the syntax with ALLOW FILTERING before LIMIT. Is this an issue? [...]
Re: CQL query issue
This is not an issue in Cassandra. In particular, http://cassandra.apache.org/doc/cql3/CQL.html#selectStmt is up to date. It is an issue in the DataStax documentation, however. I'll see with them that this gets resolved.

On Tue, Mar 5, 2013 at 3:26 PM, Vivek Mishra <mishra.v...@gmail.com> wrote:

Somebody in the group, please confirm whether this is an issue or something that needs rectifying in the SELECT syntax documentation. [...]

-Vivek
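Per the CQL3 grammar Sylvain points to, LIMIT precedes ALLOW FILTERING. The working form from this thread, run via cqlsh, would look roughly like this (a sketch assuming the right keyspace is selected; depending on how the column family was created, mixed-case identifiers like CompositeUser may need double quotes in CQL3):

```shell
cqlsh <<'EOF'
SELECT * FROM CompositeUser
WHERE userId = 'mevivs'
LIMIT 100
ALLOW FILTERING;
EOF
```

Reversing the two clauses reproduces the "missing EOF at 'LIMIT'" parse error from the original message.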
Re: what size file for LCS is best for 300-500G per node?
Thanks!
Dean

On 3/4/13 7:12 PM, Wei Zhu <wz1...@yahoo.com> wrote:

We have 200G and ended up going with 10M. The compaction after repair takes a day to finish. Try to run a repair and see how it goes.

-Wei

----- Original Message -----
From: Dean Hiller <dean.hil...@nrel.gov>
To: user@cassandra.apache.org
Sent: Monday, March 4, 2013 10:52:27 AM
Subject: what size file for LCS is best for 300-500G per node?

Should we really be going with 5MB when it compresses to 3MB? That seems to be on the small side, right? We have ulimit cranked up, so many files shouldn't be an issue, but maybe we should go to 10MB or 100MB or something in between? Does anyone have any experience with changing the LCS sizes? I do read somewhere that startup times when opening 100,000 files could be slow, which implies a larger size (and thus fewer files) might be better.

Thanks,
Dean
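Wei's 10M figure corresponds to the sstable_size_in_mb option of LeveledCompactionStrategy. For the 1.1/1.2-era tooling it could be applied roughly as sketched below (keyspace/CF names are placeholders, and the CLI option syntax should be verified against the version in use); existing SSTables only adopt the new size as they are rewritten by compaction:

```shell
cassandra-cli -h localhost <<'EOF'
use mykeyspace;
update column family mycf
  with compaction_strategy = 'LeveledCompactionStrategy'
  and compaction_strategy_options = {sstable_size_in_mb: 10};
EOF
```

The trade-off Dean describes is real: smaller SSTables mean more files to open at startup, while larger ones mean more I/O per compaction at each level.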
Re: Trying to identify the problem with CQL ...
So, I have added more logging to the test app (comments inline). For some reason I'm losing updates. In a for loop I'm executing upload, read writetime, download blob; executed 10 times. See iterations number 2 and 3.

1. initialize session
0 [main] INFO com.datastax.driver.core.Cluster - New Cassandra host /10.11.1.109 added
1 [main] INFO com.datastax.driver.core.Cluster - New Cassandra host /10.11.1.200 added

(UPLOAD = uploading a blob with INSERT; Write time = reading writetime(avatar); DOWNLOAD = downloading the blob with SELECT; the two MD5s are of the blob on upload and on download.)

iter 0: COORDINATOR 10.11.1.108, UPLOAD 214 bytes / 1154 ms, write time 1362497519584000, DOWNLOAD 214 bytes / 134 ms
        upload MD5 b1944c41a25192520d33d15d00db2718 === download MD5 b1944c41a25192520d33d15d00db2718
iter 1: COORDINATOR 10.11.1.109, UPLOAD 4031 bytes / 675 ms, write time 1362497521493000, DOWNLOAD 4031 bytes / 135 ms
        upload MD5 20b71b77f90b3f8ae8995a7ce7f68295 === download MD5 20b71b77f90b3f8ae8995a7ce7f68295
iter 2: COORDINATOR 10.11.1.200, UPLOAD 3392 bytes / 668 ms, write time 1362497556815000, DOWNLOAD 3392 bytes / 136 ms
        upload MD5 1158e1ea54d46a4d0bd45becc4523585 === download MD5 1158e1ea54d46a4d0bd45becc4523585
iter 3: COORDINATOR 10.11.1.108, UPLOAD 253 bytes / 668 ms, write time 1362497556815000 (unchanged!), DOWNLOAD 3392 bytes / 136 ms
        upload MD5 fc9ce009530d6018a80c344d87b8ada4 === download MD5 1158e1ea54d46a4d0bd45becc4523585 (iteration 2's blob!)
iter 4: COORDINATOR 10.11.1.109, UPLOAD 266 bytes / 704 ms, write time 1362497556815000 (unchanged!), DOWNLOAD 3392 bytes / 136 ms
        upload MD5 5726af06e91a520deed093aba6afe112 === download MD5 1158e1ea54d46a4d0bd45becc4523585 (iteration 2's blob!)
iter 5: COORDINATOR 10.11.1.200, UPLOAD 3082 bytes / 901 ms, write time 1362497562076000, DOWNLOAD 3082 bytes / 135 ms
        upload MD5 fa2ea1972992cafea1c71b6c3e718058 === download MD5 fa2ea1972992cafea1c71b6c3e718058
iter 6: COORDINATOR 10.11.1.108, UPLOAD 1481 bytes / 703 ms, write time 1362497562076000 (unchanged!), DOWNLOAD 3082 bytes / 135 ms
        upload MD5 f208e4d3ea133fad5f9d175052ca70cf === download MD5 fa2ea1972992cafea1c71b6c3e718058 (iteration 5's blob!)
iter 7: COORDINATOR 10.11.1.109, UPLOAD 5214 bytes / 801 ms, write time 1362497562076000 (unchanged!), DOWNLOAD 3082 bytes / 134 ms
        upload MD5 c58d92d8273c7c9a7db76363b0b3e4c7 === download MD5 fa2ea1972992cafea1c71b6c3e718058 (iteration 5's blob!)
iter 8: COORDINATOR 10.11.1.200, UPLOAD 2992 bytes / 665 ms, write time 1362497567779000, DOWNLOAD 2992 bytes / 134 ms
        upload MD5 0848513c1b4214adf73c6ea5509ec294 === download MD5 0848513c1b4214adf73c6ea5509ec294
iter 9: COORDINATOR 10.11.1.108, UPLOAD 3670 bytes / 672 ms, write time 1362497567779000 (unchanged!), DOWNLOAD 2992 bytes / 136 ms
        upload MD5 27e235b9a90a22004d4098a0228ee07b === download MD5 0848513c1b4214adf73c6ea5509ec294 (iteration 8's blob!)

Thanks, Gabi

On 3/5/13 1:31 PM, Gabriel Ciuloaica wrote: Hi Sylvain, thanks for the fast answer. I have updated the keyspace definition and cassandra-topology.properties on all 3 nodes and restarted each node. Both problems are still reproducible. I'm not able to read my writes, and the selects also show the same data as in my previous email. For write and read I'm using:

private static final String WRITE_STATEMENT = "INSERT INTO avatars (id, image_type, avatar) VALUES (?,?,?);";
private static final String READ_STATEMENT = "SELECT avatar, image_type FROM avatars WHERE id=?;";

I'm using java-driver (1.0.0-beta1) with prepared statements, sync calls.
Write snippet:

Session session;
try {
    session = cassandraSession.getSession();
    BoundStatement stmt = session.prepare(WRITE_STATEMENT)
            .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM).bind();
    stmt.enableTracing();
    stmt.setLong("id", accountId);
    stmt.setString("image_type", image.getType());
    stmt.setBytes("avatar", ByteBuffer.wrap(image.getBytes()));
    ResultSet result = session.execute(stmt);
    LOG.info("UPLOAD COORDINATOR: {}", result.getQueryTrace()
            .getCoordinator().getCanonicalHostName());
} catch (NoHostAvailableException e) {
    LOG.error("Could not prepare the statement.", e);
    throw new StorageUnavailableException(e);
} finally {
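Since the only hard evidence of the lost updates is the digest comparison, it may help to spell out what that check amounts to. Below is a self-contained sketch (class and method names are mine, not from Gabriel's app) of reducing the upload/download comparison to MD5 hex digests:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {
    // Hex-encode an MD5 digest the way the test app prints it.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is a required JDK algorithm", e);
        }
    }

    public static void main(String[] args) {
        byte[] uploaded = "some avatar bytes".getBytes(StandardCharsets.UTF_8);
        byte[] downloaded = "some avatar bytes".getBytes(StandardCharsets.UTF_8);
        // A lost update shows up as the upload digest changing while the
        // download digest stays at the previous iteration's value.
        System.out.println(md5Hex(uploaded).equals(md5Hex(downloaded)) ? "match" : "MISMATCH");
    }
}
```

A lost update then appears exactly as in the log above: the upload digest changes each iteration, but the download digest (and the writetime) stays frozen at a previous iteration's value.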
*-ib-* files instead of *-he-* files?
Our upgradesstables completed, but I still see *he-593-Data.db files and such, which from testing seem to be the 1.1.x table names. I see the new *ib-695-Data.db files. Can I safely delete the *-he-* files now? I was expecting Cassandra to delete them when it was done, but maybe they are there from some corruption that happened due to negligence on my part. I do know we had screwed up a few things, resulting in odd size increases reported by nodetool ring. In fact, some nodes had 60 GB more than other nodes, while previously our cluster was exactly balanced for months. Thanks, Dean
Re: Unable to instantiate cache provider org.apache.cassandra.cache.SerializingCacheProvider
Details are here: https://issues.apache.org/jira/browse/CASSANDRA-3271 Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 4/03/2013, at 8:04 AM, Jason Wee peich...@gmail.com wrote: version 1.0.8 Just curious, what is the mechanism for off-heap in 1.1? Thank you. /Jason On Mon, Mar 4, 2013 at 11:49 PM, aaron morton aa...@thelastpickle.com wrote: What version are you using? As of 1.1, off-heap caches no longer require JNA https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L327 Also, the row and key caches are now set globally, not per CF https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L324 Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 1/03/2013, at 1:33 AM, Jason Wee peich...@gmail.com wrote: This happened some time ago, but for the sake of helping others if they encounter it: each column family has a row cache provider, which you can see in the schema, for example: ... and row_cache_provider = 'SerializingCacheProvider' ... If Cassandra cannot start that cache provider for some reason, it defaults to the ConcurrentLinkedHashCacheProvider. The SerializingCacheProvider requires the JNA lib; if you place the library into the Cassandra lib directory, this warning should not happen again.
Re: backing up and restoring from only 1 replica?
Hinted Handoff works well. But it's an optimisation with enough safety valves, configuration, and throttling that it is still not considered the way to ensure on-disk consistency. In general, if a node restarts or drops mutations, HH should get the message there eventually. In specific cases it may not. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 4/03/2013, at 10:40 AM, Mike Koh defmike...@gmail.com wrote: Thanks for the response. Could you elaborate more on the bad things that happen during a restart or message drops that would cause a 1-replica restore to fail? I'm completely on board with not using a restore process that nobody else uses, but I need to convince somebody else, who thinks it will work, that it is not a good idea. On 3/4/2013 7:54 AM, aaron morton wrote: That would be OK only if you never had a node go down (e.g. a restart) or drop messages. It's not something I would consider trying. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 28/02/2013, at 3:21 PM, Mike Koh defmike...@gmail.com wrote: It has been suggested to me that we could save a fair amount of time and money by taking a snapshot of only 1 replica (so every third node for most column families). Assuming that we are okay with not having the absolute latest data, does this have any possibility of working? I feel like it shouldn't, but don't really know the argument for why it wouldn't.
Re: anyone see this user-cassandra thread get answered...
Was probably this https://issues.apache.org/jira/browse/CASSANDRA-4597 Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 4/03/2013, at 2:05 PM, Hiller, Dean dean.hil...@nrel.gov wrote: I was reading http://mail-archives.apache.org/mod_mbox/cassandra-user/201208.mbox/%3CCAGZm5drRh3VXNpHefR9UjH8H=dhad2y18s0xmam5cs4yfl5...@mail.gmail.com%3E As we are having the same issue in 1.2.2. We modify to LCS and cassandra-cli shows us at LCS on any node we run cassandra cli on, but then looking at cqlsh, it is showing us at SizeTieredCompactionStrategy :(. Thanks, Dean
Re: anyone see this user-cassandra thread get answered...
That ticket says it was fixed in 1.1.5, and we are on 1.2.2. We upgraded from 1.1.4 to 1.2.2, ran upgradesstables and watched filenames change from *-he-*.db to *-ib-*.db, then changed compaction strategies and still had this issue. Is it the fact we came from 1.1.4? Ours was a very simple 4-node QA test where we set up a 1.1.4 cluster, put data in, upgraded, then upgraded tables, then switched to LCS and ran upgradesstables again hoping it would use LCS. Thanks, Dean

From: aaron morton aa...@thelastpickle.com
Date: Tuesday, March 5, 2013 9:13 AM
To: user@cassandra.apache.org
Subject: Re: anyone see this user-cassandra thread get answered...

(quoting Aaron's reply and Dean's original message above)
Re: LCS and counters
Well, no one says my assertion is false, so it is probably true. Going further, what would be the steps to migrate from STCS to LCS? Are there any precautions to take when doing it on C* 1.1.6 (like removing commit logs, since drain is broken)? Any insight or link on this procedure would be appreciated.

2013/2/25 Janne Jalkanen janne.jalka...@ecyrd.com: At least for our use case (reading slices of varyingly sized rows of 10-100k composite columns with counters and hundreds of writes/second), LCS has a nice ~75% lower read latency than Size Tiered. And compactions don't stop the world anymore. Repairs do easily trigger a few hundred compactions, though, but it's not that bad. /Janne

On Feb 25, 2013, at 17:10, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi, I am just wondering... Wouldn't it always be worth it to use LCS on counter CFs, since LCS is optimized for reads and writing a counter always requires a read? Alain
Re: LCS and counters
+1. We are trying to figure that all out too. I don't know if it helps, but we finally upgraded to 1.2.2, which is supposed to have better LCS support from what I understand. We did lots of QA testing and jumped from 1.1.4. A rolling restart did not work at all in QA, so we took the whole cluster down instead and used these steps, after having cassandra-1.2.2 deployed and all property files updated correctly. (The snapshot had to be done before drain, oddly enough, since drain seemed to shut down some ports that nodetool needs.)

# on the node itself (drain shuts down ports, so snapshot first)
clush -g datanodes -l cassandra nodetool -h localhost -p 7199 snapshot databus5 -t 1.2.2a
clush -g datanodes -l cassandra nodetool drain

# AS ROOT...
./cassandraStop.sh
clush -g datanodes rm /opt/cassandra
clush -g datanodes ln -s /opt/cassandra-1.2.2 /opt/cassandra

# BACK to cassandra (it won't run as root anyway, as it's not in the path)
cassandraStart.sh

From: Alain RODRIGUEZ arodr...@gmail.com
Date: Tuesday, March 5, 2013 9:50 AM
To: user@cassandra.apache.org
Subject: Re: LCS and counters

(quoting Alain's question and Janne's reply above)
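For Alain's first question, the schema change itself is a single statement. On 1.1.x via cassandra-cli it looks roughly like the following (the column family name and the sstable size are placeholders; verify the exact option syntax against your version's `help update column family;`). After the change, existing SSTables are treated as level 0 and get reshaped by background compaction over time:

```
update column family CounterCF
  with compaction_strategy = 'LeveledCompactionStrategy'
  and compaction_strategy_options = {sstable_size_in_mb: 10};
```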
Re: Poor read latency
According to this: https://issues.apache.org/jira/browse/CASSANDRA-5029 the bloom filter is still on by default for LCS in 1.2.x. Thanks. -Wei

From: Hiller, Dean dean.hil...@nrel.gov
To: user@cassandra.apache.org
Sent: Monday, March 4, 2013 10:42 AM
Subject: Re: Poor read latency

Recommended settings are 8G RAM, and your memory grows with the number of rows through index samples (configured in cassandra.yaml as samples per row something… look for the word index). Also, bloom filters grow with RAM if using size-tiered compaction. We are actually trying to switch to leveled compaction in 1.2.2, as I think the default there is no bloom filters, since LCS does not really need them: 90% of rows are in the highest tier (but this just works better for certain profiles, like very heavy reads vs. the number of writes). Later, Dean

From: Tom Martin tompo...@gmail.com
Date: Monday, March 4, 2013 11:20 AM
To: user@cassandra.apache.org
Subject: Re: Poor read latency

Yeah, I just checked, and the heap size 0.75 warning has been appearing. nodetool info reports:
Heap Memory (MB) : 563.88 / 1014.00
Heap Memory (MB) : 646.01 / 1014.00
Heap Memory (MB) : 639.71 / 1014.00
We have plenty of free memory on each instance. Do we need bigger instances or should we just configure each node to have a bigger max heap?

On Mon, Mar 4, 2013 at 6:10 PM, Hiller, Dean dean.hil...@nrel.gov wrote: What does nodetool info say for your memory? (We hit that one with memory near the max and it slowed down our system big time… still working on resolving it too.) Do any logs have the "hit 0.75, running compaction" or, worse, "hit 0.85, running compaction" messages? You typically get those if the above is the case.
Dean

From: Tom Martin tompo...@gmail.com
Date: Monday, March 4, 2013 10:31 AM
To: user@cassandra.apache.org
Subject: Poor read latency

Hi all, We have a small (3 node) Cassandra cluster on AWS. We have a replication factor of 3, a read level of LOCAL_QUORUM, and are using the ephemeral disk. We're getting pretty poor read performance and quite high read latency in cfstats. For example:

Column Family: AgentHotel
SSTable count: 4
Space used (live): 829021175
Space used (total): 829021175
Number of Keys (estimate): 2148352
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 0
Read Count: 67204
Read Latency: 23.813 ms.
Write Count: 0
Write Latency: NaN ms.
Pending Tasks: 0
Bloom Filter False Positives: 50
Bloom Filter False Ratio: 0.00201
Bloom Filter Space Used: 7635472
Compacted row minimum size: 259
Compacted row maximum size: 4768
Compacted row mean size: 873

For comparison, we have a similar setup in another cluster for an old project (hosted on Rackspace) where we're getting sub-1ms read latencies. We are using multigets on the client (Hector) but are only requesting ~40 rows per request on average. I feel like we should reasonably expect better performance, but perhaps I'm mistaken. Is there anything super obvious we should be checking out?
Re: Poor read latency
Great to know! 0.1, though, is still a heck of a lot of savings compared to 0.01 (size-tiered) when using the bloom filter calculator. Thanks for the info, Dean

From: Wei Zhu wz1...@yahoo.com
Date: Tuesday, March 5, 2013 11:02 AM
To: user@cassandra.apache.org
Subject: Re: Poor read latency

(quoting Wei's message and the earlier thread above)
odd timestamps
I happened to notice some bizarre timestamps coming out of the cassandra-cli. Example:

[default@XXX] get CF['e2b753aa33b13e74e5e803d787b06000'];
=> (column=c35ef420-c37a-11e0-ac88-09b2f4397c6a, value=XXX, timestamp=2013042719)
=> (column=c3845ea0-c37a-11e0-8f6f-09b2f4397c6a, value=XXX, timestamp=2013287771)
=> (column=c3993840-c37a-11e0-a069-09b2f4397c6a, value=XXX, timestamp=2013423245)
=> (column=c39a9040-c37a-11e0-8617-09b2f4397c6a, value=XXX, timestamp=2013431971)
Returned 4 results.

I'm used to timestamps being micros from 1970, like this (same CF):

[default@XXX] get CF['3a4599767e16e94465e8491139154871'];
=> (column=6c8e3160-4678-11e2-b69a-3534db9e8cfa, value=XXX, timestamp=1355549380581630)
=> (column=6c91f3a0-4678-11e2-bc00-3534db9e8cfa, value=XXX, timestamp=1355549380606285)
=> (column=6c980f00-4678-11e2-9963-3534db9e8cfa, value=XXX, timestamp=1355549380646378)
=> (column=6c994100-4678-11e2-955f-3534db9e8cfa, value=XXX, timestamp=1355549380654057)
=> (column=6c9e7950-4678-11e2-9189-3534db9e8cfa, value=XXX, timestamp=1355549380688268)
Returned 5 results.

Was there a version of Cassandra that wrote bad timestamps at the server level? I started around 0.8.4, and I'm running 1.1.2 right now. Could it have been a client with a bug? I've only been using phpcassa and the CLI to read/write data (no CQL). Is there another theory? I don't _think_ these odd timestamps will cause me problems. I just don't like unexpected results from my data stores... :-) will
Re: odd timestamps
There is no exact spec for the timestamp; the convention is micros from the epoch, but you are free to use anything you want. To update a column, you only need a timestamp higher than the original.

On Tue, Mar 5, 2013 at 1:55 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Yes, clients can write timestamps in those versions… I am not sure about the newer versions, as I seem to remember reading something on that. Dean

From: William Oberman ober...@civicscience.com
Date: Tuesday, March 5, 2013 11:37 AM
To: user@cassandra.apache.org
Subject: odd timestamps

(quoting Will's original message above)
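To put numbers on the reply above: the convention is microseconds since the Unix epoch, and the suspicious values in Will's first CF, read as microseconds, land about half an hour after midnight on 1 January 1970, so any later, properly stamped write will silently win over those columns. A small sketch of the arithmetic (not phpcassa or CLI code, just the convention):

```java
public class Timestamps {
    // Conventional Cassandra client timestamp: microseconds since the epoch.
    static long nowMicros() {
        return System.currentTimeMillis() * 1000L;
    }

    // Interpret a stored timestamp as microseconds and convert to epoch millis.
    static long microsToMillis(long timestampMicros) {
        return timestampMicros / 1000L;
    }

    public static void main(String[] args) {
        long odd = 2013042719L;          // from the first CF in the question
        long normal = 1355549380581630L; // from the second CF
        System.out.println(new java.util.Date(microsToMillis(odd)));    // early 1970
        System.out.println(new java.util.Date(microsToMillis(normal))); // December 2012
    }
}
```

Whatever wrote those small values (a client bug seems the likeliest theory) effectively stamped the columns "1970", which is why they look so out of place next to the 1355549380xxxxxx values.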
Re: is upgradesstables required for 1.1.4 to 1.2.2? (I don't think it is)
To answer my own question here: we tested this out in QA and then ran it in production with no issues.

Step 1. Upgrade to 1.2.2
Step 2. Start up all nodes

It works great. There was no need to run upgradesstables first. That said, we are doing a rolling upgradesstables on every node in production right now (we tested that in QA as well, of course). Doing so renames all the *-he-* files to *-ib-* files. 1.2.2 seems to be able to do this without affecting cluster performance, whereas on 1.1.4 this was a big performance impact (we usually removed the node from the cluster to do it). It is slow, but only because we have not raised compaction_throughput_mb_per_sec at this point (we may do that overnight, though, and turn it back down later). So, to answer the question of how you upgrade sstables before starting 1.2.2: you don't. Upgrade first, and all rows seem to be accessible immediately as it reads the old format; some nodes even start compacting to the new format before you get to upgrading the tables yourself on those nodes. Also, it does delete the *-he-* files, from our testing (though the snapshot ones stick around of course, so you have a backup plan; we are nearing the point where we will delete those, at least once we are fully on *-ib-* files). I hope that helps someone who googles for it. Later, Dean

On 2/28/13 8:45 AM, Michael Kjellman mkjell...@barracuda.com wrote: You won't be able to stream them. You need to run upgradesstables between majors. Best, Michael

On Feb 27, 2013, at 11:15 PM, Michal Michalski mich...@opera.com wrote: I'm currently migrating 1.1.0 to 1.2.1, and on our small CI cluster, which I was testing some stuff on, it seems that it's not required to run upgradesstables (this doc doesn't mention it either: http://www.datastax.com/docs/1.2/install/upgrading but the previous versions did).
Of course I'd like to upgrade them sooner or later (in case of another C* upgrade or so), but for me it seems like it's just going to work ("Cassandra is able to read data files created by the previous version, but the inverse is not always true.") and compactions will slowly convert old-version SSTables to new ones if I don't do it manually. M.

W dniu 27.02.2013 20:40, Hiller, Dean pisze: Hmmm, wouldn't I have to run upgradesstables BEFORE I start the 1.2.2 node? But running upgradesstables, as I recall, requires Cassandra to be running... so does it somehow understand the old format when it starts, I suspect? I am thinking I just keep the node out of the ring while I run upgradesstables, correct? But of course I am not sure how to start a 1.2.2 node such that it does not join the cluster. Thanks, Dean

On 2/27/13 12:31 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Hmmm, I have this info from Aaron, but what about bringing up version 1.2.2 with thrift off so I can run upgradesstables before I rejoin the ring? Quote from Aaron: "In pre 1.2 add these jvm startup params -Dcassandra.join_ring=false -Dcassandra.start_rpc=false" Thanks, Dean

On 2/27/13 12:00 PM, Michael Kjellman mkjell...@barracuda.com wrote: Yes, it's required between majors. Which your upgrade would be.
On 2/27/13 10:54 AM, Hiller, Dean dean.hil...@nrel.gov wrote: My script to upgrade our first node in QA is thus (basically snapshot, drain, stop, then switch over, then start)…

#!/bin/bash
export NODE=$1
export VERSION=1.1.4
export USER=cassandra
# NOTE: This script requires you have cassandra 1.2.2 in /opt/cassandra-1.2.2 but
# feel free to modify if you like

# Move the newest cassandra.yaml to the node
scp cassandra.yaml $USER@$NODE:/opt/cassandra/conf

# As the cassandra user, snapshot then drain the node
# and finally shut down cassandra on that node
ssh $USER@$NODE <<EOF
nodetool snapshot $VERSION
nodetool drain
pkill -f 'java.*cassandra'
EOF

# Now, our .bashrc for cassandra has /opt/cassandra/bin in its path,
# so we unlink and then link to the new cassandra as root, since only root
# has access to the opt directory.
ssh root@$NODE <<EOF
rm /opt/cassandra
ln -s /opt/cassandra-1.2.2 /opt/cassandra
EOF

# We should probably start cassandra ourselves... so we can watch the cluster as it joins the node,
# especially for the very first node we do...
# Now as the cassandra user, start up the cassandra node and then do manual health checks
#ssh $USER@$NODE <<EOF
#  cassandra
#EOF

Copy, by Barracuda, helps you store, protect, and share all your amazing things. Start today: www.copy.com.
Re: Consistent problem when solve Digest mismatch
"Otherwise, it means the version conflict solving strongly depends on a global sequence id (timestamp) which needs to be provided by the client?" Yes. If you have an area of your data model with a high degree of concurrency, C* may not be the right match. In 1.1 we have atomic updates, so clients see either the entire write or none of it. And sometimes you can design a data model that does not mutate shared values but writes ledger entries instead. See Matt Dennis' talk here http://www.datastax.com/events/cassandrasummit2012/presentations or this post http://thelastpickle.com/2012/08/18/Sorting-Lists-For-Humans/ Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com

On 4/03/2013, at 4:30 PM, Jason Tang ares.t...@gmail.com wrote: Hi, The timestamp provided by my client is a unix timestamp (with NTP), and as I said, due to NTP drift, the local unix timestamps are not accurately synchronized (compared to my case). So, in short, the client cannot provide a global sequence number to indicate the event order. But I wonder: I configured the Cassandra consistency level as write QUORUM. So for one record, I supposed Cassandra has the ability to decide the final update result. Otherwise, it means the version conflict solving strongly depends on a global sequence id (timestamp) which needs to be provided by the client? //Tang

2013/3/4 Sylvain Lebresne sylv...@datastax.com: The problem is, what is the sequence number you are talking about exactly? Or let me put it another way: if you do have a sequence number that provides a total ordering of your operations, then that is exactly what you should use as your timestamp. What Cassandra calls the timestamp is exactly what you call the seqID; it's the number Cassandra uses to decide the order of operations. Except that in real life, provided you have more than one client talking to Cassandra, providing a total ordering of operations is hard, and in fact not doable efficiently.
So in practice, people use unix timestamps (with NTP), which provide a very good and cheap approximation of the real-life order of operations. But again, if you do know how to assign a more precise timestamp, Cassandra lets you use that: you can provide your own timestamp (the unix timestamp is just the default). The point being, the unix timestamp is the best approximation we have in practice. -- Sylvain

On Mon, Mar 4, 2013 at 9:26 AM, Jason Tang ares.t...@gmail.com wrote: Hi, Previously I met a consistency problem; you can refer to the link below for the whole story. http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3CCAFb+LUxna0jiY0V=AvXKzUdxSjApYm4zWk=ka9ljm-txc04...@mail.gmail.com%3E And after checking the code, it seems I found some clue to the problem. Maybe someone can check this. For short: I have a Cassandra cluster (1.0.3), the consistency level is read/write quorum, replication_factor is 3. Here is the event sequence:

seqID  NodeA   NodeB   NodeC
1.     New     New     New
2.     Update  Update  Update
3.     Delete  Delete

When trying to read from NodeB and NodeC, a DigestMismatch exception is triggered, so Cassandra tries to resolve this version conflict. But the result is the value "Update". Here is the suspected root cause: the version conflict is resolved based on timestamp. Node C's local time is a bit earlier than node A's. The Update request was sent from node C with timestamp 00:00:00.050, the Delete from node A with timestamp 00:00:00.020, which is not the same as the event sequence. So the version conflict is resolved incorrectly. Is that true? If yes, then it means the consistency level can ensure the conflict is found, but solving it correctly depends on the accuracy of time synchronization, e.g. NTP?
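For what it's worth, the resolution rule Jason describes can be modelled in a few lines. Below is my own simplified sketch of last-write-wins reconciliation (not Cassandra's actual code; my understanding is that on an exact timestamp tie a tombstone wins, and between two live columns the larger value wins):

```java
public class Reconcile {
    static final class Column {
        final long timestamp;
        final byte[] value;
        final boolean tombstone;
        Column(long ts, byte[] v, boolean dead) { timestamp = ts; value = v; tombstone = dead; }
    }

    // Last-write-wins: the higher timestamp wins regardless of arrival order;
    // on a tie, a tombstone beats a live column, then the larger value wins.
    static Column reconcile(Column a, Column b) {
        if (a.timestamp != b.timestamp) return a.timestamp > b.timestamp ? a : b;
        if (a.tombstone != b.tombstone) return a.tombstone ? a : b;
        return compareUnsigned(a.value, b.value) >= 0 ? a : b;
    }

    static int compareUnsigned(byte[] x, byte[] y) {
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int c = (x[i] & 0xff) - (y[i] & 0xff);
            if (c != 0) return c;
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        // Jason's scenario: node C's clock is ahead, so the Update carries
        // timestamp 50 while the later Delete carries timestamp 20.
        Column update = new Column(50, "Update".getBytes(), false);
        Column delete = new Column(20, null, true);
        // The Update wins and the Delete is lost, matching the observed behaviour.
        System.out.println(reconcile(update, delete).tombstone ? "deleted" : "Update survives");
    }
}
```

This makes the dependency explicit: quorum guarantees the mismatch is detected, but which version wins is decided purely by the client-supplied timestamps, so clock skew between clients directly changes the outcome.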
Re: hinted handoff disabling trade-offs
The advantage of HH is that it reduces the probability of a DigestMismatch when using CL ONE. A DigestMismatch means the read has to run a second time before returning to the client. - No risk of hinted handoffs building up - No risk of hinted handoffs flooding a node that just came up See the yaml config settings for the max hint window and the throttling. Can anyone suggest any other factors that I'm missing here? Specifically, reasons not to do this. If you are doing this for performance, first make sure your data model is efficient, that you are doing the most efficient reads (see my presentation here http://www.datastax.com/events/cassandrasummit2012/presentations), and that your caching is bang on. Then consider whether you can tune the CL, and whether your client is token aware so it directs traffic to a node that has the data. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 4/03/2013, at 9:19 PM, Michael Kjellman mkjell...@barracuda.com wrote: Also, if you have enough hints being created that it's significantly impacting your heap, I have a feeling things are going to get out of sync very quickly. On Mar 4, 2013, at 9:17 PM, Wz1975 wz1...@yahoo.com wrote: Why do you think disabling hinted handoff will improve memory usage? Thanks. -Wei Sent from my Samsung smartphone on ATT Original message Subject: Re: hinted handoff disabling trade-offs From: Michael Kjellman mkjell...@barracuda.com To: user@cassandra.apache.org CC: Repair is slow. On Mar 4, 2013, at 8:07 PM, Matt Kap matvey1...@gmail.com wrote: I am looking to get a second opinion about disabling hinted handoffs. I have an application that can tolerate a fair amount of inconsistency (advertising domain), so I'm weighing the pros and cons of hinted handoffs. I'm running Cassandra 1.0, looking to upgrade to 1.1 soon.
Pros of disabling hinted handoffs:
- Reduces heap usage
- Improves GC performance
- No risk of hinted handoffs building up
- No risk of hinted handoffs flooding a node that just came up

Cons:
- Some writes can be lost, at least until repair runs

Can anyone suggest any other factors that I'm missing here? Specifically, reasons not to do this. Cheers! -Matt -- Copy, by Barracuda, helps you store, protect, and share all your amazing things. Start today: www.copy.com.
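Aaron's pointer to "the yaml config settings" refers to cassandra.yaml. A minimal sketch of the relevant knobs, assuming 1.0/1.1-era setting names (verify against the yaml bundled with your version, as these names changed in later releases):

```yaml
# cassandra.yaml -- hint-related settings (assumed 1.0/1.1-era names)
hinted_handoff_enabled: false          # disable hinted handoff entirely
max_hint_window_in_ms: 3600000         # stop collecting hints for a node down > 1 hour
hinted_handoff_throttle_delay_in_ms: 1 # pause between hint deliveries so a
                                       # returning node is not flooded
```

With `hinted_handoff_enabled: false` the other two settings are moot; they are shown because throttling and the hint window are the middle ground between "hints on" and "hints off".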
Re: Replacing dead node when num_tokens is used
AFAIK you just fire up the new one and let nature take its course :) http://www.datastax.com/docs/1.2/operations/add_replace_nodes#replace-node i.e. you do not need to use -Dcassandra.replace_token. Hope that helps. - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 5/03/2013, at 1:06 AM, Jan Kesten j.kes...@enercast.de wrote: Hello, while trying out Cassandra I read about the steps necessary to replace a dead node. In my test cluster I used a setup with num_tokens instead of initial_token. How do I replace a dead node in this scenario? Thanks, Jan
Re: old data / tombstones are not deleted after ttl
If you have a data model with long-lived and frequently updated rows, you can get around the all-fragments problem by running a user-defined compaction. Look for the CompactionManagerMBean on the JMX API https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManagerMBean.java#L67 Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 5/03/2013, at 1:52 AM, Michal Michalski mich...@opera.com wrote: I have read in the documentation that after a major compaction, minor compactions are no longer automatically triggered. Does this mean that I have to run nodetool compact regularly? Or is there a way to get back to automatic minor compactions? I think it's one of the most confusing parts of the C* docs. There's nothing like a switch for minor compactions that gets magically turned off when you trigger a major compaction. Minor compactions won't get triggered automatically for _some_ time, because you'll only have one gargantuan SSTable, and unless you get enough new (smaller) SSTables to get them compacted together (4 by default), no compactions will kick in. Of course you'll still have one huge SSTable, and it will take a long time to accumulate another 3 of similar size to compact with it. I think this will be a problem for your TTL-based data model, as you'll have tons of tombstones in the newer/smaller SSTables that you won't be able to compact together with the huge SSTable containing the data. BTW: As far as I remember, there was an external tool (I don't remember the name) that allows splitting SSTables - I didn't use it, so I can't vouch for it, but you may want to give it a try. M. On 05.03.2013 09:46, Matthias Zeilinger wrote: Short question afterwards: I have read in the documentation that after a major compaction, minor compactions are no longer automatically triggered. Does this mean that I have to run nodetool compact regularly?
Or is there a way to get back to automatic minor compactions? Thx, Br, Matthias Zeilinger Production Operation – Shared Services P: +43 (0) 50 858-31185 M: +43 (0) 664 85-34459 E: matthias.zeilin...@bwinparty.com bwin.party services (Austria) GmbH Marxergasse 1B A-1030 Vienna www.bwinparty.com -Original Message- From: Matthias Zeilinger Sent: Dienstag, 05. März 2013 08:03 To: user@cassandra.apache.org Subject: RE: old data / tombstones are not deleted after ttl Yes, it was a major compaction. I know it's not a great solution, but I needed something to get rid of the old data, because I ran out of disk space. Br, Matthias Zeilinger -Original Message- From: Michal Michalski Sent: Dienstag, 05. März 2013 07:47 To: user@cassandra.apache.org Subject: Re: old data / tombstones are not deleted after ttl Was it a major compaction? I ask because it's definitely a solution that had to work, but it's also a solution that - in general - probably no-one here would suggest you use. M. On 05.03.2013 07:08, Matthias Zeilinger wrote: Hi, I have done a manual compaction via nodetool and this worked. But thanks for the explanation of why it wasn't compacted. Br, Matthias Zeilinger From: Bryan Talbot btal...@aeriagames.com Sent: Montag, 04. März 2013 23:36 To: user@cassandra.apache.org Subject: Re: old data / tombstones are not deleted after ttl Those older files won't be included in a compaction until there are min_compaction_threshold (4) files of that size.
When you get another SSTable -Data.db file that is about 12-18 GB, you'll have 4 of that size and they will be compacted together into one new file. At that time, if there are any rows with only tombstones that are all older than gc_grace, the row will be removed (assuming the row exists exclusively in the 4 input SSTables). Columns with data more than TTL seconds old will be written with a tombstone. If the row does have column values in SSTables that are not being compacted, the row will not be removed. -Bryan On Sun, Mar 3, 2013 at 11:07 PM, Matthias Zeilinger matthias.zeilin...@bwinparty.com wrote: Hi, I'm running Cassandra 1.1.5 and have the following issue. I'm using a 10-day TTL on my CF. I can see a lot of tombstones in there, but they aren't deleted
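Bryan's point about why the big post-major-compaction SSTable sits idle can be illustrated with a toy model of size-tiered bucketing. This is a deliberate simplification, not SizeTieredCompactionStrategy's exact algorithm; the grouping rule and ratio are assumptions:

```python
# Toy model of size-tiered minor compaction: SSTables are bucketed by
# similar size, and a bucket only becomes eligible once it holds
# min_compaction_threshold (default 4) tables. A single giant SSTable
# left by a major compaction sits in a bucket of one, untouched.

def buckets(sstable_sizes, ratio=2.0):
    """Group sizes where each is within `ratio`x of its bucket's average."""
    groups = []
    for size in sorted(sstable_sizes):
        for g in groups:
            avg = sum(g) / len(g)
            if avg / ratio <= size <= avg * ratio:
                g.append(size)
                break
        else:
            groups.append([size])
    return groups

def compactable(sstable_sizes, min_threshold=4):
    """Buckets that have reached the minor-compaction threshold."""
    return [g for g in buckets(sstable_sizes) if len(g) >= min_threshold]

# Just after a major compaction: one 60 GB table plus a few fresh small ones.
sizes_gb = [60, 0.1, 0.1, 0.12]
print(compactable(sizes_gb))            # -> []  (nothing eligible yet)

# A 4th small table arrives: the small bucket compacts, but the 60 GB table
# (and the expired data inside it) waits until three more ~60 GB peers exist.
print(compactable(sizes_gb + [0.11]))   # -> [[0.1, 0.1, 0.11, 0.12]]
```

The model also shows the TTL problem Michal describes: tombstones landing in the small tables can never meet the old data in the 60 GB table through minor compactions alone.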
Re: what size file for LCS is best for 300-500G per node?
Don't forget you can test things http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 5/03/2013, at 7:37 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Thanks! Dean On 3/4/13 7:12 PM, Wei Zhu wz1...@yahoo.com wrote: We have 200G and ended up going with 10M. The compaction after repair takes a day to finish. Try running a repair and see how it goes. -Wei - Original Message - From: Dean Hiller dean.hil...@nrel.gov To: user@cassandra.apache.org Sent: Monday, March 4, 2013 10:52:27 AM Subject: what size file for LCS is best for 300-500G per node? Should we really be going with 5MB when it compresses to 3MB? That seems to be on the small side, right? We have ulimit cranked up, so many files shouldn't be an issue, but maybe we should go to 10MB or 100MB or something in between? Does anyone have any experience with changing the LCS sizes? I did read somewhere that startup times from opening 100,000 files could be slow? Which implies a larger size, so fewer files, might be better? Thanks, Dean
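One concrete input to the 5 MB vs 10 MB vs larger question is the sheer file count. A back-of-the-envelope sketch (ignoring compression ratios and per-SSTable auxiliary files, which is a simplification):

```python
# Rough SSTable count for Leveled Compaction at a given sstable_size_in_mb.
# Each -Data.db file is capped near sstable_size_in_mb, so file count is
# roughly data volume divided by that size.

def lcs_file_count(data_per_node_gb, sstable_size_mb):
    """Approximate number of LCS SSTables for a node at steady state."""
    return int(data_per_node_gb * 1024 / sstable_size_mb)

for size_mb in (5, 10, 100):
    print(f"{size_mb:>3} MB -> {lcs_file_count(400, size_mb):>6} sstables for 400 GB")
# 5 MB lands around 80,000 files -- near the 100,000-file startup concern
# from the thread; 10 MB halves that, 100 MB brings it down to a few thousand.
```

This is only one axis of the trade-off: larger sstables mean fewer open files but more data rewritten per compaction.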
Re: Replacing dead node when num_tokens is used
Hello Aaron, thanks for your reply. I found it just an hour ago on my own; yesterday I accidentally looked at the 1.0 docs. Right now my replacement node is streaming from the others - then more testing can follow. Thanks again, Jan
Re: *-ib-* files instead of *-he-* files?
he-593-Data.db Search the logs to see if there are any messages for the sstable. Can I safely delete the *-he-* files now? Maybe, maybe not. Personally I would not. Others may say yes. Restart and see if it gets opened in the logs. If it's still there, you can either run another upgradesstables, which will upgrade ALL the sstables for the CF, or shut down the node, move the ib- files away (keeping the one with the highest number there), start it without thrift and gossip, upgrade, shut down, put the files back, and restart. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 5/03/2013, at 7:41 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Our upgradesstables completed, but I still see *he-593-Data.db files and such, which from testing seem to be the 1.1.x table names. I see the new *ib-695-Data.db files. Can I safely delete the *-he-* files now? I was expecting cassandra to delete them when it was done, but maybe they are there from some corruption that happened due to negligence on my part. I do know we had screwed up a few things, resulting in odd size increases reported by nodetool ring. In fact, some nodes had 60 GB more than other nodes, while previously our cluster was exactly balanced for months. Thanks, Dean
Re: anyone see this user-cassandra thread get answered...
bah, I think I got confused by looking at the version in the email you linked to. If the update CF call is not working, and this is QA, run it with DEBUG logging and file a bug here https://issues.apache.org/jira/browse/CASSANDRA Thanks - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 5/03/2013, at 8:29 AM, Hiller, Dean dean.hil...@nrel.gov wrote: That ticket says it was fixed in 1.1.5 and we are on 1.2.2. We upgraded from 1.1.4 to 1.2.2, ran upgradesstables and watched filenames change from *-he-*.db to *-ib-*.db, then changed compaction strategies and still had this issue. Is it the fact we came from 1.1.4? Ours was a very simple 4-node QA test where we set up a 1.1.4 cluster, put data in, upgraded, ran upgradesstables, then switched to LCS and ran upgradesstables again, hoping it would use LCS. Thanks, Dean From: aaron morton aa...@thelastpickle.com Reply-To: user@cassandra.apache.org Date: Tuesday, March 5, 2013 9:13 AM To: user@cassandra.apache.org Subject: Re: anyone see this user-cassandra thread get answered... It was probably this https://issues.apache.org/jira/browse/CASSANDRA-4597 Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 4/03/2013, at 2:05 PM, Hiller, Dean dean.hil...@nrel.gov wrote: I was reading http://mail-archives.apache.org/mod_mbox/cassandra-user/201208.mbox/%3CCAGZm5drRh3VXNpHefR9UjH8H=dhad2y18s0xmam5cs4yfl5...@mail.gmail.com%3E as we are having the same issue in 1.2.2. We switch to LCS and cassandra-cli shows us at LCS on any node we run cassandra-cli on, but then looking at cqlsh, it shows us at SizeTieredCompactionStrategy :(. Thanks, Dean
Re: any way to reset timestamp in cassandra to work around cassandra bug?
The short answer is no. The medium answer is yes, but you won't like it. The medium-to-long answer: to remove data with a high timestamp you need to delete it with a higher timestamp, and make sure it is purged in compaction by reducing gc_grace. But if this is the schema, it's probably best to take a very good look at things first, and deleting the schema kind of sucks. Check the timestamps by looking at the CFs in the system keyspace. Turn on DEBUG logging when you do the schema migration and see the timestamp used in the update. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 5/03/2013, at 9:14 AM, Hiller, Dean dean.hil...@nrel.gov wrote: I read this: Several schema bugs have been fixed since 1.1.1. Let us know if you can reproduce in 1.1.4. You may need to recreate the schema because 1.1.1 used an incorrectly-high timestamp on the original creation. Is there any way to reset that timestamp on all my CFs? We upgraded to 1.2.2 and are trying to switch to LCS, but most likely this timestamp issue is the root of our problem, I think. Thanks, Dean
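The reason a plain delete can't fix this: under last-write-wins, a tombstone only shadows data whose timestamp is lower than the tombstone's. A toy illustration (not the real read path; the timestamp values are made up):

```python
# Why data written with an artificially high timestamp "can't" be deleted:
# a tombstone only hides columns whose timestamp is below the tombstone's.
# Toy sketch of the comparison, not Cassandra's actual code.

def visible_after_delete(data_ts, tombstone_ts):
    """A column survives a delete iff its timestamp beats the tombstone's."""
    return data_ts > tombstone_ts

FAR_FUTURE = 9_999_999_999_000_000   # bogus high timestamp from the 1.1.1 bug
NOW        = 1_362_400_000_000_000   # an ordinary current microsecond timestamp

print(visible_after_delete(FAR_FUTURE, NOW))             # -> True: normal delete ignored
print(visible_after_delete(FAR_FUTURE, FAR_FUTURE + 1))  # -> False: higher-ts delete works
```

Hence Aaron's recipe: issue the delete with a timestamp above the bad one, then shrink gc_grace so compaction actually purges the pair.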
Re: Can I create a counter column family with many rows in 1.1.10?
Note that CQL 3 in 1.1 is compatible with CQL 3 in 1.2. Also, you do not have to use CQL 3; you can still use cassandra-cli to create CFs. The syntax you use to populate it depends on the client you are using. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 5/03/2013, at 9:16 PM, Abhijit Chanda abhijit.chan...@gmail.com wrote: Yes you can; you just have to use CQL 3, and from 1.1.10 onward Cassandra supports CQL 3. You just have to be aware of the fact that a column family that contains a counter column can only contain counters. In other words, either all the columns of the column family excluding KEY have the counter type, or none of them can have it. Best Regards, -- Abhijit Chanda +91-974395
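As a sketch of Abhijit's point (the table and column names here are illustrative, not from the thread): a CQL 3 counter table can hold any number of rows, as long as every non-key column is a counter.

```sql
-- Hypothetical counter column family: many rows, all non-key columns counters.
CREATE TABLE page_views (
    page  text,
    day   text,
    views counter,
    PRIMARY KEY (page, day)
);

-- Counters are modified with UPDATE (there is no plain INSERT for counters);
-- each distinct (page, day) pair is its own row.
UPDATE page_views SET views = views + 1
WHERE page = '/home' AND day = '2013-03-05';
```

Adding a non-counter column such as `title text` to this table would be rejected, which is the all-or-nothing rule Abhijit describes.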
Re: Can I create a counter column family with many rows in 1.1.10?
Thanks @aaron for the rectification. On Wed, Mar 6, 2013 at 1:17 PM, aaron morton aa...@thelastpickle.com wrote: Note that CQL 3 in 1.1 is compatible with CQL 3 in 1.2. Also you do not have to use CQL 3, you can still use cassandra-cli to create CFs. -- Abhijit Chanda +91-974395