Re: Cassandra 1.0 hangs during GC
Can you provide output from the sar command for the time period when the long GC occurred?
Regards, Wojciech Meler
Re: Cassandra 1.0 hangs during GC
48 GB of RAM on that machine; swap is not used. I will disable swap altogether just in case. I have 4 Cassandra processes (parts of 4 different clusters), each allocated 8 GB and using 4 of them.

java -version
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)

23.07.2012, 20:12, Joost van de Wijgerd jwijg...@gmail.com:
How much memory do you have on the machine? Seems like you have 8G reserved for the Cassandra java process; if this is all the memory on the machine you might be swapping. Also, which JVM do you use?
kind regards
Joost

On Mon, Jul 23, 2012 at 10:07 AM, Nikolay Kovshov nkovs...@yandex.ru wrote:
On the 21st I migrated to Cassandra 1.1.2 but see no improvement.

cat /var/log/cassandra/Earth1.log | grep "GC for"

INFO [ScheduledTasks:1] 2012-05-22 17:42:48,445 GCInspector.java (line 123) GC for ParNew: 345 ms for 1 collections, 82451888 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-05-23 02:47:13,911 GCInspector.java (line 123) GC for ParNew: 312 ms for 1 collections, 110617416 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-05-23 11:57:54,317 GCInspector.java (line 123) GC for ParNew: 298 ms for 1 collections, 98161920 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-02 08:52:37,019 GCInspector.java (line 123) GC for ParNew: 196886 ms for 1 collections, 2310058496 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-16 17:41:25,940 GCInspector.java (line 123) GC for ParNew: 200146 ms for 1 collections, 2345987088 used; max is 8464105472
=== Migrated from 1.0.0 to 1.1.2
INFO [ScheduledTasks:1] 2012-07-21 09:05:08,280 GCInspector.java (line 122) GC for ParNew: 282 ms for 1 collections, 466406864 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-21 12:38:43,132 GCInspector.java (line 122) GC for ParNew: 233 ms for 1 collections, 405269504 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-22 02:29:09,596 GCInspector.java (line 122) GC for ParNew: 253 ms for 1 collections, 389700768 used; max is 8464105472
INFO [ScheduledTasks:1] 2012-07-22 17:45:46,357 GCInspector.java (line 122) GC for ParNew: 57391 ms for 1 collections, 400083984 used; max is 8464105472

Memory and yaml memory-related settings are default. I do not do deletes. I have 2 CFs and no secondary indexes.

LiveRatios:
INFO [pool-1-thread-1] 2012-06-09 02:36:07,759 Memtable.java (line 177) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0). calculation took 85ms for 6236 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:47,614 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 (just-counted was 1.0). calculation took 8ms for 1 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:51,012 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0). calculation took 99ms for 1094 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:51,331 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 (just-counted was 1.0). calculation took 80ms for 242 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:51,856 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0). calculation took 505ms for 2678 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:52,881 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0). calculation took 776ms for 5236 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:52,945 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 (just-counted was 1.0). calculation took 64ms for 389 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:55,162 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 (just-counted was 1.0). calculation took 1378ms for 8948 columns
INFO [MemoryMeter:1] 2012-07-21 09:04:55,304 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 (just-counted was 1.0). calculation took 140ms for 1082 columns
INFO [MemoryMeter:1] 2012-07-21 09:05:08,439 Memtable.java (line 213) CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 2.5763038186160894 (just-counted was 2.5763038186160894). calculation took 8796ms for 102193 columns

18.07.2012, 07:51, aaron morton aa...@thelastpickle.com:
Assuming all the memory and yaml settings are default, that does not sound right. The first thought would be the memory meter not counting correctly... Do you do a lot of deletes? Do you have a lot of CFs and/or secondary indexes? Can you see log lines about the liveRatio for your CFs? I would upgrade to 1.0.10 before getting too carried away though.
Cheers
- Aaron Morton, Freelance Developer
Re: Cassandra 1.0 hangs during GC
I started running sar only recently, after your advice, and have not seen any huge GCs on that server yet. At 08:14 there was a GC lasting 4.5 seconds; that's not five minutes of course, but still quite an unpleasant value. I'm still waiting for big GC values and will provide the corresponding sar logs.

07:25:01 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
07:35:01 PM 0.00 4.34 9.02 0.00 13.61 0.00 0.00 0.00 0.00
07:45:01 PM 0.00 5.17 20.47 0.00 25.77 0.00 0.00 0.00 0.00
07:55:01 PM 0.00 4.66 8.63 0.00 18.69 0.00 0.00 0.00 0.00
08:05:01 PM 0.00 8.11 8.84 0.00 14.37 0.00 0.00 0.00 0.00
08:15:01 PM 0.00 5.19 21.65 0.00 25.94 0.00 0.00 0.00 0.00

24.07.2012, 10:22, Wojciech Meler wojciech.me...@gmail.com:
Can you provide output from the sar command for the time period when the long GC occurred?
Regards, Wojciech Meler
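To correlate sar samples with the GC pauses reported above, it can help to pull just the long pauses out of the GCInspector lines. A minimal sketch (this helper is not part of Cassandra; the log format matches the 1.0/1.1-era GCInspector lines quoted in this thread):

```python
import re

# Parse GCInspector lines and report pauses longer than a threshold,
# so they can be matched against sar sampling intervals.
GC_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*"
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<n>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)")

def long_pauses(lines, threshold_ms=1000):
    """Return (timestamp, collector, pause_ms) for pauses >= threshold_ms."""
    out = []
    for line in lines:
        m = GC_LINE.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            out.append((m.group("ts"), m.group("collector"), int(m.group("ms"))))
    return out

log = [
    "INFO [ScheduledTasks:1] 2012-07-22 02:29:09,596 GCInspector.java (line 122) GC for ParNew: 253 ms for 1 collections, 389700768 used; max is 8464105472",
    "INFO [ScheduledTasks:1] 2012-07-22 17:45:46,357 GCInspector.java (line 122) GC for ParNew: 57391 ms for 1 collections, 400083984 used; max is 8464105472",
]
print(long_pauses(log))  # only the 57391 ms pause is reported
```

Feeding the whole system.log through this gives the handful of timestamps worth pulling sar output for.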
Re: Cassandra 1.0 hangs during GC
You are better off using Sun Java 6 to run Cassandra; in the past there were issues reported on 7. Can you try running it on Sun Java 6?
kind regards
Joost

On Tue, Jul 24, 2012 at 10:04 AM, Nikolay Kovshov nkovs...@yandex.ru wrote:
48 GB of RAM on that machine; swap is not used. I will disable swap altogether just in case. I have 4 Cassandra processes (parts of 4 different clusters), each allocated 8 GB and using 4 of them.
snip
keyspace no longer modifiable
Greetings. We have a very strange problem: it seems that sometimes our keyspaces become unmodifiable.

user@server:~$ cqlsh -3 -k goh_master cassandra1
Connected to GOH Cluster at cassandra1:9160.
[cqlsh 2.2.0 | Cassandra 1.1.2 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
Use HELP for help.
cqlsh:goh_master> drop columnfamily agents_blueprints;
cqlsh:goh_master>

[Here I disconnected, just in case. It's exactly the same if I don't do this.]

user@server:~$ cqlsh -3 -k goh_master cassandra1
Connected to GOH Cluster at cassandra1:9160.
[cqlsh 2.2.0 | Cassandra 1.1.2 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
Use HELP for help.
cqlsh:goh_master> DESCRIBE COLUMNFAMILY agents_blueprints

CREATE TABLE agents_blueprints (
  agent_id ascii,
  archetype ascii,
  proto_id ascii,
  PRIMARY KEY (agent_id, archetype)
) WITH COMPACT STORAGE AND
  comment='' AND
  caching='KEYS_ONLY' AND
  read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write='true' AND
  compaction_strategy_class='SizeTieredCompactionStrategy' AND
  compression_parameters:sstable_compression='SnappyCompressor';

cqlsh:goh_master>

It is still possible to write and read data from the tables; they just can't be dropped, created or altered. With 1.1.1 we discovered that a rolling restart of the cluster used to fix the problem. This no longer works with 1.1.2, and the only way we have found out of this situation is to bring down the cluster, delete everything in /var/lib/cassandra (everything inside commitlog, data and saved_caches), start over with a clean cluster and load the new data again. This happens to us very often, both on our 3-node cluster and on our single-node test cluster. We use Ubuntu LTS 12.04, with Sun Oracle Java 6. Is this a known issue? This is a pretty ugly bug, to us.
-- Marco Matarazzo == Hex Keep ==
"You can learn more about a man in one hour of play than in one year of conversation." - Plato
Re: CQL3 and column slices
On Tue, Jul 24, 2012 at 12:09 AM, Josep Blanquer blanq...@rightscale.com wrote:
Is there some way to express that in CQL3? Something logically equivalent to SELECT * FROM bug_test WHERE a:b:c:d:e > 1:1:1:1:2?

No, there isn't. Not currently at least. But feel free of course to open a ticket/request on https://issues.apache.org/jira/browse/CASSANDRA. I note that I would be curious to know the concrete use case you have for this type of query. It would also help as an argument for adding such facilities more quickly (or at all). Typically, "we should support it in CQL3 because it was possible with Thrift" is definitely an argument, but a much weaker one without concrete examples of why it might be useful in the first place.
-- Sylvain
Dropping counter mutations taking longer than rpc_timeout
Hey, mutations taking longer than rpc_timeout will be dropped, because the coordinator won't be waiting for the replica any more and will return a TimeoutException to the client if the consistency level isn't reached [1]. In the case of counters, though, since counter mutations aren't idempotent, the client is not supposed to retry an increment on TimeoutException. So why doesn't a counter mutation get processed regardless of rpc_timeout?
Cheers, Omid
[1] http://wiki.apache.org/cassandra/FAQ#dropped_messages
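The reason retrying counters is unsafe can be shown in a few lines. This is an illustration, not Cassandra code: the point is that a timeout only means the ack was not seen in time, not that the write was not applied, so a naive retry of a non-idempotent increment can double-count.

```python
# Why retrying a counter increment after a timeout is unsafe: the replica may
# have applied the write even though the coordinator timed out waiting for the
# ack, so a client-side retry applies the same logical increment twice.
counter = 0

def increment(amount, ack_lost=False):
    """Apply the increment; return True iff the client saw an ack."""
    global counter
    counter += amount          # the replica applies the mutation either way
    return not ack_lost        # but the ack may never reach the client

# First attempt: write applied, ack lost -> client sees a timeout.
acked = increment(1, ack_lost=True)
# Naive client retry on timeout:
if not acked:
    increment(1)

print(counter)  # 2 -- one logical increment was counted twice
```

An idempotent write (e.g. overwriting a column with a fixed value) would be safe to retry in the same scenario, which is exactly the asymmetry the question is about.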
going back in time
One of the scenarios I have to take into account for a small Cassandra cluster (N=4) is restoring the data back in time. I will have full backups for 15 days, and it's possible that I will need to restore, let's say, the data from 10 days ago (don't ask, I'm not going into the details why). I know/suspect that restores by KeySpace/ColumnFamily are possible (we haven't tested yet), but I wonder if there would be any side effects of stopping all the nodes (assuming cuts to the other KSs are OK), restoring the SSTables, and starting the nodes one by one. So far we're thinking of RF=4, but also of RF=3 or at least RF<N in the future. Is all this crazy talk or is it possible? Any side effects in the system KS and/or indexes to take into account?
-- Marcos Dione, SysAdmin, Astek Sud-Est pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo, 04 97 12 62 45 - mdione@orange.com
_
Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, France Telecom - Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.
This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, France Telecom - Orange is not liable for messages that have been modified, changed or falsified. Thank you.
Re: going back in time
mdione@orange.com writes:
snip
Is all this crazy talk or is it possible? Any side effects in the system KS and/or indexes to take into account?

Snapshots and restores are great for point-in-time recovery. There's no particular side effect if you're willing to accept the downtime. If you don't want to take your whole cluster offline you can use sstableloader as well.
RE: going back in time
De : Pierre-Yves Ritschard [mailto:p...@spootnik.org]
Snapshots and restores are great for point-in-time recovery. There's no particular side effect if you're willing to accept the downtime.

Are you sure? The system KS has no book-keeping about the KSs/CFs? For instance, schema changes, etc.?

If you don't want to take your whole cluster offline you can use sstableloader as well.

Sounds wonderful.
Re: going back in time
mdione@orange.com writes:
De : Pierre-Yves Ritschard [mailto:p...@spootnik.org]
snip
Are you sure? The system KS has no book-keeping about the KSs/CFs? For instance, schema changes, etc.?

Don't take my word for it, it's easy enough to fill up a small 3-node cluster and play with it. Here are a few more things that you should pay attention to:
- If you change the schema you are obviously going to have to reconverge to the previous schema.
- You want to avoid pending commitlog entries.

If you want to load a full snapshot from scratch, the easiest route would be to drop the KS, recreate it with the expected schema and load your sstables from disk.
Re: Migrating data from a 0.8.8 - 1.1.2 ring
On Mon, Jul 23, 2012 at 1:25 PM, Mike Heffner m...@librato.com wrote:
Hi, we are migrating from a 0.8.8 ring to a 1.1.2 ring and we are noticing missing data post-migration. We use pre-built/configured AMIs, so our preferred route is to leave our existing production 0.8.8 ring untouched, bring up a parallel 1.1.2 ring, and migrate data into it. Data is written to the rings via batch processes, so we can easily ensure that both the existing and new rings have the same data post-migration.
snip
The steps we are taking are:
1. Bring up a 1.1.2 ring in the same AZ/data center configuration with tokens matching the corresponding nodes in the 0.8.8 ring.
2. Create the same keyspace on 1.1.2.
3. Create each CF in the keyspace on 1.1.2.
4. Flush each node of the 0.8.8 ring.
5. Rsync each non-compacted sstable from 0.8.8 to the corresponding node in 1.1.2.
6. Move each 0.8.8 sstable into the 1.1.2 directory structure by renaming the file to the /cassandra/data/keyspace/cf/keyspace-cf... format. For example, for the keyspace Metrics and CF epochs_60 we get: cassandra/data/Metrics/epochs_60/Metrics-epochs_60-g-941-Data.db.
7. On each 1.1.2 node run `nodetool -h localhost refresh Metrics CF` for each CF in the keyspace. We notice that storage load jumps accordingly.
8. On each 1.1.2 node run `nodetool -h localhost upgradesstables`. This takes a while but appears to correctly rewrite each sstable in the new 1.1.x format. Storage load drops as sstables are compressed.

So, after some further testing we've observed that the `upgradesstables` command is removing data from the sstables, leading to our missing data. We've repeated the steps above with several variations:

WORKS: refresh -> scrub
WORKS: refresh -> scrub -> major compaction
FAILS: refresh -> upgradesstables
FAILS: refresh -> scrub -> upgradesstables
FAILS: refresh -> scrub -> major compaction -> upgradesstables

So we are able to migrate our test CFs from a 0.8.8 ring to a 1.1.2 ring when we use scrub. However, whenever we run an upgradesstables command the sstables shrink significantly and our tests show missing data:

INFO [CompactionExecutor:4] 2012-07-24 04:27:36,837 CompactionTask.java (line 109) Compacting [SSTableReader(path='/raid0/cassandra/data/Metrics/metrics_900/Metrics-metrics_900-hd-51-Data.db')]
INFO [CompactionExecutor:4] 2012-07-24 04:27:51,090 CompactionTask.java (line 221) Compacted to [/raid0/cassandra/data/Metrics/metrics_900/Metrics-metrics_900-hd-58-Data.db,]. 60,449,155 to 2,578,102 (~4% of original) bytes for 4,002 keys at 0.172562MB/s. Time: 14,248ms.

Is there a scenario where upgradesstables would remove data that a scrub command wouldn't? According to the documentation, it would appear that scrub is actually more destructive than upgradesstables in terms of removing data. On 1.1.x, upgradesstables is the documented upgrade command over a scrub.

The keyspace is defined as:
Keyspace: Metrics:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
  Options: [us-east:3]

And the column family above is defined as:
ColumnFamily: metrics_900
  Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)
  GC grace seconds: 0
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options: sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor

All rows have a TTL of 30 days, so it's possible that, along with gc_grace=0, a small number would be removed during a compaction/scrub/upgradesstables step. However, the majority should still be kept, as their TTLs have not expired yet. We are still experimenting to see under what conditions this happens, but I thought I'd send out some more info in case there is something clearly wrong we're doing here.
Thanks, Mike
-- Mike Heffner m...@librato.com Librato, Inc.
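The TTL/gc_grace reasoning in the last paragraph can be made concrete. A rough sketch (simplified model, not Cassandra's actual compaction logic): an expiring column can be dropped once its TTL has elapsed and gc_grace has also passed, so with gc_grace_seconds=0 only columns written more than 30 days before the compaction should disappear.

```python
# Simplified model of expiring-column purge eligibility at compaction time:
# a column written at write_time with a TTL becomes droppable once
# now >= write_time + ttl + gc_grace. With gc_grace=0 (as in the CF above),
# that is exactly "TTL expired", so unexpired columns should never be removed.
TTL = 30 * 24 * 3600          # 30 days, as in the CF above
GC_GRACE = 0                  # gc_grace_seconds from the CF definition

def purgeable(write_time, now, ttl=TTL, gc_grace=GC_GRACE):
    """True if an expiring column written at write_time can be dropped."""
    return now >= write_time + ttl + gc_grace

now = 1_343_100_000           # arbitrary "compaction time" (epoch seconds)
fresh = now - 10 * 24 * 3600  # written 10 days ago: TTL not yet expired
stale = now - 31 * 24 * 3600  # written 31 days ago: expired, droppable

print(purgeable(fresh, now), purgeable(stale, now))  # False True
```

Under this model, a compaction shrinking a table to ~4% of its size would require almost all columns to be past their TTL, which is what makes the reported behavior look like a bug rather than normal expiry.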
Re: How to Optimizing Cassandra Updates -( Use of memtables)
I am guessing you already asked if they could give you three 100MB files instead? So you could parallelize the operation. Or maybe your task doesn't lend itself well to that.
Dean

On Tue, Jul 24, 2012 at 10:01 AM, Pushpalanka Jayawardhana pushpalankaj...@gmail.com wrote:
Hi all, I am dealing with a scenario where I receive a .csv file at 10-minute intervals, about 300 MB on average. I need to update a Cassandra cluster according to the data received in the .csv file, after some processing. The current approach is keeping a HashMap in memory and updating it from the processed .csv files, gathering the data to be updated (mostly updates to counters). Then, periodically (say at 2 s intervals), the values in the HashMap are read one by one and written to Cassandra. I have tried generating sstables and loading the data in batches via sstableloader, but that is a lot slower than my requirement of near-real-time results. Are there any hints on what I can try? Is there any possibility of directly updating values in a memtable (instead of using a HashMap) and sending that to Cassandra, rather than loading via sstables?
-- Pushpalanka Jayawardhana
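The aggregate-then-flush pattern described above can be sketched in a few lines. The CSV layout (key,delta) and the `write` callback are hypothetical stand-ins; in the real system `write` would issue a batched counter increment to Cassandra. The win is that many CSV rows collapse into one delta per key before anything touches the cluster:

```python
import csv
import io
from collections import Counter

# Coalesce counter deltas from each CSV batch in memory, then flush one
# combined increment per key instead of one write per row.
pending = Counter()

def ingest(csv_text):
    """Accumulate (key, delta) rows from one CSV batch."""
    for key, delta in csv.reader(io.StringIO(csv_text)):
        pending[key] += int(delta)

def flush(write):
    """Drain pending deltas; write(key, delta) stands in for a counter update."""
    for key, delta in pending.items():
        write(key, delta)
    pending.clear()

ingest("page:home,1\npage:about,2\npage:home,3\n")
sent = {}
flush(lambda k, d: sent.__setitem__(k, d))
print(sent)  # {'page:home': 4, 'page:about': 2}
```

Running `flush` on a timer (the 2 s interval mentioned above) bounds both memory and write volume; three rows here become two counter updates.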
Re: CQL3 and column slices
Thanks Sylvain. The main argument for this is pagination. Let me try to explain the use cases, and compare to an RDBMS for better illustration:

1- Right now, Cassandra doesn't stream the requests, so large result sets are a royal pain in the neck to deal with. I.e., if I have a range_slice, or even a slice query that cuts across 1 million columns, I have to completely eat it all in the client receiving the response. That is, I'll need to store 1 million results in the client no matter what, and that can be quite prohibitive.

2- In an effort to alleviate that, one can be smarter in the client and play the pagination game: i.e., start slicing at some column and get the next N results, then start the slice at the last column seen and get N more, etc. That results in many more queries from the smart client, but at least it allows you to handle large result sets. (That's what the need for the CQL query in my original email was about.)

3- There's another important factor related to this problem, in my opinion: the LIMIT clause in Cassandra (in both CQL and Thrift) is a required field. What I mean by required is that Cassandra requires an explicit count to operate underneath. So it is really different from RDBMS semantics, where no LIMIT means you'll get all the results (instead of the high, yet still bounded, count of 10K or 20K max result-set rows Cassandra enforces by default). And I cannot tell you how many problems we've had with developers forgetting about these default counts in queries, then realizing that some results were truncated because of that. In my mind, LIMIT should only be used to restrict results; queries with no LIMIT should always return all results (much like an RDBMS), otherwise the query looks the same but is semantically different.

So, all in all, I think the main problem/use case I'm facing is that Cassandra cannot stream result sets. If it did, I believe the need for my pagination use case would basically disappear, since it'd be the transport/client that would throttle how many results are stored in the client buffer at any point in time. At the same time, I believe that with a streaming protocol you could simply change Cassandra internals to have infinite default limits, since there would be no reason to stop scanning (unless an explicit LIMIT clause was specified by the client). That would give you not only the SQL-equivalent syntax, but also the equivalent semantics of most current DBs. I hope that makes sense.

That being said, are there any plans for streaming results? I believe that without that (and especially with the new CQL restrictions) it is much more difficult to use Cassandra with wide rows and large result sets (which, in my mind, is one of its sweet spots). I believe that if that doesn't happen it would a) force the clients to be built in a much more complex and inefficient way to handle wide rows, or b) force users to use different, less efficient data models for their data. Both seem like bad propositions to me, as they wouldn't be taking advantage of Cassandra's power, therefore diminishing its value.
Cheers, Josep M.

On Tue, Jul 24, 2012 at 3:11 AM, Sylvain Lebresne sylv...@datastax.com wrote:
On Tue, Jul 24, 2012 at 12:09 AM, Josep Blanquer blanq...@rightscale.com wrote:
Is there some way to express that in CQL3? Something logically equivalent to SELECT * FROM bug_test WHERE a:b:c:d:e > 1:1:1:1:2?

No, there isn't. Not currently at least. But feel free of course to open a ticket/request on https://issues.apache.org/jira/browse/CASSANDRA. I note that I would be curious to know the concrete use case you have for such type of queries. It would also help as an argument to add such facilities more quickly (or at all). Typically, "we should support it in CQL3 because it was possible with Thrift" is definitely an argument, but a much weaker one without concrete examples of why it might be useful in the first place.
-- Sylvain
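The pagination game from point 2 can be sketched as follows. This is an illustration under stated assumptions: an in-memory sorted list of composite tuples stands in for a wide row, and `slice_after` stands in for a slice query that starts strictly after a cursor column; a real client would issue successive slice queries against Cassandra instead.

```python
import bisect

# A wide row modeled as sorted composite columns (a, b).
row = sorted((a, b) for a in range(3) for b in range(4))

def slice_after(cursor, limit):
    """Return up to `limit` columns strictly greater than `cursor`."""
    start = bisect.bisect_right(row, cursor)
    return row[start:start + limit]

def paginate(page_size):
    """Walk the whole row in pages, resuming after the last column seen."""
    cursor, out = (-1, -1), []
    while True:
        page = slice_after(cursor, page_size)
        if not page:
            return out
        out.extend(page)
        cursor = page[-1]        # exclusive restart point for the next page

assert paginate(5) == row        # every column seen exactly once, in order
print(len(paginate(5)))  # 12
```

Note the restart must be exclusive (strictly greater than the cursor), otherwise the last column of each page is returned twice; that is precisely the "start the slice at the last column seen" subtlety, and the reason a composite greater-than comparison is needed in the query language.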
Re: Bringing a dead node back up after fixing hardware issues
On Mon, Jul 23, 2012 at 10:24 PM, Eran Chinthaka Withana eran.chinth...@gmail.com wrote:
Thanks Brandon for the answer (and I didn't know driftx = Brandon Williams. Thanks for your awesome support in Cassandra IRC).

Thanks :)

Increasing the CL is tricky for us for now, as our RF in that datacenter is 2 and CL is set to ONE. If we make the CL LOCAL_QUORUM, then if a node goes down we will have trouble. I will try to increase the RF to 3 in that data center and set the CL to LOCAL_QUORUM if nothing else works out.

Increasing the RF and using LOCAL_QUORUM is the right thing in this case. By choosing CL.ONE, you are agreeing that read misses are acceptable. If they are not, then adjusting your RF/CL is the only path.

About decommissioning: if the node goes down, there is no way of running that command on that node, right? IIUC, decommissioning should be run on the node that needs to be decommissioned.

Well, decom and removetoken are both ways of removing a node. The former is for a live node, and the latter is for a dead node. Since your node was actually alive, you could have decommissioned it.

Coming back to the original question: without touching the CL, can we bring back a dead node (after fixing it) and somehow tell Cassandra that the node is back up, and not to send it read requests until it gets all the data?

No; as I said, you are accepting this behavior by choosing CL.ONE.
-Brandon
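The RF/CL arithmetic behind this advice is worth spelling out. A quorum is floor(RF/2) + 1, so LOCAL_QUORUM with RF=2 needs both local replicas up (no tolerance for a down node), while RF=3 tolerates one. A quick sketch:

```python
# Quorum size and failure tolerance for QUORUM/LOCAL_QUORUM operations.
def quorum(rf):
    """Replicas that must respond for a quorum: floor(RF/2) + 1."""
    return rf // 2 + 1

def tolerated_failures(rf):
    """Replicas that may be down while quorum operations still succeed."""
    return rf - quorum(rf)

for rf in (2, 3):
    print(rf, quorum(rf), tolerated_failures(rf))
# RF=2 -> quorum 2, tolerates 0 down; RF=3 -> quorum 2, tolerates 1 down
```

This is why "RF=2 with LOCAL_QUORUM" is fragile (any node failure blocks the datacenter's quorum) and why the suggested move to RF=3 keeps the same quorum size while gaining one node of slack.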
Re: TimedOutException caused by
aaron morton aaron at thelastpickle.com writes:
The cluster is running into GC problems and this is slowing it down under the stress test. When it slows down, one or more of the nodes fails to perform the write within rpc_timeout. This causes the coordinator of the write to raise the TimedOutException. Your options are:
* allocate more memory
* ease back on the stress test
* work at CL QUORUM so that one node failing does not result in the error
See also http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
Cheers
- Aaron Morton, Freelance Developer at aaronmorton, http://www.thelastpickle.com

On 28/05/2012, at 12:59 PM, Jason Tang wrote:
Hi, my system is a 4-node, 64-bit Cassandra cluster, 6 GB per node, default configuration (which means 1/3 of the heap for memtables), replication factor 3, write ALL, read ONE. When I run stress load testing, I get this TimedOutException, some operations fail, and all traffic hangs for a while. When I ran a 1 GB, 32-bit standalone Cassandra, I didn't see such frequent stop-the-world behavior. So I wonder what kind of operation will hang the Cassandra system, and how to collect information for tuning. From the system log and documentation, I guess there are three types of operations:
1) Flush memtable when it reaches max size
2) Compact SSTables (why?)
3) Java GC

system.log:
INFO [main] 2012-05-25 16:12:17,054 ColumnFamilyStore.java (line 688) Enqueuing flush of Memtable-LocationInfo at 1229893321(53/66 serialized/live bytes, 2 ops)
INFO [FlushWriter:1] 2012-05-25 16:12:17,054 Memtable.java (line 239) Writing Memtable-LocationInfo at 1229893321(53/66 serialized/live bytes, 2 ops)
INFO [FlushWriter:1] 2012-05-25 16:12:17,166 Memtable.java (line 275) Completed flushing /var/proclog/raw/cassandra/data/system/LocationInfo-hb-2-Data.db (163 bytes)
...
INFO [CompactionExecutor:441] 2012-05-28 08:02:55,345 CompactionTask.java (line 112) Compacting [SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-41-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-32-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-37-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-53-Data.db')]
...
WARN [ScheduledTasks:1] 2012-05-28 08:02:26,619 GCInspector.java (line 146) Heap is 0.7993011015621736 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-05-28 08:02:54,980 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line 123) GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-05-28 08:41:48,978 GCInspector.java (line 123) GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280 used; max is 6274678784

Timeout exception:
Caused by: org.apache.cassandra.thrift.TimedOutException: null
at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19495) ~[na:na]
at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035) ~[na:na]
at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009) ~[na:na]
at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95) ~[na:na]
... 64 common frames omitted

BRs // Tang Weiqiang

Hi, I've been running into the same type of issue, but on a single machine with CL ONE, also with a custom insertion stress utility. What would I need to do to address the timeouts? By "allocate more memory" do you mean increase the heap size in the environment conf file?
Thanks, J.
Fwd: Call for Papers for ApacheCon Europe 2012 now open!
There are Big Data and NoSQL tracks where Cassandra talks would be appropriate.

-- Forwarded message --
From: Nick Burch nick.bu...@alfresco.com
Date: Thu, Jul 19, 2012 at 1:14 PM
Subject: Call for Papers for ApacheCon Europe 2012 now open!
To: committ...@apache.org

Hi all, we're pleased to announce that the Call for Papers for ApacheCon Europe 2012 is finally open! (For those who don't already know, ApacheCon Europe will be taking place between the 5th and the 9th of November this year, in Sinsheim, Germany.)

If you'd like to submit a talk proposal, please visit the conference website at http://www.apachecon.eu/ and sign up for a new account. Once you've signed up, use your dashboard to enter your speaker bio, then submit your talk proposal(s). There's more information on the CFP page on the conference website.

We welcome talk proposals from all projects, from right across the breadth of projects at the foundation! To make things easier for talk selection and scheduling, we'd ask that you tag your proposal with the track it most closely fits within. The details of the tracks, and which projects they are expected to cover, are available at http://www.apachecon.eu/tracks/. (If your project/group of projects was intending to submit a track and missed the deadline, then please get in touch with us on apachecon-disc...@apache.org straight away, so we can work out if it's possible to squeeze you in...)

The CFP will close on Friday 3rd August, so you have a little over two weeks to send in your talk proposal. Don't put it off! We'll look forward to seeing some great ones shortly!
Thanks, Nick (on behalf of the Conferences committee)

-- Jonathan Ellis, Project Chair, Apache Cassandra; co-founder of DataStax, the source for professional Cassandra support, http://www.datastax.com