Re: Consistency Level One Question
Hi Graham,

On 21/02/14 07:54, graham sanderson wrote:

Note also that reading at ONE there will be no read repair, since the coordinator does not know that another replica has stale data (remember that at ONE, basically only one node is asked for the answer).

I don't think this is right. My understanding is that while only one node will be sent a direct read request, all the other replicas will (not on every query - it depends on the value of read_repair_chance) get a background read repair request. You can test this experimentally using cqlsh with tracing turned on: issue a read request many times. Most of the time you will see the coordinator send a message to one node, but from time to time (depending on read_repair_chance) you will see it send messages to many nodes.

Best wishes, Duncan.

In practice for our use cases, we always write at LOCAL_QUORUM (failing the whole update if that doesn't work - stale data is OK if one node is down), and we read at LOCAL_QUORUM, but (because stale data is better than no data) we fall back per read request to LOCAL_ONE if we detect that there were insufficient nodes - this lets us cope with two down nodes in a three-replica environment (or more, if the down nodes are not consecutive in the ring).

On Feb 20, 2014, at 11:21 PM, Drew Kutcharian d...@venarc.com wrote:

Hi Guys, I wanted to get some clarification on what happens when you write and read at consistency level 1. Say I have a keyspace with a replication factor of 3 and a table which will contain write-once/read-only wide rows. If I write at consistency level 1 and the write happens on node A, and I read back at consistency level 1 from a node other than A, say B, will C* return "not found" or will it trigger a read repair before responding? In addition, what's the best consistency level for reading/writing write-once/read-only wide rows? Thanks, Drew
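A minimal cqlsh session for the tracing experiment Duncan describes (keyspace, table and key are hypothetical; the table needs RF > 1):

    TRACING ON
    CONSISTENCY ONE
    SELECT * FROM ks.tbl WHERE id = 1;
    -- repeat the SELECT: most traces show a single replica contacted, but
    -- occasionally (roughly read_repair_chance of the time) the trace shows
    -- messages going to the other replicas as well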
Re: Intermittent long application pauses on nodes
What happens if a ParNew is triggered while CMS is running? Will it wait for the CMS to finish? If so, that would be the explanation of our long ParNew above. Regards, Joel

2014-02-20 16:29 GMT+01:00 Joel Samuelsson samuelsson.j...@gmail.com:

Hi Frank, We got a (quite) long GC pause today on 2.0.5:

INFO [ScheduledTasks:1] 2014-02-20 13:51:14,528 GCInspector.java (line 116) GC for ParNew: 1627 ms for 1 collections, 425562984 used; max is 4253024256
INFO [ScheduledTasks:1] 2014-02-20 13:51:14,542 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 3703 ms for 2 collections, 434394920 used; max is 4253024256

Unfortunately it's a production cluster, so I have no additional GC logging enabled. This may be an indication that upgrading is not the (complete) solution. Regards, Joel

2014-02-17 13:41 GMT+01:00 Benedict Elliott Smith belliottsm...@datastax.com:

Hi Ondrej, It's possible you were hit by the problems in this thread before, but it looks like you may have other issues as well. Of course it may be that on G1 you have one issue and on CMS another, but 27s is extreme even for G1, so it seems unlikely. If you're hitting these pause times in CMS and you get some more output from the safepoint tracing, please do contribute, as I would love to get to the bottom of that. However, is it possible you're experiencing paging activity? Have you made certain the VM memory is locked (and preferably that paging is entirely disabled)? The bloom filters and other memory won't be locked, although that shouldn't cause pauses during GC. Note that mmapped file accesses and other native work shouldn't in any way inhibit GC activity or other safepoint pause times, unless there's a bug in the VM. These threads simply enter a safepoint as they return to the VM execution context, and are considered safe for the duration they are outside.

On 17 February 2014 12:30, Ondřej Černoš cern...@gmail.com wrote:

Hi, we tried to switch to G1 because we observed this behaviour on CMS too (a 27-second pause in G1 is quite a strong warning not to use it). Pauses with CMS were not easily traceable - the JVM stopped even without a stop-the-world pause scheduled (defragmentation, remarking). We thought the go-to-safepoint waiting time might have been involved (we saw waiting for safepoint resolution) - especially because access to mmapped files is not preemptive, afaik - but that doesn't explain tens of seconds of waiting; even slow IO should read our sstables into memory in much less time. We switched to G1 out of desperation - and to try different code paths - not that we thought it was a great idea. So I think we were hit by the problem discussed in this thread; it's just that the G1 report wasn't very clear, sorry. regards, ondrej

On Mon, Feb 17, 2014 at 11:45 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote:

Ondrej, It seems like your issue is much less difficult to diagnose: your collection times are long. At least, the pause you printed the time for is all attributable to the G1 pause. Note that G1 has not generally performed well with Cassandra in our testing. There are a number of changes going in soon that may change that, but for the time being it is advisable to stick with CMS. With tuning you can no doubt bring your pauses down considerably.

On 17 February 2014 10:17, Ondřej Černoš cern...@gmail.com wrote:

Hi all, we are seeing the same kind of long pauses in Cassandra. We tried to switch CMS to G1 without positive result. The stress test is read heavy, 2 datacenters, 6 nodes, 400 reqs/sec on one datacenter.
We see spikes in latency at the 99.99th percentile and higher, caused by threads being stopped in the JVM. The GC in G1 looks like this:

{Heap before GC invocations=4073 (full 1):
 garbage-first heap   total 8388608K, used 3602914K [0x0005f5c0, 0x0007f5c0, 0x0007f5c0)
  region size 4096K, 142 young (581632K), 11 survivors (45056K)
 compacting perm gen  total 28672K, used 27428K [0x0007f5c0, 0x0007f780, 0x0008)
   the space 28672K, 95% used [0x0007f5c0, 0x0007f76c9108, 0x0007f76c9200, 0x0007f780)
No shared spaces configured.
2014-02-17T04:44:16.385+0100: 222346.218: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 37748736 bytes, new threshold 15 (max 15)
- age   1:   17213632 bytes,   17213632 total
- age   2:   19391208 bytes,   36604840 total
, 0.1664300 secs]
   [Parallel Time: 163.9 ms, GC Workers: 2]
      [GC Worker Start (ms): Min: 222346218.3, Avg: 222346218.3, Max: 222346218.3, Diff: 0.0]
      [Ext Root Scanning (ms): Min: 6.0, Avg: 6.9, Max: 7.7, Diff: 1.7, Sum: 13.7]
      [Update RS (ms): Min: 20.4, Avg: 21.3, Max: 22.1, Diff: 1.7, Sum: 42.6]
         [Processed Buffers: Min: 49, Avg: 60.0, Max: 71, Diff: 22, Sum: 120]
      [Scan RS (ms): Min: 23.2, Avg: 23.2, Max: 23.3, Diff: 0.1, Sum: 46.5]
      [Object Copy (ms): Min: 112.3,
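For anyone who wants to capture the safepoint data Benedict asks about, a minimal sketch: these are standard HotSpot flags, and placing them in conf/cassandra-env.sh is an assumption about your setup.

    # log total time application threads were stopped, plus per-safepoint
    # statistics, to the JVM's stdout log
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
    JVM_OPTS="$JVM_OPTS -XX:PrintSafepointStatisticsCount=1"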
TSocket read 0 bytes cqlsh error
Hi, I'm getting a "TSocket read 0 bytes" error in cqlsh when doing a SELECT * FROM tbl. Anyone else experienced this? It's a single node cluster running locally. I've tried doing a nodetool cleanup but that didn't solve the issue.

Version information:

INFO [main] 2014-02-21 10:20:25,224 StorageService.java (line 487) Cassandra version: 2.0.5
INFO [main] 2014-02-21 10:20:25,224 StorageService.java (line 488) Thrift API version: 19.39.0
INFO [main] 2014-02-21 10:20:25,227 StorageService.java (line 489) CQL supported versions: 2.0.0,3.1.4 (default: 3.1.4)

I get this error in the cassandra logs:

ERROR [Thrift:1] 2014-02-21 10:21:03,963 CustomTThreadPoolServer.java (line 212) Error occurred during processing of message.
java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:267)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
    at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:130)
    at org.apache.cassandra.cql3.statements.SelectStatement.processColumnFamily(SelectStatement.java:874)
    at org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:854)
    at org.apache.cassandra.cql3.statements.SelectStatement.processResults(SelectStatement.java:222)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:202)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:172)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:58)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:188)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:222)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:212)
    at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1958)
    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4486)
    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4470)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Re: Performance problem with large wide row inserts using CQL
On Thu, Feb 20, 2014 at 10:49 PM, Rüdiger Klaehn rkla...@gmail.com wrote:

Hi Sylvain, I applied the patch to the cassandra-2.0 branch (this required some manual work, since I could not figure out which commit it was supposed to apply to, and it did not apply to the head of cassandra-2.0).

Yeah, some commit yesterday made the patch not apply cleanly anymore. In any case, it's now committed to the cassandra-2.0 branch and will be part of 2.0.6.

The benchmark now runs in pretty much identical time to the thrift-based benchmark: ~30s for 1000 inserts of 1 key/value pair each. Great work!

Glad that it helped.

I still have some questions regarding the mapping. Please bear with me if these are stupid questions; I am quite new to Cassandra. The basic cassandra data model for a keyspace is something like this, right?

SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>
- the outer key is the row key: it determines which server(s) the rest is stored on
- the inner key is the column key
- the Long is the timestamp (latest one wins)
- the inner byte[] is the value (can be size 0)

It's a reasonable way to think of how things are stored internally, yes. Though as DuyHai mentioned, the first map is really sorting by token, and in general that means you mostly use the sorting of the second map concretely.

So if I have a table like the one in my benchmark (using blobs):

CREATE TABLE IF NOT EXISTS test.wide (
    time blob,
    name blob,
    value blob,
    PRIMARY KEY (time, name)
) WITH COMPACT STORAGE

From reading http://www.datastax.com/dev/blog/thrift-to-cql3 it seems that:
- time maps to the row key and name maps to the column key, without any overhead
- value maps directly to the value in the model above, without any prefix

Is that correct, or is there some overhead in CQL over the raw model as described above? If so, where exactly?

That's correct. For completeness' sake: if you were to remove the COMPACT STORAGE, there would be some overhead in how it maps to the underlying column key, but that overhead would buy you much more flexibility in how you could evolve this table schema (you could add more CQL columns later if need be, have collections, or have static columns following CASSANDRA-6561, which comes in 2.0.6 - none of which you can have with COMPACT STORAGE).

Note that it's perfectly fine to use COMPACT STORAGE if you know you don't and won't need the additional flexibility, but I generally advise people to actually check first that using COMPACT STORAGE makes a concrete and meaningful difference for their use case (be careful with premature optimization, really). The difference in performance/storage space used is not always all that noticeable in practice (note that I didn't say it's never noticeable!) and is narrowing as Cassandra evolves (it's not impossible at all that we will get to "never noticeable" someday, while COMPACT STORAGE tables will never get the flexibility of normal tables, because there are backwards-compatibility issues). It's also my experience that, more often than not (again, not always), flexibility turns out to be more important in the long run than squeezing out every bit of performance you can (if it comes at the price of that flexibility, that is). Do what you want with that advice :) -- Sylvain

kind regards and many thanks for your help, Rüdiger

On Thu, Feb 20, 2014 at 8:36 AM, Sylvain Lebresne sylv...@datastax.com wrote:

On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn rkla...@gmail.com wrote:

I have cloned the cassandra repo, applied the patch, and built it. But when I want to run the benchmark I get an exception. See below.
I tried with a non-managed dependency to cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I compiled from source because I read that that might help. But that did not make a difference. So currently I don't know how to give the patch a try. Any ideas? cheers, Rüdiger

Exception in thread "main" java.lang.IllegalArgumentException: replicate_on_write is not a column defined in this metadata
    at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
    at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
    at com.datastax.driver.core.Row.getBool(Row.java:117)
    at com.datastax.driver.core.TableMetadata$Options.init(TableMetadata.java:474)
    at com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
    at com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
    at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
    at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
    at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
    at
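To illustrate the trade-off Sylvain describes above: the same table declared without COMPACT STORAGE can grow new CQL columns later. A sketch - the second table name and the added column are hypothetical:

    CREATE TABLE IF NOT EXISTS test.wide2 (
        time blob,
        name blob,
        value blob,
        PRIMARY KEY (time, name)
    );

    -- legal on the regular table, but rejected on a COMPACT STORAGE one:
    ALTER TABLE test.wide2 ADD value2 blob;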
Re: TSocket read 0 bytes cqlsh error
Looks like the problem is caused by https://issues.apache.org/jira/browse/CASSANDRA-5202

On Fri, Feb 21, 2014 at 10:26 AM, Kasper Middelboe Petersen kas...@sybogames.com wrote:

Hi, I'm getting a "TSocket read 0 bytes" error in cqlsh when doing a SELECT * FROM tbl. [snip]
Re: Consistency Level One Question
My bad; should have checked the code:

/**
 * This function executes local and remote reads, and blocks for the results:
 *
 * 1. Get the replica locations, sorted by response time according to the snitch
 * 2. Send a data request to the closest replica, and digest requests to either
 *    a) all the replicas, if read repair is enabled
 *    b) the closest R-1 replicas, where R is the number required to satisfy the ConsistencyLevel
 * 3. Wait for a response from R replicas
 * 4. If the digests (if any) match the data return the data
 * 5. else carry out read repair by getting data from all the nodes.
 */

On Feb 21, 2014, at 3:10 AM, Duncan Sands duncan.sa...@gmail.com wrote: [snip]
Re: Performance problem with large wide row inserts using CQL
The main issue is that Cassandra has two of everything: two access APIs, two metadata systems, and two groups of users. Those groups of users using the original systems - thrift, cfmetadata - and following the advice of three years ago have been labeled obsolete (did you ever see that Twilight Zone episode?). If you suggest a thrift-only feature, get ready to fight. People seem oblivious to the fact that you may have a 38-node cluster with 12 TB of data under compact storage, and that you can't just snap your fingers and adopt whatever new system to pack data that someone comes up with.

Earlier in the thread I detailed a potential way to store collection-like things in compact storage. You would just assume that, with all the collective brain power in the project, somehow, some way, collections could make their way into compact storage. Or that the new language would offer similar features regardless of the storage chosen (say, like InnoDB and MariaDB). The shelf life of Codd's normal form has been what? 30 or 40 years and still going strong? I'm always rather pissed that three years after I started using Cassandra everything has changed, that I'm not the future, and that no one is really interested in supporting anything I used the datastore for.

On Friday, February 21, 2014, Sylvain Lebresne sylv...@datastax.com wrote: [snip]
Re: Consistency Level One Question
Thanks, this clears things up.

On Feb 21, 2014, at 6:47 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

When you write at ONE, as soon as one node acknowledges the write, the ack is returned to the client. This means if you quickly read from some other node:

1) you may get the result, because by the time the read is processed the data may be on that node;
2) the node you read from may proxy the request to the node with the data, or not;
3) you may get a column-not-found, because the read might hit a node where the data does not exist yet.

Generally, even at level ONE the replication is fast. I have done an experiment on what you are asking: write at ONE, then read from another node as soon as the client gets an ack. Most of the time the data is replicated by the time the second request is received. However, "most of the time" is not a guarantee. If the nodes are geographically separate, who is to say whether the first request and the second route around the internet different ways and the second action arrives at a node before the first? That is eventual consistency for you.

On Friday, February 21, 2014, graham sanderson gra...@vast.com wrote: [snip]

--
Sorry this was sent from mobile. Will do less grammar and spell check than usual.
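Not from the thread itself, but the fallback graham describes can be approximated with the DataStax Java driver's DowngradingConsistencyRetryPolicy, which retries at a reduced consistency level when too few replicas respond. A sketch - contact point, keyspace and table are hypothetical, and note it applies to every statement on the cluster rather than per request:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;

    public class FallbackRead {
        public static void main(String[] args) {
            // Retry at a lower CL when the coordinator reports too few
            // live replicas, instead of failing the read outright
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")  // hypothetical contact point
                    .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                    .build();
            Session session = cluster.connect("ks");  // hypothetical keyspace

            SimpleStatement read = new SimpleStatement("SELECT * FROM tbl WHERE id = 1");
            read.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            System.out.println(session.execute(read).one());
            cluster.close();
        }
    }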
Re: How do you remote backup your cassandra nodes ?
I want to back up my data to Amazon S3. Can anyone please tell me which directories I should copy to the remote location for backup, so as to restore the entire Cassandra data set in the event of any failure?

On Fri, Feb 21, 2014 at 1:43 AM, user 01 user...@gmail.com wrote:

What is your strategy/tool set to back up your Cassandra nodes, apart from cluster replication/snapshots within the cluster?
Re: How do you remote backup your cassandra nodes ?
You might want to use the Priam tool for backups: https://github.com/Netflix/Priam

If you don't want to use Priam, you should read this DataStax entry on backup and restore: http://www.datastax.com/docs/1.0/operations/backup_restore

On 02/21/2014 11:19 AM, user 01 wrote: [snip]

--
Colin Blower
Software Engineer
Barracuda Networks Inc.
Re: How do you remote backup your cassandra nodes ?
On Thu, Feb 20, 2014 at 12:13 PM, user 01 user...@gmail.com wrote:

What is your strategy/tool set to back up your Cassandra nodes, apart from cluster replication/snapshots within the cluster?

https://github.com/synack/tablesnap

=Rob
Re: Performance problem with large wide row inserts using CQL
Sylvain, I am trying ccm for the install, and it builds from a source directory. I have tried 2.0.4/3/2/1 and 1.2.15; all of them report the same failure after 127 records are inserted. I am using the 1.56.34 and 1.56.38 clients; both report the same issue. Is something wrong with the client or the server? None of the server logs show any error. Thanks, Yogi

On Wed, Feb 19, 2014 at 11:36 PM, Sylvain Lebresne sylv...@datastax.com wrote:

On Wed, Feb 19, 2014 at 9:38 PM, Rüdiger Klaehn rkla...@gmail.com wrote:

I have cloned the cassandra repo, applied the patch, and built it. But when I want to run the benchmark I get an exception. See below. I tried with a non-managed dependency to cassandra-driver-core-2.0.0-rc3-SNAPSHOT-jar-with-dependencies.jar, which I compiled from source because I read that that might help. But that did not make a difference. So currently I don't know how to give the patch a try. Any ideas? cheers, Rüdiger

Exception in thread "main" java.lang.IllegalArgumentException: replicate_on_write is not a column defined in this metadata
    at com.datastax.driver.core.ColumnDefinitions.getAllIdx(ColumnDefinitions.java:273)
    at com.datastax.driver.core.ColumnDefinitions.getFirstIdx(ColumnDefinitions.java:279)
    at com.datastax.driver.core.Row.getBool(Row.java:117)
    at com.datastax.driver.core.TableMetadata$Options.init(TableMetadata.java:474)
    at com.datastax.driver.core.TableMetadata.build(TableMetadata.java:107)
    at com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:128)
    at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:89)
    at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:259)
    at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:214)
    at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:161)
    at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
    at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:890)
    at com.datastax.driver.core.Cluster$Manager.newSession(Cluster.java:910)
    at com.datastax.driver.core.Cluster$Manager.access$200(Cluster.java:806)
    at com.datastax.driver.core.Cluster.connect(Cluster.java:158)
    at cassandra.CassandraTestMinimized$delayedInit$body.apply(CassandraTestMinimized.scala:31)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.App$$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at cassandra.CassandraTestMinimized$.main(CassandraTestMinimized.scala:5)
    at cassandra.CassandraTestMinimized.main(CassandraTestMinimized.scala)

I believe you've tried the cassandra trunk branch?
Re: Performance problem with large wide row inserts using CQL
I am using ccm to install the servers, and it is bringing in the source code. Is there an option for ccm that I can set to download only the binary, just to make sure it is not bringing in a working copy of the code?

I am using the following statements to create the keyspace and table definition:

create keyspace test1 with replication = { 'class': 'SimpleStrategy', 'replication_factor': 1 };

CREATE TABLE IF NOT EXISTS wide (
    time varchar,
    name varchar,
    value varchar,
    PRIMARY KEY (time, name)
) WITH COMPACT STORAGE;

On Fri, Feb 21, 2014 at 11:47 AM, Yogi Nerella ynerella...@gmail.com wrote: [snip]
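On the binary question, a hedged sketch: recent versions of ccm accept a binary: prefix on the version to download a release tarball instead of building from source - worth checking whether your ccm supports it.

    # assumes a ccm new enough to support the binary: prefix
    ccm create test205 -v binary:2.0.5 -n 1 -s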
abusing cassandra's multi DC abilities
Upfront TL;DR: We want to do stuff (reindex documents, bust cache) when changed data from DC1 shows up in DC2.

Full Story: We're planning on adding data centers throughout the US. Our platform is used for business communications. Each DC currently utilizes Elasticsearch and Redis. A message can be sent from one user to another, and the intent is that it would be seen in near real time. This means that two people may be using different data centers, and the messages need to propagate from one to the other.

On the plus side, we know we get this with Cassandra (fist pump), but the other pieces, not so much. Even if they did work, there are all sorts of race conditions that could pop up from having different pieces of our architecture communicating over different channels. From this, we've arrived at the idea that, since Cassandra is the authoritative data source, we might be able to trigger events in DC2 based on activity coming through either the commit log or some other means. One idea was to use a CF with a low gc time as a means of transporting messages between DCs, and to watch the commit logs for deletes to that CF in order to know when we need to do things like reindex a document (or a new document), bust cache, etc. Facebook did something similar with their modifications to MySQL to include cache keys in the replication log.

Assuming this is sane, I'd want to avoid having the same event register on 3 servers, thus registering 3 items in the queue when only one should be there. So, for any piece of data replicated from the other DC, I'd need a way to determine whether it was supposed to actually trigger the event or not. (Maybe it looks at the token and determines if the current server falls in the token range?) Or is there a better way?

So, my questions to all ye Cassandra users:
1. Is this even sane?
2. Is anyone doing it?

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade
Re: How do you remote backup your cassandra nodes ?
Thanks for the links.

@Colin: The DataStax doc about backup/restore (the one you linked to) is more about on-site backups, not remote backups. It does not describe which directories I should copy for backup to a remote location. After taking the snapshots, should I copy all the snapshot directories within the table directories to the remote location, or do I need to copy the entire data directory & commitlog directory?

Another way, rather than snapshots, could be to flush all the keyspaces/CFs and then copy the data & commitlog directories to the remote location, couldn't it?

What directories/files need to be copied for a remote-location backup?

On Sat, Feb 22, 2014 at 1:12 AM, Robert Coli rc...@eventbrite.com wrote: [snip]
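For the directory question, a minimal sketch of a snapshot-then-copy flow. Snapshots flush memtables first, so the commitlog does not need to be copied for this style of backup. The data path, keyspace/table names and s3cmd are assumptions - substitute your own data_file_directories and preferred S3 client:

    nodetool flush mykeyspace
    nodetool snapshot -t backup1 mykeyspace
    # the snapshot is a set of hard-linked SSTables under each table directory:
    #   /var/lib/cassandra/data/mykeyspace/<table>/snapshots/backup1/
    # ship those snapshot directories (plus your schema) off-site, e.g.:
    s3cmd sync /var/lib/cassandra/data/mykeyspace/mytable/snapshots/backup1/ \
        s3://mybucket/node1/mykeyspace/mytable/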
Update multiple rows in a CQL lightweight transaction
Folks, does anyone know how I can modify multiple rows at once in a lightweight transaction in CQL3? I saw the following ticket: https://issues.apache.org/jira/browse/CASSANDRA-5633, but it was not obvious to me from the comments how (or whether) this got resolved. I also couldn't find anything in the DataStax documentation about how to perform these operations. I'm particularly interested in how to perform a compare-and-set operation that modifies multiple rows (with the same partition key) using the DataStax Java driver. Thanks! Best regards, Clint
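A sketch of what a single-partition conditional batch looks like, assuming a Cassandra version that accepts IF conditions inside a BATCH; the table and columns are hypothetical, and both rows share the partition key org. If the condition fails, none of the batch is applied:

    BEGIN BATCH
        UPDATE users SET email = 'a@example.com'
            WHERE org = 'acme' AND id = 1
            IF email = 'olda@example.com';
        UPDATE users SET email = 'b@example.com'
            WHERE org = 'acme' AND id = 2;
    APPLY BATCH;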
cell-level security for cassandra ?
Has there been any thought about adding cell-level security to Cassandra? Something similar to http://accumulo.apache.org/1.5/accumulo_user_manual.html#_security ?

--
Frank Hsueh | frank.hs...@gmail.com
Fwd: Delivery Status Notification (Failure)
I'm trying to get CQL going for my CentOS 5 Cassandra PHP platform. I've installed Thrift, but when I try to make cassandra-pdo (or YACassandraPDO, for that matter), none of the tests pass. And when I do install it with PHP, phpinfo still doesn't show it loading, and it doesn't work. Any ideas would be appreciated.

There are pretty good instructions here - https://code.google.com/a/apache-extras.org/p/cassandra-pdo/ - for other platforms, but I can't find anything devoted to CentOS. Spencer
List support in Net::Async::CassandraCQL ?
This Perl library has been extremely useful for scripting up data migrations. I wonder if anyone knows the easiest way to use lists with this driver? Throwing a Perl array in as a parameter doesn't work as is:

my $q = $cass->prepare("update contact set name=?, address=? where uuid=?")->get;
push @f, $q->execute([$name, @address, $uuid]);
Future->needs_all( @f )->get;

Returns the following:

Cannot encode address: not an ARRAY at /usr/local/share/perl/5.14.2/Net/Async/CassandraCQL/Query.pm line 182

In the meantime I could resort to inserting one list item at a time, but surely there is a nicer way (:

Thanks as always, Jacob
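Not from the thread, but the "not an ARRAY" error suggests the encoder expects a reference for the list-valued column, so passing \@address as an array reference may be all that's needed - an untested sketch:

    # hedged guess: hand the CQL list column an ARRAY reference rather than
    # letting @address flatten into the argument list
    push @f, $q->execute([ $name, \@address, $uuid ]);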