To add to what Jonathan and Jack have said... To get high levels of performance with the Python driver you should:

- prepare your statements once (recent drivers default to token-aware routing, and will apply it correctly if the statement is prepared)
- execute asynchronously (up to ~150 in-flight futures, though my [old] benchmarks showed smaller numbers worked fine)
- use multiprocessing (performance leveled off in my [old] benchmark when each process consumed ~50% of a CPU)
- watch for network bottlenecks

A rough sketch of the first two points is below.
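Something like this (untested; the keyspace, table, and column names and the `rows` feed are placeholders for your own schema and data):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')    # placeholder keyspace

# Prepare once; with a prepared statement the driver can route each
# request token-aware, straight to a replica for that partition.
insert = session.prepare(
    "INSERT INTO my_table (pk, ck, value) VALUES (?, ?, ?)")

rows = ((i, 0, 'x') for i in range(200002))  # stand-in for the real feed

MAX_IN_FLIGHT = 150    # ~150 futures; smaller windows worked fine too
futures = []

for pk, ck, value in rows:
    futures.append(session.execute_async(insert, (pk, ck, value)))
    if len(futures) >= MAX_IN_FLIGHT:
        for f in futures:
            f.result()   # blocks until done; raises if the write failed
        futures = []

for f in futures:        # drain the remainder
    f.result()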
ml

On Thu, Dec 31, 2015 at 12:30 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> Make sure the driver is configured for token-aware routing, otherwise the
> coordinator node may have to redirect your write, adding a network hop.
>
> To be absolutely clear, Cassandra uses the distributed, parallel model for
> Big Data - lots of multi-threaded clients with lots of nodes. Clusters with
> fewer than six or eight nodes, driven by a single single-threaded client,
> are not a representative usage of Cassandra. Replication is presumed as
> well. Anything less than RF=3 is simply not a representative or recommended
> usage of Cassandra. Similarly, writes at less than QUORUM are neither
> representative nor recommended.
>
> CL=ONE has to update the memtable as well, not just the commit log.
> Flushing to sstables occurs once the memtables reach some threshold size.
> See:
>
> http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
>
> -- Jack Krupansky
>
> On Thu, Dec 31, 2015 at 11:13 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> The limitation is on the driver side. Try looking at
>> execute_concurrent_with_args in the cassandra.concurrent module to get
>> parallel writes with prepared statements (a rough sketch follows).
>>
>> https://datastax.github.io/python-driver/api/cassandra/concurrent.html
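>> For example (the keyspace, table, and stand-in data here are
>> illustrative, not from your schema):
>>
>> from cassandra.cluster import Cluster
>> from cassandra.concurrent import execute_concurrent_with_args
>>
>> session = Cluster(['127.0.0.1']).connect('my_keyspace')
>> insert = session.prepare(
>>     "INSERT INTO my_table (pk, ck, value) VALUES (?, ?, ?)")
>>
>> # One parameter tuple per row; the driver keeps `concurrency`
>> # requests in flight at a time.
>> params = [(i, 0, 'x') for i in range(200002)]  # stand-in data
>> results = execute_concurrent_with_args(
>>     session, insert, params, concurrency=100)
>>
>> for success, result in results:
>>     if not success:
>>         print(result)   # on failure, `result` is the exception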
>> On Wed, Dec 30, 2015 at 11:34 PM Alexandre Beaulne <alexandre.beau...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> First and foremost, thanks to everyone involved with making C* available
>>> to the world, it is a great technology to have access to.
>>>
>>> I'm experimenting with C* for one of our projects and I cannot reproduce
>>> the write speeds C* is lauded for. I would appreciate some guidance as to
>>> what I'm doing wrong.
>>>
>>> *Setup*: I have one single-threaded Python client (using DataStax's
>>> python driver), writing (no reads) to a C* cluster. All C* nodes are
>>> launched by running the official Docker container. There's a single
>>> keyspace with a replication factor of 1, and the client is set to
>>> consistency level LOCAL_ONE. In that keyspace there is a single table
>>> with ~40 columns of mixed types. Two columns form the partition key and
>>> two more are clustering columns. The partition key is close to uniformly
>>> distributed in the dataset. The writer is in a tight loop, building CQL 3
>>> INSERT statements one by one and executing them against the C* cluster.
>>>
>>> *Specs*: Cassandra v3.0.1, python-driver v3.0.0, host is CentOS 7 with
>>> 40 cores @ 3 GHz and 66 GB of RAM.
>>>
>>> In the course of my experimentation I came up with 7 scenarios trying to
>>> isolate the performance bottleneck:
>>>
>>> *Scenario 1*: the writer simply builds the INSERT statement strings
>>> without doing anything with them.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 0.00 - [95] 0.01 -
>>> [99] 0.01 - [100] 0.05
>>>
>>> *Scenario 2*: the writer opens a TCP socket and sends each INSERT
>>> statement string to a simple reader running on the same host. The reader
>>> then appends the string to a file on disk, mimicking a commit log of
>>> some sort.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 0.01 - [95] 0.02 -
>>> [99] 0.03 - [100] 63.33
>>>
>>> *Scenario 3*: identical to scenario 2, but the reader is run inside a
>>> Docker container, to measure whether the container adds any overhead.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 0.01 - [95] 0.01 -
>>> [99] 0.01 - [100] 4.45
>>>
>>> *Scenario 4*: the writer asynchronously executes the INSERT statements
>>> against a single-node C* cluster.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 0.07 - [95] 0.15 -
>>> [99] 0.56 - [100] 534.09
>>>
>>> *Scenario 5*: the writer synchronously executes the INSERT statements
>>> against a single-node C* cluster.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 1.40 - [95] 1.46 -
>>> [99] 1.54 - [100] 41.75
>>>
>>> *Scenario 6*: the writer asynchronously executes the INSERT statements
>>> against a four-node C* cluster.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 0.09 - [95] 0.14 -
>>> [99] 0.16 - [100] 838.83
>>>
>>> *Scenario 7*: the writer synchronously executes the INSERT statements
>>> against a four-node C* cluster.
>>>
>>> Results: sample size: 200002, percentiles (ms): [50] 1.73 - [95] 1.89 -
>>> [99] 2.15 - [100] 50.94
>>>
>>> Looking at scenarios 3 & 5, a synchronous write to C* is about 150x
>>> slower than appending to a flat file. I understand a write to a DB is
>>> more involved than appending to a file, but I'm surprised by the
>>> magnitude of the difference. I thought all C* did for writes at
>>> consistency level ONE was append the write to its commit log and return,
>>> then distribute the write across the cluster in an eventually consistent
>>> manner. More than 1 ms per write means fewer than 1,000 writes per
>>> second, far from big data velocity.
>>>
>>> What am I doing wrong? Are writes supposed to be batched before being
>>> inserted? Instead of appending rows to the table, would it be more
>>> efficient to append columns to the rows? Why are writes so slow?
>>>
>>> Thanks for your time,
>>> Alex
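>>>
>>> P.S. For reference, the core of the synchronous writer (scenario 5) is
>>> essentially the loop below; `rows`, `build_insert_cql`, and the latency
>>> bookkeeping stand in for our real code:
>>>
>>> import time
>>>
>>> from cassandra.cluster import Cluster
>>>
>>> session = Cluster(['127.0.0.1']).connect('my_keyspace')
>>>
>>> latencies_ms = []
>>> for row in rows:                    # ~200k rows, ~40 columns each
>>>     cql = build_insert_cql(row)     # builds the INSERT string (not shown)
>>>     start = time.time()
>>>     session.execute(cql)            # synchronous: one full round trip
>>>     latencies_ms.append((time.time() - start) * 1000.0)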