The limitation is on the driver side. Try looking at
execute_concurrent_with_args in the cassandra.concurrent module to get
parallel writes with prepared statements.

https://datastax.github.io/python-driver/api/cassandra/concurrent.html
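A minimal sketch of that path, assuming a local node and a hypothetical `ks.events` table (the keyspace, table, columns, and contact point are placeholders, not from this thread):

```python
def build_rows(n):
    # Build parameter tuples for the prepared INSERT (pure Python,
    # no driver required).
    return [(i, "event-%d" % i) for i in range(n)]

def insert_concurrently(rows, concurrency=100):
    # Driver imports kept local so build_rows stays importable even
    # without cassandra-driver installed.
    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("ks")
    # Prepare once; the driver reuses the statement for every row.
    prepared = session.prepare(
        "INSERT INTO events (id, name) VALUES (?, ?)")
    # Keeps up to `concurrency` requests in flight at a time instead
    # of waiting for each write round trip.
    results = execute_concurrent_with_args(
        session, prepared, rows, concurrency=concurrency)
    cluster.shutdown()
    return results
```

With a single synchronous writer you pay one network round trip per row; with ~1.4 ms per call that caps out near 700 writes/s regardless of how fast the cluster is, which matches the scenario 5 numbers below.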
On Wed, Dec 30, 2015 at 11:34 PM Alexandre Beaulne <
[email protected]> wrote:

> Hi everyone,
>
> First and foremost thanks to everyone involved with making C* available to
> the world, it is a great technology to have access to.
>
> I'm experimenting with C* for one of our projects and I cannot reproduce
> the write speeds C* is lauded for. I would appreciate some guidance as to
> what I'm doing wrong.
>
> *Setup*: I have a single, single-threaded Python client (using DataStax's
> python driver), writing (no reads) to a C* cluster. All C* nodes are
> launched by running the official Docker container. There's a single
> keyspace with a replication factor of 1, and the client is set to
> consistency level LOCAL_ONE. In that keyspace there is a single table with
> ~40 columns of mixed types. Two columns form the partition key and two
> more are clustering columns. The partition key is close to uniformly
> distributed in the dataset. The writer is in a tight loop, building CQL 3
> insert statements one by one and executing them against the C* cluster.
>
> *Specs*: Cassandra v3.0.1, python-driver v3.0.0; the host is CentOS 7 with
> 40 cores @ 3GHz and 66 GB of RAM.
>
> In the course of my experimentation I came up with 7 scenarios trying to
> isolate the performance bottleneck:
>
> *Scenario 1*: the writer simply builds the insert statement strings
> without doing anything with them.
>
> Results: sample size: 200002, percentiles (ms): [50] 0.00 - [95] 0.01 -
> [99] 0.01 [100] 0.05
>
> *Scenario 2*: the writer opens a TCP socket and sends the insert statement
> string to a simple reader running on the same host. The reader then appends
> that insert statement string to a file on disk, mimicking a commit log of
> some sort.
>
> Results: sample size: 200002, percentiles (ms): [50] 0.01 - [95] 0.02 -
> [99] 0.03 [100] 63.33
>
> *Scenario 3*: identical to scenario 2, but the reader is run inside a
> Docker container, to measure whether there is any overhead from running in
> the container.
>
> Results: sample size: 200002, percentiles (ms): [50] 0.01 - [95] 0.01 -
> [99] 0.01 [100] 4.45
>
> *Scenario 4*: the writer asynchronously executes the insert statements
> against a single-node C* cluster.
>
> Results: sample size: 200002, percentiles (ms): [50] 0.07 - [95] 0.15 -
> [99] 0.56 [100] 534.09
>
> *Scenario 5*: the writer synchronously executes the insert statements
> against a single-node C* cluster.
>
> Results: sample size: 200002, percentiles (ms): [50] 1.40 - [95] 1.46 -
> [99] 1.54 [100] 41.75
>
> *Scenario 6*: the writer asynchronously executes the insert statements
> against a four-node C* cluster.
>
> Results: sample size: 200002, percentiles (ms): [50] 0.09 - [95] 0.14 -
> [99] 0.16 [100] 838.83
>
> *Scenario 7*: the writer synchronously executes the insert statements
> against a four-node C* cluster.
>
> Results: sample size: 200002, percentiles (ms): [50] 1.73 - [95] 1.89 -
> [99] 2.15 [100] 50.94
>
> Looking at scenarios 3 & 5, a synchronous write to C* is about 150x slower
> than appending to a flat file. I understand a write to a DB is more
> involved than appending to a file, but I'm surprised by the magnitude of
> the difference. I thought all C* did for writes at consistency level ONE
> was append the write to its commit log and return, then distribute the
> write across the cluster in an eventually consistent manner. More than
> 1 ms per write is fewer than 1,000 writes per second, far from big data
> velocity.
>
> What am I doing wrong? Are writes supposed to be batched before being
> inserted? Instead of appending rows to the table, would it be more
> efficient to append columns to the rows? Why are writes so slow?
>
> Thanks for your time,
> Alex
>
