Hi, I'm testing Cassandra and have some questions:
I'm currently using an image blob column to store images converted to base64.
Those images are under 1MB. For now I have only one source of images and it
works without problems. The problems come with the stress-test tool. First, in my
test I have defined,
columnspec:
  - name: image
    size: fixed(681356)
The size 681356 is the number of base64 characters in the largest picture
from my source.
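For reference, here is a small sketch of what that number means in raw bytes. It assumes standard base64 (4 output characters per 3 input bytes, as in RFC 4648); the function names are only illustrative and not part of any Cassandra tool.

```python
import base64

# Base64 length of my largest image (the value used in the columnspec).
B64_CHARS = 681356


def max_decoded_size(b64_len: int) -> int:
    """Upper bound on the raw size: every 4 base64 chars carry 3 bytes."""
    return (b64_len // 4) * 3


def encoded_size(raw_len: int) -> int:
    """Length of the padded base64 encoding of raw_len bytes."""
    return 4 * ((raw_len + 2) // 3)


raw_bytes = max_decoded_size(B64_CHARS)
print(raw_bytes)  # 511017

# Sanity check against the standard library encoder.
assert len(base64.b64encode(bytes(raw_bytes))) == B64_CHARS
```

So the largest image is about 511 kB before encoding, and storing it as base64 text costs roughly one third more bytes per row than storing the raw bytes would.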
When I start the stress test, I get the following error most of the
time:
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
timeout during write query at consistency LOCAL_ONE (1 replica were required
but only 0 acknowledged the write)
(the timeout is set to its default value of two seconds)
The cluster is made of three nodes, all of them VMs in the cloud. The VM
used to run the stress test is also the seed node. A write test made with dd
showed that we have a low write speed (I talked with the person in charge of
spinning up the VMs, and he confirmed that this speed matches the plan we have),
$ dd if=/dev/zero of=1024M bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 31.3265 s, 34.3 MB/s
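As a rough back-of-the-envelope check, this sketch turns the dd result into a number of rows per second. It assumes one byte per base64 character and that each row is written to disk exactly once, which is optimistic: it ignores the commit log, compaction, replication and everything else on the disk, so it is only an upper bound for a single node.

```python
# Figures taken from the dd output.
DD_BYTES = 1_073_741_824
DD_SECONDS = 31.3265

# Assumption: one byte on disk per base64 character of the image column.
ROW_BYTES = 681_356

throughput = DD_BYTES / DD_SECONDS          # bytes per second
rows_per_second = throughput / ROW_BYTES    # optimistic upper bound

print(f"{throughput / 1e6:.1f} MB/s")       # 34.3 MB/s
print(f"{rows_per_second:.1f} rows/s")      # 50.3 rows/s
```

Under these assumptions a node could not sustain much more than about 50 of the largest images per second, before counting any other disk activity.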
First, I want to be sure I understand the error correctly. As I read it, the
message,
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra
timeout during write query at consistency LOCAL_ONE (1 replica were required
but only 0 acknowledged the write)
means that we are bottlenecked on the write operations (memtable to
sstable), right?
Second, I wonder if having only 3 nodes can be a problem. In my
understanding of Apache Cassandra, if my partition key is defined to split
data evenly through the cluster, more nodes means fewer write operations
(memtable and sstable) performed per node in a given time. Moreover, in the
case of the stress test, I wonder whether having only 3 nodes overloads each
node the way a hot spot would.
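To make my reasoning concrete, here is a minimal sketch of the arithmetic I have in mind. It assumes the partition key spreads data perfectly evenly; the replication factor of 3 and the 100 writes per second are made-up illustrative values, not my actual settings, and writes_per_node is just a hypothetical helper.

```python
def writes_per_node(client_writes_per_s: float,
                    replication_factor: int,
                    nodes: int) -> float:
    """Average replica writes each node handles per second.

    Assumes an even token distribution: every client write is stored on
    `replication_factor` nodes, and that work is shared by `nodes` nodes.
    """
    if replication_factor > nodes:
        raise ValueError("replication factor cannot exceed the node count")
    return client_writes_per_s * replication_factor / nodes


# Hypothetical load: 100 client writes/s with a replication factor of 3.
for n in (3, 6, 12):
    print(n, writes_per_node(100, 3, n))
# 3 100.0
# 6 50.0
# 12 25.0
```

If this model is right, then with 3 nodes and a replication factor of 3 every node has to store every write, so the cluster has no more write capacity than a single node.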
Third, the write speed of the disk is important, right? So
changing our VM plan to something with a better write speed
should help to solve the current issue, right?
About the stress test, what exactly is the meaning of threads? Is each
thread inserting data asynchronously or synchronously (waiting for an ACK from
Cassandra)? Should I consider each thread as a source? I ask because the stress
test never goes over 64 threads (iirc).
Actually I don't know for sure which parameter(s) define the number of sources
during the stress test. As I see it, the stress test should show me the limits of
the infrastructure and model under the pressure of an increasing number of sources
and operations per second.
Do you have any comments or advice that could help me?
Regards,
Nicolas Jäger