I wrote some scripts to test this: https://github.com/davidtinker/cassandra-perf
3 node cluster, each node: Intel® Xeon® E3-1270 v3 Quadcore Haswell
32GB RAM, 1 x 2TB commit log disk, 2 x 4TB data disks (RAID0)
Using a batch of prepared statements is about 5% faster than inline parameters:
This loop takes 2500ms or so on my test cluster:
PreparedStatement ps = session.prepare(
    "INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)");
for (int i = 0; i < 1000; i++) session.execute(ps.bind("" + i, "aa" + i));
The same loop with the parameters inline is about 1300ms. It gets
worse if
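For comparison, the inline-parameter variant being described would likely compose the full CQL string per row, along these lines (a sketch only; `InlineInsertSketch` and `buildStatements` are hypothetical names, and in a real run each string would go to session.execute instead of a list):

```java
import java.util.ArrayList;
import java.util.List;

public class InlineInsertSketch {

    // Builds the CQL strings the inline-parameter loop would execute.
    // In real code, session.execute(cql) would replace stmts.add(cql).
    static List<String> buildStatements(int n) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String cql = "INSERT INTO perf_test.wibble (id, info) VALUES ('"
                    + i + "', 'aa" + i + "')";
            stmts.add(cql);
        }
        return stmts;
    }

    public static void main(String[] args) {
        // First of the 1000 statements from the timing loop above
        System.out.println(buildStatements(1000).get(0));
    }
}
```

Note that inlining values like this skips the bind step but concatenates untrusted strings into CQL, which is why sanitization comes up later in the thread.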
Then I suspect that this is an artifact of your test methodology. Prepared
statements *are* faster than non-prepared ones in general. They save some
parsing and some bytes on the wire. The savings will tend to be bigger for
bigger queries, and it's possible that for very small queries (like the one
I use hand-rolled batches a lot. You can get a *lot* of performance
improvement. Just make sure to sanitize your strings.
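A hand-rolled batch of the kind described here might be assembled as below (a sketch under assumptions: `BatchBuilder` and its methods are hypothetical names, the table comes from the earlier snippet, and the quote-doubling is only the minimal escaping for CQL string literals, not full sanitization):

```java
public class BatchBuilder {

    // Minimal escaping for CQL string literals: double any single quote.
    // A sketch, not a substitute for thorough input sanitization.
    static String sanitize(String s) {
        return s.replace("'", "''");
    }

    // Concatenates many inserts into one unlogged batch statement,
    // sent to the cluster in a single round trip.
    static String buildBatch(String[][] rows) {
        StringBuilder sb = new StringBuilder("BEGIN UNLOGGED BATCH\n");
        for (String[] row : rows) {
            sb.append("INSERT INTO perf_test.wibble (id, info) VALUES ('")
              .append(sanitize(row[0])).append("', '")
              .append(sanitize(row[1])).append("');\n");
        }
        return sb.append("APPLY BATCH;").toString();
    }

    public static void main(String[] args) {
        String cql = buildBatch(new String[][] {{"1", "it's"}, {"2", "aa2"}});
        System.out.println(cql);
    }
}
```

The whole batch string would then go through a single session.execute call.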
I've been wondering, what's the limit, practical or hard, on the length of
a query?
Robert
On 12/11/13, 3:37 AM, David Tinker david.tin...@gmail.com wrote:
Yes that's what I
Network latency is the reason why the batched query is fastest. One trip to
Cassandra versus 1000. If you execute the inserts in parallel, then that
eliminates the latency issue.
From: Sylvain Lebresne sylv...@datastax.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, December 11, 2013
On Wed, Dec 11, 2013 at 1:52 PM, Robert Wille rwi...@fold3.com wrote:
Network latency is the reason why the batched query is fastest. One trip
to Cassandra versus 1000. If you execute the inserts in parallel, then that
eliminates the latency issue.
While it is true a batch will mean only
I didn't do any warming up etc. I am new to Cassandra and was just
poking around with some scripts to try to find the fastest way to do
things. That said, all the mini-tests ran under the same conditions.
In our case the batches will have a variable number of different
inserts/updates in them so
Very good point. I've written code to do a very large number of inserts, but
I've only ever run it on a single-node cluster. I may very well find out
when I run it against a multi-node cluster that the performance benefits of
large unlogged batches mostly go away.
From: Sylvain Lebresne
I have tried the DataStax Java driver and it seems the fastest way to
insert data is to compose a CQL string with all parameters inline.
This loop takes 2500ms or so on my test cluster:
PreparedStatement ps = session.prepare(
    "INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)");
for (int i = 0;
I should probably give you a number: it's about 300 MB/s via the Thrift API,
using 1MB batches.
On Dec 10, 2013, at 5:14 AM, graham sanderson gra...@vast.com wrote:
Perhaps not the way forward, however I can bulk insert data via astyanax at a
rate that maxes out our (fast) networks. That
Hmm. I have read that the thrift interface to Cassandra is out of
favour and the CQL interface is in. Where does that leave Astyanax?
On Tue, Dec 10, 2013 at 1:14 PM, graham sanderson gra...@vast.com wrote:
Perhaps not the way forward, however I can bulk insert data via astyanax at a
rate that
I can’t speak for Astyanax; their Thrift transport, I believe, is abstracted
out, however the object model is very much CF wide-row rather than table-oriented.
I have no idea what the plans are for further Astyanax dev (maybe someone on
this list), but I believe the thrift API is not going away, so considering
session.execute blocks until C* returns the response. Use the async
version, but do so with caution: if you don't throttle the requests, you
will start seeing timeouts on the client side pretty quickly. For
throttling I've used a Semaphore, but I think Guava's RateLimiter is better
suited.
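The Semaphore-throttling pattern described above might look like the sketch below. Assumptions are labeled in the comments: in real driver code the submitted task would be session.executeAsync with the release in a completion callback; here a plain thread pool stands in so the pattern runs without a cluster, and the names and the 128-permit cap are illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledAsyncDemo {

    // Caps the number of in-flight requests: acquire() before submitting,
    // release() when the "response" arrives. With a real driver the task
    // body would be session.executeAsync(stmt) and release() would go in
    // the future's completion callback; a thread pool stands in here.
    static int runInserts(int n) throws InterruptedException {
        final Semaphore permits = new Semaphore(128); // illustrative cap
        ExecutorService fakeCluster = Executors.newFixedThreadPool(8);
        final AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < n; i++) {
            permits.acquire(); // blocks once 128 requests are outstanding
            fakeCluster.submit(() -> {
                try {
                    completed.incrementAndGet(); // stand-in for the insert
                } finally {
                    permits.release(); // free a slot for the next request
                }
            });
        }
        fakeCluster.shutdown();
        fakeCluster.awaitTermination(30, TimeUnit.SECONDS);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runInserts(1000)); // prints 1000
    }
}
```

Guava's RateLimiter would replace the Semaphore with a requests-per-second cap rather than an in-flight cap, which is often the better fit for smoothing load on the cluster.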