Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-13 Thread David Tinker
I wrote some scripts to test this (https://github.com/davidtinker/cassandra-perf). 3-node cluster, each node: Intel® Xeon® E3-1270 v3 quad-core Haswell, 32GB RAM, 1 x 2TB commit log disk, 2 x 4TB data disks (RAID0). Using a batch of prepared statements is about 5% faster than inline parameters:

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Sylvain Lebresne
This loop takes 2500ms or so on my test cluster: PreparedStatement ps = session.prepare("INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)"); for (int i = 0; i < 1000; i++) session.execute(ps.bind("" + i, "aa" + i)); The same loop with the parameters inline is about 1300ms. It gets worse if

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Sylvain Lebresne
Then I suspect that this is an artifact of your test methodology. Prepared statements *are* faster than non-prepared ones in general. They save some parsing and some bytes on the wire. The savings will tend to be bigger for bigger queries, and it's possible that for very small queries (like the one

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Robert Wille
I use hand-rolled batches a lot. You can get a *lot* of performance improvement. Just make sure to sanitize your strings. I've been wondering, what's the limit, practical or hard, on the length of a query? Robert On 12/11/13, 3:37 AM, David Tinker david.tin...@gmail.com wrote: Yes thats what I
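
A minimal sketch of what a hand-rolled batch with sanitized strings might look like, assuming both columns of perf_test.wibble are text and that the batch is unlogged. The class and method names (HandRolledBatch, quote, buildBatch) are hypothetical and not taken from the thread. The only escaping shown is doubling single quotes, which is how CQL string literals represent an embedded quote; it is not a substitute for bound parameters when the input is untrusted.

```java
import java.util.ArrayList;
import java.util.List;

public class HandRolledBatch {
    // Wrap a value in single quotes for use as a CQL string literal,
    // doubling any embedded single quote.
    public static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Build one CQL string containing an unlogged batch of inserts.
    // Each row is {id, info}; both are assumed to be text columns.
    public static String buildBatch(List<String[]> rows) {
        StringBuilder cql = new StringBuilder("BEGIN UNLOGGED BATCH\n");
        for (String[] row : rows) {
            cql.append("INSERT INTO perf_test.wibble (id, info) VALUES (")
               .append(quote(row[0])).append(", ")
               .append(quote(row[1])).append(");\n");
        }
        return cql.append("APPLY BATCH;").toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "plain value"});
        rows.add(new String[] {"2", "it's quoted"});
        // The resulting string would be passed to session.execute(...) in one round trip.
        System.out.println(buildBatch(rows));
    }
}
```

The resulting string goes to the cluster in a single request, which is where the saving over 1000 separate executes comes from.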

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Robert Wille
Network latency is the reason why the batched query is fastest. One trip to Cassandra versus 1000. If you execute the inserts in parallel, then that eliminates the latency issue. From: Sylvain Lebresne sylv...@datastax.com Reply-To: user@cassandra.apache.org Date: Wednesday, December 11, 2013

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Sylvain Lebresne
On Wed, Dec 11, 2013 at 1:52 PM, Robert Wille rwi...@fold3.com wrote: Network latency is the reason why the batched query is fastest. One trip to Cassandra versus 1000. If you execute the inserts in parallel, then that eliminates the latency issue. While it is true a batch will mean only

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread David Tinker
I didn't do any warming up etc. I am new to Cassandra and was just poking around with some scripts to try to find the fastest way to do things. That said, all the mini-tests ran under the same conditions. In our case the batches will have a variable number of different inserts/updates in them so

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-11 Thread Robert Wille
Very good point. I've written code to do a very large number of inserts, but I've only ever run it on a single-node cluster. I may very well find out when I run it against a multinode cluster that the performance benefits of large unlogged batches mostly go away. From: Sylvain Lebresne

What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread David Tinker
I have tried the DataStax Java driver and it seems the fastest way to insert data is to compose a CQL string with all parameters inline. This loop takes 2500ms or so on my test cluster: PreparedStatement ps = session.prepare("INSERT INTO perf_test.wibble (id, info) VALUES (?, ?)"); for (int i = 0;

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread graham sanderson
I should probably give you a number, which is about 300 meg/s via the thrift API, using 1 MB batches. On Dec 10, 2013, at 5:14 AM, graham sanderson gra...@vast.com wrote: Perhaps not the way forward, however I can bulk insert data via astyanax at a rate that maxes out our (fast) networks. That

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread David Tinker
Hmm. I have read that the thrift interface to Cassandra is out of favour and the CQL interface is in. Where does that leave Astyanax? On Tue, Dec 10, 2013 at 1:14 PM, graham sanderson gra...@vast.com wrote: Perhaps not the way forward, however I can bulk insert data via astyanax at a rate that

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread graham sanderson
I can’t speak for Astyanax; their thrift transport I believe is abstracted out, however the object model is very CF wide row vs table-y. I have no idea what the plans are for further Astyanax dev (maybe someone on this list), but I believe the thrift API is not going away, so considering

Re: What is the fastest way to get data into Cassandra 2 from a Java application?

2013-12-10 Thread John Sanda
session.execute blocks until C* returns the response. Use the async version, but do so with caution. If you don't throttle the requests, you will start seeing timeouts on the client side pretty quickly. For throttling I've used a Semaphore, but I think Guava's RateLimiter is better suited.
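
The Semaphore approach to throttling could be sketched roughly as follows. This is a sketch under stated assumptions, not code from the thread: it uses only the JDK, with a CompletableFuture and a sleeping fakeInsert() as stand-ins for the driver's async call, and the names ThrottledInserts, run and fakeInsert are hypothetical. With the real driver, the permit would be acquired before calling session.executeAsync(...) and released in a callback on the returned future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledInserts {
    // Stand-in for one asynchronous insert; sleeps briefly to simulate latency.
    static void fakeInsert() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Issues totalRequests simulated inserts with at most maxInFlight outstanding.
    // Returns the highest number of requests observed in flight at once.
    public static int run(int totalRequests, int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(16); // stand-in for driver I/O threads
        try {
            for (int i = 0; i < totalRequests; i++) {
                permits.acquire(); // blocks the producer once maxInFlight requests are outstanding
                CompletableFuture.runAsync(() -> {
                    peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    try {
                        fakeInsert();
                    } finally {
                        inFlight.decrementAndGet();
                    }
                }, pool).whenComplete((result, error) -> permits.release()); // release on success or failure
            }
            permits.acquire(maxInFlight); // wait until every outstanding request has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while throttling", e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in flight: " + run(1000, 32));
    }
}
```

A semaphore caps the number of concurrent requests, whereas a rate limiter caps requests per second; which fits better depends on whether the timeouts come from too many outstanding requests or from too high a sustained rate.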