Forgot to CC the list.

/Jens

From: Jens Rantil
Sent: 12 April 2011 15:04
To: 'Russell Brown'
Subject: Re: Re: Slow inserts

Hi Russell,

Thank you for your response.

The Java client I was using was 0.14.1. Still, it was slow.

Thanks for all the input concerning the protocol buffers Riak Java client. I'll 
look into it in the near future.

Regards,
Jens

From: Russell Brown [mailto:[email protected]]
Sent: 12 April 2011 13:48
To: Jens Rantil
Subject: Re: Re: Slow inserts

Hi Jens,

On 12 Apr 2011, at 12:03, Jens Rantil wrote:

Kresten,

Thank you for your response. It is good to have a reference.

As for the Python client, I first rewrote the benchmark in Java. It still seems 
to be using an HTTP client and I see a similar write performance (260-330 
writes/second). Later, I read in the Python client documentation that protocol 
buffers are recommended for production systems. Modifying the Python client to 
use protocol buffers indeed yielded a significant performance boost, now up to 
~2900 writes/second. This looks very much like what I expected, and it does 
indeed show that HTTP was the overhead.

All benchmarks mentioned above have consisted of 11 threads writing in total 
53000 key-values using round robin over the nodes.
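A minimal sketch of that benchmark shape, for anyone who wants to reproduce it: 11 worker threads, keys assigned to nodes in round-robin order, and a writes/second figure at the end. The node addresses and the `fake_store` function are placeholders (assumptions, not part of any Riak client); a real run would swap in an actual client write call.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node addresses; substitute the real cluster members.
NODES = ["node1:8087", "node2:8087", "node3:8087", "node4:8087", "node5:8087"]


def fake_store(node, key, value):
    """Placeholder for a real client write; it only echoes where the key went."""
    return node, key


def run_benchmark(items, nodes, store, num_threads=11):
    """Write every (key, value) pair, rotating over nodes.

    Returns (results, writes_per_second).
    """
    node_cycle = itertools.cycle(nodes)
    # Pair each item with a node up front so the rotation is deterministic.
    work = [(next(node_cycle), key, value) for key, value in items]
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(lambda w: store(*w), work))
    elapsed = time.perf_counter() - started
    rate = len(work) / elapsed if elapsed > 0 else float("inf")
    return results, rate


if __name__ == "__main__":
    items = [("key%d" % i, b"x" * 100) for i in range(53000)]
    results, rate = run_benchmark(items, NODES, fake_store)
    print("%d writes, %.0f writes/second" % (len(results), rate))
```

With a stub in place of the network call the reported rate only measures thread-pool overhead, so it is useful as a harness, not as a number to compare against the figures in this thread.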

A follow-up question: I am considering using Java for production and currently 
my Java benchmark (see above) is very slow. You mention using a Java stream 
interface? Do you mean this involves making multiple requests through the same 
HTTP connection?

Both the HTTP client and the protocol buffers client reuse connections. The 
HTTP client holds a pool of connections, and the pb client creates a connection 
per thread and reuses that connection (unless it is inactive for over a 
second). So if you can set up your client to have a number of threads busily 
pumping data, you can get some good throughput.
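The per-thread reuse pattern described here can be sketched roughly as follows. This is an illustration of the idea only, not the Riak client's actual code: the class name, the `connect` factory and the injectable `clock` are all hypothetical, and a real client would also close the stale socket before replacing it.

```python
import threading
import time


class PerThreadConnections:
    """Hands each thread its own connection and replaces it after an idle period."""

    def __init__(self, connect, max_idle_seconds=1.0, clock=time.monotonic):
        self._connect = connect          # hypothetical factory returning a connection
        self._max_idle = max_idle_seconds
        self._clock = clock              # injectable so the timeout can be tested
        self._local = threading.local()  # one slot per thread

    def get(self):
        """Return this thread's connection, reconnecting if it sat idle too long."""
        now = self._clock()
        conn = getattr(self._local, "conn", None)
        last_used = getattr(self._local, "last_used", None)
        if conn is None or now - last_used > self._max_idle:
            # First use on this thread, or the old connection was idle too long.
            conn = self._connect()
            self._local.conn = conn
        self._local.last_used = now
        return conn
```

The practical consequence is the one stated above: threads that keep writing never hit the idle timeout, so they never pay for a reconnect.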

I have set maxConnections to 50 for every node in my (Java) benchmark without 
any significant performance boost. Is there anything else I should set?

What version of the Java client were you using for HTTP? Version 0.14 (and 
earlier) only really allowed a maximum of 2 concurrent connections over HTTP. 
That is fixed in the new 0.14.1 release that went out yesterday.

I stumbled across the Java protocol buffers client, and I guess that's the 
better alternative.

Without a doubt you want to be using the protobuffers client.

Using the protobuffers client, you have a couple of options for getting the 
best write performance. Either call

public ByteString[] store(RiakObject[] values, RequestMeta meta)

from some threads (if you can batch your objects up), or call

public RiakObject[] store(RiakObject value, IRequestMeta meta)

from some threads.

If you use IRequestMeta.returnBody(true), the former will be faster, as it 
reads the responses in whilst still writing out the requests.

If you're just pumping data in, then don't set returnBody to true (or use 
public void store(RiakObject) ).

Setting the *same* client ID across the threads (whilst conceptually iffy) 
also yields a performance increase, I've noticed, but to do that you will 
need the latest release (0.14.1).

On my dual core MBP running the client and a single Riak node, the 1400 writes 
from the Riak Fast Track google.csv file take ~900 milliseconds with the 
protobuffs client using 3 threads with the client ID set, and ~1300 
milliseconds without the client ID set.
HTTP takes ~2800 milliseconds from 3 threads with the client ID set and ~3200 
milliseconds without.

(Times include reading the file into RiakObjects)
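To compare these timings with the writes/second figures quoted elsewhere in the thread, they can be converted as below. The helper name and the labels are illustrative; the only inputs are the 1400 writes and the four elapsed times given above, which work out to roughly 1550, 1080, 500 and 440 writes/second respectively.

```python
def writes_per_second(writes, elapsed_ms):
    """Convert a write count and an elapsed time in milliseconds to writes/second."""
    return writes * 1000.0 / elapsed_ms


# Elapsed times in milliseconds for 1400 writes from 3 threads, as quoted above.
timings_ms = {
    "protobuffs, client ID set": 900,
    "protobuffs, no client ID": 1300,
    "HTTP, client ID set": 2800,
    "HTTP, no client ID": 3200,
}

for label, ms in timings_ms.items():
    print("%-28s %5d ms  ~%.0f writes/second" % (label, ms, writes_per_second(1400, ms)))
```

Since the times include reading the file into RiakObjects, the pure write rates would be somewhat higher than these figures.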

Cheers

Russell


Thanks,
Jens

From: Kresten Krab Thorup [mailto:[email protected]]
Sent: 11 April 2011 23:34
To: Jens Rantil
Cc: [email protected]<mailto:[email protected]>
Subject: Re: Slow inserts

Jens,

Just for reference ... in our dev lab ... we quite consistently get ~3000 
puts/second with N=3, W=2 on a 3-node Mac mini cluster running 0.14.1 w/bitcask 
as the backend. Those are small machines with 2.6GHz Dual Core 2 and 8GB RAM.

We do use lots of threads in the client, and a load balancer [or you can just 
have the threads connect to the different Riak nodes]. If all requests funnel 
through one machine it may become loaded, ... and we're using the Java client 
which is able to stream requests without having to open new connections all 
the time .. dunno how the Python client is in this regard. We also have a 
proper gigabit managed switch between the Mac minis, but I think it is unlikely 
that networking is the limiting factor ... 3000 writes/sec x 1 kilobyte 
corresponds to just 3 megabytes per second (about 24 megabits per second) ... 
the chatter among the clustered machines is roughly N-fold the chatter from the 
client to the cluster by my observations.
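The back-of-envelope network estimate can be written out explicitly. Assuming 1 kilobyte means 1000 bytes and ignoring protocol overhead, 3000 writes/second of 1 kilobyte each is 3 megabytes per second, which is about 24 megabits per second from the client, and on the order of N times that between the nodes. Both are far below a gigabit link, so the conclusion that networking is unlikely to be the bottleneck holds. The function names here are illustrative.

```python
def client_bandwidth_mbit(writes_per_second, value_bytes):
    """Client-to-cluster payload rate in megabits per second (decimal units)."""
    return writes_per_second * value_bytes * 8 / 1_000_000


def cluster_bandwidth_mbit(writes_per_second, value_bytes, n_val):
    """Rough inter-node traffic, assuming it is about n_val times the client traffic."""
    return n_val * client_bandwidth_mbit(writes_per_second, value_bytes)


client = client_bandwidth_mbit(3000, 1000)
cluster = cluster_bandwidth_mbit(3000, 1000, n_val=3)
print("client -> cluster: %.0f Mbit/s, between nodes: ~%.0f Mbit/s" % (client, cluster))
```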

Have you tried to run some simple CPU/IO monitor on the 3 machines you're using 
to see if they are CPU or I/O bound?  ... If they are not really loaded, you 
may need to add clients/threads.  You should not be satisfied until the server 
machines are saturated.

Kresten


On Apr 11, 2011, at 18:00, Jens Rantil wrote:


Hi,

I have set up a 5-node test environment to give Riak a test run. I wrote a 
Python script (http://pastebin.com/geQ00Ngb) that put 53780 key-values into my 
cluster. Using round robin, 11 threads inserted these values in ~70 seconds. 
This means an average of ~750 key-values/second. Is this the expected speed of 
inserts? It seems quite slow. The Mozilla benchmark 
(http://blog.mozilla.com/data/2010/08/16/benchmarking-riak-for-the-mozilla-test-pilot-project/)
 reached > 1500 ops/second for significantly bigger values than my benchmark.

Additional information:
* Four of the nodes were not doing anything else.
* n_val=3
* For every insert, w=1 and dw=0.
* All nodes are using riak_0.14.1-1_amd64.deb and I have not changed any of the 
default settings except Erlang -cookie and -name.

Thanks,
Jens Rantil

_______________________________________________
riak-users mailing list
[email protected]<mailto:[email protected]>
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
