One thing, which is often missed by newcomers to Riak [I'm not saying you
missed it], is the importance of managing client IDs, and passing the right
vector clocks back to the server.
{ Basho'ers ... please correct me if I'm wrong }
Kresten
So, Rule#1 (which has two clauses), which you can always revert to:
1.a / every client needs a clientID, which is distinct for that client. Be
sure to always pass it along in all calls (in Java that is done by calling
setClientID on the RiakClient; at the HTTP level, it is done by passing the
X-Riak-ClientId HTTP header).
1.b / when you send an update (HTTP PUT or DELETE), always pass along the
vector clock (the X-Riak-Vclock HTTP header) from a corresponding GET. If you
don't do this, your PUT is likely to go to /dev/null, because Riak thinks that
it is a replay of an old request.
Until you're really familiar with how Riak works, you should always do these
two, or you will be severely burned when you realize that it doesn't behave as
expected. Believe me, I've been there.
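
To make Rule#1 concrete at the HTTP level, here is a minimal sketch that only
builds the requests (it does not send them). It assumes Java 11's
java.net.http API; the class name RiakHttpRequests and the example URL are
made up for illustration. Note that the vector clock header is spelled
X-Riak-Vclock in Riak's HTTP interface.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative helper, not part of any Riak client library.
final class RiakHttpRequests {
    static final String CLIENT_ID_HEADER = "X-Riak-ClientId";
    static final String VCLOCK_HEADER = "X-Riak-Vclock";

    /** Rule 1.a: every call, including a GET, carries the client ID. */
    static HttpRequest get(URI objectUri, String clientId) {
        return HttpRequest.newBuilder(objectUri)
                .header(CLIENT_ID_HEADER, clientId)
                .GET()
                .build();
    }

    /** Rule 1.b: a PUT also carries the vector clock returned by the last GET. */
    static HttpRequest put(URI objectUri, String clientId, String vclock, String body) {
        return HttpRequest.newBuilder(objectUri)
                .header(CLIENT_ID_HEADER, clientId)
                .header(VCLOCK_HEADER, vclock)
                .header("Content-Type", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

In a real client the vclock argument would be whatever value the server sent
back in the X-Riak-Vclock response header of the preceding GET, passed along
unmodified.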
1.a / Choosing a good client ID
========================
If you don't choose a client ID, Riak will do it for you ... BUT .. it will
choose a new one for EVERY REQUEST. This has many issues, so Riak should
really require YOU to come up with one instead; perhaps it will do so at some
point in the future.
Riak has some special optimizations if your client ID is the Base64-encoding of
a byte array of length 4. So, a good, default way to choose a client id is
thus:
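    // Needs java.security.SecureRandom and a Base64 helper whose
    // encode(byte[]) returns a String.
    static SecureRandom rnd = new SecureRandom();

    // One randomly chosen client ID per thread.
    static ThreadLocal<String> CLIENT_ID = new ThreadLocal<String>() {
        @Override
        protected String initialValue() {
            return randomClientID();
        }
    };

    public static String getClientID() {
        return CLIENT_ID.get();
    }

    // Base64-encode 4 random bytes, the form Riak optimizes for.
    static private String randomClientID() {
        byte[] bytes = new byte[4];
        rnd.nextBytes(bytes);
        return Base64.encode(bytes);
    }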
This makes it so that each thread in your application is assigned a new random
ClientID, which is often useful if your client is multi-threaded.
The above code is *a lot* better than the default of having the server side
choose a new client id for every request.
If you have some kind of logically unique, non-concurrent client concept in
your system, that may be even better. It could e.g. be the IMEI of your mobile
phone, if your Riak client app is running on a phone; or it could be a userid,
if you are sure that only one user is accessing the system at a time.
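
If you want a logical identifier like that and still want the 4-byte form
mentioned above, one possible approach (my assumption, not something Riak
prescribes) is to hash the identifier and keep the first 4 bytes. The class
and method names below are made up. Be aware that squeezing an identifier into
4 bytes means two different clients can collide, so this is only a sketch for
systems with a modest number of clients.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

// Illustrative helper, not part of any Riak client library.
final class LogicalClientId {
    /** Derives a stable, Base64-encoded 4-byte client ID from a logical ID such as an IMEI. */
    static String fromLogicalId(String logicalId) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(logicalId.getBytes(StandardCharsets.UTF_8));
            // Keep only the first 4 bytes of the hash (collisions are possible).
            byte[] four = Arrays.copyOf(digest, 4);
            return Base64.getEncoder().encodeToString(four);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

The same logical ID always maps to the same client ID, so the ID survives
restarts of the client, which a random per-thread ID does not.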
1.b / Passing the VectorClock
=======================
Secondly, you need to make sure that you pass the vector clock.
You should think of the vector clock as an opaque "optimistic concurrency
token", that you receive when you do a GET, and have to pass in when you do a
PUT ... and then you get a new "optimistic concurrency token", that you have to
use henceforth.
Depending on the configuration of your buckets, using an old vector clock will
simply cause the PUT request to be ignored (if allow_mult=false), or cause
siblings to be created (if allow_mult=true). This is where Riak is often "not
what you expect", but there is a good reason for this behavior.
IT IS ABSOLUTELY PARAMOUNT TO UNDERSTAND THIS.
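
To illustrate the "optimistic concurrency token" idea, here is a toy,
single-key, in-memory model. It is NOT how Riak is implemented: a plain
counter stands in for the vector clock, and the class ToyBucket is made up. It
only mirrors the behavior described above for a stale token under the two
allow_mult settings.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; a counter stands in for the vector clock.
final class ToyBucket {
    private final boolean allowMult;
    private long token = 0;
    private final List<String> values = new ArrayList<>();

    ToyBucket(boolean allowMult) {
        this.allowMult = allowMult;
    }

    /** "GET": the token to pass along with the next PUT. */
    long currentToken() {
        return token;
    }

    /** The stored value, or several values if siblings exist. */
    List<String> values() {
        return List.copyOf(values);
    }

    /** "PUT" with the token from an earlier GET; returns the token to use henceforth. */
    long put(long tokenFromGet, String value) {
        if (tokenFromGet == token) {
            // Up-to-date token: the write replaces whatever was stored.
            values.clear();
            values.add(value);
            return ++token;
        }
        if (allowMult) {
            // Stale token, allow_mult=true: the write is kept as a sibling.
            values.add(value);
            return ++token;
        }
        // Stale token, allow_mult=false: the write is silently ignored.
        return token;
    }
}
```

With allowMult=false, two PUTs made with the same token leave only the first
value stored; with allowMult=true, both values are kept side by side.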
The above two things (1.a and 1.b) are hard for newcomers to understand and a
bit tricky to get right, so IMHO a new Java client should make the safe
behavior the default, so that these mistakes are hard to make.
- So, it should choose a good client ID for you if you don't.
- And it should make it so that you can't do an UPDATE/PUT without having
first done a GET of the Riak object.
The last part is especially tricky. Perhaps we should have the API look like
this to help that ....
    interface RiakObject {
        ...
    }
    interface UpdateableRiakObject extends RiakObject { ... }
    interface CreateableRiakObject extends RiakObject { ... }

    RiakClient {
        UpdateableRiakObject update(UpdateableRiakObject o) throws NotModified
            { ... send PUT ... }
        UpdateableRiakObject create(CreateableRiakObject o) throws AlreadyThere
            { ... send PUT ... }
        UpdateableRiakObject get(bucket, key);
        CreateableRiakObject fresh(bucket, key);
    }
I.e. do NOT expose constructors for the implementors of RiakObject. The only
way to get an UpdateableRiakObject is to call RiakClient.get, or as the result
of calling update/create; you can't just allocate one. Also, calling
update/create should "invalidate" the original object so that it cannot
accidentally be used again.
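
As a rough, runnable illustration of that idea, here is a simplified in-memory
sketch. Everything in it is hypothetical (LinearRiakSketch, Updateable,
Client, StaleObjectException); it collapses fresh/create into one call, uses a
counter instead of a vector clock, and talks to no server. The point is only
to show private constructors plus one-shot handles.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical API sketch; none of these types exist in a real Riak client.
final class LinearRiakSketch {
    static final class StaleObjectException extends RuntimeException {
        StaleObjectException(String message) {
            super(message);
        }
    }

    /** A handle that can only come from Client.get/create/update. */
    static final class Updateable {
        final String key;
        final String value;
        private final long token;      // stands in for the vector clock
        private boolean used = false;  // set once the handle has been spent

        // Private: code outside LinearRiakSketch cannot allocate a handle.
        private Updateable(String key, String value, long token) {
            this.key = key;
            this.value = value;
            this.token = token;
        }
    }

    static final class Client {
        private final Map<String, String> values = new HashMap<>();
        private final Map<String, Long> tokens = new HashMap<>();

        Updateable create(String key, String value) {
            if (values.containsKey(key)) {
                throw new IllegalStateException("already there: " + key);
            }
            return store(key, value, 1L);
        }

        Updateable get(String key) {
            if (!values.containsKey(key)) {
                throw new IllegalStateException("not found: " + key);
            }
            return new Updateable(key, values.get(key), tokens.get(key));
        }

        /** Spends the handle and returns a fresh one for the next update. */
        Updateable update(Updateable handle, String newValue) {
            if (handle.used) {
                throw new StaleObjectException("handle already used");
            }
            if (handle.token != tokens.get(handle.key)) {
                throw new StaleObjectException("object changed since GET");
            }
            handle.used = true;
            return store(handle.key, newValue, handle.token + 1);
        }

        private Updateable store(String key, String value, long token) {
            values.put(key, value);
            tokens.put(key, token);
            return new Updateable(key, value, token);
        }
    }
}
```

Java cannot enforce the "use once" rule at compile time, so this sketch checks
it at run time: reusing a spent or outdated handle throws instead of silently
losing the write.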
I really think we need to have a way to enforce the linear nature of these
things. Otherwise people get fooled.
Kresten