Hi Alex, would a second fetch just indicate that the object is *still* deleted? Or that this delete operation succeeded? In other words, perhaps my contract really is: return true if there was already a value there. In which case, would the second fetch be superfluous?
Thanks for your help.
Vanessa

On Mon, Feb 22, 2016 at 11:15 AM, Alex Moore <amo...@basho.com> wrote:

>> That's the correct behaviour: it should return true iff a value was
>> actually deleted.
>
> Ok, if that's the case you should do another FetchValue after the deletion
> (to update the response.hasValues() field), or use the async version of
> the delete function. I also noticed that we weren't passing the vclock to
> the Delete function, so I added that here as well:
>
> public boolean delete(String key) throws ExecutionException,
> InterruptedException {
>
>     // fetch in order to get the causal context
>     FetchValue.Response response = fetchValue(key);
>
>     if (response.isNotFound())
>     {
>         return ???; // what do we return if it doesn't exist?
>     }
>
>     DeleteValue deleteValue =
>         new DeleteValue.Builder(new Location(namespace, key))
>             .withVClock(response.getVectorClock())
>             .build();
>
>     final RiakFuture<Void, Location> deleteFuture =
>         client.executeAsync(deleteValue);
>
>     deleteFuture.await();
>
>     if (deleteFuture.isSuccess())
>     {
>         return true;
>     }
>     else
>     {
>         deleteFuture.cause(); // Cause of failure
>         return false;
>     }
> }
>
> Thanks,
> Alex
>
> On Mon, Feb 22, 2016 at 10:48 AM, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
>> See inline:
>>
>> On Mon, Feb 22, 2016 at 10:31 AM, Alex Moore <amo...@basho.com> wrote:
>>
>>> Hi Vanessa,
>>>
>>> You might have a problem with your delete function (depending on its
>>> return value). What does the return value of the delete() function
>>> indicate? Right now if an object existed, and was deleted, the function
>>> will return true, and will only return false if the object didn't exist
>>> or only consisted of tombstones.
>>
>> That's the correct behaviour: it should return true iff a value was
>> actually deleted.
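To make the "return true iff a value was actually deleted" contract concrete, here is a stdlib-only toy sketch (a HashMap stands in for the Riak bucket; `DeleteContract` and its methods are hypothetical names, not the Riak client API). It illustrates the point under discussion: the pre-delete fetch already answers "was there a value?", so under that contract a second fetch would indeed be superfluous.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the delete contract: return true iff a value was
// actually there to delete. A HashMap stands in for the bucket.
public class DeleteContract {
    private final Map<String, byte[]> bucket = new HashMap<>();

    public void put(String key, byte[] value) {
        bucket.put(key, value);
    }

    public boolean delete(String key) {
        // "fetch" first, as the real client does to obtain causal context
        boolean existed = bucket.containsKey(key);
        if (existed) {
            bucket.remove(key);
        }
        // true iff something existed before the delete
        return existed;
    }

    public static void main(String[] args) {
        DeleteContract c = new DeleteContract();
        c.put("k", new byte[] {1});
        System.out.println(c.delete("k")); // true: a value was deleted
        System.out.println(c.delete("k")); // false: nothing left to delete
    }
}
```

Against real Riak the "did it exist" answer would come from the fetch response rather than a map lookup, but the shape of the contract is the same.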
>>
>>> If you never look at the object value returned by your fetchValue(key)
>>> function, another potential optimization you could make is to only return
>>> the HEAD / metadata:
>>>
>>> FetchValue fv = new FetchValue.Builder(
>>>         new Location(new Namespace("some_bucket"), key))
>>>     .withOption(FetchValue.Option.HEAD, true)
>>>     .build();
>>>
>>> This would be more efficient because Riak won't have to send you the
>>> values over the wire, if you only need the metadata.
>>
>> Thanks, I'll clean that up.
>>
>>> If you do write this up somewhere, share the link! :)
>>
>> Will do!
>>
>> Regards,
>> Vanessa
>>
>>> Thanks,
>>> Alex
>>>
>>> On Mon, Feb 22, 2016 at 6:23 AM, Vanessa Williams <
>>> vanessa.willi...@thoughtwire.ca> wrote:
>>>
>>>> Hi Dmitri, this thread is old, but I read this part of your answer
>>>> carefully:
>>>>
>>>>> You can use the following strategies to prevent stale values, in
>>>>> increasing order of security/preference:
>>>>> 1) Use timestamps (and not pass in vector clocks/causal context). This
>>>>> is ok if you're not editing objects, or you're ok with a bit of risk of
>>>>> stale values.
>>>>> 2) Use causal context correctly (which means, read-before-you-write --
>>>>> in fact, the Update operation in the java client does this for you, I
>>>>> think). And if Riak can't determine which version is correct, it will
>>>>> fall back on timestamps.
>>>>> 3) Turn on siblings, for that bucket or bucket type. That way, Riak
>>>>> will still try to use causal context to decide the right value. But if
>>>>> it can't decide, it will store BOTH values, and give them back to you on
>>>>> the next read, so that your application can decide which is the correct
>>>>> one.
>>>>
>>>> I decided on strategy #2. What I am hoping for is some validation that
>>>> the code we use to "get", "put", and "delete" is correct in that context,
>>>> or if it could be simplified in some cases.
Note we are using delete-mode
>>>> "immediate" and no duplicates.
>>>>
>>>> In their shortest possible forms, here are the three methods I'd like
>>>> some feedback on. (Note: they're being used in production and haven't
>>>> caused any problems yet; however, we have very few writes in production,
>>>> so the lack of problems doesn't support the conclusion that the
>>>> implementation is correct.) All argument-checking, exception-handling,
>>>> and logging are removed for clarity. *I'm mostly concerned about correct
>>>> use of causal context and response.isNotFound and response.hasValues.*
>>>> Is there anything I could/should have left out?
>>>>
>>>> public RiakClient(String name, com.basho.riak.client.api.RiakClient client)
>>>> {
>>>>     this.name = name;
>>>>     this.namespace = new Namespace(name);
>>>>     this.client = client;
>>>> }
>>>>
>>>> public byte[] get(String key) throws ExecutionException,
>>>> InterruptedException {
>>>>
>>>>     FetchValue.Response response = fetchValue(key);
>>>>     if (!response.isNotFound())
>>>>     {
>>>>         RiakObject riakObject = response.getValue(RiakObject.class);
>>>>         return riakObject.getValue().getValue();
>>>>     }
>>>>     return null;
>>>> }
>>>>
>>>> public void put(String key, byte[] value) throws ExecutionException,
>>>> InterruptedException {
>>>>
>>>>     // fetch in order to get the causal context
>>>>     FetchValue.Response response = fetchValue(key);
>>>>     RiakObject storeObject = new RiakObject()
>>>>         .setValue(BinaryValue.create(value))
>>>>         .setContentType("binary/octet-stream");
>>>>     StoreValue.Builder builder = new StoreValue.Builder(storeObject)
>>>>         .withLocation(new Location(namespace, key));
>>>>     if (response.getVectorClock() != null) {
>>>>         builder = builder.withVectorClock(response.getVectorClock());
>>>>     }
>>>>     StoreValue storeValue = builder.build();
>>>>     client.execute(storeValue);
>>>> }
>>>>
>>>> public boolean delete(String key) throws ExecutionException,
>>>> InterruptedException {
>>>>
>>>>     // fetch in order to get the causal context
>>>>     FetchValue.Response response = fetchValue(key);
>>>>     if (!response.isNotFound())
>>>>     {
>>>>         DeleteValue deleteValue = new DeleteValue.Builder(
>>>>             new Location(namespace, key)).build();
>>>>         client.execute(deleteValue);
>>>>     }
>>>>     return !response.isNotFound() || !response.hasValues();
>>>> }
>>>>
>>>> Any comments much appreciated. I want to provide a minimally correct
>>>> example of simple client code somewhere (GitHub, blog post, something...)
>>>> so I don't want to post this without review.
>>>>
>>>> Thanks,
>>>> Vanessa
>>>>
>>>> ThoughtWire Corporation
>>>> http://www.thoughtwire.com
>>>>
>>>> On Thu, Oct 8, 2015 at 8:45 AM, Dmitri Zagidulin <dzagidu...@basho.com>
>>>> wrote:
>>>>
>>>>> Hi Vanessa,
>>>>>
>>>>> The thing to keep in mind about read repair is -- it happens
>>>>> asynchronously on every GET, but /after/ the results are returned to the
>>>>> client.
>>>>>
>>>>> So, when you issue a GET with r=1, the coordinating node only waits
>>>>> for 1 of the replicas before responding to the client with a success,
>>>>> and only afterwards triggers read-repair.
>>>>>
>>>>> It's true that with notfound_ok=false, it'll wait for the first
>>>>> non-missing replica before responding. But if you edit or update your
>>>>> objects at all, an R=1 still gives you a risk of stale values being
>>>>> returned.
>>>>>
>>>>> For example, say you write an object with value A. And let's say your
>>>>> 3 replicas now look like this:
>>>>>
>>>>> replica 1: A, replica 2: A, replica 3: notfound/missing
>>>>>
>>>>> A read with an R=1 and notfound_ok=false is just fine, here. (Chances
>>>>> are, the notfound replica will arrive first, but the notfound_ok setting
>>>>> will force the coordinator to wait for the first non-empty value, A, and
>>>>> return it to the client. And then trigger read-repair.)
>>>>>
>>>>> But what happens if you edit that same object, and give it a new
>>>>> value, B?
So, now, there's a chance that your replicas will look like this:
>>>>>
>>>>> replica 1: A, replica 2: B, replica 3: B.
>>>>>
>>>>> So now if you do a read with an R=1, there's a chance that replica 1,
>>>>> with the old value of A, will arrive first, and that's the response that
>>>>> will be returned to the client.
>>>>>
>>>>> Whereas, using R=2 eliminates that risk -- well, at least decreases
>>>>> it. You still have the issue of -- how does Riak decide whether A or B
>>>>> is the correct value? Are you using causal context/vclocks correctly?
>>>>> (That is, reading the object before you update, to get the correct
>>>>> causal context?) Or are you relying on timestamps? (This is an ok
>>>>> strategy, provided that the edits are sufficiently far apart in time,
>>>>> and you don't have many concurrent edits, AND you're ok with the small
>>>>> risk of occasionally the timestamp being wrong.) You can use the
>>>>> following strategies to prevent stale values, in increasing order of
>>>>> security/preference:
>>>>>
>>>>> 1) Use timestamps (and not pass in vector clocks/causal context). This
>>>>> is ok if you're not editing objects, or you're ok with a bit of risk of
>>>>> stale values.
>>>>>
>>>>> 2) Use causal context correctly (which means, read-before-you-write --
>>>>> in fact, the Update operation in the java client does this for you, I
>>>>> think). And if Riak can't determine which version is correct, it will
>>>>> fall back on timestamps.
>>>>>
>>>>> 3) Turn on siblings, for that bucket or bucket type. That way, Riak
>>>>> will still try to use causal context to decide the right value. But if
>>>>> it can't decide, it will store BOTH values, and give them back to you on
>>>>> the next read, so that your application can decide which is the correct
>>>>> one.
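Strategy #3 above implies the application must resolve siblings itself. As a stdlib-only sketch of what "your application can decide" might look like (the `Sibling` record and `resolve` method are hypothetical stand-ins, not the Riak client's types), here is a last-write-wins resolver; a set-union merge is another common choice:

```java
import java.util.Comparator;
import java.util.List;

// Toy sibling resolver: when the store returns multiple concurrent
// values, the application picks a winner. Here: newest timestamp wins.
public class SiblingResolver {
    // Stand-in for a fetched sibling (value + last-modified time).
    public record Sibling(String value, long lastModified) {}

    public static String resolve(List<Sibling> siblings) {
        // last-write-wins: keep the sibling with the newest timestamp
        return siblings.stream()
                .max(Comparator.comparingLong(Sibling::lastModified))
                .map(Sibling::value)
                .orElseThrow(() -> new IllegalArgumentException("no siblings"));
    }

    public static void main(String[] args) {
        List<Sibling> siblings = List.of(
                new Sibling("A", 1000L),   // stale value
                new Sibling("B", 2000L));  // concurrent, newer value
        System.out.println(resolve(siblings)); // B
    }
}
```

Note that timestamp-based resolution inherits the same risks Dmitri describes for strategy #1; a domain-specific merge is safer when values can be combined.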
>>>>>
>>>>> On Thu, Oct 8, 2015 at 1:56 AM, Vanessa Williams <
>>>>> vanessa.willi...@thoughtwire.ca> wrote:
>>>>>
>>>>>> Hi Dmitri, what would be the benefit of r=2, exactly? It isn't
>>>>>> necessary to trigger read-repair, is it? If it's important I'd rather
>>>>>> try it sooner than later...
>>>>>>
>>>>>> Regards,
>>>>>> Vanessa
>>>>>>
>>>>>> On Wed, Oct 7, 2015 at 4:02 PM, Dmitri Zagidulin <
>>>>>> dzagidu...@basho.com> wrote:
>>>>>>
>>>>>>> Glad you sorted it out!
>>>>>>>
>>>>>>> (I do want to encourage you to bump your R setting to at least 2,
>>>>>>> though. Run some tests -- I think you'll find that the difference in
>>>>>>> speed will not be noticeable, but you do get a lot more data
>>>>>>> resilience with 2.)
>>>>>>>
>>>>>>> On Wed, Oct 7, 2015 at 6:24 PM, Vanessa Williams <
>>>>>>> vanessa.willi...@thoughtwire.ca> wrote:
>>>>>>>
>>>>>>>> Hi Dmitri, well...we solved our problem to our satisfaction but it
>>>>>>>> turned out to be something unexpected.
>>>>>>>>
>>>>>>>> The keys were two properties mentioned in a blog post on
>>>>>>>> "configuring Riak’s oft-subtle behavioral characteristics":
>>>>>>>> http://basho.com/posts/technical/riaks-config-behaviors-part-4/
>>>>>>>>
>>>>>>>> notfound_ok = false
>>>>>>>> basic_quorum = true
>>>>>>>>
>>>>>>>> The second one just makes things a little faster, but the first one
>>>>>>>> is the one whose default value of true was killing us.
>>>>>>>>
>>>>>>>> With r=1 and notfound_ok=true (default), if the first node to
>>>>>>>> respond didn't find the requested key, the authoritative answer was
>>>>>>>> "this key is not found". Not what we were expecting at all.
>>>>>>>>
>>>>>>>> With the changed settings, it will wait for a quorum of responses
>>>>>>>> and only if *no one* finds the key will "not found" be returned.
>>>>>>>> Perfect. (Without this setting it would wait for all responses, not
>>>>>>>> ideal.)
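The notfound_ok behaviour described above can be sketched as a stdlib-only simulation (the `NotfoundOk` class and `read` method are hypothetical names; this models only the r=1 arrival-order effect, not basic_quorum or the full coordinator logic):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Toy model of notfound_ok with r=1. Replica responses arrive in some
// order; null stands for a replica reporting "notfound". With
// notfound_ok=true the first response counts, even a notfound. With
// notfound_ok=false the coordinator waits for the first non-missing value.
public class NotfoundOk {
    public static Optional<String> read(List<String> responses, boolean notfoundOk) {
        for (String r : responses) {
            if (r != null) {
                return Optional.of(r);   // first real value wins
            }
            if (notfoundOk) {
                return Optional.empty(); // first response counts, even notfound
            }
        }
        return Optional.empty();         // every replica said notfound
    }

    public static void main(String[] args) {
        // the empty replica happens to answer first
        List<String> arrivals = Arrays.asList(null, "A", "A");
        System.out.println(read(arrivals, true));  // empty: spurious notfound
        System.out.println(read(arrivals, false)); // Optional[A]: waits for a value
    }
}
```

This is exactly the failure mode in the thread: with the default notfound_ok=true and r=1, a fast-but-empty replica makes an existing key look missing.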
>>>>>>>>
>>>>>>>> Now there is only one snag, which is that if the Riak node the
>>>>>>>> client connects to goes down, there will be no communication and we
>>>>>>>> have a problem. This is easily solvable with a load-balancer, though
>>>>>>>> for complicated reasons we actually don't need to do that right now.
>>>>>>>> It's just acceptable for us temporarily. Later, we'll get the
>>>>>>>> load-balancer working and even that won't be a problem.
>>>>>>>>
>>>>>>>> I *think* we're ok now. Thanks for your help!
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Vanessa
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015 at 9:33 AM, Dmitri Zagidulin <
>>>>>>>> dzagidu...@basho.com> wrote:
>>>>>>>>
>>>>>>>>> Yeah, definitely find out what the sysadmin's experience was, with
>>>>>>>>> the load balancer. It could have just been a wrong configuration or
>>>>>>>>> something.
>>>>>>>>>
>>>>>>>>> And yes, that's the documentation page I recommend -
>>>>>>>>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>>>>>>>>> Just set up HAProxy, and point your Java clients to its IP.
>>>>>>>>>
>>>>>>>>> The drawbacks to load-balancing on the java client side (yes, the
>>>>>>>>> cluster object) instead of a standalone load balancer like HAProxy
>>>>>>>>> are the following:
>>>>>>>>>
>>>>>>>>> 1) Adding a node means code changes (or at the very least, config
>>>>>>>>> file changes) rolled out to all your clients. Which turns out to be
>>>>>>>>> a pretty serious hassle. Instead, HAProxy allows you to add or
>>>>>>>>> remove nodes without changing any java code or config files.
>>>>>>>>>
>>>>>>>>> 2) Performance. We've run many tests to compare performance, and
>>>>>>>>> client-side load balancing results in significantly lower
>>>>>>>>> throughput than you'd have using haproxy (or nginx).
(Specifically, you actually want to
>>>>>>>>> use the 'leastconn' load balancing algorithm with HAProxy, instead
>>>>>>>>> of round robin.)
>>>>>>>>>
>>>>>>>>> 3) The health check on the client side (so that the java load
>>>>>>>>> balancer can tell when a remote node is down) is much less
>>>>>>>>> intelligent than a dedicated load balancer would provide. With
>>>>>>>>> something like HAProxy, you should be able to take down nodes with
>>>>>>>>> no ill effects for the client code.
>>>>>>>>>
>>>>>>>>> Now, if you load balance on the client side and you take a node
>>>>>>>>> down, it's not supposed to stop working completely. (I'm not sure
>>>>>>>>> why it's failing for you, we can investigate, but it'll be easier
>>>>>>>>> to just use a load balancer.) It should throw an error or two, but
>>>>>>>>> then start working again (on the retry).
>>>>>>>>>
>>>>>>>>> Dmitri
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 2:45 PM, Vanessa Williams <
>>>>>>>>> vanessa.willi...@thoughtwire.ca> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Dmitri, thanks for the quick reply.
>>>>>>>>>>
>>>>>>>>>> It was actually our sysadmin who tried the load balancer approach
>>>>>>>>>> and had no success, late last evening. However I haven't discussed
>>>>>>>>>> the gory details with him yet. The failure he saw was at the
>>>>>>>>>> application level (i.e. failure to read a key), but I don't know
>>>>>>>>>> a) how he set up the LB or b) what the Java exception was, if any.
>>>>>>>>>> I'll find that out in an hour or two and report back.
>>>>>>>>>>
>>>>>>>>>> I did find this article just now:
>>>>>>>>>>
>>>>>>>>>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>>>>>>>>>>
>>>>>>>>>> So I suppose we'll give those suggestions a try this morning.
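To illustrate why 'leastconn' beats round robin here, a stdlib-only sketch of the selection rule (the `LeastConn` class is a hypothetical toy, not HAProxy or the Riak client): new requests go to the backend with the fewest in-flight connections, so a slow node that accumulates connections automatically receives less new traffic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy leastconn balancer: pick the backend with the fewest active
// connections; ties go to the first-registered backend.
public class LeastConn {
    private final Map<String, Integer> activeConns = new LinkedHashMap<>();

    public void addBackend(String name) {
        activeConns.put(name, 0);
    }

    public String pick() {
        // choose the backend with the fewest in-flight connections
        String chosen = activeConns.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .orElseThrow()
                .getKey();
        activeConns.merge(chosen, 1, Integer::sum);
        return chosen;
    }

    public void release(String name) {
        // a request finished; that backend is less loaded now
        activeConns.merge(name, -1, Integer::sum);
    }

    public static void main(String[] args) {
        LeastConn lb = new LeastConn();
        lb.addBackend("riak1");
        lb.addBackend("riak2");
        String first = lb.pick();       // riak1 (both idle, first wins tie)
        lb.pick();                      // riak2 (riak1 has one in flight)
        lb.release(first);
        System.out.println(lb.pick());  // riak1 again: fewest connections
    }
}
```

In real HAProxy this is just `balance leastconn` in the backend section; the point of the sketch is only the selection rule.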
>>>>>>>>>>
>>>>>>>>>> What is the drawback to having the client connect to all 4 nodes
>>>>>>>>>> (the cluster client, I assume you mean)? My understanding from
>>>>>>>>>> reading articles I've found is that one of the nodes going away
>>>>>>>>>> causes that client to fail as well. Is that what you mean, or are
>>>>>>>>>> there other drawbacks as well?
>>>>>>>>>>
>>>>>>>>>> If there's anything else you can recommend, or links other than
>>>>>>>>>> the one above you can point me to, it would be much appreciated.
>>>>>>>>>> We expect both node failure and deliberate node removal for
>>>>>>>>>> upgrade, repair, replacement, etc.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Vanessa
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015 at 8:29 AM, Dmitri Zagidulin <
>>>>>>>>>> dzagidu...@basho.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Vanessa,
>>>>>>>>>>>
>>>>>>>>>>> Riak is definitely meant to run behind a load balancer. (Or, in
>>>>>>>>>>> the worst case, to be load-balanced on the client side. That is,
>>>>>>>>>>> all clients connect to all 4 nodes.)
>>>>>>>>>>>
>>>>>>>>>>> When you say "we did try putting all 4 Riak nodes behind a
>>>>>>>>>>> load-balancer and pointing the clients at it, but it didn't help"
>>>>>>>>>>> -- what do you mean exactly, by "it didn't help"? What happened
>>>>>>>>>>> when you tried using the load balancer?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 1:57 PM, Vanessa Williams <
>>>>>>>>>>> vanessa.willi...@thoughtwire.ca> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all, we are still (for a while longer) using Riak 1.4 and
>>>>>>>>>>>> the matching Java client. The client(s) connect to one node in
>>>>>>>>>>>> the cluster (since that's all it can do in this client version).
>>>>>>>>>>>> The cluster itself has 4 nodes (sorry, we can't use 5 in this
>>>>>>>>>>>> scenario). There are 2 separate clients.
>>>>>>>>>>>>
>>>>>>>>>>>> We've tried both n_val=3 and n_val=4. We achieve
>>>>>>>>>>>> consistency-by-writes by setting w=all. Therefore, we only
>>>>>>>>>>>> require one successful read (r=1).
>>>>>>>>>>>>
>>>>>>>>>>>> When all nodes are up, everything is fine. If one node fails,
>>>>>>>>>>>> the clients can no longer read any keys at all. There's an
>>>>>>>>>>>> exception like this:
>>>>>>>>>>>>
>>>>>>>>>>>> com.basho.riak.client.RiakRetryFailedException:
>>>>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>>>>>
>>>>>>>>>>>> Now, it isn't possible that Riak can't operate when one node
>>>>>>>>>>>> fails, so we're clearly missing something here.
>>>>>>>>>>>>
>>>>>>>>>>>> Note: we did try putting all 4 Riak nodes behind a
>>>>>>>>>>>> load-balancer and pointing the clients at it, but it didn't help.
>>>>>>>>>>>>
>>>>>>>>>>>> Riak is a high-availability key-value store, so... why are we
>>>>>>>>>>>> failing to achieve high availability? Any suggestions greatly
>>>>>>>>>>>> appreciated, and if more info is required I'll do my best to
>>>>>>>>>>>> provide it.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>> Vanessa
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Vanessa Williams
>>>>>>>>>>>> ThoughtWire Corporation
>>>>>>>>>>>> http://www.thoughtwire.com
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> riak-users mailing list
>>>>>>>>>>>> riak-users@lists.basho.com
>>>>>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
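The w=all, r=1 setup in the oldest message above can be checked with simple quorum arithmetic, sketched here stdlib-only (the `QuorumMath` class is a hypothetical illustration): read and write quorums must overlap (r + w > n) for a read to see the latest write, but w = n means a single down node blocks all writes, which matches the availability problem described.

```java
// Toy quorum arithmetic for an n-replica store.
public class QuorumMath {
    // true if any read quorum must intersect any write quorum
    public static boolean readSeesLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    // true if writes can still succeed with `down` nodes unavailable
    public static boolean writesAvailable(int n, int w, int down) {
        return n - down >= w;
    }

    public static void main(String[] args) {
        int n = 4, r = 1, w = n; // the w=all, r=1 setup from the thread
        System.out.println(readSeesLatestWrite(n, r, w)); // true: 1 + 4 > 4
        System.out.println(writesAvailable(n, w, 1));     // false: one down node blocks writes
    }
}
```

A middle ground like r=2, w=3 with n=4 keeps the overlap (2 + 3 > 4) while tolerating one node down for writes, which is in the spirit of Dmitri's r=2 suggestion.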