Re: Java Riak client can't handle a Riak node failure?

2016-02-22 Thread Vanessa Williams
Thanks very much for the advice. I'll give it a good test and then write
something. Somewhere. Cheers.

On Mon, Feb 22, 2016 at 3:42 PM, Alex Moore  wrote:

> If the contract is "Return true iff the object existed", then the second
> fetch is superfluous + so is the async example I posted.  You can use the
> code you had as-is.
>
> Thanks,
> Alex
>
> On Mon, Feb 22, 2016 at 1:23 PM, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
>> Hi Alex, would a second fetch just indicate that the object is *still*
>> deleted? Or that this delete operation succeeded? In other words, perhaps
>> my contract really is: return true if there was already a value there. In
>> that case, would the second fetch be superfluous?
>>
>> Thanks for your help.
>>
>> Vanessa
>>
>> On Mon, Feb 22, 2016 at 11:15 AM, Alex Moore  wrote:
>>
 That's the correct behaviour: it should return true iff a value was
 actually deleted.
>>>
>>>
>>> Ok, if that's the case you should do another FetchValue after the
>>> deletion (to update the response.hasValues() field), or use the async
>>> version of the delete function. I also noticed that we weren't passing the
>>> vclock to the Delete function, so I added that here as well:
>>>
>>> public boolean delete(String key) throws ExecutionException, InterruptedException {
>>>
>>>     // fetch in order to get the causal context
>>>     FetchValue.Response response = fetchValue(key);
>>>
>>>     if (response.isNotFound())
>>>     {
>>>         return ???; // what do we return if it doesn't exist?
>>>     }
>>>
>>>     DeleteValue deleteValue = new DeleteValue.Builder(new Location(namespace, key))
>>>             .withVClock(response.getVectorClock())
>>>             .build();
>>>
>>>     final RiakFuture deleteFuture = client.executeAsync(deleteValue);
>>>
>>>     deleteFuture.await();
>>>
>>>     if (deleteFuture.isSuccess())
>>>     {
>>>         return true;
>>>     }
>>>     else
>>>     {
>>>         deleteFuture.cause(); // cause of failure
>>>         return false;
>>>     }
>>> }
>>>
>>>
>>> Thanks,
>>> Alex
>>>
>>> On Mon, Feb 22, 2016 at 10:48 AM, Vanessa Williams <
>>> vanessa.willi...@thoughtwire.ca> wrote:
>>>
 See inline:

 On Mon, Feb 22, 2016 at 10:31 AM, Alex Moore  wrote:

> Hi Vanessa,
>
> You might have a problem with your delete function (depending on its
> return value).
> What does the return value of the delete() function indicate?  Right
> now if an object existed, and was deleted, the function will return true,
> and will only return false if the object didn't exist or only consisted of
> tombstones.
>


 That's the correct behaviour: it should return true iff a value was
 actually deleted.


> If you never look at the object value returned by your fetchValue(key) 
> function, another potential optimization you could make is to only return 
> the HEAD / metadata:
>
> FetchValue fv = new FetchValue.Builder(new Location(new Namespace("some_bucket"), key))
>         .withOption(FetchValue.Option.HEAD, true)
>         .build();
>
> This would be more efficient because Riak won't have to send you the
> values over the wire, if you only need the metadata.
>
>
 Thanks, I'll clean that up.


> If you do write this up somewhere, share the link! :)
>

 Will do!

 Regards,
 Vanessa


>
> Thanks,
> Alex
>
>
> On Mon, Feb 22, 2016 at 6:23 AM, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
>> Hi Dmitri, this thread is old, but I read this part of your answer
>> carefully:
>>
>> You can use the following strategies to prevent stale values, in
>>> increasing order of security/preference:
>>> 1) Use timestamps (and not pass in vector clocks/causal context).
>>> This is ok if you're not editing objects, or you're ok with a bit of 
>>> risk
>>> of stale values.
>>> 2) Use causal context correctly (which means, read-before-you-write
>>> -- in fact, the Update operation in the java client does this for you, I
>>> think). And if Riak can't determine which version is correct, it will 
>>> fall
>>> back on timestamps.
>>> 3) Turn on siblings, for that bucket or bucket type.  That way, Riak
>>> will still try to use causal context to decide the right value. But if 
>>> it
>>> can't decide, it will store BOTH values, and give them back to you on 
>>> the
>>> next read, so that your application can decide which is the correct one.
>>
>>
>> I decided on strategy #2. What I am hoping for is some validation
>> that the code we use to "get", "put", and "delete" 

Re: Java Riak client can't handle a Riak node failure?

2016-02-22 Thread Alex Moore
If the contract is "Return true iff the object existed", then the second
fetch is superfluous + so is the async example I posted.  You can use the
code you had as-is.

Thanks,
Alex

On Mon, Feb 22, 2016 at 1:23 PM, Vanessa Williams <
vanessa.willi...@thoughtwire.ca> wrote:

> Hi Alex, would a second fetch just indicate that the object is *still*
> deleted? Or that this delete operation succeeded? In other words, perhaps
> my contract really is: return true if there was already a value there. In
> that case, would the second fetch be superfluous?
>
> Thanks for your help.
>
> Vanessa
>
> On Mon, Feb 22, 2016 at 11:15 AM, Alex Moore  wrote:
>
>>> That's the correct behaviour: it should return true iff a value was
>>> actually deleted.
>>
>>
>> Ok, if that's the case you should do another FetchValue after the
>> deletion (to update the response.hasValues() field), or use the async
>> version of the delete function. I also noticed that we weren't passing the
>> vclock to the Delete function, so I added that here as well:
>>
>> public boolean delete(String key) throws ExecutionException, InterruptedException {
>>
>>     // fetch in order to get the causal context
>>     FetchValue.Response response = fetchValue(key);
>>
>>     if (response.isNotFound())
>>     {
>>         return ???; // what do we return if it doesn't exist?
>>     }
>>
>>     DeleteValue deleteValue = new DeleteValue.Builder(new Location(namespace, key))
>>             .withVClock(response.getVectorClock())
>>             .build();
>>
>>     final RiakFuture deleteFuture = client.executeAsync(deleteValue);
>>
>>     deleteFuture.await();
>>
>>     if (deleteFuture.isSuccess())
>>     {
>>         return true;
>>     }
>>     else
>>     {
>>         deleteFuture.cause(); // cause of failure
>>         return false;
>>     }
>> }
>>
>>
>> Thanks,
>> Alex
>>
>> On Mon, Feb 22, 2016 at 10:48 AM, Vanessa Williams <
>> vanessa.willi...@thoughtwire.ca> wrote:
>>
>>> See inline:
>>>
>>> On Mon, Feb 22, 2016 at 10:31 AM, Alex Moore  wrote:
>>>
 Hi Vanessa,

 You might have a problem with your delete function (depending on its
 return value).
 What does the return value of the delete() function indicate?  Right
 now if an object existed, and was deleted, the function will return true,
 and will only return false if the object didn't exist or only consisted of
 tombstones.

>>>
>>>
>>> That's the correct behaviour: it should return true iff a value was
>>> actually deleted.
>>>
>>>
 If you never look at the object value returned by your fetchValue(key) 
 function, another potential optimization you could make is to only return 
 the HEAD / metadata:

 FetchValue fv = new FetchValue.Builder(new Location(new Namespace("some_bucket"), key))
         .withOption(FetchValue.Option.HEAD, true)
         .build();

 This would be more efficient because Riak won't have to send you the
 values over the wire, if you only need the metadata.


>>> Thanks, I'll clean that up.
>>>
>>>
 If you do write this up somewhere, share the link! :)

>>>
>>> Will do!
>>>
>>> Regards,
>>> Vanessa
>>>
>>>

 Thanks,
 Alex


 On Mon, Feb 22, 2016 at 6:23 AM, Vanessa Williams <
 vanessa.willi...@thoughtwire.ca> wrote:

> Hi Dmitri, this thread is old, but I read this part of your answer
> carefully:
>
> You can use the following strategies to prevent stale values, in
>> increasing order of security/preference:
>> 1) Use timestamps (and not pass in vector clocks/causal context).
>> This is ok if you're not editing objects, or you're ok with a bit of risk
>> of stale values.
>> 2) Use causal context correctly (which means, read-before-you-write
>> -- in fact, the Update operation in the java client does this for you, I
>> think). And if Riak can't determine which version is correct, it will 
>> fall
>> back on timestamps.
>> 3) Turn on siblings, for that bucket or bucket type.  That way, Riak
>> will still try to use causal context to decide the right value. But if it
>> can't decide, it will store BOTH values, and give them back to you on the
>> next read, so that your application can decide which is the correct one.
>
>
> I decided on strategy #2. What I am hoping for is some validation that
> the code we use to "get", "put", and "delete" is correct in that context,
> or if it could be simplified in some cases. Note we are using delete-mode
> "immediate" and no duplicates.
>
> In their shortest possible forms, here are the three methods I'd like
> some feedback on (note, they're being used in production and haven't caused
> any problems yet; however, we have 

Re: Java Riak client can't handle a Riak node failure?

2016-02-22 Thread Vanessa Williams
Hi Alex, would a second fetch just indicate that the object is *still*
deleted? Or that this delete operation succeeded? In other words, perhaps
my contract really is: return true if there was already a value there. In
that case, would the second fetch be superfluous?

Thanks for your help.

Vanessa

On Mon, Feb 22, 2016 at 11:15 AM, Alex Moore  wrote:

>> That's the correct behaviour: it should return true iff a value was
>> actually deleted.
>
>
> Ok, if that's the case you should do another FetchValue after the deletion
> (to update the response.hasValues() field), or use the async version of
> the delete function. I also noticed that we weren't passing the vclock to
> the Delete function, so I added that here as well:
>
> public boolean delete(String key) throws ExecutionException, InterruptedException {
>
>     // fetch in order to get the causal context
>     FetchValue.Response response = fetchValue(key);
>
>     if (response.isNotFound())
>     {
>         return ???; // what do we return if it doesn't exist?
>     }
>
>     DeleteValue deleteValue = new DeleteValue.Builder(new Location(namespace, key))
>             .withVClock(response.getVectorClock())
>             .build();
>
>     final RiakFuture deleteFuture = client.executeAsync(deleteValue);
>
>     deleteFuture.await();
>
>     if (deleteFuture.isSuccess())
>     {
>         return true;
>     }
>     else
>     {
>         deleteFuture.cause(); // cause of failure
>         return false;
>     }
> }
>
>
> Thanks,
> Alex
>
> On Mon, Feb 22, 2016 at 10:48 AM, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
>> See inline:
>>
>> On Mon, Feb 22, 2016 at 10:31 AM, Alex Moore  wrote:
>>
>>> Hi Vanessa,
>>>
>>> You might have a problem with your delete function (depending on its
>>> return value).
>>> What does the return value of the delete() function indicate?  Right now
>>> if an object existed, and was deleted, the function will return true, and
>>> will only return false if the object didn't exist or only consisted of
>>> tombstones.
>>>
>>
>>
>> That's the correct behaviour: it should return true iff a value was
>> actually deleted.
>>
>>
>>> If you never look at the object value returned by your fetchValue(key) 
>>> function, another potential optimization you could make is to only return 
>>> the HEAD / metadata:
>>>
>>> FetchValue fv = new FetchValue.Builder(new Location(new Namespace("some_bucket"), key))
>>>         .withOption(FetchValue.Option.HEAD, true)
>>>         .build();
>>>
>>> This would be more efficient because Riak won't have to send you the
>>> values over the wire, if you only need the metadata.
>>>
>>>
>> Thanks, I'll clean that up.
>>
>>
>>> If you do write this up somewhere, share the link! :)
>>>
>>
>> Will do!
>>
>> Regards,
>> Vanessa
>>
>>
>>>
>>> Thanks,
>>> Alex
>>>
>>>
>>> On Mon, Feb 22, 2016 at 6:23 AM, Vanessa Williams <
>>> vanessa.willi...@thoughtwire.ca> wrote:
>>>
 Hi Dmitri, this thread is old, but I read this part of your answer
 carefully:

 You can use the following strategies to prevent stale values, in
> increasing order of security/preference:
> 1) Use timestamps (and not pass in vector clocks/causal context). This
> is ok if you're not editing objects, or you're ok with a bit of risk of
> stale values.
> 2) Use causal context correctly (which means, read-before-you-write --
> in fact, the Update operation in the java client does this for you, I
> think). And if Riak can't determine which version is correct, it will fall
> back on timestamps.
> 3) Turn on siblings, for that bucket or bucket type.  That way, Riak
> will still try to use causal context to decide the right value. But if it
> can't decide, it will store BOTH values, and give them back to you on the
> next read, so that your application can decide which is the correct one.


 I decided on strategy #2. What I am hoping for is some validation that
 the code we use to "get", "put", and "delete" is correct in that context,
 or if it could be simplified in some cases. Note we are using delete-mode
 "immediate" and no duplicates.

 In their shortest possible forms, here are the three methods I'd like
 some feedback on (note, they're being used in production and haven't caused
 any problems yet; however, we have very few writes in production, so the lack
 of problems doesn't support the conclusion that the implementation is
 correct.) Note all argument-checking, exception-handling, and logging
 removed for clarity. *I'm mostly concerned about correct use of causal
 context and response.isNotFound and response.hasValues. *Is there
 anything I could/should have left out?

 public RiakClient(String name,
 

Re: Java Riak client can't handle a Riak node failure?

2016-02-22 Thread Alex Moore
>
> That's the correct behaviour: it should return true iff a value was
> actually deleted.


Ok, if that's the case you should do another FetchValue after the deletion
(to update the response.hasValues() field), or use the async version of the
delete function. I also noticed that we weren't passing the vclock to the
Delete function, so I added that here as well:

public boolean delete(String key) throws ExecutionException, InterruptedException {

    // fetch in order to get the causal context
    FetchValue.Response response = fetchValue(key);

    if (response.isNotFound())
    {
        return ???; // what do we return if it doesn't exist?
    }

    DeleteValue deleteValue = new DeleteValue.Builder(new Location(namespace, key))
            .withVClock(response.getVectorClock())
            .build();

    final RiakFuture deleteFuture = client.executeAsync(deleteValue);

    deleteFuture.await();

    if (deleteFuture.isSuccess())
    {
        return true;
    }
    else
    {
        deleteFuture.cause(); // cause of failure
        return false;
    }
}
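
A minimal sketch of the async, listener-based variant mentioned above; it
assumes the 2.x client's RiakFutureListener callback interface and that
DeleteValue's future is a RiakFuture<Void, Location> (both are assumptions,
not part of the original post):

deleteFuture.addListener(new RiakFutureListener<Void, Location>() {
    @Override
    public void handle(RiakFuture<Void, Location> future) {
        // runs when the delete completes, instead of blocking on await()
        if (!future.isSuccess()) {
            future.cause().printStackTrace(); // inspect the failure reason
        }
    }
});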


Thanks,
Alex

On Mon, Feb 22, 2016 at 10:48 AM, Vanessa Williams <
vanessa.willi...@thoughtwire.ca> wrote:

> See inline:
>
> On Mon, Feb 22, 2016 at 10:31 AM, Alex Moore  wrote:
>
>> Hi Vanessa,
>>
>> You might have a problem with your delete function (depending on its
>> return value).
>> What does the return value of the delete() function indicate?  Right now
>> if an object existed, and was deleted, the function will return true, and
>> will only return false if the object didn't exist or only consisted of
>> tombstones.
>>
>
>
> That's the correct behaviour: it should return true iff a value was
> actually deleted.
>
>
>> If you never look at the object value returned by your fetchValue(key) 
>> function, another potential optimization you could make is to only return 
>> the HEAD / metadata:
>>
>> FetchValue fv = new FetchValue.Builder(new Location(new Namespace("some_bucket"), key))
>>         .withOption(FetchValue.Option.HEAD, true)
>>         .build();
>>
>> This would be more efficient because Riak won't have to send you the
>> values over the wire, if you only need the metadata.
>>
>>
> Thanks, I'll clean that up.
>
>
>> If you do write this up somewhere, share the link! :)
>>
>
> Will do!
>
> Regards,
> Vanessa
>
>
>>
>> Thanks,
>> Alex
>>
>>
>> On Mon, Feb 22, 2016 at 6:23 AM, Vanessa Williams <
>> vanessa.willi...@thoughtwire.ca> wrote:
>>
>>> Hi Dmitri, this thread is old, but I read this part of your answer
>>> carefully:
>>>
>>> You can use the following strategies to prevent stale values, in
 increasing order of security/preference:
 1) Use timestamps (and not pass in vector clocks/causal context). This
 is ok if you're not editing objects, or you're ok with a bit of risk of
 stale values.
 2) Use causal context correctly (which means, read-before-you-write --
 in fact, the Update operation in the java client does this for you, I
 think). And if Riak can't determine which version is correct, it will fall
 back on timestamps.
 3) Turn on siblings, for that bucket or bucket type.  That way, Riak
 will still try to use causal context to decide the right value. But if it
 can't decide, it will store BOTH values, and give them back to you on the
 next read, so that your application can decide which is the correct one.
>>>
>>>
>>> I decided on strategy #2. What I am hoping for is some validation that
>>> the code we use to "get", "put", and "delete" is correct in that context,
>>> or if it could be simplified in some cases. Note we are using delete-mode
>>> "immediate" and no duplicates.
>>>
>>> In their shortest possible forms, here are the three methods I'd like
>>> some feedback on (note, they're being used in production and haven't caused
>>> any problems yet; however, we have very few writes in production, so the lack
>>> of problems doesn't support the conclusion that the implementation is
>>> correct.) Note all argument-checking, exception-handling, and logging
>>> removed for clarity. *I'm mostly concerned about correct use of causal
>>> context and response.isNotFound and response.hasValues. *Is there
>>> anything I could/should have left out?
>>>
>>> public RiakClient(String name, com.basho.riak.client.api.RiakClient client)
>>> {
>>>     this.name = name;
>>>     this.namespace = new Namespace(name);
>>>     this.client = client;
>>> }
>>>
>>> public byte[] get(String key) throws ExecutionException, InterruptedException {
>>>
>>>     FetchValue.Response response = fetchValue(key);
>>>     if (!response.isNotFound())
>>>     {
>>>         RiakObject riakObject = response.getValue(RiakObject.class);
>>>         return riakObject.getValue().getValue();
>>>     }
>>>     return null;
>>> }
>>>
>>> public void 

Re: Java Riak client can't handle a Riak node failure?

2016-02-22 Thread Vanessa Williams
See inline:

On Mon, Feb 22, 2016 at 10:31 AM, Alex Moore  wrote:

> Hi Vanessa,
>
> You might have a problem with your delete function (depending on its
> return value).
> What does the return value of the delete() function indicate?  Right now
> if an object existed, and was deleted, the function will return true, and
> will only return false if the object didn't exist or only consisted of
> tombstones.
>


That's the correct behaviour: it should return true iff a value was
actually deleted.


> If you never look at the object value returned by your fetchValue(key) 
> function, another potential optimization you could make is to only return the 
> HEAD / metadata:
>
> FetchValue fv = new FetchValue.Builder(new Location(new Namespace("some_bucket"), key))
>         .withOption(FetchValue.Option.HEAD, true)
>         .build();
>
> This would be more efficient because Riak won't have to send you the
> values over the wire, if you only need the metadata.
>
>
Thanks, I'll clean that up.


> If you do write this up somewhere, share the link! :)
>

Will do!

Regards,
Vanessa


>
> Thanks,
> Alex
>
>
> On Mon, Feb 22, 2016 at 6:23 AM, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
>> Hi Dmitri, this thread is old, but I read this part of your answer
>> carefully:
>>
>> You can use the following strategies to prevent stale values, in
>>> increasing order of security/preference:
>>> 1) Use timestamps (and not pass in vector clocks/causal context). This
>>> is ok if you're not editing objects, or you're ok with a bit of risk of
>>> stale values.
>>> 2) Use causal context correctly (which means, read-before-you-write --
>>> in fact, the Update operation in the java client does this for you, I
>>> think). And if Riak can't determine which version is correct, it will fall
>>> back on timestamps.
>>> 3) Turn on siblings, for that bucket or bucket type.  That way, Riak
>>> will still try to use causal context to decide the right value. But if it
>>> can't decide, it will store BOTH values, and give them back to you on the
>>> next read, so that your application can decide which is the correct one.
>>
>>
>> I decided on strategy #2. What I am hoping for is some validation that
>> the code we use to "get", "put", and "delete" is correct in that context,
>> or if it could be simplified in some cases. Note we are using delete-mode
>> "immediate" and no duplicates.
>>
>> In their shortest possible forms, here are the three methods I'd like
>> some feedback on (note, they're being used in production and haven't caused
>> any problems yet; however, we have very few writes in production, so the lack
>> of problems doesn't support the conclusion that the implementation is
>> correct.) Note all argument-checking, exception-handling, and logging
>> removed for clarity. *I'm mostly concerned about correct use of causal
>> context and response.isNotFound and response.hasValues. *Is there
>> anything I could/should have left out?
>>
>> public RiakClient(String name, com.basho.riak.client.api.RiakClient client)
>> {
>>     this.name = name;
>>     this.namespace = new Namespace(name);
>>     this.client = client;
>> }
>>
>> public byte[] get(String key) throws ExecutionException, InterruptedException {
>>
>>     FetchValue.Response response = fetchValue(key);
>>     if (!response.isNotFound())
>>     {
>>         RiakObject riakObject = response.getValue(RiakObject.class);
>>         return riakObject.getValue().getValue();
>>     }
>>     return null;
>> }
>>
>> public void put(String key, byte[] value) throws ExecutionException, InterruptedException {
>>
>>     // fetch in order to get the causal context
>>     FetchValue.Response response = fetchValue(key);
>>     RiakObject storeObject = new RiakObject()
>>             .setValue(BinaryValue.create(value))
>>             .setContentType("binary/octet-stream");
>>     StoreValue.Builder builder =
>>             new StoreValue.Builder(storeObject).withLocation(new Location(namespace, key));
>>     if (response.getVectorClock() != null) {
>>         builder = builder.withVectorClock(response.getVectorClock());
>>     }
>>     StoreValue storeValue = builder.build();
>>     client.execute(storeValue);
>> }
>>
>> public boolean delete(String key) throws ExecutionException, InterruptedException {
>>
>>     // fetch in order to get the causal context
>>     FetchValue.Response response = fetchValue(key);
>>     if (!response.isNotFound())
>>     {
>>         DeleteValue deleteValue = new DeleteValue.Builder(new Location(namespace, key)).build();
>>         client.execute(deleteValue);
>>     }
>>     return !response.isNotFound() || !response.hasValues();
>> }
>>
>>
>> Any comments much appreciated. I want to provide a minimally correct
>> example of simple client code 

Re: Java Riak client can't handle a Riak node failure?

2016-02-22 Thread Alex Moore
Hi Vanessa,

You might have a problem with your delete function (depending on its
return value).
What does the return value of the delete() function indicate?  Right now if
an object existed, and was deleted, the function will return true, and will
only return false if the object didn't exist or only consisted of
tombstones.

If you never look at the object value returned by your fetchValue(key)
function, another potential optimization you could make is to only
return the HEAD / metadata:

FetchValue fv = new FetchValue.Builder(new Location(new Namespace("some_bucket"), key))
        .withOption(FetchValue.Option.HEAD, true)
        .build();

This would be more efficient because Riak won't have to send you the values
over the wire, if you only need the metadata.
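
For instance, a fetchValue(key) helper could pass that option through; this
is a minimal sketch assuming the same namespace and client fields as the
other examples in this thread (note the get() path would still need a full,
non-HEAD fetch, since it returns the value bytes):

private FetchValue.Response fetchValue(String key) throws ExecutionException, InterruptedException {
    FetchValue fv = new FetchValue.Builder(new Location(namespace, key))
            .withOption(FetchValue.Option.HEAD, true) // metadata and vclock only, no value
            .build();
    return client.execute(fv);
}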

If you do write this up somewhere, share the link! :)

Thanks,
Alex


On Mon, Feb 22, 2016 at 6:23 AM, Vanessa Williams <
vanessa.willi...@thoughtwire.ca> wrote:

> Hi Dmitri, this thread is old, but I read this part of your answer
> carefully:
>
> You can use the following strategies to prevent stale values, in
>> increasing order of security/preference:
>> 1) Use timestamps (and not pass in vector clocks/causal context). This is
>> ok if you're not editing objects, or you're ok with a bit of risk of stale
>> values.
>> 2) Use causal context correctly (which means, read-before-you-write -- in
>> fact, the Update operation in the java client does this for you, I think).
>> And if Riak can't determine which version is correct, it will fall back on
>> timestamps.
>> 3) Turn on siblings, for that bucket or bucket type.  That way, Riak will
>> still try to use causal context to decide the right value. But if it can't
>> decide, it will store BOTH values, and give them back to you on the next
>> read, so that your application can decide which is the correct one.
>
>
> I decided on strategy #2. What I am hoping for is some validation that the
> code we use to "get", "put", and "delete" is correct in that context, or if
> it could be simplified in some cases. Note we are using delete-mode
> "immediate" and no duplicates.
>
> In their shortest possible forms, here are the three methods I'd like some
> feedback on (note, they're being used in production and haven't caused any
> problems yet; however, we have very few writes in production, so the lack of
> problems doesn't support the conclusion that the implementation is
> correct.) Note all argument-checking, exception-handling, and logging
> removed for clarity. *I'm mostly concerned about correct use of causal
> context and response.isNotFound and response.hasValues. *Is there
> anything I could/should have left out?
>
> public RiakClient(String name, com.basho.riak.client.api.RiakClient client)
> {
>     this.name = name;
>     this.namespace = new Namespace(name);
>     this.client = client;
> }
>
> public byte[] get(String key) throws ExecutionException, InterruptedException {
>
>     FetchValue.Response response = fetchValue(key);
>     if (!response.isNotFound())
>     {
>         RiakObject riakObject = response.getValue(RiakObject.class);
>         return riakObject.getValue().getValue();
>     }
>     return null;
> }
>
> public void put(String key, byte[] value) throws ExecutionException, InterruptedException {
>
>     // fetch in order to get the causal context
>     FetchValue.Response response = fetchValue(key);
>     RiakObject storeObject = new RiakObject()
>             .setValue(BinaryValue.create(value))
>             .setContentType("binary/octet-stream");
>     StoreValue.Builder builder =
>             new StoreValue.Builder(storeObject).withLocation(new Location(namespace, key));
>     if (response.getVectorClock() != null) {
>         builder = builder.withVectorClock(response.getVectorClock());
>     }
>     StoreValue storeValue = builder.build();
>     client.execute(storeValue);
> }
>
> public boolean delete(String key) throws ExecutionException, InterruptedException {
>
>     // fetch in order to get the causal context
>     FetchValue.Response response = fetchValue(key);
>     if (!response.isNotFound())
>     {
>         DeleteValue deleteValue = new DeleteValue.Builder(new Location(namespace, key)).build();
>         client.execute(deleteValue);
>     }
>     return !response.isNotFound() || !response.hasValues();
> }
>
>
> Any comments much appreciated. I want to provide a minimally correct
> example of simple client code somewhere (GitHub, blog post, something...)
> so I don't want to post this without review.
>
> Thanks,
> Vanessa
>
> ThoughtWire Corporation
> http://www.thoughtwire.com
>
>
>
>
> On Thu, Oct 8, 2015 at 8:45 AM, Dmitri Zagidulin 
> wrote:
>
>> Hi Vanessa,
>>
>> The thing to keep in mind about read repair is -- it happens
>> asynchronously on every GET, but 

Re: Java Riak client can't handle a Riak node failure?

2016-02-22 Thread Vanessa Williams
Hi Dmitri, this thread is old, but I read this part of your answer
carefully:

You can use the following strategies to prevent stale values, in increasing
> order of security/preference:
> 1) Use timestamps (and not pass in vector clocks/causal context). This is
> ok if you're not editing objects, or you're ok with a bit of risk of stale
> values.
> 2) Use causal context correctly (which means, read-before-you-write -- in
> fact, the Update operation in the java client does this for you, I think).
> And if Riak can't determine which version is correct, it will fall back on
> timestamps.
> 3) Turn on siblings, for that bucket or bucket type.  That way, Riak will
> still try to use causal context to decide the right value. But if it can't
> decide, it will store BOTH values, and give them back to you on the next
> read, so that your application can decide which is the correct one.


I decided on strategy #2. What I am hoping for is some validation that the
code we use to "get", "put", and "delete" is correct in that context, or if
it could be simplified in some cases. Note we are using delete-mode
"immediate" and no duplicates.

In their shortest possible forms, here are the three methods I'd like some
feedback on (note, they're being used in production and haven't caused any
problems yet; however, we have very few writes in production, so the lack of
problems doesn't support the conclusion that the implementation is
correct.) Note all argument-checking, exception-handling, and logging
removed for clarity. *I'm mostly concerned about correct use of causal
context and response.isNotFound and response.hasValues. *Is there anything
I could/should have left out?

public RiakClient(String name, com.basho.riak.client.api.RiakClient client)
{
    this.name = name;
    this.namespace = new Namespace(name);
    this.client = client;
}

public byte[] get(String key) throws ExecutionException, InterruptedException {

    FetchValue.Response response = fetchValue(key);
    if (!response.isNotFound())
    {
        RiakObject riakObject = response.getValue(RiakObject.class);
        return riakObject.getValue().getValue();
    }
    return null;
}

public void put(String key, byte[] value) throws ExecutionException, InterruptedException {

    // fetch in order to get the causal context
    FetchValue.Response response = fetchValue(key);
    RiakObject storeObject = new RiakObject()
            .setValue(BinaryValue.create(value))
            .setContentType("binary/octet-stream");
    StoreValue.Builder builder =
            new StoreValue.Builder(storeObject).withLocation(new Location(namespace, key));
    if (response.getVectorClock() != null) {
        builder = builder.withVectorClock(response.getVectorClock());
    }
    StoreValue storeValue = builder.build();
    client.execute(storeValue);
}

public boolean delete(String key) throws ExecutionException, InterruptedException {

    // fetch in order to get the causal context
    FetchValue.Response response = fetchValue(key);
    if (!response.isNotFound())
    {
        DeleteValue deleteValue = new DeleteValue.Builder(new Location(namespace, key)).build();
        client.execute(deleteValue);
    }
    return !response.isNotFound() || !response.hasValues();
}
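
Pulling the suggestions from elsewhere in this thread together, here is a
hedged sketch of what delete() could look like with a HEAD fetch and the
vclock passed through, under the "return true iff the object existed"
contract discussed in the follow-ups (the sketch itself is not from any
original post):

public boolean delete(String key) throws ExecutionException, InterruptedException {

    // HEAD fetch: we only need the causal context, not the value bytes
    FetchValue fetch = new FetchValue.Builder(new Location(namespace, key))
            .withOption(FetchValue.Option.HEAD, true)
            .build();
    FetchValue.Response response = client.execute(fetch);

    if (response.isNotFound())
    {
        return false; // nothing existed, so nothing was deleted
    }

    DeleteValue deleteValue = new DeleteValue.Builder(new Location(namespace, key))
            .withVClock(response.getVectorClock()) // pass the causal context to the delete
            .build();
    client.execute(deleteValue);
    return true; // a value existed and a delete was issued
}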


Any comments much appreciated. I want to provide a minimally correct
example of simple client code somewhere (GitHub, blog post, something...)
so I don't want to post this without review.

Thanks,
Vanessa

ThoughtWire Corporation
http://www.thoughtwire.com




On Thu, Oct 8, 2015 at 8:45 AM, Dmitri Zagidulin 
wrote:

> Hi Vanessa,
>
> The thing to keep in mind about read repair is -- it happens
> asynchronously on every GET, but /after/ the results are returned to the
> client.
>
> So, when you issue a GET with r=1, the coordinating node only waits for 1
> of the replicas before responding to the client with a success, and only
> afterwards triggers read-repair.
>
> It's true that with notfound_ok=false, it'll wait for the first
> non-missing replica before responding. But if you edit or update your
> objects at all, an R=1 still gives you a risk of stale values being
> returned.
>
> For example, say you write an object with value A.  And let's say your 3
> replicas now look like this:
>
> replica 1: A,  replica 2: A, replica 3: notfound/missing
>
> A read with an R=1 and notfound_ok=false is just fine, here. (Chances are,
> the notfound replica will arrive first, but the notfound_ok setting will
> force the coordinator to wait for the first non-empty value, A, and return
> it to the client. And then trigger read-repair).
>
> But what happens if you edit that same object, and give it a new value,
> B?  So, now, there's a chance that your replicas will look like this:
>
> replica 1: A, replica 2: B, replica 3: B.
>
> So now if you do a read with an R=1, there's a chance 

Re: Java Riak client can't handle a Riak node failure?

2015-10-13 Thread Vanessa Williams
Alexander, thanks for that reminder. Yes, n_val = 2 would suit us better.
I'll look into getting that tested.

Regards,
Vanessa

On Thu, Oct 8, 2015 at 1:04 PM, Alexander Sicular 
wrote:

> Greetings and salutations, Vanessa.
>
> I am obliged to point out that running n_val of 3 (or more) is highly
> detrimental in a cluster size smaller than 5 nodes. Needless to say, it is
> not recommended. Let's talk about why that is the case for a moment.
>
> The default level of abstraction in Riak is the virtual node, or vnode.
> The vnode represents a segment of the ring (ring_size, configurable in
> powers of 2; default 64, min 8, max 1024). The "ring" is the number
> line 0 - 2^160, which represents the output of the SHA hash.
>
> Riak achieves high availability through replication. Riak replicates data
> to vnodes. A hypothetical replica set may be, for example, set[vnode 1,
> vnode 10, vnode 20]. Note, I said vnodes. Not physical nodes. And therein
> lies the concern. Considering a default ring size of 64 and a default
> replica count of 3, the minimum recommended production deployment of a Riak
> cluster should be 5 due to the fact that in that circumstance every replica
> set combination is guaranteed to have each vnode on a distinct physical
> node. Anything less than that will certainly have some combinations of
> replica sets which have two of its copies on the same physical host. Note I
> said some combinations. Some fraction of node replica set combinations will
> have two of their copies allocated to one physical machine.
>
> You can see where I'm going with this. Firstly, performance will be
> negatively impacted when writing more than one copy of data to the same
> physical hardware, aka disk. But more importantly, you are negating Riak's
> high availability mechanic. If you lost any given physical node you would
> lose access to two copies of the set of data which had two replicas on that
> node.
>
> Riak is designed to withstand loss of any two physical nodes while
> maintaining access to 100% of your corpus, assuming you are
> running the default settings and have deployed 5 nodes.
>
> Here is the rule of thumb that I recommend (me personally, not Basho) to
> folks looking to deploy clusters with less than 5 nodes:
>
> 1,2 nodes: n_val 1
> 3,4 nodes: n_val 2
> 5+ nodes: n_val 3
>
> In summary, please consider reconfiguring your production deployment.
>
> Sincerely,
> Alexander
>
> @siculars
> http://siculars.posthaven.com
>
> Sent from my iRotaryPhone
>
> On Oct 7, 2015, at 19:56, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
> Hi Dmitri, what would be the benefit of r=2, exactly? It isn't necessary
> to trigger read-repair, is it? If it's important I'd rather try it sooner
> than later...
>
> Regards,
> Vanessa
>
>
>
> On Wed, Oct 7, 2015 at 4:02 PM, Dmitri Zagidulin 
> wrote:
>
>> Glad you sorted it out!
>>
>> (I do want to encourage you to bump your R setting to at least 2, though.
>> Run some tests -- I think you'll find that the difference in speed will not
>> be noticeable, but you do get a lot more data resilience with 2.)
>>
>> On Wed, Oct 7, 2015 at 6:24 PM, Vanessa Williams <
>> vanessa.willi...@thoughtwire.ca> wrote:
>>
>>> Hi Dmitri, well...we solved our problem to our satisfaction but it
>>> turned out to be something unexpected.
>>>
>>> The keys were two properties mentioned in a blog post on "configuring
>>> Riak’s oft-subtle behavioral characteristics":
>>> http://basho.com/posts/technical/riaks-config-behaviors-part-4/
>>>
>>> notfound_ok = false
>>> basic_quorum = true
>>>
>>> The 2nd one just makes things a little faster, but the first one is the
>>> one whose default value of true was killing us.
>>>
>>> With r=1 and notfound_ok=true (the default), if the first node to respond
>>> didn't find the requested key, the authoritative answer was "this key is
>>> not found". Not what we were expecting at all.
>>>
>>> With the changed settings, it will wait for a quorum of responses and
>>> only if *no one* finds the key will "not found" be returned. Perfect.
>>> (Without this setting it would wait for all responses, not ideal.)
>>>
>>> Now there is only one snag, which is that if the Riak node the client
>>> connects to goes down, there will be no communication and we have a
>>> problem. This is easily solvable with a load-balancer, though for
>>> complicated reasons we actually don't need to do that right now. It's just
>>> acceptable for us temporarily. Later, we'll get the load-balancer working
>>> and even that won't be a problem.
>>>
>>> I *think* we're ok now. Thanks for your help!
>>>
>>> Regards,
>>> Vanessa
>>>
>>>
>>>
>>> On Wed, Oct 7, 2015 at 9:33 AM, Dmitri Zagidulin 
>>> wrote:
>>>
 Yeah, definitely find out what the sysadmin's experience was, with the
 load balancer. It could have just been a wrong configuration or something.

 And yes, that's the 

Re: Java Riak client can't handle a Riak node failure?

2015-10-13 Thread Vanessa Williams
Hi Dmitri, your point about r=2 is noted. I'll probably go with that. The
thing I have to decide is how to reconcile duplicates. For the time being I
can tolerate some stale data/inconsistency (caused by r=1). But for the
future I would prefer not to risk that.

Thanks to everyone for their help with the gory details. I'll be upgrading
from Riak 1.4.10 to Riak 2.1.2 (or whatever the latest is) shortly and I'll
take all of these points into consideration at that time.

Best regards,
Vanessa


Vanessa Williams
ThoughtWire Corporation
http://www.thoughtwire.com

On Thu, Oct 8, 2015 at 8:45 AM, Dmitri Zagidulin 
wrote:

> Hi Vanessa,
>
> The thing to keep in mind about read repair is -- it happens
> asynchronously on every GET, but /after/ the results are returned to the
> client.
>
> So, when you issue a GET with r=1, the coordinating node only waits for 1
> of the replicas before responding to the client with a success, and only
> afterwards triggers read-repair.
>
> It's true that with notfound_ok=false, it'll wait for the first
> non-missing replica before responding. But if you edit or update your
> objects at all, an R=1 still gives you a risk of stale values being
> returned.
>
> For example, say you write an object with value A.  And let's say your 3
> replicas now look like this:
>
> replica 1: A,  replica 2: A, replica 3: notfound/missing
>
> A read with an R=1 and notfound_ok=false is just fine, here. (Chances are,
> the notfound replica will arrive first, but the notfound_ok setting will
> force the coordinator to wait for the first non-empty value, A, and return
> it to the client. And then trigger read-repair).
>
> But what happens if you edit that same object, and give it a new value,
> B?  So, now, there's a chance that your replicas will look like this:
>
> replica 1: A, replica 2: B, replica 3: B.
>
> So now if you do a read with an R=1, there's a chance that replica 1, with
> the old value of A, will arrive first, and that's the response that will be
> returned to the client.
>
> Whereas, using R=2 eliminates that risk -- well, at least decreases it.
> You still have the issue of -- how does Riak decide whether A or B is the
> correct value? Are you using causal context/vclocks correctly? (That is,
> reading the object before you update, to get the correct causal context?)
> Or are you relying on timestamps? (This is an ok strategy, provided that
> the edits are sufficiently far apart in time, and you don't have many
> concurrent edits, AND you're ok with the small risk of occasionally the
> timestamp being wrong). You can use the following strategies to prevent
> stale values, in increasing order of security/preference:
>
> 1) Use timestamps (and not pass in vector clocks/causal context). This is
> ok if you're not editing objects, or you're ok with a bit of risk of stale
> values.
>
> 2) Use causal context correctly (which means, read-before-you-write -- in
> fact, the Update operation in the java client does this for you, I think).
> And if Riak can't determine which version is correct, it will fall back on
> timestamps.
>
> 3) Turn on siblings, for that bucket or bucket type.  That way, Riak will
> still try to use causal context to decide the right value. But if it can't
> decide, it will store BOTH values, and give them back to you on the next
> read, so that your application can decide which is the correct one.
>
>
>
>
>
>
>
> On Thu, Oct 8, 2015 at 1:56 AM, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
>> Hi Dmitri, what would be the benefit of r=2, exactly? It isn't necessary
>> to trigger read-repair, is it? If it's important I'd rather try it sooner
>> than later...
>>
>> Regards,
>> Vanessa
>>
>>
>>
>> On Wed, Oct 7, 2015 at 4:02 PM, Dmitri Zagidulin 
>> wrote:
>>
>>> Glad you sorted it out!
>>>
>>> (I do want to encourage you to bump your R setting to at least 2,
>>> though. Run some tests -- I think you'll find that the difference in speed
>>> will not be noticeable, but you do get a lot more data resilience with 2.)
>>>
>>> On Wed, Oct 7, 2015 at 6:24 PM, Vanessa Williams <
>>> vanessa.willi...@thoughtwire.ca> wrote:
>>>
 Hi Dmitri, well...we solved our problem to our satisfaction but it
 turned out to be something unexpected.

 The keys were two properties mentioned in a blog post on "configuring
 Riak’s oft-subtle behavioral characteristics":
 http://basho.com/posts/technical/riaks-config-behaviors-part-4/

 notfound_ok = false
 basic_quorum = true

 The 2nd one just makes things a little faster, but the first one is the
 one whose default value of true was killing us.

 With r=1 and notfound_ok=true (the default), if the first node to respond
 didn't find the requested key, the authoritative answer was "this key is
 not found". Not what we were expecting at all.

 With the changed settings, it will wait for a quorum of responses and
 only if 

Re: Java Riak client can't handle a Riak node failure?

2015-10-08 Thread Dmitri Zagidulin
Hi Vanessa,

The thing to keep in mind about read repair is -- it happens asynchronously
on every GET, but /after/ the results are returned to the client.

So, when you issue a GET with r=1, the coordinating node only waits for 1
of the replicas before responding to the client with a success, and only
afterwards triggers read-repair.

It's true that with notfound_ok=false, it'll wait for the first non-missing
replica before responding. But if you edit or update your objects at all,
an R=1 still gives you a risk of stale values being returned.

For example, say you write an object with value A.  And let's say your 3
replicas now look like this:

replica 1: A,  replica 2: A, replica 3: notfound/missing

A read with an R=1 and notfound_ok=false is just fine, here. (Chances are,
the notfound replica will arrive first, but the notfound_ok setting will
force the coordinator to wait for the first non-empty value, A, and return
it to the client. And then trigger read-repair).

But what happens if you edit that same object, and give it a new value, B?
So, now, there's a chance that your replicas will look like this:

replica 1: A, replica 2: B, replica 3: B.

So now if you do a read with an R=1, there's a chance that replica 1, with
the old value of A, will arrive first, and that's the response that will be
returned to the client.

Whereas, using R=2 eliminates that risk -- well, at least decreases it. You
still have the issue of -- how does Riak decide whether A or B is the
correct value? Are you using causal context/vclocks correctly? (That is,
reading the object before you update, to get the correct causal context?)
Or are you relying on timestamps? (This is an ok strategy, provided that
the edits are sufficiently far apart in time, and you don't have many
concurrent edits, AND you're ok with the small risk of occasionally the
timestamp being wrong). You can use the following strategies to prevent
stale values, in increasing order of security/preference:

1) Use timestamps (and not pass in vector clocks/causal context). This is
ok if you're not editing objects, or you're ok with a bit of risk of stale
values.

2) Use causal context correctly (which means, read-before-you-write -- in
fact, the Update operation in the java client does this for you, I think).
And if Riak can't determine which version is correct, it will fall back on
timestamps.

3) Turn on siblings, for that bucket or bucket type.  That way, Riak will
still try to use causal context to decide the right value. But if it can't
decide, it will store BOTH values, and give them back to you on the next
read, so that your application can decide which is the correct one.
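
As an illustration of strategy 3, here is a minimal sketch of
application-side sibling resolution with the 2.x Java client. It assumes
allow_mult is enabled on the bucket, that FetchValue.Response.getValues(Class)
returns all siblings as in client 2.x, and a longest-value merge rule that is
only a stand-in for real application logic:

static RiakObject pickWinner(List<RiakObject> siblings) {
    // Stand-in merge rule: keep the sibling with the largest payload.
    RiakObject winner = siblings.get(0);
    for (RiakObject candidate : siblings) {
        if (candidate.getValue().getValue().length > winner.getValue().getValue().length) {
            winner = candidate;
        }
    }
    return winner;
}

static byte[] readResolved(RiakClient client, Location location) throws Exception {
    FetchValue.Response response = client.execute(new FetchValue.Builder(location).build());
    if (response.isNotFound()) {
        return null;
    }
    List<RiakObject> siblings = response.getValues(RiakObject.class);
    RiakObject winner = pickWinner(siblings);
    if (siblings.size() > 1) {
        // Write the winner back with the causal context so the siblings
        // collapse to a single resolved value on the server.
        client.execute(new StoreValue.Builder(winner)
                .withLocation(location)
                .withVectorClock(response.getVectorClock())
                .build());
    }
    return winner.getValue().getValue();
}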







On Thu, Oct 8, 2015 at 1:56 AM, Vanessa Williams <
vanessa.willi...@thoughtwire.ca> wrote:

> Hi Dmitri, what would be the benefit of r=2, exactly? It isn't necessary
> to trigger read-repair, is it? If it's important I'd rather try it sooner
> than later...
>
> Regards,
> Vanessa
>
>
>
> On Wed, Oct 7, 2015 at 4:02 PM, Dmitri Zagidulin 
> wrote:
>
>> Glad you sorted it out!
>>
>> (I do want to encourage you to bump your R setting to at least 2, though.
>> Run some tests -- I think you'll find that the difference in speed will not
>> be noticeable, but you do get a lot more data resilience with 2.)
>>
>> On Wed, Oct 7, 2015 at 6:24 PM, Vanessa Williams <
>> vanessa.willi...@thoughtwire.ca> wrote:
>>
>>> Hi Dmitri, well...we solved our problem to our satisfaction but it
>>> turned out to be something unexpected.
>>>
>>> The keys were two properties mentioned in a blog post on "configuring
>>> Riak’s oft-subtle behavioral characteristics":
>>> http://basho.com/posts/technical/riaks-config-behaviors-part-4/
>>>
>>> notfound_ok = false
>>> basic_quorum = true
>>>
>>> The 2nd one just makes things a little faster, but the first one is the
>>> one whose default value of true was killing us.
>>>
>>> With r=1 and notfound_ok=true (the default), if the first node to respond
>>> didn't find the requested key, the authoritative answer was "this key is
>>> not found". Not what we were expecting at all.
>>>
>>> With the changed settings, it will wait for a quorum of responses and
>>> only if *no one* finds the key will "not found" be returned. Perfect.
>>> (Without this setting it would wait for all responses, not ideal.)
>>>
>>> Now there is only one snag, which is that if the Riak node the client
>>> connects to goes down, there will be no communication and we have a
>>> problem. This is easily solvable with a load-balancer, though for
>>> complicated reasons we actually don't need to do that right now. It's just
>>> acceptable for us temporarily. Later, we'll get the load-balancer working
>>> and even that won't be a problem.
>>>
>>> I *think* we're ok now. Thanks for your help!
>>>
>>> Regards,
>>> Vanessa
>>>
>>>
>>>
>>> On Wed, Oct 7, 2015 at 9:33 AM, Dmitri Zagidulin 
>>> wrote:
>>>
 Yeah, definitely find out what the sysadmin's experience was, 

Re: Java Riak client can't handle a Riak node failure?

2015-10-08 Thread Alexander Sicular
Greetings and salutations, Vanessa.

I am obliged to point out that running n_val of 3 (or more) is highly
detrimental in a cluster size smaller than 5 nodes. Needless to say, it is
not recommended. Let's talk about why that is the case for a moment.

The default level of abstraction in Riak is the virtual node, or vnode. The
vnode represents a segment of the ring (ring_size, configurable in powers of
2; default 64, min 8, max 1024). The "ring" is the number line 0 - 2^160,
which represents the output of the SHA hash.

Riak achieves high availability through replication. Riak replicates data
to vnodes. A hypothetical replica set may be, for example, set[vnode 1,
vnode 10, vnode 20]. Note, I said vnodes. Not physical nodes. And therein
lies the concern. Considering a default ring size of 64 and a default
replica count of 3, the minimum recommended production deployment of a Riak
cluster should be 5, because only at that size is every replica
set combination guaranteed to have each vnode on a distinct physical
node. Anything less than that will certainly have some combinations of
replica sets which have two of their copies on the same physical host. Note I
said some combinations. Some fraction of node replica set combinations will
have two of their copies allocated to one physical machine.

You can see where I'm going with this. Firstly, performance will be
negatively impacted when writing more than one copy of data to the same
physical hardware, aka disk. But more importantly, you are negating Riak's
high availability mechanic. If you lost any given physical node you would
lose access to two copies of the set of data which had two replicas on that
node.

Riak is designed to withstand loss of any two physical nodes while
maintaining access to 100% of your corpus, assuming you are
running the default settings and have deployed 5 nodes.

Here is the rule of thumb that I recommend (me personally, not Basho) to
folks looking to deploy clusters with less than 5 nodes:

1,2 nodes: n_val 1
3,4 nodes: n_val 2
5+ nodes: n_val 3
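
A minimal sketch of applying that rule of thumb from the Java client,
assuming the 2.x StoreBucketProperties command exposes a withNVal setter
(that setter name is an assumption; on 2.x clusters n_val is also commonly
set on a bucket type via riak-admin instead):

Namespace ns = new Namespace("some_bucket");
StoreBucketProperties setNVal = new StoreBucketProperties.Builder(ns)
        .withNVal(2) // 3-4 physical nodes: n_val 2, per the table above
        .build();
client.execute(setNVal);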

In summary, please consider reconfiguring your production deployment.

Sincerely,
Alexander

@siculars
http://siculars.posthaven.com

Sent from my iRotaryPhone

On Oct 7, 2015, at 19:56, Vanessa Williams 
wrote:

Hi Dmitri, what would be the benefit of r=2, exactly? It isn't necessary to
trigger read-repair, is it? If it's important I'd rather try it sooner than
later...

Regards,
Vanessa



On Wed, Oct 7, 2015 at 4:02 PM, Dmitri Zagidulin 
wrote:

> Glad you sorted it out!
>
> (I do want to encourage you to bump your R setting to at least 2, though.
> Run some tests -- I think you'll find that the difference in speed will not
> be noticeable, but you do get a lot more data resilience with 2.)
>
> On Wed, Oct 7, 2015 at 6:24 PM, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
>> Hi Dmitri, well...we solved our problem to our satisfaction but it turned
>> out to be something unexpected.
>>
>> The keys were two properties mentioned in a blog post on "configuring
>> Riak’s oft-subtle behavioral characteristics":
>> http://basho.com/posts/technical/riaks-config-behaviors-part-4/
>>
>> notfound_ok = false
>> basic_quorum = true
>>
>> The 2nd one just makes things a little faster, but the first one is the
>> one whose default value of true was killing us.
>>
>> With r=1 and notfound_ok=true (the default), if the first node to respond
>> didn't find the requested key, the authoritative answer was "this key is
>> not found". Not what we were expecting at all.
>>
>> With the changed settings, it will wait for a quorum of responses and
>> only if *no one* finds the key will "not found" be returned. Perfect.
>> (Without this setting it would wait for all responses, not ideal.)
>>
>> Now there is only one snag, which is that if the Riak node the client
>> connects to goes down, there will be no communication and we have a
>> problem. This is easily solvable with a load-balancer, though for
>> complicated reasons we actually don't need to do that right now. It's just
>> acceptable for us temporarily. Later, we'll get the load-balancer working
>> and even that won't be a problem.
>>
>> I *think* we're ok now. Thanks for your help!
>>
>> Regards,
>> Vanessa
>>
>>
>>
>> On Wed, Oct 7, 2015 at 9:33 AM, Dmitri Zagidulin 
>> wrote:
>>
>>> Yeah, definitely find out what the sysadmin's experience was, with the
>>> load balancer. It could have just been a wrong configuration or something.
>>>
>>> And yes, that's the documentation page I recommend -
>>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>>> Just set up HAProxy, and point your Java clients to its IP.
>>>
>>> The drawbacks to load-balancing on the java client side (yes, the
>>> cluster object) instead of a standalone load balancer like HAProxy, are the
>>> following:
>>>
>>> 1) Adding a node means code 

Re: Java Riak client can't handle a Riak node failure?

2015-10-07 Thread Dmitri Zagidulin
Glad you sorted it out!

(I do want to encourage you to bump your R setting to at least 2, though.
Run some tests -- I think you'll find that the difference in speed will not
be noticeable, but you do get a lot more data resilience with 2.)
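
A minimal sketch of requesting an explicit read quorum per fetch, assuming
the 2.x Java client's FetchValue.Option.R and Quorum types and an
already-built client:

FetchValue fetch = new FetchValue.Builder(new Location(new Namespace("some_bucket"), "some_key"))
        .withOption(FetchValue.Option.R, new Quorum(2)) // wait for 2 replicas instead of 1
        .build();
FetchValue.Response response = client.execute(fetch);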

On Wed, Oct 7, 2015 at 6:24 PM, Vanessa Williams <
vanessa.willi...@thoughtwire.ca> wrote:

> Hi Dmitri, well...we solved our problem to our satisfaction but it turned
> out to be something unexpected.
>
> The keys were two properties mentioned in a blog post on "configuring
> Riak’s oft-subtle behavioral characteristics":
> http://basho.com/posts/technical/riaks-config-behaviors-part-4/
>
> notfound_ok = false
> basic_quorum = true
>
> The 2nd one just makes things a little faster, but the first one is the
> one whose default value of true was killing us.
>
> With r=1 and notfound_ok=true (the default), if the first node to respond
> didn't find the requested key, the authoritative answer was "this key is
> not found". Not what we were expecting at all.
>
> With the changed settings, it will wait for a quorum of responses and only
> if *no one* finds the key will "not found" be returned. Perfect. (Without
> this setting it would wait for all responses, not ideal.)
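
The same two settings can also be applied per request from the Java client;
a minimal sketch, assuming the options FetchValue.Option.NOTFOUND_OK and
FetchValue.Option.BASIC_QUORUM exist as named in the 2.x client:

FetchValue fetch = new FetchValue.Builder(new Location(new Namespace("some_bucket"), "some_key"))
        .withOption(FetchValue.Option.NOTFOUND_OK, false) // a first "notfound" is not authoritative
        .withOption(FetchValue.Option.BASIC_QUORUM, true) // answer once a quorum has replied
        .build();
FetchValue.Response response = client.execute(fetch);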
>
> Now there is only one snag, which is that if the Riak node the client
> connects to goes down, there will be no communication and we have a
> problem. This is easily solvable with a load-balancer, though for
> complicated reasons we actually don't need to do that right now. It's just
> acceptable for us temporarily. Later, we'll get the load-balancer working
> and even that won't be a problem.
>
> I *think* we're ok now. Thanks for your help!
>
> Regards,
> Vanessa
>
>
>
> On Wed, Oct 7, 2015 at 9:33 AM, Dmitri Zagidulin 
> wrote:
>
>> Yeah, definitely find out what the sysadmin's experience was, with the
>> load balancer. It could have just been a wrong configuration or something.
>>
>> And yes, that's the documentation page I recommend -
>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>> Just set up HAProxy, and point your Java clients to its IP.
>>
>> The drawbacks to load-balancing on the java client side (yes, the cluster
>> object) instead of a standalone load balancer like HAProxy, are the
>> following:
>>
>> 1) Adding a node means code changes (or at the very least, config file
>> changes) rolled out to all your clients, which turns out to be a pretty
>> serious hassle. Instead, HAProxy allows you to add or remove nodes without changing
>> any java code or config files.
>>
>> 2) Performance. We've run many tests to compare performance, and
>> client-side load balancing results in significantly lower throughput than
>> you'd have using haproxy (or nginx). (Specifically, you actually want to
>> use the 'leastconn' load balancing algorithm with HAProxy, instead of round
>> robin).
>>
>> 3) The health check on the client side (so that the java load balancer
>> can tell when a remote node is down) is much less intelligent than what a
>> dedicated load balancer would provide. With something like HAProxy, you
>> should be able to take down nodes with no ill effects for the client code.
>>
>> Now, if you load balance on the client side and you take a node down,
>> it's not supposed to stop working completely. (I'm not sure why it's
>> failing for you, we can investigate, but it'll be easier to just use a load
>> balancer). It should throw an error or two, but then start working again
>> (on the retry).
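
For reference, a minimal sketch of the client-side option being discussed,
assuming the 2.x client's RiakNode/RiakCluster builders and the static
RiakNode.Builder.buildNodes helper (the addresses are placeholders):

RiakNode.Builder template = new RiakNode.Builder().withRemotePort(8087);
List<RiakNode> nodes = RiakNode.Builder.buildNodes(template,
        Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"));
RiakCluster cluster = new RiakCluster.Builder(nodes).build();
cluster.start(); // the cluster round-robins requests across its nodes
RiakClient client = new RiakClient(cluster);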
>>
>> Dmitri
>>
>> On Wed, Oct 7, 2015 at 2:45 PM, Vanessa Williams <
>> vanessa.willi...@thoughtwire.ca> wrote:
>>
>>> Hi Dmitri, thanks for the quick reply.
>>>
>>> It was actually our sysadmin who tried the load balancer approach and
>>> had no success, late last evening. However I haven't discussed the gory
>>> details with him yet. The failure he saw was at the application level (i.e.
>>> failure to read a key), but I don't know a) how he set up the LB or b) what
>>> the Java exception was, if any. I'll find that out in an hour or two and
>>> report back.
>>>
>>> I did find this article just now:
>>>
>>>
>>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>>>
>>> So I suppose we'll give those suggestions a try this morning.
>>>
>>> What is the drawback to having the client connect to all 4 nodes (the
>>> cluster client, I assume you mean)? My understanding from reading articles
>>> I've found is that one of the nodes going away causes that client to fail
>>> as well. Is that what you mean, or are there other drawbacks as well?
>>>
>>> If there's anything else you can recommend, or links other than the one
>>> above you can point me to, it would be much appreciated. We expect both
>>> node failure and deliberate node removal for upgrade, repair, replacement,
>>> etc.
>>>
>>> Regards,
>>> Vanessa
>>>
>>> On Wed, Oct 7, 2015 at 8:29 AM, Dmitri Zagidulin 
>>> wrote:
>>>
 Hi Vanessa,

 Riak is definitely 

Re: Java Riak client can't handle a Riak node failure?

2015-10-07 Thread Vanessa Williams
Hi Dmitri, what would be the benefit of r=2, exactly? It isn't necessary to
trigger read-repair, is it? If it's important I'd rather try it sooner rather
than later...

Regards,
Vanessa



On Wed, Oct 7, 2015 at 4:02 PM, Dmitri Zagidulin 
wrote:

> Glad you sorted it out!
>
> (I do want to encourage you to bump your R setting to at least 2, though.
> Run some tests -- I think you'll find that the difference in speed will not
> be noticeable, but you do get a lot more data resilience with 2.)
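
To answer the question above: r=2 isn't what triggers read-repair. The
coordinating node contacts all replicas on every read regardless of R and
repairs divergent copies afterwards; R only controls how many replies it
waits for before answering the client. What r=2 buys is that a single stale,
missing, or unreachable replica can never determine the result by itself.
With the 2.x Java client it is a per-request option (a sketch; the option
and Quorum names are assumptions from memory):

    import com.basho.riak.client.api.cap.Quorum;
    import com.basho.riak.client.api.commands.kv.FetchValue;
    import com.basho.riak.client.core.query.Location;
    import com.basho.riak.client.core.query.Namespace;

    // Wait for two replicas to respond before returning a result.
    FetchValue fetch = new FetchValue.Builder(new Location(new Namespace("some_bucket"), key))
            .withOption(FetchValue.Option.R, new Quorum(2))
            .build();
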
>
> On Wed, Oct 7, 2015 at 6:24 PM, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
>> Hi Dmitri, well...we solved our problem to our satisfaction but it turned
>> out to be something unexpected.
>>
>> The keys were two properties mentioned in a blog post on "configuring
>> Riak’s oft-subtle behavioral characteristics":
>> http://basho.com/posts/technical/riaks-config-behaviors-part-4/
>>
>> notfound_ok = false
>> basic_quorum = true
>>
>> The 2nd one just makes things a little faster, but the first one is the
>> one whose default value of true was killing us.
>>
>> With r=1 and notfound_ok=true (the default), if the first node to respond
>> didn't find the requested key, then "this key is not found" became the
>> authoritative answer. Not what we were expecting at all.
>>
>> With the changed settings, it will wait for a quorum of responses, and
>> only if *no one* finds the key will "not found" be returned. Perfect.
>> (Without basic_quorum=true it would wait for all responses, which is not ideal.)
>>
>> Now there is only one snag, which is that if the Riak node the client
>> connects to goes down, there will be no communication and we have a
>> problem. This is easily solvable with a load-balancer, though for
>> complicated reasons we actually don't need to do that right now. It's just
>> acceptable for us temporarily. Later, we'll get the load-balancer working
>> and even that won't be a problem.
>>
>> I *think* we're ok now. Thanks for your help!
>>
>> Regards,
>> Vanessa
>>
>>
>>
>> On Wed, Oct 7, 2015 at 9:33 AM, Dmitri Zagidulin 
>> wrote:
>>
>>> Yeah, definitely find out what the sysadmin's experience was, with the
>>> load balancer. It could have just been a wrong configuration or something.
>>>
>>> And yes, that's the documentation page I recommend -
>>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>>> Just set up HAProxy, and point your Java clients to its IP.
>>>
>>> The drawbacks to load-balancing on the java client side (yes, the
>>> cluster object) instead of a standalone load balancer like HAProxy, are the
>>> following:
>>>
>>> 1) Adding a node means code changes (or at the very least, config file
>>> changes) rolled out to all your clients. Which turns out to be a pretty
>>> serious hassle. Instead, HAProxy allows you to add or remove nodes without
>>> changing any java code or config files.
>>>
>>> 2) Performance. We've run many tests to compare performance, and
>>> client-side load balancing results in significantly lower throughput than
>>> you'd have using haproxy (or nginx). (Specifically, you actually want to
>>> use the 'leastconn' load balancing algorithm with HAProxy, instead of round
>>> robin).
>>>
>>> 3) The health check on the client side (so that the java load balancer
>>> can tell when a remote node is down) is much less intelligent than a
>>> dedicated load balancer would provide. With something like HAProxy, you
>>> should be able to take down nodes with no ill effects for the client code.
>>>
>>> Now, if you load balance on the client side and you take a node down,
>>> it's not supposed to stop working completely. (I'm not sure why it's
>>> failing for you, we can investigate, but it'll be easier to just use a load
>>> balancer). It should throw an error or two, but then start working again
>>> (on the retry).
>>>
>>> Dmitri
>>>
>>> On Wed, Oct 7, 2015 at 2:45 PM, Vanessa Williams <
>>> vanessa.willi...@thoughtwire.ca> wrote:
>>>
 Hi Dmitri, thanks for the quick reply.

 It was actually our sysadmin who tried the load balancer approach and
 had no success, late last evening. However I haven't discussed the gory
 details with him yet. The failure he saw was at the application level (i.e.
 failure to read a key), but I don't know a) how he set up the LB or b) what
 the Java exception was, if any. I'll find that out in an hour or two and
 report back.

 I did find this article just now:


 http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/

 So I suppose we'll give those suggestions a try this morning.

 What is the drawback to having the client connect to all 4 nodes (the
 cluster client, I assume you mean)? My understanding from reading articles
 I've found is that one of the nodes going away causes that client to fail
 as well. Is that what you mean, or are there other drawbacks as well?

 If there's anything else you 

Re: Java Riak client can't handle a Riak node failure?

2015-10-07 Thread Vanessa Williams
Hi Dmitri, thanks for the quick reply.

It was actually our sysadmin who tried the load balancer approach and had
no success, late last evening. However I haven't discussed the gory details
with him yet. The failure he saw was at the application level (i.e. failure
to read a key), but I don't know a) how he set up the LB or b) what the
Java exception was, if any. I'll find that out in an hour or two and report
back.

I did find this article just now:

http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/

So I suppose we'll give those suggestions a try this morning.

What is the drawback to having the client connect to all 4 nodes (the
cluster client, I assume you mean)? My understanding from reading articles
I've found is that one of the nodes going away causes that client to fail
as well. Is that what you mean, or are there other drawbacks as well?

If there's anything else you can recommend, or links other than the one
above you can point me to, it would be much appreciated. We expect both
node failure and deliberate node removal for upgrade, repair, replacement,
etc.

Regards,
Vanessa

On Wed, Oct 7, 2015 at 8:29 AM, Dmitri Zagidulin 
wrote:

> Hi Vanessa,
>
> Riak is definitely meant to run behind a load balancer. (Or, in the worst
> case, to be load-balanced on the client side. That is, all clients connect
> to all 4 nodes).
>
> When you say "we did try putting all 4 Riak nodes behind a load-balancer
> and pointing the clients at it, but it didn't help." -- what do you mean
> exactly, by "it didn't help"? What happened when you tried using the load
> balancer?
>
>
>
> On Wed, Oct 7, 2015 at 1:57 PM, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
>> Hi all, we are still (for a while longer) using Riak 1.4 and the matching
>> Java client. The client(s) connect to one node in the cluster (since that's
>> all it can do in this client version). The cluster itself has 4 nodes
>> (sorry, we can't use 5 in this scenario). There are 2 separate clients.
>>
>> We've tried both n_val=3 and n_val=4. We achieve consistency-by-writes
>> by setting w=all. Therefore, we only require one successful read (r=1).
>>
>> When all nodes are up, everything is fine. If one node fails, the clients
>> can no longer read any keys at all. There's an exception like this:
>>
>> com.basho.riak.client.RiakRetryFailedException:
>> java.net.ConnectException: Connection refused
>>
>> Now, it isn't possible that Riak can't operate when one node fails, so
>> we're clearly missing something here.
>>
>> Note: we did try putting all 4 Riak nodes behind a load-balancer and
>> pointing the clients at it, but it didn't help.
>>
>> Riak is a high-availability key-value store, so... why are we failing to
>> achieve high-availability? Any suggestions greatly appreciated, and if more
>> info is required I'll do my best to provide it.
>>
>> Thanks in advance,
>> Vanessa
>>
>> --
>> Vanessa Williams
>> ThoughtWire Corporation
>> http://www.thoughtwire.com
>>
>>
>>
>>
>


Re: Java Riak client can't handle a Riak node failure?

2015-10-07 Thread Dmitri Zagidulin
Hi Vanessa,

Riak is definitely meant to run behind a load balancer. (Or, in the worst
case, to be load-balanced on the client side. That is, all clients connect
to all 4 nodes).
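
For completeness, client-side balancing with the newer 2.x Java client looks
roughly like the sketch below (illustrative only; the 1.4 client in this
thread predates this API):

    import com.basho.riak.client.api.RiakClient;

    // Hand the client every node up front; it spreads requests across them
    // and can retry against another node when one connection fails.
    // (newClient throws UnknownHostException for unresolvable addresses.)
    RiakClient client = RiakClient.newClient("10.0.0.1", "10.0.0.2",
                                             "10.0.0.3", "10.0.0.4");
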

When you say "we did try putting all 4 Riak nodes behind a load-balancer
and pointing the clients at it, but it didn't help." -- what do you mean
exactly, by "it didn't help"? What happened when you tried using the load
balancer?



On Wed, Oct 7, 2015 at 1:57 PM, Vanessa Williams <
vanessa.willi...@thoughtwire.ca> wrote:

> Hi all, we are still (for a while longer) using Riak 1.4 and the matching
> Java client. The client(s) connect to one node in the cluster (since that's
> all it can do in this client version). The cluster itself has 4 nodes
> (sorry, we can't use 5 in this scenario). There are 2 separate clients.
>
> We've tried both n_val=3 and n_val=4. We achieve consistency-by-writes
> by setting w=all. Therefore, we only require one successful read (r=1).
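
For later readers: in the 2.x Java client that write contract looks roughly
like this sketch (Quorum.allQuorum() is my recollection of the constant, so
treat the exact names as assumptions):

    import com.basho.riak.client.api.cap.Quorum;
    import com.basho.riak.client.api.commands.kv.StoreValue;
    import com.basho.riak.client.core.query.Location;
    import com.basho.riak.client.core.query.Namespace;
    import com.basho.riak.client.core.query.RiakObject;
    import com.basho.riak.client.core.util.BinaryValue;

    // w=all: the write succeeds only once every replica has acknowledged
    // it, which is what makes a single successful read (r=1) trustworthy.
    RiakObject obj = new RiakObject().setValue(BinaryValue.create("some value"));
    StoreValue store = new StoreValue.Builder(obj)
            .withLocation(new Location(new Namespace("some_bucket"), key))
            .withOption(StoreValue.Option.W, Quorum.allQuorum())
            .build();

This is what the thread means by consistency-by-writes: pay the cost on
every write so that reads stay cheap.
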
>
> When all nodes are up, everything is fine. If one node fails, the clients
> can no longer read any keys at all. There's an exception like this:
>
> com.basho.riak.client.RiakRetryFailedException:
> java.net.ConnectException: Connection refused
>
> Now, it isn't possible that Riak can't operate when one node fails, so
> we're clearly missing something here.
>
> Note: we did try putting all 4 Riak nodes behind a load-balancer and
> pointing the clients at it, but it didn't help.
>
> Riak is a high-availability key-value store, so... why are we failing to
> achieve high-availability? Any suggestions greatly appreciated, and if more
> info is required I'll do my best to provide it.
>
> Thanks in advance,
> Vanessa
>
> --
> Vanessa Williams
> ThoughtWire Corporation
> http://www.thoughtwire.com
>
>
>
>


Re: Java Riak client can't handle a Riak node failure?

2015-10-07 Thread Dmitri Zagidulin
Yeah, definitely find out what the sysadmin's experience was, with the load
balancer. It could have just been a wrong configuration or something.

And yes, that's the documentation page I recommend -
http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
Just set up HAProxy, and point your Java clients to its IP.

The drawbacks to load-balancing on the java client side (yes, the cluster
object) instead of a standalone load balancer like HAProxy, are the
following:

1) Adding a node means code changes (or at the very least, config file changes)
rolled out to all your clients. Which turns out to be a pretty serious
hassle. Instead, HAProxy allows you to add or remove nodes without changing
any java code or config files.

2) Performance. We've run many tests to compare performance, and
client-side load balancing results in significantly lower throughput than
you'd have using haproxy (or nginx). (Specifically, you actually want to
use the 'leastconn' load balancing algorithm with HAProxy, instead of round
robin; see the sketch after this list.)

3) The health check on the client side (so that the java load balancer can
tell when a remote node is down) is much less intelligent than a dedicated
load balancer would provide. With something like HAProxy, you should be
able to take down nodes with no ill effects for the client code.
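
A minimal haproxy.cfg sketch along the lines of the Basho docs page linked
above (addresses are placeholders, and 8087 is the usual protocol buffers
port):

    listen riak_pb
        bind *:8087
        mode tcp
        balance leastconn
        option tcp-check
        server riak1 10.0.0.1:8087 check
        server riak2 10.0.0.2:8087 check
        server riak3 10.0.0.3:8087 check
        server riak4 10.0.0.4:8087 check

Point the Java clients at the HAProxy address and they never need to know
when nodes come or go.
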

Now, if you load balance on the client side and you take a node down, it's
not supposed to stop working completely. (I'm not sure why it's failing for
you, we can investigate, but it'll be easier to just use a load balancer).
It should throw an error or two, but then start working again (on the
retry).

Dmitri

On Wed, Oct 7, 2015 at 2:45 PM, Vanessa Williams <
vanessa.willi...@thoughtwire.ca> wrote:

> Hi Dmitri, thanks for the quick reply.
>
> It was actually our sysadmin who tried the load balancer approach and had
> no success, late last evening. However I haven't discussed the gory details
> with him yet. The failure he saw was at the application level (i.e. failure
> to read a key), but I don't know a) how he set up the LB or b) what the
> Java exception was, if any. I'll find that out in an hour or two and report
> back.
>
> I did find this article just now:
>
>
> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>
> So I suppose we'll give those suggestions a try this morning.
>
> What is the drawback to having the client connect to all 4 nodes (the
> cluster client, I assume you mean)? My understanding from reading articles
> I've found is that one of the nodes going away causes that client to fail
> as well. Is that what you mean, or are there other drawbacks as well?
>
> If there's anything else you can recommend, or links other than the one
> above you can point me to, it would be much appreciated. We expect both
> node failure and deliberate node removal for upgrade, repair, replacement,
> etc.
>
> Regards,
> Vanessa
>
> On Wed, Oct 7, 2015 at 8:29 AM, Dmitri Zagidulin 
> wrote:
>
>> Hi Vanessa,
>>
>> Riak is definitely meant to run behind a load balancer. (Or, in the worst
>> case, to be load-balanced on the client side. That is, all clients connect
>> to all 4 nodes).
>>
>> When you say "we did try putting all 4 Riak nodes behind a load-balancer
>> and pointing the clients at it, but it didn't help." -- what do you mean
>> exactly, by "it didn't help"? What happened when you tried using the load
>> balancer?
>>
>>
>>
>> On Wed, Oct 7, 2015 at 1:57 PM, Vanessa Williams <
>> vanessa.willi...@thoughtwire.ca> wrote:
>>
>>> Hi all, we are still (for a while longer) using Riak 1.4 and the
>>> matching Java client. The client(s) connect to one node in the cluster
>>> (since that's all it can do in this client version). The cluster itself has
>>> 4 nodes (sorry, we can't use 5 in this scenario). There are 2 separate
>>> clients.
>>>
>>> We've tried both n_val=3 and n_val=4. We achieve consistency-by-writes
>>> by setting w=all. Therefore, we only require one successful read (r=1).
>>>
>>> When all nodes are up, everything is fine. If one node fails, the
>>> clients can no longer read any keys at all. There's an exception like this:
>>>
>>> com.basho.riak.client.RiakRetryFailedException:
>>> java.net.ConnectException: Connection refused
>>>
>>> Now, it isn't possible that Riak can't operate when one node fails, so
>>> we're clearly missing something here.
>>>
>>> Note: we did try putting all 4 Riak nodes behind a load-balancer and
>>> pointing the clients at it, but it didn't help.
>>>
>>> Riak is a high-availability key-value store, so... why are we failing to
>>> achieve high-availability? Any suggestions greatly appreciated, and if more
>>> info is required I'll do my best to provide it.
>>>
>>> Thanks in advance,
>>> Vanessa
>>>
>>> --
>>> Vanessa Williams
>>> ThoughtWire Corporation
>>> http://www.thoughtwire.com
>>>
>>>

Re: Java Riak client can't handle a Riak node failure?

2015-10-07 Thread Vanessa Williams
Hi Dmitri, well...we solved our problem to our satisfaction but it turned
out to be something unexpected.

The keys were two properties mentioned in a blog post on "configuring
Riak’s oft-subtle behavioral characteristics":
http://basho.com/posts/technical/riaks-config-behaviors-part-4/

notfound_ok = false
basic_quorum = true

The 2nd one just makes things a little faster, but the first one is the one
whose default value of true was killing us.

With r=1 and notfound_ok=true (the default), if the first node to respond
didn't find the requested key, then "this key is not found" became the
authoritative answer. Not what we were expecting at all.

With the changed settings, it will wait for a quorum of responses, and only
if *no one* finds the key will "not found" be returned. Perfect. (Without
basic_quorum=true it would wait for all responses, which is not ideal.)

Now there is only one snag, which is that if the Riak node the client
connects to goes down, there will be no communication and we have a
problem. This is easily solvable with a load-balancer, though for
complicated reasons we actually don't need to do that right now. It's just
acceptable for us temporarily. Later, we'll get the load-balancer working
and even that won't be a problem.

I *think* we're ok now. Thanks for your help!

Regards,
Vanessa



On Wed, Oct 7, 2015 at 9:33 AM, Dmitri Zagidulin 
wrote:

> Yeah, definitely find out what the sysadmin's experience was, with the
> load balancer. It could have just been a wrong configuration or something.
>
> And yes, that's the documentation page I recommend -
> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
> Just set up HAProxy, and point your Java clients to its IP.
>
> The drawbacks to load-balancing on the java client side (yes, the cluster
> object) instead of a standalone load balancer like HAProxy, are the
> following:
>
> 1) Adding a node means code changes (or at the very least, config file changes)
> rolled out to all your clients. Which turns out to be a pretty serious
> hassle. Instead, HAProxy allows you to add or remove nodes without changing
> any java code or config files.
>
> 2) Performance. We've run many tests to compare performance, and
> client-side load balancing results in significantly lower throughput than
> you'd have using haproxy (or nginx). (Specifically, you actually want to
> use the 'leastconn' load balancing algorithm with HAProxy, instead of round
> robin).
>
> 3) The health check on the client side (so that the java load balancer can
> tell when a remote node is down) is much less intelligent than a dedicated
> load balancer would provide. With something like HAProxy, you should be
> able to take down nodes with no ill effects for the client code.
>
> Now, if you load balance on the client side and you take a node down, it's
> not supposed to stop working completely. (I'm not sure why it's failing for
> you, we can investigate, but it'll be easier to just use a load balancer).
> It should throw an error or two, but then start working again (on the
> retry).
>
> Dmitri
>
> On Wed, Oct 7, 2015 at 2:45 PM, Vanessa Williams <
> vanessa.willi...@thoughtwire.ca> wrote:
>
>> Hi Dmitri, thanks for the quick reply.
>>
>> It was actually our sysadmin who tried the load balancer approach and had
>> no success, late last evening. However I haven't discussed the gory details
>> with him yet. The failure he saw was at the application level (i.e. failure
>> to read a key), but I don't know a) how he set up the LB or b) what the
>> Java exception was, if any. I'll find that out in an hour or two and report
>> back.
>>
>> I did find this article just now:
>>
>>
>> http://docs.basho.com/riak/latest/ops/advanced/configs/load-balancing-proxy/
>>
>> So I suppose we'll give those suggestions a try this morning.
>>
>> What is the drawback to having the client connect to all 4 nodes (the
>> cluster client, I assume you mean)? My understanding from reading articles
>> I've found is that one of the nodes going away causes that client to fail
>> as well. Is that what you mean, or are there other drawbacks as well?
>>
>> If there's anything else you can recommend, or links other than the one
>> above you can point me to, it would be much appreciated. We expect both
>> node failure and deliberate node removal for upgrade, repair, replacement,
>> etc.
>>
>> Regards,
>> Vanessa
>>
>> On Wed, Oct 7, 2015 at 8:29 AM, Dmitri Zagidulin 
>> wrote:
>>
>>> Hi Vanessa,
>>>
>>> Riak is definitely meant to run behind a load balancer. (Or, in the
>>> worst case, to be load-balanced on the client side. That is, all clients
>>> connect to all 4 nodes).
>>>
>>> When you say "we did try putting all 4 Riak nodes behind a
>>> load-balancer and pointing the clients at it, but it didn't help." -- what
>>> do you mean exactly, by "it didn't help"? What happened when you tried
>>> using the load balancer?
>>>
>>>
>>>
>>> On Wed, Oct 7, 2015 at 1:57 PM, Vanessa Williams <