Re: Consistency Level vs. Retry Policy when no local nodes are available

2017-03-21 Thread Shannon Carey
Thanks for the perspective, Ben. It's food for thought.

At minimum, it seems like the documentation should be updated to mention that 
the retry policy will not be consulted when a local consistency level is used 
and no local nodes are available. That way, people won't be surprised by it. It 
looks like the docs are included in the GitHub repo, so I guess I'll try to 
contribute an update there.


From: Ben Slater <ben.sla...@instaclustr.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, March 20, 2017 at 6:25 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Consistency Level vs. Retry Policy when no local nodes are 
available

I think the general assumption is that DC failover happens at the client app 
level rather than the Cassandra level, due to the potentially very significant 
difference in request latency if you move from an app-local DC to a remote DC. 
The preferred pattern for most people is that the app fails in a failed DC and 
some load balancer above the app redirects traffic to a different DC.

The other factor is that the fail-back scenario after a DC failure, with LOCAL_* 
consistencies, is potentially complex. Do you want to immediately start using 
the recovered DC when it becomes available again (with missing data), or wait 
until it catches up on writes (and how do you know when that has happened)?

Note also that QUORUM is a clear majority of replicas across both DCs. Some people 
run 3 DCs with RF 3 in each and QUORUM to maintain strong consistency across 
DCs even with a DC failure.
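
To spell out the arithmetic (my numbers, assuming RF 3 per DC): 3 DCs x 3 replicas 
= 9 replicas per partition, QUORUM = (9 / 2) + 1 = 5, and losing an entire DC 
still leaves 6 live replicas, so QUORUM requests can still succeed.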

Cheers
Ben

On Tue, 21 Mar 2017 at 10:00 Shannon Carey <sca...@expedia.com> wrote:
Specifically, this puts us in an awkward position because LOCAL_QUORUM is 
desirable so that we don't have unnecessary cross-DC traffic from the client by 
default, but we can't use it because it will cause complete failure if the 
local DC goes down. And we can't use QUORUM because it would fail if there's 
not a quorum in either DC (as would happen if one DC goes down). So it seems 
like we are forced to use a lesser consistency such as ONE or TWO.
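
For example (my arithmetic, assuming 2 DCs with RF 3 in each): 6 replicas per 
partition in total, QUORUM = (6 / 2) + 1 = 4, and with one DC down only 3 
replicas are reachable, so QUORUM requests cannot succeed.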

-Shannon

From: Shannon Carey <sca...@expedia.com>
Date: Monday, March 20, 2017 at 5:25 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Consistency Level vs. Retry Policy when no local nodes are available

I am running DSE 5.0, and I have a Java client using the DataStax Java driver 3.0.0.

The client is configured to use a DCAwareRoundRobinPolicy wrapped in a 
TokenAwarePolicy. Nothing special.

When I run my query, I set a custom retry policy.
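
For reference, the setup looks roughly like this (a minimal sketch; the contact 
point, DC name, keyspace, table, and retry policy class are placeholders, not 
the real ones from my application):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class ClientSetupSketch {
    public static void main(String[] args) {
        // DCAwareRoundRobinPolicy wrapped in a TokenAwarePolicy, as described above.
        // As far as I can tell, the DC-aware policy defaults to using 0 hosts per
        // remote DC, so with the local DC unreachable the query plan is empty.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")                  // placeholder contact point
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder()
                                .withLocalDc("dc1")           // placeholder local DC name
                                .build()))
                .build();
        Session session = cluster.connect("my_keyspace");      // placeholder keyspace

        // Per-query consistency level plus a custom retry policy.
        Statement stmt = new SimpleStatement("SELECT * FROM my_table WHERE id = 1")
                .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)
                .setRetryPolicy(new FallBackToQuorumRetryPolicy()); // hypothetical policy, sketched below
        session.execute(stmt);
        cluster.close();
    }
}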

I am testing cross-DC failover. I have disabled connectivity to the "local" DC 
(relative to my client) in order to perform the test. When I run a query with 
the initial consistency level set to LOCAL_ONE (or any other LOCAL_* level), my 
retry policy is never called and I always get this exception:
"com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) 
tried for query failed (no host was tried)"

getErrors() on the exception is empty.

This is contrary to my expectation that the first attempt would fail and would 
allow my RetryPolicy to attempt a different (non-LOCAL) consistency level. I 
have no choice but to avoid using any kind of LOCAL consistency level 
throughout my applications. Is this expected? Or is there anything I can do 
about it? Thanks! It certainly seems like a bug to me or at least something 
that should be improved.
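
For concreteness, the kind of retry policy I have in mind looks roughly like 
this (a hypothetical sketch, not the exact policy from my application). As far 
as I can tell, the driver only invokes these callbacks once a coordinator has 
actually been contacted, which would explain why none of them fire when no host 
is tried at all:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.WriteType;
import com.datastax.driver.core.exceptions.DriverException;
import com.datastax.driver.core.policies.RetryPolicy;

// Hypothetical sketch: on an UNAVAILABLE response, retry once at a non-local
// consistency level; everything else is rethrown to the caller.
public class FallBackToQuorumRetryPolicy implements RetryPolicy {

    public RetryDecision onUnavailable(Statement stmt, ConsistencyLevel cl,
            int requiredReplica, int aliveReplica, int nbRetry) {
        return nbRetry == 0 ? RetryDecision.retry(ConsistencyLevel.QUORUM)
                            : RetryDecision.rethrow();
    }

    public RetryDecision onReadTimeout(Statement stmt, ConsistencyLevel cl,
            int requiredResponses, int receivedResponses, boolean dataRetrieved, int nbRetry) {
        return RetryDecision.rethrow();
    }

    public RetryDecision onWriteTimeout(Statement stmt, ConsistencyLevel cl,
            WriteType writeType, int requiredAcks, int receivedAcks, int nbRetry) {
        return RetryDecision.rethrow();
    }

    public RetryDecision onRequestError(Statement stmt, ConsistencyLevel cl,
            DriverException e, int nbRetry) {
        return RetryDecision.rethrow();
    }

    public void init(Cluster cluster) {}

    public void close() {}
}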

-Shannon
--

Ben Slater
Chief Product Officer
Instaclustr <https://www.instaclustr.com/>

