[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-04-06 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428638#comment-16428638
 ] 

Jason Brown commented on CASSANDRA-13993:
-

[~iamaleksey] Yup, that's basically what I meant, as well :)

> Add optional startup delay to wait until peers are ready
> 
>
> Key: CASSANDRA-13993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13993
> Project: Cassandra
> Issue Type: Improvement
> Components: Lifecycle
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Fix For: 4.0
>
>
> When bouncing a node in a large cluster, it can take a while to recognize the 
> rest of the cluster as available. This is especially true if using TLS on 
> internode messaging connections. The bouncing node (and any clients connected 
> to it) may see a series of Unavailable or Timeout exceptions until the node 
> is 'warmed up', as connecting to the rest of the cluster is asynchronous from 
> the rest of the startup process.
> There are two aspects that drive a node's ability to successfully communicate 
> with a peer after a bounce:
> - marking the peer as 'alive' (state that is held in gossip). This affects 
> the unavailable exceptions.
> - having both outbound and inbound connections open and ready to each 
> peer. This affects timeouts.
> Details of each of these mechanisms are described in the comments below.
> This ticket proposes adding a mechanism, optional and configurable, to delay 
> opening the client native protocol port until some percentage of the peers in 
> the cluster is marked alive and connected to/from. Thus while we potentially 
> slow down startup (delay opening the client port), we reduce the chance 
> that queries made by clients hit transient unavailable/timeout 
> exceptions.
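
The gating idea in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch itself: {{StartupPeerGate}}, {{enoughPeersReady}}, {{awaitPeers}} and all parameter names are hypothetical, and "ready" stands in for "marked alive in gossip and connected to/from".

```java
import java.util.Collection;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical helper; none of these names come from the Cassandra code base.
final class StartupPeerGate {
    // True once at least percentRequired percent of the peers satisfy isReady.
    static <P> boolean enoughPeersReady(Collection<P> peers, Predicate<? super P> isReady, int percentRequired) {
        if (peers.isEmpty()) return true; // nothing to wait for
        long ready = peers.stream().filter(isReady).count();
        return ready * 100 >= (long) percentRequired * peers.size();
    }

    // Polls until the threshold is met or the timeout expires.
    // Returns false on timeout or interruption; the caller would then open the client port anyway.
    static <P> boolean awaitPeers(Collection<P> peers, Predicate<? super P> isReady,
                                  int percentRequired, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!enoughPeersReady(peers, isReady, percentRequired)) {
            if (System.nanoTime() - deadline >= 0) return false;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

In this sketch the timeout bounds the extra startup delay, so a partially unavailable cluster would not keep the client port closed indefinitely.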



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-04-06 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428607#comment-16428607
 ] 

Aleksey Yeschenko commented on CASSANDRA-13993:
---

[~jasobrown] I don't mean backporting this whole ticket - just the ability to 
parse {{PING}} messages. We can just discard them once parsed (:




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-04-06 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428604#comment-16428604
 ] 

Jason Brown commented on CASSANDRA-13993:
-

On the surface, it seems like backporting Ping-related stuff is more invasive 
than just skipping some arbitrary bytes in the stream. However, if I 
understand [~iamaleksey]'s reasoning, skipping some bytes in the stream has 
larger implications and is essentially a larger behavior change than simply 
adding a new message. If that's true, then I agree that backporting the Ping 
message is the best route to go, behavior-wise.

I'll go ahead and start working on the changes as discussed.




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-04-06 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428460#comment-16428460
 ] 

Aleksey Yeschenko commented on CASSANDRA-13993:
---

I agree with basically everything you said here - except what we should 
backport, so:

bq. if we do get an unknown verb id, skip the payload bytes in MessageIn. This 
leaves the input stream clean to process future messages.

Yes, please, for 4.0+.

bq. Further, I think we can eliminate the whole UNUSED_ verbs thing as that was 
an incomplete defense against unknown verbs, and it didn't account for message 
payload.

Yes please. Keep the five we have - or, four, rather, because one will be 
consumed by {{PING}} - and I'd still say let it be {{UNUSED_4}} or 5, but don't 
introduce any more in 4.0, or after 4.0. We will reclaim the existing ones 
eventually as we EOL older releases.

bq. backport part of CASSANDRA-13283 to get the Verb from a map, not an index 
array offset. This gives us safety for future-proofing against unknown verbs.

Not a bad idea, but we should probably be a bit more conservative re: what we 
backport to 3.0, and especially 2.2 at this point. How about, instead, we just 
backport {{PING}} to 3.11 and 3.0, so in the upgrade scenario there will be no 
harm to connections?

So, TL;DR, maybe do this?
1. Make 4.0 robust against {{null}} verb and skip remainders of messages we 
can't parse. There is precedent for it as well, see 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hints/HintMessage.java#L126-L128
2. Stop introducing new {{UNUSED_}} verbs starting with 4.0.
3. Backport {{PING}} to 3.0 and 3.11, so upgraders from recent 3.0 and 3.11 
with the fix will have a smoother experience when going to 4.0.




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-04-06 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428361#comment-16428361
 ] 

Jason Brown commented on CASSANDRA-13993:
-

Responding to @aleksey's comments out of order, but hopefully it makes sense by 
the end.

bq. we should handle ordinals that are outside of our known range robustly 
instead.

Yeah, I think this is where we should get to.

In {{MessageIn#read()}}, we read the verb id from the stream, and then fetch 
the {{Verb}} instance. In pre-4.0, we literally index into the {{Verb[]}} in 
{{MessagingService}}, so any unknown {{Verb}} would blow up there with an 
ArrayIndexOutOfBoundsException. With CASSANDRA-13283, committed on trunk, we 
are more resistant to unknown {{Verb}}s, and would just get a 
null {{Verb}}. Unfortunately, trunk would still have problems with an unknown 
{{Verb}} as it would not know how to deserialize the message (pre-4.0, of 
course, suffers the same problem). It just reads the basic header data, and 
passes it down, where [it would be dropped by 
{{MessageDeliveryTask}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessageDeliveryTask.java#L66].
Unfortunately, if the message had more bytes in the stream which we didn't try 
to deserialize, trying to read the next message on the connection would fail 
spectacularly.
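
To make the difference between the two lookups concrete, here is a small sketch. The {{VerbLookup}} class and its three-entry {{Verb}} enum are made up for illustration and are not the real {{MessagingService}} definitions; the point is only that an array index throws on an id outside the known range, while a map lookup returns {{null}}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily shortened stand-in for the real verb registry.
final class VerbLookup {
    enum Verb { MUTATION, READ, PING }

    private static final Verb[] BY_INDEX = Verb.values();
    private static final Map<Integer, Verb> BY_ID = new HashMap<>();
    static {
        for (Verb v : Verb.values()) BY_ID.put(v.ordinal(), v);
    }

    // Pre-4.0 style: an unknown id throws ArrayIndexOutOfBoundsException.
    static Verb fromArray(int id) {
        return BY_INDEX[id];
    }

    // Map style: an unknown id yields null, which the caller must then handle.
    static Verb fromMap(int id) {
        return BY_ID.get(id);
    }
}
```

Getting {{null}} back avoids the exception, but as noted above it does not by itself consume the unknown message's payload.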

It's easy enough to avoid that, though, as we [already know the 
{{payloadSize}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessageIn.java#L146],
so we can easily skip over the payload, and leave the incoming stream in a 
clean state after we account for the unknown message. Note: {{payloadSize}} is 
required by the internode messaging protocol, so we are sure to have the 
payload size. Thus, we can just safely skip the stream forward when we don't 
know how to deserialize the message, send it forward, and just discard it at 
{{MessageDeliveryTask}}.
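
A rough sketch of the skip-forward idea, assuming a deliberately simplified frame of (int verb id, int payload size, payload bytes). That layout, the {{UnknownVerbSkipper}} class and its method names are assumptions for illustration only; the real internode header carries more fields.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical decoder over a simplified frame: int verbId, int payloadSize, payload.
final class UnknownVerbSkipper {
    // Returns "VERB:payloadLength" for each known frame; unknown verbs are skipped.
    static List<String> decodeAll(byte[] stream, Map<Integer, String> knownVerbs) {
        List<String> delivered = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream))) {
            while (in.available() > 0) {
                int verbId = in.readInt();
                int payloadSize = in.readInt();
                if (payloadSize < 0) throw new IOException("negative payload size: " + payloadSize);
                String verb = knownVerbs.get(verbId);
                if (verb == null) {
                    // Unknown verb: consume its payload so the next frame starts cleanly.
                    skipFully(in, payloadSize);
                    continue;
                }
                byte[] payload = new byte[payloadSize];
                in.readFully(payload);
                delivered.add(verb + ":" + payload.length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return delivered;
    }

    // skipBytes may skip fewer bytes than requested, so loop until done.
    static void skipFully(DataInputStream in, int count) throws IOException {
        int remaining = count;
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped == 0) {
                in.readByte(); // throws EOFException if the stream is truncated
                skipped = 1;
            }
            remaining -= skipped;
        }
    }
}
```

Without the {{skipFully}} call, the bytes of the unknown payload would be misread as the header of the following frame, which is the failure mode described above.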

bq. So I was thinking about a major upgrade bounce scenario. Think the first 
ever node to upgrade to 4.0 in a cluster of 3.0 nodes - will send out pings to 
every node, but receive no pongs, correct? So every node until a threshold will 
have a significantly longer bounce. Do we care about this case?

As the {{PingMessage}} contains a one-byte payload, it would leave the stream 
in a bad (unconsumed) state. This is a bug for the upgrade scenario. It's not a 
terrible bug, but it will cause the connection that we tried to eagerly create 
(to the un-upgraded peer) to be thrown away, as it will fail on the next 
message on the connection. See proposal at the end.

bq. As implemented currently, we are going to send PINGs potentially to 
3.11/3.0 - unless we switch to gating by version, which we do sometimes.

So here's the rub: we don't necessarily know the peer's version yet. The ping 
messages are sent on the large/small connections, but we're not guaranteed that 
at least one round of gossip has completed wherein we would learn the version 
of the peers (we're still in the startup process). The un-upgraded node 
won't know how to respond to the unknown {{Verb}}, which is acceptable, but 
we shouldn't leave the stream on that connection in a broken state (see above).

Proposal: 
- backport part of CASSANDRA-13283 to get the {{Verb}} from a map, not an index 
array offset. This gives us safety for future-proofing against unknown verbs.  
- if we do get an unknown verb id, skip the payload bytes in {{MessageIn}}. 
This leaves the input stream clean to process future messages.
- Further, I think we can eliminate the whole {{UNUSED_}} verbs thing as that 
was an incomplete defense against unknown verbs, and it didn't account for 
message payload.

I think if we backport this to at least 3.0 (maybe 2.2?) that should be 
sufficient for future-proofing against unknown messages. 

If this sounds reasonable, I'll open a separate ticket for that work.


[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-04-05 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427583#comment-16427583
 ] 

Aleksey Yeschenko commented on CASSANDRA-13993:
---

The out-of-range problem, however, feels a bit silly. We shouldn't have padding 
just to avoid going out of ordinal bounds - we should handle ordinals that are 
outside of our known range robustly instead.




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-04-05 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427566#comment-16427566
 ] 

Aleksey Yeschenko commented on CASSANDRA-13993:
---

Disregard my last comment here, I was wrong, by a big margin. My apologies.

As implemented currently, we are going to send PINGs potentially to 3.11/3.0 - 
unless we switch to gating by version, which we do sometimes. And if you pick a 
verb after {{UNUSED_5}}, it would error out on 3.11/3.0 side. So, again, unless 
we gate by version (on which - see below*), we need to pick an ordinal that is 
within the range of 3.0/3.11 - so one of {{UNUSED_1..5}} verbs.

The oldest still supported release is 2.2, which has only 3 {{UNUSED}} verbs. 
To be super paranoid and maximise the # of available {{UNUSED}} verbs in case 
of bad things happening that would force us to introduce new verbs in old 
versions - which is very unlikely to happen, but did happen before - we should 
use one of {{UNUSED_4}} or {{UNUSED_5}} verbs here, in my opinion.

But not inserting a verb before {{UNUSED_1}} like it is now - it's essentially 
taking up {{UNUSED_1}} verb, but confusing things between 4.0 and 3.0/3.11, 
where everything would slide by one and might introduce mistakes.
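
The "slide by one" effect can be illustrated with plain Java enum ordinals. The verb lists below are hypothetical and heavily shortened (the real lists are much longer), and {{VerbOrdinals}} / {{decodeOn30}} are names invented for this sketch.

```java
// Hypothetical, shortened verb lists used only to show how ordinals shift.
final class VerbOrdinals {
    // What an un-upgraded 3.0/3.11 node knows.
    enum Verbs30 { MUTATION, READ, UNUSED_1, UNUSED_2, UNUSED_3, UNUSED_4, UNUSED_5 }

    // 4.0 with PING inserted before UNUSED_1: every later ordinal slides by one.
    enum InsertedBefore { MUTATION, READ, PING, UNUSED_1, UNUSED_2, UNUSED_3, UNUSED_4, UNUSED_5 }

    // 4.0 with PING taking over the UNUSED_4 slot: all other ordinals are unchanged.
    enum ReusedSlot { MUTATION, READ, UNUSED_1, UNUSED_2, UNUSED_3, PING, UNUSED_5 }

    // How the old node would interpret an ordinal it receives on the wire.
    static String decodeOn30(int ordinal) {
        Verbs30[] known = Verbs30.values();
        return ordinal >= 0 && ordinal < known.length ? known[ordinal].name() : "OUT_OF_RANGE";
    }
}
```

In this toy setup, {{decodeOn30(InsertedBefore.PING.ordinal())}} lands on {{UNUSED_1}} and the last verb of {{InsertedBefore}} falls out of range, whereas with {{ReusedSlot}} the ping lands on {{UNUSED_4}} and nothing else moves.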

* So I was thinking about a major upgrade bounce scenario. Think the first ever 
node to upgrade to 4.0 in a cluster of 3.0 nodes - will send out pings to every 
node, but receive no pongs, correct? So every node until a threshold will have 
a significantly longer bounce. Do we care about this case?




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-04-03 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424314#comment-16424314
 ] 

Aleksey Yeschenko commented on CASSANDRA-13993:
---

So, while the comment is before the {{UNUSED_}} verbs, we should still be doing 
what the comment says, and add new verbs in the end. In our case - after 
{{UNUSED_5}}.

Now, it doesn't often happen that things go wrong in a way that forces us to 
retroactively add new verbs to already released majors, but it does sometimes.

Imagine for example there is a bug that causes us to add a new verb to 2.2 and 
3.0, to address some issue with reads. Normally we would go and see which unused 
ranges overlap. In this case, {{UNUSED_1}} to {{UNUSED_3}} could be 
appropriated. This is why we keep the buffer there. If 4.0 appropriates the 
slot just before {{UNUSED_1}} - it's essentially taking over {{UNUSED_1}} spot, 
reducing that available buffer by 1.

Now, it is unlikely that we are going to need 3 new verbs in 3.11/3.0/2.2, but 
it's not like extra ordinals are a precious resource. So we might as well stick 
to the ways of the old, and either, a) move {{PING}} verb to the end of the 
list, after {{UNUSED_5}}, or b) Reuse one of the ancient deprecated verbs (we 
did that at least for hints and batchlog recently).




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-03-07 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390394#comment-16390394
 ] 

Joseph Lynch commented on CASSANDRA-13993:
--

I cut CASSANDRA-14297 for follow up, will iterate in that ticket.




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-02-26 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377336#comment-16377336
 ] 

Joseph Lynch commented on CASSANDRA-13993:
--

{quote}Further, I intentionally wanted this feature to "just work out of the 
box", without requiring extra configuration (for local vs each dc, and so on).
{quote}
[~jasobrown], I completely agree, and I believe there is a difference between 
"percent UP" and "count of DOWN" from a usability perspective. In particular, 
"percent UP" is harder (or impossible) for users of the database to set 
properly (so that it does what they want) or consistently (they leave it at the 
default, or if they change it they use one setting everywhere), and the best 
default I can think of is 100%. Compare this to a "count DOWN", which is more 
likely to be a constant 1 or 2. Consider a user who has two multi-region 
clusters, one that has 12 nodes and one with 120 nodes. Seventy percent is an 
ok default for the first cluster, but a very bad one for the second, and in 
either case you still have no guarantee that you will not see latency or errors 
even if you put the timeout at 2 days. Reflecting on it, I think 
{{(percent_up, timeout) = (100%, 10-30s)}} would be the only default that gives 
users what they expect (restarting their database does not lead to errors). 
That aggressive setting would have clients doing local CLs waiting on all 
remote replicas, however, which other than preventing hint replay is a bit 
wasteful. On the other hand, in both clusters a {{block_for_peers_local_dc=1}} 
default setting is quite reasonable. The way that my patch implemented the 
three options, it works out of the box for all deployments (vnodes, no vnodes, 
large clusters, small clusters, etc.), whereas percent up only works well if the 
user _changes_ the default percentage to 100% or is not using vnodes.
{quote}I'm reticent to tie this new behavior to one of those values as the use 
cases are different; meaning, if you change the value for one semantic meaning, 
you alter the other.
{quote}
Ok, that makes sense.
{quote}This is a fair point, and I'd be open to bumping up the default 
threshold. However, remember that behavior exists already in cassandra (it's 
what you buy in to when using vnodes); this patch helps to alleviate the 
unavailables/timeouts, not eliminate nor accentuate them.
{quote}
I agree, this is a great step forward, but with a small change I think this 
strategy could practically eliminate the unavailables/timeouts. If I 
implemented the functionality with unit tests in a separate Jira would you 
consider reviewing it or do you think the slight additional complexity is not 
worth it? Even separating percentage up by local/remote datacenters would be a 
big step forward I think, and if we went with counts I could reduce the number 
of settings to 2 or 1 instead of 3 to give the advanced users less control if 
you think that would be less confusing for newer users.
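
A small arithmetic sketch of the 12-node versus 120-node comparison above. {{ReadinessThresholds}} and {{maxDownUnderPercent}} are hypothetical names, and the sketch assumes the percentage is applied to the node's peers (cluster size minus one) with the required count rounded up.

```java
// Hypothetical helper for comparing a percent-up threshold with a fixed down count.
final class ReadinessThresholds {
    // Number of peers that may still be down once percentUp percent of peers are up.
    static int maxDownUnderPercent(int peers, int percentUp) {
        int requiredUp = (peers * percentUp + 99) / 100; // integer ceiling of peers * percentUp / 100
        return peers - requiredUp;
    }
}
```

Under these assumptions, a 70% threshold tolerates 3 down peers in the 12-node cluster (11 peers) but 35 in the 120-node cluster (119 peers), while a count-based setting such as 1 or 2 stays the same regardless of cluster size.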
  


[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-02-26 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376945#comment-16376945
 ] 

Jason Brown commented on CASSANDRA-13993:
-

[~jolynch] I understand what you are saying. I think the difference between a 
"percent UP" and a "count of DOWN nodes" isn't that much, so either one is 
probably fine. Further, I intentionally wanted this feature to "just work out 
of the box", without requiring extra configuration (for local vs each dc, and 
so on).

bq. relying on the timeout for large clusters (although it would be awesome if 
this timeout re-used or defaulted to an existing timeout relevant to gossip 
convergence such as BROADCAST_INTERVAL or RING_DELAY).

I'm reluctant to tie this new behavior to one of those values, as the use cases 
are different: if you change the value for one purpose, you also alter the 
other.

bq. especially with vnode=256 clusters where any 2 nodes down in different 
racks essentially guarantees an unavailable error for some intersecting token 
range.

This is a fair point, and I'd be open to bumping up the default threshold. 
However, remember that this behavior already exists in Cassandra (it's what you 
buy in to when using vnodes); this patch helps to alleviate the 
unavailables/timeouts, not to eliminate or accentuate them.






[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-02-21 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371686#comment-16371686
 ] 

Ariel Weisberg commented on CASSANDRA-13993:


I am generally +1 other than I would like to see it spin more aggressively on 
checking whether the responses came back.

I'm not sure about Joseph's point. I mean this is going to improve the 
situation just by virtue of priming all the connections even if it doesn't wait 
for all of them to complete setup. For nodes that are going to be available 
they might now be available within the timeout budget of subsequent reads and 
writes. For nodes that aren't available in time they might not have become 
available anyways.




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-02-20 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370969#comment-16370969
 ] 

Joseph Lynch commented on CASSANDRA-13993:
--

[~jasobrown] this is a great idea and would definitely help with the pain of 
rolling restarts! I'm curious though about the choice to make this a percentage 
instead of a raw count either for only the local datacenter or for each 
datacenter (e.g. block until N nodes or fewer are marked down in this node's 
local datacenter)? In particular I'm concerned that in typical setups (maybe 
something like 2 datacenters, <60 nodes, RF=3, mostly 
{{NetworkTopologyStrategy}} keyspaces) having anything more than one node down 
in gossip in the same datacenter will mean a high probability of getting 
unavailable exceptions @ {{LOCAL_QUORUM}} or timeouts, especially with 
vnode=256 clusters where any 2 nodes down in different racks essentially 
guarantees an unavailable error for some intersecting token range.

What if, instead of a percentage, the system waited until a fixed number (or 
fewer) of endpoints are marked as down in the local datacenter, such as 1 by 
default, relying on the timeout for large clusters? (It would be awesome if 
this timeout re-used or defaulted to an existing timeout relevant to gossip 
convergence, such as {{BROADCAST_INTERVAL}} or {{RING_DELAY}}.)

What do you think? I worked up a quick proof of concept implementation that 
implements counts for the local DC, each DC, or all DCs (for users that are 
using {{LOCAL_QUORUM}} vs {{EACH_QUORUM}} vs {{QUORUM}}) over on 
[github|https://github.com/jasobrown/cassandra/compare/13993...jolynch:13993] 
to show kind of what I'm thinking. I didn't fix the unit tests but if you think 
it's a good idea I can fix them up and add some more (that test multi dc 
setups).

I guess that to make it even smarter a previous {{CassandraDaemon.stop}} could 
persist how many {{DOWN}} nodes there were in a local table or some such and 
then the {{CassandraDaemon.start}} waits for the maximum of that persisted 
number and the configured default, but that adds more complexity and given the 
flexibility of the three counts I am not sure it's worth it.
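
To make the contrast concrete, here is a minimal Java sketch of the two readiness rules discussed above: a cluster-wide percentage of live peers versus a cap on the number of down peers in the local datacenter. All names ({{ReadinessRules}}, {{Peer}}, {{percentAliveReached}}, {{localDownAtMost}}, {{twoDcCluster}}) are hypothetical and are not taken from the patch or from the proof of concept; the cluster shape is only an illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names come from the patch or the proof of concept. */
final class ReadinessRules
{
    /** A peer as the restarting node sees it: its datacenter and its gossip liveness. */
    static final class Peer
    {
        final String datacenter;
        final boolean alive;

        Peer(String datacenter, boolean alive)
        {
            this.datacenter = datacenter;
            this.alive = alive;
        }
    }

    /** Percentage rule: true once at least targetPercent of all peers are alive. */
    static boolean percentAliveReached(List<Peer> peers, int targetPercent)
    {
        long alive = peers.stream().filter(p -> p.alive).count();
        return alive * 100 >= (long) targetPercent * peers.size();
    }

    /** Count rule: true once at most maxDown peers in localDc are still down. */
    static boolean localDownAtMost(List<Peer> peers, String localDc, int maxDown)
    {
        long down = peers.stream()
                         .filter(p -> p.datacenter.equals(localDc) && !p.alive)
                         .count();
        return down <= maxDown;
    }

    /** Builds two datacenters of equal size, with the first downInDc1 peers of dc1 marked down. */
    static List<Peer> twoDcCluster(int nodesPerDc, int downInDc1)
    {
        List<Peer> peers = new ArrayList<>();
        for (int i = 0; i < nodesPerDc; i++)
        {
            peers.add(new Peer("dc1", i >= downInDc1));
            peers.add(new Peer("dc2", true));
        }
        return peers;
    }

    public static void main(String[] args)
    {
        List<Peer> peers = twoDcCluster(10, 2);                 // 20 peers, 2 down in dc1
        System.out.println(percentAliveReached(peers, 90));     // 18 of 20 alive
        System.out.println(localDownAtMost(peers, "dc1", 1));   // 2 down locally
    }
}
```

In this made-up cluster the 90% rule is already satisfied while two local peers are still down, which is the situation the count-based rule is meant to catch.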




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-02-17 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368255#comment-16368255
 ] 

Jason Brown commented on CASSANDRA-13993:
-

Addressed [~aweisberg]'s first round of feedback. Now initializing all 
connection types at startup. Also, I've modified {{MessageOut}} to allow 
senders to declare the connection type they want to use. Related to this, I 
corrected the behavior of the gossip {{EchoMessage}} on the peer's side by 
sending out on the gossip channel (as it currently responds on the small 
message channel, because it's sending a {{REQUEST_RESPONSE}}). However, it's 
still necessary to distinguish between {{EchoMessage}} and {{PingMessage}}, as 
{{PingMessage}} includes an extra byte to express the connection type the peer 
should use. Deserialization of {{EchoMessage}} on a node that doesn't know to 
read the extra byte at the end will cause problems on that connection when 
trying to deserialize the next message, as there's that extra byte it wasn't 
expecting.
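
The extra-byte problem can be shown with a deliberately simplified framing. The Java sketch below assumes a message is just a 4-byte verb id followed by its payload; the real internode format has more header fields, and the class name, method names and verb ids here are hypothetical. A reader that does not know about the trailing connection-type byte starts reading the next message one byte too early.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of stream misalignment; not the actual internode wire format. */
final class TrailingByteDemo
{
    static final int FIRST_VERB_ID = 1; // assumed id of the message carrying the extra byte
    static final int NEXT_VERB_ID = 2;  // assumed id of the message that follows it

    /** Writes one message with a single trailing byte, then the header of a second message. */
    static byte[] writeTwoMessages(byte connectionType)
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes))
        {
            out.writeInt(FIRST_VERB_ID);
            out.writeByte(connectionType); // the extra byte an older reader does not expect
            out.writeInt(NEXT_VERB_ID);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the verb id this reader believes the second message has. */
    static int readSecondVerbId(byte[] wire, boolean knowsExtraByte)
    {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire)))
        {
            in.readInt();          // verb id of the first message
            if (knowsExtraByte)
                in.readByte();     // consume the connection type
            return in.readInt();   // misaligned if the extra byte was not consumed
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args)
    {
        byte[] wire = writeTwoMessages((byte) 3);
        System.out.println(readSecondVerbId(wire, true));  // reader that knows the extra byte
        System.out.println(readSecondVerbId(wire, false)); // reader that does not
    }
}
```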

Also, I don't need to make {{PongMessage}} a verb as it won't need a custom 
{{VerbHandler}}; it can just use {{ResponseVerbHandler}}, which is assigned to 
{{REQUEST_RESPONSE}} messages.

The main logic of this patch was originally in {{MessagingService}}, but I've 
moved it into its own class ({{StartupClusterConnectivityChecker}}) and 
slightly refactored it to make unit testing easier. Also, added a unit test.

Cleaned up the comments on {{MessagingService.Verb}} to be more correct and 
clearer wrt intent and use. Added a sanity check in the static block 
within {{MessagingService.Verb}} where we build up the {{#idToVerbMap}}. We 
should never allow two verbs to have the same id.
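
A sanity check of this kind could look roughly like the Java sketch below. It is not the code from the patch: the enum, its ids, and the helper {{buildIdMap}} are hypothetical, and the only point is that building the id-to-verb map fails fast when two verbs claim the same id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical sketch of a duplicate-id check; not the actual MessagingService.Verb code. */
final class VerbIdSanityCheck
{
    /** Made-up verbs with explicit ids, standing in for the real enum. */
    enum Verb
    {
        MUTATION(0), READ(1), PING(2);

        final int id;

        Verb(int id)
        {
            this.id = id;
        }

        static final Map<Integer, Verb> ID_TO_VERB;

        static
        {
            // Fails class initialization if two verbs share an id.
            ID_TO_VERB = buildIdMap(values(), v -> v.id);
        }
    }

    /** Builds an id lookup map and rejects any id that is claimed twice. */
    static <E extends Enum<E>> Map<Integer, E> buildIdMap(E[] values, ToIntFunction<E> idOf)
    {
        Map<Integer, E> map = new HashMap<>();
        for (E value : values)
        {
            int id = idOf.applyAsInt(value);
            E previous = map.put(id, value);
            if (previous != null)
                throw new IllegalArgumentException("Duplicate verb id " + id + ": " + previous + " and " + value);
        }
        return map;
    }

    public static void main(String[] args)
    {
        System.out.println(Verb.ID_TO_VERB.get(2)); // looks up the verb registered under id 2
    }
}
```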

 




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-02-09 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358383#comment-16358383
 ] 

Joshua McKenzie commented on CASSANDRA-13993:
-

{quote}wdyt?{quote}
Passes the smell test. Legacy code is such a delight.

Anyone that's relying on extending these verbs can do the leg work to better 
integrate with 13283's impl after this change if they haven't yet anyway, as 
it's a cleaner solution to this than just "keep adding a little breathing room 
so we maybe don't overflow".




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-02-08 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357555#comment-16357555
 ] 

Jason Brown commented on CASSANDRA-13993:
-

Back in the mists of time, in cassandra 1.2 we had two comments in the Verbs 
enum:
 - a message about [backward 
compatibility|https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/net/MessagingService.java#L119],
 which appeared before the {{UNUSED_}} verbs
{code:java}
 // use as padding for backwards compatability where a previous version needs to validate a verb from the future.
{code}

 - a message about [adding new 
verbs|https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/net/MessagingService.java#L124]
 to the end of the list (after the UNUSED verbs)
{code:java}
 // remember to add new verbs at the end, since we serialize by ordinal
{code}

The former message assumes we can receive some limited number of messages with 
verb ids that are unknown, and not blow up trying to deserialize the message.

In 2.0, both of those comments were moved:
 - the backward compatibility comment is [now before the newly introduced paxos 
verbs|https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/net/MessagingService.java#L125]
 - the new verbs comment is [*before* the UNUSED 
verbs|https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/net/MessagingService.java#L130]

I think this is where things become confusing. But read on ...

The situation stayed the same until 3.0, where we deleted the backward 
compatibility comment, but kept the message about adding new verbs in the same 
place. This is more or less what we have in trunk. Hence, looking at trunk now, 
it's not clear if the UNUSED verbs are for future proofing the deserialization 
or are some sort of external party-specific messages.

Further, under the current scheme there is no guarantee that someone can create 
a custom verb and have it be safe across versions and upgrades, at least not 
until CASSANDRA-13283 (committed for 4.0).

It seems that the original intent of the UNUSED verbs was to allow "verbs from 
the future" to be "validated"; that is, [not throw an 
ArrayIndexOutOfBoundsException|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/MessageIn.java#L87]
 when a node sees a message with a verb id it doesn't know about (assuming 
that verb id matches one of the UNUSED verb ids). That message would 
ultimately [be thrown away in 
{{MessageDeliveryTask}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/net/MessageDeliveryTask.java#L58]
 as we would have no {{IVerbHandler}} for the unused verb.

Further, if we assume the UNUSED verb to be future proofing, then new verbs 
should, in fact, be added *before* the UNUSED verbs.

As the ability to add new, custom verbs and be future proof from new 
conflicting verbs (assuming all verbs got their id from the enum's ordinal) 
didn't arrive until CASSANDRA-13283 (basically 4.0), I think it's reasonable to 
assume that nobody is currently running with custom verbs (unless they have 
backported CASSANDRA-13283). Thus, I think it should be safe to add new verbs 
to 4.0 before the UNUSED verbs as long as the new verb ids fall into the UNUSED 
verb ids that 3.0 and 3.11 have declared. I believe this is what we have done 
all along.
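
The padding argument can be illustrated with two toy enums. The Java sketch below is an assumption-laden model, not the real verb list: {{OldVerb}} stands for what an older release declares, {{NewVerb}} for a newer release that adds a verb before the UNUSED slots, and {{decodeOnOldNode}} mimics looking a verb up by ordinal in the old node's own array.

```java
/** Hypothetical sketch of serialize-by-ordinal with UNUSED padding; not the real verb list. */
final class UnusedVerbPadding
{
    /** Verbs as an older release might declare them, with padding at the end. */
    enum OldVerb { MUTATION, READ, UNUSED_1, UNUSED_2, UNUSED_3 }

    /** A newer release that adds PING before the UNUSED verbs. */
    enum NewVerb { MUTATION, READ, PING, UNUSED_1, UNUSED_2, UNUSED_3 }

    /**
     * Mimics an old node resolving a verb id against its own enum.
     * Throws ArrayIndexOutOfBoundsException when the id lies beyond the old padding.
     */
    static OldVerb decodeOnOldNode(int verbId)
    {
        return OldVerb.values()[verbId];
    }

    public static void main(String[] args)
    {
        // The new verb's ordinal lands on a slot the old release reserved as UNUSED.
        System.out.println(decodeOnOldNode(NewVerb.PING.ordinal()));
    }
}
```

In this model the old node resolves the new verb to one of its UNUSED slots instead of failing, while an ordinal past the last old UNUSED slot would still throw.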

wdyt? [~aweisberg] [~JoshuaMcKenzie]


[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-02-08 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357409#comment-16357409
 ] 

Joshua McKenzie commented on CASSANDRA-13993:
-

Regardless of whether the unused slots are currently used by other consumers 
(be it DSE or otherwise), inserting an enum in the middle explicitly violates 
the contract / comment in the code:
{code:java}
// remember to add new verbs at the end, since we serialize by ordinal
UNUSED_1,
UNUSED_2,
UNUSED_3,
UNUSED_4,
UNUSED_5,
;
{code}
So I'd recommend against inserting a new verb in the middle.




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-02-08 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357354#comment-16357354
 ] 

Ariel Weisberg commented on CASSANDRA-13993:


I put up a review on a pull request 
https://github.com/apache/cassandra/pull/191#pullrequestreview-95170964

Those unused slots in the enum are relevant for DSE; I'm not sure whether we 
can actually take them. Maybe they are there for us to use? 
[~JoshuaMcKenzie] do you know?




[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2018-01-30 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345006#comment-16345006
 ] 

Jason Brown commented on CASSANDRA-13993:
-

bq. I think the new/unknown messages should just be ignored at 
MessageDeliveryTask#run()

Even though these are new messages, and we don't have CASSANDRA-13283 in 
pre-4.0, I don't think 3.0/3.11 nodes will fail to deserialize them, as the new 
Ping/Pong messages will get the next ordinal value from the {{Verbs}} enum (in 
4.0), and it looks like we have some "UNUSED_" slots in the enum for safety. 
Thus a 3.11 node could successfully deserialize the {{PingMessage}}, but it 
won't have a {{VerbHandler}} to send back a {{PongMessage}}. This is acceptable 
as the connection will be successfully established (one way, at least), and the 
message won't deserialize incorrectly and cause the connection to be thrown 
away.

This would only be a transient issue during upgrade to 4.0.

I still need to test this, but at least the initial code reading seems 
reasonable.
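
The "deserializes fine but has no handler" case can be sketched as follows. This Java example is a simplified model with hypothetical names ({{HandlerLookupSketch}}, {{Handler}}, {{deliver}}); it does not reproduce {{MessageDeliveryTask}}, only the idea that a verb without a registered handler is dropped while the connection stays usable.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical sketch of dropping messages that have no handler; not the real delivery code. */
final class HandlerLookupSketch
{
    /** Made-up verbs; UNUSED_1 stands for a slot a newer release has started using. */
    enum Verb { ECHO, REQUEST_RESPONSE, UNUSED_1 }

    /** Minimal stand-in for a verb handler. */
    interface Handler
    {
        void handle(String payload);
    }

    private final Map<Verb, Handler> handlers = new EnumMap<>(Verb.class);

    void register(Verb verb, Handler handler)
    {
        handlers.put(verb, handler);
    }

    /** Returns true if a handler ran, false if the message was dropped for lack of one. */
    boolean deliver(Verb verb, String payload)
    {
        Handler handler = handlers.get(verb);
        if (handler == null)
            return false; // verb is valid on the wire but unknown here: drop it, keep the connection
        handler.handle(payload);
        return true;
    }

    public static void main(String[] args)
    {
        HandlerLookupSketch node = new HandlerLookupSketch();
        node.register(Verb.ECHO, payload -> System.out.println("echo: " + payload));
        System.out.println(node.deliver(Verb.ECHO, "hello"));    // handled
        System.out.println(node.deliver(Verb.UNUSED_1, "ping")); // dropped, no reply is sent
    }
}
```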





[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2017-11-05 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239556#comment-16239556
 ] 

Jason Brown commented on CASSANDRA-13993:
-

A mostly complete branch here:

||13993||
|[branch|https://github.com/jasobrown/cassandra/tree/13993]|
|[utests|https://circleci.com/gh/jasobrown/cassandra/tree/13993]|

The patch proposes to allow the operator to configure some extra time to wait 
until a configurable percentage of the peers in the cluster are marked alive 
(in {{Gossiper.endpointStateMap}}) and connected to. 

For the alives, we simply check each known peer's state in 
{{Gossiper.endpointStateMap}} to see if it is marked alive, using all the 
existing infrastructure in Gossiper (see {{Gossiper#markAlive()}}). 

For the connections, the bouncing node sends a new {{PingMessage}} to the peer, 
which will be sent on the small message channel. The peer responds with a 
{{PongMessage}}, sent on its own small message channel. Thus, we eagerly 
create the outbound and inbound connections (small message channel) with each 
peer in the cluster before the client native protocol port is opened.

Note: the gossip outbound and inbound connections will be created by the 
{{EchoMessage}} and response that is sent by {{Gossiper#markAlive()}}.
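
The waiting step described above can be pictured as a bounded polling loop. The Java sketch below is a guess at the general shape only, not the patch's implementation: {{waitForPeers}} and the {{readyPeers}} callback are hypothetical, the callback is assumed to report how many peers are both marked alive and have answered a ping, and the 1 ms poll interval is an arbitrary choice.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** Hypothetical sketch of a bounded startup wait; not the implementation in the patch. */
final class StartupWaitSketch
{
    /**
     * Polls until at least targetPercent of peerCount peers are ready or the timeout elapses.
     * Returns true if the threshold was reached, false on timeout or interruption.
     */
    static boolean waitForPeers(int peerCount, int targetPercent, IntSupplier readyPeers,
                                long timeout, TimeUnit unit)
    {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true)
        {
            if ((long) readyPeers.getAsInt() * 100 >= (long) targetPercent * peerCount)
                return true;                       // enough peers are alive and connected
            if (System.nanoTime() - deadline >= 0)
                return false;                      // give up and let startup continue
            try
            {
                TimeUnit.MILLISECONDS.sleep(1);    // assumed poll interval
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args)
    {
        int[] polls = { 0 };
        // Pretend one more peer becomes ready on every poll.
        boolean ready = waitForPeers(10, 70, () -> ++polls[0], 5, TimeUnit.SECONDS);
        System.out.println(ready + " after " + polls[0] + " polls");
    }
}
```

Either outcome lets startup proceed; the return value only tells the caller whether the threshold was met before the deadline.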

There are a couple of open questions I'm still thinking through:
- should the configurable parameters be yaml properties? The current 
implementation naively uses System props, and hard coded default values at that 
(which will need to change before commit).
- I need to test how upgrades work, to make sure that nodes which do not know 
about the new messages (and their verbs), do not fail spectacularly. I think 
the new/unknown messages should just [be ignored at 
{{MessageDeliveryTask#run()}}|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/MessageDeliveryTask.java#L58].
 If there is a problem, I'll need to add a version check before sending the new 
message.





[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2017-11-05 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239555#comment-16239555
 ] 

Jason Brown commented on CASSANDRA-13993:
-

The details of the timeouts after startup are:

- a client request comes in on the native protocol, either a read or write
- the newly bounced node figures out which peers are responsible for the data 
(by partition key)
- the node sends the request to the peers; to do so, we have to build up both 
the outbound and inbound connections (note: internode messaging connections are 
unidirectional)
- if building those connections is not fast enough, the request will time out 
(either at the coordinator or the client driver)

On each connection we have to build the TCP connection, possibly perform the TLS 
handshake, and then perform the C* internode messaging handshake. The time for 
this is exacerbated with nodes that are in remote datacenters, where the round 
trip time is significantly higher. In pre-4.0 (before CASSANDRA-8457), this is 
even worse as all those actions were performed sequentially, for each 
connection attempt, [on the (single) accept 
thread|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/MessagingService.java#L1284].
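
To get a feel for why remote datacenters and a single accept thread hurt, here is a back-of-the-envelope sketch of the setup cost. The round-trip counts are assumptions (one for the TCP handshake, two for a full TLS 1.2 handshake, one for the internode handshake), and the class and method names are made up for illustration.

```java
/** Rough estimate of connection setup time; all inputs are assumptions. */
final class ConnectionSetupEstimate {
    // Assumed round trips per step; real values depend on TLS version and resumption.
    static final int TCP_ROUND_TRIPS = 1;
    static final int TLS_ROUND_TRIPS = 2;
    static final int INTERNODE_ROUND_TRIPS = 1;

    /** Time to set up one connection, in milliseconds. */
    static long perConnectionMillis(long rttMillis, boolean tls) {
        int roundTrips = TCP_ROUND_TRIPS + (tls ? TLS_ROUND_TRIPS : 0) + INTERNODE_ROUND_TRIPS;
        return roundTrips * rttMillis;
    }

    /** Total time when one thread handles every connection attempt in turn. */
    static long sequentialMillis(int connections, long rttMillis, boolean tls) {
        return connections * perConnectionMillis(rttMillis, tls);
    }
}
```

Under these assumptions, a peer 100 ms away costs 400 ms per TLS connection, and 50 such connections handled one after another add up to 20,000 ms, which is well past typical request timeouts.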





[jira] [Commented] (CASSANDRA-13993) Add optional startup delay to wait until peers are ready

2017-11-05 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239554#comment-16239554
 ] 

Jason Brown commented on CASSANDRA-13993:
-

The details of the causes of unavailables after startup are:

- a client request comes in on the native protocol, either a read or write
- the newly bounced node figures out which peers are responsible for the data 
(by partition key)
- the node checks to see if it thinks the peers are available (see below)
- if an insufficient number of replicas are alive to fulfill the 
request, the unavailable error is returned to the client
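
The check in the last two steps can be sketched as follows. This is a simplified illustration, not the coordinator's real code path: the class and method names are hypothetical, and peers are plain strings rather than endpoint objects.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the "enough live replicas?" check done before a request is sent. */
final class AvailabilityCheckSketch {
    /** Replicas needed for a quorum of the given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True if enough replicas are currently marked alive to attempt the request. */
    static boolean hasEnoughLiveReplicas(List<String> replicas, Set<String> alivePeers, int required) {
        long live = replicas.stream().filter(alivePeers::contains).count();
        return live >= required;
    }
}
```

Right after a bounce the set of alive peers starts out empty, so even a healthy cluster fails this check until gossip has marked enough replicas alive.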

A bouncing node determines if a peer is alive as follows:

- In {{StorageService#initServer()}}, we add the IP addresses of previously known 
peers to gossip via {{Gossiper#addSavedEndpoint}}
- {{Gossiper#addSavedEndpoint}} sets up the local state about the peer, and 
marks the peer as dead ({{EndpointState#markDead}})
... time passes in the process startup sequence ...
- when we get gossip data from any peer in the cluster, we will start updating 
the known state in gossip about each peer
- for each peer updated that we think will be a live node (not decommissioned, 
shutdown, whatever), {{Gossiper#markAlive()}} will send an {{EchoMessage}} to the 
peer. This is sent on the {{OutboundMessagingPool#gossipChannel}} socket, 
which opens up a TCP socket, does the TCP handshake, and when we go to write 
the message to the socket (which will be the cassandra internode handshake), 
the TLS handshake is initiated and completed before the message bytes are sent.
- The peer will respond with a simple request-response message. This is (or 
should be) sent on the peer's {{OutboundMessagingPool#gossipChannel}} [1], which 
requires its own socket, TCP handshake, TLS handshake, and so on before the 
request-response bytes are sent to the socket.
- The bounced node receives the request-response, and invokes the callback 
{{Gossiper#markRealAlive()}}. In that method we finally mark the peer as alive 
by invoking {{EndpointState#markAlive()}}.
- All client-initiated DML operations will look into the EndpointState for a 
peer inside of Gossiper to check if the peer is alive.

Thus, we must have a successful {{EchoMessage}} and response between any two 
nodes for the initiator to consider a peer as available for user-initiated 
queries.

[1] Actually, there is a bug wherein the response is sent on the 
{{OutboundMessagingPool#smallMessageChannel}}. CASSANDRA-13714 exists to 
address it.
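
The liveness sequence described in this comment can be modelled as a small state tracker. This is a simplified sketch and not the real Gossiper code: the class is hypothetical, peers are plain strings, and the echo round trip is reduced to a single callback.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified, hypothetical model of how a bounced node tracks peer liveness. */
final class PeerLivenessSketch {
    // Maps each known peer to its liveness flag: true = alive, false = dead.
    private final Map<String, Boolean> alive = new ConcurrentHashMap<>();

    /** Saved peers start out dead, mirroring addSavedEndpoint followed by markDead. */
    void addSavedEndpoint(String peer) {
        alive.put(peer, Boolean.FALSE);
    }

    /** Called once the echo response arrives; peers that were never added are ignored. */
    void onEchoResponse(String peer) {
        alive.replace(peer, Boolean.TRUE);
    }

    boolean isAlive(String peer) {
        return alive.getOrDefault(peer, Boolean.FALSE);
    }

    /** Fraction of known peers currently marked alive (1.0 if no peers are known). */
    double aliveFraction() {
        if (alive.isEmpty()) return 1.0;
        long up = alive.values().stream().filter(Boolean::booleanValue).count();
        return (double) up / alive.size();
    }
}
```

A startup gate along the lines proposed in this ticket could compare something like `aliveFraction()` against a configured threshold before opening the native protocol port. How that threshold would be configured is not specified here.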
