[GitHub] cassandra pull request #191: 13993

2018-04-02 Thread aweisberg
Github user aweisberg closed the pull request at:

https://github.com/apache/cassandra/pull/191


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-21 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r169707744
  
--- Diff: 
src/java/org/apache/cassandra/net/StartupClusterConnectivityChecker.java ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.net;
+
+import java.util.Set;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+
+import com.google.common.util.concurrent.Uninterruptibles;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.locator.InetAddressAndPort;
+import org.apache.cassandra.net.async.OutboundConnectionIdentifier;
+import org.apache.cassandra.utils.FBUtilities;
+
+import static org.apache.cassandra.net.MessagingService.Verb.PING;
+
+public class StartupClusterConnectivityChecker
+{
+private static final Logger logger = 
LoggerFactory.getLogger(StartupClusterConnectivityChecker.class);
+
+enum State { CONTINUE, FINISH_SUCCESS, FINISH_TIMEOUT }
+
+private final int targetPercent;
+private final int timeoutSecs;
+private final Predicate gossipStatus;
+
+public StartupClusterConnectivityChecker(int targetPercent, int 
timeoutSecs, Predicate gossipStatus)
+{
+if (targetPercent < 0)
+{
+targetPercent = 0;
+}
+else if (targetPercent > 100)
+{
+targetPercent = 100;
+}
+this.targetPercent = targetPercent;
+
+if (timeoutSecs < 0)
+{
+timeoutSecs = 1;
+}
+else if (timeoutSecs > 100)
+{
+logger.warn("setting the block-for-peers timeout (in seconds) 
to {} might be a bit excessive, but using it nonetheless", timeoutSecs);
+}
+this.timeoutSecs = timeoutSecs;
+
+this.gossipStatus = gossipStatus;
+}
+
+public void execute(Set peers)
+{
+if (peers == null || targetPercent == 0)
+return;
+
+// remove current node from the set
+peers = peers.stream()
+ .filter(peer -> 
!peer.equals(FBUtilities.getBroadcastAddressAndPort()))
+ .collect(Collectors.toSet());
+
+// don't block if there's no other nodes in the cluster (or we 
don't know about them)
+if (peers.size() <= 0)
+return;
+
+logger.info("choosing to block until {}% of peers are marked alive 
and connections are established; max time to wait = {} seconds",
+targetPercent, timeoutSecs);
+
+// first, send out a ping message to open up the non-gossip 
connections
+final AtomicInteger connectedCount = sendPingMessages(peers);
+
+final long startNanos = System.nanoTime();
+final long expirationNanos = startNanos + 
TimeUnit.SECONDS.toNanos(timeoutSecs);
+int completedRounds = 0;
+while (checkStatus(peers, connectedCount, startNanos, 
expirationNanos < System.nanoTime(), completedRounds) == State.CONTINUE)
+{
+completedRounds++;
+Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
--- End diff --

I think we want to check the condition pretty aggressively so that startup 
in test harnesses is as fast as possible since we do it a lot. Like check every 
millisecond.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-21 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r169697893
  
--- Diff: src/java/org/apache/cassandra/net/MessageOut.java ---
@@ -96,6 +97,11 @@
 //the second object is the POJO to serialize
 public final List parameters;
 
+/**
+ * Allows sender to explicitly state which connection type the message 
should be sent on.
+ */
+public final ConnectionType connectionType;
--- End diff --

A part of me wants to say don't make MessageOut bigger, but it's probably a 
drop in the bucket in terms of allocation rate for processing network messages.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread jasobrown
Github user jasobrown commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168326107
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
--- End diff --

hmm, starting thinking about this, and, yeah, I guess we can leave the 
upper bound unenforced and instead just log that the operator can choose to 
ignore


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread jasobrown
Github user jasobrown commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168324237
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
--- End diff --

ok, sgtm. I will add the params to `Config`, but not add to the yaml.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168321619
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
--- End diff --

If we are going that route it seems like well then it should just be in the 
YAML. I would say jvm.options should be for things we don't control because 
they aren't directly part of Cassandra.

Another option is as a YAML option but undocumented if you want to not 
clutter the YAML, but then operators get annoyed and confused and ask why it's 
not a yaml option and not documented.

I never really know how to balance the tension of not putting out thousands 
of config options into a file with not having undocumented but important knobs. 
I think this one doesn't need to be in the documented category yet because 
setting it is really as an escape hatch not something people should have to 
worry about. This is just some feature that should silently work and people 
don't have to think about. Then later if we find out we are wrong we can add 
the option and document. This stems the tide of config options people need to 
think about which has value.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168320433
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
--- End diff --

My issue isn't with having a bound it's with enforcing it silently. Some of 
the time you can pick a value for them, but not all of the time.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168320211
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
--- End diff --

I don't like silently clamping any of them. If you feel strongly about 
upper bounding aliveTimeoutSecs it's the closest to being OK because the DB 
still comes up and is reliable at up to 100 seconds and most of the time it 
won't wait that long since it will get to 70%. If someone sets a value higher 
than that we "know" it's not useful so clamping it doesn't hurt them.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread jasobrown
Github user jasobrown commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168319052
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
--- End diff --

If we choose to not put these params (`alivePercent` and 
`aliveTimeoutSecs`) into the yaml, should they be documented and added to 
`conf/jvm.options`?


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread jasobrown
Github user jasobrown commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168318195
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
--- End diff --

Just to clarify: you do not like the bounding of `aliveTimeoutSecs`, 
correct? `alivePercent` is just bounding the percentage.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread jasobrown
Github user jasobrown commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168316916
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
+alivePercent = 0;
+else if (alivePercent > 100)
+alivePercent = 100;
+
+int aliveTimeoutSecs = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.timeout_in_secs", 10);
+if (aliveTimeoutSecs < 0)
+aliveTimeoutSecs = 1;
+else if (aliveTimeoutSecs > 100)
+aliveTimeoutSecs = 100;
+
+if (alivePercent > 0)
+blockForPeers(alivePercent, aliveTimeoutSecs);
+}
+
+private void blockForPeers(int targetAlivePercent, int 
aliveTimeoutSecs)
+{
+// grab a snapshot of the current cluster from Gossiper. this is 
completely prone to race conditions, but it's
+// good enough for the purposes of blocking until some certain 
percentage of nodes are considered 'alive'/connected.
+Set> peers = new 
HashSet<>(Gossiper.instance.getEndpointStates());
+
+// remove current node from the set
+peers = peers.stream()
+ .filter(entry -> 
!entry.getKey().equals(FBUtilities.getBroadcastAddressAndPort()))
+ .collect(Collectors.toSet());
+
+final int totalSize = peers.size();
+
+// don't block if there's no other nodes in the cluster (or we 
don't know about them)
+if (totalSize <= 1)
+return;
+
+logger.info("choosing to block until {}% of peers are marked 
alive; max time to wait = {} seconds", targetAlivePercent, aliveTimeoutSecs);
+
+// first, send out a ping message to open up the non-gossip 
connections
+AtomicInteger connectedCount = sendPingMessages(peers);
--- End diff --

>> Do the connections consume resources if there are no queued messages?

Reviewing `OutboundMessageConnection`, no, they do not consume much in 
terms of resources. I'm fine either way creating the large message connections 
eagerly or lazily, so I'll add that in.

>> do we even need to send a message or is just instructing the transport 
system to open them enough?

Because connections are unidirectional, and as we'll want each "pair" of 
outbound/inbound connections for each connection type (gossip, small message, 
large message) to be established, the only way to do that is to send a 'real' 
message that the peer can respond to. That being said, perhaps I can work in 
something that lets a Ping/Pong message "select" which type of transport it 
wants to be sent on.




---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168305034
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
--- End diff --

This is default on so no I don't think we need to add it to the YAML just 
clean up the TODO.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168304560
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
+alivePercent = 0;
+else if (alivePercent > 100)
+alivePercent = 100;
+
+int aliveTimeoutSecs = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.timeout_in_secs", 10);
+if (aliveTimeoutSecs < 0)
+aliveTimeoutSecs = 1;
+else if (aliveTimeoutSecs > 100)
+aliveTimeoutSecs = 100;
+
+if (alivePercent > 0)
+blockForPeers(alivePercent, aliveTimeoutSecs);
+}
+
+private void blockForPeers(int targetAlivePercent, int 
aliveTimeoutSecs)
+{
+// grab a snapshot of the current cluster from Gossiper. this is 
completely prone to race conditions, but it's
+// good enough for the purposes of blocking until some certain 
percentage of nodes are considered 'alive'/connected.
+Set> peers = new 
HashSet<>(Gossiper.instance.getEndpointStates());
+
+// remove current node from the set
+peers = peers.stream()
+ .filter(entry -> 
!entry.getKey().equals(FBUtilities.getBroadcastAddressAndPort()))
+ .collect(Collectors.toSet());
+
+final int totalSize = peers.size();
+
+// don't block if there's no other nodes in the cluster (or we 
don't know about them)
+if (totalSize <= 1)
+return;
+
+logger.info("choosing to block until {}% of peers are marked 
alive; max time to wait = {} seconds", targetAlivePercent, aliveTimeoutSecs);
+
+// first, send out a ping message to open up the non-gossip 
connections
+AtomicInteger connectedCount = sendPingMessages(peers);
--- End diff --

Do the connections consume resources if there are no queued messages? Are 
there any buffers allocated? The resources consumed should be minimal and even 
then if we can theoretically provision these resources then we will get more 
predictable performance and behavior if we provision them all up front instead 
of lazily where we find out oh we didn't have enough at some arbitrary later 
time.

Maybe we are trying to hack this process too much by having the transport 
system unaware of what we are attempting and if we made it possible to say "hey 
connect on all things and send this small message on each" it would still be 
simple. In fact do we even need to send a message or is just instructing the 
transport system to open them enough?


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168302781
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
--- End diff --

Well I think it really sucks when you think you are getting the benefit 
from some config option and your not and you don't know or don't know why. Out 
of scope arguments aren't really applicable because this is the ticket adding 
the feature driven by these config options.

I am not sure what you mean by order of magnitude. The zero clamp means 
someone tried to enable this feature, but didn't and so their stuff is silently 
unreliable.

The 100 clamp is they meant to set a value < 100 but had an extra digit and 
didn't and it's silently being slower to start. I doubt they are going to care 
as much about that in production use cases.

I don't think you'll find many operators that complain about being informed 
via a clear error message their config is not sane. This is also an option that 
people have to request explicitly in order to get an invalid value to clamp in 
the first place so I really do think they want to know.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread jasobrown
Github user jasobrown commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168186993
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
+alivePercent = 0;
+else if (alivePercent > 100)
+alivePercent = 100;
+
+int aliveTimeoutSecs = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.timeout_in_secs", 10);
+if (aliveTimeoutSecs < 0)
+aliveTimeoutSecs = 1;
+else if (aliveTimeoutSecs > 100)
+aliveTimeoutSecs = 100;
+
+if (alivePercent > 0)
+blockForPeers(alivePercent, aliveTimeoutSecs);
+}
+
+private void blockForPeers(int targetAlivePercent, int 
aliveTimeoutSecs)
+{
+// grab a snapshot of the current cluster from Gossiper. this is 
completely prone to race conditions, but it's
+// good enough for the purposes of blocking until some certain 
percentage of nodes are considered 'alive'/connected.
+Set> peers = new 
HashSet<>(Gossiper.instance.getEndpointStates());
--- End diff --

Yes, this is only called after we've joined the ring, done bootstrapping, 
started gossip, and so on.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread jasobrown
Github user jasobrown commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168186702
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -254,6 +258,8 @@ public long getTimeout()
 return DatabaseDescriptor.getRangeRpcTimeout();
 }
 },
+PING(),
--- End diff --

See discussion on ticket, but I'll clean up the enum and comments


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread jasobrown
Github user jasobrown commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168186466
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
--- End diff --

well, I wanted to get more input before shovelling more stuff into the yaml 
:) If we want users to actually use this in a 'standard' way, then they 
probably should be in the yaml


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread jasobrown
Github user jasobrown commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168186136
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
--- End diff --

This is fair, but I felt that giving *some* upper bound didn't seem 
unreasonable. Should we really delay starting the process by an extra order of 
magnitude just because an operator added an extra zero to the configuration? I 
understand your point about least surprise, but that might be beyond the spirit 
of this ticket. wdyt?


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-14 Thread jasobrown
Github user jasobrown commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r168185310
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
+alivePercent = 0;
+else if (alivePercent > 100)
+alivePercent = 100;
+
+int aliveTimeoutSecs = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.timeout_in_secs", 10);
+if (aliveTimeoutSecs < 0)
+aliveTimeoutSecs = 1;
+else if (aliveTimeoutSecs > 100)
+aliveTimeoutSecs = 100;
+
+if (alivePercent > 0)
+blockForPeers(alivePercent, aliveTimeoutSecs);
+}
+
+private void blockForPeers(int targetAlivePercent, int 
aliveTimeoutSecs)
+{
+// grab a snapshot of the current cluster from Gossiper. this is 
completely prone to race conditions, but it's
+// good enough for the purposes of blocking until some certain 
percentage of nodes are considered 'alive'/connected.
+Set> peers = new 
HashSet<>(Gossiper.instance.getEndpointStates());
+
+// remove current node from the set
+peers = peers.stream()
+ .filter(entry -> 
!entry.getKey().equals(FBUtilities.getBroadcastAddressAndPort()))
+ .collect(Collectors.toSet());
+
+final int totalSize = peers.size();
+
+// don't block if there's no other nodes in the cluster (or we 
don't know about them)
+if (totalSize <= 1)
+return;
+
+logger.info("choosing to block until {}% of peers are marked 
alive; max time to wait = {} seconds", targetAlivePercent, aliveTimeoutSecs);
+
+// first, send out a ping message to open up the non-gossip 
connections
+AtomicInteger connectedCount = sendPingMessages(peers);
--- End diff --

I thought about that, as well. I can force a message to go out on the large 
message connection, but the `REQUEST_RESPONSE` will come back on the small 
message connection. Unless, of course, I send some empty byte array that 
exceeds the `OutboundMessagingPool#LARGE_MESSAGE_THRESHOLD`, [which is 
currently 
64k](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/async/OutboundMessagingPool.java#L47).
 Admittedly, I'm reticent to do that. I could, however, create variant of the 
Ping/Pong messages (or modify those) to switch between either large or small 
message connection.

I guess the concern i had was that many apps might not need the large 
message connection, and thus it becomes unused, but consumed, resources. Every 
instance will need the gossip and small message connections, but not every use 
case calls for the large connections. wdyt?


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-08 Thread josh-mckenzie
Github user josh-mckenzie commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r167033937
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -254,6 +258,8 @@ public long getTimeout()
 return DatabaseDescriptor.getRangeRpcTimeout();
 }
 },
+PING(),
--- End diff --

Left a similar comment on ticket, but this should be added at the end:
*** 
// remember to add new verbs at the end, since we serialize by ordinal
UNUSED_1,
UNUSED_2,
UNUSED_3,
UNUSED_4,
UNUSED_5,



---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-08 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r167020713
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
+alivePercent = 0;
+else if (alivePercent > 100)
+alivePercent = 100;
+
+int aliveTimeoutSecs = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.timeout_in_secs", 10);
+if (aliveTimeoutSecs < 0)
+aliveTimeoutSecs = 1;
+else if (aliveTimeoutSecs > 100)
+aliveTimeoutSecs = 100;
+
+if (alivePercent > 0)
+blockForPeers(alivePercent, aliveTimeoutSecs);
+}
+
+private void blockForPeers(int targetAlivePercent, int 
aliveTimeoutSecs)
+{
+// grab a snapshot of the current cluster from Gossiper. this is 
completely prone to race conditions, but it's
+// good enough for the purposes of blocking until some certain 
percentage of nodes are considered 'alive'/connected.
+Set> peers = new 
HashSet<>(Gossiper.instance.getEndpointStates());
+
+// remove current node from the set
+peers = peers.stream()
+ .filter(entry -> 
!entry.getKey().equals(FBUtilities.getBroadcastAddressAndPort()))
+ .collect(Collectors.toSet());
+
+final int totalSize = peers.size();
+
+// don't block if there's no other nodes in the cluster (or we 
don't know about them)
+if (totalSize <= 1)
+return;
+
+logger.info("choosing to block until {}% of peers are marked 
alive; max time to wait = {} seconds", targetAlivePercent, aliveTimeoutSecs);
+
+// first, send out a ping message to open up the non-gossip 
connections
+AtomicInteger connectedCount = sendPingMessages(peers);
--- End diff --

Should we also set up the large message connections? We can set up 
connections in parallel now with Netty right (sweet!). I think we should up all 
of them. Gossip, large, small.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-08 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r167017641
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
--- End diff --

As a matter of practice silently clamping values always seems wrong to me. 
They asked for something and you aren't going to give it to them therefore 
someone is in error and it shouldn't be silent so we can come back and supply 
the value we intended. I suppose it's how I interpret the principle of least 
surprise.


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-08 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r167016420
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -254,6 +258,8 @@ public long getTimeout()
 return DatabaseDescriptor.getRangeRpcTimeout();
 }
 },
+PING(),
--- End diff --

Should this go this early in the enum before USUED_* and company? Does it 
make sense to ever insert anything into the middle of the enum?


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-08 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r167017281
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
--- End diff --

So, should these be yaml props? 


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org



[GitHub] cassandra pull request #191: 13993

2018-02-08 Thread aweisberg
Github user aweisberg commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/191#discussion_r167016249
  
--- Diff: src/java/org/apache/cassandra/net/MessagingService.java ---
@@ -1664,4 +1676,113 @@ public static boolean 
isEncryptedConnection(InetAddressAndPort address)
 }
 return true;
 }
+
+public void blockForPeers()
+{
+// TODO make these yaml props?
+int alivePercent = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.percent", 70);
+if (alivePercent < 0)
+alivePercent = 0;
+else if (alivePercent > 100)
+alivePercent = 100;
+
+int aliveTimeoutSecs = Integer.getInteger(Config.PROPERTY_PREFIX + 
"blockForPeers.timeout_in_secs", 10);
+if (aliveTimeoutSecs < 0)
+aliveTimeoutSecs = 1;
+else if (aliveTimeoutSecs > 100)
+aliveTimeoutSecs = 100;
+
+if (alivePercent > 0)
+blockForPeers(alivePercent, aliveTimeoutSecs);
+}
+
+private void blockForPeers(int targetAlivePercent, int 
aliveTimeoutSecs)
+{
+// grab a snapshot of the current cluster from Gossiper. this is 
completely prone to race conditions, but it's
+// good enough for the purposes of blocking until some certain 
percentage of nodes are considered 'alive'/connected.
+Set> peers = new 
HashSet<>(Gossiper.instance.getEndpointStates());
--- End diff --

When we reach this how do we know that Gossiper will be seeded with any 
endpoint states so we know to wait on a realistic portion of the cluster?

I assume it's implicit in the bootstrapping process before we get to this 
point?


---

-
To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org
For additional commands, e-mail: pr-h...@cassandra.apache.org