On Nov 30, 2006, at 12:31 AM, Mark Fasheh wrote:
Hi Andrew,
Things are looking much better, but there's still a few issues that
I found while reviewing the patch. I got Zach to look at it too
(he's the
original author of the ocfs2 network code) which has generated some
good
comments.
On Wed, Nov 29, 2006 at 09:51:31AM +0100, [EMAIL PROTECTED] wrote:
From: Andrew Beekhof <[EMAIL PROTECTED]>
Subject: [patch 1/1] OCFS2 Configurable timeouts - Protocol changes
Modify the OCFS2 handshake to ensure essential timeouts are
configured
identically on all nodes.
Only allow changes when there are no connected peers
Improves the logic in o2net_advance_rx() which broke now that
sizeof(struct o2net_handshake) is greater than sizeof(struct
o2net_msg)
Included is the field for userspace-heartbeat timeout to avoid the
need for
further protocol changes.
Uses a global spinlock to ensure the decisions to update configfs
entries
are made on the correct value. The region covered by the
spinlock when
incrimenting the counter is much larger as this is the more
critical case.
Nitpick: Can you format that commit log to be a bit more in line with
standard kernel commits (the indenting is weird)
sure
Index: fs/ocfs2/cluster/nodemanager.c
===================================================================
--- fs/ocfs2/cluster/nodemanager.c.orig 2006-11-20
16:25:58.000000000 +0100
+++ fs/ocfs2/cluster/nodemanager.c 2006-11-27 09:57:56.000000000
+0100
@@ -558,15 +558,14 @@ static ssize_t o2nm_cluster_attr_write(c
return count;
}
-static ssize_t o2nm_cluster_attr_idle_timeout_ms_read(struct
o2nm_cluster *cluster,
- char *page)
+static ssize_t o2nm_cluster_attr_idle_timeout_ms_read(
+ struct o2nm_cluster *cluster, char *page)
{
return sprintf(page, "%u\n", cluster->cl_idle_timeout_ms);
}
Can you not re-write the function prototypes unless they're actually
changing please? It clutters up the patch and makes it harder to
find the
actual code to check (see below).
ah, bad habit i picked up working on smaller projects.
is it ok in a separate patch? or have I got my wrap points set too
small by default?
@@ -574,10 +573,22 @@ static ssize_t o2nm_cluster_attr_idle_ti
ret = o2nm_cluster_attr_write(page, count, &val);
if (ret > 0) {
+ if (cluster->cl_idle_timeout_ms != val) {
+ spin_lock(&connected_lock);
+ if(o2net_num_connected_peers()) {
+ mlog(ML_NOTICE,
+ "o2net: cannot change idle timeout after "
+ "the first peer has agreed to it."
+ " %d connected peers\n",
+ o2net_num_connected_peers());
+ ret = -EINVAL;
+ }
+ spin_unlock(&connected_lock);
+ }
if (val <= cluster->cl_keepalive_delay_ms) {
mlog(ML_NOTICE, "o2net: idle timeout must be larger "
"than keepalive delay\n");
- return -EINVAL;
+ ret = -EINVAL;
}
cluster->cl_idle_timeout_ms = val;
I don't know how I missed this before, but you're erroring with a
negative return
value, yet continuing with the work of setting cluster-
>cl_idle_timeout_ms
anyway. I think we're missing some goto's here and in the similar
blocks
below.
my bad - fixed
@@ -1121,6 +1121,44 @@ static int o2net_check_handshake(struct
return -1;
}
+ /*
+ * Ensure timeouts are consistent with other nodes, otherwise
+ * we can end up with one node thinking that the other must be
down,
+ * but isn't. This can ultimately cause corruption.
+ */
+ if (be32_to_cpu(hand->o2net_idle_timeout_ms) !=
+ o2net_idle_timeout(sc->sc_node)) {
+ mlog(ML_NOTICE, SC_NODEF_FMT " uses a network idle timeout of "
+ "%u ms, but we use %u ms locally. disconnecting\n",
+ SC_NODEF_ARGS(sc),
+ be32_to_cpu(hand->o2net_idle_timeout_ms),
+ o2net_idle_timeout(sc->sc_node));
+ o2net_ensure_shutdown(nn, sc, -ENOTCONN);
+ return -1;
+ }
+
+ if (be32_to_cpu(hand->o2net_keepalive_delay_ms) !=
+ o2net_keepalive_delay(sc->sc_node)) {
+ mlog(ML_NOTICE, SC_NODEF_FMT " uses a keepalive delay of "
+ "%u ms, but we use %u ms locally. disconnecting\n",
+ SC_NODEF_ARGS(sc),
+ be32_to_cpu(hand->o2net_keepalive_delay_ms),
+ o2net_keepalive_delay(sc->sc_node));
+ o2net_ensure_shutdown(nn, sc, -ENOTCONN);
+ return -1;
+ }
+
+ if (be32_to_cpu(hand->o2hb_heartbeat_timeout_ms) !=
+ O2HB_MAX_WRITE_TIMEOUT_MS) {
+ mlog(ML_NOTICE, SC_NODEF_FMT " uses a heartbeat timeout of "
+ "%u ms, but we use %u ms locally. disconnecting\n",
+ SC_NODEF_ARGS(sc),
+ be32_to_cpu(hand->o2net_keepalive_delay_ms),
We check hearbeat timeout here, but print keepalive delay...
fixed
@@ -1153,6 +1191,26 @@ static int o2net_advance_rx(struct o2net
sclog(sc, "receiving\n");
do_gettimeofday(&sc->sc_tv_advance_start);
+ if(unlikely(sc->sc_handshake_ok == 0)) {
+ if(sc->sc_page_off < sizeof(struct o2net_handshake)) {
+ data = page_address(sc->sc_page) + sc->sc_page_off;
+ datalen = sizeof(struct o2net_handshake) -
sc->sc_page_off;
+ ret = o2net_recv_tcp_msg(sc->sc_sock, data, datalen);
+ if (ret > 0)
+ sc->sc_page_off += ret;
+ }
+
+ if (sc->sc_page_off == sizeof(struct o2net_handshake)) {
+ o2net_check_handshake(sc);
+ if(sc->sc_handshake_ok == 0) {
+ BUG_ON(sizeof(struct o2net_handshake)
+ == sizeof(struct o2net_msg));
Is this necessary?
I wasnt sure at the time - see below - so i wanted to make sure it at
least died sanely
apparently i still needed education on how that is done :)
Didn't we fix the logic such that the relative sizes
don't matter any more? If it _is_ necessary, then it should be a
BUILD_BUG_ON() in a more visible place,
ah, I was not familiar with that macro yet
with a nice fat comment explaining
why...
+ ret = -EPROTO;
+ }
+ goto out;
Do you mean to move that goto within the
if (sc->sc_handshake_ok == 0) {
block? I _think_ it's ok for us to continue otherwise...
i did - but if we never want to process an o2net_msg if the handshake
has not been completed, then i can structure things a little
differently/clearly
@@ -1178,8 +1227,7 @@ static int o2net_advance_rx(struct o2net
O2NET_MAX_PAYLOAD_BYTES)
ret = -EOVERFLOW;
}
- }
- if (ret <= 0)
+ } else
goto out;
}
Why are you doing that? We'll continue now if we want to return -
EOVERFLOW
where we would error out before.
damn - i should have noticed that
I'll resubmit once we sort out the o2net_msg / o2net_handshake
situation
--
Andrew Beekhof
"Would the last person to leave please turn out the enlightenment?" -
TISM
_______________________________________________
Ocfs2-devel mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-devel