On Nov 30, 2006, at 12:31 AM, Mark Fasheh wrote:

Hi Andrew,
        Things are looking much better, but there's still a few issues that
I found while reviewing the patch. I got Zach to look at it too (he's the original author of the ocfs2 network code) which has generated some good
comments.

On Wed, Nov 29, 2006 at 09:51:31AM +0100, [EMAIL PROTECTED] wrote:

From: Andrew Beekhof <[EMAIL PROTECTED]>
Subject: [patch 1/1] OCFS2 Configurable timeouts - Protocol changes

Modify the OCFS2 handshake to ensure essential timeouts are configured
  identically on all nodes.
Only allow changes when there are no connected peers
Improves the logic in o2net_advance_rx() which broke now that
sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg) Included is the field for userspace-heartbeat timeout to avoid the need for
  further protocol changes.
Uses a global spinlock to ensure the decisions to update configfs entries are made on the correct value. The region covered by the spinlock when incrimenting the counter is much larger as this is the more critical case.
Nitpick: Can you format that commit log to be a bit more in line with
standard kernel commits (the indenting is weird)

sure



Index: fs/ocfs2/cluster/nodemanager.c
===================================================================
--- fs/ocfs2/cluster/nodemanager.c.orig 2006-11-20 16:25:58.000000000 +0100 +++ fs/ocfs2/cluster/nodemanager.c 2006-11-27 09:57:56.000000000 +0100
@@ -558,15 +558,14 @@ static ssize_t o2nm_cluster_attr_write(c
        return count;
 }

-static ssize_t o2nm_cluster_attr_idle_timeout_ms_read(struct o2nm_cluster *cluster,
-                                                 char *page)
+static ssize_t o2nm_cluster_attr_idle_timeout_ms_read(
+       struct o2nm_cluster *cluster, char *page)
 {
        return sprintf(page, "%u\n", cluster->cl_idle_timeout_ms);
 }
Can you not re-write the function prototypes unless they're actually
changing please? It clutters up the patch and makes it harder to find the
actual code to check (see below).

ah, bad habit i picked up working on smaller projects.
is it ok in a separate patch? or have I got my wrap points set too small by default?



@@ -574,10 +573,22 @@ static ssize_t o2nm_cluster_attr_idle_ti
        ret =  o2nm_cluster_attr_write(page, count, &val);

        if (ret > 0) {
+               if (cluster->cl_idle_timeout_ms != val) {
+                       spin_lock(&connected_lock);
+                       if(o2net_num_connected_peers()) {
+                               mlog(ML_NOTICE,
+                                    "o2net: cannot change idle timeout after "
+                                    "the first peer has agreed to it."
+                                    "  %d connected peers\n",
+                                    o2net_num_connected_peers());
+                               ret = -EINVAL;
+                       }
+                       spin_unlock(&connected_lock);
+               }
                if (val <= cluster->cl_keepalive_delay_ms) {
                        mlog(ML_NOTICE, "o2net: idle timeout must be larger "
                             "than keepalive delay\n");
-                       return -EINVAL;
+                       ret = -EINVAL;
                }
                cluster->cl_idle_timeout_ms = val;
I don't know how I missed this before, but you're erroring with a negative return value, yet continuing with the work of setting cluster- >cl_idle_timeout_ms anyway. I think we're missing some goto's here and in the similar blocks
below.

my bad - fixed


@@ -1121,6 +1121,44 @@ static int o2net_check_handshake(struct
                return -1;
        }

+       /*
+        * Ensure timeouts are consistent with other nodes, otherwise
+ * we can end up with one node thinking that the other must be down,
+        * but isn't. This can ultimately cause corruption.
+        */
+       if (be32_to_cpu(hand->o2net_idle_timeout_ms) !=
+                               o2net_idle_timeout(sc->sc_node)) {
+               mlog(ML_NOTICE, SC_NODEF_FMT " uses a network idle timeout of "
+                    "%u ms, but we use %u ms locally.  disconnecting\n",
+                    SC_NODEF_ARGS(sc),
+                    be32_to_cpu(hand->o2net_idle_timeout_ms),
+                    o2net_idle_timeout(sc->sc_node));
+               o2net_ensure_shutdown(nn, sc, -ENOTCONN);
+               return -1;
+       }
+
+       if (be32_to_cpu(hand->o2net_keepalive_delay_ms) !=
+                       o2net_keepalive_delay(sc->sc_node)) {
+               mlog(ML_NOTICE, SC_NODEF_FMT " uses a keepalive delay of "
+                    "%u ms, but we use %u ms locally.  disconnecting\n",
+                    SC_NODEF_ARGS(sc),
+                    be32_to_cpu(hand->o2net_keepalive_delay_ms),
+                    o2net_keepalive_delay(sc->sc_node));
+               o2net_ensure_shutdown(nn, sc, -ENOTCONN);
+               return -1;
+       }
+
+       if (be32_to_cpu(hand->o2hb_heartbeat_timeout_ms) !=
+                       O2HB_MAX_WRITE_TIMEOUT_MS) {
+               mlog(ML_NOTICE, SC_NODEF_FMT " uses a heartbeat timeout of "
+                    "%u ms, but we use %u ms locally.  disconnecting\n",
+                    SC_NODEF_ARGS(sc),
+                    be32_to_cpu(hand->o2net_keepalive_delay_ms),
We check hearbeat timeout here, but print keepalive delay...


fixed


@@ -1153,6 +1191,26 @@ static int o2net_advance_rx(struct o2net
        sclog(sc, "receiving\n");
        do_gettimeofday(&sc->sc_tv_advance_start);

+       if(unlikely(sc->sc_handshake_ok == 0)) {
+               if(sc->sc_page_off < sizeof(struct o2net_handshake)) {
+                       data = page_address(sc->sc_page) + sc->sc_page_off;
+                       datalen = sizeof(struct o2net_handshake) - 
sc->sc_page_off;
+                       ret = o2net_recv_tcp_msg(sc->sc_sock, data, datalen);
+                       if (ret > 0)
+                               sc->sc_page_off += ret;
+               }
+
+               if (sc->sc_page_off == sizeof(struct o2net_handshake)) {
+                       o2net_check_handshake(sc);
+                       if(sc->sc_handshake_ok == 0) {
+                               BUG_ON(sizeof(struct o2net_handshake)
+                                      == sizeof(struct o2net_msg));
Is this necessary?

I wasnt sure at the time - see below - so i wanted to make sure it at least died sanely
apparently i still needed education on how that is done :)

Didn't we fix the logic such that the relative sizes
don't matter any more? If it _is_ necessary, then it should be a
BUILD_BUG_ON() in a more visible place,

ah, I was not familiar with that macro yet

with a nice fat comment explaining
why...

+                               ret = -EPROTO;
+                       }
+                       goto out;
Do you mean to move that goto within the

if (sc->sc_handshake_ok == 0) {

block? I _think_ it's ok for us to continue otherwise...

i did - but if we never want to process an o2net_msg if the handshake has not been completed, then i can structure things a little differently/clearly



@@ -1178,8 +1227,7 @@ static int o2net_advance_rx(struct o2net
                                    O2NET_MAX_PAYLOAD_BYTES)
                                        ret = -EOVERFLOW;
                        }
-               }
-               if (ret <= 0)
+               } else
                        goto out;
        }
Why are you doing that? We'll continue now if we want to return - EOVERFLOW
where we would error out before.

damn - i should have noticed that


I'll resubmit once we sort out the o2net_msg / o2net_handshake situation

--
Andrew Beekhof

"Would the last person to leave please turn out the enlightenment?" - TISM


_______________________________________________
Ocfs2-devel mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-devel

Reply via email to