Re: [Ocfs2-devel] [patch 1/1] ocfs2-timeout-protocol.patch

Mark Fasheh Wed, 29 Nov 2006 15:35:42 -0800

Hi Andrew,
        Things are looking much better, but there's still a few issues that
I found while reviewing the patch. I got Zach to look at it too (he's the
original author of the ocfs2 network code) which has generated some good
comments.


On Wed, Nov 29, 2006 at 09:51:31AM +0100, [EMAIL PROTECTED] wrote:
> 
> From: Andrew Beekhof <[EMAIL PROTECTED]>
> Subject: [patch 1/1] OCFS2 Configurable timeouts - Protocol changes
> 
> Modify the OCFS2 handshake to ensure essential timeouts are configured
>   identically on all nodes.
> Only allow changes when there are no connected peers
> Improves the logic in o2net_advance_rx() which broke now that
>   sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg)
> Included is the field for userspace-heartbeat timeout to avoid the need for
>   further protocol changes.
> Uses a global spinlock to ensure the decisions to update configfs entries
>   are made on the correct value.  The region covered by the spinlock when
>   incrimenting the counter is much larger as this is the more critical case.
Nitpick: Can you format that commit log to be a bit more in line with
standard kernel commits (the indenting is weird)


> Index: fs/ocfs2/cluster/nodemanager.c
> ===================================================================
> --- fs/ocfs2/cluster/nodemanager.c.orig       2006-11-20 16:25:58.000000000 
> +0100
> +++ fs/ocfs2/cluster/nodemanager.c    2006-11-27 09:57:56.000000000 +0100
> @@ -558,15 +558,14 @@ static ssize_t o2nm_cluster_attr_write(c
>       return count;
>  }
>  
> -static ssize_t o2nm_cluster_attr_idle_timeout_ms_read(struct o2nm_cluster 
> *cluster,
> -                                                 char *page)
> +static ssize_t o2nm_cluster_attr_idle_timeout_ms_read(
> +     struct o2nm_cluster *cluster, char *page)
>  {
>       return sprintf(page, "%u\n", cluster->cl_idle_timeout_ms);
>  }
Can you not re-write the function prototypes unless they're actually
changing please? It clutters up the patch and makes it harder to find the
actual code to check (see below).


> @@ -574,10 +573,22 @@ static ssize_t o2nm_cluster_attr_idle_ti
>       ret =  o2nm_cluster_attr_write(page, count, &val);
>  
>       if (ret > 0) {
> +             if (cluster->cl_idle_timeout_ms != val) {
> +                     spin_lock(&connected_lock);
> +                     if(o2net_num_connected_peers()) {
> +                             mlog(ML_NOTICE,
> +                                  "o2net: cannot change idle timeout after "
> +                                  "the first peer has agreed to it."
> +                                  "  %d connected peers\n",
> +                                  o2net_num_connected_peers());
> +                             ret = -EINVAL;
> +                     }
> +                     spin_unlock(&connected_lock);
> +             }
>               if (val <= cluster->cl_keepalive_delay_ms) {
>                       mlog(ML_NOTICE, "o2net: idle timeout must be larger "
>                            "than keepalive delay\n");
> -                     return -EINVAL;
> +                     ret = -EINVAL;
>               }
>               cluster->cl_idle_timeout_ms = val;
I don't know how I missed this before, but you're erroring with a negative 
return
value, yet continuing with the work of setting cluster->cl_idle_timeout_ms
anyway. I think we're missing some goto's here and in the similar blocks
below.


> @@ -1121,6 +1121,44 @@ static int o2net_check_handshake(struct 
>               return -1;
>       }
>  
> +     /*
> +      * Ensure timeouts are consistent with other nodes, otherwise
> +      * we can end up with one node thinking that the other must be down,
> +      * but isn't. This can ultimately cause corruption.
> +      */
> +     if (be32_to_cpu(hand->o2net_idle_timeout_ms) !=
> +                             o2net_idle_timeout(sc->sc_node)) {
> +             mlog(ML_NOTICE, SC_NODEF_FMT " uses a network idle timeout of "
> +                  "%u ms, but we use %u ms locally.  disconnecting\n",
> +                  SC_NODEF_ARGS(sc),
> +                  be32_to_cpu(hand->o2net_idle_timeout_ms),
> +                  o2net_idle_timeout(sc->sc_node));
> +             o2net_ensure_shutdown(nn, sc, -ENOTCONN);
> +             return -1;
> +     }
> +
> +     if (be32_to_cpu(hand->o2net_keepalive_delay_ms) !=
> +                     o2net_keepalive_delay(sc->sc_node)) {
> +             mlog(ML_NOTICE, SC_NODEF_FMT " uses a keepalive delay of "
> +                  "%u ms, but we use %u ms locally.  disconnecting\n",
> +                  SC_NODEF_ARGS(sc),
> +                  be32_to_cpu(hand->o2net_keepalive_delay_ms),
> +                  o2net_keepalive_delay(sc->sc_node));
> +             o2net_ensure_shutdown(nn, sc, -ENOTCONN);
> +             return -1;
> +     }
> +
> +     if (be32_to_cpu(hand->o2hb_heartbeat_timeout_ms) !=
> +                     O2HB_MAX_WRITE_TIMEOUT_MS) {
> +             mlog(ML_NOTICE, SC_NODEF_FMT " uses a heartbeat timeout of "
> +                  "%u ms, but we use %u ms locally.  disconnecting\n",
> +                  SC_NODEF_ARGS(sc),
> +                  be32_to_cpu(hand->o2net_keepalive_delay_ms),
We check hearbeat timeout here, but print keepalive delay...


> @@ -1153,6 +1191,26 @@ static int o2net_advance_rx(struct o2net
>       sclog(sc, "receiving\n");
>       do_gettimeofday(&sc->sc_tv_advance_start);
>  
> +     if(unlikely(sc->sc_handshake_ok == 0)) {
> +             if(sc->sc_page_off < sizeof(struct o2net_handshake)) {
> +                     data = page_address(sc->sc_page) + sc->sc_page_off;
> +                     datalen = sizeof(struct o2net_handshake) - 
> sc->sc_page_off;
> +                     ret = o2net_recv_tcp_msg(sc->sc_sock, data, datalen);
> +                     if (ret > 0)
> +                             sc->sc_page_off += ret;
> +             }
> +
> +             if (sc->sc_page_off == sizeof(struct o2net_handshake)) {
> +                     o2net_check_handshake(sc);
> +                     if(sc->sc_handshake_ok == 0) {
> +                             BUG_ON(sizeof(struct o2net_handshake)
> +                                    == sizeof(struct o2net_msg));
Is this necessary? Didn't we fix the logic such that the relative sizes
don't matter any more? If it _is_ necessary, then it should be a
BUILD_BUG_ON() in a more visible place, with a nice fat comment explaining
why...


> +                             ret = -EPROTO;
> +                     }
> +                     goto out;
Do you mean to move that goto within the

if (sc->sc_handshake_ok == 0) {

block? I _think_ it's ok for us to continue otherwise...


> @@ -1178,8 +1227,7 @@ static int o2net_advance_rx(struct o2net
>                                   O2NET_MAX_PAYLOAD_BYTES)
>                                       ret = -EOVERFLOW;
>                       }
> -             }
> -             if (ret <= 0)
> +             } else
>                       goto out;
>       }
Why are you doing that? We'll continue now if we want to return -EOVERFLOW
where we would error out before.

Thanks,
        --Mark

--
Mark Fasheh
Senior Software Developer, Oracle
[EMAIL PROTECTED]

_______________________________________________
Ocfs2-devel mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] [patch 1/1] ocfs2-timeout-protocol.patch

Reply via email to