On 06/05/2012 07:53 PM, Christoph Hellwig wrote:

> Make sure we exit if any operations fails, operations that can fail are:
> 
>  - get_latest_epoch - because local_stat_cluster is a a force operation
>    we might not have any epoch log yet.  In that case there is no point
>    to continue.
>  - epoch_log_read - can fail in any number of ways, although once we
>    have a valid epoch it generally shouldn't.
>  - if we didn't manage to read the epoch log from any node we should
>    return -1 as there is no valid data returned.  If we have a valid
>    node list of size 0 one of the nodes will return that to us.
> 
> Also adjust get_vnodes_from_epoch to deal with the negative return value,
> local_stat_cluster currently doesn't have any error handling and will
> be fixed in the next patch.
> 
> Signed-off-by: Christoph Hellwig <[email protected]>
> 
> Index: sheepdog/sheep/store.c
> ===================================================================
> --- sheepdog.orig/sheep/store.c       2012-06-05 07:40:55.336111143 -0400
> +++ sheepdog/sheep/store.c    2012-06-05 07:41:00.272111270 -0400
> @@ -102,10 +102,17 @@ void do_io_request(struct work *work)
>  int epoch_log_read_remote(uint32_t epoch, char *buf, int len)
>  {
>       int i, ret;
> -     unsigned int nr, le = get_latest_epoch();
> +     unsigned int nr, le;
>       struct sd_node nodes[SD_MAX_NODES];
>  
> +     le = get_latest_epoch();
> +     if (!le)
> +             return 0;
> +
>       nr = epoch_log_read(le, (char *)nodes, sizeof(nodes));
> +     if (nr < 0)
> +             return -1;
> +
>       nr /= sizeof(nodes[0]);
>  
>       for (i = 0; i < nr; i++) {
> @@ -135,16 +142,11 @@ int epoch_log_read_remote(uint32_t epoch
>               ret = exec_req(fd, &hdr, buf, &wlen, &rlen);
>               close(fd);
>  
> -             if (ret)
> -                     continue;
> -             if (rsp->result == SD_RES_SUCCESS) {
> -                     ret = rsp->data_length;
> -                     goto out;
> -             }
> +             if (!ret && rsp->result == SD_RES_SUCCESS)
> +                     return rsp->data_length;
>       }
> -     ret = 0; /* If no one has targeted epoch file, we can safely return 0 */
> -out:
> -     return ret;
> +
> +     return -1;
>  }
>  
>  int epoch_log_read_nr(uint32_t epoch, char *buf, int len)
> Index: sheepdog/sheep/recovery.c
> ===================================================================
> --- sheepdog.orig/sheep/recovery.c    2012-06-05 07:32:25.608098136 -0400
> +++ sheepdog/sheep/recovery.c 2012-06-05 07:41:15.808111665 -0400
> @@ -63,7 +63,7 @@ static struct vnode_info *get_vnodes_fro
>       if (nr_nodes < 0) {
>               nr_nodes = epoch_log_read_remote(epoch, (void *)nodes,
>                                                sizeof(nodes));
> -             if (nr_nodes == 0)
> +             if (nr_nodes <= 0)
>                       return NULL;
>               nr_nodes /= sizeof(nodes[0]);
>       }


We really should return 0 if we don't get any epoch file. Without your
patch, if we lost one of epochs file in all nodes, say bellow case, I
deliberately remove epoch 3, we can till get other informatoin:

--------
tailai.ly@taobao:~/sheepdog$ collie cluster info
Cluster status: running

Cluster created at Tue Jun  5 21:44:21 2012

Epoch Time           Version
2012-06-05 21:44:24      4 [127.0.0.1:7000, 127.0.0.1:7001,
127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004, 127.0.0.1:7005]
1970-01-01 07:00:00      3 []
2012-06-05 21:44:22      2 [127.0.0.1:7000, 127.0.0.1:7001,
127.0.0.1:7002, 127.0.0.1:7003]
2012-06-05 21:44:21      1 [127.0.0.1:7000, 127.0.0.1:7001, 127.0.0.1:7002]

but with your patch, we can only get:
tailai.ly@taobao:~/sheepdog$ collie cluster info
Cluster status: running

Cluster created at Tue Jun  5 21:43:41 2012

Epoch Time           Version
2012-06-05 21:43:44      4 [127.0.0.1:7000, 127.0.0.1:7001,
127.0.0.1:7002, 127.0.0.1:7003, 127.0.0.1:7004, 127.0.0.1:7005]

------

Thanks,
Yuan
-- 
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog

Reply via email to