On 06/21/2012 04:12 PM, levin li wrote:
> From: levin li <[email protected]>
> 
> read_object() calls forward_read_obj_req() to get the object,
> but may fail with result SD_RES_OLD_NODE_VER, read_object() do
> nothing to handle this error, this patch fixed this problem
> by sending a gateway request to local node, making gateway to
> retry when error occur.
> 
> Signed-off-by: levin li <[email protected]>
> ---
>  sheep/store.c |   47 +++++++++++++++++++++++++++++++----------------
>  1 file changed, 31 insertions(+), 16 deletions(-)
> 
> diff --git a/sheep/store.c b/sheep/store.c
> index 52c4716..aae322b 100644
> --- a/sheep/store.c
> +++ b/sheep/store.c
> @@ -558,9 +558,11 @@ int read_object(struct vnode_info *vnodes, uint32_t 
> epoch,
>               uint64_t oid, char *data, unsigned int datalen,
>               uint64_t offset, int nr_copies)
>  {
> -     struct request read_req;
> -     struct sd_req *hdr = &read_req.rq;
> -     int ret;
> +     struct sd_req hdr;
> +     struct sd_rsp *rsp = (struct sd_rsp *)&hdr;
> +     char host[128];
> +     unsigned int wlen = 0, rlen = datalen;
> +     int fd, ret;
>  
>       if (sys->enable_write_cache && object_is_cached(oid)) {
>               ret = object_cache_read(oid, data, datalen, offset,
> @@ -572,23 +574,36 @@ int read_object(struct vnode_info *vnodes, uint32_t 
> epoch,
>               }
>               return ret;
>       }
> -     memset(&read_req, 0, sizeof(read_req));
> +
>  forward_read:
> -     hdr->opcode = SD_OP_READ_OBJ;
> -     hdr->data_length = datalen;
> -     hdr->epoch = epoch;
> +     addr_to_str(host, sizeof(host), sys->this_node.addr, 0);
> +     fd = connect_to(host, sys->this_node.port);
> +     if (fd < 0) {
> +             dprintf("Failed to connect to local node\n");
> +             return SD_RES_EIO;

Be careful to return SD_RES_EIO, return network err instead for this case.

> +     }
>  
> -     hdr->obj.oid = oid;
> -     hdr->obj.offset = offset;
> -     hdr->obj.copies = nr_copies;
> +     hdr.opcode = SD_OP_READ_OBJ;
> +     hdr.data_length = datalen;
> +     hdr.epoch = epoch;
>  
> -     read_req.data = data;
> -     read_req.op = get_sd_op(hdr->opcode);
> -     read_req.vnodes = vnodes;
> +     hdr.obj.oid = oid;
> +     hdr.obj.offset = offset;
> +     hdr.obj.copies = nr_copies;
>  
> -     ret = forward_read_obj_req(&read_req);
> -     if (ret != SD_RES_SUCCESS)
> -             eprintf("failed to forward read object %x\n", ret);
> +     ret = exec_req(fd, &hdr, data, &wlen, &rlen);
> +     close(fd);
> +
> +     if (ret) {
> +             dprintf("Failed to read object %" PRIx64 "\n", oid);
> +             return SD_RES_EIO;

return SD_NETWORK_ERR

> +     }
> +
> +     if (rsp->result != SD_RES_SUCCESS) {
> +             dprintf("Failed to read object %" PRIx64 " %s\n", oid,
> +                     sd_strerror(rsp->result));
> +             return rsp->result;
> +     }
>  
>       return ret;
>  }
> 

Seems that write/remove_object() and object cache push (which is more
fatal) could fail without any retry handling. Those places also need
this retry mechanism.

By looking more at the code, I think we also need a
forward_remove_obj_req() function too. Then
forward_write/read/remove_obj_req() can be used as gateway requests
which handles retry internally.

Also we'd better off cache a unix domain fd(s) to local node in 'sys'(
in order to accelerate the connection of this kind.

Thanks,
Yuan

-- 
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog

Reply via email to