On 06/21/2012 04:57 PM, Liu Yuan wrote:
> On 06/21/2012 04:12 PM, levin li wrote:
>> From: levin li <[email protected]>
>>
>> read_object() calls forward_read_obj_req() to get the object,
>> but may fail with result SD_RES_OLD_NODE_VER, read_object() do
>> nothing to handle this error, this patch fixed this problem
>> by sending a gateway request to local node, making gateway to
>> retry when error occur.
>>
>> Signed-off-by: levin li <[email protected]>
>> ---
>>  sheep/store.c |   47 +++++++++++++++++++++++++++++++----------------
>>  1 file changed, 31 insertions(+), 16 deletions(-)
>>
>> diff --git a/sheep/store.c b/sheep/store.c
>> index 52c4716..aae322b 100644
>> --- a/sheep/store.c
>> +++ b/sheep/store.c
>> @@ -558,9 +558,11 @@ int read_object(struct vnode_info *vnodes, uint32_t 
>> epoch,
>>              uint64_t oid, char *data, unsigned int datalen,
>>              uint64_t offset, int nr_copies)
>>  {
>> -    struct request read_req;
>> -    struct sd_req *hdr = &read_req.rq;
>> -    int ret;
>> +    struct sd_req hdr;
>> +    struct sd_rsp *rsp = (struct sd_rsp *)&hdr;
>> +    char host[128];
>> +    unsigned int wlen = 0, rlen = datalen;
>> +    int fd, ret;
>>  
>>      if (sys->enable_write_cache && object_is_cached(oid)) {
>>              ret = object_cache_read(oid, data, datalen, offset,
>> @@ -572,23 +574,36 @@ int read_object(struct vnode_info *vnodes, uint32_t 
>> epoch,
>>              }
>>              return ret;
>>      }
>> -    memset(&read_req, 0, sizeof(read_req));
>> +
>>  forward_read:
>> -    hdr->opcode = SD_OP_READ_OBJ;
>> -    hdr->data_length = datalen;
>> -    hdr->epoch = epoch;
>> +    addr_to_str(host, sizeof(host), sys->this_node.addr, 0);
>> +    fd = connect_to(host, sys->this_node.port);
>> +    if (fd < 0) {
>> +            dprintf("Failed to connect to local node\n");
>> +            return SD_RES_EIO;
> 
> Be careful to return SD_RES_EIO, return network err instead for this case.
> 
>> +    }
>>  
>> -    hdr->obj.oid = oid;
>> -    hdr->obj.offset = offset;
>> -    hdr->obj.copies = nr_copies;
>> +    hdr.opcode = SD_OP_READ_OBJ;
>> +    hdr.data_length = datalen;
>> +    hdr.epoch = epoch;
>>  
>> -    read_req.data = data;
>> -    read_req.op = get_sd_op(hdr->opcode);
>> -    read_req.vnodes = vnodes;
>> +    hdr.obj.oid = oid;
>> +    hdr.obj.offset = offset;
>> +    hdr.obj.copies = nr_copies;
>>  
>> -    ret = forward_read_obj_req(&read_req);
>> -    if (ret != SD_RES_SUCCESS)
>> -            eprintf("failed to forward read object %x\n", ret);
>> +    ret = exec_req(fd, &hdr, data, &wlen, &rlen);
>> +    close(fd);
>> +
>> +    if (ret) {
>> +            dprintf("Failed to read object %" PRIx64 "\n", oid);
>> +            return SD_RES_EIO;
> 
> return SD_NETWORK_ERR
> 
>> +    }
>> +
>> +    if (rsp->result != SD_RES_SUCCESS) {
>> +            dprintf("Failed to read object %" PRIx64 " %s\n", oid,
>> +                    sd_strerror(rsp->result));
>> +            return rsp->result;
>> +    }
>>  
>>      return ret;
>>  }
>>
> 
> Seems that write/remove_object() and object cache push (which is more
> fatal) could fail without any retry handling. Those places also need
> this retry mechanism.
> 

Yes, I'll fix them later.

> By looking more at the code, I think we also need a
> forward_remove_obj_req() function too. Then
> forward_write/read/remove_obj_req() can be used as gateway requests
> which handles retry internally.
> 

I don't think we need a forward_remove_obj_req(), in gateway request,
every request without SD_FLAG_CMD_WRITE would be forwarded by 
forward_read_obj_req(), others would be processed by forward_write_obj_req(),
we can send a remove request with SD_FLAG_CMD_WRITE to make the request be
forwarded to 3 nodes.

> Also we'd better off cache a unix domain fd(s) to local node in 'sys'(
> in order to accelerate the connection of this kind.
> 
> Thanks,
> Yuan
> 

thanks,

levin
-- 
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog

Reply via email to