At Mon, 04 Jun 2012 16:15:48 +0800,
Liu Yuan wrote:
> 
> On 06/04/2012 04:07 PM, MORITA Kazutaka wrote:
> 
> >> I am not 100% about this issue. It is from the experience from
> >> development of sheepfs, when I use a single FD to read/write. Since FUSE
> >> will issue highly concurrent requests, I noticed the same error as above
> >> example: the error code is quite random (see above is '54014b01').
> >>
> >> After a long time debugging, I came to a conclusion that the problem
> >> *might* be:
> >>
> >> The subsequent read/write requests interleaves with the previous one,
> >> and wrongly read the response.
> > I think we should reveal how they interleave before working out how to
> > fix.
> > 
> > The current fd cache seems to allow multiple accesses to the same node
> > because cached_fds is a thread-local variable and there is no fd which
> > is used by multiple threads at the same time.
> 
> 
> Ah, yes, it is thread local. Then I have no idea how the ret value could
> be random, I don't find a reliable way to reproduce this problem.

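One possibility is that if forward_write_obj_req() fails before
receiving the responses, the fds it has already sent requests on stay
in the cache with replies still pending, so the next
forward_(read|write)_obj_req() could be interleaved with them.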
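
To make the suspected scenario concrete, here is a minimal standalone
sketch. It is not sheepdog code: struct demo_rsp, demo_read_rsp() and
demo_stale_reply() are hypothetical names, and a socketpair() stands in
for a cached connection to another node. It only shows that a reply
left unread on a reused fd is picked up by the next request.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size reply; NOT sheepdog's real response header. */
struct demo_rsp {
	uint32_t id;     /* request this reply belongs to */
	uint32_t result; /* result code carried by the reply */
};

/* Read exactly one reply from fd. Returns 0 on success, -1 on failure. */
static int demo_read_rsp(int fd, struct demo_rsp *rsp)
{
	size_t done = 0;

	while (done < sizeof(*rsp)) {
		ssize_t n = read(fd, (char *)rsp + done, sizeof(*rsp) - done);

		if (n <= 0)
			return -1;
		done += (size_t)n;
	}
	return 0;
}

/*
 * Two requests share one cached connection (sv[0]). The peer replies to
 * both, but the first caller bails out before reading its reply.
 * Returns the id of the reply that the second request reads, or 0 if
 * the demo itself fails.
 */
static uint32_t demo_stale_reply(void)
{
	struct demo_rsp r1 = { 1, 7 }; /* reply to request 1, arbitrary code */
	struct demo_rsp r2 = { 2, 0 }; /* reply to request 2 */
	struct demo_rsp got = { 0, 0 };
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 0;

	/* The peer answers both requests in order. */
	if (write(sv[1], &r1, sizeof(r1)) != (ssize_t)sizeof(r1) ||
	    write(sv[1], &r2, sizeof(r2)) != (ssize_t)sizeof(r2))
		goto out;

	/*
	 * Request 1 failed elsewhere and returned without reading r1 or
	 * closing sv[0]. Request 2 now reuses sv[0] and reads a reply.
	 */
	if (demo_read_rsp(sv[0], &got) < 0)
		got.id = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return got.id;
}
```

Here demo_stale_reply() returns 1, i.e. the second request is handed
the reply that was meant for the first one. Dropping the connection on
the error path avoids that, at the cost of a reconnect.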

The untested patch below may fix the problem, though it is a crude
approach.

diff --git a/sheep/gateway.c b/sheep/gateway.c
index d287d0c..a8e090e 100644
--- a/sheep/gateway.c
+++ b/sheep/gateway.c
@@ -124,7 +124,7 @@ int forward_write_obj_req(struct request *req)
                if (fd < 0) {
                        eprintf("failed to connect to %s:%"PRIu32"\n", name, v->port);
                        ret = SD_RES_NETWORK_ERROR;
-                       goto out;
+                       goto err;
                }
 
                ret = send_req(fd, &fwd_hdr, req->data, &wlen);
@@ -132,7 +132,7 @@ int forward_write_obj_req(struct request *req)
                        del_sheep_fd(fd);
                        ret = SD_RES_NETWORK_ERROR;
                        dprintf("fail %"PRIu32"\n", ret);
-                       goto out;
+                       goto err;
                }
 
                pfds[nr_fds].fd = fd;
@@ -151,7 +151,8 @@ int forward_write_obj_req(struct request *req)
 
                if (rsp->result != SD_RES_SUCCESS) {
                        eprintf("fail %"PRIu32"\n", ret);
-                       goto out;
+                       ret = rsp->result;
+                       goto err;
                }
        }
 
@@ -212,6 +213,10 @@ again:
        }
 out:
        return ret;
+err:
+       for (i = 0; i < nr_fds; i++)
+               del_sheep_fd(pfds[i].fd);
+       return ret;
 }
 
 static int fix_object_consistency(struct request *req)
-- 
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog
