Re: [Gluster-devel] [RFC] Consistency issues with DHT after snapshots are taken

2014-04-11 Thread Raghavendra G
On Thu, Apr 10, 2014 at 11:02 AM, Raghavendra Gowdappa
rgowd...@redhat.com wrote:

 Hi all,

 I was trying to come up with some consistency issues. I am not sure
 whether case 5 is a valid one, since lookup would succeed and mkdir would
 fail with EEXIST (scroll down to the case for more detailed explanation).


Case 5 is a valid one. This comment was based on an earlier test case which
seemed to be invalid. Sorry about the confusion.



 We are considering a distribute of 3 bricks - b1, b2, b3.

 Case 1:
 ===

 Operation: rename (src, dst) - dst does not exist

 T0: rename successful on Hashed subvol but not on other bricks
 T1: Snapshot on b1, b2, b3

 Result: After the snapshot is restored and healing completes on src and dst,
 we end up with two directories, src and dst, both having the gfid of src.

 Case 2:
 ===

 Operation: Two parallel rename (src, dst) and rename (dst, src). Both src
 and dst exist and hash to b1 and b2 respectively

 T0: rename (src, dst) successful on b1
 T1: rename (dst, src) successful on b2
 T2: Snapshot on b1, b2, b3

 Result:
 After restore, if lookup happens on src and is healed to b1 from b2, gfids
 of src on each brick will be,
 b1 - (src, dst-gfid)
 b2 - (src, dst-gfid)
 b3 - (src, src-gfid)

 Case 3:
 ===

 Operation: Parallel rename and two mkdirs. Only src exists. Both hash to
 same brick b1.

 T0: two lookups triggered as part of application mkdir1 and mkdir2
 complete with ENOENT.
 T1: mkdir2 goes ahead and creates directory with gfid, gfid1
 T2: rename (src, dst) on b1
 T3: mkdir1 (src) on b1
 T4: snapshot on b1, b2 and b3

 Result:
 After restore and healing of src and dst, we end up with,
 b1 - (src, gfid2) and (dst, gfid1)
 b2 - (src, gfid1) and (dst, gfid1)
 b3 - (src, gfid1) and (dst, gfid1)

 Another reason for this inconsistency is that DHT does not treat mkdir
 failures with EEXIST on subvols as failures. More details can be found in
 [2].

 Case 4:
 ===

 Operation: Parallel rename (src, dst) and rmdir (src). Both src and dst
 exist with gfids gfid1 and gfid2 respectively

 T0: rename (src, dst) on b1
 T1: rmdir (src) on b2 and b3
 T2: snapshot on b1, b2 and b3

 Result: After restore and healing,
 b1 - (dst, gfid1)
 b2 - (dst, gfid2)
 b3 - (dst, gfid2)

 Case 5:
 ===

 This bug has been hit, and a fix is being reviewed at [1].

 Operation: Parallel two rmdir and two mkdirs. Directory dir does not exist
 to start with.

 T0: two lookups triggered as part of application mkdir1 and mkdir2
 complete with ENOENT.
 T1: mkdir2 goes ahead and creates directory with gfid, gfid1
 T2: rmdir1 (dir) on b1
 T3: lookup (dir) triggered as part of rmdir2 (or any name-based
 operation), heals dir on b1 with gfid, gfid2
 T4: mkdir1 (dir, gfid2) on b2 and b3
 T5: snapshots on b1, b2 and b3

 Result:
 b1 - (dir, gfid1)
 b2 - (dir, gfid2)
 b3 - (dir, gfid2)

 Considering all these issues, the following set of fixes has been proposed:

 1. In posix, if we receive mkdir (dir1) on an existing gfid (with name
 dir2), posix will convert mkdir (dir1) into rename (dir1, dir2). This
 solves case 1.

 2. In case of rename (src, dst), if dst already exists, rmdir (dst) so
 that we don't introduce inconsistency into the dst gfid space. This solves
 all the cases of inconsistencies in dst gfid when rename fails.

 3. Hold entrylks in directory heal (part of lookup) and in rmdir. This solves
 consistency issues caused by races between mkdir and rmdir.

 [1] http://review.gluster.org/#/c/4846/
 [2] http://review.gluster.org/4459

 regards,
 Raghavendra.

 ___
 Gluster-devel mailing list
 Gluster-devel@nongnu.org
 https://lists.nongnu.org/mailman/listinfo/gluster-devel




-- 
Raghavendra G


[Gluster-devel] [RFC] Consistency issues with DHT after snapshots are taken

2014-04-09 Thread Raghavendra Gowdappa
Hi all,

I was trying to come up with some consistency issues. I am not sure whether 
case 5 is a valid one, since lookup would succeed and mkdir would fail with 
EEXIST (scroll down to the case for more detailed explanation).

We are considering a distribute of 3 bricks - b1, b2, b3.

Case 1:
===

Operation: rename (src, dst) - dst does not exist

T0: rename successful on Hashed subvol but not on other bricks
T1: Snapshot on b1, b2, b3

Result: After the snapshot is restored and healing completes on src and dst,
we end up with two directories, src and dst, both having the gfid of src.
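
To make the sequence concrete, here is a minimal sketch of case 1 as a toy
model. It is not GlusterFS code: each brick is just a dict mapping a directory
name to a gfid, b1 is assumed to be the hashed subvol, and heal() is a
hypothetical helper that recreates a missing name with the gfid found on
another brick.

```python
# Toy model only: a brick is a dict of directory name -> gfid.

def heal(bricks, name):
    """Naive directory heal: recreate `name` on every brick that lacks it,
    reusing the gfid from a brick that still has the entry."""
    gfid = next(b[name] for b in bricks if name in b)
    for b in bricks:
        b.setdefault(name, gfid)

bricks = [{"src": "gfid-src"} for _ in range(3)]  # b1, b2, b3
b1 = bricks[0]                                    # assumed hashed subvol

# T0: rename(src, dst) succeeds on the hashed subvol only.
b1["dst"] = b1.pop("src")

# T1: snapshot of all bricks, restored later.
restored = [dict(b) for b in bricks]

# Lookups after restore heal both names across the bricks.
heal(restored, "src")
heal(restored, "dst")

for i, b in enumerate(restored, start=1):
    print(f"b{i}: {sorted(b.items())}")
```

In this model every brick ends up holding both src and dst with the same gfid,
which is the result described above.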

Case 2:
===

Operation: Two parallel rename (src, dst) and rename (dst, src). Both src and 
dst exist and hash to b1 and b2 respectively

T0: rename (src, dst) successful on b1
T1: rename (dst, src) successful on b2
T2: Snapshot on b1, b2, b3

Result:
After restore, if lookup happens on src and is healed to b1 from b2, gfids of 
src on each brick will be,
b1 - (src, dst-gfid)
b2 - (src, dst-gfid)
b3 - (src, src-gfid)
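
The same outcome can be reproduced with a small toy model (a sketch under
assumptions, not GlusterFS code). Bricks are dicts of name -> gfid, rename()
is a hypothetical per-brick helper in which an existing destination is
replaced, and the heal is modelled as copying the src entry from b2 to b1.

```python
# Toy model only: a brick is a dict of directory name -> gfid.

def rename(brick, old, new):
    """Rename on a single brick; an existing `new` entry is replaced."""
    brick[new] = brick.pop(old)

initial = {"src": "src-gfid", "dst": "dst-gfid"}
b1, b2, b3 = dict(initial), dict(initial), dict(initial)

rename(b1, "src", "dst")   # rename(src, dst) lands on b1 only
rename(b2, "dst", "src")   # rename(dst, src) lands on b2 only

# Snapshot of all three bricks, restored later.
r1, r2, r3 = dict(b1), dict(b2), dict(b3)

# Lookup on src after restore: b1 lacks src, so it is healed from b2.
r1.setdefault("src", r2["src"])

src_gfids = {"b1": r1["src"], "b2": r2["src"], "b3": r3["src"]}
print(src_gfids)
```

b1 and b2 report dst-gfid for src while b3 still reports src-gfid.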

Case 3:
===

Operation: Parallel rename and two mkdirs. Only src exists. Both hash to same 
brick b1.

T0: two lookups triggered as part of application mkdir1 and mkdir2 complete 
with ENOENT.
T1: mkdir2 goes ahead and creates directory with gfid, gfid1
T2: rename (src, dst) on b1
T3: mkdir1 (src) on b1
T4: snapshot on b1, b2 and b3

Result:
After restore and healing of src and dst, we end up with,
b1 - (src, gfid2) and (dst, gfid1)
b2 - (src, gfid1) and (dst, gfid1)
b3 - (src, gfid1) and (dst, gfid1)

Another reason for this inconsistency is that DHT does not treat mkdir failures
with EEXIST on subvols as failures. More details can be found in [2].
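
One possible reading of case 3, written as a toy model (not GlusterFS code).
The assumptions here are mine: mkdir2 creates src with gfid1 on all three
bricks, the rename lands on b1 only, and mkdir1 is sent to every brick with
gfid2, succeeding on b1 and getting EEXIST on b2 and b3, which the caller
ignores. mkdir() and heal() are hypothetical helpers.

```python
# Toy model only: a brick is a dict of directory name -> gfid.

def mkdir(brick, name, gfid):
    """Create `name` on one brick; return False (EEXIST) if already present."""
    if name in brick:
        return False
    brick[name] = gfid
    return True

def heal(bricks, name):
    """Recreate `name` on bricks that lack it, reusing an existing gfid."""
    gfid = next(b[name] for b in bricks if name in b)
    for b in bricks:
        b.setdefault(name, gfid)

b1, b2, b3 = {}, {}, {}
bricks = [b1, b2, b3]

# T1: mkdir2 creates src with gfid1 on every brick (assumed).
for b in bricks:
    mkdir(b, "src", "gfid1")

# T2: rename(src, dst) lands on b1 only.
b1["dst"] = b1.pop("src")

# T3: mkdir1(src) with gfid2; EEXIST on b2 and b3 is not treated as failure.
mkdir_ok = [mkdir(b, "src", "gfid2") for b in bricks]

# T4: snapshot, then restore and heal both names.
restored = [dict(b) for b in bricks]
heal(restored, "src")
heal(restored, "dst")

for i, b in enumerate(restored, start=1):
    print(f"b{i}: {sorted(b.items())}")
```

Because the EEXIST results on b2 and b3 are ignored, nothing flags that src
now has gfid2 on b1 and gfid1 elsewhere.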

Case 4:
===

Operation: Parallel rename (src, dst) and rmdir (src). Both src and dst exist 
with gfids gfid1 and gfid2 respectively

T0: rename (src, dst) on b1
T1: rmdir (src) on b2 and b3
T2: snapshot on b1, b2 and b3

Result: After restore and healing,
b1 - (dst, gfid1)
b2 - (dst, gfid2)
b3 - (dst, gfid2)
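
A toy model of case 4 (a sketch, not GlusterFS code) shows why healing cannot
repair this state: dst exists on every brick, so there is nothing to recreate,
yet the gfids disagree. gfid_mismatches() is a hypothetical checker added for
illustration.

```python
# Toy model only: a brick is a dict of directory name -> gfid.

def gfid_mismatches(bricks):
    """Return {name: set of gfids} for names whose gfid differs across bricks."""
    seen = {}
    for b in bricks:
        for name, gfid in b.items():
            seen.setdefault(name, set()).add(gfid)
    return {name: gfids for name, gfids in seen.items() if len(gfids) > 1}

initial = {"src": "gfid1", "dst": "gfid2"}
b1, b2, b3 = dict(initial), dict(initial), dict(initial)

# T0: rename(src, dst) on b1 replaces dst there with the directory of gfid1.
b1["dst"] = b1.pop("src")

# T1: rmdir(src) on b2 and b3.
del b2["src"]
del b3["src"]

# T2: snapshot, restored later. dst is present everywhere, so a name-based
# heal finds nothing missing.
restored = [dict(b) for b in (b1, b2, b3)]

mismatches = gfid_mismatches(restored)
print(mismatches)
```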

Case 5:
===

This bug has been hit, and a fix is being reviewed at [1].

Operation: Parallel two rmdir and two mkdirs. Directory dir does not exist to 
start with.

T0: two lookups triggered as part of application mkdir1 and mkdir2 complete 
with ENOENT.
T1: mkdir2 goes ahead and creates directory with gfid, gfid1
T2: rmdir1 (dir) on b1
T3: lookup (dir) triggered as part of rmdir2 (or any name-based operation),
heals dir on b1 with gfid, gfid2
T4: mkdir1 (dir, gfid2) on b2 and b3
T5: snapshots on b1, b2 and b3

Result:
b1 - (dir, gfid1)
b2 - (dir, gfid2)
b3 - (dir, gfid2)

Considering all these issues, the following set of fixes has been proposed:

1. In posix, if we receive mkdir (dir1) on an existing gfid (with name dir2),
posix will convert mkdir (dir1) into rename (dir1, dir2). This solves case 1.

2. In case of rename (src, dst), if dst already exists, rmdir (dst) so that we
don't introduce inconsistency into the dst gfid space. This solves all the
cases of inconsistencies in dst gfid when rename fails.

3. Hold entrylks in directory heal (part of lookup) and in rmdir. This solves
consistency issues caused by races between mkdir and rmdir.
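
As a rough illustration of fix 1 applied to the restored state of case 1, here
is a toy model (not the posix translator). It assumes the conversion means
"move the entry that already holds this gfid to the requested name", and
posix_mkdir() and lookup_and_heal() are hypothetical helpers. b1 is assumed to
be the hashed subvol that saw the rename.

```python
# Toy model only: a brick is a dict of directory name -> gfid.

def posix_mkdir(brick, name, gfid):
    """mkdir on one brick. If `gfid` already exists under another name, move
    that entry to `name` instead of creating a second directory (assumed
    reading of fix 1)."""
    for existing, g in list(brick.items()):
        if g == gfid and existing != name:
            brick[name] = brick.pop(existing)
            return
    brick.setdefault(name, gfid)

def lookup_and_heal(bricks, name):
    """Lookup `name`; if some bricks have it, heal it onto the others."""
    holders = [b for b in bricks if name in b]
    if not holders:
        return  # ENOENT everywhere, nothing to heal
    gfid = holders[0][name]
    for b in bricks:
        if name not in b:
            posix_mkdir(b, name, gfid)

def case1_restored():
    """State after restoring the case 1 snapshot: only b1 saw the rename."""
    return [{"dst": "gfid-src"}, {"src": "gfid-src"}, {"src": "gfid-src"}]

outcomes = {}
for order in (("src", "dst"), ("dst", "src")):
    bricks = case1_restored()
    for name in order:
        lookup_and_heal(bricks, name)
    outcomes[order] = bricks
    print(order, bricks)
```

In this model, whichever name is looked up first wins on all bricks, and the
gfid is never present under two names at once.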

[1] http://review.gluster.org/#/c/4846/
[2] http://review.gluster.org/4459

regards,
Raghavendra.
