Re: [Cluster-devel] About locking granularity of gfs2
On 04/24/2018 08:54 PM, Steven Whitehouse wrote:
> Hi,
>
> On 24/04/18 04:54, Guoqing Jiang wrote:
>> Hi Steve,
>>
>> Thanks for your reply.
>>
>> On 04/24/2018 11:03 AM, Steven Whitehouse wrote:
>>> Hi,
>>>
>>> On 24/04/18 03:52, Guoqing Jiang wrote:
>>>> Hi,
>>>>
>>>> Since gfs2 can "allow parallel allocation from different nodes
>>>> simultaneously as the locking granularity is one lock per resource
>>>> group" per section 3.2 of [1], would it be possible to apply the same
>>>> locking granularity to R/W I/O as well? Then, with the help of "sunit"
>>>> and "swidth", we could essentially lock a stripe, so all nodes could
>>>> write to different stripes in parallel; the basic I/O unit would be
>>>> one stripe.
>>>>
>>>> Since I don't know gfs2 well, I am wondering whether this is possible,
>>>> or whether the idea doesn't make sense for some reason. Any thoughts
>>>> would be appreciated, thanks.
>>>>
>>>> I am asking because if people want to add cluster support for
>>>> md/raid5, it would be better to get help from the filesystem level to
>>>> ensure that only one node accesses a stripe at a time; otherwise we
>>>> would have to lock each stripe in the md layer, which could cause
>>>> performance issues.
>>>>
>>>> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-253-260.pdf
>>>>
>>>> Regards,
>>>> Guoqing
>>>
>>> It is not just performance, it would be a correctness issue too, since
>>> there is no guarantee that two nodes are not writing to the same stripe
>>> at the same time.
>>
>> Yes, no fs can guarantee that. I am wondering: if GFS2 is used as a
>> local filesystem running on top of raid5, is it possible for gfs2 to
>> write to two places simultaneously while both places belong to one
>> stripe?
>
> Yes

Based on that possibility, I guess it is not recommended to run gfs2 on
raid5.

>>> The locking granularity is per-inode generally, but also per-rgrp in
>>> the case of rgrps; that refers only to the header/bitmap, since the
>>> allocated blocks are subject to the per-inode glocks in general.
>>
>> Please correct me: does this mean there are two types of locking
>> granularity, per-rgrp for rgrp allocation and per-inode for R/W I/O?
>> Thanks.
>
> It depends what operation is being undertaken. The per-inode glock
> covers all the blocks related to the inode, but during allocation and
> deallocation, responsibility for the allocated and deallocated blocks
> passes between the rgrp and the inode to which they relate. So the
> situation is more complicated than when no allocation/deallocation is
> involved,

Thanks a lot for your explanation.

Regards,
Guoqing
Re: [Cluster-devel] About locking granularity of gfs2
Hi,

On 24/04/18 04:54, Guoqing Jiang wrote:
> Hi Steve,
>
> Thanks for your reply.
>
> On 04/24/2018 11:03 AM, Steven Whitehouse wrote:
>> Hi,
>>
>> On 24/04/18 03:52, Guoqing Jiang wrote:
>>> Hi,
>>>
>>> Since gfs2 can "allow parallel allocation from different nodes
>>> simultaneously as the locking granularity is one lock per resource
>>> group" per section 3.2 of [1], would it be possible to apply the same
>>> locking granularity to R/W I/O as well? Then, with the help of "sunit"
>>> and "swidth", we could essentially lock a stripe, so all nodes could
>>> write to different stripes in parallel; the basic I/O unit would be
>>> one stripe.
>>>
>>> Since I don't know gfs2 well, I am wondering whether this is possible,
>>> or whether the idea doesn't make sense for some reason. Any thoughts
>>> would be appreciated, thanks.
>>>
>>> I am asking because if people want to add cluster support for
>>> md/raid5, it would be better to get help from the filesystem level to
>>> ensure that only one node accesses a stripe at a time; otherwise we
>>> would have to lock each stripe in the md layer, which could cause
>>> performance issues.
>>>
>>> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-253-260.pdf
>>>
>>> Regards,
>>> Guoqing
>>
>> It is not just performance, it would be a correctness issue too, since
>> there is no guarantee that two nodes are not writing to the same stripe
>> at the same time.
>
> Yes, no fs can guarantee that. I am wondering: if GFS2 is used as a
> local filesystem running on top of raid5, is it possible for gfs2 to
> write to two places simultaneously while both places belong to one
> stripe?

Yes

>> The locking granularity is per-inode generally, but also per-rgrp in
>> the case of rgrps; that refers only to the header/bitmap, since the
>> allocated blocks are subject to the per-inode glocks in general.
>
> Please correct me: does this mean there are two types of locking
> granularity, per-rgrp for rgrp allocation and per-inode for R/W I/O?
> Thanks.

It depends what operation is being undertaken. The per-inode glock covers
all the blocks related to the inode, but during allocation and
deallocation, responsibility for the allocated and deallocated blocks
passes between the rgrp and the inode to which they relate. So the
situation is more complicated than when no allocation/deallocation is
involved,

Steve.
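[Editorial note: the two-level granularity described above can be illustrated with a small sketch. This is conceptual Python, not GFS2 code; the class and function names are invented for illustration. It models how a per-rgrp lock protects the resource group's bitmap while the per-inode lock covers an inode's blocks, and how responsibility for blocks passes from rgrp to inode at allocation time.]

```python
# Conceptual sketch (not GFS2 code): two levels of lock granularity.
# A per-rgrp lock would protect the resource group's header/bitmap, while
# a per-inode lock covers the blocks currently owned by that inode.

class ResourceGroup:
    def __init__(self, blocks):
        self.free = set(blocks)      # "bitmap": blocks not yet allocated

class Inode:
    def __init__(self):
        self.blocks = set()          # blocks covered by this inode's lock

def allocate(rgrp, inode, n):
    """Move n blocks from the rgrp bitmap to the inode.

    In GFS2 terms: updating the bitmap needs the rgrp glock; once
    allocated, the blocks fall under the inode's glock instead -- this is
    the "responsibility passes between rgrp and inode" step.
    """
    picked = {rgrp.free.pop() for _ in range(n)}
    inode.blocks |= picked
    return picked

rg = ResourceGroup(range(8))
ino = Inode()
got = allocate(rg, ino, 3)
print(len(rg.free), len(ino.blocks))   # -> 5 3
```

Two nodes allocating from *different* rgrps never contend on the same rgrp lock, which is what makes parallel allocation possible; the data writes afterwards are serialized per inode, not per rgrp.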
Re: [Cluster-devel] About locking granularity of gfs2
On 04/24/2018 01:13 PM, Gang He wrote:
> Stripe unit is a logical volume concept; a filesystem should not know
> about it. For a filesystem, the access unit is the block (a power-of-two
> multiple of the disk sector size).

IMHO, that is true for a typical fs, but I think zfs and btrfs can be
aware of it, though no cluster fs supports this today.

Thanks,
Guoqing
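[Editorial note: for readers unfamiliar with the geometry terms, here is the arithmetic relating filesystem blocks to raid stripes via "sunit" and "swidth". The function name and the choice of units (filesystem blocks) are illustrative assumptions, not taken from any filesystem's code.]

```python
# Hedged sketch: mapping a filesystem block number to the raid5 stripe
# (row) it lives in.  sunit = blocks per disk per stripe; swidth =
# sunit * number_of_data_disks = total data blocks per stripe row.

def stripe_of(block, swidth):
    """Blocks 0..swidth-1 are stripe 0, the next swidth are stripe 1, ..."""
    return block // swidth

sunit, data_disks = 16, 4            # illustrative geometry
swidth = sunit * data_disks          # 64 blocks per stripe row
print(stripe_of(63, swidth), stripe_of(64, swidth))   # -> 0 1
```

This is exactly why the thread turns on filesystem cooperation: the fs sees only blocks, and nothing in the block interface stops two nodes from writing blocks 63 and 0, which share a stripe and hence a parity update.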
Re: [Cluster-devel] About locking granularity of gfs2
Hi Steve,

Thanks for your reply.

On 04/24/2018 11:03 AM, Steven Whitehouse wrote:
> Hi,
>
> On 24/04/18 03:52, Guoqing Jiang wrote:
>> Hi,
>>
>> Since gfs2 can "allow parallel allocation from different nodes
>> simultaneously as the locking granularity is one lock per resource
>> group" per section 3.2 of [1], would it be possible to apply the same
>> locking granularity to R/W I/O as well? Then, with the help of "sunit"
>> and "swidth", we could essentially lock a stripe, so all nodes could
>> write to different stripes in parallel; the basic I/O unit would be
>> one stripe.
>>
>> Since I don't know gfs2 well, I am wondering whether this is possible,
>> or whether the idea doesn't make sense for some reason. Any thoughts
>> would be appreciated, thanks.
>>
>> I am asking because if people want to add cluster support for md/raid5,
>> it would be better to get help from the filesystem level to ensure that
>> only one node accesses a stripe at a time; otherwise we would have to
>> lock each stripe in the md layer, which could cause performance issues.
>>
>> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-253-260.pdf
>>
>> Regards,
>> Guoqing
>
> It is not just performance, it would be a correctness issue too, since
> there is no guarantee that two nodes are not writing to the same stripe
> at the same time.

Yes, no fs can guarantee that. I am wondering: if GFS2 is used as a local
filesystem running on top of raid5, is it possible for gfs2 to write to
two places simultaneously while both places belong to one stripe?

> The locking granularity is per-inode generally, but also per-rgrp in the
> case of rgrps; that refers only to the header/bitmap, since the
> allocated blocks are subject to the per-inode glocks in general.

Please correct me: does this mean there are two types of locking
granularity, per-rgrp for rgrp allocation and per-inode for R/W I/O?
Thanks.

> I don't think it would be easy to make them correspond with raid
> stripes, and getting gfs2 to work with md would be non-trivial,

Totally agree :-).

Regards,
Guoqing
Re: [Cluster-devel] About locking granularity of gfs2
Hi,

On 24/04/18 03:52, Guoqing Jiang wrote:
> Hi,
>
> Since gfs2 can "allow parallel allocation from different nodes
> simultaneously as the locking granularity is one lock per resource
> group" per section 3.2 of [1], would it be possible to apply the same
> locking granularity to R/W I/O as well? Then, with the help of "sunit"
> and "swidth", we could essentially lock a stripe, so all nodes could
> write to different stripes in parallel; the basic I/O unit would be one
> stripe.
>
> Since I don't know gfs2 well, I am wondering whether this is possible,
> or whether the idea doesn't make sense for some reason. Any thoughts
> would be appreciated, thanks.
>
> I am asking because if people want to add cluster support for md/raid5,
> it would be better to get help from the filesystem level to ensure that
> only one node accesses a stripe at a time; otherwise we would have to
> lock each stripe in the md layer, which could cause performance issues.
>
> [1] https://www.kernel.org/doc/ols/2007/ols2007v2-pages-253-260.pdf
>
> Regards,
> Guoqing

It is not just performance, it would be a correctness issue too, since
there is no guarantee that two nodes are not writing to the same stripe at
the same time. The locking granularity is per-inode generally, but also
per-rgrp in the case of rgrps; that refers only to the header/bitmap,
since the allocated blocks are subject to the per-inode glocks in general.
I don't think it would be easy to make them correspond with raid stripes,
and getting gfs2 to work with md would be non-trivial,

Steve.
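[Editorial note: a minimal sketch of the per-stripe locking idea proposed above, using local threading locks to stand in for a hypothetical cluster lock manager. The acquire/release API, constant names, and geometry are all invented for illustration; this is not GFS2, md, or DLM code.]

```python
# Sketch of "lock a stripe" from the thread: every write takes a lock for
# each stripe its byte range touches, so nodes writing different stripes
# proceed in parallel but serialize on a shared stripe.

import threading
from collections import defaultdict

STRIPE_BYTES = 64 * 4096             # swidth in bytes (illustrative)
stripe_locks = defaultdict(threading.Lock)   # stands in for a DLM

def stripes_for(offset, length):
    """Stripe indices covered by the byte range [offset, offset+length)."""
    first = offset // STRIPE_BYTES
    last = (offset + length - 1) // STRIPE_BYTES
    return range(first, last + 1)

def write(offset, length):
    """Take the per-stripe locks covering the range, do I/O, release."""
    needed = list(stripes_for(offset, length))
    for s in needed:                 # fixed ascending order avoids deadlock
        stripe_locks[s].acquire()
    try:
        pass                         # the actual I/O would happen here
    finally:
        for s in reversed(needed):
            stripe_locks[s].release()
    return needed

print(write(0, 4096))                # -> [0]   one stripe
print(write(STRIPE_BYTES - 1, 2))    # -> [0, 1] straddles two stripes
```

The second call shows the cost Steve alludes to: an unaligned write must take multiple stripe locks, and every cross-stripe write serializes against all of them, which is where the performance concern in the md layer comes from.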