Re: [Ocfs2-devel] [patch 10/11] ocfs2: add ocfs2_overwrite_io()
Hi Gang,

On 2017/12/1 6:24, a...@linux-foundation.org wrote:
> From: Gang He <g...@suse.com>
> Subject: ocfs2: add ocfs2_overwrite_io()
>
> Add ocfs2_overwrite_io(), which is used to judge whether a write only
> overwrites already-allocated blocks; otherwise, the write will bring
> extra block allocation overhead.
>
> [g...@suse.com: v2]
> Link: http://lkml.kernel.org/r/1511944612-9629-3-git-send-email-ghe@suse.com
> Link: http://lkml.kernel.org/r/1511775987-841-3-git-send-email-ghe@suse.com
> Signed-off-by: Gang He
> Cc: Mark Fasheh
> Cc: Joel Becker
> Cc: Junxiao Bi
> Cc: Joseph Qi
> Cc: Changwei Ge
> Signed-off-by: Andrew Morton
> ---
>
>  fs/ocfs2/extent_map.c |   41 +++++++++++++++++++++++++++++++++++++++++
>  fs/ocfs2/extent_map.h |    3 +++
>  2 files changed, 44 insertions(+)
>
> diff -puN fs/ocfs2/extent_map.c~ocfs2-add-ocfs2_overwrite_io-function fs/ocfs2/extent_map.c
> --- a/fs/ocfs2/extent_map.c~ocfs2-add-ocfs2_overwrite_io-function
> +++ a/fs/ocfs2/extent_map.c
> @@ -832,6 +832,47 @@ out:
>  	return ret;
>  }
>
> +/* Is IO overwriting allocated blocks? */
> +int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh,
> +		       u64 map_start, u64 map_len)
> +{
> +	int ret = 0, is_last;
> +	u32 mapping_end, cpos;
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +	struct ocfs2_extent_rec rec;
> +
> +	if ((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
> +	    ((map_start + map_len) <= i_size_read(inode)))
> +		goto out;
> +

Should we return -EAGAIN directly when the condition
((OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
 ((map_start + map_len) > i_size_read(inode)))
is true?

Thanks,
Alex

> +	cpos = map_start >> osb->s_clustersize_bits;
> +	mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
> +					       map_start + map_len);
> +	is_last = 0;
> +	while (cpos < mapping_end && !is_last) {
> +		ret = ocfs2_get_clusters_nocache(inode, di_bh, cpos,
> +						 NULL, &rec, &is_last);
> +		if (ret) {
> +			mlog_errno(ret);
> +			goto out;
> +		}
> +
> +		if (rec.e_blkno == 0ULL)
> +			break;
> +
> +		if (rec.e_flags & OCFS2_EXT_REFCOUNTED)
> +			break;
> +
> +		cpos = le32_to_cpu(rec.e_cpos) +
> +			le16_to_cpu(rec.e_leaf_clusters);
> +	}
> +
> +	if (cpos < mapping_end)
> +		ret = -EAGAIN;
> +out:
> +	return ret;
> +}
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int whence)
>  {
>  	struct inode *inode = file->f_mapping->host;
>
> diff -puN fs/ocfs2/extent_map.h~ocfs2-add-ocfs2_overwrite_io-function fs/ocfs2/extent_map.h
> --- a/fs/ocfs2/extent_map.h~ocfs2-add-ocfs2_overwrite_io-function
> +++ a/fs/ocfs2/extent_map.h
> @@ -53,6 +53,9 @@ int ocfs2_extent_map_get_blocks(struct i
>  int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  		 u64 map_start, u64 map_len);
>
> +int ocfs2_overwrite_io(struct inode *inode, struct buffer_head *di_bh,
> +		       u64 map_start, u64 map_len);
> +
>  int ocfs2_seek_data_hole_offset(struct file *file, loff_t *offset, int origin);
>
>  int ocfs2_xattr_get_clusters(struct inode *inode, u32 v_cluster,
> _
>
> ___
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch 09/11] ocfs2: add ocfs2_try_rw_lock() and ocfs2_try_inode_lock()
Looks good to me.

On 2017/12/1 6:24, a...@linux-foundation.org wrote:
> From: Gang He <g...@suse.com>
> Subject: ocfs2: add ocfs2_try_rw_lock() and ocfs2_try_inode_lock()
>
> Patch series "ocfs2: add nowait aio support".
>
> The VFS layer has introduced the non-blocking aio flag IOCB_NOWAIT, which
> tells the kernel to bail out if an AIO request would block for reasons
> such as file allocations, writeback triggering, or request allocation
> while performing direct I/O.
>
> Subsequently, pwritev2/preadv2 can also leverage this part of kernel code.
> So far, ext4/xfs/btrfs have supported this feature. Add the related code
> for the ocfs2 file system.
>
> This patch (of 3):
>
> Add the ocfs2_try_rw_lock and ocfs2_try_inode_lock functions, which will
> be used in non-blocking IO scenarios.
>
> [g...@suse.com: v2]
> Link: http://lkml.kernel.org/r/1511944612-9629-2-git-send-email-ghe@suse.com
> Link: http://lkml.kernel.org/r/1511775987-841-2-git-send-email-ghe@suse.com
> Signed-off-by: Gang He

Reviewed-by: Alex Chen

> Cc: Mark Fasheh
> Cc: Joel Becker
> Cc: Junxiao Bi
> Cc: Joseph Qi
> Cc: Changwei Ge
> Signed-off-by: Andrew Morton
> ---
>
>  fs/ocfs2/dlmglue.c |   21 +++++++++++++++++++++
>  fs/ocfs2/dlmglue.h |    4 ++++
>  2 files changed, 25 insertions(+)
>
> diff -puN fs/ocfs2/dlmglue.c~ocfs2-add-ocfs2_try_rw_lock-and-ocfs2_try_inode_lock fs/ocfs2/dlmglue.c
> --- a/fs/ocfs2/dlmglue.c~ocfs2-add-ocfs2_try_rw_lock-and-ocfs2_try_inode_lock
> +++ a/fs/ocfs2/dlmglue.c
> @@ -1742,6 +1742,27 @@ int ocfs2_rw_lock(struct inode *inode, i
>  	return status;
>  }
>
> +int ocfs2_try_rw_lock(struct inode *inode, int write)
> +{
> +	int status, level;
> +	struct ocfs2_lock_res *lockres;
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +
> +	mlog(0, "inode %llu try to take %s RW lock\n",
> +	     (unsigned long long)OCFS2_I(inode)->ip_blkno,
> +	     write ? "EXMODE" : "PRMODE");
> +
> +	if (ocfs2_mount_local(osb))
> +		return 0;
> +
> +	lockres = &OCFS2_I(inode)->ip_rw_lockres;
> +
> +	level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
> +
> +	status = ocfs2_cluster_lock(osb, lockres, level, DLM_LKF_NOQUEUE, 0);
> +	return status;
> +}
> +
>  void ocfs2_rw_unlock(struct inode *inode, int write)
>  {
>  	int level = write ? DLM_LOCK_EX : DLM_LOCK_PR;
>
> diff -puN fs/ocfs2/dlmglue.h~ocfs2-add-ocfs2_try_rw_lock-and-ocfs2_try_inode_lock fs/ocfs2/dlmglue.h
> --- a/fs/ocfs2/dlmglue.h~ocfs2-add-ocfs2_try_rw_lock-and-ocfs2_try_inode_lock
> +++ a/fs/ocfs2/dlmglue.h
> @@ -116,6 +116,7 @@ void ocfs2_lock_res_free(struct ocfs2_lo
>  int ocfs2_create_new_inode_locks(struct inode *inode);
>  int ocfs2_drop_inode_locks(struct inode *inode);
>  int ocfs2_rw_lock(struct inode *inode, int write);
> +int ocfs2_try_rw_lock(struct inode *inode, int write);
>  void ocfs2_rw_unlock(struct inode *inode, int write);
>  int ocfs2_open_lock(struct inode *inode);
>  int ocfs2_try_open_lock(struct inode *inode, int write);
> @@ -140,6 +141,9 @@ int ocfs2_inode_lock_with_page(struct in
>  /* 99% of the time we don't want to supply any additional flags --
>   * those are for very specific cases only. */
>  #define ocfs2_inode_lock(i, b, e) ocfs2_inode_lock_full_nested(i, b, e, 0, OI_LS_NORMAL)
> +#define ocfs2_try_inode_lock(i, b, e)\
> +	ocfs2_inode_lock_full_nested(i, b, e, OCFS2_META_LOCK_NOQUEUE,\
> +	OI_LS_NORMAL)
>  void ocfs2_inode_unlock(struct inode *inode,
>  			int ex);
>  int ocfs2_super_lock(struct ocfs2_super *osb,
> _
Re: [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
Hi Changwei, On 2017/12/19 17:44, Changwei Ge wrote: > On 2017/12/19 17:30, Junxiao Bi wrote: >> On 12/19/2017 05:11 PM, Changwei Ge wrote: >>> Hi Junxiao, >>> >>> On 2017/12/19 16:15, Junxiao Bi wrote: Hi Changwei, On 12/19/2017 02:02 PM, Changwei Ge wrote: > On 2017/12/19 11:41, piaojun wrote: >> Hi Changwei, >> >> On 2017/12/19 11:05, Changwei Ge wrote: >>> Hi Jun, >>> >>> On 2017/12/19 9:48, piaojun wrote: Hi Changwei, On 2017/12/18 20:06, Changwei Ge wrote: > Before ocfs2 supporting allocating clusters while doing append-dio, > all append > dio will fall back to buffer io to allocate clusters firstly. Also, > when it > steps on a file hole, it will fall back to buffer io, too. But for > current > code, writing to file hole will leverage dio to allocate clusters. > This is not > right, since whether append-io is enabled tells the capability > whether ocfs2 can > allocate space while doing dio. > So introduce file hole check function back into ocfs2. > Once ocfs2 is doing dio upon a file hole with append-dio disabled, it > will fall > back to buffer IO to allocate clusters. > 1. Do you mean that filling hole can't go with dio when append-dio is disabled? >>> >>> Yes, direct IO will fall back to buffer IO with _append-dio_ disabled. >> >> Why does dio need fall back to buffer-io when append-dio disabled? >> Could 'common-dio' on file hole go through direct io process? If not, >> could you please point out the necessity. > Hi Jun, > > The intention to make dio fall back to buffer io is to provide *an > option* to users, which > is more stable and even fast. More memory will be consumed for some important user cases. Like if ocfs2 is using to store vm system image file, the file is highly sparse but never extended. If fall back buffer io, more memory will be consumed by page cache in dom0, that will cause less VM running on dom0 and performance issue. >>> >>> I admit your point above makes scene. 
>>> But, AFAIK, major ocfs2 crash issues comes from direct IO part, especially >>> when file is extremely >>> sparse. So for the worst case, virtual machine even can't run due to crash >>> again and again. >> Can you please point out where those crash were? I would like take a >> look at them. I run ocfs2-test for mainline sometimes, and never find a >> dio crash. As i know, Eric also run the test regularly, he also didn't >> have a dio crash report. > > Actually, we are running ocfs2-test, too. > I think why it can't detect some issue is that the target file is not sparse > enough. > > Weeks ago, John report a dio crash issue and provide a reproducer. > > I manage to reproduce it easily. Although, Alex has provided a way to fix it, > but I think it has something smells. Um, I think it doesn't matter to split _mark_extent_written_ and _set_inode_size_ into different transactions in my patch(ocfs2: check the metadate alloc before marking extent written). I think it is just a point that can be optimized. > Also, his patch didn't handle journal related operation well. Moreover, it > will bring extra several times of read > for a single time of write. It is necessary to bring extra reads for solving the crash problem. > I was planning to improve his patch, but it can't be very elegant. So If I > have to,otherwise I want to figure out > the root cause for the dio crash issue. > I hope you can send the patch which might be a draft and we can discuss it together. > So we are still investigating the root cause why metadata needed is not > estimated accurately. > The root cause is that the extent tree may be merged during marking extents written, which is caused by normal user write operation. IMO, we should fix the crash problem directly instead of falling back to buffer IO. Thanks, Alex > It will be better if you can also take a glance at the dio crash issue John > reported. 
> > For now we already have some clues that the extents estimated are enough
> > actually, but merged and reclaimed during marking extents written. We
> > still need more tests on it.
> >
> > Thanks,
> > Changwei
> >
>>>
>>> I think the most benefit DIO brings to VM is that data can be
>>> transferred to LUN as soon as possible, thus no data could be missed.
>> Right, another benefit.
>>
>> Thanks,
>> Junxiao.
>>>
> From my perspective, the current ocfs2 dio implementation, especially
> around space allocation during dio, still needs more tests and improvements.
>
> Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through
> toggling the append-dio feature. It rather makes ocfs2 configurable and
> flexible.
>
> Besides, do you still remember John's report about dio crash weeks ago?
Re: [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
On 2017/12/19 17:30, Junxiao Bi wrote: > On 12/19/2017 05:11 PM, Changwei Ge wrote: >> Hi Junxiao, >> >> On 2017/12/19 16:15, Junxiao Bi wrote: >>> Hi Changwei, >>> >>> On 12/19/2017 02:02 PM, Changwei Ge wrote: On 2017/12/19 11:41, piaojun wrote: > Hi Changwei, > > On 2017/12/19 11:05, Changwei Ge wrote: >> Hi Jun, >> >> On 2017/12/19 9:48, piaojun wrote: >>> Hi Changwei, >>> >>> On 2017/12/18 20:06, Changwei Ge wrote: Before ocfs2 supporting allocating clusters while doing append-dio, all append dio will fall back to buffer io to allocate clusters firstly. Also, when it steps on a file hole, it will fall back to buffer io, too. But for current code, writing to file hole will leverage dio to allocate clusters. This is not right, since whether append-io is enabled tells the capability whether ocfs2 can allocate space while doing dio. So introduce file hole check function back into ocfs2. Once ocfs2 is doing dio upon a file hole with append-dio disabled, it will fall back to buffer IO to allocate clusters. >>> 1. Do you mean that filling hole can't go with dio when append-dio is >>> disabled? >> >> Yes, direct IO will fall back to buffer IO with _append-dio_ disabled. > > Why does dio need fall back to buffer-io when append-dio disabled? > Could 'common-dio' on file hole go through direct io process? If not, > could you please point out the necessity. Hi Jun, The intention to make dio fall back to buffer io is to provide *an option* to users, which is more stable and even fast. >>> More memory will be consumed for some important user cases. Like if >>> ocfs2 is using to store vm system image file, the file is highly sparse >>> but never extended. If fall back buffer io, more memory will be consumed >>> by page cache in dom0, that will cause less VM running on dom0 and >>> performance issue. >> >> I admit your point above makes scene. >> But, AFAIK, major ocfs2 crash issues comes from direct IO part, especially >> when file is extremely >> sparse. 
So for the worst case, virtual machine even can't run due to crash >> again and again. > Can you please point out where those crash were? I would like take a > look at them. I run ocfs2-test for mainline sometimes, and never find a > dio crash. As i know, Eric also run the test regularly, he also didn't > have a dio crash report. Actually, we are running ocfs2-test, too. I think why it can't detect some issue is that the target file is not sparse enough. Weeks ago, John report a dio crash issue and provide a reproducer. I manage to reproduce it easily. Although, Alex has provided a way to fix it, but I think it has something smells. Also, his patch didn't handle journal related operation well. Moreover, it will bring extra several times of read for a single time of write. I was planning to improve his patch, but it can't be very elegant. So If I have to,otherwise I want to figure out the root cause for the dio crash issue. So we are still investigating the root cause why metadata needed is not estimated accurately. It will be better if you can also take a glance at the dio crash issue John reported. For now we already have some clues that extents estimated are enough actually but merged and declaimed during marking extents written. We still need more test on it. Thanks, Changwei > >> >> I think the most benefit DIO brings to VM is that data can be transferred to >> LUN as soon as possible, >> thus no data could be missed. > Right, another benefit. > > Thanks, > Junxiao. >> >>> From my perspective, current ocfs2 dio implementation especially around space allocation during doing dio still needs more test and improvements. Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through toggling append-dio feature. It rather makes ocfs2 configurable and flexible. Besides, do you still remember John's report about dio crash weeks ago? >>> Looks like this is a workaround, why not fix the bug directly? 
If with >>> this, people may disable append-dio by default to avoid dio issues. That >>> will make it never stable. But it is a useful feature. >> >> Arguably, this patch just provides an extra option to users. It's up to >> ocfs2 users how to use ocfs2 for >> their business. I think we should not limit ocfs2 users. >> >> Moreover, I agree that direct IO is a useful feature, but it is not mature >> enough yet. >> We have to improve it, however, it needs time. I suppose we still have a >> short or long journey until that. >> So before that we could provide a backup way. >> This may look like kind of workaround, but I prefer to call it an extra >> option. >> >> Thanks, >> Changwei >> >>> >>> Thanks, >>> Junxiao. >>> I managed to
Re: [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
On 12/19/2017 05:11 PM, Changwei Ge wrote: > Hi Junxiao, > > On 2017/12/19 16:15, Junxiao Bi wrote: >> Hi Changwei, >> >> On 12/19/2017 02:02 PM, Changwei Ge wrote: >>> On 2017/12/19 11:41, piaojun wrote: Hi Changwei, On 2017/12/19 11:05, Changwei Ge wrote: > Hi Jun, > > On 2017/12/19 9:48, piaojun wrote: >> Hi Changwei, >> >> On 2017/12/18 20:06, Changwei Ge wrote: >>> Before ocfs2 supporting allocating clusters while doing append-dio, all >>> append >>> dio will fall back to buffer io to allocate clusters firstly. Also, >>> when it >>> steps on a file hole, it will fall back to buffer io, too. But for >>> current >>> code, writing to file hole will leverage dio to allocate clusters. This >>> is not >>> right, since whether append-io is enabled tells the capability whether >>> ocfs2 can >>> allocate space while doing dio. >>> So introduce file hole check function back into ocfs2. >>> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it >>> will fall >>> back to buffer IO to allocate clusters. >>> >> 1. Do you mean that filling hole can't go with dio when append-dio is >> disabled? > > Yes, direct IO will fall back to buffer IO with _append-dio_ disabled. Why does dio need fall back to buffer-io when append-dio disabled? Could 'common-dio' on file hole go through direct io process? If not, could you please point out the necessity. >>> Hi Jun, >>> >>> The intention to make dio fall back to buffer io is to provide *an option* >>> to users, which >>> is more stable and even fast. >> More memory will be consumed for some important user cases. Like if >> ocfs2 is using to store vm system image file, the file is highly sparse >> but never extended. If fall back buffer io, more memory will be consumed >> by page cache in dom0, that will cause less VM running on dom0 and >> performance issue. > > I admit your point above makes scene. > But, AFAIK, major ocfs2 crash issues comes from direct IO part, especially > when file is extremely > sparse. 
So for the worst case, virtual machine even can't run due to crash > again and again. Can you please point out where those crash were? I would like take a look at them. I run ocfs2-test for mainline sometimes, and never find a dio crash. As i know, Eric also run the test regularly, he also didn't have a dio crash report. > > I think the most benefit DIO brings to VM is that data can be transferred to > LUN as soon as possible, > thus no data could be missed. Right, another benefit. Thanks, Junxiao. > >> >>> From my perspective, current ocfs2 dio implementation especially around >>> space allocation during >>> doing dio still needs more test and improvements. >>> >>> Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through >>> toggling append-dio feature. >>> It rather makes ocfs2 configurable and flexible. >>> >>> Besides, do you still remember John's report about dio crash weeks ago? >> Looks like this is a workaround, why not fix the bug directly? If with >> this, people may disable append-dio by default to avoid dio issues. That >> will make it never stable. But it is a useful feature. > > Arguably, this patch just provides an extra option to users. It's up to ocfs2 > users how to use ocfs2 for > their business. I think we should not limit ocfs2 users. > > Moreover, I agree that direct IO is a useful feature, but it is not mature > enough yet. > We have to improve it, however, it needs time. I suppose we still have a > short or long journey until that. > So before that we could provide a backup way. > This may look like kind of workaround, but I prefer to call it an extra > option. > > Thanks, > Changwei > >> >> Thanks, >> Junxiao. >> >>> >>> I managed to reproduce this issue, so for now, I don't trust dio related >>> code one hundred percents. >>> So if something bad happens to dio writing with space allocation, we can >>> still make ocfs2 fall back to buffer io >>> It's an option not a mandatory action. 
>>> >>> Besides, append-dio feature is key to whether to allocate space with dio >>> writing. >>> So writing to file hole and enlarging file(appending file) should have the >>> same reflection to append-dio feature. >>> :) >>> >>> Thanks, >>> Changwei >>> > >> 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'? > > Just for append-dio > If your patch is just for 'append-dio', I wonder that will have impact on 'common-dio'. thanks, Jun >>> Signed-off-by: Changwei Ge>>> --- >>> fs/ocfs2/aops.c | 44 ++-- >>> 1 file changed, 42 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c >>> index d151632..a982cf6 100644 >>> --- a/fs/ocfs2/aops.c >>> +++
Re: [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
Hi Junxiao, On 2017/12/19 16:15, Junxiao Bi wrote: > Hi Changwei, > > On 12/19/2017 02:02 PM, Changwei Ge wrote: >> On 2017/12/19 11:41, piaojun wrote: >>> Hi Changwei, >>> >>> On 2017/12/19 11:05, Changwei Ge wrote: Hi Jun, On 2017/12/19 9:48, piaojun wrote: > Hi Changwei, > > On 2017/12/18 20:06, Changwei Ge wrote: >> Before ocfs2 supporting allocating clusters while doing append-dio, all >> append >> dio will fall back to buffer io to allocate clusters firstly. Also, when >> it >> steps on a file hole, it will fall back to buffer io, too. But for >> current >> code, writing to file hole will leverage dio to allocate clusters. This >> is not >> right, since whether append-io is enabled tells the capability whether >> ocfs2 can >> allocate space while doing dio. >> So introduce file hole check function back into ocfs2. >> Once ocfs2 is doing dio upon a file hole with append-dio disabled, it >> will fall >> back to buffer IO to allocate clusters. >> > 1. Do you mean that filling hole can't go with dio when append-dio is > disabled? Yes, direct IO will fall back to buffer IO with _append-dio_ disabled. >>> >>> Why does dio need fall back to buffer-io when append-dio disabled? >>> Could 'common-dio' on file hole go through direct io process? If not, >>> could you please point out the necessity. >> Hi Jun, >> >> The intention to make dio fall back to buffer io is to provide *an option* >> to users, which >> is more stable and even fast. > More memory will be consumed for some important user cases. Like if > ocfs2 is using to store vm system image file, the file is highly sparse > but never extended. If fall back buffer io, more memory will be consumed > by page cache in dom0, that will cause less VM running on dom0 and > performance issue. I admit your point above makes scene. But, AFAIK, major ocfs2 crash issues comes from direct IO part, especially when file is extremely sparse. So for the worst case, virtual machine even can't run due to crash again and again. 
I think the most benefit DIO brings to VM is that data can be transferred to LUN as soon as possible, thus no data could be missed. > >> From my perspective, current ocfs2 dio implementation especially around >> space allocation during >> doing dio still needs more test and improvements. >> >> Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through >> toggling append-dio feature. >> It rather makes ocfs2 configurable and flexible. >> >> Besides, do you still remember John's report about dio crash weeks ago? > Looks like this is a workaround, why not fix the bug directly? If with > this, people may disable append-dio by default to avoid dio issues. That > will make it never stable. But it is a useful feature. Arguably, this patch just provides an extra option to users. It's up to ocfs2 users how to use ocfs2 for their business. I think we should not limit ocfs2 users. Moreover, I agree that direct IO is a useful feature, but it is not mature enough yet. We have to improve it, however, it needs time. I suppose we still have a short or long journey until that. So before that we could provide a backup way. This may look like kind of workaround, but I prefer to call it an extra option. Thanks, Changwei > > Thanks, > Junxiao. > >> >> I managed to reproduce this issue, so for now, I don't trust dio related >> code one hundred percents. >> So if something bad happens to dio writing with space allocation, we can >> still make ocfs2 fall back to buffer io >> It's an option not a mandatory action. >> >> Besides, append-dio feature is key to whether to allocate space with dio >> writing. >> So writing to file hole and enlarging file(appending file) should have the >> same reflection to append-dio feature. >> :) >> >> Thanks, >> Changwei >> >>> > 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'? Just for append-dio >>> >>> If your patch is just for 'append-dio', I wonder that will have impact >>> on 'common-dio'. 
>>> >>> thanks, >>> Jun >>> >> Signed-off-by: Changwei Ge>> --- >> fs/ocfs2/aops.c | 44 ++-- >> 1 file changed, 42 insertions(+), 2 deletions(-) >> >> diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c >> index d151632..a982cf6 100644 >> --- a/fs/ocfs2/aops.c >> +++ b/fs/ocfs2/aops.c >> @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb, >> return ret; >> } >> >> +/* >> + * Will look for holes and unwritten extents in the range starting at >> + * pos for count bytes (inclusive). >> + */ >> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos, >> + size_t count) >> +{ >> +int ret = 0; >> +
Re: [Ocfs2-devel] [PATCH] ocfs2: fall back to buffer IO when append dio is disabled with file hole existing
Hi Changwei, On 12/19/2017 02:02 PM, Changwei Ge wrote: > On 2017/12/19 11:41, piaojun wrote: >> Hi Changwei, >> >> On 2017/12/19 11:05, Changwei Ge wrote: >>> Hi Jun, >>> >>> On 2017/12/19 9:48, piaojun wrote: Hi Changwei, On 2017/12/18 20:06, Changwei Ge wrote: > Before ocfs2 supporting allocating clusters while doing append-dio, all > append > dio will fall back to buffer io to allocate clusters firstly. Also, when > it > steps on a file hole, it will fall back to buffer io, too. But for current > code, writing to file hole will leverage dio to allocate clusters. This > is not > right, since whether append-io is enabled tells the capability whether > ocfs2 can > allocate space while doing dio. > So introduce file hole check function back into ocfs2. > Once ocfs2 is doing dio upon a file hole with append-dio disabled, it > will fall > back to buffer IO to allocate clusters. > 1. Do you mean that filling hole can't go with dio when append-dio is disabled? >>> >>> Yes, direct IO will fall back to buffer IO with _append-dio_ disabled. >> >> Why does dio need fall back to buffer-io when append-dio disabled? >> Could 'common-dio' on file hole go through direct io process? If not, >> could you please point out the necessity. > Hi Jun, > > The intention to make dio fall back to buffer io is to provide *an option* to > users, which > is more stable and even fast. More memory will be consumed for some important user cases. Like if ocfs2 is using to store vm system image file, the file is highly sparse but never extended. If fall back buffer io, more memory will be consumed by page cache in dom0, that will cause less VM running on dom0 and performance issue. > From my perspective, current ocfs2 dio implementation especially around > space allocation during > doing dio still needs more test and improvements. > > Whether to make ocfs2 fall back to buffer io is up to ocfs2 users through > toggling append-dio feature. > It rather makes ocfs2 configurable and flexible. 
> > Besides, do you still remember John's report about dio crash weeks ago? Looks like this is a workaround, why not fix the bug directly? If with this, people may disable append-dio by default to avoid dio issues. That will make it never stable. But it is a useful feature. Thanks, Junxiao. > > I managed to reproduce this issue, so for now, I don't trust dio related code > one hundred percents. > So if something bad happens to dio writing with space allocation, we can > still make ocfs2 fall back to buffer io > It's an option not a mandatory action. > > Besides, append-dio feature is key to whether to allocate space with dio > writing. > So writing to file hole and enlarging file(appending file) should have the > same reflection to append-dio feature. > :) > > Thanks, > Changwei > >> >>> 2. Is your checking-hole just for 'append-dio' or for 'all-common-dio'? >>> >>> Just for append-dio >>> >> >> If your patch is just for 'append-dio', I wonder that will have impact >> on 'common-dio'. >> >> thanks, >> Jun >> > Signed-off-by: Changwei Ge> --- > fs/ocfs2/aops.c | 44 ++-- > 1 file changed, 42 insertions(+), 2 deletions(-) > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > index d151632..a982cf6 100644 > --- a/fs/ocfs2/aops.c > +++ b/fs/ocfs2/aops.c > @@ -2414,6 +2414,44 @@ static int ocfs2_dio_end_io(struct kiocb *iocb, > return ret; > } > > +/* > + * Will look for holes and unwritten extents in the range starting at > + * pos for count bytes (inclusive). 
> + */
> +static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
> +				       size_t count)
> +{
> +	int ret = 0;
> +	unsigned int extent_flags;
> +	u32 cpos, clusters, extent_len, phys_cpos;
> +	struct super_block *sb = inode->i_sb;
> +
> +	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
> +	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
> +
> +	while (clusters) {
> +		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
> +					 &extent_flags);
> +		if (ret < 0) {
> +			mlog_errno(ret);
> +			goto out;
> +		}
> +
> +		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
> +			ret = 1;
> +			break;
> +		}
> +
> +		if (extent_len > clusters)
> +			extent_len = clusters;
> +
> +		clusters -= extent_len;
> +		cpos += extent_len;
> +	}
> +out:
> +	return ret;
> +}
> +
>  static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  {
>  	struct file *file