Hi Guanghui, Please sort out your mail more orderly. It looks really messy! So, rework your mail by asking question yourself like:
- What is the problem you are facing? Looks like a BUG_ON() is triggered. but which BUG_ON()? the backtrace? How this can be reroduced? - What help or answer do you hope for? You didn't ask any question below! On 05/26/2017 11:46 AM, Zhangguanghui wrote: > This patch replace that function ocfs2_direct_IO_get_blocks with Which patch? Don't analyze code before telling the problem. > > this function ocfs2_get_blocks in ocfs2_direct_IO, and remove the > ip_alloc_sem. > > but i think ip_alloc_sem is still needed because protect allocation changes > is very correct. "still needed" - so, which commit dropped it? > > Now, BUG_ON have been tiggered in the process of testing direct-io. > > Comments and questions are, as always, welcome. Thanks Comments on what? > > > As wangww631 described A mail thread link is useful for people to know the discussion and background. > > In ocfs2, ip_alloc_sem is used to protect allocation changes on the node. > In direct IO, we add ip_alloc_sem to protect date consistent between > direct-io and ocfs2_truncate_file race (buffer io use ip_alloc_sem > already). Although inode->i_mutex lock is used to avoid concurrency of > above situation, i think ip_alloc_sem is still needed because protect > allocation changes is significant. > > Other filesystem like ext4 also uses rw_semaphore to protect data > consistent between get_block-vs-truncate race by other means, So > ip_alloc_sem in ocfs2 direct io is needed. > > > Date: Fri, 11 Sep 2015 16:19:18 +0800 > From: Ryan Ding <ryan.d...@oracle.com> > Subject: [Ocfs2-devel] [PATCH 7/8] ocfs2: fix sparse file & data > ordering issue in direct io. You email subject is almost the same as this patch, which brings confusion... > To: ocfs2-devel@oss.oracle.com > Cc: mfas...@suse.de > Message-ID: <1441959559-29947-8-git-send-email-ryan.d...@oracle.com> Don't copy & paste patch content, only making you mail too long to scare reader away. Eric > > There are mainly 3 issue in the direct io code path after commit 24c40b329e03 > ("ocfs2: implement ocfs2_direct_IO_write"): > * Do not support sparse file. > * Do not support data ordering. eg: when write to a file hole, it will > alloc > extent first. If system crashed before io finished, data will corrupt. > * Potential risk when doing aio+dio. The -EIOCBQUEUED return value is > likely > to be ignored by ocfs2_direct_IO_write(). > > To resolve above problems, re-design direct io code with following ideas: > * Use buffer io to fill in holes. And this will make better performance > also. > * Clear unwritten after direct write finished. So we can make sure meta > data > changes after data write to disk. (Unwritten extent is invisible to user, > from user's view, meta data is not changed when allocate an unwritten > extent.) > * Clear ocfs2_direct_IO_write(). Do all ending work in end_io. > > This patch has passed > fs,dio,ltp-aiodio.part1,ltp-aiodio.part2,ltp-aiodio.part4 > test cases of ltp. > > Signed-off-by: Ryan Ding <ryan.d...@oracle.com> > Reviewed-by: Junxiao Bi <junxiao...@oracle.com> > cc: Joseph Qi <joseph...@huawei.com> > --- > fs/ocfs2/aops.c | 851 > ++++++++++++++++++++++--------------------------------- > 1 files changed, 342 insertions(+), 509 deletions(-) > > diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c > index b4ec600..4bb9921 100644 > --- a/fs/ocfs2/aops.c > +++ b/fs/ocfs2/aops.c > @@ -499,152 +499,6 @@ bail: > return status; > } > > -/* > - * TODO: Make this into a generic get_blocks function. > - * > - * From do_direct_io in direct-io.c: > - * "So what we do is to permit the ->get_blocks function to populate > - * bh.b_size with the size of IO which is permitted at this offset and > - * this i_blkbits." > - * > - * This function is called directly from get_more_blocks in direct-io.c. > - * > - * called like this: dio->get_blocks(dio->inode, fs_startblk, > - * fs_count, map_bh, dio->rw == WRITE); > - */ > -static int ocfs2_direct_IO_get_blocks(struct inode *inode, sector_t iblock, > - struct buffer_head *bh_result, int > create) > -{ > - int ret; > - u32 cpos = 0; > - int alloc_locked = 0; > - u64 p_blkno, inode_blocks, contig_blocks; > - unsigned int ext_flags; > - unsigned char blocksize_bits = inode->i_sb->s_blocksize_bits; > - unsigned long max_blocks = bh_result->b_size >> inode->i_blkbits; > - unsigned long len = bh_result->b_size; > - unsigned int clusters_to_alloc = 0, contig_clusters = 0; > - > - cpos = ocfs2_blocks_to_clusters(inode->i_sb, iblock); > - > - /* This function won't even be called if the request isn't all > - * nicely aligned and of the right size, so there's no need > - * for us to check any of that. */ > - > - inode_blocks = ocfs2_blocks_for_bytes(inode->i_sb, > i_size_read(inode)); > - > - down_read(&OCFS2_I(inode)->ip_alloc_sem); > - > - /* This figures out the size of the next contiguous block, and > - * our logical offset */ > - ret = ocfs2_extent_map_get_blocks(inode, iblock, &p_blkno, > - &contig_blocks, &ext_flags); > - up_read(&OCFS2_I(inode)->ip_alloc_sem); > - > - if (ret) { > - mlog(ML_ERROR, "get_blocks() failed iblock=%llu\n", > - (unsigned long long)iblock); > - ret = -EIO; > - goto bail; > - } > - > - /* We should already CoW the refcounted extent in case of create. */ > - BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED)); > - > - /* allocate blocks if no p_blkno is found, and create == 1 */ > - if (!p_blkno && create) { > - ret = ocfs2_inode_lock(inode, NULL, 1); > - if (ret < 0) { > - mlog_errno(ret); > - goto bail; > - } > - > - alloc_locked = 1; > - > - down_write(&OCFS2_I(inode)->ip_alloc_sem); > - > - /* fill hole, allocate blocks can't be larger than the size > - * of the hole */ > - clusters_to_alloc = ocfs2_clusters_for_bytes(inode->i_sb, > len); > - contig_clusters = ocfs2_clusters_for_blocks(inode->i_sb, > - contig_blocks); > - if (clusters_to_alloc > contig_clusters) > - clusters_to_alloc = contig_clusters; > - > - /* allocate extent and insert them into the extent tree */ > - ret = ocfs2_extend_allocation(inode, cpos, > - clusters_to_alloc, 0); > - if (ret < 0) { > - up_write(&OCFS2_I(inode)->ip_alloc_sem); > - mlog_errno(ret); > - goto bail; > - } > - > - ret = ocfs2_extent_map_get_blocks(inode, iblock, &p_blkno, > - &contig_blocks, &ext_flags); > - if (ret < 0) { > - up_write(&OCFS2_I(inode)->ip_alloc_sem); > - mlog(ML_ERROR, "get_blocks() failed iblock=%llu\n", > - (unsigned long long)iblock); > - ret = -EIO; > - goto bail; > - } > - up_write(&OCFS2_I(inode)->ip_alloc_sem); > - } > - > - /* > - * get_more_blocks() expects us to describe a hole by clearing > - * the mapped bit on bh_result(). > - * > - * Consider an unwritten extent as a hole. > - */ > - if (p_blkno && !(ext_flags & OCFS2_EXT_UNWRITTEN)) > - map_bh(bh_result, inode->i_sb, p_blkno); > - else > - clear_buffer_mapped(bh_result); > - > - /* make sure we don't map more than max_blocks blocks here as > - that's all the kernel will handle at this point. */ > - if (max_blocks < contig_blocks) > - contig_blocks = max_blocks; > - bh_result->b_size = contig_blocks << blocksize_bits; > -bail: > - if (alloc_locked) > - ocfs2_inode_unlock(inode, 1); > - return ret; > -} > > ...... > > +static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter, > + loff_t offset) > +{ > + struct file *file = iocb->ki_filp; > + struct inode *inode = file_inode(file)->i_mapping->host; > + struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); > + loff_t end = offset + iter->count; > + get_block_t *get_block; > + > + /* > + * Fallback to buffered I/O if we see an inode without > + * extents. > + */ > + if (OCFS2_I(inode)->ip_dyn_features & OCFS2_INLINE_DATA_FL) > + return 0; > + > + /* Fallback to buffered I/O if we do not support append dio. */ > + if (end > i_size_read(inode) && !ocfs2_supports_append_dio(osb)) > + return 0; > + > + if (iov_iter_rw(iter) == READ) > + get_block = ocfs2_get_block; > + else > + get_block = ocfs2_dio_get_block; > + > + return __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, > + iter, offset, get_block, > + ocfs2_dio_end_io, NULL, 0); > +} > + > const struct address_space_operations ocfs2_aops = { > .readpage = ocfs2_readpage, > .readpages = ocfs2_readpages, > ________________________________ > All the best wishes for you. > zhangguanghui > ------------------------------------------------------------------------------------------------------------------------------------- > 本邮件及其附件含有新华三技术有限公司的保密信息,仅限于发送给上面地址中列出 > 的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、 > 或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本 > 邮件! > This e-mail and its attachments contain confidential information from New > H3C, which is > intended only for the person or entity whose address is listed above. Any use > of the > information contained herein in any way (including, but not limited to, total > or partial > disclosure, reproduction, or dissemination) by persons other than the intended > recipient(s) is prohibited. If you receive this e-mail in error, please > notify the sender > by phone or email immediately and delete it! > > > _______________________________________________ > Ocfs2-devel mailing list > Ocfs2-devel@oss.oracle.com > https://oss.oracle.com/mailman/listinfo/ocfs2-devel _______________________________________________ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel