Re: [PATCH v4 7/7] fs/xfs: Add dedupe support for fsdax
On Thu 08 Apr 2021 at 20:04, Shiyang Ruan wrote: Add xfs_break_two_dax_layouts() to break layout for tow dax files. Then call compare range function only when files are both DAX or not. Signed-off-by: Shiyang Ruan Not family with xfs code but reading code make my sleep better :) See bellow. --- fs/xfs/xfs_file.c| 20 fs/xfs/xfs_inode.c | 8 +++- fs/xfs/xfs_inode.h | 1 + fs/xfs/xfs_reflink.c | 5 +++-- 4 files changed, 31 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 5795d5d6f869..1fd457167c12 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -842,6 +842,26 @@ xfs_break_dax_layouts( 0, 0, xfs_wait_dax_page(inode)); } +int +xfs_break_two_dax_layouts( + struct inode*src, + struct inode*dest) +{ + int error; + boolretry = false; + +retry: 'retry = false;' ? since xfs_break_dax_layouts() won't set retry to false if there is no busy page in inode->i_mapping. Dead loop will happen if retry is true once. + error = xfs_break_dax_layouts(src, &retry); + if (error || retry) + goto retry; + + error = xfs_break_dax_layouts(dest, &retry); + if (error || retry) + goto retry; + + return error; +} + int xfs_break_layouts( struct inode*inode, diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index f93370bd7b1e..c01786917eef 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3713,8 +3713,10 @@ xfs_ilock2_io_mmap( struct xfs_inode*ip2) { int ret; + struct inode*inode1 = VFS_I(ip1); + struct inode*inode2 = VFS_I(ip2); - ret = xfs_iolock_two_inodes_and_break_layout(VFS_I(ip1), VFS_I(ip2)); + ret = xfs_iolock_two_inodes_and_break_layout(inode1, inode2); if (ret) return ret; if (ip1 == ip2) @@ -3722,6 +3724,10 @@ xfs_ilock2_io_mmap( else xfs_lock_two_inodes(ip1, XFS_MMAPLOCK_EXCL, ip2, XFS_MMAPLOCK_EXCL); + + if (IS_DAX(inode1) && IS_DAX(inode2)) + ret = xfs_break_two_dax_layouts(inode1, inode2); + ret is ignored here. -- Su return 0; } diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 88ee4c3930ae..5ef21924dddc 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -435,6 +435,7 @@ enum xfs_prealloc_flags { intxfs_update_prealloc_flags(struct xfs_inode *ip, enum xfs_prealloc_flags flags); +int xfs_break_two_dax_layouts(struct inode *inode1, struct inode *inode2); intxfs_break_layouts(struct inode *inode, uint *iolock, enum layout_break_reason reason); diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index a4cd6e8a7aa0..4426bcc8a985 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -29,6 +29,7 @@ #include "xfs_iomap.h" #include "xfs_sb.h" #include "xfs_ag_resv.h" +#include /* * Copy on Write of Shared Blocks @@ -1324,8 +1325,8 @@ xfs_reflink_remap_prep( if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest)) goto out_unlock; - /* Don't share DAX file data for now. */ - if (IS_DAX(inode_in) || IS_DAX(inode_out)) + /* Don't share DAX file data with non-DAX file. */ + if (IS_DAX(inode_in) != IS_DAX(inode_out)) goto out_unlock; if (!IS_DAX(inode_in))
Re: [PATCH 2/2] blktrace: limit allowed total trace buffer size
On Tue 23 Mar 2021 at 16:14, Ming Lei wrote: On some ARCHs, such as aarch64, page size may be 64K, meantime there may be lots of CPU cores. relay_open() needs to allocate pages on each CPU blktrace, so easily too many pages are taken by blktrace. For example, on one ARM64 server: 224 CPU cores, 16G RAM, blktrace finally got allocated 7GB in case of 'blktrace -b 8192' which is used by device-mapper test suite[1]. This way could cause OOM easily. Fix the issue by limiting max allowed pages to be 1/8 of totalram_pages(). [1] https://github.com/jthornber/device-mapper-test-suite.git Signed-off-by: Ming Lei --- kernel/trace/blktrace.c | 32 1 file changed, 32 insertions(+) diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index c221e4c3f625..8403ff19d533 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -466,6 +466,35 @@ static void blk_trace_setup_lba(struct blk_trace *bt, } } +/* limit total allocated buffer size is <= 1/8 of total pages */ +static void validate_and_adjust_buf(struct blk_user_trace_setup *buts) +{ + unsigned buf_size = buts->buf_size; + unsigned buf_nr = buts->buf_nr; + unsigned long max_allowed_pages = totalram_pages() >> 3; + unsigned long req_pages = PAGE_ALIGN(buf_size * buf_nr) >> PAGE_SHIFT; + + if (req_pages * num_online_cpus() <= max_allowed_pages) + return; + + req_pages = DIV_ROUND_UP(max_allowed_pages, num_online_cpus()); + + if (req_pages == 0) { + buf_size = PAGE_SIZE; + buf_nr = 1; + } else { + buf_size = req_pages << PAGE_SHIFT / buf_nr; Should it be: buf_size = (req_pages << PAGE_SHIFT) / buf_nr; ? The priority of '<<' is lower than '/', right? :) -- Su + if (buf_size < PAGE_SIZE) + buf_size = PAGE_SIZE; + buf_nr = req_pages << PAGE_SHIFT / buf_size; + if (buf_nr == 0) + buf_nr = 1; + } + + buts->buf_size = min_t(unsigned, buf_size, buts->buf_size); + buts->buf_nr = min_t(unsigned, buf_nr, buts->buf_nr); +} + /* * Setup everything required to start tracing */ @@ -482,6 +511,9 @@ static int do_blk_trace_setup(struct request_queue *q, char *name, dev_t dev, if (!buts->buf_size || !buts->buf_nr) return -EINVAL; + /* make sure not allocate too much for userspace */ + validate_and_adjust_buf(buts); + strncpy(buts->name, name, BLKTRACE_BDEV_SIZE); buts->name[BLKTRACE_BDEV_SIZE - 1] = '\0';
Re: [RFC PATCH v5 0/4] add simple copy support
On Fri 19 Feb 2021 at 20:45, SelvaKumar S wrote: This patchset tries to add support for TP4065a ("Simple Copy Command"), v2020.05.04 ("Ratified") The Specification can be found in following link. https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs-1.zip 404 not found. Should it be https://nvmexpress.org/wp-content/uploads/NVM-Express-1.4-Ratified-TPs.zip ? Simple copy command is a copy offloading operation and is used to copy multiple contiguous ranges (source_ranges) of LBA's to a single destination LBA within the device reducing traffic between host and device. This implementation doesn't add native copy offload support for stacked devices rather copy offload is done through emulation. Possible use cases are F2FS gc and BTRFS relocation/balance. *blkdev_issue_copy* takes source bdev, no of sources, array of source ranges (in sectors), destination bdev and destination offset(in sectors). If both source and destination block devices are same and copy_offload = 1, then copy is done through native copy offloading. Copy emulation is used in other cases. As SCSI XCOPY can take two different block devices and no of source range is equal to 1, this interface can be extended in future to support SCSI XCOPY. For devices supporting native simple copy, attach the control information as payload to the bio and submit to the device. For devices without native copy support, copy emulation is done by reading each source range into memory and writing it to the destination. Caller can choose not to try emulation if copy offload is not supported by setting BLKDEV_COPY_NOEMULATION flag. Following limits are added to queue limits and are exposed in sysfs to userspace - *copy_offload* controls copy_offload. set 0 to disable copy offload, 1 to enable native copy offloading support. - *max_copy_sectors* limits the sum of all source_range length - *max_copy_nr_ranges* limits the number of source ranges - *max_copy_range_sectors* limit the maximum number of sectors that can constitute a single source range. max_copy_sectors = 0 indicates the device doesn't support copy offloading. *copy offload* sysfs entry is configurable and can be used toggle between emulation and native support depending upon the usecase. Changes from v4 1. Extend dm-kcopyd to leverage copy-offload, while copying within the same device. The other approach was to have copy-emulation by moving dm-kcopyd to block layer. But it also required moving core dm-io infra, causing a massive churn across multiple dm-targets. 2. Remove export in bio_map_kern() 3. Change copy_offload sysfs to accept 0 or else 4. Rename copy support flag to QUEUE_FLAG_SIMPLE_COPY 5. Rename payload entries, add source bdev field to be used while partition remapping, remove copy_size 6. Change the blkdev_issue_copy() interface to accept destination and source values in sector rather in bytes 7. Add payload to bio using bio_map_kern() for copy_offload case 8. Add check to return error if one of the source range length is 0 9. Add BLKDEV_COPY_NOEMULATION flag to allow user to not try copy emulation incase of copy offload is not supported. Caller can his use his existing copying logic to complete the io. 10. Bug fix copy checks and reduce size of rcu_lock() Planned for next: - adding blktests - handling larger (than device limits) copy - decide on ioctl interface (man-page etc.) Changes from v3 1. gfp_flag fixes. 2. Export bio_map_kern() and use it to allocate and add pages to bio. 3. Move copy offload, reading to buf, writing from buf to separate functions. 4. Send read bio of copy offload by chaining them and submit asynchronously. 5. Add gendisk->part0 and part->bd_start_sect changes to blk_check_copy(). 6. Move single source range limit check to blk_check_copy() 7. Rename __blkdev_issue_copy() to blkdev_issue_copy and remove old helper. 8. Change blkdev_issue_copy() interface generic to accepts destination bdev to support XCOPY as well. 9. Add invalidate_kernel_vmap_range() after reading data for vmalloc'ed memory. 10. Fix buf allocoation logic to allocate buffer for the total size of copy. 11. Reword patch commit description. Changes from v2 1. Add emulation support for devices not supporting copy. 2. Add *copy_offload* sysfs entry to enable and disable copy_offload in devices supporting simple copy. 3. Remove simple copy support for stacked devices. Changes from v1: 1. Fix memory leak in __blkdev_issue_copy 2. Unmark blk_check_copy inline 3. Fix line break in blk_check_copy_eod 4. Remove p checks and made code more readable 5. Don't use bio_set_op_attrs and remove op and set bi_opf directly 6. Use struct_size to calculate total_size 7. Fix partition remap of copy destination 8. Remove mcl,mssrl,msrc from nvme_ns 9. Initialize copy queue limits to 0 in nvme_config_copy 10. Remove return in QUEUE_FLAG_COPY
Re: [PATCH] fs: tree-checker: fix missing brace warning for old compilers
On Sat 03 Oct 2020 at 08:11, Pujin Shi wrote: For older versions of gcc, the array = {0}; will cause warnings: So what's the version number of the gcc? "struct foo = { 0 }" should be correct. May be the compiler issue[1] related? 1: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53119 fs/btrfs/tree-checker.c: In function 'check_root_item': fs/btrfs/tree-checker.c:1038:9: warning: missing braces around initializer [-Wmissing-braces] struct btrfs_root_item ri = { 0 }; ^ fs/btrfs/tree-checker.c:1038:9: warning: (near initialization for 'ri.inode') [-Wmissing-braces] 1 warnings generated Fixes: 443b313c7ff8 ("btrfs: tree-checker: fix false alert caused by legacy btrfs root item") Signed-off-by: Pujin Shi --- fs/btrfs/tree-checker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index f0ffd5ee77bd..5028b3af308c 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -1035,7 +1035,7 @@ static int check_root_item(struct extent_buffer *leaf, struct btrfs_key *key, int slot) { struct btrfs_fs_info *fs_info = leaf->fs_info; - struct btrfs_root_item ri = { 0 }; + struct btrfs_root_item ri = {}; const u64 valid_root_flags = BTRFS_ROOT_SUBVOL_RDONLY | BTRFS_ROOT_SUBVOL_DEAD; int ret;
Re: rcutorture: meaning of "End of test: RCU_HOTPLUG"
On 1/23/19 11:22 AM, Paul E. McKenney wrote: On Tue, Jan 22, 2019 at 04:42:19PM +0800, Su Yue wrote: Thanks for your quick reply! Paul On 1/22/19 12:01 PM, Paul E. McKenney wrote: On Tue, Jan 22, 2019 at 11:40:53AM +0800, Su Yue wrote: Hi, guys While running rcutorture tests with "onoff_interval", some tests failed and results show like: = [ 316.354501] srcud-torture:--- End of test: RCU_HOTPLUG: nreaders=1 nfakewriters=4 stat_interval=60 verbose=2 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fq\ s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\ rrier_cbs=0 onoff_interval=3 onoff_holdoff=0 I am wondering that meaning of "RCU_HOTPLUG". Is it expected because cpu hotplug is enabled in the test? Or just represents another type of failure? This says that at least one CPU hotplug operation failed, that is, the CPU didn't actually come online or go offline as requested. If you are introducing CPU hotplug to an architecture, this usually indicates that you have bugs in your CPU-hotplug code. Or it nmight be that It should hit the case since there is no RCU CPU stall warnings. RCU grace periods failed to progress -- though this would normally also result in RCU CPU stall warnings. There should be lines containing "ver:" in your console output. What does one of the later one of these say? The line says: == [ 318.850175] busted_srcud-torture: rtc: (null) ver: 27040 tfle: 0 rta: 27040 rtaf: 0 rtf: 27027 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 \ nt: 9497 onoff: 2639/2639:2640/5310 40,373:10,355 162868:67542 (HZ=1000) barrier: 0/0:0 Yes, you have many more offline attempts than successes, which is why RCU_HOTPLUG was printed. = And here are useful errors: = kern :info : [ 135.379693] KVM setup async PF for cpu 1 kern :info : [ 135.381412] kvm-stealtime: cpu 1, msr 23fd16180 kern :alert : [ 135.386897] busted_srcud-torture:torture_onoff Just so your know, busted_srcud can sometimes fail by design. Hence the "busted" in the name. But failure didn't happen this time. Yes..The corner case I mentioned actually happened in every "onoff" tests whatever the torture_type is. task: onlined 1 kern :alert : [ 135.408241] busted_srcud-torture:torture_onoff task: offlining 1 kern :info : [ 135.423310] Unregister pv shared memory for cpu 1 kern :info : [ 135.427940] smpboot: CPU 1 is now offline kern :alert : [ 135.430106] busted_srcud-torture:torture_onoff task: offlined 1 kern :alert : [ 135.436404] busted_srcud-torture:torture_onoff task: offlining 0 kern :alert : [ 135.446173] busted_srcud-torture:torture_onoff task: offline 0 failed: errno -16 kern :alert : [ 135.453076] busted_srcud-torture:torture_onoff task: offlining 0 kern :alert : [ 135.457461] busted_srcud-torture:torture_onoff task: offline 0 failed: errno -16 = There are only two CPUs on the VM. Torture try to offline the last one but -EBUSY occured. I spent time to understand kernel/torture.c. There is torture_onoff(): 225while (!torture_must_stop()) { 226cpu = (torture_random(&rand) >> 4) % (maxcpu + 1); 227if (!torture_offline(cpu, 228 &n_offline_attempts, &n_offline_successes, 229 &sum_offline, &min_offline, &max_offline)) 230torture_online(cpu, 231 &n_online_attempts, &n_online_successes, 232 &sum_online, &min_online, &max_online); 233schedule_timeout_interruptible(onoff_interval); 234} 235 torture_offline() and torture_offline() don't pre judge if the current cpu is only one usable. That does appear to be the case, and that would be a problem with the CONFIG_BOOTPARAM_HOTPLUG_CPU0 listed below. Good catch! Our test machines are configured with CONFIG_BOOTPARAM_HOTPLUG_CPU0. If there are only one oneline and hotplugable cpux, then n_offline_successes != n_offline_attempts which caused "End of test: RCU_HOTPLUG". Does I misunderstand something above? Feel free to correct me. Does the following patch help? Yes, no more "errnor: -16" in dmesg and "End of test: SUCCESS" is in the end. Thanks for your patch. If
Re: rcutorture: meaning of "End of test: RCU_HOTPLUG"
Thanks for your quick reply! Paul On 1/22/19 12:01 PM, Paul E. McKenney wrote: On Tue, Jan 22, 2019 at 11:40:53AM +0800, Su Yue wrote: Hi, guys While running rcutorture tests with "onoff_interval", some tests failed and results show like: = [ 316.354501] srcud-torture:--- End of test: RCU_HOTPLUG: nreaders=1 nfakewriters=4 stat_interval=60 verbose=2 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fq\ s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\ rrier_cbs=0 onoff_interval=3 onoff_holdoff=0 I am wondering that meaning of "RCU_HOTPLUG". Is it expected because cpu hotplug is enabled in the test? Or just represents another type of failure? This says that at least one CPU hotplug operation failed, that is, the CPU didn't actually come online or go offline as requested. If you are introducing CPU hotplug to an architecture, this usually indicates that you have bugs in your CPU-hotplug code. Or it nmight be that It should hit the case since there is no RCU CPU stall warnings. RCU grace periods failed to progress -- though this would normally also result in RCU CPU stall warnings. There should be lines containing "ver:" in your console output. What does one of the later one of these say? The line says: == [ 318.850175] busted_srcud-torture: rtc: (null) ver: 27040 tfle: 0 rta: 27040 rtaf: 0 rtf: 27027 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 \ nt: 9497 onoff: 2639/2639:2640/5310 40,373:10,355 162868:67542 (HZ=1000) barrier: 0/0:0 = And here are useful errors: = kern :info : [ 135.379693] KVM setup async PF for cpu 1 kern :info : [ 135.381412] kvm-stealtime: cpu 1, msr 23fd16180 kern :alert : [ 135.386897] busted_srcud-torture:torture_onoff task: onlined 1 kern :alert : [ 135.408241] busted_srcud-torture:torture_onoff task: offlining 1 kern :info : [ 135.423310] Unregister pv shared memory for cpu 1 kern :info : [ 135.427940] smpboot: CPU 1 is now offline kern :alert : [ 135.430106] busted_srcud-torture:torture_onoff task: offlined 1 kern :alert : [ 135.436404] busted_srcud-torture:torture_onoff task: offlining 0 kern :alert : [ 135.446173] busted_srcud-torture:torture_onoff task: offline 0 failed: errno -16 kern :alert : [ 135.453076] busted_srcud-torture:torture_onoff task: offlining 0 kern :alert : [ 135.457461] busted_srcud-torture:torture_onoff task: offline 0 failed: errno -16 = There are only two CPUs on the VM. Torture try to offline the last one but -EBUSY occured. I spent time to understand kernel/torture.c. There is torture_onoff(): 225while (!torture_must_stop()) { 226cpu = (torture_random(&rand) >> 4) % (maxcpu + 1); 227if (!torture_offline(cpu, 228 &n_offline_attempts, &n_offline_successes, 229 &sum_offline, &min_offline, &max_offline)) 230torture_online(cpu, 231 &n_online_attempts, &n_online_successes, 232 &sum_online, &min_online, &max_online); 233schedule_timeout_interruptible(onoff_interval); 234} 235 torture_offline() and torture_offline() don't pre judge if the current cpu is only one usable. Our test machines are configured with CONFIG_BOOTPARAM_HOTPLUG_CPU0. If there are only one oneline and hotplugable cpux, then n_offline_successes != n_offline_attempts which caused "End of test: RCU_HOTPLUG". Does I misunderstand something above? Feel free to correct me. Thanks, Su Thanx, Paul
rcutorture: meaning of "End of test: RCU_HOTPLUG"
Hi, guys While running rcutorture tests with "onoff_interval", some tests failed and results show like: = [ 316.354501] srcud-torture:--- End of test: RCU_HOTPLUG: nreaders=1 nfakewriters=4 stat_interval=60 verbose=2 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fq\ s_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 stall_cpu=0 stall_cpu_holdoff=10 stall_cpu_irqsoff=0 n_ba\ rrier_cbs=0 onoff_interval=3 onoff_holdoff=0 I am wondering that meaning of "RCU_HOTPLUG". Is it expected because cpu hotplug is enabled in the test? Or just represents another type of failure? Thanks, Su
Re: [PATCH v2] btrfs: add a check for sysfs_create_group
On 12/26/18 1:37 PM, Kangjie Lu wrote: In case sysfs_create_group fails, let's check its return value and issues an error message. Signed-off-by: Kangjie Lu --- fs/btrfs/sysfs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c index 3717c864ba23..24ef416e700b 100644 --- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -889,6 +889,8 @@ void btrfs_sysfs_feature_update(struct btrfs_fs_info *fs_info, */ sysfs_remove_group(fsid_kobj, &btrfs_feature_attr_group); ret = sysfs_create_group(fsid_kobj, &btrfs_feature_attr_group); + if (ret) + btrfs_err(fs_info, "failed to create btrfs_feature_attr_group.\n"); NIT: ".\n" is unnecessary. --- Su } static int btrfs_init_debugfs(void)