Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
Quoting Stefan Behrens (2013-10-23 13:21:34)
> On Tue, 22 Oct 2013 18:55:59 +0200, Bob Marley wrote:
> > On 22/10/2013 10:37, Stefan Behrens wrote:
> > > I don't believe that this issue can ever happen. I don't believe that
> > > somewhere on the path to the flash memory, to the magnetic disc or to
> > > the drive's cache memory, someone interrupts a 4KB write in the middle
> > > of operation to read from this 4KB area. This is not an issue IMHO.
> >
> > I think I have read that unfortunately it can happen. SAS and SATA
> > specs for disks do not mandate that if a write is in-flight but still
> > not completed, reads from the same sector should return the value it is
> > being written; they can return the old value. I also think that Linux
> > does not check either.
>
> If the _old_ 4KB block is returned, that's fine and won't cause a
> checksum error. The patch in question addresses the case that Btrfs
> submits a write request for a 4KB block, and a concurrent read request
> for that 4KB block reads partially the old block and partially the new
> block, resulting in a checksum error reported in the scrub statistic
> counters.

Concurrent reads and writes to the device are completely undefined, and
any combination of old, new, random memory corruption wouldn't surprise
me... I'd rather avoid them ;)

Doing the transaction join during the super read is probably the least
complex choice.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] xfstests btrfs/020: test device replace on RO btrfs
On Thu, Oct 24, 2013 at 12:44:43AM +0800, Eryu Guan wrote:
> +_cleanup()
> +{
> +	cd /

Using root as temporary directory?

> +	rm -f $tmp.*
> +	$UMOUNT_PROG $loop_mnt
> +	_destroy_loop_device $loop_dev1
> +	losetup -d $loop_dev2 >/dev/null 2>&1
> +	_destroy_loop_device $loop_dev3
> +	rm -rf $loop_mnt
> +	rm -f $fs_img1 $fs_img2 $fs_img3
> +}
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On Thu, 24 Oct 2013 06:08:42 -0400, Chris Mason wrote:
> Concurrent reads and writes to the device are completely undefined, and
> any combination of old, new, random memory corruption wouldn't surprise
> me... I'd rather avoid them ;)
>
> Doing the transaction join during the super read is probably the least
> complex choice.

But a transaction join cannot block the log tree sync. I think using
device_list_mutex is better: we have to acquire this mutex when writing
the super blocks anyway, and once we unlock it we can be sure that the
super blocks have reached non-volatile media.

Thanks
Miao
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On 07/20/2013 12:51 AM, Mark Fasheh wrote:
> On Thu, Jul 11, 2013 at 12:26:50AM +0200, David Sterba wrote:
> > On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
> > > Well, what do I get when I pretend I don't care any more? The little
> > > voice in my head says keep plugging away.
> > >
> > > Here's another attempt at fixing this problem in a sane manner.
> > > Basically, this time we're adding a flag to s_flags which btrfs sets.
> > > Proc will see the flag and call ->getattr().
> > >
> > > This compiles, but it needs testing (which I will get to soon). It
> > > still has a bunch of problems in my honest opinion but maybe if we get
> > > something acceptable upstream we can work from there.
> > >
> > > Also, as Andrew pointed out there's more than one place which returns
> > > a different device than stat(2), so I probably need to update more
> > > sites to deal with this.
> > >
> > > Does anyone see a problem with this approach?
> >
> > The approach looks ok to me, the implementation is internal to vfs and
> > fairly minimal. The bit that bothers me is the name of the flag, it's
> > completely unobvious what it means.
>
> I'll come up with something better for my next revision :)

Mark, David,

What are your plans for the next version? Any chance we can see it in the
3.13 merge window? (Unless I've missed the fact that it's already there.)

I'd really love to see it, as this thing is a blocker for checkpoint-restore
on btrfs.

Thanks,
Pavel
Re: [PATCH] Btrfs: fix race condition between writting and scrubing supers
On 10/24/2013 06:08 PM, Chris Mason wrote:
> Concurrent reads and writes to the device are completely undefined, and
> any combination of old, new, random memory corruption wouldn't surprise
> me... I'd rather avoid them ;)
>
> Doing the transaction join during the super read is probably the least
> complex choice.

Yeah, joining the transaction would solve this problem, but it is a little
confusing, because scrubbing the supers does not involve any writing. The
only race happens while committing a transaction. Miao also pointed out
that maybe the best way is to move btrfs_scrub_continue() after
write_ctree_super().

Thanks,
Wang
Re: [PATCH] Btrfs: relocate csums properly with prealloc extents - for 3.12-rc
Hi Chris,

this needs to go to 3.12; the patch is only in btrfs-next. The bug can
happen with systemd journal + balance, and the fix helps quite a lot of
users out there (https://bugzilla.kernel.org/show_bug.cgi?id=63411).

I have cherry-picked the patch to current master; it applies cleanly and
the test btrfs/013 passes. Here's my

Tested-by: David Sterba <dste...@suse.cz>

david
btrfs raid5 bug task mkfs.btrfs:3695 blocked for more than 120 seconds
When I create a raid5 in btrfs with a command like this:

./mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm -f

WARNING! - Btrfs v0.20-rc1-358-g194aa4a-dirty IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

there is no response after that, and the kernel log shows this error:

Oct 24 21:25:36 host1 kernel: [ 3000.809503] INFO: task mkfs.btrfs:3695 blocked for more than 120 seconds.
Oct 24 21:25:36 host1 kernel: [ 3000.809506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 24 21:25:36 host1 kernel: [ 3000.809508] mkfs.btrfs D 0001 0 3695 3519 0x
Oct 24 21:25:36 host1 kernel: [ 3000.809513] 8807f4441c68 0082 8133677d 88080e049590
Oct 24 21:25:36 host1 kernel: [ 3000.809518] 8807f4441fd8 8807f4441fd8 8807f4441fd8 000139c0
Oct 24 21:25:36 host1 kernel: [ 3000.809522] 88080e9ddc00 8808115a8000 88080f7bc000 7fff
Oct 24 21:25:36 host1 kernel: [ 3000.809527] Call Trace:
Oct 24 21:25:36 host1 kernel: [ 3000.809534] [8133677d] ? rb_insert_color+0xad/0x150
Oct 24 21:25:36 host1 kernel: [ 3000.809539] [8169d8b9] schedule+0x29/0x70
Oct 24 21:25:36 host1 kernel: [ 3000.809543] [8169bfd5] schedule_timeout+0x2a5/0x320
Oct 24 21:25:36 host1 kernel: [ 3000.809547] [8131048c] ? blk_queue_bio+0x1cc/0x3a0
Oct 24 21:25:36 host1 kernel: [ 3000.809551] [8169d70f] wait_for_common+0xdf/0x180
Oct 24 21:25:36 host1 kernel: [ 3000.809555] [8108a360] ? try_to_wake_up+0x200/0x200
Oct 24 21:25:36 host1 kernel: [ 3000.809559] [8169d88d] wait_for_completion+0x1d/0x20
Oct 24 21:25:36 host1 kernel: [ 3000.809563] [81315c14] blkdev_issue_discard+0x1b4/0x1c0
Oct 24 21:25:36 host1 kernel: [ 3000.809567] [81316341] blkdev_ioctl+0x461/0x7a0
Oct 24 21:25:36 host1 kernel: [ 3000.809572] [811beb70] block_ioctl+0x40/0x50
Oct 24 21:25:36 host1 kernel: [ 3000.809576] [811996fa] do_vfs_ioctl+0x8a/0x340
Oct 24 21:25:36 host1 kernel: [ 3000.809579] [8118c72a] ? sys_newfstat+0x2a/0x40
Oct 24 21:25:36 host1 kernel: [ 3000.809583] [81199a41] sys_ioctl+0x91/0xa0
Oct 24 21:25:36 host1 kernel: [ 3000.809588] [816a7029] system_call_fastpath+0x16/0x1b
Re: btrfs raid5 bug task mkfs.btrfs:3695 blocked for more than 120 seconds
On Thu, Oct 24, 2013 at 10:22:28PM +0800, lilofile wrote:
> Oct 24 21:25:36 host1 kernel: [ 3000.809563] [81315c14] blkdev_issue_discard+0x1b4/0x1c0

There's a discard/TRIM operation being done on all of the devices. Current
progs do not report that, which is really confusing. Fixed in the
integration branch.

If you don't want to do the trim, use the -K switch.

david
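As a rough aid for deciding whether -K matters on a given machine, the sketch below checks whether a whole-disk block device advertises discard support through the sysfs attribute queue/discard_max_bytes. The function name supports_discard and the SYSFS_ROOT override are assumptions made for illustration; they are not part of btrfs-progs. It only handles whole-disk names such as sdb, not partitions.

```shell
#!/bin/sh
# Sketch: report whether a block device advertises discard (TRIM) support.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake tree; on a real system it defaults to /sys.
# Exit status: 0 = discard supported, 1 = not supported, 2 = unknown device.
supports_discard() {
    dev=$(basename "$1")
    attr="${SYSFS_ROOT:-/sys}/block/$dev/queue/discard_max_bytes"
    [ -r "$attr" ] || return 2      # no such whole-disk device or attribute
    max=$(cat "$attr")
    [ "$max" -gt 0 ]                # 0 means the device does not discard
}
```

A possible use: run `supports_discard /dev/sdb && echo "mkfs.btrfs will trim /dev/sdb unless -K is given"` for each device before creating the filesystem.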
[PATCH] btrfs-progs: make filesystem show by label work
With the design revamp around filesystem show, filtering the fsid by label
wasn't planned, but apparently it is necessary. This patch fixes it.

Signed-off-by: Anand Jain <anand.j...@oracle.com>
---
 cmds-filesystem.c | 120 -
 1 files changed, 73 insertions(+), 47 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index d08007e..d2cad81 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -179,6 +179,26 @@ static int cmd_df(int argc, char **argv)
 	return !!ret;
 }
 
+static int match_search_item_kernel(__u8 *fsid, char *mnt, char *label,
+					char *search)
+{
+	char uuidbuf[37];
+	int search_len = strlen(search);
+
+	search_len = min(search_len, 37);
+	uuid_unparse(fsid, uuidbuf);
+	if (!strncmp(uuidbuf, search, search_len))
+		return 1;
+
+	if (strlen(label) && strcmp(label, search) == 0)
+		return 1;
+
+	if (strcmp(mnt, search) == 0)
+		return 1;
+
+	return 0;
+}
+
 static int uuid_search(struct btrfs_fs_devices *fs_devices, char *search)
 {
 	char uuidbuf[37];
@@ -275,16 +295,18 @@ static int print_one_fs(struct btrfs_ioctl_fs_info_args *fs_info,
 	struct btrfs_ioctl_dev_info_args *tmp_dev_info;
 
 	uuid_unparse(fs_info->fsid, uuidbuf);
-	printf("Label: %s uuid: %s\n",
-		strlen(label) ? label : "none", uuidbuf);
+	if (label && strlen(label))
+		printf("Label: '%s' ", label);
+	else
+		printf("Label: none ");
 
-	printf("\tTotal devices %llu FS bytes used %s\n",
-		fs_info->num_devices,
+	printf(" uuid: %s\n\tTotal devices %llu FS bytes used %s\n", uuidbuf,
+		fs_info->num_devices,
 		pretty_size(calc_used_bytes(space_info)));
 
 	for (i = 0; i < fs_info->num_devices; i++) {
 		tmp_dev_info = (struct btrfs_ioctl_dev_info_args *)&dev_info[i];
-		printf("\tdevid%llu size %s used %s path %s\n",
+		printf("\tdevid %4llu size %s used %s path %s\n",
 			tmp_dev_info->devid,
 			pretty_size(tmp_dev_info->total_bytes),
 			pretty_size(tmp_dev_info->bytes_used),
@@ -308,7 +330,7 @@ static int check_arg_type(char *input)
 	char path[PATH_MAX];
 
 	if (!input)
-		return BTRFS_ARG_UNKNOWN;
+		return -EINVAL;
 
 	if (realpath(input, path)) {
 		if (is_block_device(input) == 1)
@@ -320,7 +342,7 @@ static int check_arg_type(char *input)
 		return BTRFS_ARG_UNKNOWN;
 	}
 
-	if (!uuid_parse(input, out))
+	if (strlen(input) == 36 && !uuid_parse(input, out))
 		return BTRFS_ARG_UUID;
 
 	return BTRFS_ARG_UNKNOWN;
@@ -328,23 +350,19 @@ static int check_arg_type(char *input)
 
 static int btrfs_scan_kernel(void *search)
 {
-	int ret = 0, fd, type;
+	int ret = 0, fd;
 	FILE *f;
 	struct mntent *mnt;
 	struct btrfs_ioctl_fs_info_args fs_info_arg;
 	struct btrfs_ioctl_dev_info_args *dev_info_arg = NULL;
 	struct btrfs_ioctl_space_args *space_info_arg;
 	char label[BTRFS_LABEL_SIZE];
-	uuid_t uuid;
 
 	f = setmntent("/proc/self/mounts", "r");
 	if (f == NULL)
 		return 1;
 
-	type = check_arg_type(search);
-	if (type == BTRFS_ARG_BLKDEV)
-		return 1;
-
+	memset(label, 0, sizeof(label));
 	while ((mnt = getmntent(f)) != NULL) {
 		if (strcmp(mnt->mnt_type, "btrfs"))
 			continue;
@@ -353,38 +371,36 @@ static int btrfs_scan_kernel(void *search)
 		if (ret)
 			return ret;
 
-		switch (type) {
-		case BTRFS_ARG_UUID:
-			ret = uuid_parse(search, uuid);
-			if (ret)
-				return 1;
-			if (uuid_compare(fs_info_arg.fsid, uuid))
-				continue;
-			break;
-		case BTRFS_ARG_MNTPOINT:
-			if (strcmp(search, mnt->mnt_dir))
-				continue;
-			break;
-		case BTRFS_ARG_UNKNOWN:
-			break;
+		if (get_label_mounted(mnt->mnt_dir, label)) {
+			kfree(dev_info_arg);
+			return 1;
+		}
+		if (search && !match_search_item_kernel(fs_info_arg.fsid,
+					mnt->mnt_dir, label, search)) {
+			kfree(dev_info_arg);
+			continue;
 		}
 
 		fd = open(mnt->mnt_dir, O_RDONLY);
 		if (fd > 0 && !get_df(fd, &space_info_arg)) {
-			get_label_mounted(mnt->mnt_dir, label);
Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)
Hello Jan,

> btrfs_dec_ref() queued a delayed ref for owner of a tree block. The qgroup
> tracking is based on delayed refs. The owner of a tree block is set when a
> tree block is allocated, it is never updated.
>
> When you allocate a tree block and then remove the subvolume that did the
> allocation, the qgroup accounting for that removal is correct. However, the
> removal was accounted again for each subvolume deletion that also referenced
> the tree block, because accounting was erroneously based on the owner.
>
> Instead of queueing delayed refs for the non-existent owner, we now
> queue delayed refs for the root being removed. This fixes the qgroup
> accounting.

Thanks for tracking this down. I applied your patch and, using the
following test script, found that the problem still exists:

#!/bin/sh

for i in $(seq 1000)
do
	dd if=/dev/zero of=mnt/${i}aaa bs=10K count=1
done

btrfs sub snapshot mnt mnt/1
for i in $(seq 100)
do
	btrfs sub snapshot mnt/$i mnt/$(($i+1))
done

for i in $(seq 101)
do
	btrfs sub delete mnt/$i
done

Thanks,
Wang

> Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
> Tested-by: <dustym...@gmail.com>
> ---
>  fs/btrfs/extent-tree.c | 14 +-
>  1 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index d58bef1..7846cae 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3004,12 +3004,11 @@ out:
>  static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
>  			   struct btrfs_root *root,
>  			   struct extent_buffer *buf,
> -			   int full_backref, int inc, int for_cow)
> +			   int full_backref, u64 ref_root, int inc, int for_cow)
>  {
>  	u64 bytenr;
>  	u64 num_bytes;
>  	u64 parent;
> -	u64 ref_root;
>  	u32 nritems;
>  	struct btrfs_key key;
>  	struct btrfs_file_extent_item *fi;
> @@ -3019,7 +3018,6 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
>  	int (*process_func)(struct btrfs_trans_handle *, struct btrfs_root *,
>  			    u64, u64, u64, u64, u64, u64, int);
>  
> -	ref_root = btrfs_header_owner(buf);
>  	nritems = btrfs_header_nritems(buf);
>  	level = btrfs_header_level(buf);
>  
> @@ -3075,13 +3073,19 @@ fail:
>  int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  		  struct extent_buffer *buf, int full_backref, int for_cow)
>  {
> -	return __btrfs_mod_ref(trans, root, buf, full_backref, 1, for_cow);
> +	u64 ref_root;
> +
> +	ref_root = btrfs_header_owner(buf);
> +
> +	return __btrfs_mod_ref(trans, root, buf, full_backref, ref_root,
> +				1, for_cow);
>  }
>  
>  int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  		  struct extent_buffer *buf, int full_backref, int for_cow)
>  {
> -	return __btrfs_mod_ref(trans, root, buf, full_backref, 0, for_cow);
> +	return __btrfs_mod_ref(trans, root, buf, full_backref, root->objectid,
> +				0, for_cow);
>  }
>  
>  static int write_one_cache_group(struct btrfs_trans_handle *trans,
> --
> 1.7.2.2
Re: [PATCH 2/2] btrfs-progs: filesystem show of specified mounted disk should work
On Wed, Oct 23, 2013 at 10:08:16AM +0800, Anand Jain wrote:
> On 10/22/13 10:33 PM, David Sterba wrote:
> > On Tue, Oct 22, 2013 at 01:53:22PM +0800, Anand Jain wrote:
> > > @@ -386,7 +395,7 @@ static int btrfs_scan_kernel(void *search)
> > >  static const char * const cmd_show_usage[] = {
> > > -	"btrfs filesystem show [options] [path|uuid]",
> > > +	"btrfs filesystem show [options|path|uuid]",
> >
> > Options should stay separate from the path/uuid, you're extending the
> > syntax to accept a device:
> >
> > 	"btrfs filesystem show [options] [path|uuid|device]",
> >
> > I'm fixing it locally, let me know if this doesn't match what you've
> > intended.
>
> I am confused, on how the options should be represented, but the
> internal design is as below.

Hm right, it is a bit confusing, I think because of the syntax that allows
either options or the path/uuid/device specifier, not both, which is not so
common.

I still prefer to keep them separate, because it's something that can be
clarified in the help text or documentation. Besides, we may want to add
more options that affect path/uuid/device output; the argument description
looks consistent with other commands, and if some combination is not
allowed, then an error message will say why. I really don't expect an
average user to think too hard about it.

david
Re: [PATCH 2/2] btrfs-progs: filesystem show of specified mounted disk should work
On Thu, Oct 24, 2013 at 04:51:00PM +0200, David Sterba wrote:
> Hm right, it is a bit confusing, I think because of the syntax that allows
> either options or the path/uuid/device specifier, not both, which is not so
> common.

Typically, that ends up written as something like:

	btrfs filesystem show [options]
	btrfs filesystem show [path|uuid|device]

or just as

	btrfs filesystem show [options] [path|uuid|device]

with a comment in the man page that the second parameter can't be supplied
with any of the options (if that's the case).

Of the two, I prefer the former, but with the acknowledgement that when the
command grows some options that can be used with the p/u/d, you'll end up
having to change the help text.

Hugo.

> I still prefer to keep them separate, because it's something that can be
> clarified in the help text or documentation. Besides, that we may want
> to add more options that affect path/uuid/device output, the argument
> description looks consistent with other commands and if some combination
> is not allowed, then an error message will say why. I really don't
> expect an average user to think too hard about it.
>
> david

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
    --- I believe that it's closely correlated with ---
               the aeroswine coefficient.
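To make the [path|uuid|device] distinction in this thread concrete, here is a minimal shell sketch of how a single argument might be classified. It is only an approximation of the idea, not the btrfs-progs implementation: the function name classify_arg and the words it prints are made up for illustration, and "path" here means any existing directory rather than a verified mount point.

```shell
#!/bin/sh
# Sketch: classify a "filesystem show" argument as a block device, a
# directory path, a UUID, or unknown. The printed labels are hypothetical.
classify_arg() {
    arg=$1
    if [ -z "$arg" ]; then
        echo invalid
    elif [ -b "$arg" ]; then
        echo blkdev                 # existing block device node
    elif [ -d "$arg" ]; then
        echo path                   # existing directory (simplification)
    elif [ "${#arg}" -eq 36 ] && printf '%s\n' "$arg" |
         grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
        echo uuid                   # 36 characters in 8-4-4-4-12 hex form
    else
        echo unknown                # e.g. a label, left for later matching
    fi
}
```

With a classifier like this, anything reported as unknown could still be tried as a label, which is roughly what the label-matching discussion is about.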
Why cannot I move a read-only snapshot around?
Dear list,

(newbie alert)

After successfully sending and receiving a dozen related snapshots I want
to move them all to the readonly folder, but I cannot:

ls -l .
drwxr-xr-x. 1 root root 682 Oct 24 16:01 @20131001
drwxr-xr-x. 1 root root 682 Oct 24 16:07 @20131004
drwxr-xr-x. 1 root root 682 Oct 24 16:10 @20131008
drwxr-xr-x. 1 root root 682 Oct 24 16:16 @20131010
drwxr-xr-x. 1 root root 682 Oct 24 16:23 @20131014
drwxr-xr-x. 1 root root 706 Oct 24 16:24 @20131018
drwxr-xr-x. 1 root root 706 Oct 24 16:31 @20131021
drwxr-xr-x. 1 root root 734 Oct 24 16:36 @20131023
drwxr-xr-x. 1 root root 734 Oct 24 16:41 @20131024
drwxr-xr-x. 1 root root 734 Oct 24 16:41 F19
drwxr-xr-x. 1 root root   0 Oct 24 17:21 readonly

mv \@20131024 readonly
mv: cannot move ‘@20131024’ to ‘readonly/@20131024’: Read-only file system

I know I can create new ro snapshots within the readonly directory and then
delete those above, but in the future I want to send/receive based on those
snapshots (send -p -c -c), so I want to move them to a more convenient
place.

How can I move them without re-sending everything?

Karl
Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)
On Thu, October 24, 2013 at 16:49 (+0200), Wang Shilong wrote:
> Thanks for tracking this, i apply your patch, and using the flowing patch,
> found the problem still exist, the test script like the following:

Reproduced. It gives more negative numbers due to accounting triggered by
the cleaner thread; that's the common part here. I still believe that the
fix I sent is correct, but it's probably not complete.

Looking into it.

Thanks,
-Jan

> #!/bin/sh
>
> for i in $(seq 1000)
> do
> 	dd if=/dev/zero of=mnt/${i}aaa bs=10K count=1
> done
>
> btrfs sub snapshot mnt mnt/1
> for i in $(seq 100)
> do
> 	btrfs sub snapshot mnt/$i mnt/$(($i+1))
> done
>
> for i in $(seq 101)
> do
> 	btrfs sub delete mnt/$i
> done
>
> Thanks,
> Wang
Re: Why cannot I move a read-only snapshot around?
arrgh, forgot to mention:

pc2:~ btrfs --version
Btrfs v0.20-rc1

Fedora 19 x86_64

Karl
Re: [PATCH] Btrfs: relocate csums properly with prealloc extents
The result of the scrubbing came back today and it was not pretty:

...
scrub done for b64daec7-6c14-4996-94b3-80c6abfa26ce
	scrub started at Wed Oct 23 23:01:22 2013 and finished after 34990 seconds
	total bytes scrubbed: 12.55TB with 3859542 errors
	error details: csum=3859542
	corrected errors: 0, uncorrectable errors: 3859542, unverified errors: 0
---

Still only two folder structures affected, but seemingly unrecoverable.

I noticed the mail asking to include the fix in 3.12. Jippi! Until it is
included I will have to postpone rebalancing over the four new drives.

Mvh
Hans-Kristian Bakke

On 23 October 2013 23:49, Hans-Kristian Bakke <hkba...@gmail.com> wrote:
> OK. btrfs scrub and dmesg are hitting me with lots of unfixable errors,
> all in the same file. Example:
>
> [13313.441091] btrfs: unable to fixup (regular) error at logical 560107954176 on dev /dev/sdn
> [13321.532223] scrub_handle_errored_block: 1510 callbacks suppressed
> [13321.532309] btrfs_dev_stat_print_on_error: 1510 callbacks suppressed
> [13321.532314] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt 40016, gen 0
> [13321.532420] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt 40017, gen 0
> [13321.532545] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt 40018, gen 0
> [13321.532605] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt 40019, gen 0
> [13321.533039] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt 40020, gen 0
> [13321.537519] scrub_handle_errored_block: 1508 callbacks suppressed
> [13321.537525] btrfs: unable to fixup (regular) error at logical 560630136832 on dev /dev/sdq
> [13321.537821] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt 40021, gen 0
> [13321.538081] btrfs: unable to fixup (regular) error at logical 560630140928 on dev /dev/sdq
> [13321.538438] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt 40022, gen 0
> [13321.538715] btrfs: unable to fixup (regular) error at logical 560630145024 on dev /dev/sdq
> [13321.539016] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt 40023, gen 0
> [13321.539234] btrfs: unable to fixup (regular) error at logical 560630149120 on dev /dev/sdq
> [13321.539522] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt 40024, gen 0
> [13321.539739] btrfs: unable to fixup (regular) error at logical 560630153216 on dev /dev/sdq
> [13321.540027] btrfs: bdev /dev/sdq errs: wr 0, rd 0, flush 0, corrupt 40025, gen 0
> [13321.540242] btrfs: unable to fixup (regular) error at logical 560630157312 on dev /dev/sdq
> [13321.540620] btrfs: unable to fixup (regular) error at logical 560630161408 on dev /dev/sdq
> [13321.541140] btrfs: unable to fixup (regular) error at logical 560630165504 on dev /dev/sdq
> [13321.541571] btrfs: unable to fixup (regular) error at logical 560630169600 on dev /dev/sdq
> [13321.541931] btrfs: unable to fixup (regular) error at logical 560630173696 on dev /dev/sdq
>
> Luckily all the corruption seems to be in a single very large file, but
> in different parts of it on different disks. The file was written by
> rtorrent, which has the option "system.file_allocate.set = yes"
> configured. I also have samba configured with "strict allocate = yes"
> because it is recommended for best performance on extent based
> filesystems. Does that mean even samba files are vulnerable to this
> corruption too? If so, this could become very ugly very fast on certain
> systems.
>
> Mvh
> Hans-Kristian Bakke
>
> On 23 October 2013 23:24, Hans-Kristian Bakke <hkba...@gmail.com> wrote:
> > I was hit by this when trying to rebalance a 16TB RAID10 to 32TB RAID10
> > going from 4 to 8 WD SE 4TB drives today. I cannot finish a rebalance
> > because of failed csum.
> >
> > [10228.850910] BTRFS info (device sdq): csum failed ino 487 off 65536 csum 2566472073 private 151366068
> > [10228.850967] BTRFS info (device sdq): csum failed ino 487 off 69632 csum 2566472073 private 3056924305
> > [10228.850973] BTRFS info (device sdq): csum failed ino 487 off 593920 csum 2566472073 private 906093395
> > [10228.851004] BTRFS info (device sdq): csum failed ino 487 off 73728 csum 2566472073 private 2680502892
> > [10228.851014] BTRFS info (device sdq): csum failed ino 487 off 598016 csum 2566472073 private 1940162924
> > [10228.851029] BTRFS info (device sdq): csum failed ino 487 off 77824 csum 2566472073 private 2939385278
> > [10228.851051] BTRFS info (device sdq): csum failed ino 487 off 602112 csum 2566472073 private 645310077
> > [10228.851055] BTRFS info (device sdq): csum failed ino 487 off 81920 csum 2566472073 private 3600741549
> > [10228.851078] BTRFS info (device sdq): csum failed ino 487 off 86016 csum 2566472073 private 200201951
> > [10228.851091] BTRFS info (device sdq): csum failed ino 487 off 606208 csum 2566472073 private 1002916440
> >
> > The system is running a scrub now and I will return with some more
> > details later. I do not think systemd is logging to this volume, but the
> > scrub will probably show which files are affected.
> >
> > As this is a very serious issue for those hit by the
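For logs like the ones in this thread, a small filter can summarise how many "unable to fixup" messages each device produced. The sketch below is only an illustration: the function name count_unfixable is made up, and it assumes the device path is the last field of each matching line, as in the dmesg excerpts quoted here. Because the kernel rate-limits these messages ("callbacks suppressed"), the counts are a lower bound rather than the true number of errors.

```shell
#!/bin/sh
# Sketch: count "unable to fixup" scrub messages per device.
# Reads kernel log text on stdin and prints "<device> <count>" lines,
# sorted by device name.
count_unfixable() {
    awk '/unable to fixup/ { n[$NF]++ }
         END { for (d in n) print d, n[d] }' | sort
}
```

A possible use is `dmesg | count_unfixable`, which gives a quick view of which disks the scrub complained about.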
Re: Why cannot I move a read-only snapshot around?
On Oct 24, 2013, at 9:29 AM, Karl Kiniger <karl.kini...@med.ge.com> wrote:
> After sucessfully sending and receiving a dozen of related snapshots I want
> to move them all to the readonly folder but I cannot:
>
> ls -l .
> drwxr-xr-x. 1 root root 682 Oct 24 16:01 @20131001
> drwxr-xr-x. 1 root root 682 Oct 24 16:07 @20131004
> drwxr-xr-x. 1 root root 682 Oct 24 16:10 @20131008
> drwxr-xr-x. 1 root root 682 Oct 24 16:16 @20131010
> drwxr-xr-x. 1 root root 682 Oct 24 16:23 @20131014
> drwxr-xr-x. 1 root root 706 Oct 24 16:24 @20131018
> drwxr-xr-x. 1 root root 706 Oct 24 16:31 @20131021
> drwxr-xr-x. 1 root root 734 Oct 24 16:36 @20131023
> drwxr-xr-x. 1 root root 734 Oct 24 16:41 @20131024
> drwxr-xr-x. 1 root root 734 Oct 24 16:41 F19
> drwxr-xr-x. 1 root root   0 Oct 24 17:21 readonly
>
> mv \@20131024 readonly
> mv: cannot move ‘@20131024’ to ‘readonly/@20131024’: Read-only file system

Are the @ snapshots read-only snapshots? And is readonly just a regular
directory?

I don't know that this is a bug; it seems like it could be intentional,
because a read-only file system wouldn't let you move something out of one
tree into another. But there was a bug that prevented moving subvolumes
into subvolumes (untested whether moving subvolumes into folders worked)
that was fixed in kernel 3.11.6, so that might be worth a shot.

Chris Murphy
Re: swapfile on btrfs, temporary solution for wiki
Hello,

I suggest a temporary solution for using a swap file on btrfs. I tested it, and it works well. I came up with a simple way to create and use a swap file; see the following sh code:

    swapfile=$(losetup -f)     # find a free loop device
    truncate -s 8G /swap       # create an 8G sparse swap file
    losetup $swapfile /swap    # attach the file to the loop device
    mkswap $swapfile
    swapon $swapfile

I just added this to rc.local and it works well. Maybe add it to the btrfs wiki as a temporary solution for using a swap file?

Timofey

2013/10/21 Тимофей Титовец nefelim...@gmail.com:

Hello list,

I know that btrfs doesn't support swap files. I read the Arch wiki, and while reading about the systemd add-on for automatically creating a swap file on btrfs, I came up with a way to create and use a swap file; see the following sh code:

    swapfile=$(losetup -f)     # find a free loop device
    truncate -s 8G /swap       # create an 8G sparse swap file
    losetup $swapfile /swap    # attach the file to the loop device
    mkswap $swapfile
    swapon $swapfile

I just added this to rc.local and it just works. Maybe add it to the wiki as a temporary solution for using a swap file? (Sorry for my bad English.)
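
For anyone who wants to try the rc.local snippet above with some error handling, here is a rough sketch of the same five steps wrapped in a function. The names setup_loop_swap, run and DRY_RUN are made up for this illustration, the chmod step is an addition that is not in the original mail, and /dev/loopN is only a placeholder shown in dry-run mode. A real run needs root, and since the file is sparse, its space is not reserved in advance.

```shell
#!/bin/sh
# Sketch only: swap on a loop device backed by a file, as proposed above.
# setup_loop_swap, run and DRY_RUN are hypothetical names for this example.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: setup_loop_swap FILE SIZE   (for example: setup_loop_swap /swap 8G)
setup_loop_swap() {
    swap_file=$1
    swap_size=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        loop_dev=/dev/loopN                 # placeholder, nothing is allocated
    else
        loop_dev=$(losetup -f) || return 1  # first free loop device
    fi
    run truncate -s "$swap_size" "$swap_file" || return 1  # sparse backing file
    run chmod 600 "$swap_file" || return 1  # addition: keep the file private
    run losetup "$loop_dev" "$swap_file" || return 1       # attach file to loop
    run mkswap "$loop_dev" || return 1
    run swapon "$loop_dev"
}
```

Setting DRY_RUN=1 before calling setup_loop_swap /swap 8G prints the commands without touching the system, which is a cheap way to check them before putting the call into rc.local.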
Re: Why cannot I move a read-only snapshot around?
Hi,

On Thu 131024, Chris Murphy wrote:

On Oct 24, 2013, at 9:29 AM, Karl Kiniger karl.kini...@med.ge.com wrote:

Dear list, (newbie alert)

After successfully sending and receiving a dozen related snapshots, I want to move them all to the readonly folder, but I cannot:

ls -l .
..
drwxr-xr-x. 1 root root 734 Oct 24 16:36 @20131023
drwxr-xr-x. 1 root root 734 Oct 24 16:41 @20131024
drwxr-xr-x. 1 root root 734 Oct 24 16:41 F19
drwxr-xr-x. 1 root root 0 Oct 24 17:21 readonly

mv \@20131024 readonly
mv: cannot move ‘@20131024’ to ‘readonly/@20131024’: Read-only file system

Are the @ snapshots read-only snapshots? And is "readonly" just a regular directory?

Yes, they are read-only snapshots (just received by btrfs receive), and readonly is a regular directory. I deliberately did not try to move those snapshots into other snapshots.

I can move r/w snapshots around without problems (into some regular directory); only the r/o snapshots refuse to move.

cat /proc/version
Linux version 3.11.6-200.fc19.x86_64

Still curious,
Karl

I don't know that this is a bug; it seems like it could be intentional, because a read-only file system wouldn't let you move it out of one tree into another. But there was a bug that prevented moving subvolumes into subvolumes (untested whether moving subvolumes into folders worked) that was fixed in kernel 3.11.6, so that might be worth a shot.

Chris Murphy
Re: Why cannot I move a read-only snapshot around?
Hi (please see also my other reply in this thread),

On Thu 131024, Duncan wrote:

Karl Kiniger posted on Thu, 24 Oct 2013 17:29:56 +0200 as excerpted:

Dear list, (newbie alert) After successfully sending and receiving a dozen related snapshots, I want to move them all to the readonly folder, but I cannot:

I see you mention Fedora 19 in a followup, but for those not on Fedora, that's not much help figuring out which kernel you're running. It's likely that the following is your problem, tho there's not enough information in your post to be sure.

I promise to include more info in the future, but just-received snapshots should be read-only, if I read the docs correctly.

There was a recent regression with nested subvolumes that may be what you're running into. Kernel 3.11 was affected, as well as early 3.12-rcs and I believe 3.10 also, but I'm not sure how far back, except that someone mentioned trying an old kernel (3.8 or 3.6-ish) and moving subvolumes into subvolumes worked there. (Doing anything that involves writing into read-only snapshots shouldn't work, by design, but that doesn't appear to be what you're doing: you're just trying to move read-only snapshots to a different location on a read/write base or parent subvolume. This post assumes it's a parent subvolume, thus triggering the nested subvolumes bug.)

No nested subvolumes are involved. (Is this true? This is all inside the top-level volume, or whatever it is called in btrfs.)

A fix is available, but I'm not sure whether it got into 3.12 (which is just about to be released) or will now have to wait for 3.13. So either try the latest 3.12 git and see if it's there, or find and cherry-pick the patch, applying it against 3.11 or 3.12.

(Given that btrfs is still an experimental filesystem with fixes applied every kernel, while reverting to an old enough kernel should unregress this particular problem, I can't recommend it except possibly for testing against data you don't care about, since by doing so you're exposing yourself to other known and now fixed bugs.)

Agreed, I don't want to go back to older kernels; too risky. The data are backed up anyway (on ZFS, if you are curious), but the time invested in my current btrfs setup would be gone. I can live with the current situation; it's just not nice to have the snapshots lying around in a place where they do not belong.

If it were possible to temporarily make the r/o snapshots r/w just for the purpose of moving them (being aware that caution is needed), I would not hesitate to try that.

Karl

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: Why cannot I move a read-only snapshot around?
On Thu 131024, Chris Murphy wrote:

dr. 1 chris chris 0 Oct 24 16:15 donotmove
[chris@f20s ~]$ mv donotmove/ Videos/
mv: cannot move ‘donotmove/’ to ‘Videos/donotmove’: Permission denied'

I own that directory. But because it's read only, I can't move it, because moving it changes it. Of course if I become root, that overrides posix permissions, but the readonly status of a subvolume isn't like posix permissions and I see no reason why root should be able to modify it. And moving it does modify it.

I tried this all as root.

drwxr-xr-x. 1 root root 734 Oct 24 16:41 @20131024 (this is a r/o snap)

It looks to me similar to a read-only mounted filesystem:

pc2:/u2/F19/@20131024# touch foo
touch: cannot touch ‘foo’: Read-only file system

In what way would a r/o snapshot be modified by moving its mount point? No one is ever doing anything inside.

Karl

Chris Murphy
Re: Why cannot I move a read-only snapshot around?
On Oct 24, 2013, at 4:46 PM, Karl Kiniger karl.kini...@med.ge.com wrote:

On Thu 131024, Chris Murphy wrote:

dr. 1 chris chris 0 Oct 24 16:15 donotmove
[chris@f20s ~]$ mv donotmove/ Videos/
mv: cannot move ‘donotmove/’ to ‘Videos/donotmove’: Permission denied'

I own that directory. But because it's read only, I can't move it, because moving it changes it. Of course if I become root, that overrides posix permissions, but the readonly status of a subvolume isn't like posix permissions and I see no reason why root should be able to modify it. And moving it does modify it.

I tried this all as root.

drwxr-xr-x. 1 root root 734 Oct 24 16:41 @20131024 (this is a r/o snap)

It looks to me similar to a read-only mounted filesystem:

pc2:/u2/F19/@20131024# touch foo
touch: cannot touch ‘foo’: Read-only file system

In what way would a r/o snapshot be modified by moving its mount point? No one is ever doing anything inside.

For the same reason I can't move or rename a read-only directory even though I'm not doing anything inside.

Chris Murphy
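
The plain-directory half of this analogy can be tried in a scratch directory. The sketch below only covers ordinary POSIX permissions, where moving a directory to a different parent has to rewrite its ".." entry and so needs write permission on the directory itself. Whether a read-only btrfs subvolume should behave the same way is exactly what the thread is debating, and this does not settle it. The function name try_move_readonly_dir is made up for the example.

```shell
#!/bin/sh
# Sketch: try to move a directory that has no write permission into a
# different parent directory. try_move_readonly_dir is a hypothetical name.

try_move_readonly_dir() {
    work=$(mktemp -d) || return 1
    mkdir "$work/donotmove" "$work/Videos"
    chmod a-w "$work/donotmove"        # same situation as in the mail above
    if mv "$work/donotmove" "$work/Videos/" 2>/dev/null; then
        echo "moved"
    else
        echo "refused"
    fi
    # Restore write permission so the scratch tree can be removed again.
    chmod -R u+w "$work"
    rm -rf "$work"
}

try_move_readonly_dir
```

Run as an unprivileged user on Linux, this is expected to report that the move was refused; run as root, the permission check is bypassed and the move goes through, which matches the remark about root overriding posix permissions.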
Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)
Hello Jan,

On 10/24/2013 11:36 PM, Jan Schmidt wrote:

On Thu, October 24, 2013 at 16:49 (+0200), Wang Shilong wrote:

Hello Jan,

btrfs_dec_ref() queued a delayed ref for the owner of a tree block. The qgroup tracking is based on delayed refs. The owner of a tree block is set when the tree block is allocated; it is never updated.

When you allocate a tree block and then remove the subvolume that did the allocation, the qgroup accounting for that removal is correct. However, the removal was accounted again for each subvolume deletion that also referenced the tree block, because accounting was erroneously based on the owner.

Instead of queueing delayed refs for the non-existent owner, we now queue delayed refs for the root being removed. This fixes the qgroup accounting.

Thanks for tracking this. I applied your patch and, using the following test script, found that the problem still exists:

Reproduced. Gives more negative numbers due to accounting triggered by the cleaner thread, that's the common part here. I still believe that the fix I sent is correct, it's probably not complete. Looking into it.

I really did wait for the cleaner thread to finish its work, and I used btrfs-debug-tree to confirm that all the fs trees had been deleted. But with btrfs qgroup show, I still get negative numbers, and the root subvolume's exclusive value is also wrong. The statistics look like the following:

0/5     13090816  471040
0/257   13078528  0
0/259   13078528  0
0/260   13078528  0
0/261   13078528  0
. ...
0/350   13078528  0
0/351   13078528  0
0/352   13078528  0
0/353   13078528  0
0/354   13078528  0
0/355   13078528  0
0/356   13078528  0
0/357   13078528  0
0/358   12619776  -155648

Thanks,
Wang

Thanks,
-Jan

#!/bin/sh
for i in $(seq 1000)
do
        dd if=/dev/zero of=mnt/${i}aaa bs=10K count=1
done
btrfs sub snapshot mnt mnt/1
for i in $(seq 100)
do
        btrfs sub snapshot mnt/$i mnt/$(($i+1))
done
for i in $(seq 101)
do
        btrfs sub delete mnt/$i
done

Thanks,
Wang
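
When rerunning the reproducer, a small filter can pick out the suspicious qgroups instead of reading the whole listing by eye. This is only a sketch. It assumes the three-column layout shown above (qgroup id, referenced bytes, exclusive bytes) with plain byte counts, so a btrfs-progs version that prints headers or human-readable sizes may need its output adjusted first, for example with --raw. The function name find_negative_qgroups is made up for the example.

```shell
#!/bin/sh
# Sketch: print every qgroup line whose referenced or exclusive counter is
# negative. Assumes "qgroupid referenced exclusive" columns in plain bytes.
# find_negative_qgroups is a hypothetical helper name.

find_negative_qgroups() {
    awk '$1 ~ /^[0-9]+\/[0-9]+$/ && ($2 < 0 || $3 < 0) { print $1, $2, $3 }'
}

# Possible use after the cleaner thread has finished (needs a real mount):
#   btrfs qgroup show mnt | find_negative_qgroups
```

Fed the listing from this mail, the filter would print only the 0/358 line, since that is the only row with a negative counter.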