[PATCH] Btrfs-progs: introduce '-p' option and fullpath into subvolume set-default command
In command btrfs subvolume set-default, we used subvolume id and path to set the default subvolume of a filesystem. It's not easy for a common user, so I improved it and the fullpath of a subvolume can be used to set the default subvolume of a filesystem. Signed-off-by: Cheng Yang chenyang.f...@cn.fujitsu.com --- cmds-subvolume.c | 89 ++ man/btrfs.8.in |6 ++-- 2 files changed, 79 insertions(+), 16 deletions(-) diff --git a/cmds-subvolume.c b/cmds-subvolume.c index 8399e72..827234c 100644 --- a/cmds-subvolume.c +++ b/cmds-subvolume.c @@ -26,6 +26,7 @@ #include getopt.h #include kerncompat.h +#include ctree.h #include ioctl.h #include qgroup.h @@ -601,23 +602,66 @@ static int cmd_subvol_get_default(int argc, char **argv) } static const char * const cmd_subvol_set_default_usage[] = { - btrfs subvolume set-default subvolid path, + btrfs subvolume set-default [-p] [subvolid] path, Set the default subvolume of a filesystem, + -pSet the parent tree(subvolume) of the PATH, + as the default subvolume, if PATH is not a subvolume, NULL }; static int cmd_subvol_set_default(int argc, char **argv) { - int ret=0, fd, e; - u64 objectid; + int ret = 0, fd = -1, e; + int parent = 0; + u64 objectid = -1; char*path; - char*subvolid; + char*subvolid, *inv; - if (check_argc_exact(argc, 3)) + optind = 1; + while (1) { + int c = getopt(argc, argv, p); + if (c 0) + break; + + switch (c) { + case 'p': + parent = 1; + break; + default: + usage(cmd_subvol_set_default_usage); + } + } + + if (check_argc_min(argc - optind, 1) || + check_argc_max(argc - optind, 2)) usage(cmd_subvol_set_default_usage); - subvolid = argv[1]; - path = argv[2]; + if (argc - optind == 2) { + subvolid = argv[optind]; + path = argv[optind + 1]; + + objectid = (unsigned long long)strtoll(subvolid, inv, 0); + if (errno == ERANGE || subvolid == inv) { + fprintf(stderr, + ERROR: invalid tree id (%s)\n, subvolid); + return 30; + } + } else { + path = argv[optind]; + + ret = test_issubvolume(path); + if (ret 0) { + fprintf(stderr, + ERROR: error accessing '%s'\n, path); + return 12; + } + if (!ret !parent) { + fprintf(stderr, + ERROR: '%s' is not a subvolume\n, + path); + return 13; + } + } fd = open_file_or_dir(path); if (fd 0) { @@ -625,16 +669,35 @@ static int cmd_subvol_set_default(int argc, char **argv) return 12; } - objectid = (unsigned long long)strtoll(subvolid, NULL, 0); - if (errno == ERANGE) { - fprintf(stderr, ERROR: invalid tree id (%s)\n,subvolid); - return 30; + /* + When objectid is -1, it means that + subvolume id is not specified by user. + We will set default subvolume by path. + */ + if (objectid == -1) { + struct btrfs_ioctl_ino_lookup_args args; + + memset(args, 0, sizeof(args)); + args.treeid = 0; + args.objectid = BTRFS_FIRST_FREE_OBJECTID; + + ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, args); + if (ret) { + fprintf(stderr, + ERROR: can't perform the search - %s\n, + strerror(errno)); + return ret; + } + + objectid = args.treeid; } + ret = ioctl(fd, BTRFS_IOC_DEFAULT_SUBVOL, objectid); e = errno; close(fd); - if( ret 0 ){ - fprintf(stderr, ERROR: unable to set a new default subvolume - %s\n, + if (ret 0) { + fprintf(stderr, + ERROR: unable to set a new default subvolume - %s\n, strerror(e)); return 30; } diff --git a/man/btrfs.8.in b/man/btrfs.8.in index 3f7765d..2bc1d97 100644 --- a/man/btrfs.8.in +++ b/man/btrfs.8.in @@ -13,7 +13,7 @@ btrfs \- control a btrfs filesystem .PP \fBbtrfs\fP \fBsubvolume list\fP\fI [-pr] [-s 0|1] [-g [+|-]value] [-c [+|-]value] [--rootid=rootid,gen,ogen,path] path\fP .PP -\fBbtrfs\fP \fBsubvolume set-default\fP\fI id path\fP +\fBbtrfs\fP \fBsubvolume
Re: btrfs send/receive review by vfs folks
Hi, write_buf: Used to write the stream to a user space supplied pipe. Please note the ERESTARTSYS comment there, I need some help here as I don't know how to handle that correctly. If I ignore the return value, it loops forever. If I bail out to user space, it reenters the ioctl and starts from the beginning (which is really bad). I have two possible solutions in my mind. 1. Store some kind of state in the ioctl arguments so that we can continue where we stopped when the ioctl reenters. This would however complicate the code a lot. 2. Spawn a thread when the ioctl is called and leave the ioctl immediately. I don't know if ERESTARTSYS can happen in vfs_xxx calls if they happen from a non syscall thread. I am hitting the ERESTARTSYS issue also. To easiest way to repro this is to stop the user process in gdb. As Alexander mentioned, restarting the ioctl from the beginning is really bad, because some commands were already sent to the pipe, and possibly consumed by the user mode (dump_thread). Also the command, on which vfs_write() hit ERESTARTSYS, might not have been pushed fully to the pipe. So if the ioctl() restarts, it starts filling the pipe with duplicate commands, and at least one command in the pipe might be corrupted. So the receive part cannot process such stream successfully (usually it hits crc error). In addition to what Alexander suggested, I have a third suggestion, but I would like to know whether community believes this issue is worth to fix. Thanks! Alex. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs send/receive review by vfs folks
Hi Alex, On Mon, September 24, 2012 at 11:13 (+0200), Alex Lyakas wrote: Hi, write_buf: Used to write the stream to a user space supplied pipe. Please note the ERESTARTSYS comment there, I need some help here as I don't know how to handle that correctly. If I ignore the return value, it loops forever. If I bail out to user space, it reenters the ioctl and starts from the beginning (which is really bad). I have two possible solutions in my mind. 1. Store some kind of state in the ioctl arguments so that we can continue where we stopped when the ioctl reenters. This would however complicate the code a lot. 2. Spawn a thread when the ioctl is called and leave the ioctl immediately. I don't know if ERESTARTSYS can happen in vfs_xxx calls if they happen from a non syscall thread. I am hitting the ERESTARTSYS issue also. To easiest way to repro this is to stop the user process in gdb. As Alexander mentioned, restarting the ioctl from the beginning is really bad, because some commands were already sent to the pipe, and possibly consumed by the user mode (dump_thread). Also the command, on which vfs_write() hit ERESTARTSYS, might not have been pushed fully to the pipe. So if the ioctl() restarts, it starts filling the pipe with duplicate commands, and at least one command in the pipe might be corrupted. So the receive part cannot process such stream successfully (usually it hits crc error). In addition to what Alexander suggested, I have a third suggestion, but I would like to know whether community believes this issue is worth to fix. It's a must-fix in my opinion. As you mentioned, it's easy to hit. Second, code like this doesn't look like it should be in mainline at all: 391 /* TODO handle that correctly */ 392 /*if (ret == -ERESTARTSYS) { 393 continue; 394 }*/ I'm looking forward to your proposal, preferably in form of a patch :-) -Jan -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Userland commands repository in wiki
On Fri, Sep 21, 2012 at 06:23:36PM +0200, Alfredo Esteban wrote: Yesterday, I modified the URL of the userland commands repository in this wiki page: https://btrfs.wiki.kernel.org/index.php/Debugging_Btrfs_with_GDB Former URL doesn't work: git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git I think the correct one is: git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git Please, confirm that my modification is ok. It's preferred to keep all URIs at a single page, https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories Reading through the GDB tutorial page, it seems to come from the old ages when the progs were not usually packaged and one had to compile it from sources. The section Building Btrfs Userland Commands is IMHO not neccesary anymore. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs-progs: btrfs subvolume delete could delete subvolumes
On Mon, Sep 24, 2012 at 10:59:34AM +0800, Anand Jain wrote: Yes, this is useful, thanks. I'm thinking if it's ok to stop on first error, ie. when the subvolume does not exist or is a directory. I am fine with either ways. I shall just keep it as it is as of now. Yes, I'm ok with that. The usecase I had in mind was something like btrfs subvol del * but I should know the dir I'm in and what I'm doing there, errors are to be expected. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix return value check in btrfs_ioctl_send()
On Fri, Sep 21, 2012 at 12:42:02PM +0800, Wei Yongjun wrote: From: Wei Yongjun yongjun_...@trendmicro.com.cn In case of error, the function btrfs_read_fs_root_no_name() returns ERR_PTR() and never returns NULL pointer. The NULL test in the return value check should be replaced with IS_ERR(), and remove useless NULL test. Good catch. Also please update the other remaining case in disk-io.c:open_ctree() 2587 fs_info-fs_root = btrfs_read_fs_root_no_name(fs_info, location); 2588 if (!fs_info-fs_root) 2589 goto fail_qgroup; 2590 if (IS_ERR(fs_info-fs_root)) { 2591 err = PTR_ERR(fs_info-fs_root); 2592 goto fail_qgroup; 2593 } dpatch engine is used to auto generated this patch. (https://github.com/weiyj/dpatch) I've played with it, looks nice, although it's full of hardcoded paths. I'd like to run it directly from the git repo. Can you please fix that? No patch from me, because I hadcoded my paths :) It would be great if we can run a set of .cocci scripts that would verify code invariants (limited to cocci capabilities, but eg. the IS_ERR is a good example) and add new ones eventually over time. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash in read_extent_buffer+0xb7/0xfb
On Sun, Sep 23, 2012 at 09:16:34AM -0700, Marc MERLIN wrote: Oh my, now I'm trying again with a new drive, and a big cp from an existing array to a new one dies with: [32042.079411] [ cut here ] [32042.085799] kernel BUG at fs/btrfs/extent_io.c:1884! [32042.092528] invalid opcode: [#1] PREEMPT SMP [32042.099227] CPU 1 [32042.101095] Modules linked in:[32042.105950] raid456 async_raid6_recov async _pq raid6_pq async_xor xor async_memcpy async_tx ppdev lp tun autofs4 kl5kusb105 ftdi_sio keyspan nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc rc_ati_x10 s nd_timer i915 usbserial snd drm_kms_helper eeepc_wmi drm ati_remote asus_wmi rc_ core sparse_keymap I had a different crash while copying to a btrfs 5 disk array. Not sure if this is also fixed too, but pasting just in case. [207025.055956] btrfs: bdev /dev/mapper/crypt_sdo1 errs: wr 46779, rd 0, flush 7 6, corrupt 0, gen 0 So many write and flush errors? [207055.067267] btrfs bad mapping eb start 8653217792 len 4096, wanted 184467440 50581869634 4 4680 if (start + min_len eb-len) { 4681 printk(KERN_ERR btrfs bad mapping eb start %llu len %lu, 4682wanted %lu %lu\n, (unsigned long long)eb-start, 4683eb-len, start, min_len); 4684 WARN_ON(1); 4685 return -EINVAL; 4686 } 8653217792 = 0x203c5a000 eb-start 4096eb-len 184467440 = 0x00afebff0 start 50581869634 = 0xbc6ea1442 min_len bogus numbers, no pattern, not visible in the stacktrace. [207055.244330] Pid: 6456, comm: btrfs-transacti Tainted: GW 3.5.3-amd64-preempt-noide-20120903 #1 System manufacturer System Product Name/P8H67-M PRO [207055.261478] RIP: 0010:[811fc9ae] [811fc9ae] read_extent_buffer+0xb7/0xfb [207055.271621] RSP: 0018:880105ff3880 EFLAGS: 00010202 [207055.278516] RAX: 0bbe RBX: 8800405ba1f8 RCX: 8800405ba2c8 [207055.287257] RDX: 880105ff38ec RSI: 0086 RDI: 880105ff38ec [207055.295967] RBP: 880105ff38c0 R08: 007ffd4ebdc8 R09: 1600 [207055.304674] R10: 1000 R11: 6db6db6db6db6db7 R12: 0004 R11 contains the POISON_FREE pattern, though it's not clear who and where used it. It may come from some unhandled case in the write error recovery paths. The crash site is not any of the BUG_ON but some place that actually tries to access an unmapped memory, so from that point it slipped through sanity checks. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] Btrfs: tivial cleanup: add space between = and the rest code
On Mon, Sep 24, 2012 at 08:47:33AM +0800, Wang Sheng-Hui wrote: trivial code cleanup. - ret =btrfs_drop_snapshot(root, NULL, 1, 0); + ret = btrfs_drop_snapshot(root, NULL, 1, 0); Sorry but this is too trivial. Unless it really bugs you when you're going through code, I don't think that cleanups at this level are necessary. Reading through commit history of some code via 'git blame' and seeing such cleanups is not welcome. I have a patchet in testing that updates a few things around snapshot cleaning and this line will get fixed, so it'll not stay forever. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix return value check in btrfs_ioctl_send()
On 09/24/2012 08:42 PM, David Sterba wrote: On Fri, Sep 21, 2012 at 12:42:02PM +0800, Wei Yongjun wrote: From: Wei Yongjun yongjun_...@trendmicro.com.cn In case of error, the function btrfs_read_fs_root_no_name() returns ERR_PTR() and never returns NULL pointer. The NULL test in the return value check should be replaced with IS_ERR(), and remove useless NULL test. Good catch. Also please update the other remaining case in disk-io.c:open_ctree() 2587 fs_info-fs_root = btrfs_read_fs_root_no_name(fs_info, location); 2588 if (!fs_info-fs_root) 2589 goto fail_qgroup; 2590 if (IS_ERR(fs_info-fs_root)) { 2591 err = PTR_ERR(fs_info-fs_root); 2592 goto fail_qgroup; 2593 } will do in the patch v2. dpatch engine is used to auto generated this patch. (https://github.com/weiyj/dpatch) I've played with it, looks nice, although it's full of hardcoded paths. Yes, it is current hardcoded path, because I used the linux.git and linux-next.git, and dpatch will update the git tree every day. After login, you can change the git url in the web. I'd like to run it directly from the git repo. Can you please fix that? No patch from me, because I hadcoded my paths :) Do you mean does not need auto update git repo, only scan the change under your git repo? It would be great if we can run a set of .cocci scripts that would verify code invariants (limited to cocci capabilities, but eg. the IS_ERR is a good example) and add new ones eventually over time. By default, It does not include cocci scripts, only include dup include, useless 'linux/version.h' check engine, and is disabled. I am now testing some cocci scripts, and will add them to the github, so, everyone can import those cocci scripts to dpatch, also they can add the cocci scripts create by themself. ^_^ Thanks very much for you advise. Regards, Yongjun Wei -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Btrfs: fix return value check
From: Wei Yongjun yongjun_...@trendmicro.com.cn In case of error, the function btrfs_read_fs_root_no_name() returns ERR_PTR() and never returns NULL pointer. The NULL test in the return value check should be replaced with IS_ERR(), and remove useless NULL test. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn --- v1 - v2: remove null test from fs/btrfs/disk-io.c --- fs/btrfs/disk-io.c | 2 -- fs/btrfs/send.c| 8 ++-- 2 files changed, 2 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 22e98e0..1405620 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2593,8 +2593,6 @@ retry_root_backup: location.offset = (u64)-1; fs_info-fs_root = btrfs_read_fs_root_no_name(fs_info, location); - if (!fs_info-fs_root) - goto fail_qgroup; if (IS_ERR(fs_info-fs_root)) { err = PTR_ERR(fs_info-fs_root); goto fail_qgroup; diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index fb5ffe9..a617451 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -4501,10 +4501,6 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_) key.type = BTRFS_ROOT_ITEM_KEY; key.offset = (u64)-1; clone_root = btrfs_read_fs_root_no_name(fs_info, key); - if (!clone_root) { - ret = -EINVAL; - goto out; - } if (IS_ERR(clone_root)) { ret = PTR_ERR(clone_root); goto out; @@ -4520,8 +4516,8 @@ long btrfs_ioctl_send(struct file *mnt_file, void __user *arg_) key.type = BTRFS_ROOT_ITEM_KEY; key.offset = (u64)-1; sctx-parent_root = btrfs_read_fs_root_no_name(fs_info, key); - if (!sctx-parent_root) { - ret = -EINVAL; + if (IS_ERR(sctx-parent_root)) { + ret = PTR_ERR(sctx-parent_root); goto out; } } -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Identifying reflink / CoW files
On Sun, Sep 23, 2012 at 09:56:34AM +1200, Jp Wise wrote: Afaik without playing with it myself fiemap can give you information about the mappings of each file. If the mappings of 2 files match, the data is shared. OK, have just done some searching on fiemap and located a code example using it to pull the file extent data. http://smackerelofopinion.blogspot.com/2010/01/using-fiemap-ioctl-to-get-file-extents.html Will have a play around to see if i might be able to hack it up to compare two files, or just parse it's output between two files to identify matches. Thank you for the pointer. :) Likewise if anyone else knows of an existing utility to do a non-bytewise compare between two files, and just check if they share the same datablocks please let me know. :) The FIEMAP is a way with stable and defined interface to show file extents, there's the filefrag utility (from e2fsprogs). It does not have a parser-friendly output, so you may want to call the ioctl directly. The key information is in the (struct fiemap_extent)-fe_physical field. If physical block ranges from two files overlap, they're shared. There's another way how to get the extent info, via btrfs' SEARCH_TREE ioctl, but it's more low-level and needs basic knowledge about the internal b-tree items and how they're linked together. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebuilding chunk root?
On Mon, Sep 24, 2012 at 04:28:08PM +0300, Sami Haahtinen wrote: Due to certain unfortunate chain of events, I managed to overwrite a small portion of my btrfs array which had only single redundancy for metadata. The data itself is present and only a small portion (2.5%) of the array was overwritten. After quite a bit of debugging and tinkering, I realized that my chunk root was in the portion that was overwritten. After reading through the documentation I was able to pull together it's still unclear to me whether chunk root is something that can be rebuilt. Chris had some experimental code for doing it in btrfsck which never saw the light of day (because it was too unreliable). He may be able to offer you something to help, though. A transcript of btrfsck trying to recover with superblock 2 which is uncorrupted by itself: root@sysresccd /root/btrfs-progs % ./btrfsck --super 2 /dev/patience/home using SB copy 2, bytenr 274877906944 Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 read block failed check_tree_block Couldn't read chunk root If I'm interpreting the output correctly, it's trying to read bytes from address 139264, which would fall into the corrupted area. No, I believe the want=, have= text is referring to a generation ID, not a block number. That's not to say that your chunk tree isn't damaged, though -- I'm just clarifying your interpretation of the numbers. Out of interest, does mounting with -o recovery help at all? (I'm not expecting it to do much if your chunk tree's gone, but it might do something). Hugo. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Eighth Army Push Bottles Up Germans -- WWII newspaper --- headline (possibly apocryphal) signature.asc Description: Digital signature
Re: [PATCH] Btrfs: fix return value check in btrfs_ioctl_send()
On Mon, Sep 24, 2012 at 09:40:00PM +0800, Wei Yongjun wrote: dpatch engine is used to auto generated this patch. (https://github.com/weiyj/dpatch) I've played with it, looks nice, although it's full of hardcoded paths. Yes, it is current hardcoded path, because I used the linux.git and linux-next.git, and dpatch will update the git tree every day. After login, you can change the git url in the web. Sure, I was able to point the git source to my local linux.git. I'd like to run it directly from the git repo. Can you please fix that? No patch from me, because I hadcoded my paths :) Do you mean does not need auto update git repo, only scan the change under your git repo? I'm talking about hardcoded paths for the scripts in bin/ like dpatch/views/engine.py: args = '/usr/dpatch/bin/importcocci.sh %s' % fname or dpatch/models.py: return '/var/lib/dpatch/repo/' + dname Not to say that '/usr/' is not the right place for adding new per-package subdirs, refer to http://www.pathname.com/fhs/pub/fhs-2.3.html#THEUSRHIERARCHY . (Also the database has hardcoded path into /var/lib/dpatch) It would be great if we can run a set of .cocci scripts that would verify code invariants (limited to cocci capabilities, but eg. the IS_ERR is a good example) and add new ones eventually over time. By default, It does not include cocci scripts, only include dup include, useless 'linux/version.h' check engine, and is disabled. No problem that the .cocci scripts are not there, I'm fine with adding them manually now, just to test the tool, but all the hardcoded paths didn't let me do that :) david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Btrfs: fix return value check
On Mon, Sep 24, 2012 at 09:42:26PM +0800, Wei Yongjun wrote: From: Wei Yongjun yongjun_...@trendmicro.com.cn In case of error, the function btrfs_read_fs_root_no_name() returns ERR_PTR() and never returns NULL pointer. The NULL test in the return value check should be replaced with IS_ERR(), and remove useless NULL test. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn --- v1 - v2: remove null test from fs/btrfs/disk-io.c --- Reviewed-by: David Sterba dste...@suse.cz -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs-progs: btrfs subvolume delete could delete subvolumes
On Mon, Sep 24, 2012 at 6:02 AM, David Sterba d...@jikos.cz wrote: On Mon, Sep 24, 2012 at 10:59:34AM +0800, Anand Jain wrote: Yes, this is useful, thanks. I'm thinking if it's ok to stop on first error, ie. when the subvolume does not exist or is a directory. I am fine with either ways. I shall just keep it as it is as of now. Yes, I'm ok with that. The usecase I had in mind was something like btrfs subvol del * but I should know the dir I'm in and what I'm doing there, errors are to be expected. For what it's worth, rmdir's behaviour is to continue after errors (i.e., mkdir 1; mkdir 3; rmdir 1 2 3 deletes 1 and 3, and exits with a non-zero exit code); unless there's a good reason to do otherwise, matching that behaviour is probably best. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash in read_extent_buffer+0xb7/0xfb
On Mon, Sep 24, 2012 at 03:08:47PM +0200, David Sterba wrote: I had a different crash while copying to a btrfs 5 disk array. Not sure if this is also fixed too, but pasting just in case. [207025.055956] btrfs: bdev /dev/mapper/crypt_sdo1 errs: wr 46779, rd 0, flush 7 6, corrupt 0, gen 0 So many write and flush errors? It's possible, I have crappy drives that were cheap that I'm using for tests and copies. R11 contains the POISON_FREE pattern, though it's not clear who and where used it. It may come from some unhandled case in the write error recovery paths. Considering that I was doing a huge copy to a brtfs filesystem (source was ext4) and that I was using crappy drives in a 5 drives configuration with no redundancy since there is no raid5 yet, it's very possible. The crash site is not any of the BUG_ON but some place that actually tries to access an unmapped memory, so from that point it slipped through sanity checks. If that helps, I forgot to decode the ASM: 0: b7 6d mov$0x6d,%bh 2: db b6 6d db b6 6d (bad) 0x6db6db6d(%rsi) 8: 49 bd 00 00 00 00 00movabs $0x8800,%r13 f: 88 ff ff 12: 49 c1 e0 03 shl$0x3,%r8 16: eb 43 jmp0x5b 18: 48 8b 8b 50 01 00 00mov0x150(%rbx),%rcx 1f: 4c 89 d0mov%r10,%rax 22: 48 89 d7mov%rdx,%rdi 25: 4c 29 f8sub%r15,%rax 28: 4c 39 e0cmp%r12,%rax 2b:* 4a 8b 0c 01 mov(%rcx,%r8,1),%rcx -- trapping instruction 2f: 49 0f 47 c4 cmova %r12,%rax 33: 49 83 c0 08 add$0x8,%r8 37: 49 29 c4sub%rax,%r12 3a: 4c 01 c9add%r9,%rcx 3d: 48 rex.W 3e: c1 .byte 0xc1 3f: f9 stc Code starting with the faulting instruction === 0: 4a 8b 0c 01 mov(%rcx,%r8,1),%rcx 4: 49 0f 47 c4 cmova %r12,%rax 8: 49 83 c0 08 add$0x8,%r8 c: 49 29 c4sub%rax,%r12 f: 4c 01 c9add%r9,%rcx 12: 48 rex.W 13: c1 .byte 0xc1 14: f9 stc For [207055.244330] Pid: 6456, comm: btrfs-transacti Tainted: GW 3.5.3-amd64-preempt-noide-20120903 #1 System manufacturer System Product Name/P8H67-M PRO [207055.261478] RIP: 0010:[811fc9ae] [811fc9ae] read_extent_buffer+0xb7/0xfb [207055.271621] RSP: 0018:880105ff3880 EFLAGS: 00010202 [207055.278516] RAX: 0bbe RBX: 8800405ba1f8 RCX: 8800405ba2c8 [207055.287257] RDX: 880105ff38ec RSI: 0086 RDI: 880105ff38ec [207055.295967] RBP: 880105ff38c0 R08: 007ffd4ebdc8 R09: 1600 [207055.304674] R10: 1000 R11: 6db6db6db6db6db7 R12: 0004 [207055.313356] R13: 8800 R14: fffa9d7b9446 R15: 044 2 [207055.322032] FS: () GS:88011f38() knlGS: [207055.331692] CS: 0010 DS: ES: CR0: 8005003b [207055.339014] CR2: f7021000 CR3: 01a0c000 CR4: 000407e0 [207055.347715] DR0: DR1: DR2: [207055.356403] DR3: DR6: 0ff0 DR7: 0400 [207055.365092] Process btrfs-transacti (pid: 6456, threadinfo 880105ff2000,task 880105e7e600) [207055.376219] Stack: [207055.380369] fffa9d7b9442 000fffa9d7b9 880105ff38a0 [207055.389447] 8800405ba1f8 fffa9d7b9431 fffa9d7b9442 798be017 [207055.398481] 880105ff3910 811f2855 8800405ba1f8 fffa9d7b9000 [207055.407543] Call Trace: [207055.411582] [811f2855] btrfs_token_item_offset+0x86/0xb8 [207055.419436] [811f295f] btrfs_item_offset+0xb/0xd [207055.426585] [811c04bf] btrfs_item_offset_nr+0x14/0x16 [207055.434143] [811c08f9] leaf_space_used+0x58/0x81 [207055.441269] [811c42ea] btrfs_leaf_free_space+0x33/0x72 [207055.448924] [811c4d45] push_leaf_right+0xa1/0x142 [207055.456092] [814aa936] ? _raw_spin_lock+0x1b/0x1f [207055.463329] [811c4f13] split_leaf+0x79/0x52f [207055.470222] [811f295f] ? btrfs_item_offset+0xb/0xd [207055.477483] [811c08f9] ? leaf_space_used+0x58/0x81 [207055.484744] [814aac0e] ? _raw_write_unlock+0x28/0x33 [207055.492203] [8120a523] ? btrfs_set_lock_blocking_rw+0x9b/0xec [207055.500770] [811c5b5c] btrfs_search_slot+0x583/0x62e [207055.508199] [811c6e32] btrfs_insert_empty_items+0x62/0xb4 [207055.516029] [811cef40] run_clustered_refs+0x3e2/0x741 [207055.523655] [811cf503]
Re: Rebuilding chunk root?
On Mon, Sep 24, 2012 at 03:02:39PM +0100, Hugo Mills wrote: root@sysresccd /root/btrfs-progs % ./btrfsck --super 2 /dev/patience/home using SB copy 2, bytenr 274877906944 Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 Check tree block failed, want=139264, have=0 read block failed check_tree_block Couldn't read chunk root If I'm interpreting the output correctly, it's trying to read bytes from address 139264, which would fall into the corrupted area. No, I believe the want=, have= text is referring to a generation ID, not a block number. That's not to say that your chunk tree isn't damaged, though -- I'm just clarifying your interpretation of the numbers. 40 static int check_tree_block(struct btrfs_root *root, struct extent_buffer *buf) 41 { 42 43 struct btrfs_fs_devices *fs_devices; 44 int ret = 1; 45 46 if (buf-start != btrfs_header_bytenr(buf)) { 47 printk(Check tree block failed, want=%Lu, have=%Lu\n, 48buf-start, btrfs_header_bytenr(buf)); 49 return ret; 50 } No, it's the block address in bytes, 4k-block number 34. Out of interest, does mounting with -o recovery help at all? (I'm not expecting it to do much if your chunk tree's gone, but it might do something). The -o recovery has access to the respective tree roots, but the contents may be destroyed already. The chunk tree is not deep, I can see height 1 on a 6 disk array (though lightly used, 1 node, 8 leaves) and 3 disk array (1/7 TB used, 1 node, 29 leaves). So it's quite a small amount of data to destroy the chunktree completely, COW will lower the chances a bit. Rebuilding from scratch does not look simple, the available information is stored in BLOCK_GROUP_ITEMs or INODE_ITEMs and covers portions of the chunks. Given that the device tree would be probably damaged as well, the amount of information to do cross-check is not high. Maybe replaying the chunk creation logic can save some guesswork. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crash in read_extent_buffer+0xb7/0xfb
On Mon, Sep 24, 2012 at 07:41:03AM -0700, Marc MERLIN wrote: It's possible, I have crappy drives that were cheap that I'm using for tests and copies. Yeah, that makes a good use of crappy disks :) Considering that I was doing a huge copy to a brtfs filesystem (source was ext4) and that I was using crappy drives in a 5 drives configuration with no redundancy since there is no raid5 yet, it's very possible. Well, in your case raid1 might not be enough to protect the data. 0: b7 6d mov$0x6d,%bh 2: db b6 6d db b6 6d (bad) 0x6db6db6d(%rsi) 8: 49 bd 00 00 00 00 00movabs $0x8800,%r13 f: 88 ff ff 12: 49 c1 e0 03 shl$0x3,%r8 16: eb 43 jmp0x5b 18: 48 8b 8b 50 01 00 00mov0x150(%rbx),%rcx 1f: 4c 89 d0mov%r10,%rax 22: 48 89 d7mov%rdx,%rdi 25: 4c 29 f8sub%r15,%rax 28: 4c 39 e0cmp%r12,%rax 2b:* 4a 8b 0c 01 mov(%rcx,%r8,1),%rcx -- trapping instruction 8800405ba2c8 + 007ffd4ebdc8 = 1007f88003daa6090 and overflows 64bit I'm afraid this does not tell much of the story. The last function that is not a struct helper was leaf_space_used(), via push_leaf_right, split_leaf() from btrfs_search_slot -- all sanity chcecks I see are past any of those calls, so it's probably corrupted on-disk. The call stack is unfortunatelly deep and going backwards in assembly to track where R11 could get set is tedious. Did you see any other messages in the log? If you could recreate the filesystem and workload, doing a fsck occasionally may narrow down the surface for analysis. Otherwise I'm out of ideas now. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: enquiry about autodefrag option
On Fri, Sep 21, 2012 at 05:03:41AM +0800, ching wrote: AFAIK, the autodefrag will read related extents and mark them dirty, io niceness should be applicable to the read operation I was hesitant to believe that, but tried it and it's right: $ dd if=/dev/zero bs=4k count=1 seek=15 of=file $ dd if=/dev/zero bs=4k count=1 seek=13 of=file conv=notrunc $ dd if=/dev/zero bs=4k count=1 seek=14 of=file conv=notrunc Filesystem type is: 9123683e File size of file is 65536 (16 blocks, blocksize 4096) ext logical physical expected length flags 0 13 1027 1 1 14 1050 1027 1 2 15 1025 1050 1 eof file: 4 extents found remount with autodefrag: $ dd if=/dev/zero bs=4k count=1 seek=16 of=file conv=notrunc Filesystem type is: 9123683e File size of file is 69632 (17 blocks, blocksize 4096) ext logical physical expected length flags 0 13 1033 4 eof file: 2 extents found david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs: check range early in map_private_extent_buffer
On Mon, Sep 24, 2012 at 12:38:07PM +0800, Wang Sheng-Hui wrote: Check range early to avoid further check/compute in case of range error. Signed-off-by: Wang Sheng-Hui shh...@gmail.com --- fs/btrfs/extent_io.c | 16 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4c87847..9250cf5 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4643,6 +4643,14 @@ int map_private_extent_buffer(struct extent_buffer *eb, unsigned long start, unsigned long end_i = (start_offset + start + min_len - 1) PAGE_CACHE_SHIFT; + if (start + min_len eb-len) { + printk(KERN_ERR btrfs bad mapping eb start %llu len %lu, +wanted %lu %lu\n, (unsigned long long)eb-start, +eb-len, start, min_len); + WARN_ON(1); + return -EINVAL; + } + if (i != end_i) return -EINVAL; 4665 unsigned long i = (start_offset + start) PAGE_CACHE_SHIFT; 4666 unsigned long end_i = (start_offset + start + min_len - 1) 4667 PAGE_CACHE_SHIFT; so the check above effectively verifies that min_len - 1 PAGE_CACHE_SIZE AND is within the same page The other check if (start + min_len eb-len) { looks if the requested data do not lie out of the bounds of the extent buffer, where min_len is filled with sizeof(something). So, both the checks look for corrupted metadata, I don't see the need to swap them. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 1/2] Btrfs: cleanup duplicated division functions
On Sun, Sep 23, 2012 at 05:54:18PM +0800, Miao Xie wrote: @@ -2347,7 +2335,8 @@ static int chunk_usage_filter(struct btrfs_fs_info *fs_info, u64 chunk_offset, cache = btrfs_lookup_block_group(fs_info, chunk_offset); chunk_used = btrfs_block_group_used(cache-item); - user_thresh = div_factor_fine(cache-key.offset, bargs-usage); + BUG_ON(bargs-usage 0 || bargs-usage 100); otherwise it reliably crashes here Sorry, I don't know why it will crash here if we input 0. I tried to input 0, and it worked well. My oversight, sorry. I think the only case we must take into account is the users might input the wrong value (100 or 0) on the old kernel, and it can be stored into the filesystem. If we mount this filesystem on the new kernel, some problems may happen. So better avoid a BUG_ON. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Rebuilding chunk root?
On Mon, Sep 24, 2012 at 6:12 PM, David Sterba d...@jikos.cz wrote: On Mon, Sep 24, 2012 at 03:02:39PM +0100, Hugo Mills wrote: Out of interest, does mounting with -o recovery help at all? (I'm not expecting it to do much if your chunk tree's gone, but it might do something). The -o recovery has access to the respective tree roots, but the contents may be destroyed already. The chunk tree is not deep, I can see height 1 on a 6 disk array (though lightly used, 1 node, 8 leaves) and 3 disk array (1/7 TB used, 1 node, 29 leaves). So it's quite a small amount of data to destroy the chunktree completely, COW will lower the chances a bit. Yeah, the whole tree is gone, I'm pretty sure of it since the first 20-50GB has been wiped from the drive and the mentioned address is in the beginning of that part. I just wonder if there is any chance of the older versions of the chunk tree still being somewhere and how to find them. I doubt it's an easy feat though. Rebuilding from scratch does not look simple, the available information is stored in BLOCK_GROUP_ITEMs or INODE_ITEMs and covers portions of the chunks. Given that the device tree would be probably damaged as well, the amount of information to do cross-check is not high. Maybe replaying the chunk creation logic can save some guesswork. Replaying chunk creation logic would not help that much, since the drive has been resized a few times and had other operations that have modified the chunk tree as well. The array itself is not that complex (2 drives), but it's still not as simple as a single drive array. Regards, -- Sami Haahtinen Bad Wolf Oy +358443302775 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 1/2] Btrfs: cleanup duplicated division functions
On Mon, Sep 24, 2012 at 10:05:02AM +0800, Miao Xie wrote: Generally, we should check the value when it is input. If not, we might run our program with the wrong value, and it is possible to cause unknown problems. So I think the above chunk is necessary. The difference is that giving an invalid value will exit early (your version), while Ilya's version will clamp the 'usage' to the allowed range during processing. From usability POV, I'd prefer to let the command fail early with a verbose error eg. that the range is invalid. (diff is on top of the patch in question) This is the most straightforward transformation I can think of. It doesn't result in an unnecessary BUG_ON, keeps churn to a minimum and agree with you. doesn't change the style of the balance ioctl. (If I were to check every filter argument that way, btrfs_balance_ioctl() would be very long and complicated.) I think the check in btrfs_balance_ioctl() is necessary, the reason is above. btrfs_balance_ioctl does not seem as the right place, it does the processing related to the state of balance (resume/cancel etc). Look at btrfs_balance() itself, it does lot more sanity checks of the parameters. We can put the extra checks into helpers (and not only this one) if clarity and readability of the function becomes a concern. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: ENOSPC design issues
On Thu, Sep 20, 2012 at 2:03 PM, Josef Bacik jba...@fusionio.com wrote: Hello, I'm going to look at fixing some of the performance issues that crop up because of our reservation system. Before I go and do a whole lot of work I want some feedback. When I was trying to figure out the problem with gzip ENOSPC issues, I spent some time debugging and following the flow through the reserve_metadata_bytes() function in extent-tree.c. My observation was that the accounting around space_info-bytes_may_use did not appear to be tightly closed. The space_info-bytes_may_use value would grow large (often 3 or 4 times greater than space_info-total), and the flow through reserve_metadata_bytes() would stay in overcommit. I was unsuccessfull in figuring out how to rework or close the loop on the accounting for space_info-bytes_may_use. I noticed that btrfs seemed to work OK even though the value in space_info-bytes_may_use appeared inexplicably large, and btrfs was always in overcommit. So, since you're asking for possibly 'crazy ideas', I suggest considering finding a way to ignore space_info-bytes_may_use in reserve_metadata_bytes(). Either make the overcommit the default (which I found to approximate my real-life case anyhow), or have a simple mechanism for quick fail-over to overcommit. I doubt this will be any kind of comprehensive fix for ENOSPC issues, but simplifying reserve_metadata_bytes() may make it easier to find the other issues. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: run delayed refs first when out of space
Running delayed refs is faster than running delalloc, so lets do that first to try and reclaim space. This makes my fs_mark test about 20% faster. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/extent-tree.c | 20 ++-- 1 files changed, 10 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ed5ea43..efb044e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3800,10 +3800,10 @@ commit: } enum flush_state { - FLUSH_DELALLOC = 1, - FLUSH_DELALLOC_WAIT = 2, - FLUSH_DELAYED_ITEMS_NR = 3, - FLUSH_DELAYED_ITEMS = 4, + FLUSH_DELAYED_ITEMS_NR = 1, + FLUSH_DELAYED_ITEMS = 2, + FLUSH_DELALLOC = 3, + FLUSH_DELALLOC_WAIT = 4, ALLOC_CHUNK = 5, COMMIT_TRANS= 6, }; @@ -3817,11 +3817,6 @@ static int flush_space(struct btrfs_root *root, int ret = 0; switch (state) { - case FLUSH_DELALLOC: - case FLUSH_DELALLOC_WAIT: - shrink_delalloc(root, num_bytes, orig_bytes, - state == FLUSH_DELALLOC_WAIT); - break; case FLUSH_DELAYED_ITEMS_NR: case FLUSH_DELAYED_ITEMS: if (state == FLUSH_DELAYED_ITEMS_NR) { @@ -3842,6 +3837,11 @@ static int flush_space(struct btrfs_root *root, ret = btrfs_run_delayed_items_nr(trans, root, nr); btrfs_end_transaction(trans, root); break; + case FLUSH_DELALLOC: + case FLUSH_DELALLOC_WAIT: + shrink_delalloc(root, num_bytes, orig_bytes, + state == FLUSH_DELALLOC_WAIT); + break; case ALLOC_CHUNK: trans = btrfs_join_transaction(root); if (IS_ERR(trans)) { @@ -3886,7 +3886,7 @@ static int reserve_metadata_bytes(struct btrfs_root *root, struct btrfs_space_info *space_info = block_rsv-space_info; u64 used; u64 num_bytes = orig_bytes; - int flush_state = FLUSH_DELALLOC; + int flush_state = FLUSH_DELAYED_ITEMS_NR; int ret = 0; bool flushing = false; bool committed = false; -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: do not async metadata csums if we have hardware crc32c
The reason we offload csumming is because it is CPU intensive, except it is not on modern intel CPUs. So check to see if we support hardware crc32c, and if we do just do the csumming in our current threads context. Otherwise we can farm it off. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/disk-io.c | 17 + 1 files changed, 17 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index dcaf556..830b9af 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -31,6 +31,7 @@ #include linux/migrate.h #include linux/ratelimit.h #include asm/unaligned.h +#include asm/cpufeature.h #include compat.h #include ctree.h #include disk-io.h @@ -880,6 +881,22 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio, } /* +* Pretty sure I'm going to hell for this. If our CPU can do crc32cs in +* the hardware then there is no reason to do the csum stuff +* asynchronously, it will be faster to do it inline, so test to see if +* our CPU can do hardware crc32c and if it can just do the csum in our +* threads context. +*/ +#ifdef CONFIG_X86 + if (cpu_has_xmm4_2) { + printk(KERN_ERR doing it the fast way\n); + ret = btree_csum_one_bio(bio); + if (ret) + return ret; + return btrfs_map_bio(BTRFS_I(inode)-root, rw, bio, mirror_num, 0); + } +#endif + /* * kthread helpers are used to submit writes so that checksumming * can happen in parallel across all CPUs */ -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: do not async metadata csums if we have hardware crc32c
On 09/24/12 20:11, Josef Bacik wrote: The reason we offload csumming is because it is CPU intensive, except it is not on modern intel CPUs. So check to see if we support hardware crc32c, and if we do just do the csumming in our current threads context. Otherwise we can farm it off. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/disk-io.c | 17 + 1 files changed, 17 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index dcaf556..830b9af 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -31,6 +31,7 @@ #include linux/migrate.h #include linux/ratelimit.h #include asm/unaligned.h +#include asm/cpufeature.h #include compat.h #include ctree.h #include disk-io.h @@ -880,6 +881,22 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio, } /* + * Pretty sure I'm going to hell for this. If our CPU can do crc32cs in + * the hardware then there is no reason to do the csum stuff + * asynchronously, it will be faster to do it inline, so test to see if + * our CPU can do hardware crc32c and if it can just do the csum in our + * threads context. + */ +#ifdef CONFIG_X86 + if (cpu_has_xmm4_2) { + printk(KERN_ERR doing it the fast way\n); You'll probably go to hell for the printk... + ret = btree_csum_one_bio(bio); + if (ret) + return ret; + return btrfs_map_bio(BTRFS_I(inode)-root, rw, bio, mirror_num, 0); + } +#endif + /* * kthread helpers are used to submit writes so that checksumming * can happen in parallel across all CPUs */ -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: do not async metadata csums if we have hardware crc32c
On Mon, Sep 24, 2012 at 12:19:20PM -0600, Arne Jansen wrote: On 09/24/12 20:11, Josef Bacik wrote: The reason we offload csumming is because it is CPU intensive, except it is not on modern intel CPUs. So check to see if we support hardware crc32c, and if we do just do the csumming in our current threads context. Otherwise we can farm it off. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/disk-io.c | 17 + 1 files changed, 17 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index dcaf556..830b9af 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -31,6 +31,7 @@ #include linux/migrate.h #include linux/ratelimit.h #include asm/unaligned.h +#include asm/cpufeature.h #include compat.h #include ctree.h #include disk-io.h @@ -880,6 +881,22 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio, } /* +* Pretty sure I'm going to hell for this. If our CPU can do crc32cs in +* the hardware then there is no reason to do the csum stuff +* asynchronously, it will be faster to do it inline, so test to see if +* our CPU can do hardware crc32c and if it can just do the csum in our +* threads context. +*/ +#ifdef CONFIG_X86 + if (cpu_has_xmm4_2) { + printk(KERN_ERR doing it the fast way\n); You'll probably go to hell for the printk... Hahah oops, at least I remembered to take out the other printk, it had much more colorful language ;). Thanks, Josef -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 1/2] Btrfs: cleanup duplicated division functions
On Mon, Sep 24, 2012 at 06:47:42PM +0200, David Sterba wrote: On Mon, Sep 24, 2012 at 10:05:02AM +0800, Miao Xie wrote: Generally, we should check the value when it is input. If not, we might run our program with the wrong value, and it is possible to cause unknown problems. So I think the above chunk is necessary. The difference is that giving an invalid value will exit early (your version), while Ilya's version will clamp the 'usage' to the allowed range during processing. From usability POV, I'd prefer to let the command fail early with a verbose error eg. that the range is invalid. I think that's the job for progs, and that's where this and a few other checks are currently implemented. There is no possibility for unknown problems, that's exactly how it's been implemented before the cleanup. The purpose of balance filters (and the usage filter in particular) is to lower the number of chunks we have to relocate. If someone decides to use raw ioctls, and supplies us with the invalid value, we just cut it down to a 100 and proceed. usage=100 is the equivalent of not using the usage filter at all, so the raw ioctl user will get what he deserves. This is very similar to what we currently do with other filters. For example, soft can only be used with convert, and this is checked by progs. But, if somebody were to set a soft flag without setting convert we would simply ignore that soft. Same goes for drange and devid, invalid ranges, invalid devids, etc. Pulling all these checks into the kernel is questionable at best, and, if enough people insist on it, should be discussed separately. Thanks, Ilya -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: do not async metadata csums if we have hardware crc32c
On Mon, Sep 24, 2012 at 12:19:20PM -0600, Arne Jansen wrote: On 09/24/12 20:11, Josef Bacik wrote: The reason we offload csumming is because it is CPU intensive, except it is not on modern intel CPUs. So check to see if we support hardware crc32c, and if we do just do the csumming in our current threads context. Otherwise we can farm it off. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/disk-io.c | 17 + 1 files changed, 17 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index dcaf556..830b9af 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -31,6 +31,7 @@ #include linux/migrate.h #include linux/ratelimit.h #include asm/unaligned.h +#include asm/cpufeature.h #include compat.h #include ctree.h #include disk-io.h @@ -880,6 +881,22 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio, } /* +* Pretty sure I'm going to hell for this. If our CPU can do crc32cs in +* the hardware then there is no reason to do the csum stuff +* asynchronously, it will be faster to do it inline, so test to see if +* our CPU can do hardware crc32c and if it can just do the csum in our +* threads context. +*/ +#ifdef CONFIG_X86 + if (cpu_has_xmm4_2) { + printk(KERN_ERR doing it the fast way\n); You'll probably go to hell for the printk... ;) Testing with dd on my recent intel box, I can hardware crc32c at 1.3GB/s. Anything beyond that and you really want more cpus jumping into the mix. I wanted to use this test for data crcs too, but I suppose the helpers only really hurt for the synchronous IO. -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Resize, off by 1024 bytes
Hello, I wanted to resize my filesystem to 512MiB less. I had created the btrfs after I had set the partitions some days ago, and it was like this: # cat /sys/class/block/sda1/size 234436482 Getting in bytes: 234436482*512 = 120031478784 I run: # btrfs fi resize -512m / And I see in dmesg: btrfs: new size for /dev/sda1 is 119494606848 But that's not the previous size - 512MiB. It is: 120031478784 - 119494606848 = 536871936 = 512*1024*1024 + 1024 So, there is an 'off by +1024'. I shrinked the partition to +1024 bytes bigger than would be by subtracting 512MiB, just in case. What is that off by 1024? This is 3.5.4. Regards, Lluís. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: try to avoid doing a search in btrfs_next_leaf
Things like btrfs_drop_extents call btrfs_next_leaf to see if there is anything else they need on the next leaf. This will result in a re-search, but if we are already at the last leaf in the tree or if the first item in the next leaf doesn't match the objectid of the one we want we can just return without doing the search at all. This helps in things like fsync() where our tree is pretty shallow and we're likely to be on the last leaf often. Thanks, Signed-off-by: Josef Bacik jba...@fusionio.com --- fs/btrfs/ctree.c | 27 +++ fs/btrfs/ctree.h |1 + 2 files changed, 28 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 6d183f6..64ea61c 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2441,6 +2441,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root lowest_level = p-lowest_level; WARN_ON(lowest_level ins_len 0); WARN_ON(p-nodes[0] != NULL); + p-shecantgoanyfarthercapt = 1; if (ins_len 0) { lowest_unlock = 2; @@ -2568,6 +2569,13 @@ cow_done: if (level != 0) { int dec = 0; + + /* +* Slot is not the last in the node, we can go farther +* capt. +*/ + if (slot btrfs_header_nritems(b)) + p-shecantgoanyfarthercapt = 0; if (ret slot 0) { dec = 1; slot -= 1; @@ -5612,8 +5620,27 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path, nritems = btrfs_header_nritems(path-nodes[0]); if (nritems == 0) return 1; + if (path-shecantgoanyfarthercapt) + return 1; + if (!path-nodes[1]) + return 1; btrfs_item_key_to_cpu(path-nodes[0], key, nritems - 1); + + /* +* If we have the level above us locked already just check and see if +* the key in the next leaf even has the same objectid, and if not +* return 1 and avoid the search. +*/ + if (path-locks[1] + path-slots[1] + 1 btrfs_header_nritems(path-nodes[1])) { + struct btrfs_key tmp; + + btrfs_node_key_to_cpu(path-nodes[1], tmp, + path-slots[1] + 1); + if (key.objectid != tmp.objectid) + return 1; + } again: level = 1; next = NULL; diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 6f2e7e6..2e5c6c5 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -571,6 +571,7 @@ struct btrfs_path { unsigned int skip_locking:1; unsigned int leave_spinning:1; unsigned int search_commit_root:1; + unsigned int shecantgoanyfarthercapt:1; }; /* -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: do not async metadata csums if we have hardware crc32c
On Mon, Sep 24, 2012 at 02:11:04PM -0400, Josef Bacik wrote: +#ifdef CONFIG_X86 + if (cpu_has_xmm4_2) { + printk(KERN_ERR doing it the fast way\n); + ret = btree_csum_one_bio(bio); + if (ret) + return ret; + return btrfs_map_bio(BTRFS_I(inode)-root, rw, bio, mirror_num, 0); + } +#endif Could you please put the check into a separate helper and avoid the #ifdef? This is a second candidate for a standalone utils.c where non-fs support code could reside. Or you can call it hellpers.c . david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] Btrfs: tivial cleanup: add space between = and the rest code
On 2012年09月24日 21:15, David Sterba wrote: On Mon, Sep 24, 2012 at 08:47:33AM +0800, Wang Sheng-Hui wrote: trivial code cleanup. -ret =btrfs_drop_snapshot(root, NULL, 1, 0); +ret = btrfs_drop_snapshot(root, NULL, 1, 0); Sorry but this is too trivial. Unless it really bugs you when you're going through code, I don't think that cleanups at this level are necessary. Reading through commit history of some code via 'git blame' and seeing such cleanups is not welcome. Got it. Thanks, I have a patchet in testing that updates a few things around snapshot cleaning and this line will get fixed, so it'll not stay forever. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Btrfs: check range early in map_private_extent_buffer
On 2012年09月25日 00:17, David Sterba wrote: On Mon, Sep 24, 2012 at 12:38:07PM +0800, Wang Sheng-Hui wrote: Check range early to avoid further check/compute in case of range error. Signed-off-by: Wang Sheng-Hui shh...@gmail.com --- fs/btrfs/extent_io.c | 16 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4c87847..9250cf5 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4643,6 +4643,14 @@ int map_private_extent_buffer(struct extent_buffer *eb, unsigned long start, unsigned long end_i = (start_offset + start + min_len - 1) PAGE_CACHE_SHIFT; +if (start + min_len eb-len) { +printk(KERN_ERR btrfs bad mapping eb start %llu len %lu, + wanted %lu %lu\n, (unsigned long long)eb-start, + eb-len, start, min_len); +WARN_ON(1); +return -EINVAL; +} + if (i != end_i) return -EINVAL; 4665 unsigned long i = (start_offset + start) PAGE_CACHE_SHIFT; 4666 unsigned long end_i = (start_offset + start + min_len - 1) 4667 PAGE_CACHE_SHIFT; so the check above effectively verifies that min_len - 1 PAGE_CACHE_SIZE AND is within the same page The other check if (start + min_len eb-len) { looks if the requested data do not lie out of the bounds of the extent buffer, where min_len is filled with sizeof(something). So, both the checks look for corrupted metadata, I don't see the need to swap them. Reread the code and it really does the check. Got it. Thanks for your explanation. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
fs/btrfs/disk-io.c:34:28: fatal error: asm/cpufeature.h: No such file or directory
Hi Josef, FYI, kernel build failed on Commit: d5b04fb3bbb6fce0a834e43f51b6a5251867fcb7 Btrfs: do not async metadata csums if we have hardware crc32c Date: Mon Sep 24 14:40:24 2012 -0400 config: m68k-allmodconfig All related error/warning messages: fs/btrfs/disk-io.c:34:28: fatal error: asm/cpufeature.h: No such file or directory compilation terminated. vim +34 fs/btrfs/disk-io.c 28 #include linux/freezer.h 29 #include linux/crc32c.h 30 #include linux/slab.h 31 #include linux/migrate.h 32 #include linux/ratelimit.h 33 #include asm/unaligned.h 34 #include asm/cpufeature.h 35 #include compat.h 36 #include ctree.h 37 #include disk-io.h --- 0-DAY kernel build testing backend Open Source Technology Centre Fengguang Wu, Yuanhan Liu Intel Corporation -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/2] btrfs-progs: Fix up memory leakage
From: Zhi Yong Wu wu...@linux.vnet.ibm.com Some code pathes forget to free memory on exit. Changelog from v1: Fix the variable is used uncorrectly. [Ram Pai] Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- cmds-filesystem.c |6 -- 1 files changed, 4 insertions(+), 2 deletions(-) diff --git a/cmds-filesystem.c b/cmds-filesystem.c index e62c4fd..9c43d35 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -47,7 +47,7 @@ static const char * const cmd_df_usage[] = { static int cmd_df(int argc, char **argv) { - struct btrfs_ioctl_space_args *sargs; + struct btrfs_ioctl_space_args *sargs, *sargs_orig; u64 count = 0, i; int ret; int fd; @@ -65,7 +65,7 @@ static int cmd_df(int argc, char **argv) return 12; } - sargs = malloc(sizeof(struct btrfs_ioctl_space_args)); + sargs_orig = sargs = malloc(sizeof(struct btrfs_ioctl_space_args)); if (!sargs) return -ENOMEM; @@ -83,6 +83,7 @@ static int cmd_df(int argc, char **argv) } if (!sargs-total_spaces) { close(fd); + free(sargs); return 0; } @@ -92,6 +93,7 @@ static int cmd_df(int argc, char **argv) (count * sizeof(struct btrfs_ioctl_space_info))); if (!sargs) { close(fd); + free(sargs_orig); return -ENOMEM; } -- 1.7.6.5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[resend][PATCH v2 0/2] btrfs-progs: some bugfixes
From: Zhi Yong Wu wu...@linux.vnet.ibm.com Some misc bugs are found when i work on other tasks. Now send out them for interview, thanks. Zhi Yong Wu (2): btrfs-progs: Close file descriptor on exit btrfs-progs: Fix up memory leakage cmds-filesystem.c | 16 1 files changed, 12 insertions(+), 4 deletions(-) -- 1.7.6.5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/2] btrfs-progs: Close file descriptor on exit
From: Zhi Yong Wu wu...@linux.vnet.ibm.com Need to close fd on exit. Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- cmds-filesystem.c | 10 -- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/cmds-filesystem.c b/cmds-filesystem.c index b1457de..e62c4fd 100644 --- a/cmds-filesystem.c +++ b/cmds-filesystem.c @@ -77,18 +77,23 @@ static int cmd_df(int argc, char **argv) if (ret) { fprintf(stderr, ERROR: couldn't get space info on '%s' - %s\n, path, strerror(e)); + close(fd); free(sargs); return ret; } - if (!sargs-total_spaces) + if (!sargs-total_spaces) { + close(fd); return 0; + } count = sargs-total_spaces; sargs = realloc(sargs, sizeof(struct btrfs_ioctl_space_args) + (count * sizeof(struct btrfs_ioctl_space_info))); - if (!sargs) + if (!sargs) { + close(fd); return -ENOMEM; + } sargs-space_slots = count; sargs-total_spaces = 0; @@ -148,6 +153,7 @@ static int cmd_df(int argc, char **argv) printf(%s: total=%s, used=%s\n, description, total_bytes, used_bytes); } + close(fd); free(sargs); return 0; -- 1.7.6.5 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: fix return value check in btrfs_ioctl_send()
On 09/24/2012 10:03 PM, David Sterba wrote: On Mon, Sep 24, 2012 at 09:40:00PM +0800, Wei Yongjun wrote: dpatch engine is used to auto generated this patch. (https://github.com/weiyj/dpatch) I've played with it, looks nice, although it's full of hardcoded paths. Yes, it is current hardcoded path, because I used the linux.git and linux-next.git, and dpatch will update the git tree every day. After login, you can change the git url in the web. Sure, I was able to point the git source to my local linux.git. I'd like to run it directly from the git repo. Can you please fix that? No patch from me, because I hadcoded my paths :) Do you mean does not need auto update git repo, only scan the change under your git repo? I'm talking about hardcoded paths for the scripts in bin/ like dpatch/views/engine.py: args = '/usr/dpatch/bin/importcocci.sh %s' % fname or dpatch/models.py: return '/var/lib/dpatch/repo/' + dname Not to say that '/usr/' is not the right place for adding new per-package subdirs, refer to http://www.pathname.com/fhs/pub/fhs-2.3.html#THEUSRHIERARCHY I have fix the hardcode path issue, and allow dpatch to be run without installation. . (Also the database has hardcoded path into /var/lib/dpatch) It would be great if we can run a set of .cocci scripts that would verify code invariants (limited to cocci capabilities, but eg. the IS_ERR is a good example) and add new ones eventually over time. By default, It does not include cocci scripts, only include dup include, useless 'linux/version.h' check engine, and is disabled. No problem that the .cocci scripts are not there, I'm fine with adding them manually now, just to test the tool, but all the hardcoded paths didn't let me do that :) I have put a sample cocci script to pattern/cocci dir, and we can import other cocci file using command: ./bin/importcocci.sh /path/to/file.cocci and then enable it vi the web ui. the import file have special format: /// title for patch /// /// patch description /// cocci script content ... and then your can manual scan using the following command: $ ./bin/dailypatch.sh patch scan + build test or $./bin/dailyupdate.sh patch scan or $ ./bin/dailybuild.shbuild test The database is adminstrator is admin, passwd: 11 Regards, Yongjun Wei -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html