Re: [PATCH v2] common: get fs type again using device canonical name in _fs_type
On Fri, Aug 01, 2014 at 01:02:58PM +0800, Eryu Guan wrote:
> On Fri, Aug 01, 2014 at 02:49:10PM +1000, Dave Chinner wrote:
> > On Fri, Aug 01, 2014 at 12:02:41PM +0800, Eryu Guan wrote:
> > > On Fri, Aug 01, 2014 at 10:21:59AM +1000, Dave Chinner wrote:
> > > > On Thu, Jul 31, 2014 at 06:52:37PM +0800, Eryu Guan wrote:
> > > > > When testing with lvm, a previous btrfsck run could change df output from something like
> > > > >
> > > > > /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs 15728640 900 13602172 1% /mnt/btrfs
> > > > >
> > > > > to
> > > > >
> > > > > /dev/dm-3 btrfs 15728640 900 13602172 1% /mnt/btrfs
> > > >
> > > > I don't follow you. Why would running btrfsck change the name of the device? If the filesystem is unmounted and mounted again, then the device could change, but btrfsck should not be doing the unmount/mount, and so unless the TEST_DEV/SCRATCH_DEV is changing the output of df should be identical...
> > > >
> > > > So before we change the _fs_type() code, can you explain exactly how, when and why the device name is changing to me?
> > >
> > > Assume that we have two btrfs filesystems, kernel is 3.16.0-rc4+
> > >
> > > [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show
> > > Label: none uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2
> > >     Total devices 1 FS bytes used 384.00KiB
> > >     devid 1 size 15.00GiB used 2.04GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv1
> > >
> > > Label: none uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37
> > >     Total devices 2 FS bytes used 112.00KiB
> > >     devid 1 size 15.00GiB used 2.03GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv2
> > >     devid 2 size 15.00GiB used 2.01GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv3
> > >
> > > Btrfs v3.14.2
> > >
> > > And testlv1 was mounted at /mnt/btrfs
> > >
> > > [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs
> > > Filesystem Type 1024-blocks Used Available Capacity Mounted on
> > > /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs 15728640 512 13602560 1% /mnt/btrfs
> > >
> > > Now run btrfsck on testlv2; btrfsck will scan all btrfs devices and somehow change the device name.
> > >
> > > [root@hp-dl388eg8-01 btrfs-progs]# btrfsck /dev/mapper/rhel_hp--dl388eg8--01-testlv2 > /dev/null 2>&1
> > >
> > > # device name changed in df output and btrfs fi show output
> > > [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs
> > > Filesystem Type 1024-blocks Used Available Capacity Mounted on
> > > /dev/dm-3 btrfs 15728640 512 13602560 1% /mnt/btrfs
> > > [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show
> > > Label: none uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2
> > >     Total devices 1 FS bytes used 384.00KiB
> > >     devid 1 size 15.00GiB used 2.04GiB path /dev/dm-3
> > >
> > > Label: none uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37
> > >     Total devices 2 FS bytes used 112.00KiB
> > >     devid 1 size 15.00GiB used 2.03GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv2
> > >     devid 2 size 15.00GiB used 2.01GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv3
> > >
> > > Btrfs v3.14.2
> > >
> > > This only happens when running btrfsck on a btrfs with multiple devices, so this only affects xfstests run on btrfs with SCRATCH_DEV_POOL set to lvm lvs.
> > >
> > > Maybe this is a bug of btrfs-progs and we should fix it there?
> >
> > Yes, that smells of a btrfs-progs bug. Is your /etc/mtab a link to /proc/mounts? If not, does the contents change when you run btrfsck, and does the problem go away when you replace /etc/mtab with a link to /proc/mounts?
>
> /etc/mtab is a symlink to /proc/self/mounts, and so is /proc/mounts
>
> [root@hp-dl388eg8-01 btrfs-progs]# ls -l /etc/mtab
> lrwxrwxrwx. 1 root root 17 Sep 22 2013 /etc/mtab -> /proc/self/mounts
> [root@hp-dl388eg8-01 btrfs-progs]# ls -l /proc/mounts
> lrwxrwxrwx. 1 root root 11 Aug 1 00:59 /proc/mounts -> self/mounts
>
> And the device name also changed in /proc/mounts
>
> [root@hp-dl388eg8-01 btrfs-progs]# grep btrfs /proc/mounts
> /dev/dm-3 /mnt/btrfs btrfs rw,seclabel,relatime,space_cache 0 0

Well, that's exactly the last thing *I* expected. The kernel just doesn't change device names on mounted filesystems like that.

Oh, the device name comes from btrfs_show_devname().

So this definitely seems to me to be a btrfs bug - btrfsck is causing the btrfs kernel code to change the name of devices associated with unrelated, mounted filesystems to that which it is operating on. That's just wrong.

IOWs, btrfs needs fixing, not xfstests, because that can bite during any test that runs btrfsck in the middle of it.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
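
For anyone who needs a stopgap while the naming issue is sorted out, the lookup can compare the df device column against both the configured path and the path its symlink resolves to. This is only a minimal sketch under the assumption that the /dev/mapper name is a symlink to the /dev/dm-N node; fs_type_by_dev is a hypothetical helper and is not the _fs_type code from the patch under discussion.

```shell
# Hypothetical helper, not the xfstests _fs_type implementation.
# Reads `df -TP`-style lines on stdin and prints the Type column of the
# first line whose device column equals either the given path or the
# path it resolves to through symlinks.
fs_type_by_dev() {
    dev=$1
    # readlink -f follows symlinks, e.g. /dev/mapper/<lv> -> /dev/dm-N.
    canon=$(readlink -f "$dev" 2>/dev/null)
    awk -v dev="$dev" -v canon="$canon" '
        $1 == dev || (canon != "" && $1 == canon) { print $2; exit }
    '
}
```

It would be used as `df -TP /mnt/btrfs | fs_type_by_dev "$TEST_DEV"`. Matching on both names means the result does not depend on which of the two the kernel happens to report.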
Re: integration tree updated
Hi Chris,

Looks like you missed Miao's update (on 17 Jul) to back out the bad patch (below); your integration branch (published on 25 Jul) still contains it.

[PATCH 2/2] Btrfs: fix wrong total device counter after removing a seed device

You should remove it. Simple test cases on your integration branch (like btrfs dev del or btrfs/003) are failing due to this, and I (and others) would spend some time digging it out.

Miao,

If it helps - cur_devices in the loop is NULL.

BUG: unable to handle kernel NULL pointer dereference at 0050
IP: [a00a8fc0] btrfs_rm_device+0x4c0/0x860 [btrfs]

	cur_devices = root->fs_info->fs_devices;
	do {
		cur_devices->total_devices--;
		cur_devices = cur_devices->seed;
	} while (device->fs_devices != cur_devices);

PS: I didn't find the follow-up patch on the mailing list; did I miss anything?

Thanks, Anand

On 07/25/2014 09:40 AM, Chris Mason wrote:
> Hi everyone,
>
> I've pushed out my current integration branch. It does have a few of Miao Xie's patches missing because there were some rejects. I think this was just because some things got pulled in out of order, and I'll get it fixed up.
>
> Also missing is Mark's quota snapshot deletion fixes. They were crashing during btrfs/011 with CONFIG_DEBUG_PAGE_ALLOC on. We'll get that nailed down.
>
> integration is subject to rebasing, so please treat it more like a patch queue. It is very lightly tested, the goal is just to show which patches are already applied and which ones are still pending.
>
> Thanks!
>
> -chris
Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes
On Thu, Jul 31, 2014 at 09:53:15PM -0400, Nick Krause wrote:
> On Thu, Jul 31, 2014 at 3:09 PM, Hugo Mills h...@carfax.org.uk wrote:
> > On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote:
> > > This adds checks for the stated modes as if they are crap we will return error not supported.
> >
> > You've just enabled two options, but you haven't actually implemented the code behind it. I would tell you *NOT* to do anything else on this work until you can answer the question:
> >
> > What happens if you apply this patch, create a large file called foo.txt, and then a userspace program executes the following code?
> >
> > 	int fd = open("foo.txt", O_RDWR);
> > 	fallocate(fd, FALLOCATE_FL_COLLAPSE_RANGE, 50, 50);
> >
> > Try it on a btrfs filesystem, both with and without your patch. Also try it on an ext4 filesystem.
> >
> > Once you've done all of that, reply to this mail and tell me what the problem is with this patch. You need to make two answers: what are the technical problems with the patch? What errors have you made in the development process?
> >
> > *Only* if you can answer those questions sensibly, should you write any more patches, of any kind.
>
> [snip]
>
> Calls are there in btrfs , therefore will either kernel panic or cause an oops.

That's a guess. I can tell it's a guess, because I've actually read (some of) the rest of that function, so I've got a good idea of what I think it will do -- and panic or oops is not the answer. Try again.

You can answer this question two ways: by test (see my suggestion above), or by reading and understanding the code. Either will work in this case, but doing neither is not an option for someone who wants to change the function.

> Need to test this patch as this is very easy to catch bug.

So why didn't you? It's your patch, testing it is your job -- *before* it gets out into the outside world.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- But people have always eaten people, / what else is there to ---
eat? / If the Juju had meant us not to eat people / he wouldn't
have made us of meat.
[PATCH v2] xfstests: add regression test for btrfs send with orphans
Regression test for a btrfs issue where we create a RO snapshot to use for a send operation, which fails with a -ESTALE error, due to the presence of orphan inodes accessible through the snapshot's commit root but no longer present through the main root.

This issue is fixed by the following linux kernel btrfs patch:

    Btrfs: update commit root on snapshot creation after orphan cleanup

Signed-off-by: Filipe Manana fdman...@suse.com
---

V2: Replaced a redirect with a redirect to $seqres.full, and added a sleep.

 tests/btrfs/057     | 84 +
 tests/btrfs/057.out |  1 +
 tests/btrfs/group   |  1 +
 3 files changed, 86 insertions(+)
 create mode 100755 tests/btrfs/057
 create mode 100644 tests/btrfs/057.out

diff --git a/tests/btrfs/057 b/tests/btrfs/057
new file mode 100755
index 000..1e313e9
--- /dev/null
+++ b/tests/btrfs/057
@@ -0,0 +1,84 @@
+#! /bin/bash
+# FS QA Test No. btrfs/057
+#
+# Regression test for a btrfs issue where we create a RO snapshot to use for
+# a send operation which fails with a -ESTALE error, due to the presence of
+# orphan inodes accessible through the snapshot's commit root but no longer
+# present through the main root.
+#
+# This issue is fixed by the following linux kernel btrfs patch:
+#
+#    Btrfs: update commit root on snapshot creation after orphan cleanup
+#
+#---
+# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana fdman...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	if [ ! -z $XFS_IO_PID ]; then
+		kill $XFS_IO_PID > /dev/null 2>&1
+	fi
+	rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+# Requiring flink command tests for the presence of the -T option used
+# to pass O_TMPFILE to open(2).
+_require_xfs_io_command "flink"
+_need_to_be_root
+
+rm -f $seqres.full
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+# Create a tmpfile file, write some data to it and leave it open, so that our
+# main subvolume has an orphan inode item.
+$XFS_IO_PROG -T $SCRATCH_MNT >>$seqres.full 2>&1 < <(
+	echo "pwrite 0 65536"
+	read
+) &
+XFS_IO_PID=$!
+
+# Give it some time to the xfs_io process to create the tmpfile.
+sleep 3
+
+# With the tmpfile open, create a RO snapshot and use it for a send operation.
+# The send operation used to fail with -ESTALE due to the presence of the
+# orphan inode.
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap
+_run_btrfs_util_prog send $SCRATCH_MNT/mysnap -f /dev/null
+
+status=0
+exit
diff --git a/tests/btrfs/057.out b/tests/btrfs/057.out
new file mode 100644
index 000..b26eefe
--- /dev/null
+++ b/tests/btrfs/057.out
@@ -0,0 +1 @@
+QA output created by 057
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 2da7127..ebc38c5 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -59,3 +59,4 @@
 054 auto quick
 055 auto quick
 056 auto quick
+057 auto quick
-- 
1.9.1
open_ctree failed on 3.16.0
Hello.

A circuit breaker failed a few times and now I can't mount my btrfs volume - it fails with open_ctree failed:

[ 337.004372] [817865a5] dump_stack+0x46/0x58
[ 337.004375] [810720ac] warn_slowpath_common+0x8c/0xc0
[ 337.004378] [810720fa] warn_slowpath_null+0x1a/0x20
[ 337.004387] [c04583b5] btrfs_put_block_group+0x75/0x80 [btrfs]
[ 337.004398] [c046227d] btrfs_free_block_groups+0xbd/0x2e0 [btrfs]
[ 337.004410] [c0470abd] open_ctree+0x188d/0x1f70 [btrfs]
[ 337.004418] [c0442af1] btrfs_fill_super.isra.84+0x81/0x130 [btrfs]
[ 337.004422] [8136d7f1] ? disk_name+0x61/0xc0
[ 337.004425] [81392ab7] ? strlcpy+0x47/0x60
[ 337.004434] [c04474be] btrfs_mount+0x3ae/0x3d0 [btrfs]
[ 337.004439] [811e52e3] mount_fs+0x43/0x1b0
[ 337.004443] [812004c6] vfs_kern_mount+0x76/0x140
[ 337.004446] [81201c44] do_new_mount+0xa4/0x1f0
[ 337.004448] [81202fe6] do_mount+0x1e6/0x230
[ 337.004451] [812033b0] SyS_mount+0x90/0xe0
[ 337.004454] [8179402d] system_call_fastpath+0x1a/0x1f

mount -o recovery doesn't succeed, nor does mount -o recovery,ro.

I have tried the above with kernel 3.13.0 first and 3.16.0 later and the behaviour seems identical.

This may or may not be relevant, but after I initialised the filesystem by copying some files to it (with kernel 3.13.0), one of the files failed with a checksum error. I hadn't yet compared the file that was written with the original to determine whether the error was with the checksum or otherwise.

What are the next steps I should try? Should I try btrfs-zero-log? Or should I try btrfsck? Or something else?
Regards,

Frankie Fisher

# uname -a
Linux mythtv 3.16.0-999-generic #201408010205 SMP Fri Aug 1 06:06:01 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
Btrfs v3.12

# btrfs fi show
Label: none uuid: beaff1b6-fb0e-4b80-8b59-74968fd51066
    Total devices 1 FS bytes used 342.01GiB
    devid 1 size 1.36TiB used 349.04GiB path /dev/sdc2

Label: none uuid: 34d2986d-6954-4c5c-922c-799cc66cd28e
    Total devices 1 FS bytes used 112.00KiB
    devid 1 size 231.90GiB used 2.04GiB path /dev/sda2

Btrfs v3.12

[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.16.0-999-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201408010205 SMP Fri Aug 1 06:06:01 UTC 2014
[0.00] Command line: BOOT_IMAGE=/vmlinuz-3.16.0-999-generic root=UUID=86c5a6cb-9130-4193-8536-76051f2a1e7e ro quiet splash
[0.00] KERNEL supported cpus:
[0.00] Intel GenuineIntel
[0.00] AMD AuthenticAMD
[0.00] Centaur CentaurHauls
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable
[0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e4000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xcff7] usable
[0.00] BIOS-e820: [mem 0xcff8-0xcff8dfff] ACPI data
[0.00] BIOS-e820: [mem 0xcff8e000-0xcffc] ACPI NVS
[0.00] BIOS-e820: [mem 0xcffd-0xcfff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xfff0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00022fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.5 present.
[0.00] DMI: System manufacturer P5QL PRO/P5QL PRO, BIOS 080310/08/2008
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] AGP: No AGP bridge found
[0.00] e820: last_pfn = 0x23 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00] 0-9 write-back
[0.00] A-B uncachable
[0.00] C-D write-protect
[0.00] E-E write-through
[0.00] F-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00] 0 base 0 mask E write-back
[0.00] 1 base 2 mask FE000 write-back
[0.00] 2 base 22000 mask FF000 write-back
[0.00] 3 base 0D000 mask FF000 uncachable
[0.00] 4 base 0E000 mask FE000 uncachable
[0.00] 5 disabled
[0.00] 6 disabled
[0.00] 7 disabled
[0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[0.00] original variable MTRRs
[0.00] reg 0, base: 0GB, range: 8GB, type WB
[
Re: Btrfs offline deduplication
On 07/31/2014 07:54 PM, Timofey Titovets wrote:
> Good time of day.
> I have several questions about data deduplication on btrfs. Sorry if I ask stupid questions or waste your time %)
>
> What about the implementation of offline data deduplication? I don't see any activity in this area; maybe I need to ask a particular person? Where is the problem? Maybe I can try to help (testing, for example)?
>
> I could be wrong, but as I understand it btrfs stores one crc32 checksum per file. If this is true, maybe it makes sense to create a small worker to dedup files, like the worker for autodefrag? With simple logic like:
>
> if sum1 == sum2 && file_size1 == file_size2; then
>     if (bit_to_bit_identical(file1,2)); then
>         merge(file1, file2);
>
> This could be a first attempt to implement per-file offline dedup. What do you think about it? Could I be wrong? Or is this a horrible crutch? (As I understand it, it does not change the format of the fs.)
>
> (bedup and other tools are cool, but I have had several problems with these tools, and I think a kernel implementation could work better.)

I think there may be some misunderstandings here about some of the internals of BTRFS. First of all, checksums are stored per block, not per file, and secondly, deduplication can be done on a much finer scale than individual files (you can deduplicate individual extents).

I do think however that having the option of a background thread doing deduplication asynchronously is a good idea, but then you would have to have some way to trigger it on individual files/trees, and triggering on writes like the autodefrag thread does doesn't make much sense. Having some userspace program to tell it to run on a given set of files would probably be the best approach for a trigger. I don't remember if this kind of thing was also included in the online deduplication patches that got posted a while back or not.
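
The per-file check in the quoted pseudocode can be tried from userspace without any kernel support. The following is a rough sketch only: same_content is a hypothetical helper, it compares file sizes instead of a stored checksum (since, as noted above, btrfs keeps checksums per block rather than per file), and it leaves the merge step out entirely.

```shell
# Hypothetical helper: succeeds only when both files have the same size
# and identical bytes. The size check is a cheap filter that avoids a
# full read when the files cannot possibly match.
same_content() {
    size1=$(wc -c < "$1") || return 2
    size2=$(wc -c < "$2") || return 2
    [ "$size1" -eq "$size2" ] || return 1
    # cmp -s is silent and exits 0 only for byte-identical files.
    cmp -s "$1" "$2"
}
```

A caller would write `same_content a b && echo "dedup candidates"` and hand the matching pairs to whatever tool performs the actual extent sharing.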
Re: open_ctree failed on 3.16.0
Frankie Fisher posted on Fri, 01 Aug 2014 10:58:39 +0100 as excerpted:

> A circuit breaker failed a few times and now I can't mount my btrfs volume - it fails with open_ctree failed:

[snip stacktrace, which as a btrfs user not dev doesn't give me much anyway]

> mount -o recovery doesn't succeed, nor does mount -o recovery,ro.
>
> I have tried the above with kernel 3.13.0 first and 3.16.0 later and the behaviour seems identical.
>
> This may or may not be relevant, but after I initialised the filesystem by copying some files to it (with kernel 3.13.0), one of the files failed a checksum error. I hadn't yet compared the file that was written with the original to determine whether the error was with the checksum or otherwise.
>
> What are the next steps I should try? Should I try btrfs-zero-log? Or should I try btrfsck? Or something else?

The standard advice concerning btrfs check (aka btrfsck) is that running it without --repair or similar won't hurt as in that case it's read-only, but by the same token, it won't help, except possibly to give you an idea of what's wrong. And don't run it with --repair except either on the direct advice of someone here after seeing the read-only run output, or if you've otherwise given up and the next step would be a new mkfs as in that case you have nothing to lose, because check doesn't yet understand everything that can go wrong and in some cases may make the problem worse instead of better.

The first thing you may want to do is make an image using dd or the like, so you can restore to the current state if nothing works. Of course that'll take quite some space...

Another read-only alternative is btrfs restore. This is run on the /unmounted/ filesystem, allowing recovery of files from the filesystem without the possibility of damaging it further. If you don't have a good backup, this is likely to be your best shot at, more or less, making one after-the-fact. It may not recover everything and in particular, from my own experience I know it doesn't recover symlinks or file owner and permissions information, but in the absence of a proper current backup it does give you a reasonably good shot at recovering a good portion of the files. Of course this will require at least enough space on other filesystems to write the recovered files.

If btrfs restore with the default options doesn't prove satisfactory, you can use the -l (list roots) -t (use tree location) and --dry-run options along with btrfs-find-root to hopefully find a better previous root, which can then be fed to the -t option to hopefully get a better recovery. Additionally, when I used btrfs restore here, I had to use it with the -i option and run it repeatedly, feeding it the same path each time, as it kept giving up with a "looping too much" error on some of the larger directories, but would progress further each time as it could skip more files that were already there.

There's also btrfs-show-super to examine your superblocks and ensure they're not damaged, as well as to compare current generation/transid vs that of find-root and restore. Show-super can also be used with btrfs rescue as noted below, to recover from a bad superblock, should it be necessary.

See the wiki page on restore for more details on it.

https://btrfs.wiki.kernel.org/index.php/Restore

You can then use btrfs-image to create a metadata image that you can give to the devs if they want to see what they can do with it. That doesn't give them the data but will let them see filenames unless you use the -s option to sanitize them, which I'd recommend if you have sensitive filenames you don't want others to see.

With either a good backup or having restored as much as you can using restore, you can move on to potentially destructive attempts at further restoration. Here's a slightly dated (nearing a year old, select-super and chunk-recover are part of the main btrfs command, under rescue, now) but still useful list of what to try and in what order, there.

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/27999

-- 
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master. Richard Stallman
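
The "run it repeatedly, feeding it the same path each time" step described above can be wrapped in a small loop. This is a sketch under a stated assumption: it only helps if btrfs restore exits non-zero when it gives up, which this thread does not confirm. retry is a hypothetical helper, and $DEV and $DEST are placeholders for the unmounted device and the recovery directory.

```shell
# Hypothetical helper: run a command up to MAX times, stopping at the
# first success. Returns 0 on success, 1 if every attempt failed.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        "$@" && return 0
        attempt=$((attempt + 1))
    done
    return 1
}
```

Something like `retry 20 btrfs restore -i "$DEV" "$DEST"` would then repeat the restore with the same destination, so that each pass can skip the files already written by the previous one.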
Re: Machine lockup due to btrfs-transaction on AWS EC2 Ubuntu 14.04
I've reproduced these issues on a single-core machine which doesn't appear to become completely unresponsive after 12 hours of copying (as the other machines are deadlocking after 5-10 minutes, perhaps?), but it does use 100% SYS CPU with no IO traffic for the vast majority of the time. (In fact, constantly?)

The filesystem was originally created on an Ubuntu Saucy 13.10 machine (and therefore older kernel + older btrfs-tools) on a PV guest with a non-SSD block device. The newer current machine is a HVM guest and the data is now on an SSD.

Below is a link to the dstat output and dmesg which shows this (note the sys column on the left and the IOPs column on the right). This was taken whilst copying files of varying sizes from many directories to a sibling in the same directory.

`dstat -tcdnymlr 5`, which shows high sys CPU but no IO traffic:
https://gist.github.com/pwaller/cb8d088ebceb2707d24b

INFO: task sync:5906 blocked for more than 120 seconds.
https://gist.github.com/pwaller/574a369ea4b65fe125b9#file-dmesg-log-L541

INFO: task btrfs-transacti:2531 blocked for more than 120 seconds.
https://gist.github.com/pwaller/574a369ea4b65fe125b9#file-dmesg-log-L1015

INFO: task btrfs-flush_del:16764 blocked for more than 120 seconds.
https://gist.github.com/pwaller/574a369ea4b65fe125b9#file-dmesg-log-L1041
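
For longer logs, the hung-task reports quoted above can be pulled out mechanically. A small sketch, assuming the messages keep the "INFO: task NAME:PID blocked for more than N seconds" shape shown in this report; blocked_tasks is a hypothetical helper name.

```shell
# Hypothetical helper: read kernel log text on stdin and print the name
# of each task reported by the hung-task detector, one per line.
blocked_tasks() {
    sed -n 's/.*INFO: task \(.*\):[0-9][0-9]* blocked for more than [0-9][0-9]* seconds.*/\1/p'
}
```

Running `dmesg | blocked_tasks | sort | uniq -c` gives a quick count of which tasks are reported most often.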
Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes
On Thu, Jul 31, 2014 at 08:09:10PM +0100, Hugo Mills wrote:
> On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote:
> > This adds checks for the stated modes as if they are crap we will return error not supported.
>
> You've just enabled two options, but you haven't actually implemented the code behind it. I would tell you *NOT* to do anything else on this work until you can answer the question:
>
> What happens if you apply this patch, create a large file called foo.txt, and then a userspace program executes the following code?
>
> 	int fd = open("foo.txt", O_RDWR);
> 	fallocate(fd, FALLOCATE_FL_COLLAPSE_RANGE, 50, 50);
>
> Try it on a btrfs filesystem, both with and without your patch. Also try it on an ext4 filesystem.
>
> Once you've done all of that, reply to this mail and tell me what the problem is with this patch. You need to make two answers: what are the technical problems with the patch? What errors have you made in the development process?

There are also the conceptual failures. Before you do anything else, you need to be able to answer the question, what do you think the flags FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE are supposed to do? What are the possible appropriate things for btrfs to do if it sees these flags? (Hint: there is more than one correct answer, and its current choice is one of them. What is the other one?)

Nick, the fact that you call these modes crap is a hint that you have a fundamental lack of understanding --- and before you waste more of kernel developers' time, you need to get that understanding first, for any bit of code that you propose to improve.

This is why I suggested that you work on userspace testing scripts first. It's pretty clear you are (a) incredibly sloppy, and (b) lacking conceptual understanding of a lot of technical details, and (c) even worse, aren't letting this lack of understanding stop you from posting patches. As a result you are adding negative value to whatever project or subsystem you try to attach yourself to --- you're not helping.

- Ted

P.S. As a further hint, change the above code to read:

	int fd = open("foo.txt", O_RDWR);
	if (fallocate(fd, FALLOCATE_FL_COLLAPSE_RANGE, 4096, 8192) < 0)
		perror("fallocate");

And then run "filefrag -vs foo.txt" before and after running the above code fragment and then try something like this:

	cp /usr/share/dict/words foo.txt
	filefrag -vs foo.txt
	ls -l foo.txt
	/tmp/fallocate-test-prog
	filefrag -vs foo.txt
	ls -l foo.txt
	diff /usr/share/dict/words foo.txt

Try doing this on an ext4 or xfs system and a btrfs file system.
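
The same before/after comparison can be run without compiling anything by using the fallocate(1) tool from util-linux, whose -c option requests the collapse-range mode (the kernel header spells the flag FALLOC_FL_COLLAPSE_RANGE, as in the subject line). This is only a sketch of the experiment, not a replacement for reading the code; collapse_test is a hypothetical helper, and the offset and length are the ones from the P.S. above.

```shell
# Sketch of the before/after comparison using the util-linux fallocate(1)
# tool instead of a compiled test program. Prints the file size before
# and after, and returns non-zero if the collapse was refused.
collapse_test() {
    src=$1
    file=$2
    cp "$src" "$file" || return 2
    echo "before: $(wc -c < "$file") bytes"
    # -c is --collapse-range; -o and -l give the offset and length.
    if fallocate -c -o 4096 -l 8192 "$file"; then
        echo "after: $(wc -c < "$file") bytes"
    else
        echo "collapse refused; file left at $(wc -c < "$file") bytes"
        return 1
    fi
}
```

For example, `collapse_test /usr/share/dict/words foo.txt` followed by `diff /usr/share/dict/words foo.txt`, run once on each filesystem being compared, with `filefrag -vs foo.txt` added where the extent layout is of interest.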
Re: open_ctree failed on 3.16.0
On 01/08/14 12:29, Duncan wrote:
> Frankie Fisher posted on Fri, 01 Aug 2014 10:58:39 +0100 as excerpted:
> > A circuit breaker failed a few times and now I can't mount my btrfs volume - it fails with open_ctree failed:
> >
> > What are the next steps I should try? Should I try btrfs-zero-log? Or should I try btrfsck? Or something else?
>
> The standard advice concerning btrfs check (aka btrfsck) is that running it without --repair or similar won't hurt as in that case it's read-only, but by the same token, it won't help, except possibly to give you an idea of what's wrong.

fwiw, this is the output of btrfsck: http://cwillu.com:8080/86.136.116.221/2

> <snip helpful advice>

Thanks for the helpful info, I will work through it.

It would be nice to be able to recover the filesystem, but if I can't recover the filesystem it wouldn't be the end of the world as I have a backup from a few days before the incident. Even if I can't recover the filesystem, if reporting this issue helps btrfs be more robust in the future that would be a good outcome. To that end I have made a copy of the filesystem with btrfs-image which I can supply if necessary.

Regards,

Frankie Fisher
Re: Btrfs offline deduplication
On Fri, Aug 01, 2014 at 06:17:44AM -0400, Austin S Hemmelgarn wrote:
> I do think however that having the option of a background thread doing deduplication asynchronously is a good idea, but then you would have to have some way to trigger it on individual files/trees, and triggering on writes like the autodefrag thread does doesn't make much sense. Having some userspace program to tell it to run on a given set of files would probably be the best approach for a trigger. I don't remember if this kind of thing was also included in the online deduplication patches that got posted a while back or not.

IIRC the proposed implementation only merged new writes with existing data.

For the out-of-band (off-line) dedup there's bedup (https://github.com/g2p/bedup) or Mark's duperemove tool (https://github.com/markfasheh/duperemove) that work on a set of files.
Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes
On Fri, Aug 1, 2014 at 8:21 AM, Theodore Ts'o ty...@mit.edu wrote:
> On Thu, Jul 31, 2014 at 08:09:10PM +0100, Hugo Mills wrote:
> > On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote:
> > > This adds checks for the stated modes as if they are crap we will return error not supported.
> >
> > You've just enabled two options, but you haven't actually implemented the code behind it. I would tell you *NOT* to do anything else on this work until you can answer the question:
> >
> > What happens if you apply this patch, create a large file called foo.txt, and then a userspace program executes the following code?
> >
> > 	int fd = open("foo.txt", O_RDWR);
> > 	fallocate(fd, FALLOCATE_FL_COLLAPSE_RANGE, 50, 50);
> >
> > Try it on a btrfs filesystem, both with and without your patch. Also try it on an ext4 filesystem.
> >
> > Once you've done all of that, reply to this mail and tell me what the problem is with this patch. You need to make two answers: what are the technical problems with the patch? What errors have you made in the development process?
>
> There are also the conceptual failures. Before you do anything else, you need to be able to answer the question, what do you think the flags FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE are supposed to do? What are the possible appropriate things for btrfs to do if it sees these flags? (Hint: there is more than one correct answer, and its current choice is one of them. What is the other one?)
>
> Nick, the fact that you call these modes crap is a hint that you have a fundamental lack of understanding --- and before you waste more of kernel developers' time, you need to get that understanding first, for any bit of code that you propose to improve.
>
> This is why I suggested that you work on userspace testing scripts first. It's pretty clear you are (a) incredibly sloppy, and (b) lacking conceptual understanding of a lot of technical details, and (c) even worse, aren't letting this lack of understanding stop you from posting patches. As a result you are adding negative value to whatever project or subsystem you try to attach yourself to --- you're not helping.
>
> - Ted
>
> P.S. As a further hint, change the above code to read:
>
> 	int fd = open("foo.txt", O_RDWR);
> 	if (fallocate(fd, FALLOCATE_FL_COLLAPSE_RANGE, 4096, 8192) < 0)
> 		perror("fallocate");
>
> And then run "filefrag -vs foo.txt" before and after running the above code fragment and then try something like this:
>
> 	cp /usr/share/dict/words foo.txt
> 	filefrag -vs foo.txt
> 	ls -l foo.txt
> 	/tmp/fallocate-test-prog
> 	filefrag -vs foo.txt
> 	ls -l foo.txt
> 	diff /usr/share/dict/words foo.txt
>
> Try doing this on an ext4 or xfs system and a btrfs file system.

I mis-sent this patch; that's why there are issues.

Cheers Nick
RE: [PATCH] btrfs-progs: mkfs: remove experimental tag
> From: dste...@suse.cz
> To: linux-btrfs@vger.kernel.org
> CC: dste...@suse.cz
> Subject: [PATCH] btrfs-progs: mkfs: remove experimental tag
> Date: Thu, 31 Jul 2014 14:21:34 +0200
>
> Make it consistent with kernel status and documentation.
>
> Signed-off-by: David Sterba <dste...@suse.cz>
> ---
>  mkfs.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/mkfs.c b/mkfs.c
> index 16e92221a547..538b6e6837b2 100644
> --- a/mkfs.c
> +++ b/mkfs.c
> @@ -1439,8 +1439,8 @@ int main(int ac, char **av)
>  	}
>
>  	/* if we are here that means all devs are good to btrfsify */
> -	printf("\nWARNING! - %s IS EXPERIMENTAL\n", BTRFS_BUILD_VERSION);
> -	printf("WARNING! - see http://btrfs.wiki.kernel.org before using\n\n");
> +	printf("%s\n", BTRFS_BUILD_VERSION);
> +	printf("See http://btrfs.wiki.kernel.org for more\n\n");

The sentence/thought isn't complete. I was left thinking "more what?"
Perhaps add: information, documentation.

Thanks.

>  	dev_cnt--;
> @@ -1597,7 +1597,6 @@ raid_groups:
>  	       label, first_file, nodesize, leafsize, sectorsize,
>  	       pretty_size(btrfs_super_total_bytes(root->fs_info->super_copy)));
>
> -	printf("%s\n", BTRFS_BUILD_VERSION);
>  	btrfs_commit_transaction(trans, root);
>
>  	if (source_dir_set) {
> --
> 1.9.0
Re: [PATCH] btrfs-progs: mkfs: remove experimental tag
On Fri, Aug 01, 2014 at 11:38:09AM -0500, Kyle Gates wrote:
> The sentence/thought isn't complete. I was left thinking "more what?"
> Perhaps add: information, documentation.

Changed to 'more information'.
Re: Help with btrfs_zero_range function
Please forget my other questions. It seems the only work needed to make
punch hole work for zero range is to write a function like the one
pasted below for zero range, and to change the calls from punch range
to zero range; from my reading, the other parts of the function can
stay the same.

Regards
Nick

static int find_first_non_hole(struct inode *inode, u64 *start, u64 *len)
{
	struct extent_map *em;
	int ret = 0;

	em = btrfs_get_extent(inode, NULL, 0, *start, *len, 0);
	if (IS_ERR_OR_NULL(em)) {
		if (!em)
			ret = -ENOMEM;
		else
			ret = PTR_ERR(em);
		return ret;
	}

	/* Hole or vacuum extent(only exists in no-hole mode) */
	if (em->block_start == EXTENT_MAP_HOLE) {
		ret = 1;
		*len = em->start + em->len > *start + *len ?
		       0 : *start + *len - em->start - em->len;
		*start = em->start + em->len;
	}
	free_extent_map(em);
	return ret;
}
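The start/length adjustment inside find_first_non_hole() is easier to follow when separated from the extent-map lookup. The sketch below is a userspace model of that arithmetic only, under the assumption that the hole extent [em_start, em_start + em_len) covers *start; the name skip_hole and the plain uint64_t parameters are illustrative and not part of btrfs.

```c
#include <stdint.h>

/*
 * Userspace model of the arithmetic in find_first_non_hole(): given a hole
 * extent [em_start, em_start + em_len) that covers *start, move *start to
 * the end of the hole and shrink *len to whatever part of the original
 * range [*start, *start + *len) lies beyond the hole (0 if none does).
 */
void skip_hole(uint64_t em_start, uint64_t em_len,
	       uint64_t *start, uint64_t *len)
{
	uint64_t hole_end = em_start + em_len;
	uint64_t range_end = *start + *len;

	*len = hole_end > range_end ? 0 : range_end - hole_end;
	*start = hole_end;
}
```

For example, with a hole covering bytes 0..8191 and a request for 8192 bytes starting at 4096, the range becomes 4096 bytes starting at 8192; if the hole covers the whole request, the remaining length is 0.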
Re: Help with btrfs_zero_range function
On Fri, Aug 1, 2014 at 12:58 PM, Nick Krause <xerofo...@gmail.com> wrote:
> Please forget my other questions , seems the only work to make punch
> hole work for zero range is to make a function like the one I am
> pasting below for zero range and change the calls to punch range to
> zero range as the other parts of the function can be the same from my
> reading.
> Regards Nick
>
> static int find_first_non_hole(struct inode *inode, u64 *start, u64 *len)
> {
>	struct extent_map *em;
>	int ret = 0;
>
>	em = btrfs_get_extent(inode, NULL, 0, *start, *len, 0);
>	if (IS_ERR_OR_NULL(em)) {
>		if (!em)
>			ret = -ENOMEM;
>		else
>			ret = PTR_ERR(em);
>		return ret;
>	}
>
>	/* Hole or vacuum extent(only exists in no-hole mode) */
>	if (em->block_start == EXTENT_MAP_HOLE) {
>		ret = 1;
>		*len = em->start + em->len > *start + *len ?
>		       0 : *start + *len - em->start - em->len;
>		*start = em->start + em->len;
>	}
>	free_extent_map(em);
>	return ret;
> }

Sorry, I forgot that we also need a function to fill a zero range. I am
now adapting fill_holes as well; if someone can help me convert it, that
would be great. I will then change the calls and send out the function,
titled btrfs_zero_range as above, with calls to it in the fallocate
method for btrfs. I also need help converting the function that finds
the first hole.

Regards
Nick

static int fill_holes(struct btrfs_trans_handle *trans, struct inode *inode,
		      struct btrfs_path *path, u64 offset, u64 end)
{
	struct btrfs_root *root = BTRFS_I(inode)->root;
	struct extent_buffer *leaf;
	struct btrfs_file_extent_item *fi;
	struct extent_map *hole_em;
	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
	struct btrfs_key key;
	int ret;

	if (btrfs_fs_incompat(root->fs_info, NO_HOLES))
		goto out;

	key.objectid = btrfs_ino(inode);
	key.type = BTRFS_EXTENT_DATA_KEY;
	key.offset = offset;

	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
	if (ret < 0)
		return ret;
	BUG_ON(!ret);

	leaf = path->nodes[0];
	if (hole_mergeable(inode, leaf, path->slots[0]-1, offset, end)) {
		u64 num_bytes;

		path->slots[0]--;
		fi = btrfs_item_ptr(leaf, path->slots[0],
				    struct btrfs_file_extent_item);
		num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
			end - offset;
		btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
		btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
		btrfs_set_file_extent_offset(leaf, fi, 0);
		btrfs_mark_buffer_dirty(leaf);
		goto out;
	}

	if (hole_mergeable(inode, leaf, path->slots[0]+1, offset, end)) {
		u64 num_bytes;

		path->slots[0]++;
		key.offset = offset;
		btrfs_set_item_key_safe(root, path, &key);
		fi = btrfs_item_ptr(leaf, path->slots[0],
				    struct btrfs_file_extent_item);
		num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
			end - offset;
		btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
		btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
		btrfs_set_file_extent_offset(leaf, fi, 0);
		btrfs_mark_buffer_dirty(leaf);
		goto out;
	}
	btrfs_release_path(path);

	ret = btrfs_insert_file_extent(trans, root, btrfs_ino(inode), offset,
				       0, 0, end - offset, 0, end - offset,
				       0, 0, 0);
	if (ret)
		return ret;

out:
	btrfs_release_path(path);

	hole_em = alloc_extent_map();
	if (!hole_em) {
		btrfs_drop_extent_cache(inode, offset, end - 1, 0);
		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
			&BTRFS_I(inode)->runtime_flags);
	} else {
		hole_em->start = offset;
		hole_em->len = end - offset;
		hole_em->ram_bytes = hole_em->len;
		hole_em->orig_start = offset;

		hole_em->block_start = EXTENT_MAP_HOLE;
		hole_em->block_len = 0;
		hole_em->orig_block_len = 0;
		hole_em->bdev = root->fs_info->fs_devices->latest_bdev;
		hole_em->compress_type = BTRFS_COMPRESS_NONE;
		hole_em->generation = trans->transid;

		do {
			btrfs_drop_extent_cache(inode, offset, end - 1, 0);
			write_lock(&em_tree->lock);
			ret = add_extent_mapping(em_tree, hole_em, 1);
			write_unlock(&em_tree->lock);
		} while (ret == -EEXIST);
		free_extent_map(hole_em);
		if (ret)
			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
				&BTRFS_I(inode)->runtime_flags);
	}

	return 0;
}
Re: Btrfs offline deduplication
On Fri, Aug 01, 2014 at 10:16:08AM -0400, Austin S Hemmelgarn wrote:
> On 2014-08-01 09:23, David Sterba wrote:
>> On Fri, Aug 01, 2014 at 06:17:44AM -0400, Austin S Hemmelgarn wrote:
>>> I do think however that having the option of a background thread
>>> doing deduplication asynchronously is a good idea, but then you
>>> would have to have some way to trigger it on individual
>>> files/trees, and triggering on writes like the autodefrag thread
>>> does doesn't make much sense. Having some userspace program to
>>> tell it to run on a given set of files would probably be the best
>>> approach for a trigger. I don't remember if this kind of thing was
>>> also included in the online deduplication patches that got posted
>>> a while back or not.
>>
>> IIRC the proposed implementation only merged new writes with
>> existing data.
>>
>> For the out-of-band (off-line) dedup there's bedup
>> (https://github.com/g2p/bedup) or Mark's duperemove tool
>> (https://github.com/markfasheh/duperemove) that work on a set of
>> files.
>
> Something kernel-side to do the work asynchronously would be nice,
> especially if it could leverage the checksums that BTRFS already
> stores for the blocks. Having a userspace interface for offline
> deduplication similar to that for scrub operations would be even
> better.

Why does this have to be kernel side? There's userspace software already
to dedupe that can be run on a regular basis. Exporting checksums is a
different story (you can do that via ioctl) but running the dedupe
software itself inside the kernel is exactly what we want to avoid by
having the dedupe ioctl in the first place.
	--Mark

--
Mark Fasheh
Re: Btrfs offline deduplication
On 08/01/2014 02:55 PM, Mark Fasheh wrote:
> On Fri, Aug 01, 2014 at 10:16:08AM -0400, Austin S Hemmelgarn wrote:
>> On 2014-08-01 09:23, David Sterba wrote:
>>> On Fri, Aug 01, 2014 at 06:17:44AM -0400, Austin S Hemmelgarn wrote:
>>>> I do think however that having the option of a background thread
>>>> doing deduplication asynchronously is a good idea, but then you
>>>> would have to have some way to trigger it on individual
>>>> files/trees, and triggering on writes like the autodefrag thread
>>>> does doesn't make much sense. Having some userspace program to
>>>> tell it to run on a given set of files would probably be the best
>>>> approach for a trigger. I don't remember if this kind of thing
>>>> was also included in the online deduplication patches that got
>>>> posted a while back or not.
>>>
>>> IIRC the proposed implementation only merged new writes with
>>> existing data.
>>>
>>> For the out-of-band (off-line) dedup there's bedup
>>> (https://github.com/g2p/bedup) or Mark's duperemove tool
>>> (https://github.com/markfasheh/duperemove) that work on a set of
>>> files.
>>
>> Something kernel-side to do the work asynchronously would be nice,
>> especially if it could leverage the checksums that BTRFS already
>> stores for the blocks. Having a userspace interface for offline
>> deduplication similar to that for scrub operations would be even
>> better.
>
> Why does this have to be kernel side? There's userspace software
> already to dedupe that can be run on a regular basis. Exporting
> checksums is a different story (you can do that via ioctl) but
> running the dedupe software itself inside the kernel is exactly what
> we want to avoid by having the dedupe ioctl in the first place.
>	--Mark
>
> --
> Mark Fasheh

Based on the same logic however, we don't need scrub to be done kernel
side, as it wouldn't take but one more ioctl to be able to tell it which
block out of a set to treat as valid. I'm not saying that things need to
be done in the kernel, but duperemove doesn't use the ioctl interface
even if it exists, and bedup is buggy as hell (unless it's improved
greatly in the last two weeks), and neither of them is at all efficient.

I do understand that this isn't something that is computationally simple
(especially on x86 with its deficiency of registers), but rsync does
almost the same thing for data transmission over the network, and it
does so seemingly much more efficiently than either option available at
the moment.
Re: Btrfs offline deduplication
On Fri, Aug 01, 2014 at 03:18:46PM -0400, Austin S Hemmelgarn wrote:
>> Why does this have to be kernel side? There's userspace software
>> already to dedupe that can be run on a regular basis. Exporting
>> checksums is a different story (you can do that via ioctl) but
>> running the dedupe software itself inside the kernel is exactly
>> what we want to avoid by having the dedupe ioctl in the first place.
>>	--Mark
>>
>> --
>> Mark Fasheh
>
> Based on the same logic however, we don't need scrub to be done
> kernel side, as it wouldn't take but one more ioctl to be able to
> tell it which block out of a set to treat as valid. I'm not saying
> that things need to be done in the kernel, but duperemove doesn't use
> the ioctl interface even if it exists, and bedup is buggy as hell
> (unless it's improved greatly in the last two weeks), and neither of
> them is at all efficient.

Duperemove absolutely *does* use the ioctl interface for offline dedupe.

> I do understand that this isn't something that is computationally
> simple (especially on x86 with its deficiency of registers), but
> rsync does almost the same thing for data transmission over the
> network, and it does so seemingly much more efficiently than either
> option available at the moment.

None of the problems you mentioned get solved by pushing the entirety
of offline deduplication into the kernel. If anything, it's more
dangerous to do that as bugs tend to be far more critical when we hit
them from kernel.

Regarding duperemove there's a series to fix up some performance issues
that I'm working on importing at the moment.
	--Mark

--
Mark Fasheh
[PATCH 09/12] btrfs: factor btrfs_setup_super() out of open_ctree()
Move all the superblock flag and geometry testing and fiddling
into its own function.

This does coalesce some far-flung tests, but it ... looks ok to me.

Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c | 207
 1 files changed, 112 insertions(+), 95 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 47fcacf..31e9791 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2209,6 +2209,104 @@ static void btrfs_qgroup_init(struct btrfs_fs_info *fs_info)
 	mutex_init(&fs_info->qgroup_rescan_lock);
 }
 
+/*
+ * Test geometry and feature flags at mount time
+ */
+static int btrfs_setup_super(struct super_block *sb,
+			     struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_super_block *disk_super = fs_info->super_copy;
+	u32 nodesize = btrfs_super_nodesize(disk_super);
+	u32 leafsize = btrfs_super_leafsize(disk_super);
+	u32 sectorsize = btrfs_super_sectorsize(disk_super);
+	u64 features;
+
+
+	/* First sanity check magic & sizes */
+	if (btrfs_super_magic(disk_super) != BTRFS_MAGIC) {
+		printk(KERN_INFO "BTRFS: valid FS not found on %s\n", sb->s_id);
+		return -EINVAL;
+	}
+
+	if (leafsize != nodesize) {
+		printk(KERN_ERR "BTRFS: couldn't mount because metadata "
+		       "blocksizes don't match. node %d leaf %d\n",
+		       nodesize, leafsize);
+		return -EINVAL;
+	}
+
+	if (leafsize > BTRFS_MAX_METADATA_BLOCKSIZE) {
+		printk(KERN_ERR "BTRFS: couldn't mount because metadata "
+		       "blocksize (%d) was too large\n", leafsize);
+		return -EINVAL;
+	}
+
+	if (sectorsize != PAGE_SIZE) {
+		printk(KERN_WARNING "BTRFS: Incompatible sector size(%lu) "
+		       "found on %s\n", (unsigned long)sectorsize, sb->s_id);
+		return -EINVAL;
+	}
+
+	/* check FS state, whether FS is broken. */
+	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_ERROR)
+		set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
+
+	features = btrfs_super_incompat_flags(disk_super);
+	if (features & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
+		printk(KERN_ERR "BTRFS: couldn't mount because of "
+		       "unsupported optional features (%Lx).\n",
+		       features & ~BTRFS_FEATURE_INCOMPAT_SUPP);
+		return -EINVAL;
+	}
+
+	features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
+	/* Original LZO commit didn't set incompat flag when mounted :( */
+	if (fs_info->compress_type == BTRFS_COMPRESS_LZO)
+		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+
+	if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
+		printk(KERN_ERR "BTRFS: has skinny extents\n");
+
+	/*
+	 * flag our filesystem as having big metadata blocks if
+	 * they are bigger than the page size
+	 */
+	if (leafsize > PAGE_CACHE_SIZE) {
+		if (!(features & BTRFS_FEATURE_INCOMPAT_BIG_METADATA))
+			printk(KERN_INFO "BTRFS: flagging fs with big metadata feature\n");
+		features |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
+	}
+
+	/*
+	 * mixed block groups end up with duplicate but slightly offset
+	 * extent buffers for the same range. It leads to corruptions
+	 */
+	if ((features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
+	    (sectorsize != leafsize)) {
+		printk(KERN_WARNING "BTRFS: unequal leaf/node/sector sizes "
+		       "are not allowed for mixed block groups on %s\n",
+		       sb->s_id);
+		return -EINVAL;
+	}
+
+	/*
+	 * Needn't use the lock because there is no other task which will
+	 * update the flag.
+	 */
+	btrfs_set_super_incompat_flags(disk_super, features);
+
+	features = btrfs_super_compat_ro_flags(disk_super) &
+		~BTRFS_FEATURE_COMPAT_RO_SUPP;
+	if (!(sb->s_flags & MS_RDONLY) && features) {
+		printk(KERN_ERR "BTRFS: couldn't mount RDWR because of "
+		       "unsupported option features (%Lx).\n",
+		       features);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 int open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options)
@@ -2219,7 +2317,6 @@ int open_ctree(struct super_block *sb,
 	u32 blocksize;
 	u32 stripesize;
 	u64 generation;
-	u64 features;
 	struct btrfs_key location;
 	struct buffer_head *bh;
 	struct btrfs_super_block *disk_super;
@@ -2458,10 +2555,6 @@ int open_ctree(struct super_block *sb,
 	if (!btrfs_super_root(disk_super))
 		goto fail_alloc;
 
-	/* check FS state,
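The two feature checks in btrfs_setup_super() follow one masking idiom: AND the on-disk flags with the complement of the supported set, and treat any surviving bit as unknown. The sketch below models that idiom in userspace. The flag names and values (FEAT_*, RO_FEAT_*) and the function name check_features are made up for illustration and do not correspond to btrfs's real feature bits.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_A		(1ULL << 0)
#define FEAT_B		(1ULL << 1)
#define FEAT_C		(1ULL << 2)	/* not understood by this "implementation" */
#define RO_FEAT_X	(1ULL << 0)	/* likewise not understood */

#define INCOMPAT_SUPP	(FEAT_A | FEAT_B)
#define COMPAT_RO_SUPP	0ULL

/*
 * Model of the mount-time policy: an unknown incompat bit always refuses
 * the mount, while an unknown compat_ro bit only refuses a read-write mount.
 * Returns 0 if the mount may proceed, -EINVAL otherwise.
 */
int check_features(uint64_t incompat, uint64_t compat_ro, int read_only)
{
	if (incompat & ~INCOMPAT_SUPP)
		return -EINVAL;

	if (!read_only && (compat_ro & ~COMPAT_RO_SUPP))
		return -EINVAL;

	return 0;
}
```

With these made-up values, check_features(FEAT_C, 0, 1) is refused even read-only, whereas check_features(0, RO_FEAT_X, 1) is allowed and only the read-write variant fails.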
[PATCH 10/12] btrfs: factor btrfs_alloc_workqueues() out of open_ctree()
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c | 144
 1 files changed, 77 insertions(+), 67 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 31e9791..0465d43 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2307,6 +2307,80 @@ static int btrfs_setup_super(struct super_block *sb,
 	return 0;
 }
 
+static int btrfs_alloc_workqueues(struct btrfs_fs_info *fs_info,
+				  struct btrfs_fs_devices *fs_devices)
+{
+	int max_active = fs_info->thread_pool_size;
+	int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
+
+	fs_info->workers =
+		btrfs_alloc_workqueue("worker", flags | WQ_HIGHPRI,
+				      max_active, 16);
+	fs_info->delalloc_workers =
+		btrfs_alloc_workqueue("delalloc", flags, max_active, 2);
+
+	fs_info->flush_workers =
+		btrfs_alloc_workqueue("flush_delalloc", flags, max_active, 0);
+
+	fs_info->caching_workers =
+		btrfs_alloc_workqueue("cache", flags, max_active, 0);
+
+	/*
+	 * a higher idle thresh on the submit workers makes it much more
+	 * likely that bios will be send down in a sane order to the
+	 * devices
+	 */
+	fs_info->submit_workers =
+		btrfs_alloc_workqueue("submit", flags,
+				      min_t(u64, fs_devices->num_devices,
+					    max_active), 64);
+	fs_info->fixup_workers =
+		btrfs_alloc_workqueue("fixup", flags, 1, 0);
+	/*
+	 * endios are largely parallel and should have a very
+	 * low idle thresh
+	 */
+	fs_info->endio_workers =
+		btrfs_alloc_workqueue("endio", flags, max_active, 4);
+	fs_info->endio_meta_workers =
+		btrfs_alloc_workqueue("endio-meta", flags, max_active, 4);
+	fs_info->endio_meta_write_workers =
+		btrfs_alloc_workqueue("endio-meta-write", flags, max_active, 2);
+	fs_info->endio_raid56_workers =
+		btrfs_alloc_workqueue("endio-raid56", flags, max_active, 4);
+	fs_info->rmw_workers =
+		btrfs_alloc_workqueue("rmw", flags, max_active, 2);
+	fs_info->endio_write_workers =
+		btrfs_alloc_workqueue("endio-write", flags, max_active, 2);
+	fs_info->endio_freespace_worker =
+		btrfs_alloc_workqueue("freespace-write", flags, max_active, 0);
+	fs_info->delayed_workers =
+		btrfs_alloc_workqueue("delayed-meta", flags, max_active, 0);
+	fs_info->readahead_workers =
+		btrfs_alloc_workqueue("readahead", flags, max_active, 2);
+	fs_info->qgroup_rescan_workers =
+		btrfs_alloc_workqueue("qgroup-rescan", flags, 1, 0);
+	fs_info->extent_workers =
+		btrfs_alloc_workqueue("extent-refs", flags,
+				      min_t(u64, fs_devices->num_devices,
+					    max_active), 8);
+
+	if (!(fs_info->workers && fs_info->delalloc_workers &&
+	      fs_info->submit_workers && fs_info->flush_workers &&
+	      fs_info->endio_workers && fs_info->endio_meta_workers &&
+	      fs_info->endio_meta_write_workers &&
+	      fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
+	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
+	      fs_info->caching_workers && fs_info->readahead_workers &&
+	      fs_info->fixup_workers && fs_info->delayed_workers &&
+	      fs_info->fixup_workers && fs_info->extent_workers &&
+	      fs_info->qgroup_rescan_workers)) {
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 int open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options)
@@ -2334,7 +2408,6 @@ int open_ctree(struct super_block *sb,
 	int num_backups_tried = 0;
 	int backup_index = 0;
 	int max_active;
-	int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
 	bool create_uuid_tree;
 	bool check_uuid_tree;
 
@@ -2582,72 +2655,9 @@ int open_ctree(struct super_block *sb,
 
 	max_active = fs_info->thread_pool_size;
 
-	fs_info->workers =
-		btrfs_alloc_workqueue("worker", flags | WQ_HIGHPRI,
-				      max_active, 16);
-
-	fs_info->delalloc_workers =
-		btrfs_alloc_workqueue("delalloc", flags, max_active, 2);
-
-	fs_info->flush_workers =
-		btrfs_alloc_workqueue("flush_delalloc", flags, max_active, 0);
-
-	fs_info->caching_workers =
-		btrfs_alloc_workqueue("cache", flags, max_active, 0);
-
-	/*
-	 * a higher idle thresh on the submit workers makes it much more
-	 * likely that bios will be send down in a sane order to the
-	 * devices
-	 */
-	fs_info->submit_workers =
-		btrfs_alloc_workqueue("submit",
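btrfs_alloc_workqueues() allocates every queue unconditionally and then tests all the pointers in a single expression, leaving teardown to the caller's error path. A userspace sketch of the same shape is shown below. It assumes malloc() as a stand-in for btrfs_alloc_workqueue(), and the struct and function names (queues, alloc_queues, free_queues) are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the workqueue pointers kept in fs_info. */
struct queues {
	void *workers;
	void *delalloc;
	void *submit;
};

/*
 * Allocate everything first, then check all results at once. On failure
 * the successfully allocated members stay in place; as in the patch, it is
 * the caller's job to release them.
 */
int alloc_queues(struct queues *q, size_t size)
{
	q->workers = malloc(size);
	q->delalloc = malloc(size);
	q->submit = malloc(size);

	if (!(q->workers && q->delalloc && q->submit))
		return -ENOMEM;

	return 0;
}

/* Caller-side cleanup; free(NULL) is a no-op, so partial setups are fine. */
void free_queues(struct queues *q)
{
	free(q->workers);
	free(q->delalloc);
	free(q->submit);
	q->workers = q->delalloc = q->submit = NULL;
}
```

The design keeps the allocation list flat and readable at the cost of doing some allocations that will be thrown away when an early one has already failed.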
[PATCH 12/12] btrfs: factor btrfs_replay_log() out of open_ctree()
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c | 100 +---
 1 files changed, 56 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a50beca..ffb2f21 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2445,6 +2445,60 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+static int btrfs_replay_log(struct btrfs_fs_info *fs_info,
+			    struct btrfs_fs_devices *fs_devices)
+{
+	int ret;
+	u32 blocksize;
+	struct btrfs_root *tree_root = fs_info->tree_root;
+	struct btrfs_root *log_tree_root;
+	struct btrfs_super_block *disk_super = fs_info->super_copy;
+	u64 bytenr = btrfs_super_log_root(disk_super);
+
+	if (fs_devices->rw_devices == 0) {
+		printk(KERN_WARNING "BTRFS: log replay required "
+		       "on RO media\n");
+		return -EIO;
+	}
+	blocksize = btrfs_level_size(tree_root,
+				     btrfs_super_log_root_level(disk_super));
+
+	log_tree_root = btrfs_alloc_root(fs_info);
+	if (!log_tree_root)
+		return -ENOMEM;
+
+	__setup_root(tree_root->nodesize, tree_root->leafsize,
+		     tree_root->sectorsize, tree_root->stripesize,
+		     log_tree_root, fs_info, BTRFS_TREE_LOG_OBJECTID);
+
+	log_tree_root->node = read_tree_block(tree_root, bytenr,
+					      blocksize,
+					      fs_info->generation + 1);
+	if (!log_tree_root->node ||
+	    !extent_buffer_uptodate(log_tree_root->node)) {
+		printk(KERN_ERR "BTRFS: failed to read log tree\n");
+		free_extent_buffer(log_tree_root->node);
+		kfree(log_tree_root);
+		return -EIO;
+	}
+	/* returns with log_tree_root freed on success */
+	ret = btrfs_recover_log_trees(log_tree_root);
+	if (ret) {
+		btrfs_error(tree_root->fs_info, ret,
+			    "Failed to recover log tree");
+		free_extent_buffer(log_tree_root->node);
+		kfree(log_tree_root);
+		return ret;
+	}
+
+	if (fs_info->sb->s_flags & MS_RDONLY) {
+		ret = btrfs_commit_super(tree_root);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
 int open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options)
@@ -2461,7 +2515,6 @@ int open_ctree(struct super_block *sb,
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_root *tree_root;
 	struct btrfs_root *chunk_root;
-	struct btrfs_root *log_tree_root;
 	int ret;
 	int err = -EINVAL;
 	int num_backups_tried = 0;
@@ -2905,52 +2958,11 @@ retry_root_backup:
 
 	/* do not make disk changes in broken FS */
 	if (btrfs_super_log_root(disk_super) != 0) {
-		u64 bytenr = btrfs_super_log_root(disk_super);
-
-		if (fs_devices->rw_devices == 0) {
-			printk(KERN_WARNING "BTRFS: log replay required "
-			       "on RO media\n");
-			err = -EIO;
-			goto fail_qgroup;
-		}
-		blocksize =
-		     btrfs_level_size(tree_root,
-				      btrfs_super_log_root_level(disk_super));
-
-		log_tree_root = btrfs_alloc_root(fs_info);
-		if (!log_tree_root) {
-			err = -ENOMEM;
-			goto fail_qgroup;
-		}
-
-		__setup_root(nodesize, leafsize, sectorsize, stripesize,
-			     log_tree_root, fs_info, BTRFS_TREE_LOG_OBJECTID);
-
-		log_tree_root->node = read_tree_block(tree_root, bytenr,
-						      blocksize,
-						      generation + 1);
-		if (!log_tree_root->node ||
-		    !extent_buffer_uptodate(log_tree_root->node)) {
-			printk(KERN_ERR "BTRFS: failed to read log tree\n");
-			free_extent_buffer(log_tree_root->node);
-			kfree(log_tree_root);
-			goto fail_qgroup;
-		}
-		/* returns with log_tree_root freed on success */
-		ret = btrfs_recover_log_trees(log_tree_root);
+		ret = btrfs_replay_log(fs_info, fs_devices);
 		if (ret) {
-			btrfs_error(tree_root->fs_info, ret,
-				    "Failed to recover log tree");
-			free_extent_buffer(log_tree_root->node);
-			kfree(log_tree_root);
+			err = ret;
 			goto
[PATCH 04/12] btrfs: factor btrfs_scrub_init() out of open_ctree()
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c | 19 ---
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 28d35a8..b95635f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2134,6 +2134,17 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info)
 	}
 }
 
+static void btrfs_scrub_init(struct btrfs_fs_info *fs_info)
+{
+	mutex_init(&fs_info->scrub_lock);
+	atomic_set(&fs_info->scrubs_running, 0);
+	atomic_set(&fs_info->scrub_pause_req, 0);
+	atomic_set(&fs_info->scrubs_paused, 0);
+	atomic_set(&fs_info->scrub_cancel_req, 0);
+	init_waitqueue_head(&fs_info->scrub_pause_wait);
+	fs_info->scrub_workers_refcnt = 0;
+}
+
 int open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options)
@@ -2281,14 +2292,8 @@ int open_ctree(struct super_block *sb,
 	}
 	btrfs_init_delayed_root(fs_info->delayed_root);
 
-	mutex_init(&fs_info->scrub_lock);
-	atomic_set(&fs_info->scrubs_running, 0);
-	atomic_set(&fs_info->scrub_pause_req, 0);
-	atomic_set(&fs_info->scrubs_paused, 0);
-	atomic_set(&fs_info->scrub_cancel_req, 0);
+	btrfs_scrub_init(fs_info);
 	init_waitqueue_head(&fs_info->replace_wait);
-	init_waitqueue_head(&fs_info->scrub_pause_wait);
-	fs_info->scrub_workers_refcnt = 0;
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 	fs_info->check_integrity_print_mask = 0;
 #endif
--
1.7.1
[PATCH 02/12] btrfs: consistently use fs_info in close_ctree()
close_ctree() has a local fs_info var for convenience;
use it consistently.

Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6d7afd..e6746be 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3651,7 +3651,7 @@ int close_ctree(struct btrfs_root *root)
 	if (!(fs_info->sb->s_flags & MS_RDONLY)) {
 		ret = btrfs_commit_super(root);
 		if (ret)
-			btrfs_err(root->fs_info, "commit super ret %d", ret);
+			btrfs_err(fs_info, "commit super ret %d", ret);
 	}
 
 	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
@@ -3663,10 +3663,10 @@ int close_ctree(struct btrfs_root *root)
 	fs_info->closing = 2;
 	smp_mb();
 
-	btrfs_free_qgroup_config(root->fs_info);
+	btrfs_free_qgroup_config(fs_info);
 
 	if (percpu_counter_sum(&fs_info->delalloc_bytes)) {
-		btrfs_info(root->fs_info, "at unmount delalloc count %lld",
+		btrfs_info(fs_info, "at unmount delalloc count %lld",
 		       percpu_counter_sum(&fs_info->delalloc_bytes));
 	}
 
--
1.7.1
[PATCH 03/12] btrfs: handle errors from reading the quota tree root
Reading the quota tree root may fail with ENOENT
if there is no quota, which is fine, but the code was
ignoring every other error as well, which is not fine.

Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |    7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e6746be..28d35a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2733,7 +2733,12 @@ retry_root_backup:
 
 	location.objectid = BTRFS_QUOTA_TREE_OBJECTID;
 	quota_root = btrfs_read_tree_root(tree_root, &location);
-	if (!IS_ERR(quota_root)) {
+	if (IS_ERR(quota_root)) {
+		ret = PTR_ERR(quota_root);
+		/* It's fine to not have quotas */
+		if (ret != -ENOENT)
+			goto recovery_tree_root;
+	} else {
 		set_bit(BTRFS_ROOT_TRACK_DIRTY, &quota_root->state);
 		fs_info->quota_enabled = 1;
 		fs_info->pending_quota_state = 1;
--
1.7.1
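The fix relies on the kernel's error-pointer convention, in which a function returns either a valid pointer or a small negative errno value encoded in the pointer. A userspace sketch of that convention and of the "ENOENT is fine, everything else is an error" policy follows. The three helpers are simplified re-implementations written for this example in the spirit of include/linux/err.h, and handle_quota_root is a hypothetical name standing in for the hunk's logic.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error values occupy the top MAX_ERRNO addresses of the space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Model of the patched logic. Returns 1 if a quota root was found, 0 if it
 * is merely absent (-ENOENT), or the negative errno for any other failure,
 * which the real code turns into "goto recovery_tree_root".
 */
int handle_quota_root(void *quota_root)
{
	if (IS_ERR(quota_root)) {
		long ret = PTR_ERR(quota_root);

		/* It's fine to not have quotas */
		if (ret != -ENOENT)
			return (int)ret;
		return 0;
	}
	return 1;
}
```

Before the patch, the equivalent of handle_quota_root(ERR_PTR(-EIO)) would have been treated the same as a missing quota tree.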
[PATCH 01/12] btrfs: remove unused fs_info arg from btrfs_close_extra_devices()
The commit:

8dabb74 Btrfs: change core code of btrfs to support the device replace operations

added the fs_info argument, but never used it - just remove
it again.

Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |    4 ++--
 fs/btrfs/volumes.c |    3 +--
 fs/btrfs/volumes.h |    3 +--
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08e65e9..f6d7afd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2672,7 +2672,7 @@ int open_ctree(struct super_block *sb,
 	 * keep the device that is marked to be the target device for the
 	 * dev_replace procedure
 	 */
-	btrfs_close_extra_devices(fs_info, fs_devices, 0);
+	btrfs_close_extra_devices(fs_devices, 0);
 
 	if (!fs_devices->latest_bdev) {
 		printk(KERN_CRIT "BTRFS: failed to read devices on %s\n",
@@ -2778,7 +2778,7 @@ retry_root_backup:
 		goto fail_block_groups;
 	}
 
-	btrfs_close_extra_devices(fs_info, fs_devices, 1);
+	btrfs_close_extra_devices(fs_devices, 1);
 
 	ret = btrfs_sysfs_add_one(fs_info);
 	if (ret) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6cb82f6..b5aa0c9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -574,8 +574,7 @@ error:
 	return ERR_PTR(-ENOMEM);
 }
 
-void btrfs_close_extra_devices(struct btrfs_fs_info *fs_info,
-			       struct btrfs_fs_devices *fs_devices, int step)
+void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
 {
 	struct btrfs_device *device, *next;
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2aaa00c..2026741 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -307,8 +307,7 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 			  struct btrfs_fs_devices **fs_devices_ret);
 int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
-void btrfs_close_extra_devices(struct btrfs_fs_info *fs_info,
-			       struct btrfs_fs_devices *fs_devices, int step);
+void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step);
 int btrfs_find_device_missing_or_by_path(struct btrfs_root *root,
 					 char *device_path,
 					 struct btrfs_device **device);
--
1.7.1
[PATCH 05/12] btrfs: factor btrfs_balance_init() out of open_ctree()
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c | 20
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b95635f..8c7113b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2145,6 +2145,17 @@ static void btrfs_scrub_init(struct btrfs_fs_info *fs_info)
 	fs_info->scrub_workers_refcnt = 0;
 }
 
+static void btrfs_balance_init(struct btrfs_fs_info *fs_info)
+{
+	spin_lock_init(&fs_info->balance_lock);
+	mutex_init(&fs_info->balance_mutex);
+	atomic_set(&fs_info->balance_running, 0);
+	atomic_set(&fs_info->balance_pause_req, 0);
+	atomic_set(&fs_info->balance_cancel_req, 0);
+	fs_info->balance_ctl = NULL;
+	init_waitqueue_head(&fs_info->balance_wait_q);
+}
+
 int open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options)
@@ -2297,14 +2308,7 @@ int open_ctree(struct super_block *sb,
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 	fs_info->check_integrity_print_mask = 0;
 #endif
-
-	spin_lock_init(&fs_info->balance_lock);
-	mutex_init(&fs_info->balance_mutex);
-	atomic_set(&fs_info->balance_running, 0);
-	atomic_set(&fs_info->balance_pause_req, 0);
-	atomic_set(&fs_info->balance_cancel_req, 0);
-	fs_info->balance_ctl = NULL;
-	init_waitqueue_head(&fs_info->balance_wait_q);
+	btrfs_balance_init(fs_info);
 	btrfs_init_async_reclaim_work(&fs_info->async_reclaim_work);
 
 	sb->s_blocksize = 4096;
--
1.7.1
[PATCH 08/12] btrfs: factor btrfs_qgroup_init() out of open_ctree()
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |   26 +++++++++++++++-----------
 1 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a5fa84f..47fcacf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2195,6 +2195,20 @@ static void btrfs_dev_replace_locks_init(struct btrfs_fs_info *fs_info)
 	mutex_init(&fs_info->dev_replace.lock);
 }
 
+static void btrfs_qgroup_init(struct btrfs_fs_info *fs_info)
+{
+	spin_lock_init(&fs_info->qgroup_lock);
+	mutex_init(&fs_info->qgroup_ioctl_lock);
+	fs_info->qgroup_tree = RB_ROOT;
+	fs_info->qgroup_op_tree = RB_ROOT;
+	INIT_LIST_HEAD(&fs_info->dirty_qgroups);
+	fs_info->qgroup_seq = 1;
+	fs_info->quota_enabled = 0;
+	fs_info->pending_quota_state = 0;
+	fs_info->qgroup_ulist = NULL;
+	mutex_init(&fs_info->qgroup_rescan_lock);
+}
+
 int open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options)
@@ -2381,17 +2395,7 @@ int open_ctree(struct super_block *sb,
 	sema_init(&fs_info->uuid_tree_rescan_sem, 1);
 
 	btrfs_dev_replace_locks_init(fs_info);
-
-	spin_lock_init(&fs_info->qgroup_lock);
-	mutex_init(&fs_info->qgroup_ioctl_lock);
-	fs_info->qgroup_tree = RB_ROOT;
-	fs_info->qgroup_op_tree = RB_ROOT;
-	INIT_LIST_HEAD(&fs_info->dirty_qgroups);
-	fs_info->qgroup_seq = 1;
-	fs_info->quota_enabled = 0;
-	fs_info->pending_quota_state = 0;
-	fs_info->qgroup_ulist = NULL;
-	mutex_init(&fs_info->qgroup_rescan_lock);
+	btrfs_qgroup_init(fs_info);
 
 	btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
 	btrfs_init_free_cluster(&fs_info->data_alloc_cluster);
-- 
1.7.1
[PATCH 07/12] btrfs: factor btrfs_dev_replace_locks_init() out of open_ctree()
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6636386..a5fa84f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2186,6 +2186,15 @@ static void btrfs_btree_inode_init(struct btrfs_fs_info *fs_info,
 	btrfs_insert_inode_hash(fs_info->btree_inode);
 }
 
+static void btrfs_dev_replace_locks_init(struct btrfs_fs_info *fs_info)
+{
+	fs_info->dev_replace.lock_owner = 0;
+	atomic_set(&fs_info->dev_replace.nesting_level, 0);
+	mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
+	mutex_init(&fs_info->dev_replace.lock_management_lock);
+	mutex_init(&fs_info->dev_replace.lock);
+}
+
 int open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options)
@@ -2370,11 +2379,8 @@ int open_ctree(struct super_block *sb,
 	init_rwsem(&fs_info->cleanup_work_sem);
 	init_rwsem(&fs_info->subvol_sem);
 	sema_init(&fs_info->uuid_tree_rescan_sem, 1);
-	fs_info->dev_replace.lock_owner = 0;
-	atomic_set(&fs_info->dev_replace.nesting_level, 0);
-	mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
-	mutex_init(&fs_info->dev_replace.lock_management_lock);
-	mutex_init(&fs_info->dev_replace.lock);
+
+	btrfs_dev_replace_locks_init(fs_info);
 
 	spin_lock_init(&fs_info->qgroup_lock);
 	mutex_init(&fs_info->qgroup_ioctl_lock);
-- 
1.7.1
[PATCH 00/12] disk-io.c / open_ctree cleanup & refactoring
This is mostly to refactor open_ctree(); at the end of the series it's only around 600 lines instead of 900.

The first 2 patches are just little cleanups I saw while doing this; the 3rd actually is something of a bugfix.

The rest are refactoring - this is a bit of an RFC still; some seem like clear groups of code to move out of the way, others are a bit more gratuitous. Perhaps after these 300 lines are moved out of the way, folks who are familiar with the code can spot other reasonable groupings or functionality which could also be factored out. There are still large swaths of random initializations; I thought about btrfs_initialize_locks_and_stuff() but decided against it. :)

Anyway, it builds and passes default xfstests -g auto runs, so it can't be all bad.

Let me know what you think. Different function names might be better, better symmetry with close_ctree() might be good, but it's a start.

Thanks,
-Eric
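For readers skimming the series, the shape of the refactoring can be sketched in plain userspace C. This is only an illustration of the "group related initializations into a static helper" pattern, not code from the patches: struct demo_fs_info, demo_balance_init(), demo_qgroup_init() and demo_open_ctree() are hypothetical stand-ins, and C11 atomics replace the kernel's atomic_t.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct btrfs_fs_info. */
struct demo_fs_info {
	atomic_int balance_running;
	atomic_int balance_pause_req;
	void *balance_ctl;

	unsigned long qgroup_seq;
	int quota_enabled;
};

/* One helper per logical group of fields, mirroring the series' approach. */
static void demo_balance_init(struct demo_fs_info *info)
{
	atomic_init(&info->balance_running, 0);
	atomic_init(&info->balance_pause_req, 0);
	info->balance_ctl = NULL;
}

static void demo_qgroup_init(struct demo_fs_info *info)
{
	info->qgroup_seq = 1;
	info->quota_enabled = 0;
}

/*
 * Stand-in for the big setup function: after the refactoring it reads
 * as a list of named steps instead of an inline run of assignments.
 */
static int demo_open_ctree(struct demo_fs_info *info)
{
	if (!info)
		return -EINVAL;

	demo_balance_init(info);
	demo_qgroup_init(info);
	return 0;
}
```

The helpers stay static and take only the structure they initialize, so the caller's behaviour should be unchanged as long as the order of the calls matches the order of the original inline code.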
[PATCH 06/12] btrfs: factor btrfs_btree_inode_init() out of open_ctree()
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |   56
 1 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8c7113b..6636386 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2156,6 +2156,36 @@ static void btrfs_balance_init(struct btrfs_fs_info *fs_info)
 	init_waitqueue_head(&fs_info->balance_wait_q);
 }
 
+static void btrfs_btree_inode_init(struct btrfs_fs_info *fs_info,
+				   struct btrfs_root *tree_root)
+{
+	fs_info->btree_inode->i_ino = BTRFS_BTREE_INODE_OBJECTID;
+	set_nlink(fs_info->btree_inode, 1);
+	/*
+	 * we set the i_size on the btree inode to the max possible int.
+	 * the real end of the address space is determined by all of
+	 * the devices in the system
+	 */
+	fs_info->btree_inode->i_size = OFFSET_MAX;
+	fs_info->btree_inode->i_mapping->a_ops = &btree_aops;
+	fs_info->btree_inode->i_mapping->backing_dev_info = &fs_info->bdi;
+
+	RB_CLEAR_NODE(&BTRFS_I(fs_info->btree_inode)->rb_node);
+	extent_io_tree_init(&BTRFS_I(fs_info->btree_inode)->io_tree,
+			     fs_info->btree_inode->i_mapping);
+	BTRFS_I(fs_info->btree_inode)->io_tree.track_uptodate = 0;
+	extent_map_tree_init(&BTRFS_I(fs_info->btree_inode)->extent_tree);
+
+	BTRFS_I(fs_info->btree_inode)->io_tree.ops = &btree_extent_io_ops;
+
+	BTRFS_I(fs_info->btree_inode)->root = tree_root;
+	memset(&BTRFS_I(fs_info->btree_inode)->location, 0,
+	       sizeof(struct btrfs_key));
+	set_bit(BTRFS_INODE_DUMMY,
+		&BTRFS_I(fs_info->btree_inode)->runtime_flags);
+	btrfs_insert_inode_hash(fs_info->btree_inode);
+}
+
 int open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options)
@@ -2315,31 +2345,7 @@ int open_ctree(struct super_block *sb,
 	sb->s_blocksize_bits = blksize_bits(4096);
 	sb->s_bdi = &fs_info->bdi;
 
-	fs_info->btree_inode->i_ino = BTRFS_BTREE_INODE_OBJECTID;
-	set_nlink(fs_info->btree_inode, 1);
-	/*
-	 * we set the i_size on the btree inode to the max possible int.
-	 * the real end of the address space is determined by all of
-	 * the devices in the system
-	 */
-	fs_info->btree_inode->i_size = OFFSET_MAX;
-	fs_info->btree_inode->i_mapping->a_ops = &btree_aops;
-	fs_info->btree_inode->i_mapping->backing_dev_info = &fs_info->bdi;
-
-	RB_CLEAR_NODE(&BTRFS_I(fs_info->btree_inode)->rb_node);
-	extent_io_tree_init(&BTRFS_I(fs_info->btree_inode)->io_tree,
-			     fs_info->btree_inode->i_mapping);
-	BTRFS_I(fs_info->btree_inode)->io_tree.track_uptodate = 0;
-	extent_map_tree_init(&BTRFS_I(fs_info->btree_inode)->extent_tree);
-
-	BTRFS_I(fs_info->btree_inode)->io_tree.ops = &btree_extent_io_ops;
-
-	BTRFS_I(fs_info->btree_inode)->root = tree_root;
-	memset(&BTRFS_I(fs_info->btree_inode)->location, 0,
-	       sizeof(struct btrfs_key));
-	set_bit(BTRFS_INODE_DUMMY,
-		&BTRFS_I(fs_info->btree_inode)->runtime_flags);
-	btrfs_insert_inode_hash(fs_info->btree_inode);
+	btrfs_btree_inode_init(fs_info, tree_root);
 
 	spin_lock_init(&fs_info->block_group_cache_lock);
 	fs_info->block_group_cache_tree = RB_ROOT;
-- 
1.7.1
[PATCH 11/12] btrfs: factor btrfs_read_roots() out of open_ctree()
Also, remove the two local variables create_uuid_tree and
check_uuid_tree; we can use the existence of the uuid root
and/or the RESCAN_UUID_TREE flag to determine what action
to take.

Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |  141
 1 files changed, 71 insertions(+), 70 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0465d43..a50beca 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2381,6 +2381,70 @@ static int btrfs_alloc_workqueues(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+static int btrfs_read_roots(struct btrfs_fs_info *fs_info,
+			    struct btrfs_root *tree_root)
+{
+	struct btrfs_root *extent_root;
+	struct btrfs_root *dev_root;
+	struct btrfs_root *csum_root;
+	struct btrfs_root *quota_root;
+	struct btrfs_root *uuid_root;
+	struct btrfs_key location;
+	int ret;
+
+	location.objectid = BTRFS_EXTENT_TREE_OBJECTID;
+	location.type = BTRFS_ROOT_ITEM_KEY;
+	location.offset = 0;
+
+	extent_root = btrfs_read_tree_root(tree_root, &location);
+	if (IS_ERR(extent_root))
+		return PTR_ERR(extent_root);
+	set_bit(BTRFS_ROOT_TRACK_DIRTY, &extent_root->state);
+	fs_info->extent_root = extent_root;
+
+	location.objectid = BTRFS_DEV_TREE_OBJECTID;
+	dev_root = btrfs_read_tree_root(tree_root, &location);
+	if (IS_ERR(dev_root))
+		return PTR_ERR(dev_root);
+	set_bit(BTRFS_ROOT_TRACK_DIRTY, &dev_root->state);
+	fs_info->dev_root = dev_root;
+	btrfs_init_devices_late(fs_info);
+
+	location.objectid = BTRFS_CSUM_TREE_OBJECTID;
+	csum_root = btrfs_read_tree_root(tree_root, &location);
+	if (IS_ERR(csum_root))
+		return PTR_ERR(csum_root);
+	set_bit(BTRFS_ROOT_TRACK_DIRTY, &csum_root->state);
+	fs_info->csum_root = csum_root;
+
+	location.objectid = BTRFS_QUOTA_TREE_OBJECTID;
+	quota_root = btrfs_read_tree_root(tree_root, &location);
+	if (IS_ERR(quota_root)) {
+		ret = PTR_ERR(quota_root);
+		/* It's fine to not have quotas */
+		if (ret != -ENOENT)
+			return ret;
+	} else {
+		set_bit(BTRFS_ROOT_TRACK_DIRTY, &quota_root->state);
+		fs_info->quota_enabled = 1;
+		fs_info->pending_quota_state = 1;
+		fs_info->quota_root = quota_root;
+	}
+
+	location.objectid = BTRFS_UUID_TREE_OBJECTID;
+	uuid_root = btrfs_read_tree_root(tree_root, &location);
+	if (IS_ERR(uuid_root)) {
+		ret = PTR_ERR(uuid_root);
+		if (ret != -ENOENT)
+			return ret;
+	} else {
+		set_bit(BTRFS_ROOT_TRACK_DIRTY, &uuid_root->state);
+		fs_info->uuid_root = uuid_root;
+	}
+
+	return 0;
+}
+
 int open_ctree(struct super_block *sb,
 	       struct btrfs_fs_devices *fs_devices,
 	       char *options)
@@ -2396,20 +2460,13 @@ int open_ctree(struct super_block *sb,
 	struct btrfs_super_block *disk_super;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 	struct btrfs_root *tree_root;
-	struct btrfs_root *extent_root;
-	struct btrfs_root *csum_root;
 	struct btrfs_root *chunk_root;
-	struct btrfs_root *dev_root;
-	struct btrfs_root *quota_root;
-	struct btrfs_root *uuid_root;
 	struct btrfs_root *log_tree_root;
 	int ret;
 	int err = -EINVAL;
 	int num_backups_tried = 0;
 	int backup_index = 0;
 	int max_active;
-	bool create_uuid_tree;
-	bool check_uuid_tree;
 
 	tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info);
 	chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info);
@@ -2752,66 +2809,9 @@ retry_root_backup:
 	tree_root->commit_root = btrfs_root_node(tree_root);
 	btrfs_set_root_refs(&tree_root->root_item, 1);
 
-	location.objectid = BTRFS_EXTENT_TREE_OBJECTID;
-	location.type = BTRFS_ROOT_ITEM_KEY;
-	location.offset = 0;
-
-	extent_root = btrfs_read_tree_root(tree_root, &location);
-	if (IS_ERR(extent_root)) {
-		ret = PTR_ERR(extent_root);
-		goto recovery_tree_root;
-	}
-	set_bit(BTRFS_ROOT_TRACK_DIRTY, &extent_root->state);
-	fs_info->extent_root = extent_root;
-
-	location.objectid = BTRFS_DEV_TREE_OBJECTID;
-	dev_root = btrfs_read_tree_root(tree_root, &location);
-	if (IS_ERR(dev_root)) {
-		ret = PTR_ERR(dev_root);
-		goto recovery_tree_root;
-	}
-	set_bit(BTRFS_ROOT_TRACK_DIRTY, &dev_root->state);
-	fs_info->dev_root = dev_root;
-	btrfs_init_devices_late(fs_info);
-
-	location.objectid = BTRFS_CSUM_TREE_OBJECTID;
-	csum_root = btrfs_read_tree_root(tree_root, &location);
-	if