Re: [PATCH v2] common: get fs type again using device canonical name in _fs_type

2014-08-01 Thread Dave Chinner
On Fri, Aug 01, 2014 at 01:02:58PM +0800, Eryu Guan wrote:
 On Fri, Aug 01, 2014 at 02:49:10PM +1000, Dave Chinner wrote:
  On Fri, Aug 01, 2014 at 12:02:41PM +0800, Eryu Guan wrote:
   On Fri, Aug 01, 2014 at 10:21:59AM +1000, Dave Chinner wrote:
On Thu, Jul 31, 2014 at 06:52:37PM +0800, Eryu Guan wrote:
 When testing with lvm, a previous btrfsck run could change df output
 from something like
 
 /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs 15728640 900 13602172 1% /mnt/btrfs
 
 to
 
 /dev/dm-3 btrfs 15728640 900 13602172 1% /mnt/btrfs

I don't follow you. Why would running btrfsck change the name of the
device? If the filesystem is umounted and mounted again, then the
device could change, but btrfsck should not be doing the
unmount/mount, and so unless the TEST_DEV/SCRATCH_DEV is changing
the output of df should be identical...

So before we change the _fs_type() code, can you explain exactly
how, when and why the device name is changing to me?
   
   Assume that we have two btrfs filesystems, kernel is 3.16.0-rc4+
   
   [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show
   Label: none  uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2
   Total devices 1 FS bytes used 384.00KiB
   devid 1 size 15.00GiB used 2.04GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv1
   
   Label: none  uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37
   Total devices 2 FS bytes used 112.00KiB
   devid 1 size 15.00GiB used 2.03GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv2
   devid 2 size 15.00GiB used 2.01GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv3
   
   Btrfs v3.14.2
   
   And testlv1 was mounted at /mnt/btrfs
   
   [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs
   Filesystem Type 1024-blocks Used Available Capacity Mounted on
   /dev/mapper/rhel_hp--dl388eg8--01-testlv1 btrfs 15728640 512 13602560 1% /mnt/btrfs
   
   Now run btrfsck on testlv2, btrfsck will scan all btrfs devices and
   somehow change the device name.
   
   [root@hp-dl388eg8-01 btrfs-progs]# btrfsck /dev/mapper/rhel_hp--dl388eg8--01-testlv2 >/dev/null 2>&1
   
   # device name changed in df output and btrfs fi show output
   [root@hp-dl388eg8-01 btrfs-progs]# df -TP /mnt/btrfs
   Filesystem Type  1024-blocks  Used Available Capacity Mounted on
   /dev/dm-3  btrfs 15728640   512  13602560   1% /mnt/btrfs
   [root@hp-dl388eg8-01 btrfs-progs]# btrfs fi show
   Label: none  uuid: 1aba7da5-ce2b-4af0-a716-db732abc60b2
   Total devices 1 FS bytes used 384.00KiB
   devid 1 size 15.00GiB used 2.04GiB path /dev/dm-3
   
   Label: none  uuid: 26ff4f12-f6d9-4cbc-aae2-57febeefde37
   Total devices 2 FS bytes used 112.00KiB
   devid 1 size 15.00GiB used 2.03GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv2
   devid 2 size 15.00GiB used 2.01GiB path /dev/mapper/rhel_hp--dl388eg8--01-testlv3
   
   Btrfs v3.14.2
   
   This only happens when running btrfsck on a btrfs filesystem with multiple
   devices, so it only affects xfstests runs on btrfs with SCRATCH_DEV_POOL
   set to lvm lvs.
   
   Maybe this is a bug of btrfs-progs and we should fix it there?
  
  Yes, that smells of a btrfs-progs bug. Is your /etc/mtab a link to
  /proc/mounts? If not, does the contents change when you run btrfsck,
  and does the problem go away when you replace /etc/mtab with a link
  to /proc/mounts?
 
 /etc/mtab is a symlink to /proc/self/mounts, and so is /proc/mounts
 
 [root@hp-dl388eg8-01 btrfs-progs]# ls -l /etc/mtab
 lrwxrwxrwx. 1 root root 17 Sep 22  2013 /etc/mtab -> /proc/self/mounts
 [root@hp-dl388eg8-01 btrfs-progs]# ls -l /proc/mounts
 lrwxrwxrwx. 1 root root 11 Aug  1 00:59 /proc/mounts -> self/mounts
 
 And the device name also changed in /proc/mounts
 
 [root@hp-dl388eg8-01 btrfs-progs]# grep btrfs /proc/mounts
 /dev/dm-3 /mnt/btrfs btrfs rw,seclabel,relatime,space_cache 0 0

Well, that's exactly the last thing *I* expected. The kernel just
doesn't change device names on mounted filesystems like that.

Oh, the device name comes from btrfs_show_devname().

So this definitely seems to me to be a btrfs bug - btrfsck is
causing the btrfs kernel code to change the name of devices
associated with unrelated, mounted filesystems to that which it is
operating on. That's just wrong. IOWs, btrfs needs fixing, not
xfstests, because that can bite during any test that runs btrfsck in
the middle of it.
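
For anyone reproducing this: one way to confirm that the two names in the
df output refer to the same device node is to compare their canonical
paths. This is only a minimal sketch. canonical_dev and same_dev are
hypothetical helper names, not existing xfstests functions, and it assumes
a readlink that supports -f (GNU coreutils does).

```shell
#!/bin/bash
# Hypothetical helpers, not part of xfstests.

# Resolve a device path (for example a /dev/mapper symlink) to its
# canonical name by following all symlinks.
canonical_dev()
{
	readlink -f -- "$1"
}

# Succeed if both paths resolve to the same canonical name.
same_dev()
{
	[ "$(canonical_dev "$1")" = "$(canonical_dev "$2")" ]
}
```

On the LVM setup above, same_dev /dev/mapper/rhel_hp--dl388eg8--01-testlv1
/dev/dm-3 would be expected to succeed, assuming the mapper entry is a
symlink to the dm node as is usual with udev. That only helps with
diagnosis; the renaming itself still needs fixing in btrfs.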

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: integration tree updated

2014-08-01 Thread Anand Jain



Hi Chris,

 Looks like you missed Miao's update (on 17 Jul) to back out the bad
 patch (below); your integration branch (published on 25 Jul)
 still contains it.

[PATCH 2/2] Btrfs: fix wrong total device counter after removing a seed device


 You should remove it.

 Simple test cases on your integration branch (like btrfs dev del or
 btrfs/003) are failing because of it, and I (and others) would otherwise
 spend some time digging it out.


Miao,

 If it helps -
cur_devices in the loop is null.

BUG: unable to handle kernel NULL pointer dereference at 0050
IP: [a00a8fc0] btrfs_rm_device+0x4c0/0x860 [btrfs]

cur_devices = root->fs_info->fs_devices;
do {
	cur_devices->total_devices--;
	cur_devices = cur_devices->seed;
} while (device->fs_devices != cur_devices);


 PS: I didn't find the follow-up patch on the mailing list;
did I miss anything?

Thanks, Anand


On 07/25/2014 09:40 AM, Chris Mason wrote:

Hi everyone,

I've pushed out my current integration branch.  It does have a few of
Miao Xie's patches missing because there were some rejects.  I think
this was just because some things got pulled in out of order, and I'll
get it fixed up.

Also missing is Mark's quota snapshot deletion fixes.  They were
crashing during btrfs/011 with CONFIG_DEBUG_PAGE_ALLOC on.  We'll get
that nailed down.

integration is subject to rebasing, so please treat it more like a patch
queue.  It is very lightly tested, the goal is just to show which
patches are already applied and which ones are still pending.

Thanks!

-chris






Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes

2014-08-01 Thread Hugo Mills
On Thu, Jul 31, 2014 at 09:53:15PM -0400, Nick Krause wrote:
 On Thu, Jul 31, 2014 at 3:09 PM, Hugo Mills h...@carfax.org.uk wrote:
  On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote:
  This adds checks for the stated modes as if they are crap we will return 
  error
  not supported.
 
 You've just enabled two options, but you haven't actually
  implemented the code behind it. I would tell you *NOT* to do anything
  else on this work until you can answer the question: What happens if
  you apply this patch, create a large file called foo.txt, and then a
  userspace program executes the following code?
 
  int fd = open("foo.txt", O_RDWR);
  fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 50, 50);
 
 Try it on a btrfs filesystem, both with and without your patch.
  Also try it on an ext4 filesystem.
 
 Once you've done all of that, reply to this mail and tell me what
  the problem is with this patch. You need to make two answers: what are
  the technical problems with the patch? What errors have you made in
  the development process?
 
 *Only* if you can answer those questions sensibly, should you write
  any more patches, of any kind.
[snip]

 The calls are there in btrfs, therefore it will either kernel panic or
 cause an oops.

   That's a guess. I can tell it's a guess, because I've actually read
(some of) the rest of that function, so I've got a good idea of what I
think it will do -- and panic or oops is not the answer. Try again.
You can answer this question two ways: by test (see my suggestion
above), or by reading and understanding the code. Either will work in
this case, but doing neither is not an option for someone who wants to
change the function.

 I need to test this patch, as this is a very easy bug to catch.

   So why didn't you? It's your patch, testing it is your job --
*before* it gets out into the outside world.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- But people have always eaten people,  / what else is there to ---  
 eat?  / If the Juju had meant us not to eat people / he 
 wouldn't have made us of meat.  




[PATCH v2] xfstests: add regression test for btrfs send with orphans

2014-08-01 Thread Filipe Manana
Regression test for a btrfs issue where we create a RO snapshot
to use for a send operation, which fails with a -ESTALE error,
due to the presence of orphan inodes accessible through the
snapshot's commit root but no longer present through the main
root.

This issue is fixed by the following linux kernel btrfs patch:

  Btrfs: update commit root on snapshot creation after orphan cleanup

Signed-off-by: Filipe Manana fdman...@suse.com
---

V2: Replaced a > redirect with a >> redirect to $seqres.full, and added a
sleep.

 tests/btrfs/057 | 84 +
 tests/btrfs/057.out |  1 +
 tests/btrfs/group   |  1 +
 3 files changed, 86 insertions(+)
 create mode 100755 tests/btrfs/057
 create mode 100644 tests/btrfs/057.out

diff --git a/tests/btrfs/057 b/tests/btrfs/057
new file mode 100755
index 0000000..1e313e9
--- /dev/null
+++ b/tests/btrfs/057
@@ -0,0 +1,84 @@
+#! /bin/bash
+# FS QA Test No. btrfs/057
+#
+# Regression test for a btrfs issue where we create a RO snapshot to use for
+# a send operation which fails with a -ESTALE error, due to the presence of
+# orphan inodes accessible through the snapshot's commit root but no longer
+# present through the main root.
+#
+# This issue is fixed by the following linux kernel btrfs patch:
+#
+#Btrfs: update commit root on snapshot creation after orphan cleanup
+#
+#---
+# Copyright (C) 2014 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana fdman...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   if [ ! -z "$XFS_IO_PID" ]; then
+      kill $XFS_IO_PID > /dev/null 2>&1
+   fi
+   rm -fr $tmp
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+# Requiring flink command tests for the presence of the -T option used
+# to pass O_TMPFILE to open(2).
+_require_xfs_io_command flink
+_need_to_be_root
+
+rm -f $seqres.full
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+# Create a tmpfile, write some data to it and leave it open, so that our
+# main subvolume has an orphan inode item.
+$XFS_IO_PROG -T $SCRATCH_MNT >>$seqres.full 2>&1 < <(
+   echo "pwrite 0 65536"
+   read
+) &
+XFS_IO_PID=$!
+
+# Give the xfs_io process some time to create the tmpfile.
+sleep 3
+
+# With the tmpfile open, create a RO snapshot and use it for a send operation.
+# The send operation used to fail with -ESTALE due to the presence of the
+# orphan inode.
+_run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap
+_run_btrfs_util_prog send $SCRATCH_MNT/mysnap -f /dev/null
+
+status=0
+exit
diff --git a/tests/btrfs/057.out b/tests/btrfs/057.out
new file mode 100644
index 0000000..b26eefe
--- /dev/null
+++ b/tests/btrfs/057.out
@@ -0,0 +1 @@
+QA output created by 057
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 2da7127..ebc38c5 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -59,3 +59,4 @@
 054 auto quick
 055 auto quick
 056 auto quick
+057 auto quick
-- 
1.9.1



open_ctree failed on 3.16.0

2014-08-01 Thread Frankie Fisher

Hello.

A circuit breaker failed a few times and now I can't mount my btrfs 
volume - it fails with open_ctree failed:


[  337.004372]  [817865a5] dump_stack+0x46/0x58
[  337.004375]  [810720ac] warn_slowpath_common+0x8c/0xc0
[  337.004378]  [810720fa] warn_slowpath_null+0x1a/0x20
[  337.004387]  [c04583b5] btrfs_put_block_group+0x75/0x80 [btrfs]
[  337.004398]  [c046227d] btrfs_free_block_groups+0xbd/0x2e0 [btrfs]
[  337.004410]  [c0470abd] open_ctree+0x188d/0x1f70 [btrfs]
[  337.004418]  [c0442af1] btrfs_fill_super.isra.84+0x81/0x130 [btrfs]
[  337.004422]  [8136d7f1] ? disk_name+0x61/0xc0
[  337.004425]  [81392ab7] ? strlcpy+0x47/0x60
[  337.004434]  [c04474be] btrfs_mount+0x3ae/0x3d0 [btrfs]
[  337.004439]  [811e52e3] mount_fs+0x43/0x1b0
[  337.004443]  [812004c6] vfs_kern_mount+0x76/0x140
[  337.004446]  [81201c44] do_new_mount+0xa4/0x1f0
[  337.004448]  [81202fe6] do_mount+0x1e6/0x230
[  337.004451]  [812033b0] SyS_mount+0x90/0xe0
[  337.004454]  [8179402d] system_call_fastpath+0x1a/0x1f

mount -o recovery doesn't succeed, nor does mount -o recovery,ro.

I have tried the above with kernel 3.13.0 first and 3.16.0 later and the 
behaviour seems identical. This may or may not be relevant, but after I 
initialised the filesystem by copying some files to it (with kernel 
3.13.0), one of the files failed with a checksum error. I hadn't yet compared 
the file that was written with the original to determine whether the 
error was with the checksum or otherwise.


What are the next steps I should try? Should I try btrfs-zero-log? Or 
should I try btrfsck? Or something else?


Regards,
Frankie Fisher

# uname -a
Linux mythtv 3.16.0-999-generic #201408010205 SMP Fri Aug 1 06:06:01 UTC 
2014 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
Btrfs v3.12
# btrfs fi show
Label: none  uuid: beaff1b6-fb0e-4b80-8b59-74968fd51066
Total devices 1 FS bytes used 342.01GiB
devid 1 size 1.36TiB used 349.04GiB path /dev/sdc2

Label: none  uuid: 34d2986d-6954-4c5c-922c-799cc66cd28e
Total devices 1 FS bytes used 112.00KiB
devid 1 size 231.90GiB used 2.04GiB path /dev/sda2

Btrfs v3.12

[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.16.0-999-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201408010205 SMP Fri Aug 1 06:06:01 UTC 2014
[0.00] Command line: BOOT_IMAGE=/vmlinuz-3.16.0-999-generic root=UUID=86c5a6cb-9130-4193-8536-76051f2a1e7e ro quiet splash
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable
[0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e4000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xcff7] usable
[0.00] BIOS-e820: [mem 0xcff8-0xcff8dfff] ACPI data
[0.00] BIOS-e820: [mem 0xcff8e000-0xcffc] ACPI NVS
[0.00] BIOS-e820: [mem 0xcffd-0xcfff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xfff0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00022fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.5 present.
[0.00] DMI: System manufacturer P5QL PRO/P5QL PRO, BIOS 080310/08/2008
[0.00] e820: update [mem 0x-0x0fff] usable == reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] AGP: No AGP bridge found
[0.00] e820: last_pfn = 0x23 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00]   0-9 write-back
[0.00]   A-B uncachable
[0.00]   C-D write-protect
[0.00]   E-E write-through
[0.00]   F-F write-protect
[0.00] MTRR variable ranges enabled:
[0.00]   0 base 0 mask E write-back
[0.00]   1 base 2 mask FE000 write-back
[0.00]   2 base 22000 mask FF000 write-back
[0.00]   3 base 0D000 mask FF000 uncachable
[0.00]   4 base 0E000 mask FE000 uncachable
[0.00]   5 disabled
[0.00]   6 disabled
[0.00]   7 disabled
[0.00] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
[0.00] original variable MTRRs
[0.00] reg 0, base: 0GB, range: 8GB, type WB

Re: Btrfs offline deduplication

2014-08-01 Thread Austin S Hemmelgarn
On 07/31/2014 07:54 PM, Timofey Titovets wrote:
 Good day.
 I have several questions about data deduplication on btrfs.
 Sorry if I ask stupid questions or waste your time %)
 
 What about the implementation of offline data deduplication? I don't see
 any activity in this area; maybe I need to ask a particular person?
 Where is the problem? Maybe I can try to help (with testing, for example)?
 
 I could be wrong, but as I understand it btrfs stores one crc32 checksum
 per file. If this is true, maybe it makes sense to create a small worker
 to dedup files, like the worker for autodefrag?
 With simple logic like:
 if sum1 == sum2 && file_size1 == file_size2; then
 if (bit_to_bit_identical(file1, file2)); then merge(file1, file2);
 This could be a first attempt at implementing per-file offline dedup.
 What do you think about it? Could I be wrong? Or is this a horrible crutch?
 (As I understand it, it does not change the format of the fs.)
 
 (bedup and other tools are cool, but I have several problems with these
 tools, and I think that a kernel implementation can work better.)
 
I think there may be some misunderstandings here about some of the
internals of BTRFS.  First of all, checksums are stored per block, not
per file, and secondly, deduplication can be done on a much finer scale
than individual files (you can deduplicate individual extents).
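
To illustrate the per-block idea from userspace: the sketch below prints
one checksum per fixed-size block of a file, so identical blocks show up
with identical sums. It is only an illustration under assumptions:
block_sums and dup_blocks are made-up helper names, cksum(1) computes a
POSIX CRC rather than the crc32c that btrfs stores, and a matching sum
only marks a candidate. A real deduplicator would still compare the bytes
and would operate on extents.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.

# Print "<block index> <checksum>" for each block of FILE.
# The block size defaults to 4096 bytes.
block_sums()
{
	local file=$1 bs=${2:-4096}
	local size nblocks i sum

	size=$(wc -c < "$file")
	nblocks=$(( (size + bs - 1) / bs ))
	i=0
	while [ "$i" -lt "$nblocks" ]; do
		# Read exactly one block and keep only the CRC field of cksum.
		sum=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null |
			cksum | cut -d' ' -f1)
		echo "$i $sum"
		i=$((i + 1))
	done
}

# Print the index of every block whose checksum was already seen in an
# earlier block of the same file (dedup candidates).
dup_blocks()
{
	block_sums "$@" | awk 'seen[$2]++ { print $1 }'
}
```

Running dup_blocks on a file lists the blocks that repeat an earlier one,
which is the granularity at which a block- or extent-based deduplicator
would start looking.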

I do think however that having the option of a background thread doing
deduplication asynchronously is a good idea, but then you would have to
have some way to trigger it on individual files/trees, and triggering on
writes like the autodefrag thread does doesn't make much sense.  Having
some userspace program to tell it to run on a given set of files would
probably be the best approach for a trigger.  I don't remember if this
kind of thing was also included in the online deduplication patches that
got posted a while back or not.





Re: open_ctree failed on 3.16.0

2014-08-01 Thread Duncan
Frankie Fisher posted on Fri, 01 Aug 2014 10:58:39 +0100 as excerpted:

 A circuit breaker failed a few times and now I can't mount my btrfs
 volume - it fails with open_ctree failed:

[snip stacktrace, which as a btrfs user not dev doesn't give me much 
anyway]

 
 mount -o recovery doesn't succeed, nor does mount -o recovery,ro.
 
 I have tried the above with kernel 3.13.0 first and 3.16.0 later and the
 behaviour seems identical. This may or may not be relevant, but after I
 initialised the filesystem by copying some files to it (with kernel
 3.13.0), one of the files failed with a checksum error. I hadn't yet compared
 the file that was written with the original to determine whether the
 error was with the checksum or otherwise.
 
 What are the next steps I should try? Should I try btrfs-zero-log? Or
 should I try btrfsck? Or something else?

The standard advice concerning btrfs check (aka btrfsck) is that running 
it without --repair or similar won't hurt as in that case it's read-only, 
but by the same token, it won't help, except possibly to give you an idea 
of what's wrong.  And don't run it with --repair except either on the 
direct advice of someone here after seeing the read-only run output, or 
if you've otherwise given up and the next step would be a new mkfs as in 
that case you have nothing to lose, because check doesn't yet understand 
everything that can go wrong and in some cases may make the problem worse 
instead of better.

The first thing you may want to do is make an image using dd or the like, 
so you can restore to the current state if nothing works.  Of course 
that'll take quite some space...

Another read-only alternative is btrfs restore.  This is run on the 
/unmounted/ filesystem, allowing recovery of files from the filesystem 
without the possibility of damaging it further.  If you don't have a good 
backup, this is likely to be your best shot at, more or less, making one 
after-the-fact.  It may not recover everything and in particular, from my 
own experience I know it doesn't recover symlinks or file owner and 
permissions information, but in the absence of a proper current backup it 
does give you a reasonably good shot at recovering a good portion of the 
files.  Of course this will require at least enough space on other 
filesystems to write the recovered files.

If btrfs restore with the default options doesn't prove satisfactory, you 
can use the -l (list roots) -t (use tree location) and --dry-run options 
along with btrfs-find-root to hopefully find a better previous root, 
which can then be fed to the -t option to hopefully get a better 
recovery.  Additionally, when I used btrfs restore here, I had to use it 
with the -i option and run it repeatedly, feeding it the same path each 
time, as it kept giving up with a "looping too much" error on some of the 
larger directories, but would progress further each time as it could skip 
more files that were already there.

There's also btrfs-show-super to examine your superblocks and ensure 
they're not damaged, as well as to compare current generation/transid vs 
that of find-root and restore.  Show-super can also be used with btrfs 
rescue as noted below, to recover from a bad superblock, should it be 
necessary.

See the wiki page on restore for more details on it.

https://btrfs.wiki.kernel.org/index.php/Restore

You can then use btrfs-image to create a metadata image that you can give 
to the devs if they want to see what they can do with it.  That doesn't 
give them the data but will let them see filenames unless you use the -s 
option to sanitize them, which I'd recommend if you have sensitive 
filenames you don't want others to see.

With either a good backup or having restored as much as you can using 
restore, you can move on to potentially destructive attempts at further 
restoration.  Here's a slightly dated (nearing a year old; select-super 
and chunk-recover are now part of the main btrfs command, under rescue) 
but still useful list of what to try and in what order.

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/27999
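
Pulling the non-destructive steps above together, least invasive first.
This is a sketch, not a tested procedure: recovery_plan is a hypothetical
helper that only prints the commands so they can be reviewed and run by
hand, BYTENR stands for a root location picked from the btrfs-find-root
or restore -l output, and the device, image and destination paths are
placeholders.

```shell
#!/bin/bash
# Hypothetical helper: print (never execute) a read-only recovery sequence.
#   $1 = btrfs device, $2 = path for the dd image, $3 = restore destination
recovery_plan()
{
	local dev=$1 img=$2 dest=$3

	cat <<EOF
# 1. Image the device so the current state can be put back later
dd if=$dev of=$img bs=1M
# 2. Read-only check (no --repair) and superblock dump
btrfs check $dev
btrfs-show-super $dev
# 3. Restore files from the unmounted filesystem, dry run first
btrfs restore --dry-run $dev $dest
btrfs restore -i $dev $dest
# 4. If that is unsatisfactory, look for an older root and retry with -t
btrfs-find-root $dev
btrfs restore -l $dev
btrfs restore -i -t BYTENR $dev $dest
# 5. Sanitized metadata image for the developers
btrfs-image -s $dev $dest/metadata.img
EOF
}
```

For example, recovery_plan /dev/sdX /backup/sdX.img /mnt/rescue prints the
list for a placeholder device /dev/sdX. Nothing in it writes to the
damaged filesystem; the destructive options stay out of it on purpose.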

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Machine lockup due to btrfs-transaction on AWS EC2 Ubuntu 14.04

2014-08-01 Thread Peter Waller
I've reproduced these issues on a single-core machine. It doesn't
appear to become completely unresponsive, even after 12 hours of copying
(unlike the other machines, which seem to deadlock after 5-10 minutes), but
it does use 100% SYS CPU with no IO traffic for the vast majority of
the time. (In fact, constantly?)

The filesystem was originally created on an Ubuntu Saucy 13.10 machine
(and therefore older kernel + older btrfs-tools) on a PV guest with a
non-SSD block device.

The newer, current machine is an HVM guest and the data is now on an SSD.

Below is a link to the dstat output and dmesg which shows this (note
the sys column on the left and the IOPs column on the right). This was
taken whilst copying files of varying sizes from many directories to a
sibling in the same directory.

`dstat -tcdnymlr 5`, which shows high sys CPU but no IO traffic:
https://gist.github.com/pwaller/cb8d088ebceb2707d24b

INFO: task sync:5906 blocked for more than 120 seconds.
https://gist.github.com/pwaller/574a369ea4b65fe125b9#file-dmesg-log-L541

INFO: task btrfs-transacti:2531 blocked for more than 120 seconds.
https://gist.github.com/pwaller/574a369ea4b65fe125b9#file-dmesg-log-L1015

INFO: task btrfs-flush_del:16764 blocked for more than 120 seconds.
https://gist.github.com/pwaller/574a369ea4b65fe125b9#file-dmesg-log-L1041


Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes

2014-08-01 Thread Theodore Ts'o
On Thu, Jul 31, 2014 at 08:09:10PM +0100, Hugo Mills wrote:
 On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote:
  This adds checks for the stated modes as if they are crap we will return 
  error
  not supported.
 
You've just enabled two options, but you haven't actually
 implemented the code behind it. I would tell you *NOT* to do anything
 else on this work until you can answer the question: What happens if
 you apply this patch, create a large file called foo.txt, and then a
 userspace program executes the following code?
 
 int fd = open("foo.txt", O_RDWR);
 fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 50, 50);
 
Try it on a btrfs filesystem, both with and without your patch.
 Also try it on an ext4 filesystem.
 
Once you've done all of that, reply to this mail and tell me what
 the problem is with this patch. You need to make two answers: what are
 the technical problems with the patch? What errors have you made in
 the development process?

There are also the conceptual failures.  Before you do anything else,
you need to be able to answer the question, what do you think the
flags FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE are supposed
to do?  What are the possible appropriate things for btrfs to do if
it sees these flags?  (Hint: there is more than one correct answer,
and its current choice is one of them.  What is the other one?)

Nick, the fact that you call these modes crap is a hint that you
have a fundamental lack of understanding --- and before you waste more
of kernel developers' time, you need to get that understanding first,
for any bit of code that you propose to improve.

This is why I suggested that you work on userspace testing scripts
first.  It's pretty clear you are (a) incredibly sloppy, and (b)
lacking conceptual understanding of a lot of technical details, and
(c) even worse, aren't letting this lack of understanding stop you
from posting patches.  As a result you are adding negative value to
whatever project or subsystem you try to attach yourself to --- you're
not helping.

- Ted

P.S.   As a further hint, change the above code to read:

int fd = open("foo.txt", O_RDWR);
if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 4096, 8192) < 0)
	perror("fallocate");

And then run filefrag -vs foo.txt before and after running the above
code fragment and then try something like this:

 cp /usr/share/dict/words foo.txt
 filefrag -vs foo.txt
 ls -l foo.txt
 /tmp/fallocate-test-prog
 filefrag -vs foo.txt
 ls -l foo.txt
 diff /usr/share/dict/words foo.txt

Try doing this on an ext4 or xfs system and a btrfs file system.
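
For a quick check without compiling anything, the same experiment can be
approximated from the shell. This is a sketch resting on assumptions: it
uses the --collapse-range option of fallocate(1), which only newer
util-linux releases provide, and try_collapse is a hypothetical helper
name. It reports whether the call succeeded and leaves the before/after
comparison with filefrag, ls and diff to the steps above.

```shell
#!/bin/bash
# Hypothetical helper: try to remove LEN bytes at OFFSET from FILE with
# fallocate(1) --collapse-range. Prints "collapsed" on success and
# "failed" otherwise (for example when the filesystem rejects the mode,
# or when the installed fallocate(1) does not know the option).
try_collapse()
{
	local file=$1 offset=$2 len=$3

	if fallocate --collapse-range --offset "$offset" --length "$len" \
			"$file" 2>/dev/null; then
		echo collapsed
	else
		echo failed
	fi
}
```

For instance, try_collapse foo.txt 4096 8192 mirrors the offsets used in
the C fragment. On a filesystem that implements collapse range the file
should shrink by the collapsed length; where the mode is rejected, the
file is left untouched and the helper prints failed.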


Re: open_ctree failed on 3.16.0

2014-08-01 Thread Frankie Fisher

On 01/08/14 12:29, Duncan wrote:

Frankie Fisher posted on Fri, 01 Aug 2014 10:58:39 +0100 as excerpted:


A circuit breaker failed a few times and now I can't mount my btrfs
volume - it fails with open_ctree failed:




What are the next steps I should try? Should I try btrfs-zero-log? Or
should I try btrfsck? Or something else?


The standard advice concerning btrfs check (aka btrfsck) is that running
it without --repair or similar won't hurt as in that case it's read-only,
but by the same token, it won't help, except possibly to give you an idea
of what's wrong.


fwiw, this is the output of btrfsck:
http://cwillu.com:8080/86.136.116.221/2

[snip helpful advice]

Thanks for the helpful info, I will work through it. It would be nice to 
be able to recover the filesystem, but if I can't recover the filesystem 
it wouldn't be the end of the world as I have a backup from a few days 
before the incident. Even if I can't recover the filesystem, if 
reporting this issue helps btrfs be more robust in the future that would 
be a good outcome. To that end I have made a copy of the filesystem with 
btrfs-image which I can supply if necessary.


Regards,
Frankie Fisher





Re: Btrfs offline deduplication

2014-08-01 Thread David Sterba
On Fri, Aug 01, 2014 at 06:17:44AM -0400, Austin S Hemmelgarn wrote:
 I do think however that having the option of a background thread doing
 deduplication asynchronously is a good idea, but then you would have to
 have some way to trigger it on individual files/trees, and triggering on
 writes like the autodefrag thread does doesn't make much sense.  Having
 some userspace program to tell it to run on a given set of files would
 probably be the best approach for a trigger.  I don't remember if this
 kind of thing was also included in the online deduplication patches that
 got posted a while back or not.

IIRC the proposed implementation only merged new writes with existing
data.

For the out-of-band (off-line) dedup there's bedup
(https://github.com/g2p/bedup) or Mark's duperemove tool
(https://github.com/markfasheh/duperemove) that work on a set of files.


Re: [PATCH] Add support to check for FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE crap modes

2014-08-01 Thread Nick Krause
On Fri, Aug 1, 2014 at 8:21 AM, Theodore Ts'o ty...@mit.edu wrote:
 On Thu, Jul 31, 2014 at 08:09:10PM +0100, Hugo Mills wrote:
 On Thu, Jul 31, 2014 at 01:53:33PM -0400, Nicholas Krause wrote:
  This adds checks for the stated modes as if they are crap we will return 
  error
  not supported.

You've just enabled two options, but you haven't actually
 implemented the code behind it. I would tell you *NOT* to do anything
 else on this work until you can answer the question: What happens if
 you apply this patch, create a large file called foo.txt, and then a
 userspace program executes the following code?

 int fd = open("foo.txt", O_RDWR);
 fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 50, 50);

Try it on a btrfs filesystem, both with and without your patch.
 Also try it on an ext4 filesystem.

Once you've done all of that, reply to this mail and tell me what
 the problem is with this patch. You need to make two answers: what are
 the technical problems with the patch? What errors have you made in
 the development process?

 There are also the conceptual failures.  Before you do anything else,
 you need to be able to answer the question, what do you think the
 flags FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE are supposed
 to do?  What are the possible appropriate things for btrfs to do if
 it sees these flags?  (Hint: there is more than one correct answer,
 and its current choice is one of them.  What is the other one?)

 Nick, the fact that you call these modes crap is a hint that you
 have a fundamental lack of understanding --- and before you waste more
 of kernel developers' time, you need to get that understanding first,
 for any bit of code that you propose to improve.

 This is why I suggested that you work on userspace testing scripts
 first.  It's pretty clear you are (a) incredibly sloppy, and (b)
 lacking conceptual understanding of a lot of technical details, and
 (c) even worse, aren't letting this lack of understanding stop you
 from posting patches.  As a result you are adding negative value to
 whatever project or subsystem you try to attach yourself to --- you're
 not helping.

 - Ted

 P.S.   As a further hint, change the above code to read:

 int fd = open("foo.txt", O_RDWR);
 if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 4096, 8192) < 0)
         perror("fallocate");

 And then run filefrag -vs foo.txt before and after running the above
 code fragment and then try something like this:

  cp /usr/share/dict/words foo.txt
  filefrag -vs foo.txt
  ls -l foo.txt
  /tmp/fallocate-test-prog
  filefrag -vs foo.txt
  ls -l foo.txt
  diff /usr/share/dict/words foo.txt

 Try doing this on an ext4 or xfs system and a btrfs file system.

I mis-sent this patch; that's why there are issues.
Cheers Nick
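
For readers who want to try Ted's hint, here is a minimal sketch of what a test program along those lines might look like. The function name collapse_range() is an assumption made for illustration; only open(), fallocate() and the FALLOC_FL_COLLAPSE_RANGE flag come from the discussion. A filesystem that does not implement the mode is expected to fail the call with an error rather than modify the file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Ask the filesystem to collapse the byte range [offset, offset + len)
 * out of the file at path. Returns 0 on success, -1 on failure after
 * printing the reason. (Hypothetical helper, not from the thread.)
 */
int collapse_range(const char *path, off_t offset, off_t len)
{
	int fd;
	int ret;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	ret = fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, offset, len);
	if (ret < 0)
		perror("fallocate");

	close(fd);
	return ret < 0 ? -1 : 0;
}
```

Wrapped in a main() that calls collapse_range("foo.txt", 4096, 8192), this could be built as /tmp/fallocate-test-prog and run between the two filefrag invocations.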


RE: [PATCH] btrfs-progs: mkfs: remove experimental tag

2014-08-01 Thread Kyle Gates



 From: dste...@suse.cz
 To: linux-btrfs@vger.kernel.org
 CC: dste...@suse.cz
 Subject: [PATCH] btrfs-progs: mkfs: remove experimental tag
 Date: Thu, 31 Jul 2014 14:21:34 +0200

 Make it consistent with kernel status and documentation.

 Signed-off-by: David Sterba dste...@suse.cz
 ---
 mkfs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

 diff --git a/mkfs.c b/mkfs.c
 index 16e92221a547..538b6e6837b2 100644
 --- a/mkfs.c
 +++ b/mkfs.c
 @@ -1439,8 +1439,8 @@ int main(int ac, char **av)
 }

 /* if we are here that means all devs are good to btrfsify */
 - printf("\nWARNING! - %s IS EXPERIMENTAL\n", BTRFS_BUILD_VERSION);
 - printf("WARNING! - see http://btrfs.wiki.kernel.org before using\n\n");
 + printf("%s\n", BTRFS_BUILD_VERSION);
 + printf("See http://btrfs.wiki.kernel.org for more\n\n");

The sentence/thought isn't complete. I was left thinking "more what?"
Perhaps add "information" or "documentation".

Thanks.

 dev_cnt--;

 @@ -1597,7 +1597,6 @@ raid_groups:
 label, first_file, nodesize, leafsize, sectorsize,
 pretty_size(btrfs_super_total_bytes(root->fs_info->super_copy)));

 - printf("%s\n", BTRFS_BUILD_VERSION);
 btrfs_commit_transaction(trans, root);

 if (source_dir_set) {
 --
 1.9.0



Re: [PATCH] btrfs-progs: mkfs: remove experimental tag

2014-08-01 Thread David Sterba
On Fri, Aug 01, 2014 at 11:38:09AM -0500, Kyle Gates wrote:
 The sentence/thought isn't complete. I was left thinking more what?
 perhaps add: information, documentation

Changed to 'more information'.


Re: Help with btrfs_zero_range function

2014-08-01 Thread Nick Krause
Please forget my other questions. It seems the only work needed to make
the punch-hole code handle zero range is to write a function like the one
pasted below for zero range and change the calls from punch range to zero
range; from my reading, the other parts of the function can stay the same.
Regards, Nick
static int find_first_non_hole(struct inode *inode, u64 *start, u64 *len)
{
	struct extent_map *em;
	int ret = 0;

	em = btrfs_get_extent(inode, NULL, 0, *start, *len, 0);
	if (IS_ERR_OR_NULL(em)) {
		if (!em)
			ret = -ENOMEM;
		else
			ret = PTR_ERR(em);
		return ret;
	}

	/* Hole or vacuum extent(only exists in no-hole mode) */
	if (em->block_start == EXTENT_MAP_HOLE) {
		ret = 1;
		*len = em->start + em->len > *start + *len ?
		       0 : *start + *len - em->start - em->len;
		*start = em->start + em->len;
	}
	free_extent_map(em);
	return ret;
}
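
The range arithmetic in the hole branch of find_first_non_hole() is easy to misread, so here is a small userspace sketch of just that step. The struct and function names are hypothetical stand-ins, not kernel types; the sketch assumes the hole extent covers the start of the range, which is what the extent lookup at *start implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two extent map fields the kernel reads. */
struct hole_extent {
	uint64_t start; /* first byte covered by the hole extent */
	uint64_t len;   /* length of the hole extent in bytes */
};

/*
 * Advance the range [*start, *start + *len) past a hole covering *start.
 * If the hole runs beyond the end of the range, nothing is left (*len
 * becomes 0); otherwise the remainder begins where the hole ends.
 */
void skip_hole(const struct hole_extent *em, uint64_t *start, uint64_t *len)
{
	uint64_t hole_end = em->start + em->len;
	uint64_t range_end = *start + *len;

	assert(em->start <= *start && *start < hole_end);

	*len = hole_end > range_end ? 0 : range_end - hole_end;
	*start = hole_end;
}
```

For example, an 8192-byte hole at offset 0 and a request for 16384 bytes starting at 4096 leaves start = 8192 and len = 12288; a request that ends inside the hole leaves len = 0.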


Re: Help with btrfs_zero_range function

2014-08-01 Thread Nick Krause
On Fri, Aug 1, 2014 at 12:58 PM, Nick Krause xerofo...@gmail.com wrote:
 Please forget my other questions , seems the only work to make punch
 hole work for zero range is to
 make a function like the one I am pasting below for zero range and
 change the calls to punch range to
 zero range as the other parts of the function can be the same from my reading.
 Regards Nick
 static int find_first_non_hole(struct inode *inode, u64 *start, u64 *len)
 {
 struct extent_map *em;
 int ret = 0;

 em = btrfs_get_extent(inode, NULL, 0, *start, *len, 0);
 if (IS_ERR_OR_NULL(em)) {
 if (!em)
 ret = -ENOMEM;
 else
 ret = PTR_ERR(em);
 return ret;
 }

 /* Hole or vacuum extent(only exists in no-hole mode) */
 if (em->block_start == EXTENT_MAP_HOLE) {
 ret = 1;
 *len = em->start + em->len > *start + *len ?
 0 : *start + *len - em->start - em->len;
 *start = em->start + em->len;
 }
 free_extent_map(em);
 return ret;
 }

Sorry, I forgot that we also need one to fill a zero range. I am now
patching fill_holes as well; if someone can help me convert it, that
would be great. I will then change the calls and send out the function,
titled btrfs_zero_range as in the subject, with calls to it from the
fallocate method for btrfs. I also need help converting the one for
finding the first hole.
Regards, Nick
static int fill_holes(struct btrfs_trans_handle *trans, struct inode *inode,
		      struct btrfs_path *path, u64 offset, u64 end)
{
	struct btrfs_root *root = BTRFS_I(inode)->root;
	struct extent_buffer *leaf;
	struct btrfs_file_extent_item *fi;
	struct extent_map *hole_em;
	struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree;
	struct btrfs_key key;
	int ret;

	if (btrfs_fs_incompat(root->fs_info, NO_HOLES))
		goto out;

	key.objectid = btrfs_ino(inode);
	key.type = BTRFS_EXTENT_DATA_KEY;
	key.offset = offset;

	ret = btrfs_search_slot(trans, root, &key, path, 0, 1);
	if (ret < 0)
		return ret;
	BUG_ON(!ret);

	leaf = path->nodes[0];
	if (hole_mergeable(inode, leaf, path->slots[0]-1, offset, end)) {
		u64 num_bytes;

		path->slots[0]--;
		fi = btrfs_item_ptr(leaf, path->slots[0],
				    struct btrfs_file_extent_item);
		num_bytes = btrfs_file_extent_num_bytes(leaf, fi) +
			end - offset;
		btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
		btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
		btrfs_set_file_extent_offset(leaf, fi, 0);
		btrfs_mark_buffer_dirty(leaf);
		goto out;
	}

	if (hole_mergeable(inode, leaf, path->slots[0]+1, offset, end)) {
		u64 num_bytes;

		path->slots[0]++;
		key.offset = offset;
		btrfs_set_item_key_safe(root, path, &key);
		fi = btrfs_item_ptr(leaf, path->slots[0],
				    struct btrfs_file_extent_item);
		num_bytes = btrfs_file_extent_num_bytes(leaf, fi) + end -
			offset;
		btrfs_set_file_extent_num_bytes(leaf, fi, num_bytes);
		btrfs_set_file_extent_ram_bytes(leaf, fi, num_bytes);
		btrfs_set_file_extent_offset(leaf, fi, 0);
		btrfs_mark_buffer_dirty(leaf);
		goto out;
	}
	btrfs_release_path(path);

	ret = btrfs_insert_file_extent(trans, root, btrfs_ino(inode), offset,
				       0, 0, end - offset, 0, end - offset,
				       0, 0, 0);
	if (ret)
		return ret;

out:
	btrfs_release_path(path);

	hole_em = alloc_extent_map();
	if (!hole_em) {
		btrfs_drop_extent_cache(inode, offset, end - 1, 0);
		set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
			&BTRFS_I(inode)->runtime_flags);
	} else {
		hole_em->start = offset;
		hole_em->len = end - offset;
		hole_em->ram_bytes = hole_em->len;
		hole_em->orig_start = offset;

		hole_em->block_start = EXTENT_MAP_HOLE;
		hole_em->block_len = 0;
		hole_em->orig_block_len = 0;
		hole_em->bdev = root->fs_info->fs_devices->latest_bdev;
		hole_em->compress_type = BTRFS_COMPRESS_NONE;
		hole_em->generation = trans->transid;

		do {
			btrfs_drop_extent_cache(inode, offset, end - 1, 0);
			write_lock(&em_tree->lock);
			ret = add_extent_mapping(em_tree, hole_em, 1);
			write_unlock(&em_tree->lock);
		} while (ret == -EEXIST);
		free_extent_map(hole_em);
		if (ret)
			set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
				&BTRFS_I(inode)->runtime_flags);
	}

	return 0;
}


Re: Btrfs offline deduplication

2014-08-01 Thread Mark Fasheh
On Fri, Aug 01, 2014 at 10:16:08AM -0400, Austin S Hemmelgarn wrote:
 On 2014-08-01 09:23, David Sterba wrote:
  On Fri, Aug 01, 2014 at 06:17:44AM -0400, Austin S Hemmelgarn wrote:
  I do think however that having the option of a background thread doing
  deduplication asynchronously is a good idea, but then you would have to
  have some way to trigger it on individual files/trees, and triggering on
  writes like the autodefrag thread does doesn't make much sense.  Having
  some userspace program to tell it to run on a given set of files would
  probably be the best approach for a trigger.  I don't remember if this
  kind of thing was also included in the online deduplication patches that
  got posted a while back or not.
  
  IIRC the proposed implementation only merged new writes with existing
  data.
  
  For the out-of-band (off-line) dedup there's bedup
  (https://github.com/g2p/bedup) or Mark's duperemove tool
  (https://github.com/markfasheh/duperemove) that work on a set of files.
  
 Something kernel-side to do the work asynchronously would be nice,
 especially if it could leverage the check-sums that BTRFS already stores
 for the blocks.  Having a userspace interface for offline deduplication
 similar to that for scrub operations would be even better.

Why does this have to be kernel side? There's userspace software already to
dedupe that can be run on a regular basis. Exporting checksums is a
different story (you can do that via ioctl) but running the dedupe software
itself inside the kernel is exactly what we want to avoid by having the
dedupe ioctl in the first place.
--Mark

--
Mark Fasheh
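
To illustrate the split Mark describes, where userspace finds candidates and the kernel only performs the dedupe, here is a rough sketch of the userspace half. The block size, the FNV-1a hash and both function names are assumptions chosen for illustration; this is not how duperemove or bedup are implemented. A matching hash only nominates a pair of blocks; the actual sharing would still be requested through the dedupe ioctl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEDUPE_BLOCK_SIZE 4096 /* assumed block size for this sketch */

/* 64-bit FNV-1a hash of one block of data. */
uint64_t hash_block(const unsigned char *data, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */
	size_t i;

	assert(data != NULL);
	for (i = 0; i < len; i++) {
		h ^= data[i];
		h *= 0x100000001b3ULL; /* FNV prime */
	}
	return h;
}

/*
 * Count pairs of whole blocks in buf whose hashes match.
 * Quadratic scan for clarity; a real tool would sort or index the hashes.
 */
size_t count_candidate_pairs(const unsigned char *buf, size_t len)
{
	size_t nblocks = len / DEDUPE_BLOCK_SIZE;
	size_t pairs = 0;
	size_t i, j;

	for (i = 0; i < nblocks; i++) {
		uint64_t hi = hash_block(buf + i * DEDUPE_BLOCK_SIZE,
					 DEDUPE_BLOCK_SIZE);

		for (j = i + 1; j < nblocks; j++)
			if (hi == hash_block(buf + j * DEDUPE_BLOCK_SIZE,
					     DEDUPE_BLOCK_SIZE))
				pairs++;
	}
	return pairs;
}
```

Keeping this scan in userspace means a bug in the candidate search costs a wasted ioctl call, not a kernel crash.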


Re: Btrfs offline deduplication

2014-08-01 Thread Austin S Hemmelgarn
On 08/01/2014 02:55 PM, Mark Fasheh wrote:
 On Fri, Aug 01, 2014 at 10:16:08AM -0400, Austin S Hemmelgarn wrote:
 On 2014-08-01 09:23, David Sterba wrote:
 On Fri, Aug 01, 2014 at 06:17:44AM -0400, Austin S Hemmelgarn wrote:
 I do think however that having the option of a background thread doing
 deduplication asynchronously is a good idea, but then you would have to
 have some way to trigger it on individual files/trees, and triggering on
 writes like the autodefrag thread does doesn't make much sense.  Having
 some userspace program to tell it to run on a given set of files would
 probably be the best approach for a trigger.  I don't remember if this
 kind of thing was also included in the online deduplication patches that
 got posted a while back or not.

 IIRC the proposed implementation only merged new writes with existing
 data.

 For the out-of-band (off-line) dedup there's bedup
 (https://github.com/g2p/bedup) or Mark's duperemove tool
 (https://github.com/markfasheh/duperemove) that work on a set of files.

 Something kernel-side to do the work asynchronously would be nice,
 especially if it could leverage the check-sums that BTRFS already stores
 for the blocks.  Having a userspace interface for offline deduplication
 similar to that for scrub operations would be even better.
 
 Why does this have to be kernel side? There's userspace software already to
 dedupe that can be run on a regular basis. Exporting checksums is a
 different story (you can do that via ioctl) but running the dedupe software
 itself inside the kernel is exactly what we want to avoid by having the
 dedupe ioctl in the first place.
   --Mark
 
 --
 Mark Fasheh
 
Based on the same logic however, we don't need scrub to be done kernel
side, as it wouldn't take but one more ioctl to be able to tell it which
block out of a set to treat as valid.  I'm not saying that things need
to be done in the kernel, but duperemove doesn't use the ioctl interface
even if it exists, and bedup is buggy as hell (unless it's improved
greatly in the last two weeks), and neither of them is at all efficient.
 I do understand that this isn't something that is computationally
simple (especially on x86 with its deficiency of registers), but rsync
does almost the same thing for data transmission over the network, and
it does so seemingly much more efficiently than either option available
at the moment.





Re: Btrfs offline deduplication

2014-08-01 Thread Mark Fasheh
On Fri, Aug 01, 2014 at 03:18:46PM -0400, Austin S Hemmelgarn wrote:
  Why does this have to be kernel side? There's userspace software already to
  dedupe that can be run on a regular basis. Exporting checksums is a
  different story (you can do that via ioctl) but running the dedupe software
  itself inside the kernel is exactly what we want to avoid by having the
  dedupe ioctl in the first place.
  --Mark
  
  --
  Mark Fasheh
  
 Based on the same logic however, we don't need scrub to be done kernel
 side, as it wouldn't take but one more ioctl to be able to tell it which
 block out of a set to treat as valid.  I'm not saying that things need
 to be done in the kernel, but duperemove doesn't use the ioctl interface
 even if it exists, and bedup is buggy as hell (unless it's improved
 greatly in the last two weeks), and neither of them is at all efficient.

Duperemove absolutely *does* use the ioctl interface for offline dedupe.


  I do understand that this isn't something that is computationally
 simple (especially on x86 with its deficiency of registers), but rsync
 does almost the same thing for data transmission over the network, and
 it does so seemingly much more efficiently than either option available
 at the moment.

None of the problems you mentioned get solved by pushing the entirety of
offline deduplication into the kernel. If anything, it's more dangerous to
do that as bugs tend to be far more critical when we hit them from kernel.

Regarding duperemove there's a series to fix up some performance issues that
I'm working on importing at the moment.
--Mark

--
Mark Fasheh


[PATCH 09/12] btrfs: factor btrfs_setup_super() out of open_ctree()

2014-08-01 Thread Eric Sandeen
Move all the superblock flag & geometry testing & fiddling
into its own function.

This does coalesce some far-flung tests, but it ... looks
ok to me.

Signed-off-by: Eric Sandeen sand...@redhat.com
---
 fs/btrfs/disk-io.c |  207 
 1 files changed, 112 insertions(+), 95 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 47fcacf..31e9791 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2209,6 +2209,104 @@ static void btrfs_qgroup_init(struct btrfs_fs_info *fs_info)
 	mutex_init(&fs_info->qgroup_rescan_lock);
 }
 
+/*
+ * Test geometry and feature flags at mount time
+ */
+static int btrfs_setup_super(struct super_block *sb,
+struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_super_block *disk_super = fs_info->super_copy;
+	u32 nodesize = btrfs_super_nodesize(disk_super);
+	u32 leafsize = btrfs_super_leafsize(disk_super);
+	u32 sectorsize = btrfs_super_sectorsize(disk_super);
+	u64 features;
+
+
+	/* First sanity check magic & sizes */
+	if (btrfs_super_magic(disk_super) != BTRFS_MAGIC) {
+		printk(KERN_INFO "BTRFS: valid FS not found on %s\n", sb->s_id);
+		return -EINVAL;
+	}
+
+	if (leafsize != nodesize) {
+		printk(KERN_ERR "BTRFS: couldn't mount because metadata "
+		       "blocksizes don't match.  node %d leaf %d\n",
+		       nodesize, leafsize);
+		return -EINVAL;
+	}
+
+	if (leafsize > BTRFS_MAX_METADATA_BLOCKSIZE) {
+		printk(KERN_ERR "BTRFS: couldn't mount because metadata "
+		       "blocksize (%d) was too large\n", leafsize);
+		return -EINVAL;
+	}
+
+	if (sectorsize != PAGE_SIZE) {
+		printk(KERN_WARNING "BTRFS: Incompatible sector size(%lu) "
+		       "found on %s\n", (unsigned long)sectorsize, sb->s_id);
+		return -EINVAL;
+	}
+
+	/* check FS state, whether FS is broken. */
+	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_ERROR)
+		set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
+
+	features = btrfs_super_incompat_flags(disk_super);
+	if (features & ~BTRFS_FEATURE_INCOMPAT_SUPP) {
+		printk(KERN_ERR "BTRFS: couldn't mount because of "
+		       "unsupported optional features (%Lx).\n",
+		       features & ~BTRFS_FEATURE_INCOMPAT_SUPP);
+		return -EINVAL;
+	}
+
+	features |= BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF;
+	/* Original LZO commit didn't set incompat flag when mounted :( */
+	if (fs_info->compress_type == BTRFS_COMPRESS_LZO)
+		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+
+	if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
+		printk(KERN_ERR "BTRFS: has skinny extents\n");
+
+	/*
+	 * flag our filesystem as having big metadata blocks if
+	 * they are bigger than the page size
+	 */
+	if (leafsize > PAGE_CACHE_SIZE) {
+		if (!(features & BTRFS_FEATURE_INCOMPAT_BIG_METADATA))
+			printk(KERN_INFO "BTRFS: flagging fs with big metadata feature\n");
+		features |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
+	}
+
+	/*
+	 * mixed block groups end up with duplicate but slightly offset
+	 * extent buffers for the same range.  It leads to corruptions
+	 */
+	if ((features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
+	    (sectorsize != leafsize)) {
+		printk(KERN_WARNING "BTRFS: unequal leaf/node/sector sizes "
+		       "are not allowed for mixed block groups on %s\n",
+		       sb->s_id);
+		return -EINVAL;
+	}
+
+	/*
+	 * Needn't use the lock because there is no other task which will
+	 * update the flag.
+	 */
+	btrfs_set_super_incompat_flags(disk_super, features);
+
+	features = btrfs_super_compat_ro_flags(disk_super) &
+		~BTRFS_FEATURE_COMPAT_RO_SUPP;
+	if (!(sb->s_flags & MS_RDONLY) && features) {
+		printk(KERN_ERR "BTRFS: couldn't mount RDWR because of "
+		       "unsupported option features (%Lx).\n",
+		       features);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 int open_ctree(struct super_block *sb,
   struct btrfs_fs_devices *fs_devices,
   char *options)
@@ -2219,7 +2317,6 @@ int open_ctree(struct super_block *sb,
u32 blocksize;
u32 stripesize;
u64 generation;
-   u64 features;
struct btrfs_key location;
struct buffer_head *bh;
struct btrfs_super_block *disk_super;
@@ -2458,10 +2555,6 @@ int open_ctree(struct super_block *sb,
if (!btrfs_super_root(disk_super))
goto fail_alloc;
 
-   /* check FS state, 

[PATCH 10/12] btrfs: factor btrfs_alloc_workqueues() out of open_ctree()

2014-08-01 Thread Eric Sandeen
Signed-off-by: Eric Sandeen sand...@redhat.com
---
 fs/btrfs/disk-io.c |  144 
 1 files changed, 77 insertions(+), 67 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 31e9791..0465d43 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2307,6 +2307,80 @@ static int btrfs_setup_super(struct super_block *sb,
return 0;
 }
 
+static int btrfs_alloc_workqueues(struct btrfs_fs_info *fs_info,
+ struct btrfs_fs_devices *fs_devices)
+{
+	int max_active = fs_info->thread_pool_size;
+	int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
+
+	fs_info->workers =
+		btrfs_alloc_workqueue("worker", flags | WQ_HIGHPRI,
+				      max_active, 16);
+	fs_info->delalloc_workers =
+		btrfs_alloc_workqueue("delalloc", flags, max_active, 2);
+
+	fs_info->flush_workers =
+		btrfs_alloc_workqueue("flush_delalloc", flags, max_active, 0);
+
+	fs_info->caching_workers =
+		btrfs_alloc_workqueue("cache", flags, max_active, 0);
+
+	/*
+	 * a higher idle thresh on the submit workers makes it much more
+	 * likely that bios will be send down in a sane order to the
+	 * devices
+	 */
+	fs_info->submit_workers =
+		btrfs_alloc_workqueue("submit", flags,
+				      min_t(u64, fs_devices->num_devices,
+					    max_active), 64);
+	fs_info->fixup_workers =
+		btrfs_alloc_workqueue("fixup", flags, 1, 0);
+	/*
+	 * endios are largely parallel and should have a very
+	 * low idle thresh
+	 */
+	fs_info->endio_workers =
+		btrfs_alloc_workqueue("endio", flags, max_active, 4);
+	fs_info->endio_meta_workers =
+		btrfs_alloc_workqueue("endio-meta", flags, max_active, 4);
+	fs_info->endio_meta_write_workers =
+		btrfs_alloc_workqueue("endio-meta-write", flags, max_active, 2);
+	fs_info->endio_raid56_workers =
+		btrfs_alloc_workqueue("endio-raid56", flags, max_active, 4);
+	fs_info->rmw_workers =
+		btrfs_alloc_workqueue("rmw", flags, max_active, 2);
+	fs_info->endio_write_workers =
+		btrfs_alloc_workqueue("endio-write", flags, max_active, 2);
+	fs_info->endio_freespace_worker =
+		btrfs_alloc_workqueue("freespace-write", flags, max_active, 0);
+	fs_info->delayed_workers =
+		btrfs_alloc_workqueue("delayed-meta", flags, max_active, 0);
+	fs_info->readahead_workers =
+		btrfs_alloc_workqueue("readahead", flags, max_active, 2);
+	fs_info->qgroup_rescan_workers =
+		btrfs_alloc_workqueue("qgroup-rescan", flags, 1, 0);
+	fs_info->extent_workers =
+		btrfs_alloc_workqueue("extent-refs", flags,
+				      min_t(u64, fs_devices->num_devices,
+					    max_active), 8);
+
+	if (!(fs_info->workers && fs_info->delalloc_workers &&
+	      fs_info->submit_workers && fs_info->flush_workers &&
+	      fs_info->endio_workers && fs_info->endio_meta_workers &&
+	      fs_info->endio_meta_write_workers &&
+	      fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
+	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
+	      fs_info->caching_workers && fs_info->readahead_workers &&
+	      fs_info->fixup_workers && fs_info->delayed_workers &&
+	      fs_info->fixup_workers && fs_info->extent_workers &&
+	      fs_info->qgroup_rescan_workers)) {
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
 int open_ctree(struct super_block *sb,
   struct btrfs_fs_devices *fs_devices,
   char *options)
@@ -2334,7 +2408,6 @@ int open_ctree(struct super_block *sb,
int num_backups_tried = 0;
int backup_index = 0;
int max_active;
-   int flags = WQ_MEM_RECLAIM | WQ_FREEZABLE | WQ_UNBOUND;
bool create_uuid_tree;
bool check_uuid_tree;
 
@@ -2582,72 +2655,9 @@ int open_ctree(struct super_block *sb,
 
 	max_active = fs_info->thread_pool_size;
 
-	fs_info->workers =
-		btrfs_alloc_workqueue("worker", flags | WQ_HIGHPRI,
-				      max_active, 16);
-
-	fs_info->delalloc_workers =
-		btrfs_alloc_workqueue("delalloc", flags, max_active, 2);
-
-	fs_info->flush_workers =
-		btrfs_alloc_workqueue("flush_delalloc", flags, max_active, 0);
-
-	fs_info->caching_workers =
-		btrfs_alloc_workqueue("cache", flags, max_active, 0);
-
-	/*
-	 * a higher idle thresh on the submit workers makes it much more
-	 * likely that bios will be send down in a sane order to the
-	 * devices
-	 */
-	fs_info->submit_workers =
-		btrfs_alloc_workqueue("submit", 

[PATCH 12/12] btrfs: factor btrfs_replay_log() out of open_ctree()

2014-08-01 Thread Eric Sandeen
Signed-off-by: Eric Sandeen sand...@redhat.com
---
 fs/btrfs/disk-io.c |  100 +---
 1 files changed, 56 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a50beca..ffb2f21 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2445,6 +2445,60 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info,
return 0;
 }
 
+static int btrfs_replay_log(struct btrfs_fs_info *fs_info,
+   struct btrfs_fs_devices *fs_devices)
+{
+   int ret;
+   u32 blocksize;
+	struct btrfs_root *tree_root = fs_info->tree_root;
+	struct btrfs_root *log_tree_root;
+	struct btrfs_super_block *disk_super = fs_info->super_copy;
+	u64 bytenr = btrfs_super_log_root(disk_super);
+
+	if (fs_devices->rw_devices == 0) {
+		printk(KERN_WARNING "BTRFS: log replay required "
+		       "on RO media\n");
+		return -EIO;
+	}
+	blocksize = btrfs_level_size(tree_root,
+				     btrfs_super_log_root_level(disk_super));
+
+	log_tree_root = btrfs_alloc_root(fs_info);
+	if (!log_tree_root)
+		return -ENOMEM;
+
+	__setup_root(tree_root->nodesize, tree_root->leafsize,
+		     tree_root->sectorsize, tree_root->stripesize,
+		     log_tree_root, fs_info, BTRFS_TREE_LOG_OBJECTID);
+
+	log_tree_root->node = read_tree_block(tree_root, bytenr,
+					      blocksize,
+					      fs_info->generation + 1);
+	if (!log_tree_root->node ||
+	    !extent_buffer_uptodate(log_tree_root->node)) {
+		printk(KERN_ERR "BTRFS: failed to read log tree\n");
+		free_extent_buffer(log_tree_root->node);
+		kfree(log_tree_root);
+		return -EIO;
+	}
+	/* returns with log_tree_root freed on success */
+	ret = btrfs_recover_log_trees(log_tree_root);
+	if (ret) {
+		btrfs_error(tree_root->fs_info, ret,
+			    "Failed to recover log tree");
+		free_extent_buffer(log_tree_root->node);
+		kfree(log_tree_root);
+		return ret;
+	}
+
+	if (fs_info->sb->s_flags & MS_RDONLY) {
+		ret = btrfs_commit_super(tree_root);
+		if (ret)
+			return ret;
+	}
+   return 0;
+}
+
 int open_ctree(struct super_block *sb,
   struct btrfs_fs_devices *fs_devices,
   char *options)
@@ -2461,7 +2515,6 @@ int open_ctree(struct super_block *sb,
struct btrfs_fs_info *fs_info = btrfs_sb(sb);
struct btrfs_root *tree_root;
struct btrfs_root *chunk_root;
-   struct btrfs_root *log_tree_root;
int ret;
int err = -EINVAL;
int num_backups_tried = 0;
@@ -2905,52 +2958,11 @@ retry_root_backup:
 
/* do not make disk changes in broken FS */
if (btrfs_super_log_root(disk_super) != 0) {
-   u64 bytenr = btrfs_super_log_root(disk_super);
-
-		if (fs_devices->rw_devices == 0) {
-			printk(KERN_WARNING "BTRFS: log replay required "
-			       "on RO media\n");
-			err = -EIO;
-			goto fail_qgroup;
-		}
-		blocksize =
-		     btrfs_level_size(tree_root,
-				      btrfs_super_log_root_level(disk_super));
-
-		log_tree_root = btrfs_alloc_root(fs_info);
-		if (!log_tree_root) {
-			err = -ENOMEM;
-			goto fail_qgroup;
-		}
-
-		__setup_root(nodesize, leafsize, sectorsize, stripesize,
-			     log_tree_root, fs_info, BTRFS_TREE_LOG_OBJECTID);
-
-		log_tree_root->node = read_tree_block(tree_root, bytenr,
-						      blocksize,
-						      generation + 1);
-		if (!log_tree_root->node ||
-		    !extent_buffer_uptodate(log_tree_root->node)) {
-			printk(KERN_ERR "BTRFS: failed to read log tree\n");
-			free_extent_buffer(log_tree_root->node);
-			kfree(log_tree_root);
-			goto fail_qgroup;
-		}
-		/* returns with log_tree_root freed on success */
-		ret = btrfs_recover_log_trees(log_tree_root);
+		ret = btrfs_replay_log(fs_info, fs_devices);
 		if (ret) {
-			btrfs_error(tree_root->fs_info, ret,
-				    "Failed to recover log tree");
-			free_extent_buffer(log_tree_root->node);
-			kfree(log_tree_root);
+   err = ret;
goto 

[PATCH 04/12] btrfs: factor btrfs_scrub_init() out of open_ctree()

2014-08-01 Thread Eric Sandeen
Signed-off-by: Eric Sandeen sand...@redhat.com
---
 fs/btrfs/disk-io.c |   19 ---
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 28d35a8..b95635f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2134,6 +2134,17 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info)
}
 }
 
+static void btrfs_scrub_init(struct btrfs_fs_info *fs_info)
+{
+	mutex_init(&fs_info->scrub_lock);
+	atomic_set(&fs_info->scrubs_running, 0);
+	atomic_set(&fs_info->scrub_pause_req, 0);
+	atomic_set(&fs_info->scrubs_paused, 0);
+	atomic_set(&fs_info->scrub_cancel_req, 0);
+	init_waitqueue_head(&fs_info->scrub_pause_wait);
+	fs_info->scrub_workers_refcnt = 0;
+}
+
 int open_ctree(struct super_block *sb,
   struct btrfs_fs_devices *fs_devices,
   char *options)
@@ -2281,14 +2292,8 @@ int open_ctree(struct super_block *sb,
}
 	btrfs_init_delayed_root(fs_info->delayed_root);
 
-	mutex_init(&fs_info->scrub_lock);
-	atomic_set(&fs_info->scrubs_running, 0);
-	atomic_set(&fs_info->scrub_pause_req, 0);
-	atomic_set(&fs_info->scrubs_paused, 0);
-	atomic_set(&fs_info->scrub_cancel_req, 0);
+	btrfs_scrub_init(fs_info);
 	init_waitqueue_head(&fs_info->replace_wait);
-	init_waitqueue_head(&fs_info->scrub_pause_wait);
-	fs_info->scrub_workers_refcnt = 0;
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 	fs_info->check_integrity_print_mask = 0;
 #endif
-- 
1.7.1



[PATCH 02/12] btrfs: consistently use fs_info in close_ctree()

2014-08-01 Thread Eric Sandeen
close_ctree() has a local fs_info var for convenience;
use it consistently.

Signed-off-by: Eric Sandeen sand...@redhat.com
---
 fs/btrfs/disk-io.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6d7afd..e6746be 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3651,7 +3651,7 @@ int close_ctree(struct btrfs_root *root)
 	if (!(fs_info->sb->s_flags & MS_RDONLY)) {
 		ret = btrfs_commit_super(root);
 		if (ret)
-			btrfs_err(root->fs_info, "commit super ret %d", ret);
+			btrfs_err(fs_info, "commit super ret %d", ret);
 	}
 
 	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
@@ -3663,10 +3663,10 @@ int close_ctree(struct btrfs_root *root)
 	fs_info->closing = 2;
 	smp_mb();
 
-	btrfs_free_qgroup_config(root->fs_info);
+	btrfs_free_qgroup_config(fs_info);
 
 	if (percpu_counter_sum(&fs_info->delalloc_bytes)) {
-		btrfs_info(root->fs_info, "at unmount delalloc count %lld",
+		btrfs_info(fs_info, "at unmount delalloc count %lld",
 			   percpu_counter_sum(&fs_info->delalloc_bytes));
}
 
-- 
1.7.1



[PATCH 03/12] btrfs: handle errors from reading the quota tree root

2014-08-01 Thread Eric Sandeen
Reading the quota tree root may fail with ENOENT
if there is no quota, which is fine, but the code was
ignoring every other error as well, which is not fine.

Signed-off-by: Eric Sandeen sand...@redhat.com
---
 fs/btrfs/disk-io.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e6746be..28d35a8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2733,7 +2733,12 @@ retry_root_backup:
 
location.objectid = BTRFS_QUOTA_TREE_OBJECTID;
 	quota_root = btrfs_read_tree_root(tree_root, &location);
-   if (!IS_ERR(quota_root)) {
+   if (IS_ERR(quota_root)) {
+   ret = PTR_ERR(quota_root);
+   /* It's fine to not have quotas */
+   if (ret != -ENOENT)
+   goto recovery_tree_root;
+   } else {
 		set_bit(BTRFS_ROOT_TRACK_DIRTY, &quota_root->state);
 		fs_info->quota_enabled = 1;
 		fs_info->pending_quota_state = 1;
-- 
1.7.1



[PATCH 01/12] btrfs: remove unused fs_info arg from btrfs_close_extra_devices()

2014-08-01 Thread Eric Sandeen
The commit:
8dabb74 Btrfs: change core code of btrfs to support the
device replace operations
added the fs_info argument, but never used it -
just remove it again.

Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |4 ++--
 fs/btrfs/volumes.c |3 +--
 fs/btrfs/volumes.h |3 +--
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 08e65e9..f6d7afd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2672,7 +2672,7 @@ int open_ctree(struct super_block *sb,
 * keep the device that is marked to be the target device for the
 * dev_replace procedure
 */
-   btrfs_close_extra_devices(fs_info, fs_devices, 0);
+   btrfs_close_extra_devices(fs_devices, 0);
 
if (!fs_devices->latest_bdev) {
printk(KERN_CRIT "BTRFS: failed to read devices on %s\n",
@@ -2778,7 +2778,7 @@ retry_root_backup:
goto fail_block_groups;
}
 
-   btrfs_close_extra_devices(fs_info, fs_devices, 1);
+   btrfs_close_extra_devices(fs_devices, 1);
 
ret = btrfs_sysfs_add_one(fs_info);
if (ret) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6cb82f6..b5aa0c9 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -574,8 +574,7 @@ error:
return ERR_PTR(-ENOMEM);
 }
 
-void btrfs_close_extra_devices(struct btrfs_fs_info *fs_info,
-  struct btrfs_fs_devices *fs_devices, int step)
+void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step)
 {
struct btrfs_device *device, *next;
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2aaa00c..2026741 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -307,8 +307,7 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
  struct btrfs_fs_devices **fs_devices_ret);
 int btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
-void btrfs_close_extra_devices(struct btrfs_fs_info *fs_info,
-  struct btrfs_fs_devices *fs_devices, int step);
+void btrfs_close_extra_devices(struct btrfs_fs_devices *fs_devices, int step);
 int btrfs_find_device_missing_or_by_path(struct btrfs_root *root,
 char *device_path,
 struct btrfs_device **device);
-- 
1.7.1



[PATCH 05/12] btrfs: factor btrfs_balance_init() out of open_ctree()

2014-08-01 Thread Eric Sandeen
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b95635f..8c7113b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2145,6 +2145,17 @@ static void btrfs_scrub_init(struct btrfs_fs_info *fs_info)
fs_info->scrub_workers_refcnt = 0;
 }
 
+static void btrfs_balance_init(struct btrfs_fs_info *fs_info)
+{
+   spin_lock_init(&fs_info->balance_lock);
+   mutex_init(&fs_info->balance_mutex);
+   atomic_set(&fs_info->balance_running, 0);
+   atomic_set(&fs_info->balance_pause_req, 0);
+   atomic_set(&fs_info->balance_cancel_req, 0);
+   fs_info->balance_ctl = NULL;
+   init_waitqueue_head(&fs_info->balance_wait_q);
+}
+
 int open_ctree(struct super_block *sb,
   struct btrfs_fs_devices *fs_devices,
   char *options)
@@ -2297,14 +2308,7 @@ int open_ctree(struct super_block *sb,
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
fs_info->check_integrity_print_mask = 0;
 #endif
-
-   spin_lock_init(&fs_info->balance_lock);
-   mutex_init(&fs_info->balance_mutex);
-   atomic_set(&fs_info->balance_running, 0);
-   atomic_set(&fs_info->balance_pause_req, 0);
-   atomic_set(&fs_info->balance_cancel_req, 0);
-   fs_info->balance_ctl = NULL;
-   init_waitqueue_head(&fs_info->balance_wait_q);
+   btrfs_balance_init(fs_info);
btrfs_init_async_reclaim_work(&fs_info->async_reclaim_work);
 
sb->s_blocksize = 4096;
-- 
1.7.1



[PATCH 08/12] btrfs: factor btrfs_qgroup_init() out of open_ctree()

2014-08-01 Thread Eric Sandeen
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |   26 +++++++++++++++-----------
 1 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a5fa84f..47fcacf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2195,6 +2195,20 @@ static void btrfs_dev_replace_locks_init(struct btrfs_fs_info *fs_info)
mutex_init(&fs_info->dev_replace.lock);
 }
 
+static void btrfs_qgroup_init(struct btrfs_fs_info *fs_info)
+{
+   spin_lock_init(&fs_info->qgroup_lock);
+   mutex_init(&fs_info->qgroup_ioctl_lock);
+   fs_info->qgroup_tree = RB_ROOT;
+   fs_info->qgroup_op_tree = RB_ROOT;
+   INIT_LIST_HEAD(&fs_info->dirty_qgroups);
+   fs_info->qgroup_seq = 1;
+   fs_info->quota_enabled = 0;
+   fs_info->pending_quota_state = 0;
+   fs_info->qgroup_ulist = NULL;
+   mutex_init(&fs_info->qgroup_rescan_lock);
+}
+
 int open_ctree(struct super_block *sb,
   struct btrfs_fs_devices *fs_devices,
   char *options)
@@ -2381,17 +2395,7 @@ int open_ctree(struct super_block *sb,
sema_init(&fs_info->uuid_tree_rescan_sem, 1);
 
btrfs_dev_replace_locks_init(fs_info);
-
-   spin_lock_init(&fs_info->qgroup_lock);
-   mutex_init(&fs_info->qgroup_ioctl_lock);
-   fs_info->qgroup_tree = RB_ROOT;
-   fs_info->qgroup_op_tree = RB_ROOT;
-   INIT_LIST_HEAD(&fs_info->dirty_qgroups);
-   fs_info->qgroup_seq = 1;
-   fs_info->quota_enabled = 0;
-   fs_info->pending_quota_state = 0;
-   fs_info->qgroup_ulist = NULL;
-   mutex_init(&fs_info->qgroup_rescan_lock);
+   btrfs_qgroup_init(fs_info);
 
btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
btrfs_init_free_cluster(&fs_info->data_alloc_cluster);
-- 
1.7.1



[PATCH 07/12] btrfs: factor btrfs_dev_replace_locks_init() out of open_ctree()

2014-08-01 Thread Eric Sandeen
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6636386..a5fa84f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2186,6 +2186,15 @@ static void btrfs_btree_inode_init(struct btrfs_fs_info *fs_info,
btrfs_insert_inode_hash(fs_info->btree_inode);
 }
 
+static void btrfs_dev_replace_locks_init(struct btrfs_fs_info *fs_info)
+{
+   fs_info->dev_replace.lock_owner = 0;
+   atomic_set(&fs_info->dev_replace.nesting_level, 0);
+   mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
+   mutex_init(&fs_info->dev_replace.lock_management_lock);
+   mutex_init(&fs_info->dev_replace.lock);
+}
+
 int open_ctree(struct super_block *sb,
   struct btrfs_fs_devices *fs_devices,
   char *options)
@@ -2370,11 +2379,8 @@ int open_ctree(struct super_block *sb,
init_rwsem(&fs_info->cleanup_work_sem);
init_rwsem(&fs_info->subvol_sem);
sema_init(&fs_info->uuid_tree_rescan_sem, 1);
-   fs_info->dev_replace.lock_owner = 0;
-   atomic_set(&fs_info->dev_replace.nesting_level, 0);
-   mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
-   mutex_init(&fs_info->dev_replace.lock_management_lock);
-   mutex_init(&fs_info->dev_replace.lock);
+
+   btrfs_dev_replace_locks_init(fs_info);
 
spin_lock_init(&fs_info->qgroup_lock);
mutex_init(&fs_info->qgroup_ioctl_lock);
-- 
1.7.1



[PATCH 00/12] disk-io.c / open_ctree cleanup refactoring

2014-08-01 Thread Eric Sandeen
This is mostly to refactor open_ctree(); at the end of the series
it's only around 600 lines instead of 900.

The first 2 patches are just little cleanups I saw while doing this;
the 3rd actually is something of a bugfix.

The rest are refactoring - this is a bit of an RFC still; some
seem like clear groups of code to move out of the way, others
are a bit more gratuitous.  Perhaps after these 300 lines are
moved out of the way, folks who are familiar with the code
can spot other reasonable groupings or functionality which could
also be factored out.

There are still large swaths of random initializations; I thought
about btrfs_initialize_locks_and_stuff() but decided against it.
:)

Anyway, it builds & passes default xfstests -g auto runs, so it
can't be all bad.  Let me know what you think.  Different function
names might be better, better symmetry with close_ctree() might be good,
but it's a start.

Thanks,
-Eric



[PATCH 06/12] btrfs: factor btrfs_btree_inode_init() out of open_ctree()

2014-08-01 Thread Eric Sandeen
Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |   56 ---
 1 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8c7113b..6636386 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2156,6 +2156,36 @@ static void btrfs_balance_init(struct btrfs_fs_info *fs_info)
init_waitqueue_head(&fs_info->balance_wait_q);
 }
 
+static void btrfs_btree_inode_init(struct btrfs_fs_info *fs_info,
+  struct btrfs_root *tree_root)
+{
+   fs_info->btree_inode->i_ino = BTRFS_BTREE_INODE_OBJECTID;
+   set_nlink(fs_info->btree_inode, 1);
+   /*
+* we set the i_size on the btree inode to the max possible int.
+* the real end of the address space is determined by all of
+* the devices in the system
+*/
+   fs_info->btree_inode->i_size = OFFSET_MAX;
+   fs_info->btree_inode->i_mapping->a_ops = &btree_aops;
+   fs_info->btree_inode->i_mapping->backing_dev_info = &fs_info->bdi;
+
+   RB_CLEAR_NODE(&BTRFS_I(fs_info->btree_inode)->rb_node);
+   extent_io_tree_init(&BTRFS_I(fs_info->btree_inode)->io_tree,
+fs_info->btree_inode->i_mapping);
+   BTRFS_I(fs_info->btree_inode)->io_tree.track_uptodate = 0;
+   extent_map_tree_init(&BTRFS_I(fs_info->btree_inode)->extent_tree);
+
+   BTRFS_I(fs_info->btree_inode)->io_tree.ops = &btree_extent_io_ops;
+
+   BTRFS_I(fs_info->btree_inode)->root = tree_root;
+   memset(&BTRFS_I(fs_info->btree_inode)->location, 0,
+  sizeof(struct btrfs_key));
+   set_bit(BTRFS_INODE_DUMMY,
+   &BTRFS_I(fs_info->btree_inode)->runtime_flags);
+   btrfs_insert_inode_hash(fs_info->btree_inode);
+}
+
 int open_ctree(struct super_block *sb,
   struct btrfs_fs_devices *fs_devices,
   char *options)
@@ -2315,31 +2345,7 @@ int open_ctree(struct super_block *sb,
sb->s_blocksize_bits = blksize_bits(4096);
sb->s_bdi = &fs_info->bdi;
 
-   fs_info->btree_inode->i_ino = BTRFS_BTREE_INODE_OBJECTID;
-   set_nlink(fs_info->btree_inode, 1);
-   /*
-* we set the i_size on the btree inode to the max possible int.
-* the real end of the address space is determined by all of
-* the devices in the system
-*/
-   fs_info->btree_inode->i_size = OFFSET_MAX;
-   fs_info->btree_inode->i_mapping->a_ops = &btree_aops;
-   fs_info->btree_inode->i_mapping->backing_dev_info = &fs_info->bdi;
-
-   RB_CLEAR_NODE(&BTRFS_I(fs_info->btree_inode)->rb_node);
-   extent_io_tree_init(&BTRFS_I(fs_info->btree_inode)->io_tree,
-fs_info->btree_inode->i_mapping);
-   BTRFS_I(fs_info->btree_inode)->io_tree.track_uptodate = 0;
-   extent_map_tree_init(&BTRFS_I(fs_info->btree_inode)->extent_tree);
-
-   BTRFS_I(fs_info->btree_inode)->io_tree.ops = &btree_extent_io_ops;
-
-   BTRFS_I(fs_info->btree_inode)->root = tree_root;
-   memset(&BTRFS_I(fs_info->btree_inode)->location, 0,
-  sizeof(struct btrfs_key));
-   set_bit(BTRFS_INODE_DUMMY,
-   &BTRFS_I(fs_info->btree_inode)->runtime_flags);
-   btrfs_insert_inode_hash(fs_info->btree_inode);
+   btrfs_btree_inode_init(fs_info, tree_root);
 
spin_lock_init(&fs_info->block_group_cache_lock);
fs_info->block_group_cache_tree = RB_ROOT;
-- 
1.7.1



[PATCH 11/12] btrfs: factor btrfs_read_roots() out of open_ctree()

2014-08-01 Thread Eric Sandeen
Also, remove the two local variables create_uuid_tree
and check_uuid_tree; we can use the existence of
the uuid root and/or the RESCAN_UUID_TREE flag to
determine what action to take.

Signed-off-by: Eric Sandeen <sand...@redhat.com>
---
 fs/btrfs/disk-io.c |  141 ++--
 1 files changed, 71 insertions(+), 70 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0465d43..a50beca 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2381,6 +2381,70 @@ static int btrfs_alloc_workqueues(struct btrfs_fs_info *fs_info,
return 0;
 }
 
+static int btrfs_read_roots(struct btrfs_fs_info *fs_info,
+   struct btrfs_root *tree_root)
+{
+   struct btrfs_root *extent_root;
+   struct btrfs_root *dev_root;
+   struct btrfs_root *csum_root;
+   struct btrfs_root *quota_root;
+   struct btrfs_root *uuid_root;
+   struct btrfs_key location;
+   int ret;
+
+   location.objectid = BTRFS_EXTENT_TREE_OBJECTID;
+   location.type = BTRFS_ROOT_ITEM_KEY;
+   location.offset = 0;
+
+   extent_root = btrfs_read_tree_root(tree_root, &location);
+   if (IS_ERR(extent_root))
+   return PTR_ERR(extent_root);
+   set_bit(BTRFS_ROOT_TRACK_DIRTY, &extent_root->state);
+   fs_info->extent_root = extent_root;
+
+   location.objectid = BTRFS_DEV_TREE_OBJECTID;
+   dev_root = btrfs_read_tree_root(tree_root, &location);
+   if (IS_ERR(dev_root))
+   return PTR_ERR(dev_root);
+   set_bit(BTRFS_ROOT_TRACK_DIRTY, &dev_root->state);
+   fs_info->dev_root = dev_root;
+   btrfs_init_devices_late(fs_info);
+
+   location.objectid = BTRFS_CSUM_TREE_OBJECTID;
+   csum_root = btrfs_read_tree_root(tree_root, &location);
+   if (IS_ERR(csum_root))
+   return PTR_ERR(csum_root);
+   set_bit(BTRFS_ROOT_TRACK_DIRTY, &csum_root->state);
+   fs_info->csum_root = csum_root;
+
+   location.objectid = BTRFS_QUOTA_TREE_OBJECTID;
+   quota_root = btrfs_read_tree_root(tree_root, &location);
+   if (IS_ERR(quota_root)) {
+   ret = PTR_ERR(quota_root);
+   /* It's fine to not have quotas */
+   if (ret != -ENOENT)
+   return ret;
+   } else {
+   set_bit(BTRFS_ROOT_TRACK_DIRTY, &quota_root->state);
+   fs_info->quota_enabled = 1;
+   fs_info->pending_quota_state = 1;
+   fs_info->quota_root = quota_root;
+   }
+
+   location.objectid = BTRFS_UUID_TREE_OBJECTID;
+   uuid_root = btrfs_read_tree_root(tree_root, &location);
+   if (IS_ERR(uuid_root)) {
+   ret = PTR_ERR(uuid_root);
+   if (ret != -ENOENT)
+   return ret;
+   } else {
+   set_bit(BTRFS_ROOT_TRACK_DIRTY, &uuid_root->state);
+   fs_info->uuid_root = uuid_root;
+   }
+
+   return 0;
+}
+
 int open_ctree(struct super_block *sb,
   struct btrfs_fs_devices *fs_devices,
   char *options)
@@ -2396,20 +2460,13 @@ int open_ctree(struct super_block *sb,
struct btrfs_super_block *disk_super;
struct btrfs_fs_info *fs_info = btrfs_sb(sb);
struct btrfs_root *tree_root;
-   struct btrfs_root *extent_root;
-   struct btrfs_root *csum_root;
struct btrfs_root *chunk_root;
-   struct btrfs_root *dev_root;
-   struct btrfs_root *quota_root;
-   struct btrfs_root *uuid_root;
struct btrfs_root *log_tree_root;
int ret;
int err = -EINVAL;
int num_backups_tried = 0;
int backup_index = 0;
int max_active;
-   bool create_uuid_tree;
-   bool check_uuid_tree;
 
tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info);
chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info);
@@ -2752,66 +2809,9 @@ retry_root_backup:
tree_root->commit_root = btrfs_root_node(tree_root);
btrfs_set_root_refs(&tree_root->root_item, 1);
 
-   location.objectid = BTRFS_EXTENT_TREE_OBJECTID;
-   location.type = BTRFS_ROOT_ITEM_KEY;
-   location.offset = 0;
-
-   extent_root = btrfs_read_tree_root(tree_root, &location);
-   if (IS_ERR(extent_root)) {
-   ret = PTR_ERR(extent_root);
-   goto recovery_tree_root;
-   }
-   set_bit(BTRFS_ROOT_TRACK_DIRTY, &extent_root->state);
-   fs_info->extent_root = extent_root;
-
-   location.objectid = BTRFS_DEV_TREE_OBJECTID;
-   dev_root = btrfs_read_tree_root(tree_root, &location);
-   if (IS_ERR(dev_root)) {
-   ret = PTR_ERR(dev_root);
-   goto recovery_tree_root;
-   }
-   set_bit(BTRFS_ROOT_TRACK_DIRTY, &dev_root->state);
-   fs_info->dev_root = dev_root;
-   btrfs_init_devices_late(fs_info);
-
-   location.objectid = BTRFS_CSUM_TREE_OBJECTID;
-   csum_root = btrfs_read_tree_root(tree_root, &location);
-   if