Re: BTRFS volume crashes after hard reset

2013-11-04 Thread Hugo Mills

On Mon, Nov 04, 2013 at 07:39:04AM +0100, Michael Eitelwein wrote:
 Hi
 
 Doing a btrfsck and then mounting with -o recovery finally worked. All
 seems well again.

   You should upgrade your kernel, too. 3.2 has a large number of
known bugs -- including the one you've just met here -- which have
been fixed in later versions. Some of those bugs are not as benign as
this one.

   Hugo.

 Best regards
 
 Michael
 
 On 03.11.2013 23:50, Michael Eitelwein wrote:
  Hi
 
  My computer froze and I had to power-cycle the PC. After reboot,
  attempting to mount the btrfs volume leads to a kernel oops:
 
  [  970.140850] device fsid 65db2d61-0301-42c0-9f1b-94dd215f694c devid 1
  transid 119849 /dev/sda3
  [  970.141475] btrfs: disk space caching is enabled
  [  991.149365] [ cut here ]
  [  991.149378] kernel BUG at
  /build/buildd/linux-3.2.0/fs/btrfs/free-space-cache.c:1515!
  [  991.149386] invalid opcode:  [#1] SMP
  [  991.149393] CPU 3
  [  991.149397] Modules linked in: des_generic md4 nls_utf8 cifs
  ip6table_filter ip6_tables ebtable_nat ebtables pci_stub vboxpci(O)
  vboxnetadp(O) vboxnetflt(O) vboxdrv(O) ipt_MASQUERADE iptable_nat nf_nat
  nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
  xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables
  kvm_amd kvm bnep rfcomm parport_pc ppdev binfmt_misc dm_crypt btusb
  bluetooth snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel
  snd_hda_codec bridge stp snd_hwdep snd_pcm snd_seq_midi snd_rawmidi
  snd_seq_midi_event snd_seq snd_timer snd_seq_device snd edac_core
  k10temp sp5100_tco i2c_piix4 edac_mce_amd soundcore snd_page_alloc
  psmouse serio_raw mac_hid lp parport btrfs zlib_deflate libcrc32c
  firewire_ohci firewire_core usbhid hid crc_itu_t floppy pata_atiixp wmi
  r8169
  [  991.149516]
  [  991.149523] Pid: 4020, comm: mount Tainted: G   O
  3.2.0-32-generic #51-Ubuntu To Be Filled By O.E.M. To Be Filled By
  O.E.M./AOD790GX/128M
  [  991.149536] RIP: 0010:[a00f2d38]  [a00f2d38]
  remove_from_bitmap+0x248/0x250 [btrfs]
  [  991.149590] RSP: 0018:8801726e1698  EFLAGS: 00010286
  [  991.149596] RAX:  RBX: 880120ade700 RCX:
  0003
  [  991.149602] RDX: 8001 RSI: 8000 RDI:
  8801a0911000
  [  991.149608] RBP: 8801726e16f8 R08: 8801a0912000 R09:
  8000
  [  991.149614] R10: 8800 R11: 0010a7815400 R12:
  002c49c0
  [  991.149620] R13: 8801726e1718 R14: 8801726e1720 R15:
  8801819b8840
  [  991.149627] FS:  7f9e6f719800() GS:8801afd8()
  knlGS:f75b86c0
  [  991.149633] CS:  0010 DS:  ES:  CR0: 8005003b
  [  991.149639] CR2: 7ff9027b2110 CR3: 000159472000 CR4:
  06e0
  [  991.149645] DR0:  DR1:  DR2:
  
  [  991.149651] DR3:  DR6: 0ff0 DR7:
  0400
  [  991.149658] Process mount (pid: 4020, threadinfo 8801726e,
  task 8801727b4500)
  [  991.149663] Stack:
  [  991.149666]  880172682600 880172682600 880172682640
  8801819b8864
  [  991.149678]  002c43303000 a000 0286
  8801819b8840
  [  991.149688]   8801819b8864 880120ade700
  
  [  991.149698] Call Trace:
  [  991.149742]  [a00f3f99] btrfs_remove_free_space+0x69/0x330
  [btrfs]
  [  991.149779]  [a00ae270]
  btrfs_alloc_logged_file_extent+0x1c0/0x1e0 [btrfs]
  [  991.149811]  [a009c93a] ? btrfs_free_path+0x2a/0x40 [btrfs]
  [  991.149849]  [a00ef2f8] replay_one_extent+0x518/0x570 [btrfs]
  [  991.149861]  [8108abc0] ? autoremove_wake_function+0x40/0x40
  [  991.149901]  [a00efe2b] replay_one_buffer+0x26b/0x330 [btrfs]
  [  991.149941]  [a00dcb74] ? alloc_extent_buffer+0x74/0x410
  [btrfs]
  [  991.149979]  [a00eccea] walk_down_log_tree+0x1ea/0x3b0 [btrfs]
  [  991.150017]  [a00ed1ad] walk_log_tree+0xbd/0x1d0 [btrfs]
  [  991.150056]  [a00f0ea1] btrfs_recover_log_trees+0x211/0x300
  [btrfs]
  [  991.150095]  [a00efbc0] ?
  fixup_inode_link_counts+0x150/0x150 [btrfs]
  [  991.150133]  [a00bb635] open_ctree+0x14b5/0x1950 [btrfs]
  [  991.150145]  [81316eb4] ? snprintf+0x34/0x40
  [  991.150178]  [a010abe2] btrfs_fill_super.isra.37+0x72/0x12c
  [btrfs]
  [  991.150189]  [811e4711] ? disk_name+0x61/0xc0
  [  991.150197]  [813144d7] ? strlcpy+0x47/0x60
  [  991.150226]  [a009a897] btrfs_mount+0x497/0x4e0 [btrfs]
  [  991.150242]  [8117bad3] mount_fs+0x43/0x1b0
  [  991.150254]  [8119637a] vfs_kern_mount+0x6a/0xc0
  [  991.150265]  [81197884] do_kern_mount+0x54/0x110
  [  991.150274]  [811993e4] do_mount+0x1a4/0x260
  [  991.150282]  

Re: [PATCH] Btrfs: avoid heavy operations in btrfs_commit_super

2013-11-04 Thread Stefan Behrens
On Sun,  3 Nov 2013 23:24:34 +0800, Liu Bo wrote:
 The 'git blame' history shows that the old transaction commit code had to
 run twice to ensure that roots were updated, and that we had to flush
 metadata and the super block manually. Right now, however, all of this is
 handled inside the transaction commit code without extra effort.
 
 The error handling remains the same as in the current code: return to the
 caller as soon as we get an error.
 
 This saves us a transaction commit and a flush of the super block, both of
 which are heavy operations according to ftrace output analysis.
 
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
  fs/btrfs/disk-io.c | 20 +---
  1 file changed, 1 insertion(+), 19 deletions(-)
 
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 62176ad..d6728c3 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -3542,25 +3542,7 @@ int btrfs_commit_super(struct btrfs_root *root)
   trans = btrfs_join_transaction(root);
   if (IS_ERR(trans))
   return PTR_ERR(trans);
 - ret = btrfs_commit_transaction(trans, root);
 - if (ret)
 - return ret;
 - /* run commit again to drop the original snapshot */
 - trans = btrfs_join_transaction(root);
 - if (IS_ERR(trans))
 - return PTR_ERR(trans);
 - ret = btrfs_commit_transaction(trans, root);
 - if (ret)
 - return ret;
 - ret = btrfs_write_and_wait_transaction(NULL, root);
 - if (ret) {
 - btrfs_error(root->fs_info, ret,
 - "Failed to sync btree inode to disk.");
 - return ret;
 - }
 -
 - ret = write_ctree_super(NULL, root, 0);
 - return ret;
 + return btrfs_commit_transaction(trans, root);
  }
  
  int close_ctree(struct btrfs_root *root)
 

fs/btrfs/disk-io.c: In function 'btrfs_commit_super':
fs/btrfs/disk-io.c:3520: warning: unused variable 'ret'
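
For context, the warning arises because the patch leaves the function with a
single return statement, so the 'ret' local declared earlier in
btrfs_commit_super() has no remaining users. Below is a minimal userspace
sketch of the shape the function ends up with once that declaration is
dropped as well. join_transaction(), commit_transaction() and struct trans
are hypothetical stand-ins for the kernel helpers, not the real API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types and helpers. */
struct trans { int id; };

static struct trans the_trans = { 1 };
static int commits;	/* counts how many commits were issued */

static struct trans *join_transaction(void)
{
	return &the_trans;
}

static int commit_transaction(struct trans *t)
{
	assert(t != NULL);
	commits++;
	return 0;
}

/*
 * After the patch: one join, one commit, and the commit's return value is
 * handed straight back to the caller, so no 'ret' local is needed.
 */
static int commit_super(void)
{
	struct trans *trans = join_transaction();

	if (trans == NULL)
		return -1;
	return commit_transaction(trans);
}
```

Calling commit_super() once issues exactly one commit in this sketch, where
the pre-patch code issued two.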

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: avoid heavy operations in btrfs_commit_super

2013-11-04 Thread Liu Bo
On Mon, Nov 04, 2013 at 09:49:59AM +0100, Stefan Behrens wrote:
 On Sun,  3 Nov 2013 23:24:34 +0800, Liu Bo wrote:
  The 'git blame' history shows that the old transaction commit code had to
  run twice to ensure that roots were updated, and that we had to flush
  metadata and the super block manually. Right now, however, all of this is
  handled inside the transaction commit code without extra effort.
  
  The error handling remains the same as in the current code: return to the
  caller as soon as we get an error.
  
  This saves us a transaction commit and a flush of the super block, both of
  which are heavy operations according to ftrace output analysis.
  
  Signed-off-by: Liu Bo bo.li@oracle.com
  ---
   fs/btrfs/disk-io.c | 20 +---
   1 file changed, 1 insertion(+), 19 deletions(-)
  
  diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
  index 62176ad..d6728c3 100644
  --- a/fs/btrfs/disk-io.c
  +++ b/fs/btrfs/disk-io.c
  @@ -3542,25 +3542,7 @@ int btrfs_commit_super(struct btrfs_root *root)
  trans = btrfs_join_transaction(root);
  if (IS_ERR(trans))
  return PTR_ERR(trans);
  -   ret = btrfs_commit_transaction(trans, root);
  -   if (ret)
  -   return ret;
  -   /* run commit again to drop the original snapshot */
  -   trans = btrfs_join_transaction(root);
  -   if (IS_ERR(trans))
  -   return PTR_ERR(trans);
  -   ret = btrfs_commit_transaction(trans, root);
  -   if (ret)
  -   return ret;
  -   ret = btrfs_write_and_wait_transaction(NULL, root);
  -   if (ret) {
  -   btrfs_error(root->fs_info, ret,
  -   "Failed to sync btree inode to disk.");
  -   return ret;
  -   }
  -
  -   ret = write_ctree_super(NULL, root, 0);
  -   return ret;
  +   return btrfs_commit_transaction(trans, root);
   }
   
   int close_ctree(struct btrfs_root *root)
  
 
 fs/btrfs/disk-io.c: In function 'btrfs_commit_super':
 fs/btrfs/disk-io.c:3520: warning: unused variable 'ret'
 

Oops, sorry for that, will fix it soon.

thanks,
-liubo


Re: btrfs send/receive do not keep inode ctimes

2013-11-04 Thread Karl Kiniger
Hi Jan + list

On Fri 131101, Jan Schmidt wrote:
 Hi Karl,
 
 On Fri, October 25, 2013 at 15:12 (+0200), Karl Kiniger wrote:
  is there low level support to change inode ctimes somehow?
  (on ext[234] it can be done using debugfs)
 
 No.

Yes :-), offline only of course

 
  It would be nice to make received snapshots as similar as
  possible to their send source. (I am not talking about
  uuids and such, just ls -lc output)
 
 This is not planned. Currently, we do not even preserve the inode number. Can
 you give a short explanation of your use case, why do you need to keep the 
 ctime?
 
 Thanks,
 -Jan

I just wanted to clone a btrfs with many snapshots and at the same time
be able to mount both original and clone on the same computer.
(I did not even try because I don't know what kind of pain the duplicate
 UUIDs will cause.)

Is there a known way to re-UUID such a cloned btrfs?

I don't really need the ctime; it's just convenient for seeing what's
new in some folder, since wget adjusts the mtime and Unix/Linux lacks a
file creation timestamp.

Greetings,
Karl
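
To illustrate the mtime/ctime distinction discussed above: userspace can set
atime and mtime freely, but the kernel stamps ctime itself whenever the inode
changes, and POSIX offers no call to set it. A minimal sketch, assuming a
writable /tmp; the helper name backdate_mtime() is made up for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Create a scratch file, set its atime and mtime to `past`, and return the
 * resulting inode timestamps in *st. Returns 0 on success, -1 on error.
 */
static int backdate_mtime(time_t past, struct stat *st)
{
	char path[] = "/tmp/ctime-demo-XXXXXX";
	struct timespec times[2];
	int ret = -1;
	int fd;

	assert(st != NULL);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;

	times[0].tv_sec = past;		/* atime */
	times[0].tv_nsec = 0;
	times[1].tv_sec = past;		/* mtime */
	times[1].tv_nsec = 0;

	/* futimens() changes atime/mtime; the kernel updates ctime to "now". */
	if (futimens(fd, times) == 0 && fstat(fd, st) == 0)
		ret = 0;

	close(fd);
	unlink(path);
	return ret;
}
```

After a successful call, st_mtime equals the requested time while st_ctime
reflects the moment of the futimens() call. This is why a tool like wget can
backdate mtime but leaves ctime useful as a "when did this arrive" marker.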



Re: BTRFS volume crashes after hard reset

2013-11-04 Thread Michael Eitelwein
The crash actually happened on kernel 3.8 (Ubuntu 13.04), but I had to fall
back to an Ubuntu 12.04 LTS with kernel 3.2 to get it fixed.


Hope that Ubuntu back-ports btrfs bug-fixes into their LTS kernel - do they?

Michael



On 4 November 2013 09:29:53, Hugo Mills h...@carfax.org.uk wrote:


On Mon, Nov 04, 2013 at 07:39:04AM +0100, Michael Eitelwein wrote:
 Hi
 Doing a btrfsck and then mounting with -o recovery finally worked. All
 seems well again.

   You should upgrade your kernel, too. 3.2 has a large number of
known bugs -- including the one you've just met here -- which have
been fixed in later versions. Some of those bugs are not as benign as
this one.

   Hugo.

 Best regards
 Michael

Re: [PATCH v6] Btrfs: fix memory leak of orphan block rsv

2013-11-04 Thread Alex Lyakas
Hi Filipe,
any luck with this patch?:)

Alex.

On Wed, Oct 23, 2013 at 5:26 PM, Filipe David Manana fdman...@gmail.com wrote:
 On Wed, Oct 23, 2013 at 3:14 PM, Alex Lyakas
 alex.bt...@zadarastorage.com wrote:
 Hello,

 On Wed, Oct 23, 2013 at 4:35 PM, Filipe David Manana fdman...@gmail.com 
 wrote:
 On Wed, Oct 23, 2013 at 2:33 PM, Alex Lyakas
 alex.bt...@zadarastorage.com wrote:
 Hi Filipe,


 On Tue, Aug 20, 2013 at 2:52 AM, Filipe David Borba Manana
 fdman...@gmail.com wrote:

 This issue is simple to reproduce and observe if kmemleak is enabled.
 Two simple ways to reproduce it:

 ** 1

 $ mkfs.btrfs -f /dev/loop0
 $ mount /dev/loop0 /mnt/btrfs
 $ btrfs balance start /mnt/btrfs
 $ umount /mnt/btrfs

 So here it seems that the leak can only happen in case the block-group
 has a free-space inode. This is what the orphan item is added for.
 Yes, here kmemleak reports.
 But: if the space_cache option is disabled (and nospace_cache enabled), it
 seems that btrfs still creates the FREE_SPACE inodes, although they
 are empty because in cache_save_setup:

 inode = lookup_free_space_inode(root, block_group, path);
 if (IS_ERR(inode) && PTR_ERR(inode) != -ENOENT) {
 ret = PTR_ERR(inode);
 btrfs_release_path(path);
 goto out;
 }

 if (IS_ERR(inode)) {
 ...
 ret = create_free_space_inode(root, trans, block_group, path);

 and only later it actually sets BTRFS_DC_WRITTEN if space_cache option
 is disabled. Amazing!
 Although this is a different issue, do you know perhaps why these
 empty inodes are needed?

 Don't know if they are needed. But you have a point, it seems odd to
 create the free space cache inode if mount option nospace_cache was
 supplied. Thanks Alex. Testing the following patch:

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index c43ee8a..eb1b7da 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -3162,6 +3162,9 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
 int retries = 0;
 int ret = 0;

 +   if (!btrfs_test_opt(root, SPACE_CACHE))
 +   return 0;
 +
 /*
  * If this block group is smaller than 100 megs don't bother caching the
  * block group.



 Thanks!
 Alex.




 ** 2

 $ mkfs.btrfs -f /dev/loop0
 $ mount /dev/loop0 /mnt/btrfs
 $ touch /mnt/btrfs/foobar
 $ rm -f /mnt/btrfs/foobar
 $ umount /mnt/btrfs


 I tried the second repro script on kernel 3.8.13, and kmemleak does
 not report a leak (even if I force the kmemleak scan). I did not try
 the balance-repro script, though. Am I missing something?

 Maybe it's not an issue on 3.8.13 and older releases.
 This was on btrfs-next from August 19.

 thanks for testing


 Thanks,
 Alex.




 After a while, kmemleak reports the leak:

 $ cat /sys/kernel/debug/kmemleak
 unreferenced object 0x880402b13e00 (size 128):
   comm "btrfs", pid 19621, jiffies 4341648183 (age 70057.844s)
   hex dump (first 32 bytes):
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
 00 fc c6 b1 04 88 ff ff 04 00 04 00 ad 4e ad de  .N..
   backtrace:
 [817275a6] kmemleak_alloc+0x26/0x50
 [8117832b] kmem_cache_alloc_trace+0xeb/0x1d0
 [a04db499] btrfs_alloc_block_rsv+0x39/0x70 [btrfs]
 [a04f8bad] btrfs_orphan_add+0x13d/0x1b0 [btrfs]
 [a04e2b13] btrfs_remove_block_group+0x143/0x500 [btrfs]
 [a0518158] btrfs_relocate_chunk.isra.63+0x618/0x790 [btrfs]
 [a051bc27] btrfs_balance+0x8f7/0xe90 [btrfs]
 [a05240a0] btrfs_ioctl_balance+0x250/0x550 [btrfs]
 [a05269ca] btrfs_ioctl+0xdfa/0x25f0 [btrfs]
 [8119c936] do_vfs_ioctl+0x96/0x570
 [8119cea1] SyS_ioctl+0x91/0xb0
 [81750242] system_call_fastpath+0x16/0x1b
 [] 0x

 This affects btrfs-next, revision be8e3cd00d7293dd177e3f8a4a1645ce09ca3acb
 (Btrfs: separate out tests into their own directory).

 Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
 ---

 V2: removed atomic_t member in struct btrfs_block_rsv, as suggested by
 Josef Bacik, and use instead the condition reserved == 0 to decide
 when to free the block.
 V3: simplified patch, just kfree() (and not btrfs_free_block_rsv) the
 root's orphan_block_rsv when free'ing the root. Thanks Josef for
 the suggestion.
 V4: use btrfs_free_block_rsv() instead of kfree(). The error I was getting
 in xfstests when using btrfs_free_block_rsv() was unrelated, Josef just
 pointed it out to me (separate issue).
 V5: move the free call below the iput() call, so that btrfs_evict_inode()
 can process the orphan_block_rsv first to do some needed cleanup before
 we free it.
 V6: free the root's orphan_block_rsv in close_ctree() too. After a balance
 the orphan_block_rsv of the tree of tree roots was being leaked, because
 free_fs_root() is only called for filesystem 

Re: BTRFS volume crashes after hard reset

2013-11-04 Thread David Taylor
On Mon, 04 Nov 2013, Michael Eitelwein wrote:
 
 On 4 November 2013 09:29:53, Hugo Mills h...@carfax.org.uk wrote:

   You should upgrade your kernel, too. 3.2 has a large number of
known bugs -- including the one you've just met here -- which have
been fixed in later versions. Some of those bugs are not as benign as
this one.


 The crash actually happened on kernel 3.8 (Ubuntu 13.04) but I had to
 fall back to a Ubuntu 12.04 LTS with kernel 3.2 to get it fixed.

 Hope that Ubuntu back-ports btrfs bug-fixes into their LTS kernel - do they?

Running an LTS kernel and an experimental filesystem seem to be somewhat
incompatible desires.  It seems unlikely they would back-port such fixes.

If you want to run btrfs you are far better off compiling your own kernel
from up-to-date source.

-- 
David Taylor


[PATCH] FS: BTRFS: fixed coding style issues

2013-11-04 Thread Aldo Iljazi
Fixed three coding style issues. Replaced spaces with tabs.

Signed-off-by: Aldo Iljazi m...@aldo.io
---
 fs/btrfs/dev-replace.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 9efb94e..b2fe609 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -377,7 +377,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
printk_in_rcu(KERN_INFO
  "btrfs: dev_replace from %s (devid %llu) to %s) started\n",
  src_device->missing ? "missing disk" :
-   rcu_str_deref(src_device->name),
+   rcu_str_deref(src_device->name),
  src_device->devid,
  rcu_str_deref(tgt_device->name));
 
@@ -500,7 +500,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
printk_in_rcu(KERN_ERR
  "btrfs: btrfs_scrub_dev(%s, %llu, %s) failed %d\n",
  src_device->missing ? "missing disk" :
-   rcu_str_deref(src_device->name),
+   rcu_str_deref(src_device->name),
  src_device->devid,
  rcu_str_deref(tgt_device->name), scrub_ret);
btrfs_dev_replace_unlock(dev_replace);
@@ -515,7 +515,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
printk_in_rcu(KERN_INFO
  "btrfs: dev_replace from %s (devid %llu) to %s) finished\n",
  src_device->missing ? "missing disk" :
-   rcu_str_deref(src_device->name),
+   rcu_str_deref(src_device->name),
  src_device->devid,
  rcu_str_deref(tgt_device->name));
tgt_device->is_tgtdev_for_dev_replace = 0;
-- 
1.8.3.2



[PATCH] FS: BTRFS: fixed a styling issue

2013-11-04 Thread Aldo Iljazi
Line 363:
Added a space before the open parenthesis.

Signed-off-by: Aldo Iljazi m...@aldo.io
---
 fs/btrfs/compression.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 6aad98c..91338d2 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -360,7 +360,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 start,
bdev = BTRFS_I(inode)->root->fs_info->fs_devices->latest_bdev;
 
bio = compressed_bio_alloc(bdev, first_byte, GFP_NOFS);
-   if(!bio) {
+   if (!bio) {
kfree(cb);
return -ENOMEM;
}
-- 
1.8.3.2



[PATCH 3/7] Btrfs: pick up the code for the item number calculation in flush_space()

2013-11-04 Thread Miao Xie
This patch factors out the code that calculates the number of items for
which we need to reserve space into a helper, which the next patch will
use.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ed6eceb..32dcf80 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4004,6 +4004,18 @@ static void btrfs_writeback_inodes_sb_nr(struct btrfs_root *root,
}
 }
 
+static inline int calc_reclaim_items_nr(struct btrfs_root *root, u64 to_reclaim)
+{
+   u64 bytes;
+   int nr;
+
+   bytes = btrfs_calc_trans_metadata_size(root, 1);
+   nr = (int)div64_u64(to_reclaim, bytes);
+   if (!nr)
+   nr = 1;
+   return nr;
+}
+
 /*
  * shrink metadata reservation for delalloc
  */
@@ -4149,16 +4161,11 @@ static int flush_space(struct btrfs_root *root,
switch (state) {
case FLUSH_DELAYED_ITEMS_NR:
case FLUSH_DELAYED_ITEMS:
-   if (state == FLUSH_DELAYED_ITEMS_NR) {
-   u64 bytes = btrfs_calc_trans_metadata_size(root, 1);
-
-   nr = (int)div64_u64(num_bytes, bytes);
-   if (!nr)
-   nr = 1;
-   nr *= 2;
-   } else {
+   if (state == FLUSH_DELAYED_ITEMS_NR)
+   nr = calc_reclaim_items_nr(root, num_bytes) * 2;
+   else
nr = -1;
-   }
+
trans = btrfs_join_transaction(root);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
-- 
1.8.1.4
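
As a rough illustration of the arithmetic the new helper performs, here is a
userspace sketch. In the patch the per-item size comes from
btrfs_calc_trans_metadata_size(root, 1); in this sketch it is passed in as a
plain parameter (bytes_per_item), which is an assumption made to keep the
example standalone. flush_items_nr() is likewise a made-up name that mirrors
the flush_space() caller:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of items needed to cover to_reclaim bytes, never less than one.
 * bytes_per_item stands in for btrfs_calc_trans_metadata_size(root, 1).
 */
static int calc_reclaim_items_nr(uint64_t bytes_per_item, uint64_t to_reclaim)
{
	int nr;

	assert(bytes_per_item > 0);
	nr = (int)(to_reclaim / bytes_per_item);
	if (!nr)
		nr = 1;
	return nr;
}

/*
 * Mirrors the caller in flush_space(): twice the item count when flushing
 * by number, and -1 when no item count is computed.
 */
static int flush_items_nr(uint64_t bytes_per_item, uint64_t num_bytes,
			  int by_number)
{
	if (by_number)
		return calc_reclaim_items_nr(bytes_per_item, num_bytes) * 2;
	return -1;
}
```

The division rounds down, so the clamp to 1 matters only when to_reclaim is
smaller than a single item's reservation.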



[PATCH 0/7] Btrfs: improve the latency of the space reservation

2013-11-04 Thread Miao Xie
This patchset improves the latency of the space reservation when doing the
delalloc inode flush. The current code waits for the completion of all the
ordered extents in the filesystem. That is unnecessary: we can stop waiting
and retry the reservation earlier, as soon as we have enough free space.
This reduces the wait time and improves performance.

We ran the buffered write test with sysbench on my box; the results
follow.
 Memory:    2GB
 CPU:       2Cores * 1CPU
 Partition: 20GB(SSD)
                            w/          w/o
 rndwr-512KB-1Thread-2GB    120.38MB/s  110.08MB/s
 rndwr-512KB-8Threads-2GB   119.43MB/s  110.96MB/s
 seqwr-512KB-1Thread-2GB    99.676MB/s  98.53MB/s
 seqwr-512KB-8Threads-2GB   111.79MB/s  99.176MB/s

 rndwr-128KB-1Thread-2GB    139.23MB/s  95.864MB/s
 rndwr-128KB-8Threads-2GB   126.16MB/s  96.686MB/s
 seqwr-128KB-1Thread-2GB    100.24MB/s  100.95MB/s
 seqwr-128KB-8Threads-2GB   126.51MB/s  100.26MB/s

rndwr: random write test
seqwr: sequential write test
512KB,128KB: the size of each request
1Thread, 8Threads: the number of the test threads
2GB: The total file size
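
To put the two columns in relative terms, the gain of "w/" over "w/o" can be
computed per row. The sketch below just restates the posted figures (taking
the seqwr-128KB-1Thread "w/o" value as 100.95MB/s); the struct and function
names are made up for this example:

```c
#include <assert.h>

/* One row of the posted results, in MB/s. */
struct result {
	const char *name;
	double with_patch;	/* "w/" column */
	double without_patch;	/* "w/o" column */
};

static const struct result results[] = {
	{ "rndwr-512KB-1Thread-2GB",  120.38,  110.08  },
	{ "rndwr-512KB-8Threads-2GB", 119.43,  110.96  },
	{ "seqwr-512KB-1Thread-2GB",   99.676,  98.53  },
	{ "seqwr-512KB-8Threads-2GB", 111.79,   99.176 },
	{ "rndwr-128KB-1Thread-2GB",  139.23,   95.864 },
	{ "rndwr-128KB-8Threads-2GB", 126.16,   96.686 },
	{ "seqwr-128KB-1Thread-2GB",  100.24,  100.95  },
	{ "seqwr-128KB-8Threads-2GB", 126.51,  100.26  },
};

/* Relative throughput change of the patched kernel, in percent. */
static double gain_percent(const struct result *r)
{
	assert(r->without_patch > 0.0);
	return (r->with_patch - r->without_patch) / r->without_patch * 100.0;
}
```

By this measure the largest gain is on rndwr-128KB-1Thread (about 45%),
while seqwr-128KB-1Thread is the one row that comes out slightly slower
(by less than 1%).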

Miao Xie (7):
  Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc()
  Btrfs: wait for the ordered extent only when we want
  Btrfs: pick up the code for the item number calculation in flush_space()
  Btrfs: fix the confusion between delalloc bytes and metadata bytes
  Btrfs: don't wait for all the async delalloc when shrinking delalloc
  Btrfs: don't wait for the completion of all the ordered extents
  Btrfs: rename btrfs_start_all_delalloc_inodes

 fs/btrfs/ctree.h        |  3 +--
 fs/btrfs/dev-replace.c  |  6 ++---
 fs/btrfs/extent-tree.c  | 62 ++---
 fs/btrfs/inode.c        |  3 +--
 fs/btrfs/ioctl.c        |  2 +-
 fs/btrfs/ordered-data.c | 22 ++
 fs/btrfs/ordered-data.h |  4 ++--
 fs/btrfs/relocation.c   |  4 ++--
 fs/btrfs/super.c        |  2 +-
 fs/btrfs/transaction.c  |  4 ++--
 10 files changed, 73 insertions(+), 39 deletions(-)

-- 
1.8.1.4



[PATCH 6/7] Btrfs: don't wait for the completion of all the ordered extents

2013-11-04 Thread Miao Xie
It is very likely that there are lots of ordered extents in the filesystem.
If we wait for the completion of all of them when we want to reclaim some
space for the metadata space reservation, we will be blocked for a long
time, and performance will drop sharply while we wait.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/dev-replace.c  |  4 ++--
 fs/btrfs/extent-tree.c  | 11 ++-
 fs/btrfs/ioctl.c        |  2 +-
 fs/btrfs/ordered-data.c | 22 +-
 fs/btrfs/ordered-data.h |  4 ++--
 fs/btrfs/relocation.c   |  2 +-
 fs/btrfs/super.c        |  2 +-
 fs/btrfs/transaction.c  |  2 +-
 8 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 2a9bd5b..bc55b36 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -400,7 +400,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root,
args->result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_ERROR;
btrfs_dev_replace_unlock(dev_replace);
 
-   btrfs_wait_all_ordered_extents(root->fs_info);
+   btrfs_wait_ordered_roots(root->fs_info, -1);
 
/* force writing the updated state information to disk */
trans = btrfs_start_transaction(root, 0);
@@ -475,7 +475,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
return ret;
}
-   btrfs_wait_all_ordered_extents(root->fs_info);
+   btrfs_wait_ordered_roots(root->fs_info, -1);
 
trans = btrfs_start_transaction(root, 0);
if (IS_ERR(trans)) {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d691e15..410d972 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4000,7 +4000,7 @@ static void btrfs_writeback_inodes_sb_nr(struct btrfs_root *root,
 */
btrfs_start_all_delalloc_inodes(root->fs_info, 0);
if (!current->journal_info)
-   btrfs_wait_all_ordered_extents(root->fs_info);
+   btrfs_wait_ordered_roots(root->fs_info, -1);
}
 }
 
@@ -4032,11 +4032,12 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
long time_left;
unsigned long nr_pages;
int loops;
+   int items;
enum btrfs_reserve_flush_enum flush;
 
/* Calc the number of the pages we need flush for space reservation */
-   to_reclaim = calc_reclaim_items_nr(root, to_reclaim);
-   to_reclaim *= EXTENT_SIZE_PER_ITEM;
+   items = calc_reclaim_items_nr(root, to_reclaim);
+   to_reclaim = items * EXTENT_SIZE_PER_ITEM;
 
trans = (struct btrfs_trans_handle *)current->journal_info;
block_rsv = &root->fs_info->delalloc_block_rsv;
@@ -4048,7 +4049,7 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
if (trans)
return;
if (wait_ordered)
-   btrfs_wait_all_ordered_extents(root->fs_info);
+   btrfs_wait_ordered_roots(root->fs_info, items);
return;
}
 
@@ -4087,7 +4088,7 @@ skip_async:
 
loops++;
if (wait_ordered && !trans) {
-   btrfs_wait_all_ordered_extents(root->fs_info);
+   btrfs_wait_ordered_roots(root->fs_info, items);
} else {
time_left = schedule_timeout_killable(1);
if (time_left)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 9d46f60..524692d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -574,7 +574,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
if (ret)
return ret;
 
-   btrfs_wait_ordered_extents(root);
+   btrfs_wait_ordered_extents(root, -1);
 
pending_snapshot = kzalloc(sizeof(*pending_snapshot), GFP_NOFS);
if (!pending_snapshot)
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index c702cb6..97efc12 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -563,10 +563,11 @@ static void btrfs_run_ordered_extent_work(struct btrfs_work *work)
  * wait for all the ordered extents in a root.  This is done when balancing
  * space between drives.
  */
-void btrfs_wait_ordered_extents(struct btrfs_root *root)
+int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
 {
struct list_head splice, works;
struct btrfs_ordered_extent *ordered, *next;
+   int count = 0;
 
INIT_LIST_HEAD(&splice);
INIT_LIST_HEAD(&works);
@@ -574,7 +575,7 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root)
mutex_lock(&root->fs_info->ordered_operations_mutex);
spin_lock(&root->ordered_extent_lock);
list_splice_init(&root->ordered_extents, &splice);
-   while (!list_empty(&splice)) {
+   while (!list_empty(&splice) && nr) {
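
The archived copy of this patch is cut off at this point, so the rest of the
loop is not visible. From the callers, -1 evidently means "wait for
everything" and a positive nr bounds the number of extents waited on. A
userspace sketch of that convention follows; the decrement rule and the name
wait_some() are assumptions made for illustration, not taken from the patch
text:

```c
#include <assert.h>

/*
 * Process up to `nr` of `pending` queued items; nr == -1 means no limit.
 * Returns how many items were processed.
 */
static int wait_some(int pending, int nr)
{
	int count = 0;

	assert(pending >= 0);

	while (pending > 0 && nr) {
		pending--;	/* stand-in for waiting on one ordered extent */
		if (nr != -1)
			nr--;	/* only a bounded request counts down */
		count++;
	}
	return count;
}
```

Because -1 is non-zero and is never decremented, the same loop condition
covers both the bounded and the unbounded case.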
 

[PATCH 2/7] Btrfs: wait for the ordered extent only when we want

2013-11-04 Thread Miao Xie
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index abe65ed..ed6eceb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4029,7 +4029,8 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig,
if (delalloc_bytes == 0) {
if (trans)
return;
-   btrfs_wait_all_ordered_extents(root->fs_info);
+   if (wait_ordered)
+   btrfs_wait_all_ordered_extents(root->fs_info);
return;
}
 
-- 
1.8.1.4



[PATCH 7/7] Btrfs: rename btrfs_start_all_delalloc_inodes

2013-11-04 Thread Miao Xie
Rename btrfs_start_all_delalloc_inodes() to btrfs_start_delalloc_roots()
so that its name is consistent with btrfs_wait_ordered_roots(), since the
two are always used in the same place.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/ctree.h   | 3 +--
 fs/btrfs/dev-replace.c | 2 +-
 fs/btrfs/extent-tree.c | 2 +-
 fs/btrfs/inode.c   | 3 +--
 fs/btrfs/relocation.c  | 2 +-
 fs/btrfs/transaction.c | 2 +-
 6 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0506f40..1fbfc21 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3675,8 +3675,7 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle 
*trans,
   u32 min_type);
 
 int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput);
-int btrfs_start_all_delalloc_inodes(struct btrfs_fs_info *fs_info,
-   int delay_iput);
+int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput);
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
  struct extent_state **cached_state);
 int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index bc55b36..7436e08 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -470,7 +470,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info 
*fs_info,
 * flush all outstanding I/O and inode extent mappings before the
 * copy operation is declared as being finished
 */
-   ret = btrfs_start_all_delalloc_inodes(root->fs_info, 0);
+   ret = btrfs_start_delalloc_roots(root->fs_info, 0);
if (ret) {
mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
return ret;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 410d972..0aad9ed 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3998,7 +3998,7 @@ static void btrfs_writeback_inodes_sb_nr(struct 
btrfs_root *root,
 * the filesystem is readonly(all dirty pages are written to
 * the disk).
 */
-   btrfs_start_all_delalloc_inodes(root->fs_info, 0);
+   btrfs_start_delalloc_roots(root->fs_info, 0);
if (!current->journal_info)
btrfs_wait_ordered_roots(root->fs_info, -1);
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f4a6851..da618c4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8276,8 +8276,7 @@ int btrfs_start_delalloc_inodes(struct btrfs_root *root, 
int delay_iput)
return ret;
 }
 
-int btrfs_start_all_delalloc_inodes(struct btrfs_fs_info *fs_info,
-   int delay_iput)
+int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput)
 {
struct btrfs_root *root;
struct list_head splice;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0cec204..af41487 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4241,7 +4241,7 @@ int btrfs_relocate_block_group(struct btrfs_root 
*extent_root, u64 group_start)
printk(KERN_INFO "btrfs: relocating block group %llu flags %llu\n",
   rc->block_group->key.objectid, rc->block_group->flags);
 
-   ret = btrfs_start_all_delalloc_inodes(fs_info, 0);
+   ret = btrfs_start_delalloc_roots(fs_info, 0);
if (ret < 0) {
err = ret;
goto out;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 0fada5b..840e672 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1596,7 +1596,7 @@ static int btrfs_flush_all_pending_stuffs(struct 
btrfs_trans_handle *trans,
 static inline int btrfs_start_delalloc_flush(struct btrfs_fs_info *fs_info)
 {
if (btrfs_test_opt(fs_info->tree_root, FLUSHONCOMMIT))
-   return btrfs_start_all_delalloc_inodes(fs_info, 1);
+   return btrfs_start_delalloc_roots(fs_info, 1);
return 0;
 }
 
-- 
1.8.1.4



Re: [PATCH 0/7] Btrfs: improve the latency of the space reservation

2013-11-04 Thread Josef Bacik
On Mon, Nov 04, 2013 at 11:13:19PM +0800, Miao Xie wrote:
 This patchset improves the latency of the space reservation when doing the
 delalloc inode flush. The current code waits for the completion of all the
 ordered extents in the filesystem, which is unnecessary: we can stop
 waiting and retry the reservation as soon as enough free space is
 available. This reduces the wait time and improves performance.
 
 We ran the sysbench buffered write test on my box; the results are
 below.
  Memory:     2GB
  CPU:        2Cores * 1CPU
  Partition:  20GB(SSD)

                             w/ patchset   w/o patchset
  rndwr-512KB-1Thread-2GB    120.38MB/s    110.08MB/s
  rndwr-512KB-8Threads-2GB   119.43MB/s    110.96MB/s
  seqwr-512KB-1Thread-2GB    99.676MB/s    98.53MB/s
  seqwr-512KB-8Threads-2GB   111.79MB/s    99.176MB/s
 
  rndwr-128KB-1Thread-2GB    139.23MB/s    95.864MB/s
  rndwr-128KB-8Threads-2GB   126.16MB/s    96.686MB/s
  seqwr-128KB-1Thread-2GB    100.24MB/s    100.95MB/s
  seqwr-128KB-8Threads-2GB   126.51MB/s    100.26MB/s
 
 rndwr: random write test
 seqwr: sequential write test
 512KB,128KB: the size of each request
 1Thread, 8Threads: the number of the test threads
 2GB: The total file size
 
 Miao Xie (7):
   Btrfs: remove unnecessary initialization and memory barrior in 
 shrink_delalloc()
   Btrfs: wait for the ordered extent only when we want
   Btrfs: pick up the code for the item number calculation in flush_space()
   Btrfs: fix the confusion between delalloc bytes and metadata bytes
   Btrfs: don't wait for all the async delalloc when shrinking delalloc
   Btrfs: don't wait for the completion of all the ordered extents
   Btrfs: rename btrfs_start_all_delalloc_inodes
 

Thank you for this, I've been wanting to do this for months but I keep getting
sucked into bug hell, remind me to kiss you at LSF.

Josef

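The bounded wait introduced by this series (the new `nr` argument, with
-1 passed by callers that still want to wait for everything) can be
sketched in user-space C as below. This is only a minimal illustration
under stated assumptions, not the kernel code: struct pending_item and
wait_pending_items() are hypothetical names, and setting a flag stands in
for the real wait on an ordered extent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ordered extent; not the kernel structure. */
struct pending_item {
	int done;	/* set to 1 once the item has been "waited on" */
};

/*
 * Process at most nr items from the array; nr == -1 means "no limit".
 * Returns the number of items actually processed, mirroring the
 * "count" variable added by the patch.
 */
int wait_pending_items(struct pending_item *items, size_t n, int nr)
{
	int count = 0;
	size_t i;

	assert(nr >= -1);
	for (i = 0; i < n && nr; i++) {
		items[i].done = 1;	/* stands in for the real wait */
		count++;
		if (nr != -1)
			nr--;		/* only a finite budget is consumed */
	}
	return count;
}
```

A caller that only needs to free a limited amount of space would pass a
small positive nr and retry its reservation afterwards, while callers such
as the snapshot path keep the old behaviour by passing -1.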

Re: [PATCH 7/7] Btrfs: rename btrfs_start_all_delalloc_inodes

2013-11-04 Thread Josef Bacik
On Mon, Nov 04, 2013 at 11:13:26PM +0800, Miao Xie wrote:
 Rename btrfs_start_all_delalloc_inodes() to btrfs_start_delalloc_roots()
 so that its name is consistent with btrfs_wait_ordered_roots(), since the
 two are always used in the same place.
 
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/ctree.h   | 3 +--
  fs/btrfs/dev-replace.c | 2 +-
  fs/btrfs/extent-tree.c | 2 +-
  fs/btrfs/inode.c   | 3 +--
  fs/btrfs/relocation.c  | 2 +-
  fs/btrfs/transaction.c | 2 +-
  6 files changed, 6 insertions(+), 8 deletions(-)
 

There's another use in ioctl.c in my tree. I've just rebased that change in;
please check my tree when I push it to make sure I did it right.  Thanks,

Josef


Re: [PATCH] Btrfs: fix negative qgroup tracking from owner accounting (bug #61951)

2013-11-04 Thread Josef Bacik
On Thu, Oct 24, 2013 at 03:22:06PM +0200, Jan Schmidt wrote:
 btrfs_dec_ref() queued a delayed ref for owner of a tree block. The qgroup
 tracking is based on delayed refs. The owner of a tree block is set when a
 tree block is allocated, it is never updated.
 
 When you allocate a tree block and then remove the subvolume that did the
 allocation, the qgroup accounting for that removal is correct. However, the
 removal was accounted again for each subvolume deletion that also referenced
 the tree block, because accounting was erroneously based on the owner.
 
 Instead of queueing delayed refs for the non-existent owner, we now
 queue delayed refs for the root being removed. This fixes the qgroup
 accounting.
 
 Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
 Tested-by: dustym...@gmail.com

This breaks btrfs/003, I'm kicking it out.

Josef


Re: btrfsck errors is it save to fix?

2013-11-04 Thread Hendrik Friedel

Hello,

the list was quite full of patches, so this may have been overlooked.
Here is the complete stack trace.
Does this help? Is this what you needed?

Greetings,
Hendrik



sorry about that:
[  126.444603] init: plymouth-stop pre-start process (3446) terminated
with status 1
[11189.299864] hda-intel: IRQ timing workaround is activated for card
#0. Suggest a bigger bdl_pos_adj.
[94999.489736] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2
transid 140408 /dev/sdc1
[94999.489755] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1
transid 140408 /dev/sdb1
[95394.400840] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1
transid 140420 /dev/sdb1
[95394.400872] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2
transid 140420 /dev/sdc1
[95585.149738] init: smbd main process (1168) killed by TERM signal
[95725.171156] nfsd: last server has exited, flushing export cache
[95764.899173] [ cut here ]
[95764.899216] WARNING: CPU: 1 PID: 21798 at
/home/apw/COD/linux/fs/btrfs/disk-io.c:3423 free_fs_root+0x99/0xa0 [btrfs]()
[95764.899219] Modules linked in: nvram pci_stub vboxpci(OF)
vboxnetadp(OF) vboxnetflt(OF) vboxdrv(OF) ip6table_filter ip6_tables
ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables
bridge stp llc kvm_intel kvm nfsd nfs_acl auth_rpcgss nfs fscache
binfmt_misc lockd sunrpc ftdi_sio usbserial stv6110x lnbp21
snd_hda_codec_realtek snd_hda_intel stv090x snd_hda_codec snd_hwdep
snd_pcm ddbridge dvb_core snd_timer snd soundcore snd_page_alloc
cxd2099(C) mei_me psmouse i915 drm_kms_helper mei drm lpc_ich
i2c_algo_bit serio_raw video mac_hid coretemp lp parport hid_generic
usbhid hid btrfs raid6_pq e1000e ptp pps_core ahci libahci xor
zlib_deflate libcrc32c
[95764.899294] CPU: 1 PID: 21798 Comm: umount Tainted: GFCIO
3.11.0-031100rc2-generic #201307211535
[95764.899297] Hardware name:  /DH87RL, BIOS
RLH8710H.86A.0320.2013.0606.1802 06/06/2013
[95764.899300]  0d5f 880118b59cb8 8171e74d
0007
[95764.899306]   880118b59cf8 8106532c
880118b59d08
[95764.899311]  8801184cb800 8801184cb800 880118118000
880118b59d78
[95764.899315] Call Trace:
[95764.899324]  [8171e74d] dump_stack+0x46/0x58
[95764.899331]  [8106532c] warn_slowpath_common+0x8c/0xc0
[95764.899336]  [8106537a] warn_slowpath_null+0x1a/0x20
[95764.899359]  [a00d9a59] free_fs_root+0x99/0xa0 [btrfs]
[95764.899384]  [a00dd653]
btrfs_drop_and_free_fs_root+0x93/0xc0 [btrfs]
[95764.899408]  [a00dd74f] del_fs_roots+0xcf/0x130 [btrfs]
[95764.899433]  [a00ddac6] close_ctree+0x146/0x270 [btrfs]
[95764.899441]  [811cd24e] ? evict_inodes+0xce/0x130
[95764.899461]  [a00b4eb9] btrfs_put_super+0x19/0x20 [btrfs]
[95764.899467]  [811b47e2] generic_shutdown_super+0x62/0xf0
[95764.899475]  [811b4906] kill_anon_super+0x16/0x30
[95764.899493]  [a00b754a] btrfs_kill_super+0x1a/0x90 [btrfs]
[95764.899500]  [811b512d] deactivate_locked_super+0x4d/0x80
[95764.899505]  [811b57ae] deactivate_super+0x4e/0x70
[95764.899510]  [811d1266] mntput_no_expire+0x106/0x160
[95764.899515]  [811d2b79] SyS_umount+0xa9/0xf0
[95764.899520]  [817333ef] tracesys+0xe1/0xe6
[95764.899524] ---[ end trace 0024dfebf572e76c ]---
[95764.985245] VFS: Busy inodes after unmount of sdb1. Self-destruct in
5 seconds.  Have a nice day...
[95790.079663] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1
transid 140425 /dev/sdb1
[95790.101778] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2
transid 140425 /dev/sdc1
[95790.162960] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1
transid 140425 /dev/sdb1
[95790.163825] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2
transid 140425 /dev/sdc1
[95924.393344] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1
transid 140425 /dev/sdb1
[95924.421118] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2
transid 140425 /dev/sdc1
[95924.676571] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 1
transid 140425 /dev/sdb1
[95924.677046] device fsid 989306aa-d291-4752-8477-0baf94f8c42f devid 2
transid 140425 /dev/sdc1


Greetings,
Hendrik


Am 02.11.2013 09:12, schrieb cwillu:

Now that I am searching, I see this in dmesg:
[95764.899359]  [a00d9a59] free_fs_root+0x99/0xa0 [btrfs]
[95764.899384]  [a00dd653]
btrfs_drop_and_free_fs_root+0x93/0xc0
[btrfs]
[95764.899408]  [a00dd74f] del_fs_roots+0xcf/0x130 [btrfs]
[95764.899433]  [a00ddac6] close_ctree+0x146/0x270 [btrfs]
[95764.899461]  [a00b4eb9] btrfs_put_super+0x19/0x20 [btrfs]
[95764.899493]  [a00b754a] btrfs_kill_super+0x1a/0x90 [btrfs]


Need to see the rest of the trace this came from.








[PATCH 6/9] btrfs: remove unused variable from setup_cluster_no_bitmap

2013-11-04 Thread Valentina Giusti
The variable window_start in setup_cluster_no_bitmap is not used since commit
1bb91902dc90e25449893e693ad45605cb08fbe5
(Btrfs: revamp clustered allocation logic)

Signed-off-by: Valentina Giusti valentina.giu...@microon.de
---
 fs/btrfs/free-space-cache.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 4f419ba..05395cd 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -2419,7 +2419,6 @@ setup_cluster_no_bitmap(struct btrfs_block_group_cache 
*block_group,
struct btrfs_free_space *entry = NULL;
struct btrfs_free_space *last;
struct rb_node *node;
-   u64 window_start;
u64 window_free;
u64 max_extent;
u64 total_size = 0;
@@ -2441,7 +2440,6 @@ setup_cluster_no_bitmap(struct btrfs_block_group_cache 
*block_group,
entry = rb_entry(node, struct btrfs_free_space, offset_index);
}
 
-   window_start = entry->offset;
window_free = entry->bytes;
max_extent = entry->bytes;
first = entry;
-- 
1.7.10.4



[PATCH 7/9] btrfs: remove unused variable from scrub_fixup_nodatasum

2013-11-04 Thread Valentina Giusti
Signed-off-by: Valentina Giusti valentina.giu...@microon.de
---
 fs/btrfs/scrub.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a18e0e2..84139c6 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -705,13 +705,11 @@ static void scrub_fixup_nodatasum(struct btrfs_work *work)
struct scrub_fixup_nodatasum *fixup;
struct scrub_ctx *sctx;
struct btrfs_trans_handle *trans = NULL;
-   struct btrfs_fs_info *fs_info;
struct btrfs_path *path;
int uncorrectable = 0;
 
fixup = container_of(work, struct scrub_fixup_nodatasum, work);
sctx = fixup->sctx;
-   fs_info = fixup->root->fs_info;
 
path = btrfs_alloc_path();
if (!path) {
-- 
1.7.10.4



[PATCH 1/9] btrfs: remove unused variable from btrfs_search_forward

2013-11-04 Thread Valentina Giusti
The variable blockptr in btrfs_search_forward is not used since commit
de78b51a2852bddccd6535e9e12de65f92787a1e
(btrfs: remove cache only arguments from defrag path)

Signed-off-by: Valentina Giusti valentina.giu...@microon.de
---
 fs/btrfs/ctree.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 61b5bcd..ea20801 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -4911,10 +4911,8 @@ again:
 * If it is too old, old, skip to the next one.
 */
while (slot < nritems) {
-   u64 blockptr;
u64 gen;
 
-   blockptr = btrfs_node_blockptr(cur, slot);
gen = btrfs_node_ptr_generation(cur, slot);
if (gen < min_trans) {
slot++;
-- 
1.7.10.4



[PATCH 9/9] btrfs: fix unused variables in qgroup.c

2013-11-04 Thread Valentina Giusti
Use otherwise unused local variables slot in update_qgroup_limit_item and
in update_qgroup_info_item, and remove unused variable ins from
btrfs_qgroup_account_ref.

Signed-off-by: Valentina Giusti valentina.giu...@microon.de
---
 fs/btrfs/qgroup.c |   11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 4e6ef49..bd0b058 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -644,8 +644,7 @@ static int update_qgroup_limit_item(struct 
btrfs_trans_handle *trans,
 
l = path->nodes[0];
slot = path->slots[0];
-   qgroup_limit = btrfs_item_ptr(l, path->slots[0],
- struct btrfs_qgroup_limit_item);
+   qgroup_limit = btrfs_item_ptr(l, slot, struct btrfs_qgroup_limit_item);
btrfs_set_qgroup_limit_flags(l, qgroup_limit, flags);
btrfs_set_qgroup_limit_max_rfer(l, qgroup_limit, max_rfer);
btrfs_set_qgroup_limit_max_excl(l, qgroup_limit, max_excl);
@@ -687,8 +686,7 @@ static int update_qgroup_info_item(struct 
btrfs_trans_handle *trans,
 
l = path->nodes[0];
slot = path->slots[0];
-   qgroup_info = btrfs_item_ptr(l, path->slots[0],
-struct btrfs_qgroup_info_item);
+   qgroup_info = btrfs_item_ptr(l, slot, struct btrfs_qgroup_info_item);
btrfs_set_qgroup_info_generation(l, qgroup_info, trans->transid);
btrfs_set_qgroup_info_rfer(l, qgroup_info, qgroup->rfer);
btrfs_set_qgroup_info_rfer_cmpr(l, qgroup_info, qgroup->rfer_cmpr);
@@ -1349,7 +1347,6 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle 
*trans,
 struct btrfs_delayed_ref_node *node,
 struct btrfs_delayed_extent_op *extent_op)
 {
-   struct btrfs_key ins;
struct btrfs_root *quota_root;
u64 ref_root;
struct btrfs_qgroup *qgroup;
@@ -1363,10 +1360,6 @@ int btrfs_qgroup_account_ref(struct btrfs_trans_handle 
*trans,
 
BUG_ON(!fs_info->quota_root);
 
-   ins.objectid = node->bytenr;
-   ins.offset = node->num_bytes;
-   ins.type = BTRFS_EXTENT_ITEM_KEY;
-
if (node->type == BTRFS_TREE_BLOCK_REF_KEY ||
node->type == BTRFS_SHARED_BLOCK_REF_KEY) {
struct btrfs_delayed_tree_ref *ref;
-- 
1.7.10.4



[PATCH 8/9] btrfs: replace path->slots[0] with otherwise unused variable 'slot'

2013-11-04 Thread Valentina Giusti
Signed-off-by: Valentina Giusti valentina.giu...@microon.de
---
 fs/btrfs/backref.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 0552a59..5418435 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1681,8 +1681,8 @@ static int iterate_inode_extrefs(u64 inum, struct 
btrfs_root *fs_root,
btrfs_release_path(path);
 
leaf = path->nodes[0];
-   item_size = btrfs_item_size_nr(leaf, path->slots[0]);
-   ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
+   item_size = btrfs_item_size_nr(leaf, slot);
+   ptr = btrfs_item_ptr_offset(leaf, slot);
cur_offset = 0;
 
while (cur_offset < item_size) {
-- 
1.7.10.4



[PATCH 2/9] btrfs: remove unused variable from btrfs_new_inode

2013-11-04 Thread Valentina Giusti
Variable owner in btrfs_new_inode is unused since commit
d82a6f1d7e8b61ed5996334d0db66651bb43641d
(Btrfs: kill BTRFS_I(inode)->block_group)

Signed-off-by: Valentina Giusti valentina.giu...@microon.de
---
 fs/btrfs/inode.c |6 --
 1 file changed, 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3b4ffaf..7e09eff 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5347,7 +5347,6 @@ static struct inode *btrfs_new_inode(struct 
btrfs_trans_handle *trans,
u32 sizes[2];
unsigned long ptr;
int ret;
-   int owner;
 
path = btrfs_alloc_path();
if (!path)
@@ -5393,11 +5392,6 @@ static struct inode *btrfs_new_inode(struct 
btrfs_trans_handle *trans,
 */
set_bit(BTRFS_INODE_NEEDS_FULL_SYNC, &BTRFS_I(inode)->runtime_flags);
 
-   if (S_ISDIR(mode))
-   owner = 0;
-   else
-   owner = 1;
-
key[0].objectid = objectid;
btrfs_set_key_type(&key[0], BTRFS_INODE_ITEM_KEY);
key[0].offset = 0;
-- 
1.7.10.4



[PATCH 3/9] btrfs: remove unused variables from disk-io.c

2013-11-04 Thread Valentina Giusti
Remove unused variables:
* tree from csum_dirty_buffer,
* tree from btree_readpage_end_io_hook,
* tree from btree_writepages,
* bytenr from btrfs_create_tree,
* fs_info from end_workqueue_fn.

Signed-off-by: Valentina Giusti valentina.giu...@microon.de
---
 fs/btrfs/disk-io.c |   11 ---
 1 file changed, 11 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62176ad..9c2bb64 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -466,13 +466,10 @@ static int btree_read_extent_buffer_pages(struct 
btrfs_root *root,
 
 static int csum_dirty_buffer(struct btrfs_root *root, struct page *page)
 {
-   struct extent_io_tree *tree;
u64 start = page_offset(page);
u64 found_start;
struct extent_buffer *eb;
 
-   tree = &BTRFS_I(page->mapping->host)->io_tree;
-
eb = (struct extent_buffer *)page->private;
if (page != eb->pages[0])
return 0;
@@ -577,7 +574,6 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio 
*io_bio,
  u64 phy_offset, struct page *page,
  u64 start, u64 end, int mirror)
 {
-   struct extent_io_tree *tree;
u64 found_start;
int found_level;
struct extent_buffer *eb;
@@ -588,7 +584,6 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio 
*io_bio,
if (!page->private)
goto out;
 
-   tree = &BTRFS_I(page->mapping->host)->io_tree;
eb = (struct extent_buffer *)page->private;
 
/* the pending IO might have been the only thing that kept this buffer
@@ -975,11 +970,9 @@ static int btree_migratepage(struct address_space *mapping,
 static int btree_writepages(struct address_space *mapping,
struct writeback_control *wbc)
 {
-   struct extent_io_tree *tree;
struct btrfs_fs_info *fs_info;
int ret;
 
-   tree = &BTRFS_I(mapping->host)->io_tree;
if (wbc->sync_mode == WB_SYNC_NONE) {
 
if (wbc->for_kupdate)
@@ -1262,7 +1255,6 @@ struct btrfs_root *btrfs_create_tree(struct 
btrfs_trans_handle *trans,
struct btrfs_root *root;
struct btrfs_key key;
int ret = 0;
-   u64 bytenr;
uuid_le uuid;
 
root = btrfs_alloc_root(fs_info);
@@ -1284,7 +1276,6 @@ struct btrfs_root *btrfs_create_tree(struct 
btrfs_trans_handle *trans,
goto fail;
}
 
-   bytenr = leaf->start;
memset_extent_buffer(leaf, 0, 0, sizeof(struct btrfs_header));
btrfs_set_header_bytenr(leaf, leaf->start);
btrfs_set_header_generation(leaf, trans->transid);
@@ -1673,12 +1664,10 @@ static void end_workqueue_fn(struct btrfs_work *work)
 {
struct bio *bio;
struct end_io_wq *end_io_wq;
-   struct btrfs_fs_info *fs_info;
int error;
 
end_io_wq = container_of(work, struct end_io_wq, work);
bio = end_io_wq->bio;
-   fs_info = end_io_wq->info;
 
error = end_io_wq->error;
bio->bi_private = end_io_wq->private;
-- 
1.7.10.4



OOPS on 3.11.6

2013-11-04 Thread Andy Lutomirski
(This is Fedora's kernel 3.11.6-200.fc19.x86_64)

I have a file on my btrfs filesystem.  Reading it results in:

[  170.261789] general protection fault:  [#1] SMP
[  170.261950] Modules linked in: rfcomm fuse xt_CHECKSUM
nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE
ip6t_REJECT tun bnep bluetooth rfkill xt_conntrack ebtable_nat
ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_security iptable_raw f71882fg vfat fat
snd_hda_codec_hdmi snd_hda_codec_realtek btrfs zlib_deflate raid6_pq
libcrc32c xor x86_pkg_temp_thermal coretemp kvm_intel iTCO_wdt joydev
kvm iTCO_vendor_support snd_hda_intel mxm_wmi snd_hda_codec snd_hwdep
snd_seq snd_seq_device snd_pcm microcode sb_edac i2c_i801 serio_raw
edac_core e1000e snd_page_alloc
[  170.264416]  ntb snd_timer mei_me snd ptp mei lpc_ich soundcore
shpchp pps_core wmi mfd_core mperf uinput binfmt_misc dm_crypt radeon
hid_logitech_dj crc32_pclmul i2c_algo_bit drm_kms_helper crc32c_intel
ttm ghash_clmulni_intel drm firewire_ohci firewire_core i2c_core
crc_itu_t
[  170.265260] CPU: 0 PID: 2947 Comm: thg Tainted: GW
3.11.6-200.fc19.x86_64 #1
[  170.265503] Hardware name: MSI MS-7760/X79A-GD65 (8D) (MS-7760),
BIOS V1.8 10/18/2012
[  170.265745] task: 88042ee48f40 ti: 8803fdb3 task.ti:
8803fdb3
[  170.265975] RIP: 0010:[81307d7d]  [81307d7d]
memcpy+0xd/0x110
[  170.266209] RSP: 0018:8803fdb31960  EFLAGS: 00010202
[  170.266373] RAX: 88040369ede9 RBX: 006c RCX: 000d
[  170.266592] RDX: 0004 RSI: 00050800 RDI: 88040369ede9
[  170.266812] RBP: 8803fdb31998 R08: 1000 R09: 88040369e000
[  170.267031] R10:  R11:  R12: 8803eac91390
[  170.267251] R13: 1600 R14: 88040369ee55 R15: 006c
[  170.267471] FS:  7f223b42c740() GS:88045fc0()
knlGS:
[  170.267721] CS:  0010 DS:  ES:  CR0: 80050033
[  170.267898] CR2: 031a1024 CR3: 000412d6a000 CR4: 000407f0
[  170.268117] Stack:
[  170.268179]  a04bf26c 1000 8804477de000

[  170.268424]  88041688a900 8803eaca2210 8803f0fcda18
8803fdb31a58
[  170.268666]  a04a4376  019d

[  170.268909] Call Trace:
[  170.268994]  [a04bf26c] ? read_extent_buffer+0xbc/0x110 [btrfs]
[  170.269202]  [a04a4376] btrfs_get_extent+0x926/0x9b0 [btrfs]
[  170.269403]  [a04bc53e] __extent_read_full_page+0x2ee/0x700 [btrfs]
[  170.269622]  [a04a3a50] ? btrfs_submit_direct+0x660/0x660 [btrfs]
[  170.269832]  [81159f53] ? __inc_zone_page_state+0x33/0x40
[  170.270028]  [a04a3a50] ? btrfs_submit_direct+0x660/0x660 [btrfs]
[  170.270243]  [a04bd945] extent_readpages+0x195/0x200 [btrfs]
[  170.270440]  [81183129] ? alloc_pages_current+0xa9/0x170
[  170.270635]  [a04a136f] btrfs_readpages+0x1f/0x30 [btrfs]
[  170.270824]  [811484fe] __do_page_cache_readahead+0x1ae/0x240
[  170.271027]  [811489c6] ondemand_readahead+0x126/0x250
[  170.271212]  [81148b23] page_cache_sync_readahead+0x33/0x50
[  170.271410]  [8113da45] generic_file_aio_read+0x4b5/0x700
[  170.271604]  [811a7ab0] do_sync_read+0x80/0xb0
[  170.271766]  [811a80de] vfs_read+0x9e/0x170
[  170.271921]  [811a8c09] SyS_read+0x49/0xa0
[  170.272074]  [810e6496] ? __audit_syscall_exit+0x1f6/0x2a0
[  170.272271]  [81656e99] system_call_fastpath+0x16/0x1b
[  170.272458] Code: 43 4e 5b 5d c3 66 0f 1f 84 00 00 00 00 00 e8 fb
fb ff ff eb e2 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9
03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56
10 4c
[  170.273297] RIP  [81307d7d] memcpy+0xd/0x110
[  170.273457]  RSP 8803fdb31960
[  170.348204] ---[ end trace 7d04a6835a0093fd ]---


This issue has survived a reboot.

(The taint flag is due to a bogus BGRT table in my EFI BIOS.  It's not
corrupting any kernel data structures.)


--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


[PATCH] Block layer stuff/DIO rewrite prep for 3.14

2013-11-04 Thread Kent Overstreet
Now that immutable biovecs is in, these are the remaining patches required for
my DIO rewrite, along with some related cleanup/refactoring.

The key enabler is patch 4 - making generic_make_request() handle arbitrary sized
bios. This takes what was once bio_add_page()'s responsibility and pushes it
down; long term plan is to hopefully push this down to the driver level, which
should simplify a lot of code (a lot of it very fragile code!) and improve
performance at the same time.

The DIO rewrite needs some more work before it'll be ready, but I wanted to get
this patch series out because this stuff should all be ready and it's useful for
other reasons - I think this stuff will make it a lot easier for the btrfs
people to do what they need with their DIO code.

Other stuff enabled by this:

 * With this and the bio_split() rewrite already in, we can just delete
   merge_bvec_fn. I have patches for this, but they'll need more testing.

 * Multipage bvecs - bvecs pointing to an arbitrary amount of contiguous
   physical memory. I have this working but it'll need more testing and code
   auditing - this is what lets us kill bi_seg_front_size and bi_seg_back_size
   though.

Patch series is based on Jens' for-next tree, and it's available in my git
repository - git://evilpiepirate.org/~kent/linux-bcache.git for-jens


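The idea of letting callers submit arbitrarily sized requests and having
a lower layer split them can be sketched in plain C as below. This is a
user-space illustration only, assuming a single per-device size limit:
split_request(), max_chunk and chunk_sizes are hypothetical names and do
not correspond to functions or fields in the block layer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Split a request of "total" bytes into pieces of at most "max_chunk"
 * bytes. The piece sizes are written to chunk_sizes[], which has room
 * for max_chunks entries. Returns the number of pieces produced; if
 * the array is too small, only the first max_chunks pieces are emitted.
 */
size_t split_request(size_t total, size_t max_chunk,
		     size_t *chunk_sizes, size_t max_chunks)
{
	size_t n = 0;

	assert(max_chunk > 0);
	while (total > 0 && n < max_chunks) {
		size_t this_chunk = total < max_chunk ? total : max_chunk;

		chunk_sizes[n++] = this_chunk;
		total -= this_chunk;
	}
	return n;
}
```

With the splitting done below the submission point, the caller no longer
has to size each request against the device limit up front, which is the
responsibility the cover letter describes moving out of bio_add_page().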

[PATCH 2/9] block: submit_bio_wait() conversions

2013-11-04 Thread Kent Overstreet
submit_bio_wait() was being open coded in a few places; convert them to
use the helper.

Signed-off-by: Kent Overstreet k...@daterainc.com
Cc: Jens Axboe ax...@kernel.dk
Cc: Joern Engel jo...@logfs.org
Cc: Prasad Joshi prasadjoshi.li...@gmail.com
---
 block/blk-flush.c   | 19 +--
 fs/logfs/dev_bdev.c |  8 +---
 2 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 5580b05..9288aaf 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -502,15 +502,6 @@ void blk_abort_flushes(struct request_queue *q)
}
 }
 
-static void bio_end_flush(struct bio *bio, int err)
-{
-   if (err)
-   clear_bit(BIO_UPTODATE, &bio->bi_flags);
-   if (bio->bi_private)
-   complete(bio->bi_private);
-   bio_put(bio);
-}
-
 /**
  * blkdev_issue_flush - queue a flush
  * @bdev:  blockdev to issue flush for
@@ -526,7 +517,6 @@ static void bio_end_flush(struct bio *bio, int err)
 int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask,
sector_t *error_sector)
 {
-   DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q;
struct bio *bio;
int ret = 0;
@@ -548,13 +538,9 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t 
gfp_mask,
return -ENXIO;
 
bio = bio_alloc(gfp_mask, 0);
-   bio->bi_end_io = bio_end_flush;
bio->bi_bdev = bdev;
-   bio->bi_private = &wait;
 
-   bio_get(bio);
-   submit_bio(WRITE_FLUSH, bio);
-   wait_for_completion_io(&wait);
+   ret = submit_bio_wait(WRITE_FLUSH, bio);
 
/*
 * The driver must store the error location in ->bi_sector, if
@@ -564,9 +550,6 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t 
gfp_mask,
if (error_sector)
*error_sector = bio->bi_iter.bi_sector;
 
-   if (!bio_flagged(bio, BIO_UPTODATE))
-   ret = -EIO;
-
bio_put(bio);
return ret;
 }
diff --git a/fs/logfs/dev_bdev.c b/fs/logfs/dev_bdev.c
index ca42715..80adce7 100644
--- a/fs/logfs/dev_bdev.c
+++ b/fs/logfs/dev_bdev.c
@@ -23,7 +23,6 @@ static int sync_request(struct page *page, struct 
block_device *bdev, int rw)
 {
struct bio bio;
struct bio_vec bio_vec;
-   struct completion complete;
 
bio_init(&bio);
bio.bi_max_vecs = 1;
@@ -35,13 +34,8 @@ static int sync_request(struct page *page, struct 
block_device *bdev, int rw)
bio.bi_iter.bi_size = PAGE_SIZE;
bio.bi_bdev = bdev;
bio.bi_iter.bi_sector = page->index * (PAGE_SIZE >> 9);
-   init_completion(&complete);
-   bio.bi_private = &complete;
-   bio.bi_end_io = request_complete;
 
-   submit_bio(rw, &bio);
-   wait_for_completion(&complete);
-   return test_bit(BIO_UPTODATE, &bio.bi_flags) ? 0 : -EIO;
+   return submit_bio_wait(rw, &bio);
 }
 
 static int bdev_readpage(void *_sb, struct page *page)
-- 
1.8.4.rc3

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 9/9] block: Add bio_get_user_pages()

2013-11-04 Thread Kent Overstreet
This replaces some of the code that was in __bio_map_user_iov(), and
soon we're going to use this helper in the dio code.

Note that this relies on the recent change to make
generic_make_request() take arbitrary sized bios - we're not using
bio_add_page() here.
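
For illustration only, the page arithmetic such a helper depends on can be
modeled in userspace C. This is a sketch under stated assumptions, not the
kernel code: the 4096-byte page size and the names SKETCH_PAGE_SIZE,
pages_spanned() and bytes_covered() are hypothetical and exist only for this
example.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size; the kernel uses PAGE_SIZE */

/* Number of pages touched by a buffer of len bytes starting at uaddr. */
size_t pages_spanned(unsigned long uaddr, unsigned long len)
{
	unsigned long offset = uaddr % SKETCH_PAGE_SIZE;	/* like offset_in_page() */

	return (len + offset + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Bytes of the request covered when only `pinned` pages, counted from the
 * page containing uaddr, are available. The result is capped at len.
 */
unsigned long bytes_covered(unsigned long uaddr, unsigned long len,
			    size_t pinned)
{
	unsigned long offset = uaddr % SKETCH_PAGE_SIZE;
	unsigned long bytes;

	if (pinned == 0)
		return 0;

	/* The first page only contributes the part after the offset. */
	bytes = pinned * SKETCH_PAGE_SIZE - offset;
	return bytes > len ? len : bytes;
}
```

With these definitions, a 4096-byte buffer that starts 100 bytes into a page
spans two pages, and pinning only the first of them covers 3996 bytes, which
is the "may pin only part of the requested pages" case.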

Signed-off-by: Kent Overstreet k...@daterainc.com
Cc: Jens Axboe ax...@kernel.dk
---
 fs/bio.c| 124 +++-
 include/linux/bio.h |   2 +
 2 files changed, 67 insertions(+), 59 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index c60bfcb..a10b350 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1124,17 +1124,70 @@ struct bio *bio_copy_user(struct request_queue *q, struct rq_map_data *map_data,
 }
 EXPORT_SYMBOL(bio_copy_user);
 
+/**
+ * bio_get_user_pages - pin user pages and add them to a biovec
+ * @bio: bio to add pages to
+ * @uaddr: start of user address
+ * @len: length in bytes
+ * @write_to_vm: bool indicating writing to pages or not
+ *
+ * Pins pages for up to @len bytes and appends them to @bio's bvec array. May
+ * pin only part of the requested pages - @bio need not have room for all the
+ * pages and can already have had pages added to it.
+ *
+ * Returns the number of bytes from @len added to @bio.
+ */
+ssize_t bio_get_user_pages(struct bio *bio, unsigned long uaddr,
+			   unsigned long len, int write_to_vm)
+{
+	int ret;
+	unsigned nr_pages, bytes;
+	unsigned offset = offset_in_page(uaddr);
+	struct bio_vec *bv;
+	struct page **pages;
+
+	nr_pages = min_t(size_t,
+			 DIV_ROUND_UP(len + offset, PAGE_SIZE),
+			 bio->bi_max_vecs - bio->bi_vcnt);
+
+	bv = &bio->bi_io_vec[bio->bi_vcnt];
+	pages = (void *) bv;
+
+	ret = get_user_pages_fast(uaddr, nr_pages, write_to_vm, pages);
+	if (ret < 0)
+		return ret;
+
+	bio->bi_vcnt += ret;
+	bytes = ret * PAGE_SIZE - offset;
+
+	while (ret--) {
+		bv[ret].bv_page = pages[ret];
+		bv[ret].bv_len = PAGE_SIZE;
+		bv[ret].bv_offset = 0;
+	}
+
+	bv[0].bv_offset += offset;
+	bv[0].bv_len -= offset;
+
+	if (bytes > len) {
+		bio->bi_io_vec[bio->bi_vcnt - 1].bv_len -= bytes - len;
+		bytes = len;
+	}
+
+	bio->bi_iter.bi_size += bytes;
+
+	return bytes;
+}
+EXPORT_SYMBOL(bio_get_user_pages);
+
 static struct bio *__bio_map_user_iov(struct request_queue *q,
 				      struct block_device *bdev,
 				      struct sg_iovec *iov, int iov_count,
 				      int write_to_vm, gfp_t gfp_mask)
 {
-	int i, j;
-	int nr_pages = 0;
-	struct page **pages;
+	ssize_t ret;
+	int i, nr_pages = 0;
 	struct bio *bio;
-	int cur_page = 0;
-	int ret, offset;
 
 	for (i = 0; i < iov_count; i++) {
 		unsigned long uaddr = (unsigned long)iov[i].iov_base;
@@ -1163,57 +1216,17 @@ static struct bio *__bio_map_user_iov(struct request_queue *q,
 	if (!bio)
 		return ERR_PTR(-ENOMEM);
 
-	ret = -ENOMEM;
-	pages = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
-	if (!pages)
-		goto out;
-
 	for (i = 0; i < iov_count; i++) {
-		unsigned long uaddr = (unsigned long)iov[i].iov_base;
-		unsigned long len = iov[i].iov_len;
-		unsigned long end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-		unsigned long start = uaddr >> PAGE_SHIFT;
-		const int local_nr_pages = end - start;
-		const int page_limit = cur_page + local_nr_pages;
-
-		ret = get_user_pages_fast(uaddr, local_nr_pages,
-				write_to_vm, &pages[cur_page]);
-		if (ret < local_nr_pages) {
-			ret = -EFAULT;
-			goto out_unmap;
-		}
-
-		offset = uaddr & ~PAGE_MASK;
-		for (j = cur_page; j < page_limit; j++) {
-			unsigned int bytes = PAGE_SIZE - offset;
+		ret = bio_get_user_pages(bio, (size_t) iov[i].iov_base,
+					 iov[i].iov_len,
+					 write_to_vm);
+		if (ret < 0)
+			goto out;
 
-			if (len <= 0)
-				break;
-
-			if (bytes > len)
-				bytes = len;
-
-			/*
-			 * sorry...
-			 */
-			if (bio_add_pc_page(q, bio, pages[j], bytes, offset) <
-					    bytes)
-				break;
-
-			len -= bytes;
-			offset = 0;
-		}
-
-		cur_page = j;
-

[PATCH 6/9] mtip32xx: handle arbitrary size bios

2013-11-04 Thread Kent Overstreet
We get a measurable performance increase by handling this in the driver when
we're already looping over the biovec, instead of handling it separately in
generic_make_request() (or bio_add_page() originally)
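
As a rough userspace model of handling the limit inside the loop that already
walks the segments, the sketch below counts how many segments go into the
current scatterlist and how many are left for a resubmitted remainder. The
names sg_plan and plan_scatterlist() are hypothetical, and no particular value
of the driver's scatterlist limit is assumed.

```c
#include <stddef.h>

struct sg_plan {
	size_t mapped;		/* segments placed in this command's scatterlist */
	size_t deferred;	/* segments left for the resubmitted remainder */
};

/*
 * Walk nr_segments once. When the scatterlist is full, stop and record how
 * many segments remain, instead of rejecting the whole request up front.
 */
struct sg_plan plan_scatterlist(size_t nr_segments, size_t max_sg)
{
	struct sg_plan plan = { 0, 0 };
	size_t i;

	for (i = 0; i < nr_segments; i++) {
		if (plan.mapped == max_sg) {
			plan.deferred = nr_segments - i;
			break;
		}
		plan.mapped++;
	}
	return plan;
}
```

For ten segments and a limit of four, the plan maps four segments and defers
six. A request that already fits is mapped whole and defers nothing, so the
common case pays only for the comparison inside the existing loop.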

Signed-off-by: Kent Overstreet k...@daterainc.com
---
 drivers/block/mtip32xx/mtip32xx.c | 46 +--
 1 file changed, 15 insertions(+), 31 deletions(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index d4c669b..c5a7a96 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -2648,24 +2648,6 @@ static void mtip_hw_submit_io(struct driver_data *dd, sector_t sector,
 }
 
 /*
- * Release a command slot.
- *
- * @dd  Pointer to the driver data structure.
- * @tag Slot tag
- *
- * return value
- *  None
- */
-static void mtip_hw_release_scatterlist(struct driver_data *dd, int tag,
-							int unaligned)
-{
-	struct semaphore *sem = unaligned ? &dd->port->cmd_slot_unal :
-							&dd->port->cmd_slot;
-	release_slot(dd->port, tag);
-	up(sem);
-}
-
-/*
  * Obtain a command slot and return its associated scatter list.
  *
  * @dd  Pointer to the driver data structure.
@@ -4016,21 +3998,22 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
 
 	sg = mtip_hw_get_scatterlist(dd, &tag, unaligned);
 	if (likely(sg != NULL)) {
-		if (unlikely((bio)->bi_vcnt > MTIP_MAX_SG)) {
-			dev_warn(&dd->pdev->dev,
-				"Maximum number of SGL entries exceeded\n");
-			bio_io_error(bio);
-			mtip_hw_release_scatterlist(dd, tag, unaligned);
-			return;
-		}
-
 		/* Create the scatter list for this bio. */
 		bio_for_each_segment(bvec, bio, iter) {
-			sg_set_page(&sg[nents],
-					bvec.bv_page,
-					bvec.bv_len,
-					bvec.bv_offset);
-			nents++;
+			if (unlikely(nents == MTIP_MAX_SG)) {
+				struct bio *split = bio_clone(bio, GFP_NOIO);
+
+				split->bi_iter = iter;
+				bio->bi_iter.bi_size -= iter.bi_size;
+				bio_chain(split, bio);
+				generic_make_request(split);
+				break;
+			}
+
+			sg_set_page(&sg[nents++],
+				    bvec.bv_page,
+				    bvec.bv_len,
+				    bvec.bv_offset);
 		}
 
 		/* Issue the read/write. */
@@ -4145,6 +4128,7 @@ skip_create_disk:
 	blk_queue_max_hw_sectors(dd->queue, 0xffff);
 	blk_queue_max_segment_size(dd->queue, 0x400000);
 	blk_queue_io_min(dd->queue, 4096);
+	set_bit(QUEUE_FLAG_LARGEBIOS,	&dd->queue->queue_flags);
 
/*
 * write back cache is not supported in the device. FUA depends on
-- 
1.8.4.rc3



[PATCH 8/9] bcache: generic_make_request() handles large bios now

2013-11-04 Thread Kent Overstreet
So we get to delete our hacky workaround.

Signed-off-by: Kent Overstreet k...@daterainc.com
---
 drivers/md/bcache/bcache.h|  18 
 drivers/md/bcache/io.c| 100 +-
 drivers/md/bcache/journal.c   |   4 +-
 drivers/md/bcache/request.c   |  16 +++
 drivers/md/bcache/super.c |  33 ++
 drivers/md/bcache/util.h  |   5 ++-
 drivers/md/bcache/writeback.c |   4 +-
 include/linux/bio.h   |  12 -
 8 files changed, 19 insertions(+), 173 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 964353c..8f65331 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -241,19 +241,6 @@ struct keybuf {
DECLARE_ARRAY_ALLOCATOR(struct keybuf_key, freelist, KEYBUF_NR);
 };
 
-struct bio_split_pool {
-   struct bio_set  *bio_split;
-   mempool_t   *bio_split_hook;
-};
-
-struct bio_split_hook {
-   struct closure  cl;
-   struct bio_split_pool   *p;
-   struct bio  *bio;
-   bio_end_io_t*bi_end_io;
-   void*bi_private;
-};
-
 struct bcache_device {
struct closure  cl;
 
@@ -286,8 +273,6 @@ struct bcache_device {
int (*cache_miss)(struct btree *, struct search *,
  struct bio *, unsigned);
int (*ioctl) (struct bcache_device *, fmode_t, unsigned, unsigned long);
-
-   struct bio_split_pool   bio_split_hook;
 };
 
 struct io {
@@ -465,8 +450,6 @@ struct cache {
atomic_long_t   meta_sectors_written;
atomic_long_t   btree_sectors_written;
atomic_long_t   sectors_written;
-
-   struct bio_split_pool   bio_split_hook;
 };
 
 struct gc_stat {
@@ -901,7 +884,6 @@ void bch_bbio_endio(struct cache_set *, struct bio *, int, const char *);
 void bch_bbio_free(struct bio *, struct cache_set *);
 struct bio *bch_bbio_alloc(struct cache_set *);
 
-void bch_generic_make_request(struct bio *, struct bio_split_pool *);
 void __bch_submit_bbio(struct bio *, struct cache_set *);
 void bch_submit_bbio(struct bio *, struct cache_set *, struct bkey *, unsigned);
 
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index fa028fa..86a0bb8 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -11,104 +11,6 @@
 
 #include <linux/blkdev.h>
 
-static unsigned bch_bio_max_sectors(struct bio *bio)
-{
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	struct bio_vec bv;
-	struct bvec_iter iter;
-	unsigned ret = 0, seg = 0;
-
-	if (bio->bi_rw & REQ_DISCARD)
-		return min(bio_sectors(bio), q->limits.max_discard_sectors);
-
-	bio_for_each_segment(bv, bio, iter) {
-		struct bvec_merge_data bvm = {
-			.bi_bdev	= bio->bi_bdev,
-			.bi_sector	= bio->bi_iter.bi_sector,
-			.bi_size	= ret << 9,
-			.bi_rw		= bio->bi_rw,
-		};
-
-		if (seg == min_t(unsigned, BIO_MAX_PAGES,
-				 queue_max_segments(q)))
-			break;
-
-		if (q->merge_bvec_fn &&
-		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
-			break;
-
-		seg++;
-		ret += bv.bv_len >> 9;
-	}
-
-	ret = min(ret, queue_max_sectors(q));
-
-	WARN_ON(!ret);
-	ret = max_t(int, ret, bio_iovec(bio).bv_len >> 9);
-
-	return ret;
-}
-
-static void bch_bio_submit_split_done(struct closure *cl)
-{
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	s->bio->bi_end_io = s->bi_end_io;
-	s->bio->bi_private = s->bi_private;
-	bio_endio_nodec(s->bio, 0);
-
-	closure_debug_destroy(&s->cl);
-	mempool_free(s, s->p->bio_split_hook);
-}
-
-static void bch_bio_submit_split_endio(struct bio *bio, int error)
-{
-	struct closure *cl = bio->bi_private;
-	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
-
-	if (error)
-		clear_bit(BIO_UPTODATE, &s->bio->bi_flags);
-
-	bio_put(bio);
-	closure_put(cl);
-}
-
-void bch_generic_make_request(struct bio *bio, struct bio_split_pool *p)
-{
-	struct bio_split_hook *s;
-	struct bio *n;
-
-	if (!bio_has_data(bio) && !(bio->bi_rw & REQ_DISCARD))
-		goto submit;
-
-	if (bio_sectors(bio) <= bch_bio_max_sectors(bio))
-		goto submit;
-
-	s = mempool_alloc(p->bio_split_hook, GFP_NOIO);
-	closure_init(&s->cl, NULL);
-
-	s->bio		= bio;
-	s->p		= p;
-	s->bi_end_io	= bio->bi_end_io;
-	s->bi_private	= bio->bi_private;
-	bio_get(bio);
-
-	do {
-		n = bio_next_split(bio, bch_bio_max_sectors(bio),
-				   GFP_NOIO, s->p->bio_split);
-
-

[PATCH 7/9] blk-lib.c: generic_make_request() handles large bios now

2013-11-04 Thread Kent Overstreet
generic_make_request() will now do for us what the code in blk-lib.c was
doing manually, with the bio_batch stuff - we still need some looping in
case we're trying to discard/zeroout more than around a gigabyte, but
when we can submit that much at a time doing the submissions in parallel
really shouldn't matter.
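
The remaining loop can be pictured with a small userspace sketch. It assumes
the per-bio cap of 1 << 20 sectors that the patch's new loop uses and 512-byte
sectors (the shift by 9); the names SKETCH_MAX_CHUNK_SECTORS, discard_chunks()
and full_chunk_bytes() are hypothetical and only serve this illustration.

```c
#include <stdint.h>

/* Per-bio cap in sectors, mirroring the 1 << 20 used in the new loop. */
#define SKETCH_MAX_CHUNK_SECTORS (1ULL << 20)

/* Number of bios the loop would submit, one after another, for nr_sects. */
uint64_t discard_chunks(uint64_t nr_sects)
{
	uint64_t chunks = 0;

	while (nr_sects) {
		uint64_t n = nr_sects < SKETCH_MAX_CHUNK_SECTORS ?
			     nr_sects : SKETCH_MAX_CHUNK_SECTORS;

		nr_sects -= n;	/* each chunk is submitted and waited on in turn */
		chunks++;
	}
	return chunks;
}

/* Size in bytes of one full chunk, assuming 512-byte sectors. */
uint64_t full_chunk_bytes(void)
{
	return SKETCH_MAX_CHUNK_SECTORS << 9;
}
```

Under these assumptions one full chunk is 536870912 bytes (512 MiB), so a
request of exactly 1 << 20 sectors needs a single bio and one more sector
pushes it to two.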

Signed-off-by: Kent Overstreet k...@daterainc.com
Cc: Jens Axboe ax...@kernel.dk
---
 block/blk-lib.c | 175 ++--
 1 file changed, 30 insertions(+), 145 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 2da76c9..368c36a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -9,23 +9,6 @@
 
 #include "blk.h"
 
-struct bio_batch {
-	atomic_t		done;
-	unsigned long		flags;
-	struct completion	*wait;
-};
-
-static void bio_batch_end_io(struct bio *bio, int err)
-{
-	struct bio_batch *bb = bio->bi_private;
-
-	if (err && (err != -EOPNOTSUPP))
-		clear_bit(BIO_UPTODATE, &bb->flags);
-	if (atomic_dec_and_test(&bb->done))
-		complete(bb->wait);
-	bio_put(bio);
-}
-
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:  blockdev to issue discard for
@@ -40,15 +23,10 @@ static void bio_batch_end_io(struct bio *bio, int err)
 int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
 {
-	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
-	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
-	struct blk_plug plug;
 
 	if (!q)
 		return -ENXIO;
@@ -56,78 +34,28 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
-	/* Zero-sector (unknown) and one-sector granularities are the same.  */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
-
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
 		type |= REQ_SECURE;
 	}
 
-	atomic_set(&bb.done, 1);
-	bb.flags = 1 << BIO_UPTODATE;
-	bb.wait = &wait;
-
-	blk_start_plug(&plug);
 	while (nr_sects) {
-		unsigned int req_sects;
-		sector_t end_sect, tmp;
-
 		bio = bio_alloc(gfp_mask, 1);
-		if (!bio) {
-			ret = -ENOMEM;
-			break;
-		}
-
-		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-		/*
-		 * If splitting a request, and the next starting sector would be
-		 * misaligned, stop the discard at the previous aligned sector.
-		 */
-		end_sect = sector + req_sects;
-		tmp = end_sect;
-		if (req_sects < nr_sects &&
-		    sector_div(tmp, granularity) != alignment) {
-			end_sect = end_sect - alignment;
-			sector_div(end_sect, granularity);
-			end_sect = end_sect * granularity + alignment;
-			req_sects = end_sect - sector;
-		}
+		if (!bio)
+			return -ENOMEM;
 
-		bio->bi_iter.bi_sector = sector;
-		bio->bi_end_io = bio_batch_end_io;
 		bio->bi_bdev = bdev;
-		bio->bi_private = &bb;
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_iter.bi_size = min_t(sector_t, nr_sects, 1 << 20) << 9;
 
-		bio->bi_iter.bi_size = req_sects << 9;
-		nr_sects -= req_sects;
-		sector = end_sect;
+		sector += bio_sectors(bio);
+		nr_sects -= bio_sectors(bio);
 
-		atomic_inc(&bb.done);
-		submit_bio(type, bio);
+		ret = submit_bio_wait(type, bio);
+		if (ret)
+			break;
 	}
-	blk_finish_plug(&plug);
-
-	/* Wait for bios in-flight */
-	if (!atomic_dec_and_test(&bb.done))
-		wait_for_completion_io(&wait);
-
-	if (!test_bit(BIO_UPTODATE, &bb.flags))
-		ret = -EIO;
 
return 

[PATCH 1/9] block: Convert various code to bio_for_each_segment()

2013-11-04 Thread Kent Overstreet
With immutable biovecs we don't want code accessing bi_io_vec directly -
the uses this patch changes weren't incorrect since they all own the
bio, but it makes the code harder to audit for no good reason - also,
this will help with multipage bvecs later.
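
To show the shape of the conversion, here is a userspace sketch of an iterator
macro that hides the vector array behind a loop helper, so callers no longer
index it by hand. The types sketch_vec and sketch_bio and the macro
sketch_for_each_vec_all() are hypothetical stand-ins, modeled on how
bio_for_each_segment_all(bvec, bio, i) is used in the hunks of this patch.

```c
#include <stddef.h>

struct sketch_vec {			/* stand-in for struct bio_vec */
	unsigned int len;
};

struct sketch_bio {			/* stand-in for struct bio */
	struct sketch_vec *vecs;	/* plays the role of bi_io_vec */
	int vcnt;			/* plays the role of bi_vcnt */
};

/* Visit every vector of b in order; v points at the current one. */
#define sketch_for_each_vec_all(v, b, i) \
	for ((i) = 0, (v) = (b)->vecs; (i) < (b)->vcnt; (i)++, (v)++)

/* Sum of all vector lengths, written without touching vecs[] directly. */
unsigned int total_len(struct sketch_bio *bio)
{
	struct sketch_vec *v;
	unsigned int sum = 0;
	int i;

	sketch_for_each_vec_all(v, bio, i)
		sum += v->len;

	return sum;
}
```

Because every caller goes through the one macro, a later change to how the
vectors are stored or walked only has to touch the macro, which is the
auditing benefit the commit message points to.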

Signed-off-by: Kent Overstreet k...@daterainc.com
Cc: Jens Axboe ax...@kernel.dk
Cc: Alexander Viro v...@zeniv.linux.org.uk
Cc: Chris Mason chris.ma...@fusionio.com
Cc: Jaegeuk Kim jaegeuk@samsung.com
Cc: Joern Engel jo...@logfs.org
Cc: Prasad Joshi prasadjoshi.li...@gmail.com
Cc: Trond Myklebust trond.mykleb...@netapp.com
---
 fs/btrfs/compression.c   | 10 --
 fs/btrfs/disk-io.c   | 11 ---
 fs/btrfs/extent_io.c | 37 ++---
 fs/btrfs/inode.c | 15 ++-
 fs/f2fs/data.c   | 13 +
 fs/f2fs/segment.c| 12 +---
 fs/logfs/dev_bdev.c  | 18 +++---
 fs/mpage.c   | 17 -
 fs/nfs/blocklayout/blocklayout.c | 34 +-
 9 files changed, 66 insertions(+), 101 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 06ab821..52e7848 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -203,18 +203,16 @@ csum_failed:
 	if (cb->errors) {
 		bio_io_error(cb->orig_bio);
 	} else {
-		int bio_index = 0;
-		struct bio_vec *bvec = cb->orig_bio->bi_io_vec;
+		int i;
+		struct bio_vec *bvec;
 
 		/*
 		 * we have verified the checksum already, set page
 		 * checked so the end_io handlers know about it
 		 */
-		while (bio_index < cb->orig_bio->bi_vcnt) {
+		bio_for_each_segment_all(bvec, cb->orig_bio, i)
 			SetPageChecked(bvec->bv_page);
-			bvec++;
-			bio_index++;
-		}
+
 		bio_endio(cb->orig_bio, 0);
 	}
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62176ad..733182e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -850,20 +850,17 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
 
 static int btree_csum_one_bio(struct bio *bio)
 {
-	struct bio_vec *bvec = bio->bi_io_vec;
-	int bio_index = 0;
+	struct bio_vec *bvec;
 	struct btrfs_root *root;
-	int ret = 0;
+	int i, ret = 0;
 
-	WARN_ON(bio->bi_vcnt <= 0);
-	while (bio_index < bio->bi_vcnt) {
+	bio_for_each_segment_all(bvec, bio, i) {
 		root = BTRFS_I(bvec->bv_page->mapping->host)->root;
 		ret = csum_dirty_buffer(root, bvec->bv_page);
 		if (ret)
 			break;
-		bio_index++;
-		bvec++;
 	}
+
 	return ret;
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0df176a..ea5a08b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2014,7 +2014,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
 	}
 	bio->bi_bdev = dev->bdev;
 	bio_add_page(bio, page, length, start - page_offset(page));
-	btrfsic_submit_bio(WRITE_SYNC, bio);
+	btrfsic_submit_bio(WRITE_SYNC, bio); /* XXX: submit_bio_wait() */
 	wait_for_completion(&compl);
 
 	if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) {
@@ -2340,12 +2340,13 @@ int end_extent_writepage(struct page *page, int err, u64 start, u64 end)
  */
 static void end_bio_extent_writepage(struct bio *bio, int err)
 {
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec;
 	struct extent_io_tree *tree;
 	u64 start;
 	u64 end;
+	int i;
 
-	do {
+	bio_for_each_segment_all(bvec, bio, i) {
 		struct page *page = bvec->bv_page;
 		tree = &BTRFS_I(page->mapping->host)->io_tree;
 
@@ -2363,14 +2364,11 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
 		start = page_offset(page);
 		end = start + bvec->bv_offset + bvec->bv_len - 1;
 
-		if (--bvec >= bio->bi_io_vec)
-			prefetchw(&bvec->bv_page->flags);
-
 		if (end_extent_writepage(page, err, start, end))
 			continue;
 
 		end_page_writeback(page);
-	} while (bvec >= bio->bi_io_vec);
+	}
 
 	bio_put(bio);
 }
@@ -2400,9 +2398,8 @@ endio_readpage_release_extent(struct extent_io_tree *tree, u64 start, u64 len,
  */
 static void end_bio_extent_readpage(struct bio *bio, int err)
 {
+	struct bio_vec *bvec;
 	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
-	struct bio_vec *bvec = bio->bi_io_vec;
 	struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
struct 

[PATCH 3/9] block: Move bouncing to generic_make_request()

2013-11-04 Thread Kent Overstreet
Next patch is going to make generic_make_request() handle arbitrary
sized bios by splitting them if necessary. It makes more sense to call
blk_queue_bounce() first, partly so it's working on larger bios - but also the
code that splits bios, and __blk_recalc_rq_segments(), won't have to take into
account bouncing (as it'll already have been done).

Also, __blk_recalc_rq_segments() now doesn't have to take into account
potential bouncing - it's already been done.

Signed-off-by: Kent Overstreet k...@daterainc.com
Cc: Jens Axboe ax...@kernel.dk
Cc: Jiri Kosina jkos...@suse.cz
Cc: Asai Thambi S P asamymuth...@micron.com
---
 block/blk-core.c  | 14 +++---
 block/blk-merge.c | 13 -
 drivers/block/mtip32xx/mtip32xx.c |  2 --
 drivers/block/pktcdvd.c   |  2 --
 4 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d9cab97..3c7467e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1466,13 +1466,6 @@ void blk_queue_bio(struct request_queue *q, struct bio *bio)
 	struct request *req;
 	unsigned int request_count = 0;
 
-	/*
-	 * low level driver can indicate that it wants pages above a
-	 * certain limit bounced to low memory (ie for highmem, or even
-	 * ISA dma in theory)
-	 */
-	blk_queue_bounce(q, &bio);
-
 	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
 		bio_endio(bio, -EIO);
 		return;
@@ -1822,6 +1815,13 @@ void generic_make_request(struct bio *bio)
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
+		/*
+		 * low level driver can indicate that it wants pages above a
+		 * certain limit bounced to low memory (ie for highmem, or even
+		 * ISA dma in theory)
+		 */
+		blk_queue_bounce(q, &bio);
+
 		q->make_request_fn(q, bio);
 
 		bio = bio_list_pop(current->bio_list);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 953b8df..9680ec73 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -13,7 +13,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 					     struct bio *bio)
 {
 	struct bio_vec bv, bvprv = { NULL };
-	int cluster, high, highprv = 1;
+	int cluster, prev = 0;
 	unsigned int seg_size, nr_phys_segs;
 	struct bio *fbio, *bbio;
 	struct bvec_iter iter;
@@ -27,13 +27,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 	nr_phys_segs = 0;
 	for_each_bio(bio) {
 		bio_for_each_segment(bv, bio, iter) {
-			/*
-			 * the trick here is making sure that a high page is
-			 * never considered part of another segment, since that
-			 * might change with the bounce page.
-			 */
-			high = page_to_pfn(bv.bv_page) > queue_bounce_pfn(q);
-			if (!high && !highprv && cluster) {
+			if (prev && cluster) {
 				if (seg_size + bv.bv_len
 				    > queue_max_segment_size(q))
 					goto new_segment;
@@ -44,6 +38,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 
 				seg_size += bv.bv_len;
 				bvprv = bv;
+				prev = 1;
 				continue;
 			}
 new_segment:
@@ -53,8 +48,8 @@ new_segment:
 
 			nr_phys_segs++;
 			bvprv = bv;
+			prev = 1;
 			seg_size = bv.bv_len;
-			highprv = high;
 		}
 		bbio = bio;
 	}
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 52b2f2a..d4c669b 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -4016,8 +4016,6 @@ static void mtip_make_request(struct request_queue *queue, struct bio *bio)
 
 	sg = mtip_hw_get_scatterlist(dd, &tag, unaligned);
 	if (likely(sg != NULL)) {
-		blk_queue_bounce(queue, &bio);
-
 		if (unlikely((bio)->bi_vcnt > MTIP_MAX_SG)) {
 			dev_warn(&dd->pdev->dev,
 				"Maximum number of SGL entries exceeded\n");
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 1bf1f22..7991cc8 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2486,8 +2486,6 @@ static void pkt_make_request(struct request_queue *q, struct bio *bio)
 		goto end_io;
 	}
 
-	blk_queue_bounce(q, &bio);
-
 	do {
 		sector_t zone = get_zone(bio->bi_iter.bi_sector, 

[PATCH 4/9] block: Make generic_make_request handle arbitrary sized bios

2013-11-04 Thread Kent Overstreet
The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them.

In the future this will let us delete merge_bvec_fn and a bunch of other code.

Signed-off-by: Kent Overstreet k...@daterainc.com
Cc: Jens Axboe ax...@kernel.dk
Cc: Neil Brown ne...@suse.de
Cc: Alasdair Kergon a...@redhat.com
Cc: dm-de...@redhat.com
---
 block/blk-core.c   |  26 ++-
 block/blk-merge.c  | 120 +
 block/blk.h|   3 ++
 include/linux/blkdev.h |   4 ++
 4 files changed, 143 insertions(+), 10 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3c7467e..abc5d23 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -566,6 +566,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	if (q->id < 0)
 		goto fail_c;
 
+	q->bio_split = bioset_create(4, 0);
+	if (!q->bio_split)
+		goto fail_id;
+
 	q->backing_dev_info.ra_pages =
 			(VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
 	q->backing_dev_info.state = 0;
@@ -575,7 +579,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 
 	err = bdi_init(&q->backing_dev_info);
 	if (err)
-		goto fail_id;
+		goto fail_split;
 
 	setup_timer(&q->backing_dev_info.laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, (unsigned long) q);
@@ -620,6 +624,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 
 fail_bdi:
 	bdi_destroy(&q->backing_dev_info);
+fail_split:
+	bioset_free(q->bio_split);
 fail_id:
 	ida_simple_remove(&blk_queue_ida, q->id);
 fail_c:
@@ -1681,15 +1687,6 @@ generic_make_request_checks(struct bio *bio)
 		goto end_io;
 	}
 
-	if (likely(bio_is_rw(bio) &&
-		   nr_sectors > queue_max_hw_sectors(q))) {
-		printk(KERN_ERR "bio too big device %s (%u > %u)\n",
-		       bdevname(bio->bi_bdev, b),
-		       bio_sectors(bio),
-		       queue_max_hw_sectors(q));
-		goto end_io;
-	}
-
 	part = bio->bi_bdev->bd_part;
 	if (should_fail_request(part, bio->bi_iter.bi_size) ||
 	    should_fail_request(&part_to_disk(part)->part0,
@@ -1814,6 +1811,7 @@ void generic_make_request(struct bio *bio)
 	current->bio_list = &bio_list_on_stack;
 	do {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
+		struct bio *split = NULL;
 
 		/*
 		 * low level driver can indicate that it wants pages above a
@@ -1822,6 +1820,14 @@ void generic_make_request(struct bio *bio)
 		 */
 		blk_queue_bounce(q, &bio);
 
+		if (!blk_queue_largebios(q))
+			split = blk_bio_segment_split(q, bio, q->bio_split);
+		if (split) {
+			bio_chain(split, bio);
+			bio_list_add(current->bio_list, bio);
+			bio = split;
+		}
+
 		q->make_request_fn(q, bio);
 
 		bio = bio_list_pop(current->bio_list);
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 9680ec73..6e07213 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -9,6 +9,126 @@
 
 #include "blk.h"
 
+static struct bio *blk_bio_discard_split(struct request_queue *q,
+					 struct bio *bio,
+					 struct bio_set *bs)
+{
+	unsigned int max_discard_sectors, granularity;
+	int alignment;
+	sector_t tmp;
+	unsigned split_sectors;
+
+	/* Zero-sector (unknown) and one-sector granularities are the same.  */
+	granularity = max(q->limits.discard_granularity >> 9, 1U);
+
+	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+	max_discard_sectors -= max_discard_sectors % granularity;
+
+	if (unlikely(!max_discard_sectors)) {
+		/* XXX: warn */
+		return NULL;
+	}
+
+	if (bio_sectors(bio) <= max_discard_sectors)
+		return NULL;
+
+	split_sectors = max_discard_sectors;
+
+	/*
+	 * If the next starting sector would be misaligned, stop the 

Re: [dm-devel] [PATCH 4/9] block: Make generic_make_request handle arbitrary sized bios

2013-11-04 Thread Mike Christie
On 11/04/2013 03:36 PM, Kent Overstreet wrote:
> @@ -1822,6 +1820,14 @@ void generic_make_request(struct bio *bio)
>  		 */
>  		blk_queue_bounce(q, &bio);
>  
> +		if (!blk_queue_largebios(q))
> +			split = blk_bio_segment_split(q, bio, q->bio_split);


Is it assumed bios coming down this path are created using bio_add_page?
If not, does blk_bio_segment_split need a queue_max_sectors or
queue_max_hw_sectors check? I only saw a segment count check below.


> +
> +struct bio *blk_bio_segment_split(struct request_queue *q, struct bio *bio,
> +				  struct bio_set *bs)
> +{
> +	struct bio *split;
> +	struct bio_vec bv, bvprv;
> +	struct bvec_iter iter;
> +	unsigned seg_size = 0, nsegs = 0;
> +	int prev = 0;
> +
> +	struct bvec_merge_data bvm = {
> +		.bi_bdev	= bio->bi_bdev,
> +		.bi_sector	= bio->bi_iter.bi_sector,
> +		.bi_size	= 0,
> +		.bi_rw		= bio->bi_rw,
> +	};
> +
> +	if (bio->bi_rw & REQ_DISCARD)
> +		return blk_bio_discard_split(q, bio, bs);
> +
> +	if (bio->bi_rw & REQ_WRITE_SAME)
> +		return blk_bio_write_same_split(q, bio, bs);
> +
> +	bio_for_each_segment(bv, bio, iter) {
> +		if (q->merge_bvec_fn &&
> +		    q->merge_bvec_fn(q, &bvm, &bv) < (int) bv.bv_len)
> +			goto split;
> +
> +		bvm.bi_size += bv.bv_len;
> +
> +		if (prev && blk_queue_cluster(q)) {
> +			if (seg_size + bv.bv_len > queue_max_segment_size(q))
> +				goto new_segment;
> +			if (!BIOVEC_PHYS_MERGEABLE(&bvprv, &bv))
> +				goto new_segment;
> +			if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bv))
> +				goto new_segment;
> +
> +			seg_size += bv.bv_len;
> +			bvprv = bv;
> +			prev = 1;
> +			continue;
> +		}
> +new_segment:
> +		if (nsegs == queue_max_segments(q))
> +			goto split;
> +
> +		nsegs++;
> +		bvprv = bv;
> +		prev = 1;
> +		seg_size = bv.bv_len;
> +	}
> +
> +	return NULL;
> +split:
> +	split = bio_clone_bioset(bio, GFP_NOIO, bs);
> +
> +	split->bi_iter.bi_size -= iter.bi_size;
> +	bio->bi_iter = iter;
> +
> +	if (bio_integrity(bio)) {
> +		bio_integrity_advance(bio, split->bi_iter.bi_size);
> +		bio_integrity_trim(split, 0, bio_sectors(split));
> +	}
> +
> +	return split;
> +}


Re: [dm-devel] [PATCH 4/9] block: Make generic_make_request handle arbitrary sized bios

2013-11-04 Thread Kent Overstreet
On Mon, Nov 04, 2013 at 03:56:52PM -0800, Mike Christie wrote:
> On 11/04/2013 03:36 PM, Kent Overstreet wrote:
> > @@ -1822,6 +1820,14 @@ void generic_make_request(struct bio *bio)
> >  		 */
> >  		blk_queue_bounce(q, &bio);
> >  
> > +		if (!blk_queue_largebios(q))
> > +			split = blk_bio_segment_split(q, bio, q->bio_split);
> 
> 
> Is it assumed bios coming down this path are created using bio_add_page?
> If not, does blk_bio_segment_split need a queue_max_sectors or
> queue_max_hw_sectors check? I only saw a segment count check below.

Shoot, you're absolutely right - thanks, I'll have this fixed in the next
version.
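
To make the point under discussion concrete, here is a userspace sketch of a
split-point search that honors both a segment-count limit and a total-size
limit. It is not the fix that was later posted: split_point() and its
parameters are hypothetical, every vector is treated as its own segment (no
merging), and sizes are in 512-byte sectors.

```c
#include <stddef.h>

/*
 * Return how many leading segments fit within both limits. A return value
 * equal to nr_segs means the whole request fits and no split is needed.
 * seg_sectors[i] is the size of segment i in 512-byte sectors.
 */
size_t split_point(const unsigned int *seg_sectors, size_t nr_segs,
		   size_t max_segments, unsigned int max_sectors)
{
	unsigned int sectors = 0;
	size_t i;

	for (i = 0; i < nr_segs; i++) {
		if (i == max_segments)
			break;			/* segment-count limit reached */
		if (sectors + seg_sectors[i] > max_sectors)
			break;			/* total-size limit reached */
		sectors += seg_sectors[i];
	}
	return i;
}
```

With four 8-sector segments, a 16-sector cap splits after two segments even
though the segment-count limit alone would have let all four through. The
sketch returns 0 when a single segment already exceeds the size cap; a real
implementation would have to split inside that segment.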


[BUG]:bad unlock balance detected!

2013-11-04 Thread majianpeng
On the mainline Linux tree (git HEAD: be408cd3e1fef73e9408b196a7):

[ 2372.462131] 
[ 2372.462147] =====================================
[ 2372.462171] [ BUG: bad unlock balance detected! ]
[ 2372.462191] 3.12.0+ #32 Tainted: GW   
[ 2372.462209] -------------------------------------
[ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at:
[ 2372.462275] [a022cb10] btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462305] but there are no more locks to release!
[ 2372.462324] 
[ 2372.462324] other info that might help us debug this:
[ 2372.462349] no locks held by ceph-osd/14048.
[ 2372.462367] 
[ 2372.462367] stack backtrace:
[ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: GW 3.12.0+ #32
[ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[ 2372.462455]  a022cb10 88007490fd28 816f094a 8800378aa320
[ 2372.462491]  88007490fd50 810adf4c 8800378aa320 88009af97650
[ 2372.462526]  a022cb10 88007490fd88 810b01ee 8800898c
[ 2372.462562] Call Trace:
[ 2372.462584]  [a022cb10] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462619]  [816f094a] dump_stack+0x45/0x56
[ 2372.462642]  [810adf4c] print_unlock_imbalance_bug+0xec/0x100
[ 2372.462677]  [a022cb10] ? btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
[ 2372.462710]  [810b01ee] lock_release+0x18e/0x210
[ 2372.462742]  [a022cb36] btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs]
[ 2372.462783]  [a025a7ce] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs]
[ 2372.462822]  [a025f1d3] btrfs_ioctl+0x4c3/0x1f70 [btrfs]
[ 2372.462849]  [812c0321] ? avc_has_perm+0x121/0x1b0
[ 2372.462873]  [812c0224] ? avc_has_perm+0x24/0x1b0
[ 2372.462897]  [8107ecc8] ? sched_clock_cpu+0xa8/0x100
[ 2372.462922]  [8117b145] do_vfs_ioctl+0x2e5/0x4e0
[ 2372.462946]  [812c19e6] ? file_has_perm+0x86/0xa0
[ 2372.462969]  [8117b3c1] SyS_ioctl+0x81/0xa0
[ 2372.462991]  [817045a4] tracesys+0xdd/0xe2
[ 2709.388375] btrfs: device fsid 983de968-d6a9-4ca9-bd27-2be472d15b9b devid 1 transid 26 /dev/sdb
[ 2709.394532] btrfs: disk space caching is enabled


csum failure messages

2013-11-04 Thread Russell Coker
The below messages are from dmesg on a system where btrfs balance just 
aborted.  It's running kernel 3.11.6 (the latest Debian package).

This seems to be telling me that Inode 388 is involved, but there are over 300 
subvols on that system which could contain such an Inode.

I think that more information is needed for such log messages.  We need to at 
least be able to identify the subvol (is it possible to extract this from the 
numbers in the log messages?).  Ideally we would be able to identify the file 
name as well.


[10751.637517] BTRFS info (device sda3): csum failed ino 388 off 23191552 csum 
2566472073 private 3193692311
[10751.646390] BTRFS info (device sda3): csum failed ino 388 off 24104960 csum 
5219137 private 2264608335
[10751.654472] BTRFS info (device sda3): csum failed ino 388 off 24154112 csum 
4084831521 private 1792217768
[10751.731830] BTRFS info (device sda3): csum failed ino 388 off 23191552 csum 
2566472073 private 3193692311
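
For anyone who wants to collect these messages in a script, here is a minimal sketch of pulling the numeric fields out of one line. It assumes the message layout is exactly the one shown above; the struct and function names are made up for illustration. It can only recover what the message prints, so it does not help with the missing subvolume.

```c
#include <stdio.h>
#include <string.h>

/* Fields printed by a "csum failed" message (names are illustrative). */
struct csum_failure {
	unsigned long long ino;		/* inode number */
	unsigned long long off;		/* byte offset within the file */
	unsigned long long csum;	/* value printed after "csum" */
	unsigned long long private_csum; /* value printed after "private" */
};

/*
 * Parse one dmesg line. Returns 0 on success, -1 if the line is not a
 * "csum failed" message in the expected layout.
 */
static int parse_csum_failure(const char *line, struct csum_failure *out)
{
	const char *p = strstr(line, "csum failed ino ");

	if (!p)
		return -1;
	/* Whitespace in the format also matches a wrapped line break. */
	if (sscanf(p, "csum failed ino %llu off %llu csum %llu private %llu",
		   &out->ino, &out->off, &out->csum, &out->private_csum) != 4)
		return -1;
	return 0;
}
```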

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


Re: btrfsck errors is it save to fix?

2013-11-04 Thread cwillu
On Mon, Nov 4, 2013 at 3:14 PM, Hendrik Friedel hend...@friedels.name wrote:
 Hello,

 the list was quite full with patches, so this might have been hidden.
 Here the complete Stack.
 Does this help? Is this what you needed?
 [95764.899294] CPU: 1 PID: 21798 Comm: umount Tainted: GFCIO
 3.11.0-031100rc2-generic #201307211535

Can you reproduce the problem under the released 3.11 or 3.12?  An
-rc2 is still pretty early in the release cycle, and I wouldn't be at
all surprised if it was a bug added and fixed in a later rc.


Re: [BUG]:bad unlock balance detected!

2013-11-04 Thread Liu Bo
Hi,

Would you please try the following patch?

-liubo

From: Liu Bo bo.li@oracle.com
Subject: [PATCH] Btrfs: fix to use the right trans for async commit

@trans has been freed and is undefined, and we should use the trans
handle created for async commit instead.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/transaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8c81bdc..648d839 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1494,7 +1494,7 @@ int btrfs_commit_transaction_async(struct
btrfs_trans_handle *trans,
 * Tell lockdep we've released the freeze rwsem, since the
 * async commit thread will be the one to unlock it.
 */
-   if (trans->type < TRANS_JOIN_NOLOCK)
+   if (ac->newtrans->type < TRANS_JOIN_NOLOCK)
        rwsem_release(
            &root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
            1, _THIS_IP_);
-- 
1.8.1.4

On Tue, Nov 05, 2013 at 09:34:01AM +0800, majianpeng wrote:
 On the mainline linux tree, the head of git is :be408cd3e1fef73e9408b196a7.
 
 [ 2372.462131] 
 [ 2372.462147] =
 [ 2372.462171] [ BUG: bad unlock balance detected! ]
 [ 2372.462191] 3.12.0+ #32 Tainted: GW   
 [ 2372.462209] -
 [ 2372.462228] ceph-osd/14048 is trying to release lock (sb_internal) at:
 [ 2372.462275] [a022cb10] 
 btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
 [ 2372.462305] but there are no more locks to release!
 [ 2372.462324] 
 [ 2372.462324] other info that might help us debug this:
 [ 2372.462349] no locks held by ceph-osd/14048.
 [ 2372.462367] 
 [ 2372.462367] stack backtrace:
 [ 2372.462386] CPU: 2 PID: 14048 Comm: ceph-osd Tainted: GW
 3.12.0+ #32
 [ 2372.462414] Hardware name: To Be Filled By O.E.M. To Be Filled By 
 O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
 [ 2372.462455]  a022cb10 88007490fd28 816f094a 
 8800378aa320
 [ 2372.462491]  88007490fd50 810adf4c 8800378aa320 
 88009af97650
 [ 2372.462526]  a022cb10 88007490fd88 810b01ee 
 8800898c
 [ 2372.462562] Call Trace:
 [ 2372.462584]  [a022cb10] ? 
 btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
 [ 2372.462619]  [816f094a] dump_stack+0x45/0x56
 [ 2372.462642]  [810adf4c] print_unlock_imbalance_bug+0xec/0x100
 [ 2372.462677]  [a022cb10] ? 
 btrfs_commit_transaction_async+0x1b0/0x2a0 [btrfs]
 [ 2372.462710]  [810b01ee] lock_release+0x18e/0x210
 [ 2372.462742]  [a022cb36] 
 btrfs_commit_transaction_async+0x1d6/0x2a0 [btrfs]
 [ 2372.462783]  [a025a7ce] btrfs_ioctl_start_sync+0x3e/0xc0 [btrfs]
 [ 2372.462822]  [a025f1d3] btrfs_ioctl+0x4c3/0x1f70 [btrfs]
 [ 2372.462849]  [812c0321] ? avc_has_perm+0x121/0x1b0
 [ 2372.462873]  [812c0224] ? avc_has_perm+0x24/0x1b0
 [ 2372.462897]  [8107ecc8] ? sched_clock_cpu+0xa8/0x100
 [ 2372.462922]  [8117b145] do_vfs_ioctl+0x2e5/0x4e0
 [ 2372.462946]  [812c19e6] ? file_has_perm+0x86/0xa0
 [ 2372.462969]  [8117b3c1] SyS_ioctl+0x81/0xa0
 [ 2372.462991]  [817045a4] tracesys+0xdd/0xe2
 [ 2709.388375] btrfs: device fsid 983de968-d6a9-4ca9-bd27-2be472d15b9b devid 
 1 transid 26 /dev/sdb
 [ 2709.394532] btrfs: disk space caching is enabled


Re: OK to take hourly snapshots, then cull older ones?

2013-11-04 Thread Marc MERLIN
On Sun, Nov 03, 2013 at 12:50:24PM +0100, Matthias G. Eckermann wrote:
 Hello David and all,
 
 On Mon, Oct 14, 2013 at 21:05 David Madden wrote:
 
  I'd like to use BTRFS to do something like the old NetApp
  snapshot system: every hour or so, there'd be a snapshot,
  then the 23 of the snapshots during a day would be
  deleted, leaving just a day snapshot, then after a month,
  6 of 7 snapshots would be deleted, leaving just a week
  snapshot, and so on.
 
 This is implemented in Snapper, see:
   http://snapper.io/
 It's by default delivered with openSUSE and SUSE Linux
 Enterprise, binaries are available for everything else
 as well.

Just curious, what does it do beyond the 20-line shell script I
posted?
http://marc.merlins.org/linux/scripts/btrfs_snaps
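
The culling scheme quoted above can be written down as a small keep-or-delete decision. This is only a sketch of that description, not what Snapper or the script at the URL above does. The 30-day window, the choice of the first snapshot of each day and week as the survivor, and all names are assumptions made for illustration.

```c
#include <stdbool.h>

#define HOURS_PER_DAY		24
#define DAYS_PER_WEEK		7
#define DAILY_WINDOW_DAYS	30	/* assumption: "a month" means 30 days */

/*
 * Decide whether an hourly snapshot survives culling.
 * age_hours:  how old the snapshot is now.
 * taken_hour: when it was taken, in hours since an arbitrary epoch
 *             that starts at the beginning of a week.
 */
static bool keep_snapshot(unsigned long age_hours, unsigned long taken_hour)
{
	unsigned long taken_day = taken_hour / HOURS_PER_DAY;
	bool first_of_day = (taken_hour % HOURS_PER_DAY) == 0;
	bool first_of_week = first_of_day && (taken_day % DAYS_PER_WEEK) == 0;

	if (age_hours < HOURS_PER_DAY)
		return true;		/* keep every hourly snapshot */
	if (age_hours < DAILY_WINDOW_DAYS * HOURS_PER_DAY)
		return first_of_day;	/* keep 1 of 24 */
	return first_of_week;		/* keep 1 of 7 dailies */
}
```

A cron job could walk the snapshot list, call this for each entry, and delete the ones for which it returns false.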

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: [PATCH 7/7] Btrfs: rename btrfs_start_all_delalloc_inodes

2013-11-04 Thread Miao Xie
On Mon, 4 Nov 2013 12:09:18 -0500, Josef Bacik wrote:
 On Mon, Nov 04, 2013 at 11:13:26PM +0800, Miao Xie wrote:
 Rename btrfs_start_all_delalloc_inodes() so that its name is
 consistent with btrfs_wait_ordered_roots(), since the two are always
 used in the same place.

 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/ctree.h   | 3 +--
  fs/btrfs/dev-replace.c | 2 +-
  fs/btrfs/extent-tree.c | 2 +-
  fs/btrfs/inode.c   | 3 +--
  fs/btrfs/relocation.c  | 2 +-
  fs/btrfs/transaction.c | 2 +-
  6 files changed, 6 insertions(+), 8 deletions(-)

 
 There's another use in ioctl.c in my tree. I've just rebased that change in;
 please check my tree when I push it to make sure I did it right.  Thanks,

I have checked it, everything is OK.

Thanks
Miao

 
 Josef


Re: [BUG]:bad unlock balance detected!

2013-11-04 Thread Liu Bo
On Tue, Nov 05, 2013 at 10:31:15AM +0800, Miao Xie wrote:
 On Tue, 5 Nov 2013 10:12:17 +0800, Liu Bo wrote:
  Hi,
  
  Would you please try the following patch?
  
  -liubo
  
  From: Liu Bo bo.li@oracle.com
  Subject: [PATCH] Btrfs: fix to use the right trans for async commit
  
  @trans has been freed and is undefined, and we should use the trans
  handle created for async commit instead.
  
  Signed-off-by: Liu Bo bo.li@oracle.com
  ---
   fs/btrfs/transaction.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
  index 8c81bdc..648d839 100644
  --- a/fs/btrfs/transaction.c
  +++ b/fs/btrfs/transaction.c
  @@ -1494,7 +1494,7 @@ int btrfs_commit_transaction_async(struct
  btrfs_trans_handle *trans,
   * Tell lockdep we've released the freeze rwsem, since the
   * async commit thread will be the one to unlock it.
   */
  -   if (trans->type < TRANS_JOIN_NOLOCK)
  +   if (ac->newtrans->type < TRANS_JOIN_NOLOCK)
 
 It should be:
 
 if (trans->type & __TRANS_FREEZABLE)
 
 The same for
 
 do_async_commit()

Makes sense; I missed that we've grabbed a reference on trans.

Ma,

Sorry for the previous noise, but could you please try the following one 
instead?

-liubo

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8c81bdc..c094f08 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1453,7 +1453,7 @@ static void do_async_commit(struct work_struct
*work)
 * We've got freeze protection passed with the transaction.
 * Tell lockdep about it.
 */
-   if (ac->newtrans->type < TRANS_JOIN_NOLOCK)
+   if (ac->newtrans->type & __TRANS_FREEZABLE)
        rwsem_acquire_read(
             &ac->root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
             0, 1, _THIS_IP_);
@@ -1494,7 +1494,7 @@ int btrfs_commit_transaction_async(struct
btrfs_trans_handle *trans,
 * Tell lockdep we've released the freeze rwsem, since the
 * async commit thread will be the one to unlock it.
 */
-   if (trans->type < TRANS_JOIN_NOLOCK)
+   if (ac->newtrans->type & __TRANS_FREEZABLE)
        rwsem_release(
            &root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
            1, _THIS_IP_);
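
One plausible reading of why an ordering comparison misfires here, assuming the transaction type is a set of bit flags: a type can compare lower than the "join nolock" value and still not carry the freezable bit. The flag values and SK_ names below are illustrative assumptions for this sketch, not copied from transaction.h.

```c
#include <stdbool.h>

/* Illustrative flag layout (assumed for this sketch). */
#define SK_TRANS_FREEZABLE	(1U << 0)
#define SK_TRANS_START_BIT	(1U << 9)
#define SK_TRANS_ATTACH_BIT	(1U << 10)
#define SK_TRANS_JOIN_BIT	(1U << 11)
#define SK_TRANS_NOLOCK_BIT	(1U << 12)

#define SK_TRANS_START		(SK_TRANS_START_BIT | SK_TRANS_FREEZABLE)
#define SK_TRANS_ATTACH		(SK_TRANS_ATTACH_BIT)
#define SK_TRANS_JOIN		(SK_TRANS_JOIN_BIT | SK_TRANS_FREEZABLE)
#define SK_TRANS_JOIN_NOLOCK	(SK_TRANS_NOLOCK_BIT)

/* Old-style test: ordering comparison applied to a bitmask. */
static bool holds_freeze_old(unsigned int type)
{
	return type < SK_TRANS_JOIN_NOLOCK;
}

/* New-style test: ask directly whether the freezable bit is set. */
static bool holds_freeze_new(unsigned int type)
{
	return (type & SK_TRANS_FREEZABLE) != 0;
}
```

Under these assumed values the two tests agree for start, join and join-nolock handles, but the old test wrongly reports that an attach-style handle holds freeze protection. Releasing a lock that was never taken is the kind of imbalance lockdep complains about.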




[PATCH v2 0/3] Bootstrap the btrfs_find_item interface

2013-11-04 Thread Kelley Nielsen
There are many btrfs functions that manually search the tree for an
item. They all reimplement the same mechanism and differ in the
conditions that they use to find the item. Zach Brown proposed
creating a new core interface, btrfs_find_item, to take the place of
these functions and standardize the search functionality.

This patchset takes the first steps toward the implementation of this
core interface. Create a starting point for the interface by moving 
one of the search functions, __inode_item, to ctree.c and renaming 
it with the core function name. With the next two patches, eliminate 
one similar helper function each by replacing their callers with 
calls to the new core function, and modifying the new core
function to ensure it fulfills the purpose of the function being
replaced.

Kelley Nielsen (3):
  Bootstrap generic btrfs_find_item interface
  expand btrfs_find_item() to include find_root_ref functionality
  expand btrfs_find_item() to include find_orphan_item functionality

 fs/btrfs/backref.c   | 40 
 fs/btrfs/ctree.c | 43 +++
 fs/btrfs/ctree.h |  2 ++
 fs/btrfs/disk-io.c   |  3 ++-
 fs/btrfs/inode.c |  6 +++---
 fs/btrfs/orphan.c| 20 
 fs/btrfs/root-tree.c | 15 ---
 fs/btrfs/tree-log.c  |  3 ++-
 8 files changed, 56 insertions(+), 76 deletions(-)

-- 
1.8.1.2
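
To make the shared search pattern concrete, here is a user-space sketch of the lookup these helpers perform: find the first item at or after a target key, then check that its objectid and type still match. It uses a sorted array instead of a b-tree, and every sk_ name is hypothetical; it illustrates the idea, not the kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified key: items are ordered by (objectid, type, offset). */
struct sk_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

static int sk_key_cmp(const struct sk_key *a, const struct sk_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Find the first item at or after (objectid, type, offset).
 * Returns 0 and fills *found when that item has the same objectid and
 * type as the request, and 1 when no such item exists.
 */
static int sk_find_item(const struct sk_key *items, size_t n,
			uint64_t objectid, uint64_t offset, uint8_t type,
			struct sk_key *found)
{
	struct sk_key want = { objectid, type, offset };
	size_t lo = 0, hi = n;

	/* Binary search for the lower bound of 'want'. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (sk_key_cmp(&items[mid], &want) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == n)
		return 1;	/* ran off the end of the items */
	if (items[lo].objectid != objectid || items[lo].type != type)
		return 1;	/* next item belongs to something else */
	*found = items[lo];
	return 0;
}
```

The return convention (0 for found, 1 for not found) mirrors the one visible in the patches; the negative error returns of the real code have no counterpart in this array version.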



[PATCH v2 1/3] Bootstrap generic btrfs_find_item interface

2013-11-04 Thread Kelley Nielsen
There are many btrfs functions that manually search the tree for an
item. They all reimplement the same mechanism and differ in the
conditions that they use to find the item. __inode_info() is one such
example. Zach Brown proposed creating a new interface to take the place
of these functions.

This patch is the first step to creating the interface. A new function,
btrfs_find_item, has been added to ctree.c and prototyped in ctree.h.
It is identical to __inode_info, except that the order of the parameters
has been rearranged to more closely match those of similar functions elsewhere
in the code (now, root and path come first, then the objectid, offset
and type, and the key to be filled in last). __inode_info's callers have
been set to call this new function instead, and __inode_info itself has
been removed.

Signed-off-by: Kelley Nielsen kelley...@gmail.com
Suggested-by: Zach Brown z...@redhat.com
Reviewed-by: Josh Triplett j...@joshtriplett.org
---
Changes since v1:
* reworded commit message to use the imperative rather than passive 
voice, and to mention Zach Brown in the body as well as in 
the suggested-by line
* Changed parameter inum to iobjectid in btrfs_find_item()
* Added a space before __inode_info() in the commit message

 fs/btrfs/backref.c | 40 
 fs/btrfs/ctree.c   | 37 +
 fs/btrfs/ctree.h   |  2 ++
 3 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 3775947..595bd1f 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1107,38 +1107,6 @@ int btrfs_find_all_roots(struct btrfs_trans_handle 
*trans,
return 0;
 }
 
-
-static int __inode_info(u64 inum, u64 ioff, u8 key_type,
-   struct btrfs_root *fs_root, struct btrfs_path *path,
-   struct btrfs_key *found_key)
-{
-   int ret;
-   struct btrfs_key key;
-   struct extent_buffer *eb;
-
-   key.type = key_type;
-   key.objectid = inum;
-   key.offset = ioff;
-
-   ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
-   if (ret < 0)
-       return ret;
-
-   eb = path->nodes[0];
-   if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
-       ret = btrfs_next_leaf(fs_root, path);
-       if (ret)
-           return ret;
-       eb = path->nodes[0];
-   }
-
-   btrfs_item_key_to_cpu(eb, found_key, path->slots[0]);
-   if (found_key->type != key.type || found_key->objectid != key.objectid)
-       return 1;
-
-   return 0;
-}
-
 /*
  * this makes the path point to (inum INODE_ITEM ioff)
  */
@@ -1146,16 +1114,16 @@ int inode_item_info(u64 inum, u64 ioff, struct 
btrfs_root *fs_root,
struct btrfs_path *path)
 {
struct btrfs_key key;
-   return __inode_info(inum, ioff, BTRFS_INODE_ITEM_KEY, fs_root, path,
-               &key);
+   return btrfs_find_item(fs_root, path, inum, ioff,
+               BTRFS_INODE_ITEM_KEY, &key);
 }
 
 static int inode_ref_info(u64 inum, u64 ioff, struct btrfs_root *fs_root,
struct btrfs_path *path,
struct btrfs_key *found_key)
 {
-   return __inode_info(inum, ioff, BTRFS_INODE_REF_KEY, fs_root, path,
-   found_key);
+   return btrfs_find_item(fs_root, path, inum, ioff,
+   BTRFS_INODE_REF_KEY, found_key);
 }
 
 int btrfs_find_one_extref(struct btrfs_root *root, u64 inode_objectid,
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 316136b..5969473 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2462,6 +2462,43 @@ static int key_search(struct extent_buffer *b, struct 
btrfs_key *key,
return 0;
 }
 
+/* Proposed generic search function, meant to take the place of the
+* various small search helper functions throughout the code and standardize
+* the search interface. Right now, it only replaces the former __inode_info
+* in backref.c.
+*/
+int btrfs_find_item(struct btrfs_root *fs_root, struct btrfs_path *path,
+   u64 iobjectid, u64 ioff, u8 key_type,
+   struct btrfs_key *found_key)
+{
+   int ret;
+   struct btrfs_key key;
+   struct extent_buffer *eb;
+
+   key.type = key_type;
+   key.objectid = iobjectid;
+   key.offset = ioff;
+
+   ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
+   if (ret < 0)
+       return ret;
+
+   eb = path->nodes[0];
+   if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
+       ret = btrfs_next_leaf(fs_root, path);
+       if (ret)
+           return ret;
+       eb = path->nodes[0];
+   }
+
+   btrfs_item_key_to_cpu(eb, found_key, path->slots[0]);
+   if (found_key->type != key.type ||
+           found_key->objectid != key.objectid)
+   return 

[PATCH v2 2/3] expand btrfs_find_item() to include find_root_ref functionality

2013-11-04 Thread Kelley Nielsen
This patch is the second step in bootstrapping the btrfs_find_item
interface. The btrfs_find_root_ref() is similar to the former
__inode_info(); it accepts four of its parameters, and duplicates the
first half of its functionality.

Replace the one former call to btrfs_find_root_ref() with a call to
btrfs_find_item(), along with the defined key type that was used
internally by btrfs_find_root_ref(), and a null found key. In
btrfs_find_item(), add a test for the null key at the place where
the functionality of btrfs_find_root_ref() ends; btrfs_find_item()
then returns if the test passes. Finally, remove btrfs_find_root_ref().

Signed-off-by: Kelley Nielsen kelley...@gmail.com
Suggested-by: Zach Brown z...@redhat.com
Reviewed-by: Josh Triplett j...@joshtriplett.org
---
Changes since v1:
* reworded the commit message to use the imperative form instead 
of the passive voice, and to mention Zach Brown in the body

 fs/btrfs/ctree.c | 10 --
 fs/btrfs/inode.c |  6 +++---
 fs/btrfs/root-tree.c | 15 ---
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5969473..7d2f71c 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2465,7 +2465,13 @@ static int key_search(struct extent_buffer *b, struct 
btrfs_key *key,
 /* Proposed generic search function, meant to take the place of the
 * various small search helper functions throughout the code and standardize
 * the search interface. Right now, it only replaces the former __inode_info
-* in backref.c.
+* in backref.c, and the former btrfs_find_root_ref in root-tree.c.
+*
+* If a null key is passed, it returns immediately after running
+* btrfs_search_slot, leaving the path filled as it is and passing its
+* return value upward. If a real key is passed, it will set the caller's
+* path to point to the first item in the tree after its specified
+* objectid, type, and offset for which objectid and type match the input.
 */
 int btrfs_find_item(struct btrfs_root *fs_root, struct btrfs_path *path,
u64 iobjectid, u64 ioff, u8 key_type,
@@ -2480,7 +2486,7 @@ int btrfs_find_item(struct btrfs_root *fs_root, struct 
btrfs_path *path,
key.offset = ioff;
 
    ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
-   if (ret < 0)
+   if ((ret < 0) || (found_key == NULL))
        return ret;
 
    eb = path->nodes[0];
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f167ced..27ee49b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4672,9 +4672,9 @@ static int fixup_tree_root_location(struct btrfs_root 
*root,
}
 
err = -ENOENT;
-   ret = btrfs_find_root_ref(root->fs_info->tree_root, path,
-                 BTRFS_I(dir)->root->root_key.objectid,
-                 location->objectid);
+   ret = btrfs_find_item(root->fs_info->tree_root, path,
+               BTRFS_I(dir)->root->root_key.objectid,
+               location->objectid, BTRFS_ROOT_REF_KEY, NULL);
    if (ret) {
        if (ret < 0)
            err = ret;
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index ec71ea4..fcc10eb 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -400,21 +400,6 @@ out:
return err;
 }
 
-int btrfs_find_root_ref(struct btrfs_root *tree_root,
-  struct btrfs_path *path,
-  u64 root_id, u64 ref_id)
-{
-   struct btrfs_key key;
-   int ret;
-
-   key.objectid = root_id;
-   key.type = BTRFS_ROOT_REF_KEY;
-   key.offset = ref_id;
-
-   ret = btrfs_search_slot(NULL, tree_root, &key, path, 0, 0);
-   return ret;
-}
-
 /*
  * add a btrfs_root_ref item.  type is either BTRFS_ROOT_REF_KEY
  * or BTRFS_ROOT_BACKREF_KEY.
-- 
1.8.1.2



[PATCH v2 3/3] expand btrfs_find_item() to include find_orphan_item functionality

2013-11-04 Thread Kelley Nielsen
This is the third step in bootstrapping the btrfs_find_item interface.
The function find_orphan_item(), in orphan.c, is similar to the two
functions already replaced by the new interface. It uses two parameters,
which are already present in the interface, and is nearly identical to
the function brought in in the previous patch.

Replace the two calls to find_orphan_item() with calls to
btrfs_find_item(), with the defined objectid and type that was used
internally by find_orphan_item(), a null path, and a null key. Add a
test for a null path to btrfs_find_item, and if it passes, allocate and
free the path. Finally, remove find_orphan_item().

Signed-off-by: Kelley Nielsen kelley...@gmail.com
---
Changes since v1:
* Removed comment at the head of btrfs_find_item()
* Reworded commit message to use imperative, not passive voice

 fs/btrfs/ctree.c| 26 +-
 fs/btrfs/disk-io.c  |  3 ++-
 fs/btrfs/orphan.c   | 20 
 fs/btrfs/tree-log.c |  3 ++-
 4 files changed, 17 insertions(+), 35 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 7d2f71c..a528a14 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2462,32 +2462,32 @@ static int key_search(struct extent_buffer *b, struct 
btrfs_key *key,
return 0;
 }
 
-/* Proposed generic search function, meant to take the place of the
-* various small search helper functions throughout the code and standardize
-* the search interface. Right now, it only replaces the former __inode_info
-* in backref.c, and the former btrfs_find_root_ref in root-tree.c.
-*
-* If a null key is passed, it returns immediately after running
-* btrfs_search_slot, leaving the path filled as it is and passing its
-* return value upward. If a real key is passed, it will set the caller's
-* path to point to the first item in the tree after its specified
-* objectid, type, and offset for which objectid and type match the input.
-*/
-int btrfs_find_item(struct btrfs_root *fs_root, struct btrfs_path *path,
+int btrfs_find_item(struct btrfs_root *fs_root, struct btrfs_path *found_path,
u64 iobjectid, u64 ioff, u8 key_type,
struct btrfs_key *found_key)
 {
int ret;
struct btrfs_key key;
struct extent_buffer *eb;
+   struct btrfs_path *path;
 
key.type = key_type;
key.objectid = iobjectid;
key.offset = ioff;
 
+   if (found_path == NULL) {
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+   } else
+   path = found_path;
+
    ret = btrfs_search_slot(NULL, fs_root, &key, path, 0, 0);
-   if ((ret < 0) || (found_key == NULL))
+   if ((ret < 0) || (found_key == NULL)) {
+       if (path != found_path)
+           btrfs_free_path(path);
        return ret;
+   }
 
    eb = path->nodes[0];
    if (ret && path->slots[0] >= btrfs_header_nritems(eb)) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4c4ed0b..bce90c9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1616,7 +1616,8 @@ again:
if (ret)
goto fail;
 
-   ret = btrfs_find_orphan_item(fs_info->tree_root, location->objectid);
+   ret = btrfs_find_item(fs_info->tree_root, NULL, BTRFS_ORPHAN_OBJECTID,
+           location->objectid, BTRFS_ORPHAN_ITEM_KEY, NULL);
    if (ret < 0)
        goto fail;
    if (ret == 0)
diff --git a/fs/btrfs/orphan.c b/fs/btrfs/orphan.c
index 24cad16..65793ed 100644
--- a/fs/btrfs/orphan.c
+++ b/fs/btrfs/orphan.c
@@ -69,23 +69,3 @@ out:
btrfs_free_path(path);
return ret;
 }
-
-int btrfs_find_orphan_item(struct btrfs_root *root, u64 offset)
-{
-   struct btrfs_path *path;
-   struct btrfs_key key;
-   int ret;
-
-   key.objectid = BTRFS_ORPHAN_OBJECTID;
-   key.type = BTRFS_ORPHAN_ITEM_KEY;
-   key.offset = offset;
-
-   path = btrfs_alloc_path();
-   if (!path)
-   return -ENOMEM;
-
-   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
-
-   btrfs_free_path(path);
-   return ret;
-}
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index a2c7b04..9972e2a 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -1238,7 +1238,8 @@ static int insert_orphan_item(struct btrfs_trans_handle 
*trans,
  struct btrfs_root *root, u64 offset)
 {
int ret;
-   ret = btrfs_find_orphan_item(root, offset);
+   ret = btrfs_find_item(root, NULL, BTRFS_ORPHAN_OBJECTID,
+   offset, BTRFS_ORPHAN_ITEM_KEY, NULL);
    if (ret > 0)
        ret = btrfs_insert_orphan_item(trans, root, offset);
return ret;
-- 
1.8.1.2
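
The "null path" convention described in this patch is an instance of a common C pattern: use the caller's buffer if one is supplied, otherwise allocate a temporary one and free it on every exit path. Below is a self-contained sketch of that pattern. All sk_ names are hypothetical, and sk_search() is only a stand-in for the real tree search.

```c
#include <errno.h>
#include <stdlib.h>

struct sk_path {
	int slot;
};

static int sk_live_paths;	/* temporary paths currently allocated */

static struct sk_path *sk_alloc_path(void)
{
	struct sk_path *p = calloc(1, sizeof(*p));

	if (p)
		sk_live_paths++;
	return p;
}

static void sk_free_path(struct sk_path *p)
{
	if (p) {
		sk_live_paths--;
		free(p);
	}
}

/* Stand-in search: records the slot; 0 means found, 1 means not found. */
static int sk_search(struct sk_path *path, int want)
{
	path->slot = want;
	return want % 2;
}

/* Use caller_path when given; otherwise work on a temporary path. */
static int sk_lookup(struct sk_path *caller_path, int want)
{
	struct sk_path *path = caller_path;
	int ret;

	if (!path) {
		path = sk_alloc_path();
		if (!path)
			return -ENOMEM;
	}

	ret = sk_search(path, want);

	/* Free only what this function allocated. */
	if (path != caller_path)
		sk_free_path(path);
	return ret;
}
```

Funnelling every return through a single cleanup point keeps the temporary allocation from leaking, whichever branch the lookup takes.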



[PATCH v2] Btrfs: avoid heavy operations in btrfs_commit_super

2013-11-04 Thread Liu Bo
The 'git blame' history shows that the old transaction commit code had to
commit twice to ensure that roots were updated, and that we had to flush
metadata and the super block manually. Right now, however, all of this is
handled inside the transaction commit code without extra effort.

The error handling remains the same as in the current code: return to the
caller as soon as we get an error.

This saves us a transaction commit and a flush of super block, which are both
heavy operations according to ftrace output analysis.

Signed-off-by: Liu Bo bo.li@oracle.com
---
v2: remove @ret that is no longer used, thanks Stefan.

 fs/btrfs/disk-io.c | 21 +
 1 file changed, 1 insertion(+), 20 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62176ad..e161740 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3528,7 +3528,6 @@ int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info)
 int btrfs_commit_super(struct btrfs_root *root)
 {
struct btrfs_trans_handle *trans;
-   int ret;
 
    mutex_lock(&root->fs_info->cleaner_mutex);
    btrfs_run_delayed_iputs(root);
@@ -3542,25 +3541,7 @@ int btrfs_commit_super(struct btrfs_root *root)
trans = btrfs_join_transaction(root);
if (IS_ERR(trans))
return PTR_ERR(trans);
-   ret = btrfs_commit_transaction(trans, root);
-   if (ret)
-   return ret;
-   /* run commit again to drop the original snapshot */
-   trans = btrfs_join_transaction(root);
-   if (IS_ERR(trans))
-   return PTR_ERR(trans);
-   ret = btrfs_commit_transaction(trans, root);
-   if (ret)
-   return ret;
-   ret = btrfs_write_and_wait_transaction(NULL, root);
-   if (ret) {
-       btrfs_error(root->fs_info, ret,
-               "Failed to sync btree inode to disk.");
-       return ret;
-   }
-
-   ret = write_ctree_super(NULL, root, 0);
-   return ret;
+   return btrfs_commit_transaction(trans, root);
 }
 
 int close_ctree(struct btrfs_root *root)
-- 
1.8.2.1



Bootstrap the btrfs_find_item interface

2013-11-04 Thread Kelley Nielsen
Hi Josh, Zach, and team,

I went ahead and took the comment out, as can be seen in patch 3/3. I
had left it there because my understanding is that this is just a
proposal, and if the team likes the idea, btrfs_find_item() will be
further developed so that it replaces more existing functions, as well
as ones that have not yet been written. I can imagine, though, that if
the comment is not taken out, it could end up floating around in the
tree for no good reason. Is that correct?

Also, I fixed a rebase conflict before preparing this set, so a) I
hope I did it correctly, and b) assuming I did, I feel confident about
handling them in the future. They look exactly like merge conflicts.
I discarded the lines from HEAD (the lines on top) and kept what was 
in the patch (on the bottom).

Thanks,

Kelley (kelleynnn)


Re: csum failure messages

2013-11-04 Thread Hans-Kristian Bakke
As you were in the middle of a rebalance, these errors may actually be
caused by the serious bug addressed by the patch "Btrfs: relocate csums
properly with prealloc extents".

I hit that myself with several preallocated files made by rtorrent
during a rebalance, and I lost several huge files as a consequence. The
only way I could rebalance without large-scale corruption was to
manually patch the 3.11.6 kernel with the small patch that fixes the
issue.
For some reason this patch has not been pushed upstream yet. I find that
strange, as the bug leads to corruption and actual data loss and is 100%
reproducible with preallocated files. Only systemd logs are mentioned
in the bug reports, but in my case it actually hit several
terabytes of files created by rtorrent.

Best regards,

Hans-Kristian Bakke


On 5 November 2013 02:24, Russell Coker russ...@coker.com.au wrote:
 The below messages are from dmesg on a system where btrfs balance just
 aborted.  It's running kernel 3.11.6 (the latest Debian package).

 This seems to be telling me that Inode 388 is involved, but there are over 300
 subvols on that system which could contain such an Inode.

 I think that more information is needed for such log messages.  We need to at
 least be able to identify the subvol (is it possible to extract this from the
 numbers in the log messages?).  Ideally we would be able to identify the file
 name as well.


 [10751.637517] BTRFS info (device sda3): csum failed ino 388 off 23191552 csum
 2566472073 private 3193692311
 [10751.646390] BTRFS info (device sda3): csum failed ino 388 off 24104960 csum
 5219137 private 2264608335
 [10751.654472] BTRFS info (device sda3): csum failed ino 388 off 24154112 csum
 4084831521 private 1792217768
 [10751.731830] BTRFS info (device sda3): csum failed ino 388 off 23191552 csum
 2566472073 private 3193692311

 --
 My Main Blog         http://etbe.coker.com.au/
 My Documents Blog    http://doc.coker.com.au/


[PATCH] BTRFS: fixed coding style issues

2013-11-04 Thread Aldo Iljazi
Line 4989: Inserted a space after the comma.
Lines 7986 and 8274: Inserted a space before the open parenthesis.

Signed-off-by: Aldo Iljazi m...@aldo.io
---
 fs/btrfs/extent-tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d58bef1..3bcd7c0 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4986,7 +4986,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, 
u64 num_bytes)
    mutex_unlock(&BTRFS_I(inode)->delalloc_mutex);
 
    if (to_reserve)
-       trace_btrfs_space_reservation(root->fs_info,"delalloc",
+       trace_btrfs_space_reservation(root->fs_info, "delalloc",
                          btrfs_ino(inode), to_reserve, 1);
    block_rsv_add_bytes(block_rsv, to_reserve, 1);
 
@@ -7983,7 +7983,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct 
btrfs_space_info *sinfo)
 
    spin_lock(&sinfo->lock);
 
-   for(i = 0; i < BTRFS_NR_RAID_TYPES; i++)
+   for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
        if (!list_empty(&sinfo->block_groups[i]))
            free_bytes += __btrfs_get_ro_block_group_free_space(
                        &sinfo->block_groups[i]);
@@ -8271,7 +8271,7 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
 
release_global_block_rsv(info);
 
-   while(!list_empty(info-space_info)) {
+   while (!list_empty(info-space_info)) {
space_info = list_entry(info-space_info.next,
struct btrfs_space_info,
list);
-- 
1.8.3.2
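
The two rules this patch applies (a space after each comma, and a space between a control keyword and its opening parenthesis) can be checked mechanically. The following is only a rough sketch in C with hypothetical function names. It is not how scripts/checkpatch.pl works, and it does not skip string literals, character literals or comments, so it can report false positives.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if `line` contains a control keyword (if, for, while, switch)
 * immediately followed by '(' with no space in between, else 0.
 */
int keyword_missing_space(const char *line)
{
	static const char *const keywords[] = { "if", "for", "while", "switch" };
	size_t k;

	for (k = 0; k < sizeof(keywords) / sizeof(keywords[0]); k++) {
		size_t len = strlen(keywords[k]);
		const char *p = line;

		while ((p = strstr(p, keywords[k])) != NULL) {
			/* Skip matches inside identifiers such as "foo_if(". */
			int starts_word = (p == line) ||
				!(isalnum((unsigned char)p[-1]) || p[-1] == '_');

			if (starts_word && p[len] == '(')
				return 1;
			p += len;
		}
	}
	return 0;
}

/*
 * Returns 1 if any comma in `line` is followed by something other than
 * whitespace or the end of the line, else 0.
 */
int comma_missing_space(const char *line)
{
	const char *p;

	for (p = line; *p; p++) {
		if (*p == ',' && p[1] != '\0' && p[1] != ' ' &&
		    p[1] != '\t' && p[1] != '\n')
			return 1;
	}
	return 0;
}
```

Run against the removed lines of the diff above, both helpers would flag the old form and accept the new one.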



[PATCH] BTRFS: fixed two coding style issues

2013-11-04 Thread Aldo Iljazi
Lines 26 and 31: Replaced spaces with tabs at the start of the lines.

Signed-off-by: Aldo Iljazi <m...@aldo.io>
---
 fs/btrfs/struct-funcs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index b976597..09528ec 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -23,12 +23,12 @@
 
 static inline u8 get_unaligned_le8(const void *p)
 {
-   return *(u8 *)p;
+   return *(u8 *)p;
 }
 
 static inline void put_unaligned_le8(u8 val, void *p)
 {
-   *(u8 *)p = val;
+   *(u8 *)p = val;
 }
 
 /*
-- 
1.8.3.2



[PATCH 2/2] BTRFS: fixed a coding style issue

2013-11-04 Thread Aldo Iljazi
Line 265: Inserted a space before the open parenthesis.

Signed-off-by: Aldo Iljazi <m...@aldo.io>
---
 fs/btrfs/async-thread.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 08cc08f..8aec751 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -262,7 +262,7 @@ static struct btrfs_work *get_next_work(struct btrfs_worker_thread *worker,
struct btrfs_work *work = NULL;
struct list_head *cur = NULL;
 
-   if(!list_empty(prio_head))
+   if (!list_empty(prio_head))
cur = prio_head->next;
 
smp_mb();
-- 
1.8.3.2
