[PATCH] Btrfs: forced readonly when btrfs_drop_snapshot() fails

2011-08-09 Thread Tsutomu Itoh
When btrfs_drop_snapshot() detects an error, the filesystem is now forced
read-only instead of the error being returned to the caller.
Because no caller checks the return value, the function's return type is
changed to 'void'.

Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com
---
 fs/btrfs/ctree.h   |4 ++--
 fs/btrfs/extent-tree.c |   22 ++
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a6263bd..8842936 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2367,8 +2367,8 @@ static inline int btrfs_insert_empty_item(struct 
btrfs_trans_handle *trans,
 int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path);
 int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path);
 int btrfs_leaf_free_space(struct btrfs_root *root, struct extent_buffer *leaf);
-int btrfs_drop_snapshot(struct btrfs_root *root,
-   struct btrfs_block_rsv *block_rsv, int update_ref);
+void btrfs_drop_snapshot(struct btrfs_root *root,
+struct btrfs_block_rsv *block_rsv, int update_ref);
 int btrfs_drop_subtree(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct extent_buffer *node,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 66bac22..7f2aec6 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6269,8 +6269,8 @@ static noinline int walk_up_tree(struct 
btrfs_trans_handle *trans,
  * also make sure backrefs for the shared block and all lower level
  * blocks are properly updated.
  */
-int btrfs_drop_snapshot(struct btrfs_root *root,
-   struct btrfs_block_rsv *block_rsv, int update_ref)
+void btrfs_drop_snapshot(struct btrfs_root *root,
+struct btrfs_block_rsv *block_rsv, int update_ref)
 {
struct btrfs_path *path;
struct btrfs_trans_handle *trans;
@@ -6283,13 +6283,16 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
int level;
 
path = btrfs_alloc_path();
-   if (!path)
-   return -ENOMEM;
+   if (!path) {
+   err = -ENOMEM;
+   goto out;
+   }
 
wc = kzalloc(sizeof(*wc), GFP_NOFS);
if (!wc) {
btrfs_free_path(path);
-   return -ENOMEM;
+   err = -ENOMEM;
+   goto out;
}
 
trans = btrfs_start_transaction(tree_root, 0);
@@ -6318,7 +6321,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
path->lowest_level = 0;
if (ret < 0) {
err = ret;
-   goto out;
+   goto out_free;
}
WARN_ON(ret > 0);
 
@@ -6425,11 +6428,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
free_extent_buffer(root->commit_root);
kfree(root);
}
-out:
+out_free:
btrfs_end_transaction_throttle(trans, tree_root);
kfree(wc);
btrfs_free_path(path);
-   return err;
+out:
+   if (err)
+   btrfs_std_error(root->fs_info, err);
+   return;
 }
 
 /*
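
The control flow this patch introduces can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: `struct fs_state`, `report_fs_error()`, `drop_snapshot_sketch()` and the `fail_*` parameters are hypothetical names invented for the sketch, and `report_fs_error()` merely stands in for the role btrfs_std_error() plays above (assumed here to mean "record the error and go read-only").

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for filesystem state; not a btrfs structure. */
struct fs_state {
	bool read_only;   /* set once an error has been reported */
	int last_error;   /* last negative errno that was reported */
};

/* Hypothetical error hook: record the error and force read-only. */
static void report_fs_error(struct fs_state *fs, int err)
{
	fs->last_error = err;
	fs->read_only = true;
}

/*
 * Two-label cleanup: early failures jump straight to "out", later
 * failures jump to "out_free" and then fall through to "out".
 * Errors are reported in one place instead of being returned.
 */
static void drop_snapshot_sketch(struct fs_state *fs, bool fail_alloc,
				 bool fail_search)
{
	int err = 0;
	char *path;
	char *wc;

	path = fail_alloc ? NULL : malloc(64);
	if (!path) {
		err = -ENOMEM;
		goto out;
	}

	wc = calloc(1, 64);
	if (!wc) {
		free(path);
		err = -ENOMEM;
		goto out;
	}

	if (fail_search) {
		err = -EIO;
		goto out_free;
	}

	/* ... the real work would happen here ... */

out_free:
	free(wc);
	free(path);
out:
	if (err)
		report_fs_error(fs, err);
}
```

Calling `drop_snapshot_sketch()` with either failure flag set leaves `read_only` true and `last_error` holding the negative errno; a clean run leaves both untouched. Keeping the report under the final label means every exit path, including the allocation failures, goes through the same check.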


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: BTRFS partition won't mount

2011-08-09 Thread Adam Newby
I have to say this solution saved me. I've lost a BTRFS partition before and 
this tool was not available at that time. Is this going to be making it into a 
kernel pull anytime soon? The only reason I mention it is because pretty much 
everyone I know has soured on BTRFS due to losing a partition at some point or 
another due to frivolous error. This seems like a fine addition to the 
filesystem.

-Original Message-
From: C Anthony Risinger [mailto:anth...@xtfx.me] 
Sent: Monday, August 08, 2011 11:54 PM
To: Hugo Mills; Adam Newby; linux-btrfs@vger.kernel.org
Subject: Re: BTRFS partition won't mount

On Wed, Aug 3, 2011 at 3:50 PM, Hugo Mills h...@carfax.org.uk wrote:

   Try the instructions on the wiki at [1]. (And please feed back 
 and/or fix any issues you have with the instructions -- they're still 
 quite new and probably have awkward corners).

 [1] 
 https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2C_and_I_get_a_kernel_oops.21

this worked perfectly for me ... just saved my night from tedious restoration 
:-)

im on kernel 3.0.1 -- hard poweroff led to that problem.  i haven't had any 
issues for some time ... im not sure what the problem was exactly, but 
sometimes systemd gets a little twacky and takes a year to shutdown ... guess i 
got a little impatient :-)

anyways, thanks for the integration work!

-- 

C Anthony

Re: BTRFS partition won't mount

2011-08-09 Thread Hugo Mills
On Tue, Aug 09, 2011 at 08:29:06AM -0400, Adam Newby wrote:
 I have to say this solution saved me. I've lost a BTRFS partition
 before and this tool was not available at that time. Is this going
 to be making it into a kernel pull anytime soon? The only reason I
 mention it is because pretty much everyone I know has soured on
 BTRFS due to losing a partition at some point or another due to
 frivolous error. This seems like a fine addition to the filesystem.

   My understanding is that the btrfs-zero-log tool is something of a
blunt object, and shouldn't be applied automatically, so I think it's
unlikely to go into the kernel in its current form.

   From what Chris implied on IRC a few days ago, if we encounter
corrupt items in the log, we should drop only those items (and their
dependents?), not the entire log tree. I'd guess that as people work
on reducing the number of BUG_ONs in the code and passing the errors
back up the stack, this will become easier to do.

   Until then, we do have btrfs-zero-log. (And the new recovery tool
coming shortly, too).

   Hugo.

 -Original Message-
 From: C Anthony Risinger [mailto:anth...@xtfx.me] 
 Sent: Monday, August 08, 2011 11:54 PM
 To: Hugo Mills; Adam Newby; linux-btrfs@vger.kernel.org
 Subject: Re: BTRFS partition won't mount
 
 On Wed, Aug 3, 2011 at 3:50 PM, Hugo Mills h...@carfax.org.uk wrote:
 
    Try the instructions on the wiki at [1]. (And please feed back 
  and/or fix any issues you have with the instructions -- they're still 
  quite new and probably have awkward corners).
 
  [1] 
  https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2C_and_I_get_a_kernel_oops.21
 
 this worked perfectly for me ... just saved my night from tedious restoration 
 :-)
 
 im on kernel 3.0.1 -- hard poweroff led to that problem.  i haven't had any 
 issues for some time ... im not sure what the problem was exactly, but 
 sometimes systemd gets a little twacky and takes a year to shutdown ... guess 
 i got a little impatient :-)
 
 anyways, thanks for the integration work!
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- 2 + 2 = 5,  for sufficiently large values of 2. --- 




Re: Btrfs slowdown

2011-08-09 Thread Christian Brunner
Hi Sage,

I did some testing with btrfs-unstable yesterday. With the recent
commit from Chris it looks quite good:

Btrfs: force unplugs when switching from high to regular priority bios


However I can't test it extensively, because our main environment is
on ext4 at the moment.

Regards
Christian

2011/8/8 Sage Weil s...@newdream.net:
 Hi Christian,

 Are you still seeing this slowness?

 sage


 On Wed, 27 Jul 2011, Christian Brunner wrote:
 2011/7/25 Chris Mason chris.ma...@oracle.com:
  Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
  Hi,
 
  we are running a ceph cluster with btrfs as its base filesystem
  (kernel 3.0). At the beginning everything worked very well, but after
  a few days (2-3) things are getting very slow.
 
  When I look at the object store servers I see heavy disk-i/o on the
  btrfs filesystems (disk utilization is between 60% and 100%). I also
  did some tracing on the Ceph-Object-Store-Daemon, but I'm quite
  certain that the majority of the disk I/O is not caused by ceph or
  any other userland process.
 
  When I reboot the system(s) the problems go away for another 2-3 days,
  but after that, it starts again. I'm not sure if the problem is
  related to the kernel warning I've reported last week. At least there
  is no temporal relationship between the warning and the slowdown.
 
  Any hints on how to trace this would be welcome.
 
  The easiest way to trace this is with latencytop.
 
  Apply this patch:
 
  http://oss.oracle.com/~mason/latencytop.patch
 
  And then use latencytop -c for a few minutes while the system is slow.
  Send the output here and hopefully we'll be able to figure it out.

 I've now installed latencytop. Attached are two output files: The
 first is from yesterday and was created approximately half an hour after
 the boot. The second one is from today, uptime is 19h. The load on the
 system is already rising. Disk utilization is approximately at 50%.

 Thanks for your help.

 Christian



Kernel bug at fs/bio.c:1499 during heavy I/O - mdadm linear used

2011-08-09 Thread Sebastian 'gonX' Jensen
Hi all,

I am having issues with my server, as I am getting kernel bugs a
few minutes after starting rtorrent, while it's verifying hashes on
the files.

Unfortunately I can't figure out if this is a mdadm issue, a btrfs
issue or a hardware issue, but it seems to be fairly consistent. The
process is on btrfs-submit, so I figure it's a btrfs problem. Here's
the stacktrace in question:

[ 2109.468208] ------------[ cut here ]------------
[ 2109.468241] kernel BUG at fs/bio.c:1499!
[ 2109.468263] invalid opcode:  [#1] PREEMPT SMP
[ 2109.468300] CPU 3
[ 2109.468312] Modules linked in: iptable_filter iptable_mangle
iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
ip_tables x_tables appletalk ipx p8022 psnap llc p8023 ipv6 ext2
mbcache loop usbhid hid usb_storage uas lm63 radeon ttm i2c_i801
drm_kms_helper uhci_hcd r8169 evdev drm mii i7core_edac serio_raw
pcspkr edac_core ehci_hcd i2c_algo_bit mxm_wmi i2c_core thermal
processor wmi iTCO_wdt usbcore fan button iTCO_vendor_support btrfs
zlib_deflate crc32c libcrc32c linear md_mod sg sd_mod ahci libahci
libata scsi_mod
[ 2109.468711]
[ 2109.468722] Pid: 265, comm: btrfs-submit-0 Not tainted 3.0-ARCH #1
OEM OEM/121-BL-E756
[ 2109.468773] RIP: 0010:[811896ab]  [811896ab]
bio_split+0x2eb/0x2f0
[ 2109.468823] RSP: 0018:880156587ae0  EFLAGS: 00010206
[ 2109.468852] RAX: 8801565e RBX: 88005795e240 RCX: 025072c3
[ 2109.468891] RDX: 880015ae RSI: 000153a0 RDI: 810fa155
[ 2109.468930] RBP: 880156587b30 R08: e8dd8300 R09: 
[ 2109.468969] R10: 00f8 R11:  R12: 880015ae
[ 2109.469008] R13: e8dd8300 R14: 0001 R15: e8dd8300
[ 2109.469047] FS:  () GS:88015fcc()
knlGS:
[ 2109.469091] CS:  0010 DS:  ES:  CR0: 8005003b
[ 2109.469122] CR2: 7ff2784994aa CR3: 01693000 CR4: 06e0
[ 2109.469161] DR0:  DR1:  DR2: 
[ 2109.469200] DR3:  DR6: 0ff0 DR7: 0400
[ 2109.469239] Process btrfs-submit-0 (pid: 265, threadinfo
880156586000, task 8801573ea3f0)
[ 2109.469286] Stack:
[ 2109.469298]  0001  8801565e
00e0
[ 2109.469344]   88005795e240 880158360800
e8dd8300
[ 2109.469389]  0001 e8dd8220 880156587ba0
a00a2687
[ 2109.469435] Call Trace:
[ 2109.469453]  [a00a2687] linear_make_request+0x117/0x1c0 [linear]
[ 2109.469493]  [812118d6] ? throtl_find_tg+0x46/0x60
[ 2109.469525]  [812121bb] ? blk_throtl_bio+0x1fb/0x620
[ 2109.469560]  [a0085c52] md_make_request+0x102/0x250 [md_mod]
[ 2109.469597]  [81201c6e] generic_make_request+0x30e/0x5c0
[ 2109.469633]  [813efe9e] ? schedule+0x34e/0x9f0
[ 2109.469663]  [81201fa7] submit_bio+0x87/0x110
[ 2109.469693]  [8106cc18] ? lock_timer_base.isra.30+0x38/0x70
[ 2109.469739]  [a0108d53] run_scheduled_bios+0x253/0x510 [btrfs]
[ 2109.469782]  [a0109025] pending_bios_fn+0x15/0x20 [btrfs]
[ 2109.469822]  [a010fb45] worker_loop+0x165/0x520 [btrfs]
[ 2109.469862]  [a010f9e0] ? btrfs_queue_worker+0x2f0/0x2f0 [btrfs]
[ 2109.469901]  [8107ed2c] kthread+0x8c/0xa0
[ 2109.469930]  [813f4ea4] kernel_thread_helper+0x4/0x10
[ 2109.469964]  [8107eca0] ? kthread_worker_fn+0x190/0x190
[ 2109.469998]  [813f4ea0] ? gs_change+0x13/0x13
[ 2109.470026] Code: 65 48 8b 04 25 48 cd 00 00 83 a8 44 e0 ff ff 01
48 8b 80 38 e0 ff ff a8 08 0f 84 7c fd ff ff e8 2c 71 26 00 e9 72 fd
ff ff 0f 0b 0f 0b 0f 1f 00 55 48 89 e5 41 54 53 66 66 66 66 90 8b 56
3c 48
[ 2109.470323] RIP  [811896ab] bio_split+0x2eb/0x2f0
[ 2109.470356]  RSP 880156587ae0
[ 2109.486493] ---[ end trace 2077ba124373992b ]---

If this isn't a btrfs problem please forgive me and let me know :)

Also, PLEASE CC me if you're responding to this! I am not subscribed
to the mailing list.

Regards,
-- 
Sebastian J.


Re: “bio too big” regression and silent data corruption in 3.0

2011-08-09 Thread Josef Bacik
On 08/08/2011 10:53 PM, Alexandre Oliva wrote:
 On Aug  7, 2011, Alexandre Oliva ol...@lsd.ic.unicamp.br wrote:
 
 2. Removing a partition from the filesystem (say, the external disk)
 didn't relocate “single” block groups as such to other disks, as
 expected.
 
 /me reads some code and resets expectations about RAID0 in btrfs ;-)
 
 update_block_group_flags is what does this.  It doesn't care what was
 chosen when the filesystem was created, it just forces RAID0 if more
 than 1 disk remains:
 
   /* turn single device chunks into raid0 */
   return stripped | BTRFS_BLOCK_GROUP_RAID0;
 
 Is this really intended?  Given my current understanding that RAID0
 doesn't mean striping over all disks, but only over two disks, I guess I
 might even be interested in it, but...  I still think the user's choice
 should be honored, but I don't see where the choice is stored (if it is
 at all).

Well -m single -d single means that we only have one disk and we don't
want duplication (usually one just does -m single since metadata is the
only thing duplicated by default).  But if you add more disks we want to
do RAID0 as we should be striping across all the devices in the fs.
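
The behaviour under discussion can be sketched as a small flag transformation. This is an illustration only, not the kernel's update_block_group_flags(): the `SKETCH_*` bit values and the function name `sketch_update_flags()` are assumptions made for the example, and only the case from this thread is modelled (a chunk without a RAID profile on a filesystem with more than one device).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits for this sketch only. */
#define SKETCH_GROUP_DATA   (1ULL << 0)
#define SKETCH_GROUP_RAID0  (1ULL << 3)
#define SKETCH_GROUP_RAID1  (1ULL << 4)
#define SKETCH_GROUP_RAID10 (1ULL << 6)

/* Hypothetical helper: choose the profile for newly written chunks. */
static uint64_t sketch_update_flags(uint64_t flags, unsigned int num_devices)
{
	const uint64_t raid = SKETCH_GROUP_RAID0 | SKETCH_GROUP_RAID1 |
			      SKETCH_GROUP_RAID10;
	uint64_t stripped;

	/* Only the multi-device case from the thread is modelled. */
	if (num_devices <= 1)
		return flags;

	/* A chunk that already carries a RAID profile keeps it. */
	if (flags & raid)
		return flags;

	/* Keep the type bits, drop any profile bits. */
	stripped = flags & ~raid;

	/* turn single device chunks into raid0 */
	return stripped | SKETCH_GROUP_RAID0;
}
```

With two or more devices, a plain data chunk comes back with the RAID0 bit set regardless of what was requested at mkfs time, which is the surprise described in the quoted message; nothing in the flags remembers that "single" was chosen.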

 
 
 I wonder, why can't btrfs mark at least mounted partitions as busy, in
 much the same way that swap, md and various filesystems do, to avoid
 such accidental reuses?
 
 Heh.  And *unmark* them when they're removed, too...  As in, it won't
 let me create a new filesystem in a partition that was just removed from
 a filesystem, if that was the partition listed in /etc/mtab.
 

Yeah, our "what is busy" logic should be a little smarter.  Thanks,

Josef


Re: “bio too big” regression and silent data corruption in 3.0

2011-08-09 Thread Josef Bacik
On 08/08/2011 06:39 PM, Alexandre Oliva wrote:
 On Aug  7, 2011, Alexandre Oliva ol...@lsd.ic.unicamp.br wrote:
 
 tl;dr version: 3.0 produces “bio too big” dmesg entries and silently
 corrupts data in “meta-raid1/data-single” configurations on disks with
 different max_hw_sectors, where 2.6.38 worked fine.
 
 FWIW, I just got the same problem with 2.6.38.  No idea how I hadn't hit
 it before, but it's not a 3.0 regression, just a regular (but IMHO very
 serious) bug.
 

This is worrisome. I will try to find a USB disk with a small
sectorsize and see if I can reproduce.  Thanks,

Josef


Re: [PATCH] Btrfs: kill the durable block rsv stuff

2011-08-09 Thread Mitch Harder
On Mon, Aug 8, 2011 at 1:21 PM, Josef Bacik jo...@redhat.com wrote:
 This is confusing code and isn't used by anything anymore, so delete it.

 Signed-off-by: Josef Bacik jo...@redhat.com
 ---
  fs/btrfs/ctree.h       |   11 -
  fs/btrfs/disk-io.c     |    2 -
  fs/btrfs/extent-tree.c |  100 ---
  fs/btrfs/inode.c       |    4 --
  fs/btrfs/relocation.c  |    1 -
  5 files changed, 17 insertions(+), 101 deletions(-)

 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 9a2db9e..6071dab 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -772,13 +772,10 @@ struct btrfs_space_info {
  struct btrfs_block_rsv {
        u64 size;
        u64 reserved;
 -       u64 freed[2];
        struct btrfs_space_info *space_info;
 -       struct list_head list;
        spinlock_t lock;
        atomic_t usage;
        unsigned int priority:8;
 -       unsigned int durable:1;
        unsigned int refill_used:1;
        unsigned int full:1;
  };
 @@ -840,7 +837,6 @@ struct btrfs_block_group_cache {
        spinlock_t lock;
        u64 pinned;
        u64 reserved;
 -       u64 reserved_pinned;
        u64 bytes_super;
        u64 flags;
        u64 sectorsize;
 @@ -919,11 +915,6 @@ struct btrfs_fs_info {

        struct btrfs_block_rsv empty_block_rsv;

 -       /* list of block reservations that cross multiple transactions */
 -       struct list_head durable_block_rsv_list;
 -
 -       struct mutex durable_block_rsv_mutex;
 -
        u64 generation;
        u64 last_trans_committed;

 @@ -2240,8 +2231,6 @@ void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv);
  struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root);
  void btrfs_free_block_rsv(struct btrfs_root *root,
                          struct btrfs_block_rsv *rsv);
 -void btrfs_add_durable_block_rsv(struct btrfs_fs_info *fs_info,
 -                                struct btrfs_block_rsv *rsv);
  int btrfs_block_rsv_add(struct btrfs_trans_handle *trans,
                        struct btrfs_root *root,
                        struct btrfs_block_rsv *block_rsv,
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 07b3ac6..0b5643a 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -1665,8 +1665,6 @@ struct btrfs_root *open_ctree(struct super_block *sb,
        btrfs_init_block_rsv(&fs_info->trans_block_rsv);
        btrfs_init_block_rsv(&fs_info->chunk_block_rsv);
        btrfs_init_block_rsv(&fs_info->empty_block_rsv);
 -       INIT_LIST_HEAD(&fs_info->durable_block_rsv_list);
 -       mutex_init(&fs_info->durable_block_rsv_mutex);
        atomic_set(&fs_info->nr_async_submits, 0);
        atomic_set(&fs_info->async_delalloc_pages, 0);
        atomic_set(&fs_info->async_submit_draining, 0);
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index d30e0b4..fc2686c 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -121,7 +121,6 @@ void btrfs_put_block_group(struct btrfs_block_group_cache 
 *cache)
        if (atomic_dec_and_test(&cache->count)) {
                WARN_ON(cache->pinned > 0);
                WARN_ON(cache->reserved > 0);
 -               WARN_ON(cache->reserved_pinned > 0);
                kfree(cache->free_space_ctl);
                kfree(cache);
        }
 @@ -3662,7 +3661,6 @@ void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv)
        spin_lock_init(&rsv->lock);
        atomic_set(&rsv->usage, 1);
        rsv->priority = 6;
 -       INIT_LIST_HEAD(&rsv->list);
  }

  struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root)
 @@ -3685,25 +3683,10 @@ void btrfs_free_block_rsv(struct btrfs_root *root,
  {
        if (rsv && atomic_dec_and_test(&rsv->usage)) {
                btrfs_block_rsv_release(root, rsv, (u64)-1);
 -               if (!rsv->durable)
 -                       kfree(rsv);
 +               kfree(rsv);
        }
  }

 -/*
 - * make the block_rsv struct be able to capture freed space.
 - * the captured space will re-add to the the block_rsv struct
 - * after transaction commit
 - */
 -void btrfs_add_durable_block_rsv(struct btrfs_fs_info *fs_info,
 -                                struct btrfs_block_rsv *block_rsv)
 -{
 -       block_rsv->durable = 1;
 -       mutex_lock(&fs_info->durable_block_rsv_mutex);
 -       list_add_tail(&block_rsv->list, &fs_info->durable_block_rsv_list);
 -       mutex_unlock(&fs_info->durable_block_rsv_mutex);
 -}
 -
  int btrfs_block_rsv_add(struct btrfs_trans_handle *trans,
                        struct btrfs_root *root,
                        struct btrfs_block_rsv *block_rsv,
 @@ -3745,9 +3728,7 @@ int btrfs_block_rsv_check(struct btrfs_trans_handle 
 *trans,
                ret = 0;
        } else {
                num_bytes -= block_rsv->reserved;
 -               if (block_rsv->durable &&
 -                   block_rsv->freed[0] + block_rsv->freed[1] >= num_bytes)
 -                       commit_trans = 1;
 +               commit_trans = 1;
        }
        spin_unlock(&block_rsv->lock);
        if (!ret)
 @@ 

Re: [PATCH 02/12 v5] Btrfs: introduce sub transaction stuff

2011-08-09 Thread Mitch Harder
On Sat, Aug 6, 2011 at 4:37 AM, Liu Bo liubo2...@cn.fujitsu.com wrote:
 Introduce a new concept sub transaction,
 the relation between transaction and sub transaction is

 transaction A       --- transid = x
   sub trans a(1)   --- sub_transid = x+1
   sub trans a(2)   --- sub_transid = x+2
     ... ...
   sub trans a(n-1) --- sub_transid = x+n-1
   sub trans a(n)   --- sub_transid = x+n
 transaction B       --- transid = x+n+1
     ... ...

 The most important points are:
 a) a trans handle's transid now gets its value from the sub transid instead
    of the transid.
 b) when a transaction commits, the transid may not be incremented by 1; it
    depends on the biggest sub_transid of the previous transaction,
    i.e.
         B->transid = a(n)->transid + 1,
         (B->transid - A->transid) >= 1
 c) we start a new sub transaction after an fsync.

 We also change some 'trans->transid' uses to 'trans->transaction->transid' to
 ensure btrfs works well and to get rid of WARNings.

 These are used for the new log code.
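
The numbering scheme described above can be modelled in a few lines. This is a user-space sketch, not code from the patch: `struct sketch_transaction` and the `sketch_*` functions are hypothetical names, and the starting state (sub_transid equal to transid before the first fsync) is an assumption chosen so that a(1) gets x+1 as in the diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one running transaction; not a btrfs structure. */
struct sketch_transaction {
	uint64_t transid;      /* id assigned when the transaction started */
	uint64_t sub_transid;  /* id of the newest sub transaction */
};

/* Start a transaction with id x; no sub transaction exists yet (assumed). */
static struct sketch_transaction sketch_start(uint64_t x)
{
	struct sketch_transaction t = { x, x };
	return t;
}

/* Point (c): each fsync starts a new sub transaction with the next id. */
static uint64_t sketch_fsync(struct sketch_transaction *t)
{
	return ++t->sub_transid;
}

/* Point (b): the next transaction starts one past the last sub transaction. */
static struct sketch_transaction sketch_commit(const struct sketch_transaction *t)
{
	return sketch_start(t->sub_transid + 1);
}
```

Starting at x = 100 and calling `sketch_fsync()` three times yields sub transids 101, 102 and 103, and `sketch_commit()` then returns a transaction whose transid is 104. With no fsync at all the next transid is x + 1, so the difference between consecutive transids is always at least 1.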

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/ctree.c       |   35 ++-
  fs/btrfs/ctree.h       |    1 +
  fs/btrfs/disk-io.c     |    7 ---
  fs/btrfs/extent-tree.c |   10 ++
  fs/btrfs/inode.c       |    4 ++--
  fs/btrfs/ioctl.c       |    2 +-
  fs/btrfs/relocation.c  |    6 +++---
  fs/btrfs/transaction.c |   13 -
  fs/btrfs/transaction.h |    1 +
  fs/btrfs/tree-defrag.c |    2 +-
  fs/btrfs/tree-log.c    |   16 ++--
  11 files changed, 59 insertions(+), 38 deletions(-)

 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index 011cab3..41d1d17 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -228,9 +228,9 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
        int level;
        struct btrfs_disk_key disk_key;

 -       WARN_ON(root-ref_cows  trans-transid !=
 +       WARN_ON(root-ref_cows  trans-transaction-transid !=
                root-fs_info-running_transaction-transid);
 -       WARN_ON(root-ref_cows  trans-transid != root-last_trans);
 +       WARN_ON(root-ref_cows  trans-transid  root-last_trans);

        level = btrfs_header_level(buf);
        if (level == 0)
 @@ -425,9 +425,9 @@ static noinline int __btrfs_cow_block(struct 
 btrfs_trans_handle *trans,

        btrfs_assert_tree_locked(buf);

 -       WARN_ON(root-ref_cows  trans-transid !=
 +       WARN_ON(root-ref_cows  trans-transaction-transid !=
                root-fs_info-running_transaction-transid);
 -       WARN_ON(root-ref_cows  trans-transid != root-last_trans);
 +       WARN_ON(root-ref_cows  trans-transid  root-last_trans);

        level = btrfs_header_level(buf);

 @@ -493,7 +493,8 @@ static noinline int __btrfs_cow_block(struct 
 btrfs_trans_handle *trans,
                else
                        parent_start = 0;

 -               WARN_ON(trans-transid != btrfs_header_generation(parent));
 +               WARN_ON(btrfs_header_generation(parent) 
 +                                               trans-transaction-transid);
                btrfs_set_node_blockptr(parent, parent_slot,
                                        cow-start);
                btrfs_set_node_ptr_generation(parent, parent_slot,
 @@ -514,7 +515,7 @@ static inline int should_cow_block(struct 
 btrfs_trans_handle *trans,
                                   struct btrfs_root *root,
                                   struct extent_buffer *buf)
  {
 -       if (btrfs_header_generation(buf) == trans-transid 
 +       if (btrfs_header_generation(buf) = trans-transaction-transid 
            !btrfs_header_flag(buf, BTRFS_HEADER_FLAG_WRITTEN) 
            !(root-root_key.objectid != BTRFS_TREE_RELOC_OBJECTID 
              btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC)))
 @@ -542,7 +543,7 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle 
 *trans,
                       root-fs_info-running_transaction-transid);
                WARN_ON(1);
        }
 -       if (trans-transid != root-fs_info-generation) {
 +       if (trans-transaction-transid != root-fs_info-generation) {
                printk(KERN_CRIT trans %llu running %llu\n,
                       (unsigned long long)trans-transid,
                       (unsigned long long)root-fs_info-generation);
 @@ -645,7 +646,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,

        if (trans-transaction != root-fs_info-running_transaction)
                WARN_ON(1);
 -       if (trans-transid != root-fs_info-generation)
 +       if (trans-transaction-transid != root-fs_info-generation)
                WARN_ON(1);

        parent_nritems = btrfs_header_nritems(parent);
 @@ -898,7 +899,7 @@ static noinline int balance_level(struct 
 btrfs_trans_handle *trans,

        WARN_ON(path-locks[level] != BTRFS_WRITE_LOCK 
                path-locks[level] != BTRFS_WRITE_LOCK_BLOCKING);
 -       WARN_ON(btrfs_header_generation(mid) != trans-transid);
 +       WARN_ON(btrfs_header_generation(mid)  

[PATCH] Btrfs: reserve sufficient space for ioctl clone

2011-08-09 Thread Sage Weil
Fix a crash/BUG_ON in the clone ioctl due to insufficient reservation. We
need to reserve space for:

 - adjusting the old extent (possibly splitting it)
 - adding the new extent
 - updating the inode

Signed-off-by: Sage Weil s...@newdream.net
---
 fs/btrfs/ioctl.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a3c4751..f038d4a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2320,7 +2320,12 @@ static noinline long btrfs_ioctl_clone(struct file 
*file, unsigned long srcfd,
else
new_key.offset = destoff;
 
-   trans = btrfs_start_transaction(root, 1);
+   /*
+* 1 - adjusting old extent (we may have to split it)
+* 1 - add new extent
+* 1 - inode update
+*/
+   trans = btrfs_start_transaction(root, 3);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
goto out;
-- 
1.7.0



Re: [RFC, crash][PATCH] btrfs: allow cross-subvolume file clone

2011-08-09 Thread David Sterba
On Thu, Aug 04, 2011 at 09:19:26AM +0800, Miao Xie wrote:
  the patch has been applied on top of current linus which contains patches 
  from
  both pull requests (ed8f37370d83).
 
 I think it is because the caller didn't reserve enough space. Could you try to
 apply the following patch? It might fix this bug.
 
 [PATCH v2] Btrfs: reserve enough space for file clone
 http://marc.info/?l=linux-btrfs&m=131192686626576&w=2

Thanks! Yes, it does not crash anymore. Trees reflinked successfully,
md5sums verified.


david

 
 Thanks
 Miao
 
  
  The filesystem consists of 5 devices 23G each, about 100G of usable space,
  mkfs.btrfs with defaults. The kernel tree has about 6G:
  
  $ btrfs fi df .
  Data, RAID0: total=10.00GB, used=5.55GB
  Data: total=8.00MB, used=0.00
  System, RAID1: total=8.00MB, used=4.00KB
  System: total=4.00MB, used=0.00
  Metadata, RAID1: total=1.50GB, used=121.75MB
  Metadata: total=8.00MB, used=0.00
  
  $ df -h .
  Filesystem      Size  Used Avail Use% Mounted on
  /dev/sda5       110G  5.8G   82G   7% /mnt/sda5
  
  ie. plenty of free space.
  
  It's possible that I've omitted some important bits in the patch itself, or
  this exposes a bug of ENOSPC or delayed-inode.
  
  david
  ---
  
  From: David Sterba dste...@suse.cz
  
  Lift the EXDEV condition and allow different root trees for files being
  cloned, then pass source inode's root when searching for extents.
  
  Signed-off-by: David Sterba dste...@suse.cz
  ---
   fs/btrfs/ioctl.c |7 ---
   1 files changed, 4 insertions(+), 3 deletions(-)
  
  diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
  index 0b980af..58eb0ef 100644
  --- a/fs/btrfs/ioctl.c
  +++ b/fs/btrfs/ioctl.c
  @@ -2183,7 +2183,7 @@ static noinline long btrfs_ioctl_clone(struct file 
  *file, unsigned long srcfd,
  goto out_fput;
   
  ret = -EXDEV;
  -   if (src->i_sb != inode->i_sb || BTRFS_I(src)->root != root)
  +   if (src->i_sb != inode->i_sb)
  goto out_fput;
   
  ret = -ENOMEM;
  @@ -2247,13 +2247,14 @@ static noinline long btrfs_ioctl_clone(struct file 
  *file, unsigned long srcfd,
   * note the key will change type as we walk through the
   * tree.
   */
  -   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
  +   ret = btrfs_search_slot(NULL, BTRFS_I(src)->root, &key, path,
  +   0, 0);
  if (ret < 0)
  goto out;
   
  nritems = btrfs_header_nritems(path->nodes[0]);
  if (path->slots[0] >= nritems) {
  -   ret = btrfs_next_leaf(root, path);
  +   ret = btrfs_next_leaf(BTRFS_I(src)->root, path);
  if (ret < 0)
  goto out;
  if (ret > 0)
 


clone ioctl bug with inline extents

2011-08-09 Thread Sage Weil
Hi all,

I'm hitting a problem cloning inline extents that I haven't had much 
success tracking down.  It's simple enough to reproduce:

 echo   src
 echo 2  dst
 clone_range src 0 29 dst 0
 cmp src dst   # fails! dst is size 29 but contains 2\n\0\0\0\0...

where clone_range comes from 

 
http://ceph.newdream.net/git/?p=ceph.git;a=blob;f=qa/btrfs/clone_range.c;h=0a88e16013104c27aa87e7cd0d75e4d292419a19;hb=HEAD

The file size is adjusted for the target, and debug-tree shows an inline 
data extent of length 29, but it has the old data in it.  I'm not sure why

ret = btrfs_insert_empty_item(trans, root, path,
  new_key, size);
BUG_ON(ret);

[...]

leaf = path->nodes[0];
slot = path->slots[0];
write_extent_buffer(leaf, buf,
btrfs_item_ptr_offset(leaf, slot),
size);
inode_add_bytes(inode, datal);

is working when cloning to a new file but not over an existing one.  
Hopefully this is something silly I'm missing...

sage


Re: clone ioctl bug with inline extents

2011-08-09 Thread Sage Weil
On Tue, 9 Aug 2011, Sage Weil wrote:
 Hi all,
 
 I'm hitting a problem cloning inline extents that I haven't had much 
 success tracking down.  It's simple enough to reproduce:
 
  echo   src
  echo 2  dst
  clone_range src 0 29 dst 0
  cmp src dst   # fails! dst is size 29 but contains 2\n\0\0\0\0...

facepalm, ok, this was just a matter of the ioctl code not dropping the 
old page cache pages.  I'm not sure how we didn't notice that for so long!

sage


Re: [PATCH] Btrfs: truncate pages from clone ioctl target range

2011-08-09 Thread David Sterba
just a readability issue:

On Tue, Aug 09, 2011 at 12:00:41PM -0700, Sage Weil wrote:
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2243,6 +2243,10 @@ static noinline long btrfs_ioctl_clone(struct file *file, unsigned long srcfd,
>                 btrfs_wait_ordered_range(src, off, len);
>         }
>  
> +       /* truncate page cache pages from target inode range */
> +       truncate_inode_pages_range(&inode->i_data, off,
> +                  ((off+len+PAGE_CACHE_SIZE-1) & PAGE_CACHE_MASK) - 1);

   ALIGN(off + len, PAGE_CACHE_SIZE) - 1)

I'll give it some testing too :) thanks for catching this!

david

> +
>         /* clone data */
>         key.objectid = btrfs_ino(src);
>         key.type = BTRFS_EXTENT_DATA_KEY;
> -- 


Re: BTRFS partition won't mount

2011-08-09 Thread C Anthony Risinger
On Mon, Aug 8, 2011 at 10:54 PM, C Anthony Risinger <anth...@xtfx.me> wrote:
> On Wed, Aug 3, 2011 at 3:50 PM, Hugo Mills <h...@carfax.org.uk> wrote:
>>
>>    Try the instructions on the wiki at [1]. (And please feed back
>> and/or fix any issues you have with the instructions -- they're still
>> quite new and probably have awkward corners).
>>
>> [1] 
>> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2C_and_I_get_a_kernel_oops.21
>
> this worked perfectly for me ... just saved my night from tedious
> restoration :-)
>
> im on kernel 3.0.1 -- hard poweroff led to that problem.  i haven't
> had any issues for some time ... im not sure what the problem was
> exactly, but sometimes systemd gets a little twacky and takes a year
> to shutdown ... guess i got a little impatient :-)
>
> anyways, thanks for the integration work!

well i tried to shutdown again and had to force poweroff via `echo b >
/proc/sysrq-trigger` (but this time it was because dbus segfaulted and
i couldnt ask systemd to reboot ... `kill -INT 1` wasn't working
either, maybe all systemd related)

... the same thing happened again.  i'm wondering if btrfs is causing
the hang to begin with?  i will watch it after i fix it tonight by
making systemd more verbose and see what it has to say.  im wondering:

1) what else i could try to determine if btrfs is contributing to the issue
2) any other more graceful options than `echo b > /proc/sysrq-trigger` exist

thanks,

-- 

C Anthony


Re: Applications using fsync cause hangs for several seconds every few minutes

2011-08-09 Thread Andrew Guertin
On 06/21/2011 01:15 PM, Jan Stilow wrote:
> Hello,
> 
> Nirbheek Chauhan <nirbheek at gentoo.org> writes:
> [...]
>>
>> Every few minutes, (I guess) when applications do fsync (firefox,
>> xchat, vim, etc), all applications that use fsync() hang for several
>> seconds, and applications that use general IO suffer extreme
>> slowdowns. iotop shows various combinations of the processes listed
>> below doing writes, and the total write as 2-3MB/s.
>>
>> [btrfs-dealloc-]
>> [btrfs-submit-0]
>> [btrfs-transacti]
>> [btrfs-endio-wri]
>> [flush-btrfs-1]
> 
> I'm using btrfs under a 2.6.39-ARCH kernel and run into the same issue.
> 
> In my case the [btrfs-submit-0] and [btrfs-transacti] shows up in iotop
> and produce 99% of IO at the time a application is frozen. For something
> like 10 to 30 seconds.
> 
> [...]

I see the same issue. I have bisected it to

4e69b598f6cfb0940b75abf7e179d6020e94ad1e is the first bad commit
commit 4e69b598f6cfb0940b75abf7e179d6020e94ad1e
Author: Josef Bacik <jo...@redhat.com>
Date:   Mon Mar 21 10:11:24 2011 -0400

    Btrfs: cleanup how we setup free space clusters

...which came in between 2.6.38 and 2.6.39.

The newest kernel I have tried was 3.0-rc7, which still had the bug. I
have not tried 3.1-rc1, but plan to soon.

--Andrew