Re: Deleted subvolume reappears and other cleaner issues

2013-02-06 Thread Alex Lyakas
Can anyone please comment on below?

On Thu, Jan 31, 2013 at 3:03 PM, Alex Lyakas
alex.bt...@zadarastorage.com wrote:
 Hi,
 I want to check if any of the below issues are worth/should be  fixed:

 # btrfs_ioctl_snap_destroy() does not commit a transaction. As a
 result, user can ask to delete a subvol, he receives ok back. Even
 if user does btrfs sub list,
 he will not see the deleted subvol (even though the transaction was
 not committed yet). But if a crash happens, ORPHAN_ITEM will not
 re-appear after crash.
 So after crash, the subvolume still exists perfectly fine (happened
 couple of times here).

 # btrfs_drop_snapshot() does not commit a transaction after
 btrfs_del_orphan_item(). So if the subvol deletion completed in one go
 (did not have to detach and re-attach to transaction, thus committing
 the ORPHAN_ITEM and drop_progress/level), then after crash ORPHAN_ITEM
 will not be in the tree, and subvolume still exists.

 # btrfs_drop_snapshot() checks btrfs_should_end_transaction(), and
 then does btrfs_end_transaction_throttle() and
 btrfs_start_transaction(). However, it looks like it can rejoin the
 same transaction if transaction was not not blocked yet. Minor issue,
 perhaps?

 # umount may get delayed because of pending-for-deletion subvolumes:
 btrfs_commit_super() locks the cleaner_mutex, so it will wait for the
 cleaner to complete.
 On the other hand, cleaner will not give up until it completes
 processing all its splice. If currently cleaner is not running, then
 btrfs_commit_super()
 calls btrfs_clean_old_snapshots() directly. So does it make sense:
 - btrfs_commit_super() will not call btrfs_clean_old_snapshots()
 - close_ctree() calls kthread_stop(cleaner_kthread) early, and cleaner
 thread periodically checks if it needs to exit

 Thanks,
 Alex.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Leaking btrfs_qgroup_inherit on snapshot creation?

2013-02-06 Thread Alex Lyakas
Hi Jan, Arne,
I see this code in create_snapshot:

if (inherit) {
pending_snapshot-inherit = *inherit;
*inherit = NULL;/* take responsibility to free it */
}

So, first thing I think it should be:
if (*inherit)
because in btrfs_ioctl_snap_create_v2() we have:
struct btrfs_qgroup_inherit *inherit = NULL;
...
btrfs_ioctl_snap_create_transid(..., inherit)

so the current check is very unlikely to be NULL.

Second, I don't see anybody freeing pending_snapshot-inherit. I guess
it should be freed after callin btrfs_qgroup_inherit() and also in
btrfs_destroy_pending_snapshots().

Thanks,
Alex.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Leaking btrfs_qgroup_inherit on snapshot creation?

2013-02-06 Thread Arne Jansen
Hi Alex,

On 02/06/13 12:18, Alex Lyakas wrote:
 Hi Jan, Arne,
 I see this code in create_snapshot:
 
   if (inherit) {
   pending_snapshot-inherit = *inherit;
   *inherit = NULL;/* take responsibility to free it */
   }
 
 So, first thing I think it should be:
 if (*inherit)
 because in btrfs_ioctl_snap_create_v2() we have:
 struct btrfs_qgroup_inherit *inherit = NULL;
 ...
 btrfs_ioctl_snap_create_transid(..., inherit)
 
 so the current check is very unlikely to be NULL.

But in btrfs_ioctl_snap_create it is called with NULL, so *inherit would
dereference a NULL pointer.

 
 Second, I don't see anybody freeing pending_snapshot-inherit. I guess
 it should be freed after callin btrfs_qgroup_inherit() and also in
 btrfs_destroy_pending_snapshots().

You're right. In our original version (6f72c7e20dbaea5) it was still
there, in transaction.c. It has been removed in 6fa9700e734:

commit 6fa9700e734275de2acbcb0e99414bd7ddfc60f1
Author: Miao Xie mi...@cn.fujitsu.com
Date:   Thu Sep 6 04:00:32 2012 -0600

Btrfs: fix error path in create_pending_snapshot()

This patch fixes the following problem:
- If we failed to deal with the delayed dir items, we should abort
transaction,
  just as its comment said. Fix it.
- If root reference or root back reference insertion failed, we should
  abort transaction. Fix it.
- Fix the double free problem of pending-inherit.
- Do not restore the trans-rsv if we doesn't change it.
- make the error path more clearly.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com

Miao, can you please explain where you see a double free?

-Arne


 Thanks,
 Alex.
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs: Error removing orphan entry, stopping orphan cleanup. could not do orphan cleanup -22

2013-02-06 Thread Marguerite Su
On Wed, Feb 6, 2013 at 4:45 AM, Josef Bacik jba...@fusionio.com wrote:
 Ok so I think we've fixed this already, can you build btrfs-next and try
 mounting with that and see if it fixes the problem?  Thanks,

Hi, Josef,

Is there any btrfs git compatible with kernel 3.7 and absorbs this
fix? So I can dkms it, I'm using openSUSE 12.3 Beta1 with kernel 3.7,
I can update to 3.8, but next time I do system update to RC1 it will
fallback to 3.7 again.

#btrfs-next seems to move forward to 3.8. /uapi/linux/btrfs.h doesn't
exist in kernel 3.7, it still use the old ioctl.h, I tried to revert
the ioctl.h  btrfs.h patch, but there's still errors about:

/usr/src/linux-3.7.1-1/include/trace/events/btrfs.h: In function
‘ftrace_raw_event_btrfs_transaction_commit’:
/usr/src/linux-3.7.1-1/include/trace/events/btrfs.h:61:1: error:
incompatible types when assigning to type ‘u64’ from type ‘atomic64_t’

Thanks

Marguerite
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode

2013-02-06 Thread Eric Sandeen
On Feb 5, 2013, at 8:11 PM, Liu Bo bo.li@oracle.com wrote:

 On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
 +struct btrfs_inode *b_inode = BTRFS_I(inode);
 +struct btrfs_fs_info *fs_info = b_inode-root-fs_info;
 
if (atomic_add_unless(inode-i_count, -1, 1))
return;
 
 -delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
 -delayed-inode = inode;
 -
spin_lock(fs_info-delayed_iput_lock);
 -list_add_tail(delayed-list, fs_info-delayed_iputs);
 +list_add_tail(b_inode-delayed_iput, fs_info-delayed_iputs);
spin_unlock(fs_info-delayed_iput_lock);
 }
 
 Hmm.  I'm not great with inode life cycles, but isn't this only safe if
 someone else can't get an i_count reference while this is in flight?  It
 looks like the final iput does the unhashing, and so on, so couldn't an
 iget/iput race with this and try to add the inode's list_head twice?
 
 Yeah, same concern here.  Basically this will result in inodes still being
 in use on unmount.
 
 Actually I did a similar one, here is some disscussion:
 
 https://patchwork.kernel.org/patch/1824711/
 
Ok, thanks all.  We should remove Jeff's comment then, it sure sounded like a 
good idea...

Eric

 thanks,
 liubo
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-progs pull request for coverity fixes

2013-02-06 Thread Gene Czarcinski

On 02/05/2013 07:49 PM, Zach Brown wrote:

Hi gang,

Eric and I went through the warnings that a Coverity scan raised against
the reasonably recent btrfs-progs that's in Fedora.  We tried to tackle
the lowest hanging fruit: these are the most obvious and least risky
fixes.

If you ran your tests against the btrfs-progs in Fedora 18 then your 
tests may not have found problems fixed by the valgrind patch on F18.  
This patch is not applied to the current David Sterba's integration 
branches.


If you have the time, it might be useful (my opinion) to run your tests 
against Sterba's integration-02130201 branch ... some of the problems 
may be fixed and other not but you also may find some additional 
problems.  IMO, this branch (or something similar) will be the basis for 
a future v0.20 btrfs-progs ... at least that is what I am using/testing 
with.  There are over 80 commits in this branch over what is the baseis 
for the package in Fedora 18.


Gene
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


GIT PULL (was Re: Integration branch of btrfs-progs 2013-02-01)

2013-02-06 Thread David Sterba
On Fri, Feb 01, 2013 at 06:41:11PM +0100, David Sterba wrote:
 I'm going to do some more testing of the previous integration branch and
 send a pull request as of 20130126.

I didn't have that much time for testing as I wanted, at least did
another review round of the pending patchset. Please pull

  git://repo.or.cz/btrfs-progs-unstable/devel.git for-chris

with top commit 78b35a43988163dbf71d9a

Thanks,
david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-progs pull request for coverity fixes

2013-02-06 Thread Eric Sandeen
On 2/6/13 8:27 AM, Gene Czarcinski wrote:
 On 02/05/2013 07:49 PM, Zach Brown wrote:
 Hi gang,
 
 Eric and I went through the warnings that a Coverity scan raised
 against the reasonably recent btrfs-progs that's in Fedora.  We
 tried to tackle the lowest hanging fruit: these are the most
 obvious and least risky fixes.
 
 If you ran your tests against the btrfs-progs in Fedora 18 then your
 tests may not have found problems fixed by the valgrind patch on
 F18.  This patch is not applied to the current David Sterba's
 integration branches.

The original one was done against F18 with the valgrind patch
in place, yes.  So it may have missed several things.  I can easily
do the same analysis against any codebase if/when it's appropriate.

 If you have the time, it might be useful (my opinion) to run your
 tests against Sterba's integration-02130201 branch ... some of the
 problems may be fixed and other not but you also may find some
 additional problems.  IMO, this branch (or something similar) will be
 the basis for a future v0.20 btrfs-progs ... at least that is what I
 am using/testing with.  There are over 80 commits in this branch over
 what is the baseis for the package in Fedora 18.

I agree that we need to re-run against all those patches, but I think
I will wait until they make it all the way upstream.

Until it gets to the point where development and patch
integration happens upstream (or at least in an orderly, predictable
manner), and we have timely releases of userspace for distro consumption,
it's going to be a little hit and miss with tools like this, since
there are so many different codebases out there right now, with distros
all maintaining their own large patchsets.

There are certainly more static analysis issues to fix, but Zach
and I thought we'd just see if this group of patches managed to
make it upstream before we went a lot further with it.

Thanks,
-Eric

 Gene

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Deleted subvolume reappears and other cleaner issues

2013-02-06 Thread Josef Bacik
On Thu, Jan 31, 2013 at 06:03:06AM -0700, Alex Lyakas wrote:
 Hi,
 I want to check if any of the below issues are worth/should be  fixed:
 
 # btrfs_ioctl_snap_destroy() does not commit a transaction. As a
 result, user can ask to delete a subvol, he receives ok back. Even
 if user does btrfs sub list,
 he will not see the deleted subvol (even though the transaction was
 not committed yet). But if a crash happens, ORPHAN_ITEM will not
 re-appear after crash.
 So after crash, the subvolume still exists perfectly fine (happened
 couple of times here).

Same thing happens to normal unlinks, I don't see a reason to have different
rules for subvols.

 
 # btrfs_drop_snapshot() does not commit a transaction after
 btrfs_del_orphan_item(). So if the subvol deletion completed in one go
 (did not have to detach and re-attach to transaction, thus committing
 the ORPHAN_ITEM and drop_progress/level), then after crash ORPHAN_ITEM
 will not be in the tree, and subvolume still exists.
 

Again same thing happens with normal files.

 # btrfs_drop_snapshot() checks btrfs_should_end_transaction(), and
 then does btrfs_end_transaction_throttle() and
 btrfs_start_transaction(). However, it looks like it can rejoin the
 same transaction if transaction was not not blocked yet. Minor issue,
 perhaps?

No if we didn't block then its ok and we wait longer, we only throttle to give
the transaction stuff a chance to commit, so if the join logic decides its ok to
go on then we're good.

 
 # umount may get delayed because of pending-for-deletion subvolumes:
 btrfs_commit_super() locks the cleaner_mutex, so it will wait for the
 cleaner to complete.
 On the other hand, cleaner will not give up until it completes
 processing all its splice. If currently cleaner is not running, then
 btrfs_commit_super()
 calls btrfs_clean_old_snapshots() directly. So does it make sense:
 - btrfs_commit_super() will not call btrfs_clean_old_snapshots()
 - close_ctree() calls kthread_stop(cleaner_kthread) early, and cleaner
 thread periodically checks if it needs to exit

I don't quite follow this, but sure?  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs: Error removing orphan entry, stopping orphan cleanup. could not do orphan cleanup -22

2013-02-06 Thread Josef Bacik
On Wed, Feb 06, 2013 at 06:03:14AM -0700, Marguerite Su wrote:
 On Wed, Feb 6, 2013 at 4:45 AM, Josef Bacik jba...@fusionio.com wrote:
  Ok so I think we've fixed this already, can you build btrfs-next and try
  mounting with that and see if it fixes the problem?  Thanks,
 
 Hi, Josef,
 
 Is there any btrfs git compatible with kernel 3.7 and absorbs this
 fix? So I can dkms it, I'm using openSUSE 12.3 Beta1 with kernel 3.7,
 I can update to 3.8, but next time I do system update to RC1 it will
 fallback to 3.7 again.
 
 #btrfs-next seems to move forward to 3.8. /uapi/linux/btrfs.h doesn't
 exist in kernel 3.7, it still use the old ioctl.h, I tried to revert
 the ioctl.h  btrfs.h patch, but there's still errors about:
 
 /usr/src/linux-3.7.1-1/include/trace/events/btrfs.h: In function
 ‘ftrace_raw_event_btrfs_transaction_commit’:
 /usr/src/linux-3.7.1-1/include/trace/events/btrfs.h:61:1: error:
 incompatible types when assigning to type ‘u64’ from type ‘atomic64_t’
 

Btrfs-next is based on 3.7, you should be able to just git diff v3.7.. 
whatever.patch and get all the changes you need.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode

2013-02-06 Thread Eric Sandeen
On 2/5/13 8:08 PM, Liu Bo wrote:
 On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
 +   struct btrfs_inode *b_inode = BTRFS_I(inode);
 +   struct btrfs_fs_info *fs_info = b_inode-root-fs_info;
  
 if (atomic_add_unless(inode-i_count, -1, 1))
 return;
  
 -   delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
 -   delayed-inode = inode;
 -
 spin_lock(fs_info-delayed_iput_lock);
 -   list_add_tail(delayed-list, fs_info-delayed_iputs);
 +   list_add_tail(b_inode-delayed_iput, fs_info-delayed_iputs);
 spin_unlock(fs_info-delayed_iput_lock);
  }

 Hmm.  I'm not great with inode life cycles, but isn't this only safe if
 someone else can't get an i_count reference while this is in flight?  It
 looks like the final iput does the unhashing, and so on, so couldn't an
 iget/iput race with this and try to add the inode's list_head twice?
 
 Yeah, same concern here.  Basically this will result in inodes still being
 in use on unmount.
 
 Actually I did a similar one, here is some disscussion:
 
 https://patchwork.kernel.org/patch/1824711/

I read it, thanks.  Did you try the counter approach?

-Eric

 thanks,
 liubo
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: add delayed_iput list head to btrfs inode

2013-02-06 Thread Liu Bo
On Wed, Feb 06, 2013 at 09:53:05AM -0600, Eric Sandeen wrote:
 On 2/5/13 8:08 PM, Liu Bo wrote:
  On Tue, Feb 05, 2013 at 03:14:05PM -0800, Zach Brown wrote:
  + struct btrfs_inode *b_inode = BTRFS_I(inode);
  + struct btrfs_fs_info *fs_info = b_inode-root-fs_info;
   
if (atomic_add_unless(inode-i_count, -1, 1))
return;
   
  - delayed = kmalloc(sizeof(*delayed), GFP_NOFS | __GFP_NOFAIL);
  - delayed-inode = inode;
  -
spin_lock(fs_info-delayed_iput_lock);
  - list_add_tail(delayed-list, fs_info-delayed_iputs);
  + list_add_tail(b_inode-delayed_iput, fs_info-delayed_iputs);
spin_unlock(fs_info-delayed_iput_lock);
   }
 
  Hmm.  I'm not great with inode life cycles, but isn't this only safe if
  someone else can't get an i_count reference while this is in flight?  It
  looks like the final iput does the unhashing, and so on, so couldn't an
  iget/iput race with this and try to add the inode's list_head twice?
  
  Yeah, same concern here.  Basically this will result in inodes still being
  in use on unmount.
  
  Actually I did a similar one, here is some disscussion:
  
  https://patchwork.kernel.org/patch/1824711/
 
 I read it, thanks.  Did you try the counter approach?

Yes, it'll bring a tradeoff situation.

With counter, we need to lock the list all the time instead of
doing a splice on the list and unlocking it.  I think splice would be
faster so I didn't go further(I MIGHT be wrong on this)..

thanks,
liubo
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] Btrfs: Allow the compressed extent size limit to be modified.

2013-02-06 Thread Mitch Harder
Provide for modification of the limit of compressed extent size
utilizing mount-time configuration settings.

The size of compressed extents was limited to 128K, which
leads to fragmentation of the extents (although the extents
themselves may still be located contiguously).  This limit is
put in place to ease the RAM required when spreading compression
across several CPUs, and to make sure the amount of IO required
to do a random read is reasonably small.

This patch is still preliminary.

In this version of the patch, the allowed compressed extent size is
restricted to 128 (the default) and 512. I wanted to extensively test
a single value for a change in compressed extent size before expanding
and testing a wider range of parameters.

I submitted a similar patch about a year and a half ago where the
change was hard-coded and not tuneable.

http://comments.gmane.org/gmane.comp.file-systems.btrfs/10516

---
 fs/btrfs/ctree.h  |  3 +++
 fs/btrfs/disk-io.c|  1 +
 fs/btrfs/inode.c  |  8 
 fs/btrfs/relocation.c |  7 ---
 fs/btrfs/super.c  | 19 ++-
 5 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 547b7b0..f37ec32 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1477,6 +1477,8 @@ struct btrfs_fs_info {
unsigned data_chunk_allocations;
unsigned metadata_ratio;
 
+   unsigned compressed_extent_size;
+
void *bdev_holder;
 
/* private scrub information */
@@ -1829,6 +1831,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY(1  20)
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1  21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR   (1  22)
+#define BTRFS_MOUNT_COMPR_EXTENT_SIZE (1  23)
 
 #define btrfs_clear_opt(o, opt)((o) = ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)  ((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 830bc17..2d2be03 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2056,6 +2056,7 @@ int open_ctree(struct super_block *sb,
fs_info-trans_no_join = 0;
fs_info-free_chunk_space = 0;
fs_info-tree_mod_log = RB_ROOT;
+   fs_info-compressed_extent_size = 128;
 
/* readahead state */
INIT_RADIX_TREE(fs_info-reada_tree, GFP_NOFS  ~__GFP_WAIT);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 148abeb..5b81b56 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -346,8 +346,8 @@ static noinline int compress_file_range(struct inode *inode,
unsigned long nr_pages_ret = 0;
unsigned long total_compressed = 0;
unsigned long total_in = 0;
-   unsigned long max_compressed = 128 * 1024;
-   unsigned long max_uncompressed = 128 * 1024;
+   unsigned long max_compressed = root-fs_info-compressed_extent_size * 
1024;
+   unsigned long max_uncompressed = root-fs_info-compressed_extent_size 
* 1024;
int i;
int will_compress;
int compress_type = root-fs_info-compress_type;
@@ -361,7 +361,7 @@ static noinline int compress_file_range(struct inode *inode,
 again:
will_compress = 0;
nr_pages = (end  PAGE_CACHE_SHIFT) - (start  PAGE_CACHE_SHIFT) + 1;
-   nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
+   nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE);
 
/*
 * we don't want to send crud past the end of i_size through
@@ -386,7 +386,7 @@ again:
 *
 * We also want to make sure the amount of IO required to do
 * a random read is reasonably small, so we limit the size of
-* a compressed extent to 128k.
+* a compressed extent (default of 128k).
 */
total_compressed = min(total_compressed, max_uncompressed);
num_bytes = (end - start + blocksize)  ~(blocksize - 1);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 300e09a..8d6f6bf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -144,7 +144,7 @@ struct tree_block {
unsigned int key_ready:1;
 };
 
-#define MAX_EXTENTS 128
+#define MAX_EXTENTS 512
 
 struct file_extent_cluster {
u64 start;
@@ -3055,6 +3055,7 @@ int relocate_data_extent(struct inode *inode, struct 
btrfs_key *extent_key,
 struct file_extent_cluster *cluster)
 {
int ret;
+   struct btrfs_fs_info *fs_info = BTRFS_I(inode)-root-fs_info;
 
if (cluster-nr  0  extent_key-objectid != cluster-end + 1) {
ret = relocate_file_extent_cluster(inode, cluster);
@@ -3066,12 +3067,12 @@ int relocate_data_extent(struct inode *inode, struct 
btrfs_key *extent_key,
if (!cluster-nr)
cluster-start = extent_key-objectid;
else
-   BUG_ON(cluster-nr = MAX_EXTENTS);
+   BUG_ON(cluster-nr = fs_info-compressed_extent_size);
cluster-end = extent_key-objectid + 

Re: btrfs-progs pull request for coverity fixes

2013-02-06 Thread Chris Mason
On Tue, Feb 05, 2013 at 05:49:07PM -0700, Zach Brown wrote:
 Hi gang,
 
 Eric and I went through the warnings that a Coverity scan raised against
 the reasonably recent btrfs-progs that's in Fedora.  We tried to tackle
 the lowest hanging fruit: these are the most obvious and least risky
 fixes.

Awesome, thanks Zach and Eric.  I've got this rebased on top of Dave's
integration pull from today, and I'm doing a bunch of tests.

-chris
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs-progs pull request for coverity fixes

2013-02-06 Thread Zach Brown
 Awesome, thanks Zach and Eric.  I've got this rebased on top of Dave's
 integration pull from today, and I'm doing a bunch of tests.

Great, thanks for the pull.  We're happy to help!

- z
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Btrfs: Allow the compressed extent size limit to be modified.

2013-02-06 Thread Zach Brown
 + unsigned compressed_extent_size;

It kind of jumps out that this mentions neither that it's the max nor
that it's in KB.  How about max_compressed_extent_kb?

 + fs_info-compressed_extent_size = 128;

I'd put a DEFAULT_MAX_EXTENTS up by the MAX_ definition instead of using
a raw 128 here.

 + unsigned long max_compressed = root-fs_info-compressed_extent_size * 
 1024;
 + unsigned long max_uncompressed = root-fs_info-compressed_extent_size 
 * 1024;

(so max_compressed is in bytes)

   nr_pages = (end  PAGE_CACHE_SHIFT) - (start  PAGE_CACHE_SHIFT) + 1;
 - nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
 + nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE);

(and now that expression adds another * 1024, allowing {128,512}MB
extents :))

* We also want to make sure the amount of IO required to do
* a random read is reasonably small, so we limit the size of
 -  * a compressed extent to 128k.
 +  * a compressed extent (default of 128k).

Just drop the value so that this comment doesn't need to be updated
again.

-* a compressed extent to 128k.
+* a compressed extent.

 + {Opt_compr_extent_size, compressed_extent_size=%d},

It's even more important to make the exposed option self-documenting
than it was to get the fs_info member right.

 + if ((intarg == 128) || (intarg == 512)) {
 + info-compressed_extent_size = intarg;
 + printk(KERN_INFO btrfs: compressed extent size 
 %d\n,
 +info-compressed_extent_size);
 + } else {
 + printk(KERN_INFO btrfs: 
 + Invalid compressed extent size,
 +  using default.\n);

I'd print the default value when it's used and would include a unit in
both.

 + if (btrfs_test_opt(root, COMPR_EXTENT_SIZE))
 + seq_printf(seq, ,compressed_extent_size=%d,
 +(unsigned long long)info-compressed_extent_size);

The (ull) cast doesn't match the %d format and wouldn't be needed if it
was printed with %u.

- z
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: rework the overcommit logic to be based on the total size

2013-02-06 Thread Josef Bacik
People have been complaining about random ENOSPC errors that will clear up
after a umount or just a given amount of time.  Chris was able to reproduce
this with stress.sh and lots of processes and so was I.  Basically the
overcommit stuff would really let us get out of hand, in my tests I saw up
to 30 gigs of outstanding reservations with only 2 gigs total of metadata
space.  This usually worked out fine but with so much outstanding
reservation the flushing stuff short circuits to make sure we don't hang
forever flushing when we really need ENOSPC.  Plus we allocate chunks in
order to alleviate the pressure, but this doesn't actually help us since we
only use the non-allocated area in our over commit logic.

So instead of basing overcommit on the amount of non-allocated space,
instead just do it based on how much total space we have, and then limit it
to the non-allocated space in case we are short on space to spill over into.
This allows us to have the same performance as well as no longer giving
random ENOSPC.  Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/extent-tree.c |   15 ---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 11ffa16..981130b 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3677,6 +3677,7 @@ static int can_overcommit(struct btrfs_root *root,
u64 rsv_size = 0;
u64 avail;
u64 used;
+   u64 to_add;
 
used = space_info-bytes_used + space_info-bytes_reserved +
space_info-bytes_pinned + space_info-bytes_readonly;
@@ -3710,17 +3711,25 @@ static int can_overcommit(struct btrfs_root *root,
   BTRFS_BLOCK_GROUP_RAID10))
avail = 1;
 
+   to_add = space_info-total_bytes;
+
/*
 * If we aren't flushing all things, let us overcommit up to
 * 1/2th of the space. If we can flush, don't let us overcommit
 * too much, let it overcommit up to 1/8 of the space.
 */
if (flush == BTRFS_RESERVE_FLUSH_ALL)
-   avail = 3;
+   to_add = 3;
else
-   avail = 1;
+   to_add = 1;
+
+   /*
+* Limit the overcommit to the amount of free space we could possibly
+* allocate for chunks.
+*/
+   to_add = min(avail, to_add);
 
-   if (used + bytes  space_info-total_bytes + avail)
+   if (used + bytes  space_info-total_bytes + to_add)
return 1;
return 0;
 }
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: handle errors in compression submission path

2013-02-06 Thread Josef Bacik
I noticed we would deadlock if we aborted a transaction while doing
compressed io.  This is because we don't unlock our pages if something goes
horribly wrong.  To fix this we need to make sure that we call
extent_clear_unlock_delalloc in order to unlock all the pages.  If we have
to cow in the async submission thread we need to make sure to unlock our
locked_page as the cow error path will not unlock the locked page as it
depends on the caller to unlock that page.  With this patch we no longer
deadlock on the page lock when we have an aborted transaction.  Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/inode.c |   38 --
 1 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5398883..44d95d1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -608,7 +608,7 @@ static noinline int submit_compressed_extents(struct inode 
*inode,
if (list_empty(async_cow-extents))
return 0;
 
-
+again:
while (!list_empty(async_cow-extents)) {
async_extent = list_entry(async_cow-extents.next,
  struct async_extent, list);
@@ -648,6 +648,8 @@ retry:
  async_extent-ram_size - 1,
  btrfs_get_extent,
  WB_SYNC_ALL);
+   else if (ret)
+   unlock_page(async_cow-locked_page);
kfree(async_extent);
cond_resched();
continue;
@@ -672,6 +674,7 @@ retry:
 
if (ret) {
int i;
+
for (i = 0; i  async_extent-nr_pages; i++) {
WARN_ON(async_extent-pages[i]-mapping);
page_cache_release(async_extent-pages[i]);
@@ -679,12 +682,10 @@ retry:
kfree(async_extent-pages);
async_extent-nr_pages = 0;
async_extent-pages = NULL;
-   unlock_extent(io_tree, async_extent-start,
- async_extent-start +
- async_extent-ram_size - 1);
+
if (ret == -ENOSPC)
goto retry;
-   goto out_free; /* JDM: Requeue? */
+   goto out_free;
}
 
/*
@@ -696,7 +697,8 @@ retry:
async_extent-ram_size - 1, 0);
 
em = alloc_extent_map();
-   BUG_ON(!em); /* -ENOMEM */
+   if (!em)
+   goto out_free_reserve;
em-start = async_extent-start;
em-len = async_extent-ram_size;
em-orig_start = em-start;
@@ -728,6 +730,9 @@ retry:
async_extent-ram_size - 1, 0);
}
 
+   if (ret)
+   goto out_free_reserve;
+
ret = btrfs_add_ordered_extent_compress(inode,
async_extent-start,
ins.objectid,
@@ -735,7 +740,8 @@ retry:
ins.offset,
BTRFS_ORDERED_COMPRESSED,
async_extent-compress_type);
-   BUG_ON(ret); /* -ENOMEM */
+   if (ret)
+   goto out_free_reserve;
 
/*
 * clear dirty, set writeback and unlock the pages.
@@ -756,18 +762,30 @@ retry:
ins.objectid,
ins.offset, async_extent-pages,
async_extent-nr_pages);
-
-   BUG_ON(ret); /* -ENOMEM */
alloc_hint = ins.objectid + ins.offset;
kfree(async_extent);
+   if (ret)
+   goto out;
cond_resched();
}
ret = 0;
 out:
return ret;
+out_free_reserve:
+   btrfs_free_reserved_extent(root, ins.objectid, ins.offset);
 out_free:
+   extent_clear_unlock_delalloc(inode, BTRFS_I(inode)-io_tree,
+async_extent-start,
+async_extent-start +
+async_extent-ram_size - 1,
+NULL, EXTENT_CLEAR_UNLOCK_PAGE |
+EXTENT_CLEAR_UNLOCK |
+EXTENT_CLEAR_DELALLOC |
+EXTENT_CLEAR_DIRTY |
+EXTENT_SET_WRITEBACK |
+

Re: [PATCH] Btrfs-progs print more informative error when we fail to open a device

2013-02-06 Thread David Sterba
On Mon, Feb 04, 2013 at 10:57:57AM -0500, Gene Czarcinski wrote:
 From: Eric Sandeen sand...@redhat.com
 
 print more informative error when we fail to open a device
 
 If open() fails, we should let the user know why it failed.

Thanks, added to next integration queue.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs balance - hang/crash

2013-02-06 Thread Michael Schneider
Hi,

my btrfs hangs when doing a balance operation.

I'm using a 3.7.1 kernel from opensuse:  linux-opzz 3.7.1-2.10-m4 #11 SMP
PREEMPT Fri Jan 11 18:04:04 CET 2013 x86_64 x86_64 x86_64 GNU/Linux
and Btrfs v0.19+

I did a scrub which completed without errors. Then I tried btrfs filesystem
balance / which work fine for the first 23 of 46 chunks, then ist stopped
processing further chunks, but contiued to consume large amounts of the CPU.
The system logged filled quickly with messages like:

[  347.237658] btrfs: block rsv returned -28
[  347.237661] [ cut here ]
[  347.237667] WARNING: at fs/btrfs/extent-tree.c:6297
btrfs_alloc_free_block+0x399/0x3a0()
[  347.237668] Hardware name: GA-MA74GM-S2H
[  347.237669] Modules linked in: fuse xt_pkttype af_packet ipt_REJECT
xt_conntrack iptable_raw xt_CT iptable_filter nf
_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack
nf_defrag_ipv4 ip_tables x_tables cpufreq_c
onservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf
sp5100_tco usb_storage kvm_amd kvm snd_hda_codec_rea
ltek k10temp edac_core edac_mce_amd sr_mod cdrom i2c_piix4 snd_hda_intel
snd_hda_codec sg snd_hwdep snd_pcm snd_timer s
nd_page_alloc floppy edd microcode autofs4 radeon ttm drm_kms_helper
processor thermal_sys pata_atiixp
[  347.237694] Pid: 320, comm: btrfs-balance Not tainted 3.7.1-2.10-m4 #11
[  347.237695] Call Trace:
[  347.237707]  [810046c7] dump_trace+0x87/0x2f0
[  347.237711]  [81591682] dump_stack+0x69/0x6f
[  347.237716]  [8103cb79] warn_slowpath_common+0x79/0xc0
[  347.237720]  [8122c889] btrfs_alloc_free_block+0x399/0x3a0
[  347.237725]  [81218617] __btrfs_cow_block+0x137/0x550
[  347.237728]  [81218baf] btrfs_cow_block+0xff/0x250
[  347.237731]  [8121d2f1] btrfs_search_slot+0x421/0x980
[  347.237735]  [8127fa0e] do_relocation+0x3be/0x510
[  347.237740]  [812839f3] relocate_tree_blocks+0x5e3/0x610
[  347.237743]  [812849a4] relocate_block_group+0x444/0x6c0
[  347.237747]  [81284dc9] btrfs_relocate_block_group+0x1a9/0x2d0
[  347.237751]  [8125de36] btrfs_relocate_chunk.isra.53+0x56/0x730
[  347.237754]  [812621fe] btrfs_balance+0x82e/0xd50
[  347.237758]  [812627a0] balance_kthread+0x80/0x90
[  347.237762]  [8105ec33] kthread+0xb3/0xc0
[  347.237766]  [815a43fc] ret_from_fork+0x7c/0xb0
[  347.237770] ---[ end trace cf4d2bf19a87ec83 ]---
[  347.239650] btrfs: block rsv returned -28

[  347.239653] [ cut here ]
[  347.239658] WARNING: at fs/btrfs/extent-tree.c:6297
btrfs_alloc_free_block+0x399/0x3a0()
[  347.239660] Hardware name: GA-MA74GM-S2H
[  347.239661] Modules linked in: fuse xt_pkttype af_packet ipt_REJECT
xt_conntrack iptable_raw xt_CT iptable_filter nf_conntrack_netbios_ns
nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
ip_tables x_tables cpufreq_conservative cpufreq_userspace cpufreq_powersave
acpi_cpufreq mperf sp5100_tco usb_storage kvm_amd kvm snd_hda_codec_realtek
k10temp edac_core edac_mce_amd sr_mod cdrom i2c_piix4 snd_hda_intel
snd_hda_codec sg snd_hwdep snd_pcm snd_timer snd_page_alloc floppy edd
microcode autofs4 radeon ttm drm_kms_helper processor thermal_sys
pata_atiixp
[  347.239683] Pid: 320, comm: btrfs-balance Tainted: GW
3.7.1-2.10-m4 #11
[  347.239685] Call Trace:
[  347.239695]  [810046c7] dump_trace+0x87/0x2f0
[  347.239699]  [81591682] dump_stack+0x69/0x6f
[  347.239704]  [8103cb79] warn_slowpath_common+0x79/0xc0
[  347.239708]  [8122c889] btrfs_alloc_free_block+0x399/0x3a0
[  347.239713]  [81218617] __btrfs_cow_block+0x137/0x550
[  347.239716]  [81218baf] btrfs_cow_block+0xff/0x250
[  347.239719]  [8121d2f1] btrfs_search_slot+0x421/0x980
[  347.239723]  [8127fa0e] do_relocation+0x3be/0x510
[  347.239728]  [812839f3] relocate_tree_blocks+0x5e3/0x610
[  347.239731]  [812849a4] relocate_block_group+0x444/0x6c0
[  347.239735]  [81284dc9] btrfs_relocate_block_group+0x1a9/0x2d0
[  347.239739]  [8125de36] btrfs_relocate_chunk.isra.53+0x56/0x730
[  347.239743]  [812621fe] btrfs_balance+0x82e/0xd50
[  347.239746]  [812627a0] balance_kthread+0x80/0x90
[  347.239750]  [8105ec33] kthread+0xb3/0xc0
[  347.239755]  [815a43fc] ret_from_fork+0x7c/0xb0
[  347.239758] ---[ end trace cf4d2bf19a87ec84 ]---
[  352.352742] use_block_rsv: 146 callbacks suppressed
[  352.352750] btrfs: block rsv returned -28
[  352.352753] [ cut here ]
[  352.352765] WARNING: at fs/btrfs/extent-tree.c:6297
btrfs_alloc_free_block+0x399/0x3a0()
[  352.352769] Hardware name: GA-MA74GM-S2H
[  352.352772] Modules linked in: fuse xt_pkttype af_packet ipt_REJECT
xt_conntrack iptable_raw xt_CT iptable_filter nf_conntrack_netbios_ns
nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
ip_tables x_tables 

Re: [PATCH] btrfs-progs: remove unused bit-radix.[ch] files

2013-02-06 Thread David Sterba
On Mon, Feb 04, 2013 at 11:26:23AM -0600, Eric Sandeen wrote:
 fd53de4d Drop bit-radix.[ch] files
 removed the files from the Makefile, but not the files themselves.

Added to next integration.

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balance - hang/crash

2013-02-06 Thread Hugo Mills
On Wed, Feb 06, 2013 at 11:12:14PM +0100, Michael Schneider wrote:
 Hi,
 
 my btrfs hangs when doing a balance operation.
 
 I'm using a 3.7.1 kernel from opensuse:  linux-opzz 3.7.1-2.10-m4 #11 SMP
 PREEMPT Fri Jan 11 18:04:04 CET 2013 x86_64 x86_64 x86_64 GNU/Linux
 and Btrfs v0.19+
 
 I did a scrub which completed without errors. Then I tried btrfs filesystem
 balance / which work fine for the first 23 of 46 chunks, then ist stopped
 processing further chunks, but contiued to consume large amounts of the CPU.
 The system logged filled quickly with messages like:
 
 [  347.237658] btrfs: block rsv returned -28
[snip]
 Any idea what's going on?

   You're running out of space. Can you post the output of:

# btrfs fi df /mountpoint
# btrfs fi show

this will give us a bit more info about likely scenarios. Also, josef
published a patch(*) in the last few hours, on this list, which may
help deal with the issue.

   Hugo.

(*) Btrfs: rework the overcommit logic to be based on the total size

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- What do you give the man who has everything? -- Penicillin is ---  
 a good start... 


signature.asc
Description: Digital signature


Re: btrfs balance - hang/crash

2013-02-06 Thread Michael Schneider
# btrfs fi show
failed to read /dev/sr0
Label: 'bootssd'  uuid: ac89584c-c757-488e-8a2a-5ee5a5484dde
Total devices 1 FS bytes used 34.98GB
devid1 size 56.00GB used 41.07GB path /dev/dm-1

Btrfs v0.19+

# btrfs fi df /
Data: total=35.00GB, used=33.34GB
System, DUP: total=32.00MB, used=16.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=3.00GB, used=1.63GB


On Wed, Feb 6, 2013 at 11:21 PM, Hugo Mills h...@carfax.org.uk wrote:
 On Wed, Feb 06, 2013 at 11:12:14PM +0100, Michael Schneider wrote:
 Hi,

 my btrfs hangs when doing a balance operation.

 I'm using a 3.7.1 kernel from opensuse:  linux-opzz 3.7.1-2.10-m4 #11 SMP
 PREEMPT Fri Jan 11 18:04:04 CET 2013 x86_64 x86_64 x86_64 GNU/Linux
 and Btrfs v0.19+

 I did a scrub which completed without errors. Then I tried btrfs filesystem
 balance / which work fine for the first 23 of 46 chunks, then ist stopped
 processing further chunks, but contiued to consume large amounts of the CPU.
 The system logged filled quickly with messages like:

 [  347.237658] btrfs: block rsv returned -28
 [snip]
 Any idea what's going on?

You're running out of space. Can you post the output of:

 # btrfs fi df /mountpoint
 # btrfs fi show

 this will give us a bit more info about likely scenarios. Also, josef
 published a patch(*) in the last few hours, on this list, which may
 help deal with the issue.

Hugo.

 (*) Btrfs: rework the overcommit logic to be based on the total size

 --
 === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
   PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- What do you give the man who has everything? -- Penicillin is ---
  a good start...
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Btrfs: Allow the compressed extent size limit to be modified.

2013-02-06 Thread Mitch Harder
On Wed, Feb 6, 2013 at 12:46 PM, Zach Brown z...@redhat.com wrote:
 + unsigned compressed_extent_size;

 It kind of jumps out that this mentions neither that it's the max nor
 that it's in KB.  How about max_compressed_extent_kb?

 + fs_info-compressed_extent_size = 128;

 I'd put a DEFAULT_MAX_EXTENTS up by the MAX_ definition instead of using
 a raw 128 here.

 + unsigned long max_compressed = root-fs_info-compressed_extent_size * 
 1024;
 + unsigned long max_uncompressed = root-fs_info-compressed_extent_size 
 * 1024;

 (so max_compressed is in bytes)

   nr_pages = (end  PAGE_CACHE_SHIFT) - (start  PAGE_CACHE_SHIFT) + 1;
 - nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
 + nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE);

 (and now that expression adds another * 1024, allowing {128,512}MB
 extents :))


Yuk!  I'm surprised this never manifested as a problem during testing.

* We also want to make sure the amount of IO required to do
* a random read is reasonably small, so we limit the size of
 -  * a compressed extent to 128k.
 +  * a compressed extent (default of 128k).

 Just drop the value so that this comment doesn't need to be updated
 again.

 -* a compressed extent to 128k.
 +* a compressed extent.

 + {Opt_compr_extent_size, compressed_extent_size=%d},

 It's even more important to make the exposed option self-documenting
 than it was to get the fs_info member right.

 + if ((intarg == 128) || (intarg == 512)) {
 + info-compressed_extent_size = intarg;
 + printk(KERN_INFO btrfs: compressed extent 
 size %d\n,
 +info-compressed_extent_size);
 + } else {
 + printk(KERN_INFO btrfs: 
 + Invalid compressed extent size,
 +  using default.\n);

 I'd print the default value when it's used and would include a unit in
 both.

 + if (btrfs_test_opt(root, COMPR_EXTENT_SIZE))
 + seq_printf(seq, ,compressed_extent_size=%d,
 +(unsigned long long)info-compressed_extent_size);

 The (ull) cast doesn't match the %d format and wouldn't be needed if it
 was printed with %u.

 - z

Thanks for the review.

All these comments make sense, and I should be able to work them in.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2, RFC] btrfs-progs: overhaul mkfs.btrfs -r option

2013-02-06 Thread David Sterba
On Tue, Jan 29, 2013 at 02:41:08PM -0600, Eric Sandeen wrote:
 The manpage for the -r option simply says that
 it will copy the path specified to -r into the newly
 made filesystem.
 
 There's not a lot of reason to treat that option as differently
 as it is now - today it ignores discard, fs size, and mixed
 options, for example.  It also failed to check whether the
 target device was mounted before proceeding.  Etc...
 
 Rework things so that we really follow the same paths whether
 or not -r is specified, but with one special case for -r:
 
 * If the device does not exist, it will be created as
   a regular file of the minimum size to hold the -r path,
   or of size specified by the -b option.
 
 This also changes a little behavior; it does not pre-fill
 the new file with zeros, but allows it to be sparse, and does
 not truncate an existing device file.  If you want to start
 with an empty file, just don't point it at an existing
 file...

Fixes numerous bugs and I find the changes in behaviour sane and
reasonable. A quick test failed for me when the image file does not
exist:

$ rm image
$ ./mkfs.btrfs -r . image
...
not enough free space
add_file_items failed
unable to traverse_directory
Making image is aborted.
ret = -1
mkfs.btrfs: mkfs.c:1533: main: Assertion `!(ret)' failed.

seems that the estimated size is not sufficient.

'du' on the directory (without the image itself) gives me 59M, the image
after failed mkfs has 102M, and it's empty when I try to mount it. Using
the progs .git directory (18M) as sourcedir also failed, but it worked
with man/ subdirectory (94K).

david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] Btrfs fixes

2013-02-06 Thread Chris Mason
Hi Linus,

Please pull my for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

We've got corner cases for updating i_size that ceph was hitting, error
handling for quotas when we run out of space, a very subtle snapshot
deletion race, a crash while removing devices, and one deadlock between
subvolume creation and the sb_internal code (thanks lockdep).

Josef Bacik (3) commits (+12/-4):
Btrfs: do not merge logged extents if we've removed them from the tree 
(+2/-1)
Btrfs: fix possible stale data exposure (+1/-1)
Btrfs: fix missing i_size update (+9/-2)

Miao Xie (2) commits (+21/-9):
Btrfs: fix missing release of the space/qgroup reservation in 
start_transaction() (+19/-8)
Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write() (+2/-1)

Jan Schmidt (1) commits (+10/-12):
Btrfs: fix EDQUOT handling in btrfs_delalloc_reserve_metadata

Liu Bo (1) commits (+38/-9):
Btrfs: fix race between snapshot deletion and getting inode

Chris Mason (1) commits (+4/-1):
Btrfs: move d_instantiate outside the transaction during mksubvol

Eric Sandeen (1) commits (+2/-1):
btrfs: don't try to notify udev about missing devices

Total: (9) commits

 fs/btrfs/extent-tree.c  | 22 ++
 fs/btrfs/extent_map.c   |  3 ++-
 fs/btrfs/file.c | 25 -
 fs/btrfs/ioctl.c|  5 -
 fs/btrfs/ordered-data.c | 13 ++---
 fs/btrfs/scrub.c| 25 -
 fs/btrfs/transaction.c  | 27 +++
 fs/btrfs/volumes.c  |  3 ++-
 8 files changed, 87 insertions(+), 36 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] Btrfs fixes

2013-02-06 Thread Chris Mason
[ sorry, my lbdb seems to really like linux-ker...@vger.kerrnel.org,
fixed for real this time ]

Hi Linus,

Please pull my for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

We've got corner cases for updating i_size that ceph was hitting, error
handling for quotas when we run out of space, a very subtle snapshot
deletion race, a crash while removing devices, and one deadlock between
subvolume creation and the sb_internal code (thanks lockdep).

Josef Bacik (3) commits (+12/-4):
Btrfs: do not merge logged extents if we've removed them from the tree 
(+2/-1)
Btrfs: fix possible stale data exposure (+1/-1)
Btrfs: fix missing i_size update (+9/-2)

Miao Xie (2) commits (+21/-9):
Btrfs: fix missing release of the space/qgroup reservation in 
start_transaction() (+19/-8)
Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write() (+2/-1)

Jan Schmidt (1) commits (+10/-12):
Btrfs: fix EDQUOT handling in btrfs_delalloc_reserve_metadata

Liu Bo (1) commits (+38/-9):
Btrfs: fix race between snapshot deletion and getting inode

Chris Mason (1) commits (+4/-1):
Btrfs: move d_instantiate outside the transaction during mksubvol

Eric Sandeen (1) commits (+2/-1):
btrfs: don't try to notify udev about missing devices

Total: (9) commits

 fs/btrfs/extent-tree.c  | 22 ++
 fs/btrfs/extent_map.c   |  3 ++-
 fs/btrfs/file.c | 25 -
 fs/btrfs/ioctl.c|  5 -
 fs/btrfs/ordered-data.c | 13 ++---
 fs/btrfs/scrub.c| 25 -
 fs/btrfs/transaction.c  | 27 +++
 fs/btrfs/volumes.c  |  3 ++-
 8 files changed, 87 insertions(+), 36 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Leaking btrfs_qgroup_inherit on snapshot creation?

2013-02-06 Thread Miao Xie
On wed, 06 Feb 2013 13:14:23 +0100, Arne Jansen wrote:
 Hi Alex,
 
 On 02/06/13 12:18, Alex Lyakas wrote:
 Hi Jan, Arne,
 I see this code in create_snapshot:

  if (inherit) {
  pending_snapshot-inherit = *inherit;
  *inherit = NULL;/* take responsibility to free it */
  }

 So, first thing I think it should be:
 if (*inherit)
 because in btrfs_ioctl_snap_create_v2() we have:
 struct btrfs_qgroup_inherit *inherit = NULL;
 ...
 btrfs_ioctl_snap_create_transid(..., inherit)

 so the current check is very unlikely to be NULL.
 
 But in btrfs_ioctl_snap_create it is called with NULL, so *inherit would
 dereference a NULL pointer.
 

 Second, I don't see anybody freeing pending_snapshot-inherit. I guess
 it should be freed after callin btrfs_qgroup_inherit() and also in
 btrfs_destroy_pending_snapshots().
 
 You're right. In our original version (6f72c7e20dbaea5) it was still
 there, in transaction.c. It has been removed in 6fa9700e734:
 
 commit 6fa9700e734275de2acbcb0e99414bd7ddfc60f1
 Author: Miao Xie mi...@cn.fujitsu.com
 Date:   Thu Sep 6 04:00:32 2012 -0600
 
 Btrfs: fix error path in create_pending_snapshot()
 
 This patch fixes the following problem:
 - If we failed to deal with the delayed dir items, we should abort
 transaction,
   just as its comment said. Fix it.
 - If root reference or root back reference insertion failed, we should
   abort transaction. Fix it.
 - Fix the double free problem of pending-inherit.
 - Do not restore the trans-rsv if we doesn't change it.
 - make the error path more clearly.
 
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 
 Miao, can you please explain where you see a double free?

Sorry, I misread the code,I didn't notice that the pointer had been assigned
to NULL.

But I think we can make the code more readable and be easy to maintain, we can
free the memory in the caller(btrfs_ioctl_snap_create_v2()) since we are sure
the snapshot creation is done after btrfs_ioctl_snap_create_transid() completes.

Thanks
Miao

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] Btrfs: fix the race between bio and btrfs_stop_workers

2013-02-06 Thread Miao Xie
open_ctree() need read the metadata to initialize the global information
of btrfs. But it may fail after it submit some bio, and then it will jump
to the error path. Unfortunately, it doesn't check if there are some bios
in flight, and just stop all the worker threads. As a result, when the
submitted bios end, they can not find any worker thread which can deal with
subsequent work, then oops happen.

kernel BUG at fs/btrfs/async-thread.c:605!

Fix this problem by invoking invalidate_inode_pages2() before we stop the
worker threads. This function will wait until the bio end because it need
lock the pages which are going to be invalidated, and if a page is under
disk read IO, it must be locked. invalidate_inode_pages2() need wait until
end bio handler to unlocked it.

Reported-and-Tested-by: Tsutomu Itoh t-i...@jp.fujitsu.com
Signed-off-by: Eric Sandeen sand...@redhat.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0c31d07..d8fd711 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2728,13 +2728,13 @@ fail_cleaner:
 * kthreads
 */
filemap_write_and_wait(fs_info-btree_inode-i_mapping);
-   invalidate_inode_pages2(fs_info-btree_inode-i_mapping);
 
 fail_block_groups:
btrfs_free_block_groups(fs_info);
 
 fail_tree_roots:
free_root_pointers(fs_info, 1);
+   invalidate_inode_pages2(fs_info-btree_inode-i_mapping);
 
 fail_sb_buffer:
btrfs_stop_workers(fs_info-generic_worker);
@@ -2755,7 +2755,6 @@ fail_alloc:
 fail_iput:
btrfs_mapping_tree_free(fs_info-mapping_tree);
 
-   invalidate_inode_pages2(fs_info-btree_inode-i_mapping);
iput(fs_info-btree_inode);
 fail_bdi:
bdi_destroy(fs_info-bdi);
-- 
1.7.11.7
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Btrfs: fix memory leak of pending_snapshot-inherit

2013-02-06 Thread Miao Xie
The argument inherit of btrfs_ioctl_snap_create_transid() was assigned
to NULL during we created the snapshots, so we didn't free it though we
called kfree() in the caller.

But since we are sure the snapshot creation is done after the function -
btrfs_ioctl_snap_create_transid() - completes, it is safe that we don't
assign the pointer inherit to NULL, and just free it in the caller of
btrfs_ioctl_snap_create_transid(). In this way, the code can become more
readable.

Reported-by: Alex Lyakas alex.bt...@zadarastorage.com
Cc: Arne Jansen sensi...@gmx.net
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/ioctl.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 02d3035..40f2fbf 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -367,7 +367,7 @@ static noinline int create_subvol(struct btrfs_root *root,
  struct dentry *dentry,
  char *name, int namelen,
  u64 *async_transid,
- struct btrfs_qgroup_inherit **inherit)
+ struct btrfs_qgroup_inherit *inherit)
 {
struct btrfs_trans_handle *trans;
struct btrfs_key key;
@@ -401,8 +401,7 @@ static noinline int create_subvol(struct btrfs_root *root,
if (IS_ERR(trans))
return PTR_ERR(trans);
 
-   ret = btrfs_qgroup_inherit(trans, root-fs_info, 0, objectid,
-  inherit ? *inherit : NULL);
+   ret = btrfs_qgroup_inherit(trans, root-fs_info, 0, objectid, inherit);
if (ret)
goto fail;
 
@@ -530,7 +529,7 @@ fail:
 
 static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
   char *name, int namelen, u64 *async_transid,
-  bool readonly, struct btrfs_qgroup_inherit **inherit)
+  bool readonly, struct btrfs_qgroup_inherit *inherit)
 {
struct inode *inode;
struct btrfs_pending_snapshot *pending_snapshot;
@@ -549,10 +548,7 @@ static int create_snapshot(struct btrfs_root *root, struct 
dentry *dentry,
pending_snapshot-dentry = dentry;
pending_snapshot-root = root;
pending_snapshot-readonly = readonly;
-   if (inherit) {
-   pending_snapshot-inherit = *inherit;
-   *inherit = NULL;/* take responsibility to free it */
-   }
+   pending_snapshot-inherit = inherit;
 
trans = btrfs_start_transaction(root-fs_info-extent_root, 6);
if (IS_ERR(trans)) {
@@ -692,7 +688,7 @@ static noinline int btrfs_mksubvol(struct path *parent,
   char *name, int namelen,
   struct btrfs_root *snap_src,
   u64 *async_transid, bool readonly,
-  struct btrfs_qgroup_inherit **inherit)
+  struct btrfs_qgroup_inherit *inherit)
 {
struct inode *dir  = parent-dentry-d_inode;
struct dentry *dentry;
@@ -1454,7 +1450,7 @@ out:
 static noinline int btrfs_ioctl_snap_create_transid(struct file *file,
char *name, unsigned long fd, int subvol,
u64 *transid, bool readonly,
-   struct btrfs_qgroup_inherit **inherit)
+   struct btrfs_qgroup_inherit *inherit)
 {
int namelen;
int ret = 0;
@@ -1563,7 +1559,7 @@ static noinline int btrfs_ioctl_snap_create_v2(struct 
file *file,
 
ret = btrfs_ioctl_snap_create_transid(file, vol_args-name,
  vol_args-fd, subvol, ptr,
- readonly, inherit);
+ readonly, inherit);
 
if (ret == 0  ptr 
copy_to_user(arg +
-- 
1.7.11.7
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html