Re: [PATCH] Btrfs-progs: make fsck deal with bogus items

2014-10-03 Thread Mitch Harder
On Fri, Oct 3, 2014 at 9:54 AM, Josef Bacik jba...@fb.com wrote:
 We can deal with corrupt items by deleting them in a few cases.  Fsck can
 easily recover from a missing extent item or a dir index item.  So if we
 notice an item is completely bogus and it is of a key that we know we can
 repair then just delete it and carry on.  Thanks,

 Signed-off-by: Josef Bacik jba...@fb.com
 ---
  cmds-check.c                             |  45 +++
  tests/fsck-tests/005-bad-item-offset.img | Bin 0 -> 398336 bytes
  2 files changed, 45 insertions(+)
  create mode 100644 tests/fsck-tests/005-bad-item-offset.img


It looks like tests/fsck-tests/005-bad-item-offset.img was added
unintentionally to this patch.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Questions on using BtrFS for fileserver

2014-08-19 Thread Mitch Harder
On Tue, Aug 19, 2014 at 11:21 AM, M G Berberich
bt...@oss.m-berberich.de wrote:
 Hello,

 we are thinking about using BtrFS on standard hardware for a
 fileserver with about 50T (100T raw) of storage (25×4TByte).


I would recommend carefully reading this thread titled: 1 week to
rebuid 4x 3TB raid10 is a long time!

http://comments.gmane.org/gmane.comp.file-systems.btrfs/36969

There are multiple methods for replacing a device in a Btrfs RAID
array.  If I understand the conclusions of this thread, you might
still expect 12-14 hours to rebuild after replacing a 4 TByte device,
assuming you use the optimal replace commands.

With 25 devices, that leaves an uncomfortable period of time where
another device might fail.


Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Mitch Harder
On Sat, Aug 9, 2014 at 11:21 PM, Duncan 1i5t5.dun...@cox.net wrote:

   But from rc5 on thru rc7 or 8 and
 release, unless you're one of the ones still waiting on a bug found
 earlier to be fixed, it's generally quite stable and boring.

 So by the time of actual .0 release, it really is quite stable, and no
 longer development kernel.  Sure, Greg KH's stable series kernel releases
 stabilize it further, but that's exactly what they are, stable series,
 not development series, and there's really no development going into it
 generally from rc1 on, tho occasionally something that needs to come
 after everything else is slipped in in the first couple days after rc1,
 but still well before rc2, and the .0 release signifies the end of the
 post development stabilization period such that .0 really is no longer a
 development kernel at all, even if there are a few more weekly stable-
 series updates (about 10, 3.15.10 was announced to be the last one for
 3.15, with the Friday-released 3.15.9) before support ceases if it's not
 a long-term-stable candidate.


I can't say I've observed that to be the case with Btrfs.  I know
there is a core group of developers working very hard on testing the
Btrfs updates in the _rc kernels, but once that .0 kernel hits the
streets, the extra exposure to all the various combinations of
hardware and options has been known to discover new issues.  I think
this is nearly unavoidable given the pace of Btrfs development.


Re: ENOSPC with mkdir and rename

2014-08-04 Thread Mitch Harder
On Mon, Aug 4, 2014 at 9:47 AM, Russell Coker russ...@coker.com.au wrote:
 If you regularly run a scrub with options such as -dusage=50 -musage=10 then
 the amount of free space in metadata chunks will tend to be a lot greater than
 that in data chunks.


Just to clarify for posterity, I'm pretty sure you meant 'balance'
with -dusage=50 -musage=10 instead of 'scrub'.


Re: ENOSPC with mkdir and rename

2014-08-02 Thread Mitch Harder
On Sat, Aug 2, 2014 at 6:35 PM, Peter Waller pe...@scraperwiki.com wrote:
 Hi All,

 My TL;DR questions are at the bottom, before the stack trace.

 I'm running Ubuntu 14.04. I wonder if this problem is related to the
 thread titled Machine lockup due to btrfs-transaction on AWS EC2
 Ubuntu 14.04 which I started on the 29th of July:

 http://thread.gmane.org/gmane.comp.file-systems.btrfs/37224

 Kernel: 3.15.7-031507-generic

 I'm on a single block device system, i.e, no RAID.

 I was observing ENOSPC from `mkdir` and `rename` on this system, with
 a good amount of free disk space (df -h reports 62 GB remain). I added
 enospc_debug (full umount/mount, not just mount -o remount), but this
 had no apparent effect when receiving ENOSPC from userland.

 $ sudo btrfs fi df /path/to/volume
 Data, single: total=489.97GiB, used=427.75GiB
 System, DUP: total=8.00MiB, used=60.00KiB
 System, single: total=4.00MiB, used=0.00
 Metadata, DUP: total=5.00GiB, used=4.50GiB
 Metadata, single: total=8.00MiB, used=0.00
 unknown, single: total=512.00MiB, used=820.00KiB

 After a thorough search of the internet for ENOSPC BTRFS I found
 various resources and came to understand a little bit more. One thing
 which broke my intuition severely is that I expected if there is a
 large number of free GiB, I should expect things to continue to work.

 In this case, for example, metadata has 0.5GiB free (sounds like
 plenty for metadata for one mkdir to me). Data has 62GiB free. Why
 would I get ENOSPC for a file rename?

 I expected that if metadata needed more space, it would just eat it
 from the 'data'. Now I believe this not to be the case and that it
 wanted to allocate  0.5GiB, and this is why I was getting ENOSPC.

 I tried a rebalance with btrfs balance start -dusage=10 and tried
 increasing the value until I saw reallocations in dmesg.

 This spat out a large number of messages in dmesg, of this form:

 [376096.546353] BTRFS info (device dm-0): relocating block group 
 530457821184 flags 1
 [376010.736879] BTRFS info (device dm-0): 40 enospc errors during balance

 (and a full stack trace at the end of this message).

 The rebalance printed:

 ERROR: error during balancing '/path/to/volume' - No space left on device
 There may be more info in syslog - try dmesg | tail

 Eventually, not knowing what else to do I had to take my escape hatch
 and enlarge the volume. When I did this, metadata grew by 1GiB:

 Data, single: total=490.97GiB, used=427.75GiB
 System, DUP: total=8.00MiB, used=60.00KiB
 System, single: total=4.00MiB, used=0.00
 Metadata, DUP: total=5.50GiB, used=4.50GiB
 Metadata, single: total=8.00MiB, used=0.00
 unknown, single: total=512.00MiB, used=0.00

 A few questions:

 * Why didn't the metadata grow before enlarging the disk?
 * Why didn't the rebalance enable the metadata to grow?
 * Why is it necessary to rebalance? Can't it automatically take some
 free space from 'data'?
 * Are my machine lockups related to the fact I was low on space?
 * Can we improve the documentation/FAQ for this? I was scratching my
 head in particular because my notion of free space definitely does not
 match up with BTRFS', and I didn't find the FAQ very helpful for
 getting out of this mess.
 * It isn't documented on the wiki what enospc_debug is supposed to do,
 so I couldn't tell whether I should have expected it to tell me
 anything in my circumstances.
 * What is the best course of action to take (other than enlarging the
 disk or deleting files) if I encounter this situation again?


Looking at this line:

 Data, single: total=489.97GiB, used=427.75GiB

I see that btrfs has allocated almost the entire disk to Data, and it
appears you are starved for Metadata room.

Once btrfs allocates space for either Data or Metadata, there are
currently no built-in kernel mechanisms to re-allocate that space.  We
have to use the userland balance tools.

I agree that this behavior can become a gotcha.  Btrfs has the
capability to run in a mode where Data and Metadata are combined, but
there is a speed penalty running in Mixed Data/Metadata mode.

The btrfs balance tools have the ability to use filters to run a
quicker pass on just the mostly-empty block groups, skipping a full
balance.

https://btrfs.wiki.kernel.org/index.php/Balance_Filters

I would suggest this as the next step.
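
As I understand the Balance Filters page, a usage=N filter only
relocates chunks that are less than N percent full.  Here is a minimal
sketch of that threshold test.  It is not the kernel code; the struct
and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one allocated chunk (block group). */
struct chunk {
	uint64_t length;	/* bytes allocated to the chunk */
	uint64_t used;		/* bytes actually in use inside it */
};

/*
 * Assumed behaviour of a usage=N filter: a chunk is selected for
 * relocation when it is less than N percent full.
 */
static int usage_filter_matches(const struct chunk *c, unsigned int usage_pct)
{
	assert(c->used <= c->length);
	return c->used * 100 < c->length * (uint64_t)usage_pct;
}

/* Count how many chunks a balance with this usage filter would touch. */
static unsigned int count_matches(const struct chunk *chunks, unsigned int n,
				  unsigned int usage_pct)
{
	unsigned int hits = 0;

	for (unsigned int i = 0; i < n; i++)
		hits += usage_filter_matches(&chunks[i], usage_pct);
	return hits;
}
```

A low threshold touches only the nearly empty chunks, so the pass is
quick.  Raising it step by step relocates progressively fuller chunks,
which frees more space for new allocations at the cost of more I/O.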


Re: What to do about snapshot-aware defrag

2014-06-01 Thread Mitch Harder
On Sat, May 31, 2014 at 6:51 PM, Brendan Hide bren...@swiftspirit.co.za wrote:
 On 2014/05/31 12:00 AM, Martin wrote:

 OK... I'll jump in...

 On 30/05/14 21:43, Josef Bacik wrote:

 [snip]

 Option 1: Only relink inodes that haven't changed since the snapshot was
 taken.

 Pros:
 -Faster
 -Simpler
 -Less duplicated code, uses existing functions for tricky operations so
 less likely to introduce weird bugs.

 Cons:
 -Could possibly lose some of the snapshot-awareness of the defrag.  If
 you just touch a file we would not do the relinking and you'd end up
 with twice the space usage.

 [...]


 Obvious way to go for fast KISS.


 I second this - KISS is better.

 Would in-band dedupe resolve the issue with losing the snapshot-awareness
 of the defrag? I figure that if someone absolutely wants everything deduped
 efficiently they'd put in the necessary resources (memory/dedicated SSD/etc)
 to have in-band dedupe work well.

 One question:

 Will option one mean that we always need to mount with noatime or
 read-only to allow snapshot defragging to do anything?



When snapshot-aware defrag first came out, I was convinced it was a
must-have capability for nearly everybody using btrfs.  But, the
more I look at my work load and common practices with btrfs, the more
I am wondering just how often snapshot-aware defrag was actually doing
something for me.

I use a lot of snapshots.  But for the most part, once I touch a file
in my current subvolume, the whole file needs to be COW-ed from its
previous version.

Now that we have a working sysfs, I wonder if we could implement some
counters to track how often snapshot-aware defrag would have run.  I
might be surprised at how much it was doing.

---
Regards,
Mitch Harder


Re: raid0 vs single, and should we allow -mdup by default on SSDs?

2014-05-07 Thread Mitch Harder
On Wed, May 7, 2014 at 3:52 AM, Marc MERLIN m...@merlins.org wrote:
 On Wed, May 07, 2014 at 09:29:41AM +0100, Hugo Mills wrote:
 On Wed, May 07, 2014 at 01:18:40AM -0700, Marc MERLIN wrote:
  On Tue, May 06, 2014 at 07:39:12PM +, Duncan wrote:
   That appears to be a very good use of either -d raid0 or -d single, yes.
   And since you're apparently not streaming such high resolution video that
   you NEED the raid0, single does indeed give you a somewhat better chance
   at recovery.
 
  zoneminder saves 'video' as a stream of independent small jpegs, so I'm
  good. Actually come to think of it they're so small that they probably
  all ended up in the raid1 metadata. That also means that I'm not getting
  twice the storage space like I planned to. Oh well...

There's a mount option to change the threshold at which files are
 inlined in metadata: maxinline=bytes. You could play with that for
 this particular use-case.

 Oh cool, thank you.


Since each non-inlined file will occupy a minimum of 4k, you may find
that inlining will still save space even if it is duplicated.

Even if they are duplicated in the metadata under RAID1, inlining a
bunch of 256 byte files will still be more space efficient than
storing them as regular files.

But if most of the files are in the 2k-3k range, it may be more
efficient to store them as regular files.
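
The trade-off can be checked with some rough arithmetic.  The sketch
below is only an approximation: it assumes a 4096-byte block size,
counts an inlined file once per metadata copy, and ignores item headers
and the extent bookkeeping a regular file also needs.  The function
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to a whole number of blocks; block_size must be non-zero. */
static uint64_t round_up_to_block(uint64_t size, uint64_t block_size)
{
	assert(block_size > 0);
	return (size + block_size - 1) / block_size * block_size;
}

/* Approximate bytes consumed when the file body is inlined in metadata. */
static uint64_t inline_cost(uint64_t file_size, unsigned int metadata_copies)
{
	return file_size * metadata_copies;
}

/* Approximate bytes consumed when the file body lives in a data extent. */
static uint64_t extent_cost(uint64_t file_size, uint64_t block_size,
			    unsigned int data_copies)
{
	return round_up_to_block(file_size, block_size) * data_copies;
}
```

With two metadata copies (RAID1) and one data copy, a 256-byte file
costs about 512 bytes inlined versus 4096 bytes as an extent, while a
3072-byte file costs about 6144 bytes inlined versus 4096 bytes as an
extent.  Under these assumptions 2048 bytes is the break-even point.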


Re: How to view transaction log chronologically, human-readable?

2014-04-20 Thread Mitch Harder
On Sat, Apr 19, 2014 at 2:45 PM, Marcel Partap mpar...@gmx.net wrote:
 This is the BTRFS development list, right? Someone here should know how
 to achieve this I hope?
 #Regards

 On 01/03/14 02:21, Marcel Partap wrote:
 Dear BTFRS devs,
 I have a 1TB btrfs volume mounted read-only since two years because I
 deleted a bunch of files and didn't want to give up on them.
 Now with latest btrfs-find-root and btrfs restore --dry-run -t in a
 loop, I generated the full list of files contained in the last several
 hundred root trees. However, diffing these, I find the current one being
 the same until 94 root trees back, and the ones before contain earlier
 changes. Maybe by my own fault that is..whatever.

 Is there a way to just view the transaction history in a human-readable way?

 #Regards


I am not a dev, but since BTRFS utilizes a COW (Copy On Write)
architecture, it doesn't keep a journal or history of transactions
that can be unwound.

With respect to un-deleting files on BTRFS, the btrfs-find-root/'btrfs
restore' combination are the most effective user-space tools I know
of.

It sounds like you've effectively tried this manually, but here's a
link to a btrfs undelete script that also makes use of
btrfs-find-root and 'btrfs restore':

http://comments.gmane.org/gmane.comp.file-systems.btrfs/22560


Re: [PATCH 3.15-rc2] btrfs: replace error code from btrfs_drop_extents

2014-04-15 Thread Mitch Harder
On Tue, Apr 15, 2014 at 11:50 AM, David Sterba dste...@suse.cz wrote:
 There's a case which clone does not handle and used to BUG_ON instead,
 (testcase xfstests/btrfs/035), now returns EINVAL. This error code is
 confusing to the ioctl caller, as it normally signifies errorneous
 arguments.

 Change it to ENOPNOTSUPP which allows a fall back to copy instead of
 clone. This does not affect the common reflink operation.


Minor spelling error in the commit message, you clearly mean
EOPNOTSUPP, not ENOPNOTSUPP.


[PATCH] [RFC] btrfs-progs: Expand BUG_ON/WARN_ON Macros

2014-02-25 Thread Mitch Harder
I'm providing this patch as an example of how to expand the
BUG_ON/WARN_ON macros to provide more information or extra
capabilities.

Josef Bacik has been working with a user on IRC
to recover data from a btrfs volume, and the 'work-in-progress'
solution involved expanding the BUG_ON/WARN_ON macros in a
different way that would lose the information on where
the BUG_ON/WARN_ON occurred.

When the macro is structured like this patch, it will still
provide the location of the BUG_ON/WARN_ON in the code.

This patch also highlights that BUG_ON and WARN_ON are the
same thing in btrfs-progs.  All WARN_ONs are treated the same
as BUG_ONs, and the program is halted.

Should we convert all our btrfs-progs WARN_ONs to BUG_ONs to
allow us to implement a true WARN_ON functionality?

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 kerncompat.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kerncompat.h b/kerncompat.h
index f370cd8..79661f5 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -233,9 +233,19 @@ static inline long IS_ERR(const void *ptr)
 #define kstrdup(x, y) strdup(x)
 #define kfree(x) free(x)
 
-#define BUG_ON(c) assert(!(c))
-#define WARN_ON(c) assert(!(c))
+#define BUG_ON(c) do { \
+	if (c) { \
+		fprintf(stderr, "BUG_ON!\n"); \
+		assert(!(c)); \
+	} \
+} while (0)
 
+#define WARN_ON(c) do { \
+	if (c) { \
+		fprintf(stderr, "WARN_ON!\n"); \
+		assert(!(c)); \
+	} \
+} while (0)
 
 #define container_of(ptr, type, member) ({  \
 	const typeof( ((type *)0)->member ) *__mptr = (ptr);\
-- 
1.8.3.2
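
For discussion, a WARN_ON that reports and then continues might look
something like the sketch below.  This is not part of the patch, and
warn_on_report and warn_on_hits are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter so callers can see how often a warning fired. */
static unsigned int warn_on_hits;

/* Report the failing condition and its location, then let execution continue. */
static int warn_on_report(int cond, const char *expr, const char *file, int line)
{
	if (cond) {
		fprintf(stderr, "WARN_ON(%s) at %s:%d\n", expr, file, line);
		warn_on_hits++;
	}
	return cond;
}

/* Evaluates c once; yields 1 if the warning fired, 0 otherwise. */
#define WARN_ON(c) warn_on_report(!!(c), #c, __FILE__, __LINE__)
```

Because the macro expands at the call site, __FILE__ and __LINE__ still
point at the code that tripped the warning, and the return value lets a
caller write "if (WARN_ON(cond)) goto out;" the way kernel code does.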



Re: [PATCH] Btrfs-progs: fsck: fix wrong return value in check_block()

2014-02-25 Thread Mitch Harder
On Mon, Feb 24, 2014 at 7:38 PM, Wang Shilong
wangsl.f...@cn.fujitsu.com wrote:
 Hi Mitch,


 On 02/25/2014 07:03 AM, Mitch Harder wrote:

 On Mon, Feb 24, 2014 at 5:55 AM, Wang Shilong
 wangsl.f...@cn.fujitsu.com wrote:

 We found btrfsck will output backrefs mismatch while the filesystem
 is defenitely ok.

 The problem is that check_block() don't return right value,which
 makes btrfsck won't walk all tree blocks thus we don't get a consistent
 filesystem, we will fail to check extent refs etc.

 Reported-by: Gui Hecheng guihc.f...@cn.fujitsu.com
 Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
 ---
   cmds-check.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/cmds-check.c b/cmds-check.c
 index a2afae6..253569f 100644
 --- a/cmds-check.c
 +++ b/cmds-check.c
 @@ -2477,7 +2477,7 @@ static int check_block(struct btrfs_trans_handle
 *trans,
  struct cache_extent *cache;
  struct btrfs_key key;
  enum btrfs_tree_block_status status;
 -   int ret = 1;
 +   int ret = 0;
  int level;

  cache = lookup_cache_extent(extent_cache, buf->start, buf->len);
 --

 I tried this fix on a broken btrfs volume I've been trying to repair,
 and it seemed to put me in an infinite loop.

 I agree that something seems wrong with the way the caller of
 check_block uses the return value, and I also noticed that it seemed
 to exit before walking all the tree blocks.

 But I think the problem is more subtle than flipping the default ret
 value from 1 to 0.

 No, not really even though i know there are other problems with fsck repair
 mode.
 But this problem should be fixed and pushed into btrfs-progsv3.13.(Notice,
 the below problem did not exist in btrfs-progsv3.12)

 An easy way to trigger this problem:

 # mkfs.btrfs -f /dev/sda9
 # mount /dev/sda9 /mnt
 # dd if=/dev/zero of=/mnt/data bs=4k count=10240 oflag=direct
 # btrfs sub snapshot /mnt /mnt/snap1
 # btrfs sub snapshot /mnt /mnt/snap2
 # umount /mnt
 # btrfs check /dev/sda9

 After applying this patch, the above problems did not exist.
 Feel free to correct me if i miss something here.^_^


I took a closer look at the check_block function today, and it looks
to me like the problem is that the return value is not modified when
BTRFS_BLOCK_FLAG_FULL_BACKREF is set.

@@ -2521,14 +2521,17 @@ static int check_block(struct btrfs_trans_handle *trans,
 		}
 	} else {
 		rec->content_checked = 1;
-		if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)
+		if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) {
 			rec->owner_ref_checked = 1;
+			ret = 0;
+		}
 		else {
 			ret = check_owner_ref(root, rec, buf);
 			if (!ret)
 				rec->owner_ref_checked = 1;
 		}

For me, in this function I would lean towards an initial return value
that must be updated by having check_block() make an affirmative
PASS/FAIL decision on the block.

What do you think about something like this?

diff --git a/cmds-check.c b/cmds-check.c
index ffc5d3e..55070da 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2477,7 +2477,7 @@ static int check_block(struct btrfs_trans_handle *trans,
 	struct cache_extent *cache;
 	struct btrfs_key key;
 	enum btrfs_tree_block_status status;
-	int ret = 1;
+	int ret = -EINVAL;
 	int level;

 	cache = lookup_cache_extent(extent_cache, buf->start, buf->len);
@@ -2521,14 +2521,17 @@ static int check_block(struct btrfs_trans_handle *trans,
 		}
 	} else {
 		rec->content_checked = 1;
-		if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)
+		if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) {
 			rec->owner_ref_checked = 1;
+			ret = 0;
+		}
 		else {
 			ret = check_owner_ref(root, rec, buf);
 			if (!ret)
 				rec->owner_ref_checked = 1;
 		}
 	}
+	BUG_ON(ret == -EINVAL);
 	if (!ret)
 		maybe_free_extent_rec(extent_cache, rec);
 	return ret;
--
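
Stripped of the btrfs details, the idea is a sentinel initial value
that every branch has to overwrite.  A stand-alone sketch of the
pattern follows.  check_owner_stub and check_block_sketch are
hypothetical stand-ins, not the real cmds-check.c functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the per-block check; 0 means it passed. */
static int check_owner_stub(int owner_ok)
{
	return owner_ok ? 0 : 1;
}

/*
 * ret starts at a sentinel that no branch is supposed to return.
 * Every path must overwrite it with an explicit pass (0) or fail
 * (non-zero) decision.
 */
static int check_block_sketch(int full_backref, int owner_ok)
{
	int ret = -EINVAL;

	if (full_backref)
		ret = 0;				/* explicit PASS */
	else
		ret = check_owner_stub(owner_ok);	/* explicit PASS or FAIL */

	assert(ret != -EINVAL);	/* a branch that forgot to decide trips here */
	return ret;
}
```

The benefit over a default of 0 or 1 is that a path which never sets
ret fails loudly instead of silently reporting a pass or a fail.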


Re: [PATCH] Btrfs-progs: fsck: fix wrong return value in check_block()

2014-02-24 Thread Mitch Harder
On Mon, Feb 24, 2014 at 5:55 AM, Wang Shilong
wangsl.f...@cn.fujitsu.com wrote:
 We found btrfsck will output backrefs mismatch while the filesystem
 is defenitely ok.

 The problem is that check_block() don't return right value,which
 makes btrfsck won't walk all tree blocks thus we don't get a consistent
 filesystem, we will fail to check extent refs etc.

 Reported-by: Gui Hecheng guihc.f...@cn.fujitsu.com
 Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
 ---
  cmds-check.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/cmds-check.c b/cmds-check.c
 index a2afae6..253569f 100644
 --- a/cmds-check.c
 +++ b/cmds-check.c
 @@ -2477,7 +2477,7 @@ static int check_block(struct btrfs_trans_handle *trans,
 struct cache_extent *cache;
 struct btrfs_key key;
 enum btrfs_tree_block_status status;
 -   int ret = 1;
 +   int ret = 0;
 int level;

 cache = lookup_cache_extent(extent_cache, buf->start, buf->len);
 --

I tried this fix on a broken btrfs volume I've been trying to repair,
and it seemed to put me in an infinite loop.

I agree that something seems wrong with the way the caller of
check_block uses the return value, and I also noticed that it seemed
to exit before walking all the tree blocks.

But I think the problem is more subtle than flipping the default ret
value from 1 to 0.


[PATCH] Btrfs: fix max_inline mount option

2014-02-13 Thread Mitch Harder
Currently, the only mount option for max_inline that has any effect is
max_inline=0.  Any other value that is supplied to max_inline will be
adjusted to a minimum of 4k.  Since max_inline has an effective maximum
of ~3900 bytes due to page size limitations, the current behaviour
only has meaning for max_inline=0.

This patch will allow the max_inline mount option to accept non-zero
values as indicated in the documentation.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 97cc241..e73c80e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -566,7 +566,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 				kfree(num);
 
 				if (info->max_inline) {
-					info->max_inline = max_t(u64,
+					info->max_inline = min_t(u64,
 						info->max_inline,
 						root->sectorsize);
 				}
-- 
1.8.3.2
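
To make the effect of the one-word change concrete, here is a small
stand-alone sketch of the two clamping behaviours.  The helper names
are made up for illustration and a 4096-byte sectorsize is assumed in
the examples; this is not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Before the patch: any non-zero request is raised to at least sectorsize. */
static uint64_t max_inline_before(uint64_t requested, uint64_t sectorsize)
{
	assert(sectorsize > 0);
	if (requested)
		requested = requested > sectorsize ? requested : sectorsize;
	return requested;
}

/* After the patch: a non-zero request is capped at sectorsize. */
static uint64_t max_inline_after(uint64_t requested, uint64_t sectorsize)
{
	assert(sectorsize > 0);
	if (requested)
		requested = requested < sectorsize ? requested : sectorsize;
	return requested;
}
```

With a 4096-byte sectorsize, max_inline=256 becomes 4096 under the old
logic and stays 256 under the new logic, while max_inline=8192 is capped
to 4096 by the new logic.  A value of 0 is left alone in both cases.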



[PATCH] btrfs-progs: Remove superfluous BUG_ON check.

2014-02-11 Thread Mitch Harder
The function call that set the ret parameter evaluated in this
BUG_ON was removed in a previous commit:
11be10f71e1af5256f221feb9e91300b3e28bbef
Btrfs-progs: make fsck fix certain file extent inconsistencies

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 cmds-check.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index eef7c6c..ffc5d3e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4037,7 +4037,6 @@ static int run_next_block(struct btrfs_trans_handle *trans,
 				parent, owner, key.objectid, key.offset -
 				btrfs_file_extent_offset(buf, fi), 1, 1,
 				btrfs_file_extent_disk_num_bytes(buf, fi));
-			BUG_ON(ret);
 		}
 	} else {
 		int level;
-- 
1.8.3.2



[PATCH] btrfs-progs: Change BUG() to use assert.

2014-02-07 Thread Mitch Harder
Change the definition of BUG() to use assert instead of abort to
provide information about the location of the issue.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 kerncompat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kerncompat.h b/kerncompat.h
index 1fc2b34..f370cd8 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -50,7 +50,7 @@
 #define ULONG_MAX   (~0UL)
 #endif
 
-#define BUG() abort()
+#define BUG() assert(0)
 #ifdef __CHECKER__
 #define __force    __attribute__((force))
 #define __bitwise__ __attribute__((bitwise))
-- 
1.8.3.2



[PATCH] btrfs-progs: Preserve process_one_leaf return value.

2014-02-07 Thread Mitch Harder
The return value in process_one_leaf could be over-written while
looping over the items in the leaf.

This patch will preserve a non-zero return value to the calling
function if a non-zero return value is encountered in the loop.

The return value of one (1) is consistent with non-zero values
that could be returned while processing the leaf.

The only caller of this function (walk_down_tree) would ignore
the return value anyway.  But this patch will correct the
behaviour in case future changes intend to utilize the return
value.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 cmds-check.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 2911af0..eef7c6c 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -1219,6 +1219,7 @@ static int process_one_leaf(struct btrfs_root *root, struct extent_buffer *eb,
 	u32 nritems;
 	int i;
 	int ret = 0;
+	int error = 0;
 	struct cache_tree *inode_cache;
 	struct shared_node *active_node;
 
@@ -1268,8 +1269,10 @@ static int process_one_leaf(struct btrfs_root *root, struct extent_buffer *eb,
 		default:
 			break;
 		};
+		if (ret != 0)
+			error = 1;
 	}
-	return ret;
+	return error;
 }
 
 static void reada_walk_down(struct btrfs_root *root,
-- 
1.8.3.2
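
The difference between the two loops is easiest to see outside of
btrfs-progs.  In this sketch process_item_stub and both loop functions
are hypothetical, and the item statuses are just integers, with 0
meaning success.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-item handler: returns the item's status, 0 on success. */
static int process_item_stub(int status)
{
	return status;
}

/* Overwriting loop: only the status of the last item survives. */
static int process_leaf_overwrite(const int *items, size_t n)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++)
		ret = process_item_stub(items[i]);
	return ret;
}

/* Latching loop: any non-zero status is remembered and reported as 1. */
static int process_leaf_latched(const int *items, size_t n)
{
	int ret = 0;
	int error = 0;

	for (size_t i = 0; i < n; i++) {
		ret = process_item_stub(items[i]);
		if (ret != 0)
			error = 1;
	}
	return error;
}
```

For statuses {0, -5, 0} the overwriting loop returns 0 and hides the
failure in the middle, while the latching loop returns 1.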



[PATCH] btrfs-progs: Convert BUG() to BUG_ON(1)

2014-02-06 Thread Mitch Harder
Convert the instances of BUG() to BUG_ON(1) to provide information
about the location of the abort.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 btrfs-debug-tree.c |  4 ++--
 ctree.c            | 20 ++++++++++----------
 ctree.h            |  2 +-
 disk-io.c          |  4 ++--
 extent-tree.c      |  6 +++---
 extent_io.c        |  2 +-
 file-item.c        |  4 ++--
 print-tree.c       |  8 ++++----
 volumes.c          |  4 ++--
 9 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/btrfs-debug-tree.c b/btrfs-debug-tree.c
index f37de9d..0180265 100644
--- a/btrfs-debug-tree.c
+++ b/btrfs-debug-tree.c
@@ -68,10 +68,10 @@ static void print_extents(struct btrfs_root *root, struct extent_buffer *eb)
 					     btrfs_node_ptr_generation(eb, i));
 		if (btrfs_is_leaf(next) &&
 		    btrfs_header_level(eb) != 1)
-			BUG();
+			BUG_ON(1);
 		if (btrfs_header_level(next) !=
 			btrfs_header_level(eb) - 1)
-			BUG();
+			BUG_ON(1);
 		print_extents(root, next);
 		free_extent_buffer(next);
 	}
diff --git a/ctree.c b/ctree.c
index 9e5b30f..7aab3b1 100644
--- a/ctree.c
+++ b/ctree.c
@@ -822,7 +822,7 @@ static int balance_level(struct btrfs_trans_handle *trans,
 	check_block(root, path, level);
 	if (orig_ptr !=
 	    btrfs_node_blockptr(path->nodes[level], path->slots[level]))
-		BUG();
+		BUG_ON(1);
 enospc:
 	if (right)
 		free_extent_buffer(right);
@@ -1425,9 +1425,9 @@ static int insert_ptr(struct btrfs_trans_handle *trans, struct btrfs_root
 	lower = path->nodes[level];
 	nritems = btrfs_header_nritems(lower);
 	if (slot > nritems)
-		BUG();
+		BUG_ON(1);
 	if (nritems == BTRFS_NODEPTRS_PER_BLOCK(root))
-		BUG();
+		BUG_ON(1);
 	if (slot != nritems) {
 		memmove_extent_buffer(lower,
 			      btrfs_node_key_ptr_offset(slot + 1),
@@ -2213,7 +2213,7 @@ split:
 	ret = 0;
 	if (btrfs_leaf_free_space(root, leaf) < 0) {
 		btrfs_print_leaf(root, leaf);
-		BUG();
+		BUG_ON(1);
 	}
 	kfree(buf);
 	return ret;
@@ -2311,7 +2311,7 @@ int btrfs_truncate_item(struct btrfs_trans_handle *trans,
 	ret = 0;
 	if (btrfs_leaf_free_space(root, leaf) < 0) {
 		btrfs_print_leaf(root, leaf);
-		BUG();
+		BUG_ON(1);
 	}
 	return ret;
 }
@@ -2337,7 +2337,7 @@ int btrfs_extend_item(struct btrfs_trans_handle *trans,
 
 	if (btrfs_leaf_free_space(root, leaf) < data_size) {
 		btrfs_print_leaf(root, leaf);
-		BUG();
+		BUG_ON(1);
 	}
 	slot = path->slots[0];
 	old_data = btrfs_item_end_nr(leaf, slot);
@@ -2374,7 +2374,7 @@ int btrfs_extend_item(struct btrfs_trans_handle *trans,
 	ret = 0;
 	if (btrfs_leaf_free_space(root, leaf) < 0) {
 		btrfs_print_leaf(root, leaf);
-		BUG();
+		BUG_ON(1);
 	}
 	return ret;
 }
@@ -2406,7 +2406,7 @@ int btrfs_insert_empty_items(struct btrfs_trans_handle *trans,
 
 	/* create a root if there isn't one */
 	if (!root->node)
-		BUG();
+		BUG_ON(1);
 
 	total_size = total_data + nr * sizeof(struct btrfs_item);
 	ret = btrfs_search_slot(trans, root, cpu_key, path, total_size, 1);
@@ -2425,7 +2425,7 @@ int btrfs_insert_empty_items(struct btrfs_trans_handle *trans,
 		btrfs_print_leaf(root, leaf);
 		printk("not enough freespace need %u have %d\n",
 		       total_size, btrfs_leaf_free_space(root, leaf));
-		BUG();
+		BUG_ON(1);
 	}
 
 	slot = path->slots[0];
@@ -2484,7 +2484,7 @@ int btrfs_insert_empty_items(struct btrfs_trans_handle *trans,
 
 	if (btrfs_leaf_free_space(root, leaf) < 0) {
 		btrfs_print_leaf(root, leaf);
-		BUG();
+		BUG_ON(1);
 	}
 
 out:
diff --git a/ctree.h b/ctree.h
index a9c67b2..101389b 100644
--- a/ctree.h
+++ b/ctree.h
@@ -1519,7 +1519,7 @@ static inline u32 btrfs_extent_inline_ref_size(int type)
 	if (type == BTRFS_EXTENT_DATA_REF_KEY)
 		return sizeof(struct btrfs_extent_data_ref) +
 		       offsetof(struct btrfs_extent_inline_ref, offset);
-	BUG();
+	BUG_ON(1);
 	return 0;
 }
 
diff --git a/disk-io.c b/disk-io.c
index e840177..2a6c68f 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -349,10 +349,10 @@ static int write_tree_block(struct btrfs_trans_handle *trans,
 		     struct extent_buffer *eb)
 {
 	if (check_tree_block(root, eb))
-		BUG();
+		BUG_ON(1

Re: [PATCH] btrfs-progs: Convert BUG() to BUG_ON(1)

2014-02-06 Thread Mitch Harder
On Thu, Feb 6, 2014 at 3:22 PM, David Sterba dste...@suse.cz wrote:
 On Thu, Feb 06, 2014 at 12:34:08PM -0600, Mitch Harder wrote:
 Convert the instances of BUG() to BUG_ON(1) to provide information
 about the location of the abort.

 kerncompat.h:

 #define BUG() abort()

 #define BUG_ON(c) assert(!(c))

 I'd rather fix the definition to do the same thing, that way no
 developer would need to know the difference (that actually exists only
 in the userspace tools, in kernel the two produce the same outout).


 david

Thanks for the feedback.

Changing the definition of BUG() in kerncompat.h will be much more concise.

I'll restructure the patch and resubmit it after I test it.


btrfs-endio-wri: page allocation failure

2014-01-16 Thread Mitch Harder
I received a btrfs page allocation failure on my 3.12.7 kernel which
is merged with Chris' for-linus branch for the 3.13_rc kernel.

I have several btrfs partitions mounted, but I believe this error is
on my btrfs root partition.

Several things were going on at the same time on this partition.  I
have a snapshot script creating and deleting snapshots of the root
partition.  I was also compiling an application, and running Firefox.

I know the snapshots may be a problem area.  The snapshot script is
currently running with about 550 snapshots of the root partition.  It
adds snapshots every 180 seconds, and removes the oldest snapshots
based on available disk space.

So far, I haven't encountered a crash.

Since this is my root partition, I'll have to reboot to check for corruption.

[111575.089533] btrfs-endio-wri: page allocation failure: order:4, mode:0x104050
[111575.089543] CPU: 1 PID: 14414 Comm: btrfs-endio-wri Tainted: G
C   3.12.7-git-local #1
[111575.089546] Hardware name: Dell Inc. OptiPlex 745
   /0WF810, BIOS 2.6.4  03/01/2010
[111575.089550]  00104050 88007484f6f8 81642878
88007f30eaf8
[111575.089556]  0001 88007484f788 810d27cd
81ca5d28
[111575.089561]  0010 8800fff0 810d4c86
8840
[111575.089566] Call Trace:
[111575.089578]  [81642878] dump_stack+0x46/0x58
[111575.089584]  [810d27cd] warn_alloc_failed+0x115/0x129
[111575.089589]  [810d4c86] ? drain_local_pages+0x16/0x18
[111575.089594]  [810d5145] __alloc_pages_nodemask+0x47a/0x84d
[111575.089620]  [a018dd01] ? balance_level+0x666/0x6e8 [btrfs]
[111575.089626]  [810d552f] __get_free_pages+0x17/0x44
[111575.089631]  [810e7e81] kmalloc_order_trace+0x2e/0x90
[111575.089637]  [8110b1fc] __kmalloc_track_caller+0x3f/0x12c
[111575.089653]  [a01f8e5c] ? ulist_add_merge+0xe6/0x153 [btrfs]
[111575.089659]  [810e401e] krealloc+0x57/0x91
[111575.089674]  [a01f8e5c] ulist_add_merge+0xe6/0x153 [btrfs]
[111575.089689]  [a01f7b8b] find_parent_nodes+0x494/0x57e [btrfs]
[111575.089705]  [a01f7d12] btrfs_find_all_roots+0x81/0xdc [btrfs]
[111575.089721]  [a01f8589] iterate_extent_inodes+0x12f/0x2c4 [btrfs]
[111575.089737]  [a01aec83] ? record_extent_backrefs+0xa7/0xa7 [btrfs]
[111575.089754]  [a01aec83] ? record_extent_backrefs+0xa7/0xa7 [btrfs]
[111575.089770]  [a01f87a2]
iterate_inodes_from_logical+0x84/0x9a [btrfs]
[111575.089787]  [a01aec3c] record_extent_backrefs+0x60/0xa7 [btrfs]
[111575.089804]  [a01b7515]
btrfs_finish_ordered_io+0x780/0x87d [btrfs]
[111575.089809]  [810d09cf] ? mempool_free_slab+0x17/0x19
[111575.089826]  [a01b7627] finish_ordered_fn+0x15/0x17 [btrfs]
[111575.089843]  [a01d3153] worker_loop+0x13d/0x4a2 [btrfs]
[111575.089860]  [a01d3016] ? btrfs_queue_worker+0x267/0x267 [btrfs]
[111575.089865]  [81053779] kthread+0xba/0xc2
[111575.089870]  [810536bf] ? kthread_freezable_should_stop+0x4d/0x4d
[111575.089875]  [81649dac] ret_from_fork+0x7c/0xb0
[111575.089879]  [810536bf] ? kthread_freezable_should_stop+0x4d/0x4d
[111575.089882] Mem-Info:
[111575.089884] DMA per-cpu:
[111575.089887] CPU0: hi:0, btch:   1 usd:   0
[111575.089890] CPU1: hi:0, btch:   1 usd:   0
[111575.089892] DMA32 per-cpu:
[111575.089895] CPU0: hi:  186, btch:  31 usd:  27
[111575.089897] CPU1: hi:  186, btch:  31 usd:   0
[111575.089904] active_anon:169762 inactive_anon:52853 isolated_anon:0
 active_file:115654 inactive_file:114252 isolated_file:0
 unevictable:0 dirty:1795 writeback:0 unstable:0
 free:22811 slab_reclaimable:19321 slab_unreclaimable:4379
 mapped:15644 shmem:10186 pagetables:1982 bounce:0
 free_cma:0
[111575.089918] DMA free:8264kB min:352kB low:440kB high:528kB
active_anon:1496kB inactive_anon:1556kB active_file:1680kB
inactive_file:1692kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:15992kB managed:15968kB mlocked:0kB
dirty:0kB writeback:0kB mapped:640kB shmem:300kB
slab_reclaimable:420kB slab_unreclaimable:128kB kernel_stack:24kB
pagetables:88kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
[111575.089920] lowmem_reserve[]: 0 1971 1971 1971
[111575.089932] DMA32 free:82980kB min:44700kB low:55872kB
high:67048kB active_anon:677552kB inactive_anon:209856kB
active_file:460936kB inactive_file:455316kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB present:2070524kB
managed:2022852kB mlocked:0kB dirty:7180kB writeback:0kB
mapped:61936kB shmem:40444kB slab_reclaimable:76864kB
slab_unreclaimable:17388kB kernel_stack:2032kB pagetables:7840kB
unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
[111575.089935] lowmem_reserve[]: 0 0 0 0
[111575.089940] DMA: 32*4kB (UEM) 32*8kB (UE) 11*16kB (U) 

Re: btrfs-endio-wri: page allocation failure

2014-01-16 Thread Mitch Harder
On Thu, Jan 16, 2014 at 8:03 PM, Mitch Harder
mitch.har...@sabayonlinux.org wrote:
 I received a btrfs page allocation failure on my 3.12.7 kernel which
 is merged with Chris' for-linus branch for the 3.13_rc kernel.

 I have several btrfs partitions mounted, but I believe this error is
 on my btrfs root partition.

 Several things were going on at the same time on this partition.  I
 have a snapshot script creating and deleting snapshots of the root
 partition.  I was also compiling an application, and running Firefox.

 I know the snapshots may be a problem area.  The snapshot script is
 currently running with about 550 snapshots of the root partition.  It
 adds snapshots every 180 seconds, and removes the oldest snapshots
 based on available disk space.

 So far, I haven't encountered a crash.

 Since this is my root partition, I'll have to reboot to check for corruption.


The partition still mounts, and so far I can access everything I
spot-check, but btrfsck is reporting the following errors:

Checking filesystem on /dev/sda3
UUID: 1050ccb5-58ae-4479-9e12-2230a7b0097a
checking extents
checking free space cache
checking fs roots
checking csums
There are no extents for csum range 2267451392-2267521024
Csum exists for 2267451392-2267521024 but there is no extent record
There are no extents for csum range 10636697600-10636836864
Csum exists for 10636697600-10636836864 but there is no extent record
found 5015120900 bytes used err is 2
total csum bytes: 10233048
total tree bytes: 2166587392
total fs tree bytes: 2043346944
total extent tree bytes: 108380160
btree space waste bytes: 483349153
file data blocks allocated: 93115641856
 referenced 99033673728
Btrfs v3.12


Re: btrfsck failes

2014-01-15 Thread Mitch Harder
On Mon, Jan 13, 2014 at 6:37 PM, Chris Murphy li...@colorremedies.com wrote:

 On Jan 13, 2014, at 3:58 PM, Holger Brandsmeier brandsme...@gmail.com wrote:

 Currently btrfsck fails to repair my partition, I get the output:

 [root@ho-think bholger]# btrfsck --repair /dev/sda5

 This is almost the last resort and you probably should be posting to the list 
 before using repair.



This is like saying:

Yes, btrfs does now have a working btrfsck, but only for the select
few who manage to get through on the mailing list for support.

I'd like to think that's not the case.


Re: Kernel BUG on Snapshot Deletion (3.11.0-rc5)

2013-08-23 Thread Mitch Harder
On Fri, Aug 23, 2013 at 3:48 AM, Stefan Behrens
sbehr...@giantdisaster.de wrote:
 On Wed, 21 Aug 2013 08:44:55 -0500, Mitch Harder wrote:
 I've had a hard time assembling a portable reproducer for this issue.

 I discovered that my reproducer was highly dependent on a local
 archive of out-of-date git kernel sources.  My efforts to reproduce
 the error with a portable set of scripts with publicly available
 kernel git sources weren't successful.

 It seems like this issue is related to a corner-case workload that is
 difficult to reproduce.

 So I've bisected the error I was seeing with my local script, and
 identified the following commit as triggering my issue:

 commit:3c64a1aba7cfcb04f79e76f859b3d0275d59
 Btrfs: cleanup: don't check the same thing twice
 https://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/fs/btrfs?h=for-linus&id=3c64a1aba7cfcb04

 I tested a kernel which reverted this change, and also added WARN_ON
 lines to provide a back trace.
 [...]
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index cd46e2c..a1091f7 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -2302,6 +2302,12 @@ static noinline int
 relink_extent_backref(struct btrfs_path *path,
  return 0;
  return PTR_ERR(root);
  }
 +if (btrfs_root_refs(&root->root_item) == 0) {
 +srcu_read_unlock(&fs_info->subvol_srcu, index);
 +/* parse ENOENT to 0 */
 +WARN_ON(1);
 +return 0;
 +}
 [...]
 [ 1616.886868] [ cut here ]
 [ 1616.886912] WARNING: at fs/btrfs/inode.c:2308 
 relink_extent_backref+0x103/0x721 [btrfs]()
 [ 1616.887050] Call Trace:
 [ 1616.887064] [8161a34a] dump_stack+0x19/0x1b
 [ 1616.887071] [8103035a] warn_slowpath_common+0x67/0x80
 [ 1616.887077] [8103038d] warn_slowpath_null+0x1a/0x1c
 [ 1616.887100] [a019ea82] relink_extent_backref+0x103/0x721
 [ 1616.887205] [a019f7e2] btrfs_finish_ordered_io+0x742/0x829

 Mitch,

 Thank you for this excellent work to find the cause of the issue. I've sent a 
 patch Btrfs: fix for patch cleanup: don't check the same thing twice and 
 would appreciate if you could repeat your test, just to make sure, because I 
 was never able to reproduce this issue myself.


Thanks.

I've tested my special workload with your patch on the latest
3.11_rc6 kernel, and the patch corrects the errors I was encountering.


Re: [PATCH 1/2] Btrfs: fix for patch cleanup: don't check the same thing twice

2013-08-23 Thread Mitch Harder
On Fri, Aug 23, 2013 at 4:03 AM, Miao Xie mi...@cn.fujitsu.com wrote:
 On fri, 23 Aug 2013 10:34:42 +0200, Stefan Behrens wrote:
 Mitch Harder noticed that the patch 3c64a1a mentioned in the subject
 line was causing a kernel BUG() on snapshot deletion.

 The patch was wrong. It did not handle cached roots correctly. The
 check for root_refs == 0 was removed everywhere where
 btrfs_read_fs_root_no_name() had been used to retrieve the root,
 because this check was already dealt with in
 btrfs_read_fs_root_no_name(). But in the case when the root was
 found in the cache, there was no such check.

 This patch adds the missing check in the case where the root is
 found in the cache.

 Reported-by: Mitch Harder mitch.har...@sabayonlinux.org
 Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
 ---
  fs/btrfs/disk-io.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 43ec3c6..7078554 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -1583,8 +1583,11 @@ struct btrfs_root *btrfs_read_fs_root_no_name(struct 
 btrfs_fs_info *fs_info,
   ERR_PTR(-ENOENT);
  again:
   root = btrfs_lookup_fs_root(fs_info, location->objectid);
 - if (root)
 + if (root) {
 + if (btrfs_root_refs(&root->root_item) == 0)
 + return ERR_PTR(-ENOENT);
   return root;
 + }

 It seems good to me.

 Reviewed-by: Miao Xie mi...@cn.fujitsu.com


   root = btrfs_read_fs_root(fs_info->tree_root, location);
   if (IS_ERR(root))


Tested-by: Mitch Harder mitch.har...@sabayonlinux.org
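
The fix described in the quoted patch can be modelled in a few lines of
userspace C. This is a simplified sketch, not the kernel code: struct
demo_root, demo_lookup_cached() and demo_read_root() are hypothetical names,
and an int error code stands in for the kernel's ERR_PTR() convention.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct btrfs_root. */
struct demo_root {
	unsigned long long objectid;
	unsigned int refs;	/* 0 means the root is being deleted */
};

/* Toy cache lookup: linear search by objectid. */
static struct demo_root *demo_lookup_cached(struct demo_root *cache,
					    size_t n,
					    unsigned long long objectid)
{
	for (size_t i = 0; i < n; i++)
		if (cache[i].objectid == objectid)
			return &cache[i];
	return NULL;
}

/* Returns 0 and sets *out on success, -ENOENT otherwise. */
static int demo_read_root(struct demo_root *cache, size_t n,
			  unsigned long long objectid,
			  struct demo_root **out)
{
	struct demo_root *root = demo_lookup_cached(cache, n, objectid);

	if (root) {
		/* The check the patch adds on the cached path. */
		if (root->refs == 0)
			return -ENOENT;
		*out = root;
		return 0;
	}
	/* The uncached (read from disk) path is omitted in this model. */
	return -ENOENT;
}
```

Without the refs check, a caller that finds a dying root in the cache would
get it back as if it were valid, which is the case the patch description
says was missed.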


Re: Question: How can I recover this partition? (unable to find logical $hugenum len 4096)

2013-08-22 Thread Mitch Harder
On Thu, Aug 22, 2013 at 1:47 AM, Nicholas Lee em...@nickle.es wrote:

 [   45.914275] [ cut here ]
 [   45.914406] kernel BUG at fs/btrfs/volumes.c:4417!
 [   45.914489] invalid opcode:  [#1] PREEMPT SMP

I can't say if this will fix your problem or not, but the 3.10.x
kernel has a patch to pass this error back instead of halting with a
BUG() at this point.


Re: Kernel BUG on Snapshot Deletion (3.11.0-rc5)

2013-08-15 Thread Mitch Harder
I'm running into a curious problem.

In the process of making my script portable, I am breaking the ability
to replicate the error.

I'm trying to isolate the aspect of my local script that is triggering
the error.  No firm insights yet.


On Tue, Aug 13, 2013 at 11:03 AM, Mitch Harder
mitch.har...@sabayonlinux.org wrote:
 Let me work on making that script more portable, and hopefully quicker
 to reproduce.

 On Tue, Aug 13, 2013 at 9:15 AM, Josef Bacik jba...@fusionio.com wrote:
 On Mon, Aug 12, 2013 at 11:06:27PM -0500, Mitch Harder wrote:
 I'm hitting a btrfs Kernel BUG running a snapshot stress script with
 linux-3.11.0-rc5.


 I can haz script?  Thanks,

 Josef


Re: Kernel BUG on Snapshot Deletion (3.11.0-rc5)

2013-08-13 Thread Mitch Harder
Let me work on making that script more portable, and hopefully quicker
to reproduce.

On Tue, Aug 13, 2013 at 9:15 AM, Josef Bacik jba...@fusionio.com wrote:
 On Mon, Aug 12, 2013 at 11:06:27PM -0500, Mitch Harder wrote:
 I'm hitting a btrfs Kernel BUG running a snapshot stress script with
 linux-3.11.0-rc5.


 I can haz script?  Thanks,

 Josef


Kernel BUG on Snapshot Deletion (3.11.0-rc5)

2013-08-12 Thread Mitch Harder
I'm hitting a btrfs Kernel BUG running a snapshot stress script with
linux-3.11.0-rc5.

I'm running with lzo compression, autodefrag, and the partition is
formatted with 16k leafsize/inodesize.

[   72.170431] device fsid 8a6be667-d041-4367-80f7-e4cb42356e85 devid
1 transid 4 /dev/sda7
[   72.297512] device fsid 8a6be667-d041-4367-80f7-e4cb42356e85 devid
1 transid 4 /dev/sda7
[   72.298928] device fsid 8a6be667-d041-4367-80f7-e4cb42356e85 devid
1 transid 4 /dev/sda7
[   72.299390] btrfs: setting 8 feature flag
[   72.299395] btrfs: force lzo compression
[   72.299401] btrfs: enabling auto defrag
[   72.299404] btrfs: disk space caching is enabled
[   72.299407] btrfs flagging fs with big metadata feature
[ 2234.790218] [ cut here ]
[ 2234.790257] WARNING: CPU: 0 PID: 4246 at fs/btrfs/extent-tree.c:840
btrfs_lookup_extent_info+0x328/0x36e [btrfs]()
[ 2234.790262] Modules linked in: ipv6 tg3 serio_raw ppdev
snd_hda_codec_analog iTCO_wdt iTCO_vendor_support snd_hda_intel floppy
snd_hda_codec sr_mod snd_hwdep pcspkr snd_pcm lpc_ich i2c_i801
parport_pc parport ptp snd_page_alloc pps_core snd_timer snd xts
ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 sha256_generic
fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3
jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd
ehci_pci ehci_hcd
[ 2234.790333] CPU: 0 PID: 4246 Comm: btrfs-cleaner Not tainted 3.11.0-rc5 #1
[ 2234.790337] Hardware name: Dell Inc. OptiPlex 745
  /0WF810, BIOS 2.6.4  03/01/2010
[ 2234.790341]  0348 880077739b68 81625def
0006
[ 2234.790349]   880077739ba8 810374f0
88000556e800
[ 2234.790356]  a0185d5c 88007721de10 88000556e800

[ 2234.790363] Call Trace:
[ 2234.790375]  [81625def] dump_stack+0x46/0x58
[ 2234.790384]  [810374f0] warn_slowpath_common+0x81/0x9b
[ 2234.790403]  [a0185d5c] ?
btrfs_lookup_extent_info+0x328/0x36e [btrfs]
[ 2234.790411]  [81037524] warn_slowpath_null+0x1a/0x1c
[ 2234.790429]  [a0185d5c]
btrfs_lookup_extent_info+0x328/0x36e [btrfs]
[ 2234.790449]  [a018837e] do_walk_down+0x142/0x438 [btrfs]
[ 2234.790467]  [a01860d4] ?
btrfs_delayed_refs_qgroup_accounting+0xbd/0xcc [btrfs]
[ 2234.790487]  [a018871a] walk_down_tree+0xa6/0xd4 [btrfs]
[ 2234.790507]  [a018aec3] btrfs_drop_snapshot+0x32d/0x65d [btrfs]
[ 2234.790531]  [a019b1df]
btrfs_clean_one_deleted_snapshot+0xda/0x103 [btrfs]
[ 2234.790552]  [a0193c0c] cleaner_kthread+0x130/0x157 [btrfs]
[ 2234.790573]  [a0193adc] ? transaction_kthread+0x1a0/0x1a0 [btrfs]
[ 2234.790580]  [810522bc] kthread+0xba/0xc2
[ 2234.790586]  [81052202] ? kthread_freezable_should_stop+0x52/0x52
[ 2234.790593]  [8162d89c] ret_from_fork+0x7c/0xb0
[ 2234.790599]  [81052202] ? kthread_freezable_should_stop+0x52/0x52
[ 2234.790604] ---[ end trace 21a428587abe0e9d ]---
[ 2234.790610] BTRFS error (device sda7): Missing references.
[ 2234.790637] [ cut here ]
[ 2234.790688] kernel BUG at fs/btrfs/extent-tree.c:7191!
[ 2234.790736] invalid opcode:  [#1] SMP
[ 2234.790779] Modules linked in: ipv6 tg3 serio_raw ppdev
snd_hda_codec_analog iTCO_wdt iTCO_vendor_support snd_hda_intel floppy
snd_hda_codec sr_mod snd_hwdep pcspkr snd_pcm lpc_ich i2c_i801
parport_pc parport ptp snd_page_alloc pps_core snd_timer snd xts
ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 sha256_generic
fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3
jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd
ehci_pci ehci_hcd
[ 2234.791005] CPU: 0 PID: 4246 Comm: btrfs-cleaner Tainted: G
W3.11.0-rc5 #1
[ 2234.791005] Hardware name: Dell Inc. OptiPlex 745
  /0WF810, BIOS 2.6.4  03/01/2010
[ 2234.791005] task: 88007c97c380 ti: 880077738000 task.ti:
880077738000
[ 2234.791005] RIP: 0010:[a01883be]  [a01883be]
do_walk_down+0x182/0x438 [btrfs]
[ 2234.791005] RSP: :880077739c58  EFLAGS: 00010296
[ 2234.791005] RAX: 002e RBX: 88000c6706c0 RCX: 0046
[ 2234.791005] RDX: 0006 RSI: 0046 RDI: 88007f20d210
[ 2234.791005] RBP: 880077739d18 R08: 0002 R09: fffe
[ 2234.791005] R10: 0001 R11: 81e2ee38 R12: 88002a930500
[ 2234.791005] R13: 88007721 R14: 88000556e800 R15: 0002
[ 2234.791005] FS:  () GS:88007f20()
knlGS:
[ 2234.791005] CS:  0010 DS:  ES:  CR0: 8005003b
[ 2234.791005] CR2: 7f312ced67bd CR3: 255e CR4: 07f0
[ 2234.791005] Stack:
[ 2234.791005]  88000c670708 a01860d4 
8800771df3c0
[ 2234.791005]  880077739c98 0001 0001

Re: lz4 status?

2013-06-30 Thread Mitch Harder
There's been a parallel effort to incorporate a general set of lz4
patches in the kernel.

I see these patches are currently queued up in the linux-next tree, so
we may see them in the 3.11 kernel.

It looks like lz4 and lz4hc will be provided.

So, instead of btrfs having its own implementation of lz4, the
patches will be re-worked around the kernel's new lz4 library.

On Wed, Jun 26, 2013 at 10:57 AM, Roger Pack rogerpack2...@gmail.com wrote:
 Any update on the unmerged lz4 patches? Have they been merged?
 Just wondering (and +1'ing my support, obviously).
 Thank you.
 -roger-


btrfs-cleaner Blocked on xfstests 068

2013-06-09 Thread Mitch Harder
I'm running into a problem with the btrfs-cleaner thread becoming
blocked on xfstests 068.

The test locks up indefinitely without completing (normally it
finishes in about 45 seconds on my test box).

I've replicated the issue on 3.10.0_rc5 and the for-linus branch of 3.9.0.

I ran a git bisect on the 3.9.0 for-linus branch, and tracked my issue
to the following commit:

commit 9d1a2a3ad59f7ae810bf04a5a05995bf2d79300c
btrfs: clean snapshots one by one

The 068 test uses the scratch drive, so I believe xfstests is using
the defaults for formatting the device, which is a physical partition
on my SATA drive.

My mount settings for xfstests are:
export MOUNT_OPTIONS="-o compress-force=lzo,autodefrag"

There are no errors shown in dmesg.  Here is the result of Alt-SysRq-W
to show the blocked states:

[  413.408168] SysRq : Show Blocked State
[  413.409157]   taskPC stack   pid father
[  413.409157] btrfs-cleaner   D 88007827c308 0  4516  2 0x
[  413.409157]  8800785d5d18 0046 8800785d5c58
8800785d5fd8
[  413.409157]  4000 00012c80 88007ca7a210
88007cbd4af0
[  413.409157]  88007cbd4b60 88007cbd4b38 88007f312cf0
0001
[  413.409157] Call Trace:
[  413.409157]  [8105e30f] ? dequeue_entity+0x34e/0x370
[  413.409157]  [81118593] ? find_inode+0x93/0xbe
[  413.409157]  [8161e3e4] schedule+0x64/0x66
[  413.409157]  [8110525c] __sb_start_write+0x9a/0xf0
[  413.409157]  [8104e2d7] ? remove_wait_queue+0x3a/0x3a
[  413.409157]  [a019018e] btrfs_run_defrag_inodes+0x20a/0x327 [btrfs]
[  413.409157]  [a017a6c1] cleaner_kthread+0x95/0x122 [btrfs]
[  413.409157]  [a017a62c] ? transaction_kthread+0x1a0/0x1a0 [btrfs]
[  413.409157]  [8104da7c] kthread+0xba/0xc2
[  413.409157]  [8104d9c2] ? kthread_freezable_should_stop+0x52/0x52
[  413.409157]  [8161fd1c] ret_from_fork+0x7c/0xb0
[  413.409157]  [8104d9c2] ? kthread_freezable_should_stop+0x52/0x52
[  413.409157] fsstressD 88007827c308 0  4730   4729 0x
[  413.409157]  88007717fd58 0082 88007717fe18
88007717ffd8
[  413.409157]  4000 00012c80 81c11410
88007caf2fb0
[  413.409157]  88007717fca8 8111bb58 
88007717fe58
[  413.409157] Call Trace:
[  413.409157]  [8111bb58] ? mntput_no_expire+0x40/0x11b
[  413.409157]  [8110bca6] ? complete_walk+0x92/0xda
[  413.409157]  [8161e3e4] schedule+0x64/0x66
[  413.409157]  [8110525c] __sb_start_write+0x9a/0xf0
[  413.409157]  [8104e2d7] ? remove_wait_queue+0x3a/0x3a
[  413.409157]  [8111bf5d] mnt_want_write+0x24/0x4b
[  413.409157]  [8110e5dd] kern_path_create+0x6d/0x13f
[  413.409157]  [810f7fdc] ? kmem_cache_alloc+0x31/0xf8
[  413.409157]  [8110cb01] ? getname_flags+0x74/0x158
[  413.409157]  [8110e6ee] user_path_create+0x3f/0x57
[  413.409157]  [81110a38] SyS_symlinkat+0x4a/0xc0
[  413.409157]  [81001f1b] ? do_notify_resume+0x5a/0x61
[  413.409157]  [81110ac4] SyS_symlink+0x16/0x18
[  413.409157]  [8161fdc6] system_call_fastpath+0x1a/0x1f
[  413.409157] fsstressD 88007827c308 0  4731   4729 0x
[  413.409157]  880077181e68 0082 8800785c2000
880077181fd8
[  413.409157]  4000 00012c80 88007caf1470
88007caf0da0
[  413.409157]  0001 0001 880077181da8
8110cc21
[  413.409157] Call Trace:
[  413.409157]  [8110cc21] ? putname+0x28/0x31
[  413.409157]  [81110553] ? user_path_at_empty+0x61/0x92
[  413.409157]  [811075b4] ? inode_get_bytes+0x1a/0x3a
[  413.409157]  [811075b4] ? inode_get_bytes+0x1a/0x3a
[  413.409157]  [8161e3e4] schedule+0x64/0x66
[  413.409157]  [8110525c] __sb_start_write+0x9a/0xf0
[  413.409157]  [8104e2d7] ? remove_wait_queue+0x3a/0x3a
[  413.409157]  [8110386f] vfs_write+0xc2/0x18f
[  413.409157]  [81103c5b] SyS_write+0x50/0x78
[  413.409157]  [8161fdc6] system_call_fastpath+0x1a/0x1f
[  413.409157] xfs_io  D 0001 0  4750   4746 0x
[  413.409157]  88007726fd68 0086 88007c5a00b0
88007726ffd8
[  413.409157]  4000 00012c80 81c11410
88007caf6d00
[  413.409157]  88007726fca8 810c072e 8800767396c0
88007722ca50
[  413.409157] Call Trace:
[  413.409157]  [810c072e] ? unlock_page+0x24/0x28
[  413.409157]  [810dbd38] ? __do_fault+0x398/0x3cd
[  413.409157]  [8161e3e4] schedule+0x64/0x66
[  413.409157]  [8161ee9c] rwsem_down_write_failed+0xf7/0x14a
[  413.409157]  [8120d7f3] call_rwsem_down_write_failed+0x13/0x20
[  413.409157]  [8161d555] ? down_write+0x2e/0x32
[  

Re: btrfs prof compile error on debian squeeze.

2013-04-10 Thread Mitch Harder
We had a discussion on this topic in another thread.

I'd be happy to be corrected, but I think the conclusion was that you
probably need to be on a really modern version of Linux to work with
the latest version of btrfs-progs that is in the kernel git
repository.

The mkfs.btrfs version in the kernel git tree won't even work
correctly on a kernel <= 3.7, and only partially works on the 3.8
kernel.

On 4/10/13, Wang Shilong wangshilong1...@gmail.com wrote:
 Hello,
 Maybe this url will help you.

 https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories

 Thanks,
 Wang
 Hello,

 I'm trying to build btrfs-prog on debian squeeze but when I'm trying to
 use make, I have an error :


 pc@debian:~/b/btrfs-progs$ make
[LD] mkfs.btrfs
 mkfs.o: In function `is_ssd':
 /home/pc/b/btrfs-progs/mkfs.c:1234: undefined reference to
 `blkid_probe_get_wholedisk_devno'
 collect2: ld returned 1 exit status
 make: *** [mkfs.btrfs] Erreur 1


 After a few searches over the internet, it seems that my blkid library is
 out of date. How can I compile btrfs prog on debian squeeze ?

 Thanks !


 Olivier.


Re: minimum kernel version for btrfsprogs.0.20?

2013-04-04 Thread Mitch Harder
On 4/3/13, Chris Murphy li...@colorremedies.com wrote:

 On Mar 29, 2013, at 9:42 AM, Mitch Harder mitch.har...@sabayonlinux.org
 wrote:

 On Fri, Mar 29, 2013 at 1:21 AM, Chris Murphy li...@colorremedies.com
 wrote:

 mkfs.btrfs -l 8192 with kernel 3.9.0 creates a file system mountable by
 3.9.0 and only 3.9.0 (so far). And while there's no error making such a
 file system with other kernels, they won't mount the resulting file
 system.


 I'm seeing something similar.

 Using the current master branch of btrfs-progs
 (https://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git,
 top commit 7854c8b667654502f69e05584729146a06827bc6 Btrfs-progs: give
 restore a list roots option), if I run 'mkfs.btrfs -f /dev/device'
 on a 3.7.x vintage kernel, the mkfs operation is successful, but I
 can't mount the partition.

 I am successful on a 3.8.x vintage kernel or testing the _rc code for
 3.9.

 If you try a leaf size other than default, it creates the file system but
 won't mount it, for any 3.8.x kernel I've tried including 3.8.5. Only 3.9.0
 kernels are apparently mounting leaf sizes above 4KB, if the fs was created
 with btrfs-progs-0.20.rc1.20130308git704a08c.


I disagree with cwillu regarding the default setting for extended inode refs.

While the extended inode refs are a great addition and solve a
long-standing problem, it appears only the 3.9.0_rc kernel consistently
works with extended inode refs.

There should be at least a few working kernel versions out there
before this becomes the default.  Options like this that will make
btrfs unmountable on older kernel versions need buy-in by the users.

There is still the capability to enable extended inode refs with btrfstune.


Re: minimum kernel version for btrfsprogs.0.20?

2013-03-29 Thread Mitch Harder
On Fri, Mar 29, 2013 at 1:21 AM, Chris Murphy li...@colorremedies.com wrote:
 Chris Murphy wrote:
 On Mar 29, 2013, at 12:04 AM, cwillu cwi...@cwillu.com wrote:

 commit 1a72afaa btrfs-progs: mkfs support for extended inode refs
 unconditionally enables extended irefs (which permits more than 4k
 links to the same inode).  It's the right default imo, but there
 probably should have been a mkfs option to disable it.

 mkfs.btrfs -l 8192

 That is not mountable by 3.8.5. I get:

 [  252.870733] btrfs: disk space caching is enabled
 [  252.870740] btrfs flagging fs with big metadata feature
 [  252.874944] btrfs: failed to recover relocation
 [  252.885186] btrfs: open_ctree failed

 That's definitely not expected.

 mkfs.btrfs -l 8192 with kernel 3.9.0 creates a file system mountable by 3.9.0 
 and only 3.9.0 (so far). And while there's no error making such a file system 
 with other kernels, they won't mount the resulting file system.


I'm seeing something similar.

Using the current master branch of btrfs-progs
(https://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git,
top commit 7854c8b667654502f69e05584729146a06827bc6 Btrfs-progs: give
restore a list roots option), if I run 'mkfs.btrfs -f /dev/device'
on a 3.7.x vintage kernel, the mkfs operation is successful, but I
can't mount the partition.

I am successful on a 3.8.x vintage kernel or testing the _rc code for 3.9.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: zlib vs lzo uncompress speed, ssd vs nossd

2013-03-27 Thread Mitch Harder
On Wed, Mar 27, 2013 at 11:53 AM, Marc MERLIN m...@merlins.org wrote:

 Is my feeling of slower boot wrong, or is zlib also noticeably slower than
 lzo to read and decompress?


Lzo compression should be faster in every aspect than zlib, especially
for reading.

But having said that, btrfs won't recompress any existing files just
because you switch your mount option from lzo to zlib.  Only newly
written files will be zlib, and btrfs will leave the lzo-compressed
files alone unless they are re-written, or you expressly recompress
them using the defrag tool.

If you were to take a snapshot of your root partition, and reboot to
the snapshot as the new root with zlib compression, you could make
some side-by-side comparisons of boot time to clarify your
impressions.


Re: zlib vs lzo uncompress speed, ssd vs nossd

2013-03-27 Thread Mitch Harder
On Wed, Mar 27, 2013 at 4:22 PM, Marc MERLIN m...@merlins.org wrote:
 On Wed, Mar 27, 2013 at 04:12:27PM -0500, Mitch Harder wrote:
 On Wed, Mar 27, 2013 at 11:53 AM, Marc MERLIN m...@merlins.org wrote:
 
  Is my feeling of slower boot wrong, or is zlib also noticeably slower than
  lzo to read and decompress?
 

 Lzo compression should be faster in every aspect than zlib, especially
 for reading.

 But having said that, btrfs won't recompress any existing files just
 because you switch your mount option from lzo to zlib.  Only newly
 written files will be zlib, and btrfs will leave the lzo-compressed
 files alone unless they are re-written, or you expressly recompress
 them using the defrag tool.

 That was my intent at the time, I thought that zlib decompression was about
 as fast as lzo, so it would have been good that most of my files stayed
 compressed as zlib.
 Turns out I was wrong :)

 If you were to take a snapshot of your root partition, and reboot to
 the snapshot as the new root with zlib compression, you could make
 some side-by-side comparisons of boot time to clarify your
 impressions.

 Fair point. By that, you mean defrag all my files somehow (recompressing as
 lzo, and doubling the size of my rootfs)?

 Also, I was re-reading ssd vs nossd:
 https://btrfs.wiki.kernel.org/index.php/Mount_options
 isn't clear whether these are read/write ordering optimizations, or
 filesystem layout optimization (i.e. you'd have to recreate the entire FS,
 and rewrite everything).

 http://www.phoronix.com/scan.php?page=article&item=btrfs_ssd_mode&num=1
 says 'However, unless disabling the write cache for the drive, the SSD mode
 does not necessarily mean better performance. In fact, as our results are
 about to show, the quantitative disk performance can drop greatly in the SSD
 mode when the write cache remains enabled'
 But that's from 2009, so not very relevant to today.

 Do you happen to know more than me on this?


I'm sorry, I have no experience with the ssd mount option.


Re: btrfs stuck on

2013-03-22 Thread Mitch Harder
On Thu, Mar 21, 2013 at 1:56 PM, Ask Bjørn Hansen a...@develooper.com wrote:
 Hello,

 A few weeks ago I replaced a ZFS backup system with one backed by btrfs. A 
 script loops over a bunch of hosts rsyncing them to each their own subvolume. 
  After each rsync I snapshot the host-specific subvolume.

 The disk is an iscsi disk that in my benchmarks performs roughly like a 
 local raid with 2-3 SATA disks.

 It worked fine for about a week (~150 snapshots from ~20 sub volumes) before 
 it suddenly exploded in disk io wait. Doing anything (in particular 
 changes) on the file system is just insanely slow, rsync basically can't 
 complete (an rsync that should take 10-20 minutes takes 24 hours; I have a 
 directory of 60k files I tried deleting and it's deleting one file every few 
 minutes, that sort of thing).

 I am using 3.8.2-206.fc18.x86_64 (Fedora 18). I tried rebooting, it doesn't 
 make a difference. As soon as I boot [btrfs-cleaner] and 
 [btrfs-transacti] gets really busy.

 I wonder if it's because I deleted a few snapshots at some point?

 The file system is mounted with -o compress=zlib,noatime

 # mount | grep tank
 /dev/sdc on /tank type btrfs 
 (rw,noatime,seclabel,compress=zlib,space_cache,_netdev)

 I don't recall mounting it with space_cache; though I don't think that's the 
 default so I wonder if I did do that at some point. Could that be what's 
 messing me up?

 btrfs-cleaner stack:

 # cat /proc/1117/stack
 [a022598a] btrfs_commit_transaction+0x36a/0xa70 [btrfs]
 [a022677f] start_transaction+0x23f/0x460 [btrfs]
 [a0226cb8] btrfs_start_transaction+0x18/0x20 [btrfs]
 [a021487f] btrfs_drop_snapshot+0x3ef/0x5d0 [btrfs]
 [a0226e1f] btrfs_clean_old_snapshots+0x9f/0x120 [btrfs]
 [a021eda9] cleaner_kthread+0xa9/0x120 [btrfs]
 [81081f90] kthread+0xc0/0xd0
 [816584ac] ret_from_fork+0x7c/0xb0
 [] 0x


 btrfs-transaction stack:

 #  cat /proc/1118/stack
 [a0256b35] btrfs_tree_read_lock+0x95/0x110 [btrfs]
 [a020033b] btrfs_read_lock_root_node+0x3b/0x50 [btrfs]
 [a0205649] btrfs_search_slot+0x3f9/0x7a0 [btrfs]
 [a020be5e] lookup_inline_extent_backref+0x8e/0x4d0 [btrfs]
 [a020dd38] __btrfs_free_extent+0xc8/0x870 [btrfs]
 [a0211f29] run_clustered_refs+0x459/0xb50 [btrfs]
 [a0215e48] btrfs_run_delayed_refs+0xc8/0x2f0 [btrfs]
 [a02256a6] btrfs_commit_transaction+0x86/0xa70 [btrfs]
 [a021e7c5] transaction_kthread+0x1a5/0x220 [btrfs]
 [81081f90] kthread+0xc0/0xd0
 [816584ac] ret_from_fork+0x7c/0xb0
 [] 0x


 Thank you for reading this far. Any suggestions would be most appreciated!


The space_cache option is probably not the issue.  It gets activated
by default, so it will show up even if you never asked for it.

The cleaner runs to remove deleted snapshots.  Responsiveness while
the cleaner is running has been an issue that has come up, but it is
usually just an inconvenience.  I can't recall hearing about a
slowdown of this degree while the cleaner is running.

I haven't noticed many discussions on the Btrfs mailing list where
Btrfs is used in the context of iSCSI, so you may be seeing new issues
in your use case.

If you can, it would be interesting to know how well the cleaner runs
across iSCSI if nothing else is running.  If you could delete a single
snapshot, and make note of the space used before and after the cleaner
finishes and the time required, this might help isolate the issue.

As a work-around, I would suggest using a script to delete the files
in the subvolume before removing the snapshot.  This way, you will
have more control over the priority given to the deletion process.
Once the subvolume is empty, the cleaner usually runs much better.  :)


Re: Problems with compiling btrfs

2013-03-21 Thread Mitch Harder
On Thu, Mar 21, 2013 at 4:46 PM, Avi Miller avi.mil...@oracle.com wrote:
 Hi,

 On 22/03/2013, at 8:11 AM, Joseph Moore jap...@gmail.com wrote:

 [root@ol6 btrfs-progs]# uname -a
 Linux ol6.localdomain 2.6.39-400.17.2.el6uek.x86_64 #1 SMP Wed Mar 13
 12:31:05 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux


 This is the currently shipping Oracle Linux 6 UEK and as such, doesn't 
 support a newer btrfs-progs. If you want to run a newer btrfs, you should 
 install the 3.8 kernel from our playground channel on public-yum.oracle.com 
 and then you can compile a newer btrfs-progs to match.

 I've also asked the playground build team to build a newer btrfs-progs RPM 
 for the playground channel, but I'm not sure what the timeframes on that 
 would be.

 --
 Oracle http://www.oracle.com
 Avi Miller | Principal Program Manager | +61 (412) 229 687
 Oracle Linux and Virtualization
 417 St Kilda Road, Melbourne, Victoria 3004 Australia


I have also run into the same problem on Enterprise Linux 6.3
(Scientific Linux in my case).

It is relatively trivial to get a current kernel from sources like
ELREPO, so I was hoping to use my Scientific Linux partition at least
for rescue and evaluation.

Is the position of the Btrfs Developer community that Enterprise Linux
6.x is not to be supported?


Re: mkfs.btrfs broken

2013-03-07 Thread Mitch Harder
On Thu, Mar 7, 2013 at 12:10 PM, Swâmi Petaramesh sw...@petaramesh.org wrote:
 Le 07/03/2013 19:06, Jérôme Poulin a écrit :
 mkfs.btrfs tries to lookup loop devices by their filenames and fails
 if any loop device file is missing.

 Hmm Why would mkfs.btrfs want to lookup anything else but the device
 we're trying to format, to check if it's mounted or not ?


At Sabayon, we pretty much hacked our way around this with a
make-it-go kind of patch.

Otherwise, our installation would break with btrfs on our
Live-[CD/DVD/USB] media.

I know we should have taken the time to put together a proper
solution, but I could never figure out the reasoning for needing to
scan every device either.

--- btrfs-progs-0.19.orig/utils.c
+++ btrfs-progs-0.19/utils.c
@@ -708,6 +708,21 @@ int is_same_blk_file(const char* a, cons
return 0;
 }

+/* Checks if a file exists and is a block or regular file*/
+int is_existing_blk_or_reg_file(const char* filename)
+{
+   struct stat st_buf;
+
+   if(stat(filename, &st_buf) < 0) {
+   if(errno == ENOENT)
+   return 0;
+   else
+   return -errno;
+   }
+
+   return (S_ISBLK(st_buf.st_mode) || S_ISREG(st_buf.st_mode));
+}
+
 /* checks if a and b are identical or device
  * files associated with the same block device or
  * if one file is a loop device that uses the other
@@ -727,7 +742,10 @@ int is_same_loop_file(const char* a, con
} else if(ret) {
if((ret = resolve_loop_device(a, res_a, sizeof(res_a))) < 0)
return ret;
-
+   /* if the resolved path is not available, there is nothing
+  we can do */
+   if((ret = is_existing_blk_or_reg_file(res_a)) == 0)
+   return ret;
final_a = res_a;
} else {
final_a = a;
@@ -739,6 +757,10 @@ int is_same_loop_file(const char* a, con
} else if(ret) {
if((ret = resolve_loop_device(b, res_b, sizeof(res_b))) < 0)
return ret;
+   /* if the resolved path is not available, there is nothing
+  we can do */
+   if((ret = is_existing_blk_or_reg_file(res_b)) == 0)
+   return ret;

final_b = res_b;
} else {
@@ -748,21 +770,6 @@ int is_same_loop_file(const char* a, con
return is_same_blk_file(final_a, final_b);
 }

-/* Checks if a file exists and is a block or regular file*/
-int is_existing_blk_or_reg_file(const char* filename)
-{
-   struct stat st_buf;
-
-   if(stat(filename, &st_buf) < 0) {
-   if(errno == ENOENT)
-   return 0;
-   else
-   return -errno;
-   }
-
-   return (S_ISBLK(st_buf.st_mode) || S_ISREG(st_buf.st_mode));
-}
-
 /* Checks if a file is used (directly or indirectly via a loop device)
  * by a device in fs_devices
  */
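
For anyone who wants to try the check outside of btrfs-progs, here is a
minimal stand-alone sketch of the helper from the patch above.  It is only
an illustration of the return convention (1, 0, or -errno), not the code
that ships in utils.c, and the assert() on a NULL argument is my own
addition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Returns 1 if filename exists and is a block device or a regular file,
 * 0 if it does not exist or is some other file type, and -errno if
 * stat() fails for any reason other than ENOENT.
 */
int is_existing_blk_or_reg_file(const char *filename)
{
	struct stat st_buf;

	assert(filename != NULL);	/* sketch-only precondition */

	if (stat(filename, &st_buf) < 0) {
		if (errno == ENOENT)
			return 0;	/* missing path: nothing to compare */
		return -errno;
	}

	return S_ISBLK(st_buf.st_mode) || S_ISREG(st_buf.st_mode);
}
```

With this convention, a loop device whose backing file has gone missing
yields 0, which is the case the patch uses to bail out early instead of
failing.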


Re: [PATCH] Btrfs: fix wrong outstanding_extents when doing DIO write

2013-02-21 Thread Mitch Harder
On Thu, Feb 21, 2013 at 7:26 AM, Chris Mason chris.ma...@fusionio.com wrote:
 On Thu, Feb 21, 2013 at 02:48:22AM -0700, Miao Xie wrote:
 When running the 083th case of xfstests on the filesystem with
 compress-force=lzo, the following WARNINGs were triggered.
   WARNING: at fs/btrfs/inode.c:7908
   WARNING: at fs/btrfs/inode.c:7909
   WARNING: at fs/btrfs/inode.c:7911
   WARNING: at fs/btrfs/extent-tree.c:4510
   WARNING: at fs/btrfs/extent-tree.c:4511

 This problem was introduced by the patch Btrfs: fix deadlock due
 to unsubmitted. In this patch, there are two bugs which caused
 the above problem.

 I saw this as well on test 132 last night.  My plan was to track it down
 this morning, so discovering it already fixed while I slept was
 wonderful.

 Thanks Miao.  Josef I've got this one and Miao's defrag unmount patch
 queued up.


Thanks, I've also tested this patch, and it cleared the error I was receiving.


Re: [PATCH] Btrfs: fix cleaner thread not working with inode cache option

2013-02-20 Thread Mitch Harder
On Wed, Feb 20, 2013 at 8:10 AM, Liu Bo bo.li@oracle.com wrote:
 Right now inode cache inode is treated as the same as space cache
 inode, ie. keep inode in memory till putting super.

 But this leads to an awkward situation.

 If we're going to delete a snapshot/subvolume, btrfs will not
 actually delete it and return free space, but will add it to dead
 roots list until the last inode on this snap/subvol being destroyed.
 Then we'll fetch deleted roots and cleanup them via cleaner thread.

 So here is the problem, if we enable inode cache option, each
 snap/subvol has a cached inode which is used to store inode allocation
 information.  And this cache inode will be kept in memory, as the above
 said.  So with inode cache, snap/subvol can only be added into
 dead roots list during freeing roots stage in umount, so that we can
 ONLY get space back after another remount(we cleanup dead roots on mount).

 But the real thing is we'll no more use the snap/subvol if we mark it
 deleted, so we can safely iput its cache inode when we delete snap/subvol.

 Another thing is that we need to change the rules of dropping inode, we
 don't keep snap/subvol's cache inode in memory till end so that we can
 add snap/subvol into dead roots list in time.

 Reported-by: Mitch Harder mitch.har...@sabayonlinux.org
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
  fs/btrfs/inode.c |3 ++-
  fs/btrfs/ioctl.c |6 ++
  2 files changed, 8 insertions(+), 1 deletions(-)

 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index ca7ace7..d9984fa 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -7230,8 +7230,9 @@ int btrfs_drop_inode(struct inode *inode)
  {
 struct btrfs_root *root = BTRFS_I(inode)->root;

 +   /* the snap/subvol tree is on deleting */
 if (btrfs_root_refs(&root->root_item) == 0 &&
 -   !btrfs_is_free_space_inode(inode))
 +   root != root->fs_info->tree_root)
 return 1;
 else
 return generic_drop_inode(inode);
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index a31cd93..375f31f 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -2171,6 +2171,12 @@ out_unlock:
 shrink_dcache_sb(root->fs_info->sb);
 btrfs_invalidate_inodes(dest);
 d_delete(dentry);
 +
 +   /* the last ref */
 +   if (dest->cache_inode) {
 +   iput(dest->cache_inode);
 +   dest->cache_inode = NULL;
 +   }
 }
  out_dput:
 dput(dentry);
 --
 1.7.7.6


Thanks, I tested this patch, and it fixes the issues I was seeing.


Kernel WARNINGs on btrfs-next

2013-02-20 Thread Mitch Harder
I'm getting a series of kernel WARNING messages when testing Josef's
btrfs-next and Chris' next branch running xfstests 083 when mounted
with compress-force=lzo.

I'm not seeing any other indications of problems other than the
WARNINGs on xfstests 083, so this may be some sort of false positive.

Here are the messages against Chris' -next branch (the same warnings
are being generated against Josef's branch, except against a 3.7.x
kernel):

[  553.194991] [ cut here ]
[  553.195002] WARNING: at fs/btrfs/inode.c:7908
btrfs_destroy_inode+0x67/0x25b [btrfs]()
[  553.195043] Hardware name: OptiPlex 745
[  553.195046] Modules linked in: ipv6 snd_hda_codec_analog
snd_hda_intel snd_hda_codec snd_hwdep ppdev parport_pc snd_pcm
snd_page_alloc snd_timer snd floppy sr_mod i2c_i801 tg3 ptp iTCO_wdt
pps_core iTCO_vendor_support ehci_pci parport lpc_ich microcode
serio_raw pcspkr ablk_helper cryptd lrw xts gf128mul aes_x86_64
sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate
ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd
ohci_hcd uhci_hcd ehci_hcd
[  553.195099] Pid: 4674, comm: rm Not tainted 3.8.0-mason-next+ #1
[  553.195102] Call Trace:
[  553.195112]  [81030522] warn_slowpath_common+0x83/0x9b
[  553.195118]  [81030554] warn_slowpath_null+0x1a/0x1c
[  553.195135]  [a018d69e] btrfs_destroy_inode+0x67/0x25b [btrfs]
[  553.195141]  [8111759a] destroy_inode+0x3b/0x54
[  553.195145]  [811176fc] evict+0x149/0x151
[  553.195149]  [81117f82] iput+0x12c/0x135
[  553.195166]  [a0187f42] ? btrfs_unlink_inode+0x38/0x40 [btrfs]
[  553.195171]  [8110de10] do_unlinkat+0x145/0x1df
[  553.195177]  [81106e9f] ? sys_newfstatat+0x2a/0x33
[  553.195191]  [8110fce5] sys_unlinkat+0x29/0x2b
[  553.195212]  [81607746] system_call_fastpath+0x1a/0x1f
[  553.195224] ---[ end trace 0adc4db1ad1a6634 ]---
[  553.195231] [ cut here ]
[  553.195247] WARNING: at fs/btrfs/inode.c:7909
btrfs_destroy_inode+0x7e/0x25b [btrfs]()
[  553.195249] Hardware name: OptiPlex 745
[  553.195251] Modules linked in: ipv6 snd_hda_codec_analog
snd_hda_intel snd_hda_codec snd_hwdep ppdev parport_pc snd_pcm
snd_page_alloc snd_timer snd floppy sr_mod i2c_i801 tg3 ptp iTCO_wdt
pps_core iTCO_vendor_support ehci_pci parport lpc_ich microcode
serio_raw pcspkr ablk_helper cryptd lrw xts gf128mul aes_x86_64
sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate
ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd
ohci_hcd uhci_hcd ehci_hcd
[  553.195296] Pid: 4674, comm: rm Tainted: G        W    3.8.0-mason-next+ #1
[  553.195298] Call Trace:
[  553.195304]  [81030522] warn_slowpath_common+0x83/0x9b
[  553.195308]  [81030554] warn_slowpath_null+0x1a/0x1c
[  553.195324]  [a018d6b5] btrfs_destroy_inode+0x7e/0x25b [btrfs]
[  553.195329]  [8111759a] destroy_inode+0x3b/0x54
[  553.195333]  [811176fc] evict+0x149/0x151
[  553.195336]  [81117f82] iput+0x12c/0x135
[  553.195352]  [a0187f42] ? btrfs_unlink_inode+0x38/0x40 [btrfs]
[  553.195356]  [8110de10] do_unlinkat+0x145/0x1df
[  553.195360]  [81106e9f] ? sys_newfstatat+0x2a/0x33
[  553.195364]  [8110fce5] sys_unlinkat+0x29/0x2b
[  553.195368]  [81607746] system_call_fastpath+0x1a/0x1f
[  553.195371] ---[ end trace 0adc4db1ad1a6635 ]---
[  553.195373] [ cut here ]
[  553.195389] WARNING: at fs/btrfs/inode.c:7911
btrfs_destroy_inode+0xae/0x25b [btrfs]()
[  553.195391] Hardware name: OptiPlex 745
[  553.195393] Modules linked in: ipv6 snd_hda_codec_analog
snd_hda_intel snd_hda_codec snd_hwdep ppdev parport_pc snd_pcm
snd_page_alloc snd_timer snd floppy sr_mod i2c_i801 tg3 ptp iTCO_wdt
pps_core iTCO_vendor_support ehci_pci parport lpc_ich microcode
serio_raw pcspkr ablk_helper cryptd lrw xts gf128mul aes_x86_64
sha256_generic fuse xfs nfs lockd sunrpc reiserfs btrfs zlib_deflate
ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd hid_generic xhci_hcd
ohci_hcd uhci_hcd ehci_hcd
[  553.195437] Pid: 4674, comm: rm Tainted: G        W    3.8.0-mason-next+ #1
[  553.195439] Call Trace:
[  553.195444]  [81030522] warn_slowpath_common+0x83/0x9b
[  553.195449]  [81030554] warn_slowpath_null+0x1a/0x1c
[  553.195463]  [a018d6e5] btrfs_destroy_inode+0xae/0x25b [btrfs]
[  553.195470]  [8111759a] destroy_inode+0x3b/0x54
[  553.195474]  [811176fc] evict+0x149/0x151
[  553.195480]  [81117f82] iput+0x12c/0x135
[  553.195495]  [a0187f42] ? btrfs_unlink_inode+0x38/0x40 [btrfs]
[  553.195499]  [8110de10] do_unlinkat+0x145/0x1df
[  553.195504]  [81106e9f] ? sys_newfstatat+0x2a/0x33
[  553.195508]  [8110fce5] sys_unlinkat+0x29/0x2b
[  553.195512]  [81607746] system_call_fastpath+0x1a/0x1f
[  553.195515] ---[ end trace 0adc4db1ad1a6636 ]---
[  553.404031] [ cut here 

Snapshot Cleaner not Working with inode_cache

2013-02-19 Thread Mitch Harder
I've encountered an issue where the space from previously deleted
snapshots is not being freed up by the cleaner thread.

I'm only encountering this issue when I mount with the inode_cache option.

I've reproduced this on a 3.7.9 kernel merged with the latest
for-linus branch.  No additional patches are involved.  My testing
partition is 16 GB.

There is nothing in dmesg indicating any issues.

A simple manual test can reproduce the issue on my box:

(1)  Format a fresh, scratch btrfs partition (it would probably work
with an existing test partition, but I always like to test things that
seem broken on a scratch partition).
(2)  Mount partition (my options are -o
compress-force=lzo,inode_cache).  My mount command was:
mount -o compress-force=lzo,inode_cache /dev/sda7 /mnt/benchmark/
(3)  Make a subvolume:  cd /mnt/device; btrfs su create test1
(4)  Untar kernel sources to the subvolume:  cd test1; tar -xpf
path/to/kernel/source/tarball
I believe anything you use to populate the subvolume is sufficient.
(5)  Make a note of the disk usage:  df -T /mnt/device
(6)  Remove the subvolume:  cd ..; btrfs su delete test1
(7)  Wait 2 minutes, and notice that the space has not been freed up.
I've waited much longer, but I forget the exact timeout on the cleaner
thread.
df -T /mnt/device

If I unmount and remount the partition with the same mount options,
the cleaner will begin to correctly free the space.

I've never used the inode_cache option before, so I'll try a few other
kernels to see if this is a regression.


Re: [RFC] Btrfs: Allow the compressed extent size limit to be modified v2

2013-02-08 Thread Mitch Harder
On Fri, Feb 8, 2013 at 8:53 AM, David Sterba dste...@suse.cz wrote:
 On Thu, Feb 07, 2013 at 11:17:46PM -0600, Mitch Harder wrote:
 On Thu, Feb 7, 2013 at 6:28 PM, David Sterba d...@jikos.cz wrote:
  On Thu, Feb 07, 2013 at 03:38:34PM -0600, Mitch Harder wrote:
  --- a/fs/btrfs/relocation.c
  +++ b/fs/btrfs/relocation.c
  @@ -144,7 +144,7 @@ struct tree_block {
unsigned int key_ready:1;
   };
 
  -#define MAX_EXTENTS 128
  +#define MAX_EXTENTS 512
 
  Is this really related to compression? IIRC I've seen it only in context
  of batch work in reloc, but not anywhere near compression. (I may be
  wrong of course, just checking).
 

 When you defragment compressed extents, it will run through relocation.

 If autodefrag is enabled, I found most everything I touched was
 running through relocation.

 AFAIK defragmentation runs through the writeback loop, blocks are marked
 dirty, delalloc tries to make them contiguous and then synced back to
 disk. Autodefrag uses the same loop, just affects newly written data.

 It has been a while since I looked at the issue, but I think balancing
 your data will also run through relocation.

 Balance does go through reloc for sure.

 From the commit that introduces MAX_EXTENTS it's imo quite clear that
 it's only a balance speedup:

 (0257bb82d21bedff26541bcf12f1461c23f9ed61)
 Btrfs: relocate file extents in clusters

 The extent relocation code copy file extents one by one when
 relocating data block group. This is inefficient if file
 extents are small. This patch makes the relocation code copy
 file extents in clusters. So we can can make better use of
 read-ahead.

In an earlier version of the patch, I had the changes to relocation.c
in a separate patch.  But, I couldn't consistently attain the changed
maximum extent size unless I also addressed the issue with relocation.


[RFC] Btrfs: Allow the compressed extent size limit to be modified v2

2013-02-07 Thread Mitch Harder
Provide for modification of the limit of compressed extent size
utilizing mount-time configuration settings.

The size of compressed extents was limited to 128K, which
leads to fragmentation of the extents (although the extents
themselves may still be located contiguously).  This limit is
put in place to ease the RAM required when spreading compression
across several CPUs, and to make sure the amount of IO required
to do a random read is reasonably small.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
Changelog v1 -> v2:
- Use more self-documenting variable name:
  compressed_extent_size -> max_compressed_extent_kb
- Use #define BTRFS_DEFAULT_MAX_COMPR_EXTENTS instead of raw 128.
- Fix min calculation for nr_pages.
- Comment cleanup.
- Use more self-documenting mount option parameter:
  compressed_extent_size -> max_compressed_extent_kb
- Fix formatting in btrfs_show_options.
---
 fs/btrfs/ctree.h  |  6 ++
 fs/btrfs/disk-io.c|  1 +
 fs/btrfs/inode.c  |  8 
 fs/btrfs/relocation.c |  7 ---
 fs/btrfs/super.c  | 20 +++-
 5 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 547b7b0..a62f20c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -191,6 +191,9 @@ static int btrfs_csum_sizes[] = { 4, 0 };
 /* ioprio of readahead is set to idle */
 #define BTRFS_IOPRIO_READA (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0))
 
+/* Default value for maximum compressed extent size (kb) */
+#define BTRFS_DEFAULT_MAX_COMPR_EXTENTS 128
+
 /*
  * The key defines the order in the tree, and so it also defines (optimal)
  * block layout.
@@ -1477,6 +1480,8 @@ struct btrfs_fs_info {
unsigned data_chunk_allocations;
unsigned metadata_ratio;
 
+   unsigned max_compressed_extent_kb;
+
void *bdev_holder;
 
/* private scrub information */
@@ -1829,6 +1834,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY (1 << 20)
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR   (1 << 22)
+#define BTRFS_MOUNT_COMPR_EXTENT_SIZE (1 << 23)
 
 #define btrfs_clear_opt(o, opt)    ((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)  ((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 830bc17..775e7ba 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2056,6 +2056,7 @@ int open_ctree(struct super_block *sb,
fs_info->trans_no_join = 0;
fs_info->free_chunk_space = 0;
fs_info->tree_mod_log = RB_ROOT;
+   fs_info->max_compressed_extent_kb = BTRFS_DEFAULT_MAX_COMPR_EXTENTS;
 
/* readahead state */
INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 148abeb..78fc6eb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -346,8 +346,8 @@ static noinline int compress_file_range(struct inode *inode,
unsigned long nr_pages_ret = 0;
unsigned long total_compressed = 0;
unsigned long total_in = 0;
-   unsigned long max_compressed = 128 * 1024;
-   unsigned long max_uncompressed = 128 * 1024;
+   unsigned long max_compressed = root->fs_info->max_compressed_extent_kb * 1024;
+   unsigned long max_uncompressed = root->fs_info->max_compressed_extent_kb * 1024;
int i;
int will_compress;
int compress_type = root->fs_info->compress_type;
@@ -361,7 +361,7 @@ static noinline int compress_file_range(struct inode *inode,
 again:
will_compress = 0;
nr_pages = (end >> PAGE_CACHE_SHIFT) - (start >> PAGE_CACHE_SHIFT) + 1;
-   nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
+   nr_pages = min(nr_pages, max_compressed / PAGE_CACHE_SIZE);
 
/*
 * we don't want to send crud past the end of i_size through
@@ -386,7 +386,7 @@ again:
 *
 * We also want to make sure the amount of IO required to do
 * a random read is reasonably small, so we limit the size of
-* a compressed extent to 128k.
+* a compressed extent.
 */
total_compressed = min(total_compressed, max_uncompressed);
num_bytes = (end - start + blocksize) & ~(blocksize - 1);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 300e09a..64bbc9e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -144,7 +144,7 @@ struct tree_block {
unsigned int key_ready:1;
 };
 
-#define MAX_EXTENTS 128
+#define MAX_EXTENTS 512
 
 struct file_extent_cluster {
u64 start;
@@ -3055,6 +3055,7 @@ int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key,
 struct file_extent_cluster *cluster)
 {
int ret;
+   struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 
if (cluster->nr > 0 && extent_key->objectid != cluster->end + 1) {
ret

Re: [RFC] Btrfs: Allow the compressed extent size limit to be modified v2

2013-02-07 Thread Mitch Harder
On Thu, Feb 7, 2013 at 6:28 PM, David Sterba d...@jikos.cz wrote:
 On Thu, Feb 07, 2013 at 03:38:34PM -0600, Mitch Harder wrote:
 --- a/fs/btrfs/relocation.c
 +++ b/fs/btrfs/relocation.c
 @@ -144,7 +144,7 @@ struct tree_block {
   unsigned int key_ready:1;
  };

 -#define MAX_EXTENTS 128
 +#define MAX_EXTENTS 512

 Is this really related to compression? IIRC I've seen it only in context
 of batch work in reloc, but not anywhere near compression. (I may be
 wrong of course, just checking).


When you defragment compressed extents, it will run through relocation.

If autodefrag is enabled, I found most everything I touched was
running through relocation.

It has been a while since I looked at the issue, but I think balancing
your data will also run through relocation.


[RFC] Btrfs: Allow the compressed extent size limit to be modified.

2013-02-06 Thread Mitch Harder
Provide for modification of the limit of compressed extent size
utilizing mount-time configuration settings.

The size of compressed extents was limited to 128K, which
leads to fragmentation of the extents (although the extents
themselves may still be located contiguously).  This limit is
put in place to ease the RAM required when spreading compression
across several CPUs, and to make sure the amount of IO required
to do a random read is reasonably small.

This patch is still preliminary.

In this version of the patch, the allowed compressed extent size is
restricted to 128 (the default) and 512. I wanted to extensively test
a single value for a change in compressed extent size before expanding
and testing a wider range of parameters.

I submitted a similar patch about a year and a half ago where the
change was hard-coded and not tuneable.

http://comments.gmane.org/gmane.comp.file-systems.btrfs/10516
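
To put the 128K versus 512K limits in concrete terms, here is a small
user-space sketch that counts how many compressed extents a file would be
split into.  The helper name is hypothetical, and it assumes the limit
applies to the uncompressed data covered by each extent; it is not kernel
code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of compressed extents needed to cover
 * file_bytes of data when each extent covers at most max_extent_kb KiB
 * of uncompressed data.  Rounds up so a partial tail still counts.
 */
uint64_t compressed_extents_needed(uint64_t file_bytes, uint32_t max_extent_kb)
{
	uint64_t max_extent_bytes;

	assert(max_extent_kb > 0);	/* avoid dividing by zero */
	max_extent_bytes = (uint64_t)max_extent_kb * 1024;
	return (file_bytes + max_extent_bytes - 1) / max_extent_bytes;
}
```

Under those assumptions a 1 GiB file comes out as 8192 extents with the
128K limit and 2048 extents with the 512K limit.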

---
 fs/btrfs/ctree.h  |  3 +++
 fs/btrfs/disk-io.c|  1 +
 fs/btrfs/inode.c  |  8 
 fs/btrfs/relocation.c |  7 ---
 fs/btrfs/super.c  | 19 ++-
 5 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 547b7b0..f37ec32 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1477,6 +1477,8 @@ struct btrfs_fs_info {
unsigned data_chunk_allocations;
unsigned metadata_ratio;
 
+   unsigned compressed_extent_size;
+
void *bdev_holder;
 
/* private scrub information */
@@ -1829,6 +1831,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY (1 << 20)
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR   (1 << 22)
+#define BTRFS_MOUNT_COMPR_EXTENT_SIZE (1 << 23)
 
 #define btrfs_clear_opt(o, opt)    ((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)  ((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 830bc17..2d2be03 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2056,6 +2056,7 @@ int open_ctree(struct super_block *sb,
fs_info->trans_no_join = 0;
fs_info->free_chunk_space = 0;
fs_info->tree_mod_log = RB_ROOT;
+   fs_info->compressed_extent_size = 128;
 
/* readahead state */
INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 148abeb..5b81b56 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -346,8 +346,8 @@ static noinline int compress_file_range(struct inode *inode,
unsigned long nr_pages_ret = 0;
unsigned long total_compressed = 0;
unsigned long total_in = 0;
-   unsigned long max_compressed = 128 * 1024;
-   unsigned long max_uncompressed = 128 * 1024;
+   unsigned long max_compressed = root->fs_info->compressed_extent_size * 1024;
+   unsigned long max_uncompressed = root->fs_info->compressed_extent_size * 1024;
int i;
int will_compress;
int compress_type = root->fs_info->compress_type;
@@ -361,7 +361,7 @@ static noinline int compress_file_range(struct inode *inode,
 again:
will_compress = 0;
nr_pages = (end >> PAGE_CACHE_SHIFT) - (start >> PAGE_CACHE_SHIFT) + 1;
-   nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
+   nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE);
 
/*
 * we don't want to send crud past the end of i_size through
@@ -386,7 +386,7 @@ again:
 *
 * We also want to make sure the amount of IO required to do
 * a random read is reasonably small, so we limit the size of
-* a compressed extent to 128k.
+* a compressed extent (default of 128k).
 */
total_compressed = min(total_compressed, max_uncompressed);
num_bytes = (end - start + blocksize) & ~(blocksize - 1);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 300e09a..8d6f6bf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -144,7 +144,7 @@ struct tree_block {
unsigned int key_ready:1;
 };
 
-#define MAX_EXTENTS 128
+#define MAX_EXTENTS 512
 
 struct file_extent_cluster {
u64 start;
@@ -3055,6 +3055,7 @@ int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key,
 struct file_extent_cluster *cluster)
 {
int ret;
+   struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 
if (cluster->nr > 0 && extent_key->objectid != cluster->end + 1) {
ret = relocate_file_extent_cluster(inode, cluster);
@@ -3066,12 +3067,12 @@ int relocate_data_extent(struct inode *inode, struct btrfs_key *extent_key,
if (!cluster->nr)
cluster->start = extent_key->objectid;
else
-   BUG_ON(cluster->nr >= MAX_EXTENTS);
+   BUG_ON(cluster->nr >= fs_info->compressed_extent_size);
cluster->end = extent_key->objectid +

Re: [RFC] Btrfs: Allow the compressed extent size limit to be modified.

2013-02-06 Thread Mitch Harder
On Wed, Feb 6, 2013 at 12:46 PM, Zach Brown z...@redhat.com wrote:
 + unsigned compressed_extent_size;

 It kind of jumps out that this mentions neither that it's the max nor
 that it's in KB.  How about max_compressed_extent_kb?

 + fs_info->compressed_extent_size = 128;

 I'd put a DEFAULT_MAX_EXTENTS up by the MAX_ definition instead of using
 a raw 128 here.

 + unsigned long max_compressed = root->fs_info->compressed_extent_size * 1024;
 + unsigned long max_uncompressed = root->fs_info->compressed_extent_size * 1024;

 (so max_compressed is in bytes)

   nr_pages = (end >> PAGE_CACHE_SHIFT) - (start >> PAGE_CACHE_SHIFT) + 1;
 - nr_pages = min(nr_pages, (128 * 1024UL) / PAGE_CACHE_SIZE);
 + nr_pages = min(nr_pages, (max_compressed * 1024UL) / PAGE_CACHE_SIZE);

 (and now that expression adds another * 1024, allowing {128,512}MB
 extents :))


Yuk!  I'm surprised this never manifested as a problem during testing.
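
To put numbers on the unit mix-up, here is a small user-space sketch (not kernel code). It assumes 4 KiB pages, and the helper names `page_limit_as_posted` and `page_limit_from_bytes` are made up for illustration. Because `max_compressed` already holds bytes, the extra `* 1024UL` inflates the page limit by a factor of 1024:

```c
#include <stdio.h>

/* Assumption for this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL

/* Page limit as the posted expression computes it: the argument is
 * already in bytes, yet it is multiplied by 1024 a second time. */
unsigned long page_limit_as_posted(unsigned long max_compressed_bytes)
{
	return (max_compressed_bytes * 1024UL) / SKETCH_PAGE_SIZE;
}

/* Page limit when the byte value is used directly. */
unsigned long page_limit_from_bytes(unsigned long max_compressed_bytes)
{
	return max_compressed_bytes / SKETCH_PAGE_SIZE;
}

/* Print both limits for a setting given in KB (128 or 512 in the RFC). */
void print_page_limits(unsigned long setting_kb)
{
	unsigned long bytes = setting_kb * 1024UL;

	printf("%lu KB setting: intended %lu pages, as posted %lu pages\n",
	       setting_kb, page_limit_from_bytes(bytes),
	       page_limit_as_posted(bytes));
}
```

Under these assumptions the 128 KB default works out to 32 pages as intended, but to 32768 pages (128 MiB) with the posted expression.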

* We also want to make sure the amount of IO required to do
* a random read is reasonably small, so we limit the size of
 -  * a compressed extent to 128k.
 +  * a compressed extent (default of 128k).

 Just drop the value so that this comment doesn't need to be updated
 again.

 -* a compressed extent to 128k.
 +* a compressed extent.

 + {Opt_compr_extent_size, "compressed_extent_size=%d"},

 It's even more important to make the exposed option self-documenting
 than it was to get the fs_info member right.

 + if ((intarg == 128) || (intarg == 512)) {
 + info->compressed_extent_size = intarg;
 + printk(KERN_INFO "btrfs: compressed extent size %d\n",
 +        info->compressed_extent_size);
 + } else {
 + printk(KERN_INFO "btrfs: "
 + "Invalid compressed extent size,"
 + " using default.\n");

 I'd print the default value when it's used and would include a unit in
 both.

 + if (btrfs_test_opt(root, COMPR_EXTENT_SIZE))
 + seq_printf(seq, ",compressed_extent_size=%d",
 +(unsigned long long)info->compressed_extent_size);

 The (ull) cast doesn't match the %d format and wouldn't be needed if it
 was printed with %u.

 - z

Thanks for the review.

All these comments make sense, and I should be able to work them in.
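
One way the option handling might look with the review comments folded in, as a user-space sketch only. The names `DEFAULT_MAX_COMPRESSED_EXTENT_KB` and `pick_max_compressed_extent_kb` are hypothetical (the `_kb` suffix follows the naming suggested in the review), and `printf` stands in for `printk`. This is not the follow-up patch itself:

```c
#include <stdio.h>

/* Hypothetical named default, replacing the raw 128 in the RFC. */
#define DEFAULT_MAX_COMPRESSED_EXTENT_KB 128u

/* Accept only 128 or 512 (the two values the RFC allows); anything else
 * falls back to the default.  Both messages include the unit, and the
 * unsigned value is printed with %u so no cast to a wider type is needed. */
unsigned int pick_max_compressed_extent_kb(int requested_kb)
{
	if (requested_kb == 128 || requested_kb == 512) {
		unsigned int kb = (unsigned int)requested_kb;

		printf("btrfs: max compressed extent size %u KB\n", kb);
		return kb;
	}

	printf("btrfs: invalid max compressed extent size %d KB, "
	       "using default of %u KB\n",
	       requested_kb, DEFAULT_MAX_COMPRESSED_EXTENT_KB);
	return DEFAULT_MAX_COMPRESSED_EXTENT_KB;
}
```

Keeping the value in KB everywhere and converting to bytes in exactly one place would also avoid the double multiplication noted above.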


Re: [PATCH v2] Btrfs: fix race between snapshot deletion and getting inode

2013-01-30 Thread Mitch Harder
On Mon, Jan 28, 2013 at 9:52 PM, Chris Mason chris.ma...@fusionio.com wrote:
 On Mon, Jan 28, 2013 at 08:22:10PM -0700, Liu Bo wrote:
 While running snapshot testscript created by Mitch and David,
 the race between autodefrag and snapshot deletion can lead to
 corruption of dead_root list so that we can get crash on
 btrfs_clean_old_snapshots().

 Really nice.  Thanks to everyone that hashed this out.

 -chris

I've been testing "[PATCH v2] Btrfs: fix race between snapshot
deletion and getting inode" along with "[PATCH v6] Btrfs:
snapshot-aware defrag" using the same workflow that was reproducing
the dead_root list corruptions.

I've been unable to reproduce the error in ~24 hours of testing.

Normally, I'd hit the error within an hour of testing on a single run.
I've made three separate runs, and let the last run proceed
overnight.

I'll keep using these patches, and let you know if anything turns up.

Thanks for all your work on this patch set.


Re: [PATCH] Btrfs: fix race between snapshot deletion and getting inode

2013-01-28 Thread Mitch Harder
On Mon, Jan 28, 2013 at 5:04 AM, Liu Bo bo.li@oracle.com wrote:
 While running snapshot testscript created by Mitch and David,
 the race between autodefrag and snapshot deletion can lead to
 corruption of dead_root list so that we can get crash on
 btrfs_clean_old_snapshots().

 And besides autodefrag, scrub also does the same thing, i.e. read
 root first and get inode.

 Here is the story (take autodefrag as an example):
 (1) when we delete a snapshot or subvolume, it will set its root's
 refs to zero and do an iput() on its own inode, and if this inode happens
 to be the only active in-memory one in root's inode rbtree, it will add
 itself to the global dead_roots list for later cleanup.

 (2) after (1), the autodefrag thread may read another inode for defrag
 and the inode is just in the deleted snapshot/subvolume, but all of this
 happens without checking if the root is still valid (refs > 0).  So the end
 result is adding the deleted snapshot/subvolume's root to the global
 dead_roots list AGAIN.

 Fortunately, we already have a srcu lock to avoid the race, ie. subvol_srcu.

 So all we need to do is to take the lock to protect 'read root and get inode',
 since we synchronize to wait for the rcu grace period before adding something
 to the global dead_roots list.

 Reported-by: Mitch Harder mitch.har...@sabayonlinux.org
 Signed-off-by: Liu Bo bo.li@oracle.com

I'm still seeing issues with duplications in the dead_roots list.

I'm using a 3.7.4 kernel merged with the for-linus branch with the
following four patches:
[PATCH V5] Btrfs: snapshot-aware defrag
[PATCH] Btrfs: List Debugging for cleaning deleted
 Non-functional patch to issue some trace_printk debugging.
[PATCH] [RFC] Btrfs: Check for duplicate dead root list
 This is the patch discussed in the snapshot-aware defrag thread.
 It checks for duplicate list entries, and dumps a backtrace
 if it finds one.
Btrfs: fix race between snapshot deletion and getting inode

I've run into several backtraces similar to the following:

[ 3129.368196] btrfs: Duplicate dead root entry.
[ 3129.368199] [ cut here ]
[ 3129.368220] WARNING: at fs/btrfs/transaction.c:893
btrfs_add_dead_root+0x73/0xbc [btrfs]()
[ 3129.368223] Hardware name: OptiPlex 745
[ 3129.368224] Modules linked in: ipv6 snd_hda_codec_analog
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc iTCO_wdt
ppdev iTCO_vendor_support i2c_i801 parport_pc floppy tg3 sr_mod
microcode snd_timer snd lpc_ich serio_raw pcspkr parport ablk_helper
cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd
sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache
sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd
[ 3129.368268] Pid: 4309, comm: btrfs-endio-wri Tainted: GW
3.7.4-sad-v2+ #1
[ 3129.368271] Call Trace:
[ 3129.368278]  [81030586] warn_slowpath_common+0x83/0x9b
[ 3129.368282]  [810305b8] warn_slowpath_null+0x1a/0x1c
[ 3129.368297]  [a0179e0b] btrfs_add_dead_root+0x73/0xbc [btrfs]
[ 3129.368313]  [a0187bef] btrfs_destroy_inode+0x227/0x25b [btrfs]
[ 3129.368319]  [8111393a] destroy_inode+0x3b/0x54
[ 3129.368322]  [81113a9c] evict+0x149/0x151
[ 3129.368327]  [81114322] iput+0x12c/0x135
[ 3129.368342]  [a01845e7] relink_extent_backref+0x669/0x6af [btrfs]
[ 3129.368346]  [815e9849] ? __slab_free+0x17c/0x21b
[ 3129.368362]  [a017c33d] ? record_extent_backrefs+0xa3/0xa3 [btrfs]
[ 3129.368377]  [a0184d9d] ?
btrfs_finish_ordered_io+0x770/0x827 [btrfs]
[ 3129.368393]  [a0184d6d] btrfs_finish_ordered_io+0x740/0x827 [btrfs]
[ 3129.368409]  [a0184e69] finish_ordered_fn+0x15/0x17 [btrfs]
[ 3129.368424]  [a019e7fd] worker_loop+0x14c/0x493 [btrfs]
[ 3129.368439]  [a019e6b1] ? btrfs_queue_worker+0x258/0x258 [btrfs]
[ 3129.368443]  [8104c750] kthread+0xba/0xc2
[ 3129.368447]  [8104c696] ? kthread_freezable_should_stop+0x52/0x52
[ 3129.368451]  [815f301c] ret_from_fork+0x7c/0xb0
[ 3129.368455]  [8104c696] ? kthread_freezable_should_stop+0x52/0x52
[ 3129.368458] ---[ end trace 46705ba72c45db88 ]---


Re: [PATCH V5] Btrfs: snapshot-aware defrag

2013-01-27 Thread Mitch Harder
On Sun, Jan 27, 2013 at 6:41 AM, Liu Bo bo.li@oracle.com wrote:

 Hi Mitch,

 Many thanks for testing it!

 Well, after some debugging, I finally figured out why:

 (1) btrfs_ioctl_snap_destroy() will free the inode of snapshot and set
 root's refs to zero(btrfs_set_root_refs()), if this inode happens to
 be the only one in the rbtree of the snapshot's root at this moment,
 we add this root to the dead_root list.

 (2) Unfortunately, after (1), our snapshot-aware defrag work may read
 another inode in this snapshot into memory during the 'relink' stage, and
 later, after we finish the relink work, iput() will force us to add the
 snapshot's root to the dead_root list again.

 So that's why we get double list_add and list_del corruption.

 And IMO, it can also take place without snapshot-aware defrag, but it's a
 rare case.

I'm seeing a smattering of reports that resemble list corruption on
the M/L, so that is possible.


 So could you please try this?

 thanks,
 liubo

 diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
 index f154946..d4ee66b 100644
 --- a/fs/btrfs/transaction.c
 +++ b/fs/btrfs/transaction.c
 @@ -885,7 +885,15 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
  int btrfs_add_dead_root(struct btrfs_root *root)
  {
         spin_lock(&root->fs_info->trans_lock);
 +       if (!list_empty(&root->root_list)) {
 +               struct btrfs_root *tmp;
 +               list_for_each_entry(tmp, &root->fs_info->dead_roots, root_list)
 +                       if (tmp == root)
 +                               goto unlock;
 +       }
 +
         list_add(&root->root_list, &root->fs_info->dead_roots);
 +unlock:
         spin_unlock(&root->fs_info->trans_lock);
         return 0;
  }


It feels like we're correcting the problem after-the-fact with this
method, instead of addressing the root problem.  But I was able to
successfully run with this patch.

I slightly modified your patch as follows by introducing a WARN_ON in
order to get a back trace, and also to give me a positive confirmation
that I was triggering the problem.

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d6b17fa..0c1066e 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -885,7 +885,18 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
 int btrfs_add_dead_root(struct btrfs_root *root)
 {
        spin_lock(&root->fs_info->trans_lock);
+       if (!list_empty(&root->root_list)) {
+               struct btrfs_root *tmp;
+               list_for_each_entry(tmp, &root->fs_info->dead_roots, root_list)
+                       if (tmp == root) {
+                               printk(KERN_ERR "btrfs: Duplicate dead root entry.\n");
+                               WARN_ON(1);
+                               goto unlock;
+                       }
+       }
+
        list_add(&root->root_list, &root->fs_info->dead_roots);
+unlock:
        spin_unlock(&root->fs_info->trans_lock);
        return 0;
 }
-- 

I was able to trigger the problem several times (16 separate times
according to dmesg) without killing the cleaner process, and
everything appears to have continued successfully after encountering a
duplicate list entry.  My test partition passes btrfsck afterwards.
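
The guard in the patch can be pictured with a plain user-space list. This is a rough sketch under stated assumptions: `struct list_node` and the helpers below are simplified stand-ins for the kernel's `struct list_head` API, `add_dead_root_once` is a made-up name, and the locking that the kernel version does under `trans_lock` is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_node {
	struct list_node *next, *prev;
};

void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/* Insert entry right after head. */
void list_add_head(struct list_node *entry, struct list_node *head)
{
	struct list_node *first = head->next;

	first->prev = entry;
	entry->next = first;
	entry->prev = head;
	head->next = entry;
}

/* Linear scan: is entry already linked into the list at head? */
bool list_contains(const struct list_node *head, const struct list_node *entry)
{
	const struct list_node *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		if (pos == entry)
			return true;
	return false;
}

/* Add entry unless it is already on dead_roots.  Returns true when the
 * entry was added and false when a duplicate was detected and skipped. */
bool add_dead_root_once(struct list_node *entry, struct list_node *dead_roots)
{
	if (!list_is_empty(entry) && list_contains(dead_roots, entry))
		return false;

	list_add_head(entry, dead_roots);
	return true;
}
```

The scan is linear in the length of the list, which is one reason this kind of check treats the symptom rather than the cause of the second add.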

13 out of the 16 backtraces seem to support your hypothesis, as they pass
through the iput in your patch:

[ 4367.314806] btrfs: Duplicate dead root entry.
[ 4367.314809] [ cut here ]
[ 4367.314834] WARNING: at fs/btrfs/transaction.c:893
btrfs_add_dead_root+0x73/0xbc [btrfs]()
[ 4367.314836] Hardware name: OptiPlex 745
[ 4367.314841] Modules linked in: ipv6 snd_hda_codec_analog
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm tg3 snd_page_alloc
snd_timer snd iTCO_wdt iTCO_vendor_support ppdev parport_pc microcode
i2c_i801 floppy parport sr_mod lpc_ich serio_raw pcspkr ablk_helper
cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd
sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache
sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd
[ 4367.314887] Pid: 4463, comm: btrfs-endio-wri Tainted: GW
3.7.4-sad-v2+ #1
[ 4367.314889] Call Trace:
[ 4367.314895]  [81030586] warn_slowpath_common+0x83/0x9b
[ 4367.314899]  [810305b8] warn_slowpath_null+0x1a/0x1c
[ 4367.314915]  [a0179e0b] btrfs_add_dead_root+0x73/0xbc [btrfs]
[ 4367.314931]  [a0187bef] btrfs_destroy_inode+0x227/0x25b [btrfs]
[ 4367.314936]  [8111393a] destroy_inode+0x3b/0x54
[ 4367.314940]  [81113a9c] evict+0x149/0x151
[ 4367.314944]  [81114322] iput+0x12c/0x135
[ 4367.314959]  [a01845e7] relink_extent_backref+0x669/0x6af [btrfs]
[ 4367.314964]  [815e9849] ? __slab_free+0x17c/0x21b
[ 4367.314980]  [a0184d9d] ?
btrfs_finish_ordered_io+0x770/0x827 [btrfs]
[ 4367.314995]  [a0184d6d] btrfs_finish_ordered_io+0x740/0x827 [btrfs]
[ 4367.315011]  [a0184e69] finish_ordered_fn+0x15/0x17 [btrfs]
[ 4367.315034]  

Re: [PATCH V5] Btrfs: snapshot-aware defrag

2013-01-25 Thread Mitch Harder
On Wed, Jan 23, 2013 at 6:52 PM, Liu Bo bo.li@oracle.com wrote:
 On Wed, Jan 23, 2013 at 10:05:04AM -0600, Mitch Harder wrote:
 On Wed, Jan 23, 2013 at 1:51 AM, Liu Bo bo.li@oracle.com wrote:
  On Tue, Jan 22, 2013 at 11:41:19AM -0600, Mitch Harder wrote:
  On Thu, Jan 17, 2013 at 8:42 AM, Mitch Harder
  mitch.har...@sabayonlinux.org wrote:
   On Wed, Jan 16, 2013 at 6:36 AM, Liu Bo bo.li@oracle.com wrote:
   This comes from one of btrfs's project ideas,
   As we defragment files, we break any sharing from other snapshots.
   The balancing code will preserve the sharing, and defrag needs to grow 
   this
   as well.
  [...]
  
   I've been testing this patch on a 3.7.2 kernel merged with the
   for-linus branch for the 3.8_rc kernels, and I'm seeing the following
   error:
  
 
  I've reproduced the error with CONFIG_DEBUG_LIST enabled, which shows
  some problem with an entry in the list.
 
  [59312.260441] [ cut here ]
  [59312.260454] WARNING: at lib/list_debug.c:62 
  __list_del_entry+0x8d/0x98()
  [59312.260458] Hardware name: OptiPlex 745
  [59312.260461] list_del corruption. next->prev should be
  88006511c438, but was dead00200200
 
  LIST_POISON2 - (00200200)
  So we can know that the next one is deleted from the list even _earlier_
  than the current one is.
 
  Any other messages before this warning complains?
 

 Just some normal feedback from a metadata balance I had run.

 Well, these do fit my expectation, since balance also involves with playing 
 with
 root_list, which may lead to the bad situation.


 [14057.193343] device fsid 28c688c5-7dbd-4071-b271-1bf6726d8835 devid
 1 transid 4 /dev/sda7
 [14057.194438] btrfs: force lzo compression
 [14057.194446] btrfs: enabling auto defrag
 [14057.194449] btrfs: disk space caching is enabled
 [14057.194452] btrfs flagging fs with big metadata feature
 [14057.194455] btrfs: lzo incompat flag set.
 [57508.799193] btrfs: relocating block group 14516486144 flags 4
 [57632.178797] btrfs: found 6775 extents
 [57633.214701] btrfs: relocating block group 11832131584 flags 4
 [57776.400102] btrfs: found 6480 extents
 [57777.021175] btrfs: relocating block group 10489954304 flags 4
 [57949.182725] btrfs: found 6681 extents
 [59312.260441] [ cut here ]
 [59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98()
 [59312.260458] Hardware name: OptiPlex 745
 ...

 I'm going to try to wrap some debugging around the section of code in
 btrfs_clean_old_snapshots() where the dead_roots list is spliced onto
 the root list being processed.  The double entry may be slipping in
 here.

 1764 spin_lock(&fs_info->trans_lock);
 1765 list_splice_init(&fs_info->dead_roots, &list);
 1766 spin_unlock(&fs_info->trans_lock);

 hmm, I don't think there is anything wrong in this code.  But you can
 give it a shot anyway :)


I've changed up my reproducer to try some things that may hit the
issue quicker and more reliably.

It gave me a slightly different set of warnings in dmesg, which seem
to suggest issues in the dead_root list.

[43925.656065] device fsid a8f6fadb-3022-4c01-b369-f1f3f638c052 devid
1 transid 310 /dev/sda7
[43925.658062] btrfs: force lzo compression
[43925.658072] btrfs: enabling auto defrag
[43925.658075] btrfs: disk space caching is enabled
[43925.658078] btrfs: lzo incompat flag set.
[44503.421293] btrfs: unlinked 1 orphans
[44898.287365] btrfs: unlinked 1 orphans
[45080.641383] btrfs: unlinked 1 orphans
[45250.063773] btrfs: unlinked 1 orphans
[46223.387355] btrfs: unlinked 1 orphans
[46476.473944] btrfs: unlinked 1 orphans
[46499.665615] btrfs: unlinked 1 orphans
[46769.785454] [ cut here ]
[46769.785471] WARNING: at lib/list_debug.c:36 __list_add+0x9d/0xba()
[46769.785474] Hardware name: OptiPlex 745
[46769.785478] list_add double add: new=880050c27c38,
prev=880078f3e720, next=880050c27c38.
[46769.785480] Modules linked in: ipv6 snd_hda_codec_analog
snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer
tg3 sr_mod snd i2c_i801 ppdev parport_pc iTCO_wdt iTCO_vendor_support
lpc_ich pcspkr parport floppy serio_raw microcode ablk_helper cryptd
lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd sunrpc
reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache sl811_hcd
hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd
[46769.785537] Pid: 18291, comm: btrfs-endio-wri Not tainted 3.7.4-sad-v1+ #3
[46769.785539] Call Trace:
[46769.785549]  [81030586] warn_slowpath_common+0x83/0x9b
[46769.785553]  [81030641] warn_slowpath_fmt+0x46/0x48
[46769.785558]  [8120987b] __list_add+0x9d/0xba
[46769.785586]  [a0179dd6] btrfs_add_dead_root+0x42/0x56 [btrfs]
[46769.785603]  [a0187b67] btrfs_destroy_inode+0x227/0x25b [btrfs]
[46769.785611]  [8111393a] destroy_inode+0x3b/0x54
[46769.785615]  [81113a9c] evict+0x149/0x151
[46769.785619]  [81114322] iput

Re: [PATCH V5] Btrfs: snapshot-aware defrag

2013-01-25 Thread Mitch Harder
On Fri, Jan 25, 2013 at 9:42 AM, Liu Bo bo.li@oracle.com wrote:
 On Fri, Jan 25, 2013 at 08:55:58AM -0600, Mitch Harder wrote:
 On Wed, Jan 23, 2013 at 6:52 PM, Liu Bo bo.li@oracle.com wrote:
  On Wed, Jan 23, 2013 at 10:05:04AM -0600, Mitch Harder wrote:
  On Wed, Jan 23, 2013 at 1:51 AM, Liu Bo bo.li@oracle.com wrote:
   On Tue, Jan 22, 2013 at 11:41:19AM -0600, Mitch Harder wrote:
   On Thu, Jan 17, 2013 at 8:42 AM, Mitch Harder
   mitch.har...@sabayonlinux.org wrote:
On Wed, Jan 16, 2013 at 6:36 AM, Liu Bo bo.li@oracle.com wrote:
This comes from one of btrfs's project ideas,
As we defragment files, we break any sharing from other snapshots.
The balancing code will preserve the sharing, and defrag needs to 
grow this
as well.
   [...]
   
I've been testing this patch on a 3.7.2 kernel merged with the
for-linus branch for the 3.8_rc kernels, and I'm seeing the following
error:
   
  
   I've reproduced the error with CONFIG_DEBUG_LIST enabled, which shows
   some problem with an entry in the list.
  
   [59312.260441] [ cut here ]
   [59312.260454] WARNING: at lib/list_debug.c:62 
   __list_del_entry+0x8d/0x98()
   [59312.260458] Hardware name: OptiPlex 745
   [59312.260461] list_del corruption. next->prev should be
   88006511c438, but was dead00200200
  
   LIST_POISON2 - (00200200)
   So we can know that the next one is deleted from the list even _earlier_
   than the current one is.
  
   Any other messages before this warning complains?
  
 
  Just some normal feedback from a metadata balance I had run.
 
  Well, these do fit my expectation, since balance also involves with 
  playing with
  root_list, which may lead to the bad situation.
 
 
  [14057.193343] device fsid 28c688c5-7dbd-4071-b271-1bf6726d8835 devid
  1 transid 4 /dev/sda7
  [14057.194438] btrfs: force lzo compression
  [14057.194446] btrfs: enabling auto defrag
  [14057.194449] btrfs: disk space caching is enabled
  [14057.194452] btrfs flagging fs with big metadata feature
  [14057.194455] btrfs: lzo incompat flag set.
  [57508.799193] btrfs: relocating block group 14516486144 flags 4
  [57632.178797] btrfs: found 6775 extents
  [57633.214701] btrfs: relocating block group 11832131584 flags 4
  [57776.400102] btrfs: found 6480 extents
  [57777.021175] btrfs: relocating block group 10489954304 flags 4
  [57949.182725] btrfs: found 6681 extents
  [59312.260441] [ cut here ]
  [59312.260454] WARNING: at lib/list_debug.c:62 
  __list_del_entry+0x8d/0x98()
  [59312.260458] Hardware name: OptiPlex 745
  ...
 
  I'm going to try to wrap some debugging around the section of code in
  btrfs_clean_old_snapshots() where the dead_roots list is spliced onto
  the root list being processed.  The double entry may be slipping in
  here.
 
  1764 spin_lock(&fs_info->trans_lock);
  1765 list_splice_init(&fs_info->dead_roots, &list);
  1766 spin_unlock(&fs_info->trans_lock);
 
  hmm, I don't think there is anything wrong in this code.  But you can
  give it a shot anyway :)
 

 I've changed up my reproducer to try some things that may hit the
 issue quicker and more reliably.

 It gave me a slightly different set of warnings in dmesg, which seem
 to suggest issues in the dead_root list.

 Great!  Many thanks for nailing it down, we really shouldn't iput()
 after btrfs_iget().

 Could you please try this (remove iput()) and see if it gets rid of
 the trouble?

 thanks,
 liubo

 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index 1683f48..c7a0fb7 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -2337,7 +2337,6 @@ out_free_path:
  out_unlock:
         unlock_extent_cached(&BTRFS_I(inode)->io_tree, lock_start, lock_end,
                              &cached, GFP_NOFS);
 -       iput(inode);
         return ret;
  }


With this patch, the cleaner never runs to delete the old roots.


Re: [PATCH V5] Btrfs: snapshot-aware defrag

2013-01-23 Thread Mitch Harder
On Wed, Jan 23, 2013 at 1:51 AM, Liu Bo bo.li@oracle.com wrote:
 On Tue, Jan 22, 2013 at 11:41:19AM -0600, Mitch Harder wrote:
 On Thu, Jan 17, 2013 at 8:42 AM, Mitch Harder
 mitch.har...@sabayonlinux.org wrote:
  On Wed, Jan 16, 2013 at 6:36 AM, Liu Bo bo.li@oracle.com wrote:
  This comes from one of btrfs's project ideas,
  As we defragment files, we break any sharing from other snapshots.
  The balancing code will preserve the sharing, and defrag needs to grow 
  this
  as well.
 [...]
 
  I've been testing this patch on a 3.7.2 kernel merged with the
  for-linus branch for the 3.8_rc kernels, and I'm seeing the following
  error:
 

 I've reproduced the error with CONFIG_DEBUG_LIST enabled, which shows
 some problem with an entry in the list.

 [59312.260441] [ cut here ]
 [59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98()
 [59312.260458] Hardware name: OptiPlex 745
 [59312.260461] list_del corruption. next->prev should be
 88006511c438, but was dead00200200

 LIST_POISON2 - (00200200)
 So we can know that the next one is deleted from the list even _earlier_
 than the current one is.
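
The reasoning above relies on how list debugging poisons a deleted entry. Here is a minimal user-space sketch of those checks, written from memory of the debug variant of `list_del` rather than copied from it. The names `struct list_node`, `checked_list_del`, and `enum del_status` are made up for illustration, and two dummy objects stand in for the fixed invalid addresses the kernel uses as `LIST_POISON1` and `LIST_POISON2`.

```c
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

/* Stand-ins for the kernel's poison markers (assumption: any two
 * addresses that can never be real list members will do here). */
static struct list_node poison1, poison2;
#define LIST_POISON1 (&poison1)
#define LIST_POISON2 (&poison2)

enum del_status {
	DEL_OK,
	DEL_NEXT_IS_POISON,	/* this entry was already deleted */
	DEL_PREV_IS_POISON,	/* this entry was already deleted */
	DEL_PREV_NEXT_MISMATCH,	/* prev->next does not point back to entry */
	DEL_NEXT_PREV_MISMATCH	/* next->prev does not point back to entry */
};

void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
void list_add_head(struct list_node *entry, struct list_node *head)
{
	struct list_node *first = head->next;

	first->prev = entry;
	entry->next = first;
	entry->prev = head;
	head->next = entry;
}

/* Unlink entry only if its neighbours still agree that it is on the
 * list, then poison its pointers so a second delete is detectable. */
enum del_status checked_list_del(struct list_node *entry)
{
	struct list_node *prev = entry->prev;
	struct list_node *next = entry->next;

	if (next == LIST_POISON1)
		return DEL_NEXT_IS_POISON;
	if (prev == LIST_POISON2)
		return DEL_PREV_IS_POISON;
	if (prev->next != entry)
		return DEL_PREV_NEXT_MISMATCH;
	if (next->prev != entry)
		return DEL_NEXT_PREV_MISMATCH;

	next->prev = prev;
	prev->next = next;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
	return DEL_OK;
}
```

In this sketch, the warning quoted above corresponds to `DEL_NEXT_PREV_MISMATCH` with the neighbour's `prev` holding the second poison value: the neighbour has already been through a delete while the current entry still points at it.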

 Any other messages before this warning complains?


Just some normal feedback from a metadata balance I had run.

[14057.193343] device fsid 28c688c5-7dbd-4071-b271-1bf6726d8835 devid
1 transid 4 /dev/sda7
[14057.194438] btrfs: force lzo compression
[14057.194446] btrfs: enabling auto defrag
[14057.194449] btrfs: disk space caching is enabled
[14057.194452] btrfs flagging fs with big metadata feature
[14057.194455] btrfs: lzo incompat flag set.
[57508.799193] btrfs: relocating block group 14516486144 flags 4
[57632.178797] btrfs: found 6775 extents
[57633.214701] btrfs: relocating block group 11832131584 flags 4
[57776.400102] btrfs: found 6480 extents
[57777.021175] btrfs: relocating block group 10489954304 flags 4
[57949.182725] btrfs: found 6681 extents
[59312.260441] [ cut here ]
[59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98()
[59312.260458] Hardware name: OptiPlex 745
...

I'm going to try to wrap some debugging around the section of code in
btrfs_clean_old_snapshots() where the dead_roots list is spliced onto
the root list being processed.  The double entry may be slipping in
here.

1764 spin_lock(&fs_info->trans_lock);
1765 list_splice_init(&fs_info->dead_roots, &list);
1766 spin_unlock(&fs_info->trans_lock);
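
For readers unfamiliar with the splice, here is a rough user-space picture of what it does, plus one kind of sanity check that could be wrapped around it. Everything here is a simplified stand-in: `list_splice_init_sketch` imitates the behaviour of the kernel helper, and `list_links_consistent` with its `max_nodes` bound is a hypothetical debugging aid, not the debugging patch actually used in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;
};

void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head (no protection against double adds). */
void list_add_head(struct list_node *entry, struct list_node *head)
{
	struct list_node *first = head->next;

	first->prev = entry;
	entry->next = first;
	entry->prev = head;
	head->next = entry;
}

/* Move every entry of src to the front of dst and leave src empty. */
void list_splice_init_sketch(struct list_node *src, struct list_node *dst)
{
	struct list_node *first = src->next;
	struct list_node *last = src->prev;
	struct list_node *at = dst->next;

	if (first == src)
		return;		/* nothing to move */

	first->prev = dst;
	dst->next = first;
	last->next = at;
	at->prev = last;
	list_init(src);
}

/* Walk the list and verify that each node's successor points back to it.
 * The walk gives up after max_nodes entries, so a self-looping node
 * (the result of a double add) cannot spin forever. */
bool list_links_consistent(const struct list_node *head, size_t max_nodes)
{
	const struct list_node *pos = head;
	size_t steps = 0;

	do {
		if (pos->next->prev != pos)
			return false;
		pos = pos->next;
		if (++steps > max_nodes + 1)
			return false;
	} while (pos != head);

	return true;
}
```

A check like this run on the spliced list would flag an entry that had been added twice, because a double add leaves that entry pointing at itself.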


Re: [PATCH V5] Btrfs: snapshot-aware defrag

2013-01-22 Thread Mitch Harder
On Thu, Jan 17, 2013 at 8:42 AM, Mitch Harder
mitch.har...@sabayonlinux.org wrote:
 On Wed, Jan 16, 2013 at 6:36 AM, Liu Bo bo.li@oracle.com wrote:
 This comes from one of btrfs's project ideas,
 As we defragment files, we break any sharing from other snapshots.
 The balancing code will preserve the sharing, and defrag needs to grow this
 as well.

 Now we're able to fill the blank with this patch, in which we make full use of
 backref walking stuff.

 Here is the basic idea,
 o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
 o  at endio, after we finish updating fs tree, we use backref walking to find
    all parents of the ranges and re-link them with the new COWed file layout by
    adding corresponding backrefs.

 Signed-off-by: Li Zefan l...@cn.fujitsu.com
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
 v4->v5:
   - Clarify the comments for duplicated refs.
   - Clear defrag flag after we're ready to defrag.
   - Fix a bug on HOLE extent.
 v3->v4:
   - Fix duplicated refs bugs detected by mounting with autodefrag, thanks
     for the bug report from Mitch and Chris.
 v2->v3:
   - Rebase
 v1->v2:
   - Address comments from David.


 I've been testing this patch on a 3.7.2 kernel merged with the
 for-linus branch for the 3.8_rc kernels, and I'm seeing the following
 error:


I've reproduced the error with CONFIG_DEBUG_LIST enabled, which shows
some problem with an entry in the list.

[59312.260441] [ cut here ]
[59312.260454] WARNING: at lib/list_debug.c:62 __list_del_entry+0x8d/0x98()
[59312.260458] Hardware name: OptiPlex 745
[59312.260461] list_del corruption. next->prev should be
88006511c438, but was dead00200200
[59312.260464] Modules linked in: ipv6 snd_hda_codec_analog
snd_hda_intel i2c_i801 tg3 snd_hda_codec iTCO_wdt snd_hwdep snd_pcm
ppdev parport_pc sr_mod microcode floppy parport snd_page_alloc
snd_timer snd iTCO_vendor_support lpc_ich serio_raw pcspkr ablk_helper
cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd
sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache
sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd
[59312.260519] Pid: 20523, comm: btrfs-cleaner Not tainted 3.7.2-sad+ #1
[59312.260521] Call Trace:
[59312.260529]  [81030586] warn_slowpath_common+0x83/0x9b
[59312.260549]  [a015aa01] ? reada_for_balance+0x187/0x218 [btrfs]
[59312.260554]  [81030641] warn_slowpath_fmt+0x46/0x48
[59312.260566]  [a015aa01] ? reada_for_balance+0x187/0x218 [btrfs]
[59312.260570]  [812099e5] __list_del_entry+0x8d/0x98
[59312.260574]  [812099fe] list_del+0xe/0x2e
[59312.260590]  [a017b325]
btrfs_clean_old_snapshots+0x101/0x168 [btrfs]
[59312.260605]  [a0173d99] cleaner_kthread+0x5a/0xe6 [btrfs]
[59312.260619]  [a0173d3f] ? transaction_kthread+0x1a0/0x1a0 [btrfs]
[59312.260624]  [8104c750] kthread+0xba/0xc2
[59312.260629]  [8104c696] ? kthread_freezable_should_stop+0x52/0x52
[59312.260634]  [815f2f1c] ret_from_fork+0x7c/0xb0
[59312.260639]  [8104c696] ? kthread_freezable_should_stop+0x52/0x52
[59312.260642] ---[ end trace 61b4cbd93690300f ]---
[59318.623735] [ cut here ]
[59318.623751] WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98()
[59318.623755] Hardware name: OptiPlex 745
[59318.623760] list_del corruption, 88006511c438->next is
LIST_POISON1 (dead00100100)
[59318.623766] Modules linked in: ipv6 snd_hda_codec_analog
snd_hda_intel i2c_i801 tg3 snd_hda_codec iTCO_wdt snd_hwdep snd_pcm
ppdev parport_pc sr_mod microcode floppy parport snd_page_alloc
snd_timer snd iTCO_vendor_support lpc_ich serio_raw pcspkr ablk_helper
cryptd lrw xts gf128mul aes_x86_64 sha256_generic fuse xfs nfs lockd
sunrpc reiserfs btrfs zlib_deflate ext4 jbd2 ext3 jbd ext2 mbcache
sl811_hcd hid_generic xhci_hcd ohci_hcd uhci_hcd ehci_hcd
[59318.623840] Pid: 20523, comm: btrfs-cleaner Tainted: GW
3.7.2-sad+ #1
[59318.623844] Call Trace:
[59318.623855]  [81030586] warn_slowpath_common+0x83/0x9b
[59318.623878]  [a015aab9] ? btrfs_free_path+0x27/0x2c [btrfs]
[59318.623885]  [81030641] warn_slowpath_fmt+0x46/0x48
[59318.623901]  [a015aab9] ? btrfs_free_path+0x27/0x2c [btrfs]
[59318.623907]  [812099e5] __list_del_entry+0x8d/0x98
[59318.623912]  [812099fe] list_del+0xe/0x2e
[59318.623935]  [a017b325]
btrfs_clean_old_snapshots+0x101/0x168 [btrfs]
[59318.623955]  [a0173d99] cleaner_kthread+0x5a/0xe6 [btrfs]
[59318.623975]  [a0173d3f] ? transaction_kthread+0x1a0/0x1a0 [btrfs]
[59318.623981]  [8104c750] kthread+0xba/0xc2
[59318.623988]  [8104c696] ? kthread_freezable_should_stop+0x52/0x52
[59318.623994]  [815f2f1c] ret_from_fork+0x7c/0xb0
[59318.624000]  [8104c696] ? kthread_freezable_should_stop+0x52/0x52
[59318.624022] ---[ end trace 61b4cbd936903010

Re: [PATCH V5] Btrfs: snapshot-aware defrag

2013-01-17 Thread Mitch Harder
On Thu, Jan 17, 2013 at 6:53 PM, Liu Bo bo.li@oracle.com wrote:
 On Thu, Jan 17, 2013 at 08:42:46AM -0600, Mitch Harder wrote:
 On Wed, Jan 16, 2013 at 6:36 AM, Liu Bo bo.li@oracle.com wrote:
  This comes from one of btrfs's project ideas,
  As we defragment files, we break any sharing from other snapshots.
  The balancing code will preserve the sharing, and defrag needs to grow this
  as well.
 
  Now we're able to fill the blank with this patch, in which we make full use of
  backref walking stuff.

  Here is the basic idea,
  o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
  o  at endio, after we finish updating fs tree, we use backref walking to find
     all parents of the ranges and re-link them with the new COWed file layout by
     adding corresponding backrefs.
 
  Signed-off-by: Li Zefan l...@cn.fujitsu.com
  Signed-off-by: Liu Bo bo.li@oracle.com
  ---
  v4->v5:
    - Clarify the comments for duplicated refs.
    - Clear defrag flag after we're ready to defrag.
    - Fix a bug on HOLE extent.
  v3->v4:
    - Fix duplicated refs bugs detected by mounting with autodefrag, thanks
      for the bug report from Mitch and Chris.
  v2->v3:
    - Rebase
  v1->v2:
    - Address comments from David.
 

 I've been testing this patch on a 3.7.2 kernel merged with the
 for-linus branch for the 3.8_rc kernels, and I'm seeing the following
 error:

 Hi Mitch,

 Interesting!  I haven't even changed the snapshot code.

Yes, this patch series has been excellent at tickling unrelated issues.

 Is it reliably reproducible on your side?  Still with the
 snapshot-test-pub scripts?

I'm still using the same snapshot-test scripts, but they don't
reproduce reliably.  I have to run for a while after my script reaches
the point where it starts deleting snapshots to make space.

But, I've been able to hit this error four times with this script.

I'll try to keep playing with this to make a better reproducer, and to
isolate the problem with the parameter supplied to list_del.


Re: Can moving data to a subvolume not take as long as a fully copy?

2013-01-15 Thread Mitch Harder
On Tue, Jan 15, 2013 at 8:49 AM, Marc MERLIN m...@merlins.org wrote:
 On Mon, Jan 14, 2013 at 10:48:50PM -0800, David Brown wrote:
 Why not make a snapshot of the root volume, and then delete the files
 you want to move from the original root, and delete the rest of root
 from the snapshot?

 Are a snapshot of the root volume and a subvolume effectively the same thing
 as far as btrfs sees them?
 Once I have that snapshot which I'll treat as a subvolume, can I then
 snapshot that snapshot/subvolume further?


Yes, the product of the btrfs snapshot command is a subvolume.


Re: Errors not found by btrfsck or scrub

2013-01-11 Thread Mitch Harder
On Fri, Jan 11, 2013 at 12:13 PM, Chris Carlin chrisrcar...@gmail.com wrote:
 I have a week-old filesystem that is reported clean by btrfsck and
 scrub, but that fails under operations ranging from du to sync and
 umount (but no failures if mounted readonly).

 My problem sounds similar to a few other reports (e.g. TM's in
 http://thread.gmane.org/gmane.comp.file-systems.btrfs/22014 ) that
 seem to hint at problems with full metadata. My df shows:


I know this advice will run counter to what everyone else is saying,
but I've had some luck booting with an older kernel (such as 3.4 or
3.5) just long enough to get some more Metadata allocated.

I would also caution you to back up your data.  I've had a similar
issue, and that file system soon showed additional corruptions.


Re: segmentation-fault in btrfsck (git-version)

2012-12-30 Thread Mitch Harder
On Sat, Dec 29, 2012 at 5:28 AM, Hendrik Friedel hend...@friedels.name wrote:
 Hello,

 I re-send this message, hoping that someone can give me a hint?

 Regards,
 Hendrik


Two possibilities come to mind (although there may be others).

(1)  The file still exists, but it is somewhere you did not expect.

(2)  Your filesystem tree has some sort of corruption.

For item (1), have you thoroughly searched the entire volume for this
file with something like:

find path/to/volume/top/level/mount -iname '*Sting_Live_in_Berlin*'

It is possible that the file exists in a snapshot or in a different
directory than you were expecting.
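
For anyone who prefers to script the search, here is a minimal sketch in
Python of the same case-insensitive name search. The function name and the
example pattern are only illustrative; it walks whatever top-level directory
it is given, so snapshots mounted below that point are covered too.

```python
import fnmatch
import os


def find_by_name(top, pattern):
    """Return sorted paths under `top` whose base name matches `pattern`,
    ignoring case (roughly what `find top -iname pattern` does)."""
    wanted = pattern.lower()
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Check directories as well as files: a recording may be a folder.
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name.lower(), wanted):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical mount point; replace with the real top-level mount.
    for path in find_by_name("/mnt/other", "*sting_live_in_berlin*"):
        print(path)
```

Note that a pattern without wildcards only matches a base name that is
exactly equal to it (apart from case), so a file with an extension would be
missed.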

If the filesystem tree is corrupted, the task becomes tricky.

Perhaps you can look at the Wiki entry for how the filesystem tree is
constructed:

https://btrfs.wiki.kernel.org/index.php/Trees

Then examine the btrfs-debug-tree output around these entries, and try
to determine why the tree still has entries for these files, but does
not show these files nor report the problem with btrfsck.


Re: [PATCH] Btrfs: reset path lock state to zero

2012-12-28 Thread Mitch Harder
On Fri, Dec 28, 2012 at 3:33 AM, Liu Bo bo.li@oracle.com wrote:
 We forgot to reset the path lock state to zero after we unlock the path block,
 and this can lead to the ASSERT checker in tree unlock API.

 Reported-by: Slava Barinov raysl...@gmail.com
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
  fs/btrfs/extent-tree.c |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 521e9d4..a71d457 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -6788,11 +6788,13 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
                                                        &wc->flags[level]);
                         if (ret < 0) {
                                 btrfs_tree_unlock_rw(eb, path->locks[level]);
 +                               path->locks[level] = 0;
                                 return ret;
                         }
                         BUG_ON(wc->refs[level] == 0);
                         if (wc->refs[level] == 1) {
                                 btrfs_tree_unlock_rw(eb, path->locks[level]);
 +                               path->locks[level] = 0;
                                 return 1;
                         }
                 }
 --
 1.7.7.6


This patch seems to clear a lock WARNING I've been seeing recently.

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/21692

I'm unable to generate the WARNING after applying this patch.

Thanks.
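
To illustrate why the stale lock state matters, here is a toy model in
Python. It is not the kernel code: ToyBuffer, ToyPath and early_return are
made-up names, and the sketch assumes that some later cleanup step consults
the recorded per-level lock state before unlocking.

```python
class ToyBuffer:
    """Stand-in for an extent buffer that complains on a bad unlock."""

    def __init__(self):
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        # Plays the role of the assertion in the tree unlock API.
        assert self.locked, "unlock of a buffer that is not locked"
        self.locked = False


class ToyPath:
    """Stand-in for a path: one buffer and one recorded lock state per level."""

    def __init__(self, levels):
        self.nodes = [ToyBuffer() for _ in range(levels)]
        self.locks = [0] * levels

    def lock_level(self, level):
        self.nodes[level].lock()
        self.locks[level] = 1

    def release(self):
        # Generic cleanup: unlock every level still recorded as locked.
        for level, state in enumerate(self.locks):
            if state:
                self.nodes[level].unlock()
                self.locks[level] = 0


def early_return(path, level, reset_state):
    """Unlock one level on an early-return path, with or without the fix."""
    path.nodes[level].unlock()
    if reset_state:
        path.locks[level] = 0  # the kind of line the patch adds
```

With reset_state=False the later release() unlocks the same buffer a second
time and trips the assertion; with reset_state=True it does not.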


Kernel lockdep WARNING on btrfs-next

2012-12-15 Thread Mitch Harder
I've been testing Josef's btrfs-next master branch using a test that
loops through creation, manipulation and destruction of snapshots of
kernel git sources.

The version of btrfs-next I'm using was built as of Friday, December
14th, and the top commit is:
Btrfs: don't take inode delalloc mutex if we're a free space inode
committer   Josef Bacik jba...@fusionio.com   
Fri, 14 Dec 2012 21:57:39 +0000 (16:57 -0500)
commit  bd2dd0060cf0ae2a81a7b22e9cc23063796fe09c

I've hit a WARN_ON at kernel/lockdep.c:702, which is in the
look_up_lock_class(...) function

/*
 * We can walk the hash lockfree, because the hash only
 * grows, and we are careful when adding entries to the end:
 */
list_for_each_entry(class, hash_head, hash_entry) {
        if (class->key == key) {
                /*
                 * Huh! same key, different name? Did someone trample
                 * on some memory? We're most confused.
                 */
Line 702        WARN_ON_ONCE(class->name != lock->name);
                return class;
        }
}

It looks like this occurred during the delayed deletion of one of the
subvolumes.

As far as I can tell, no corruption occurred, the file system passes
btrfsck checks, and seems to be otherwise behaving normally.

I was not on the system at the time this occurred, so I can't say if
it noticeably delayed the system.

[ 5260.068074] [ cut here ]
[ 5260.068092] WARNING: at kernel/lockdep.c:702
__lock_acquire.isra.29+0xa44/0xab9()
[ 5260.068096] Hardware name: OptiPlex 745
[ 5260.068099] Modules linked in: iTCO_wdt iTCO_vendor_support lpc_ich
mfd_core lrw xts gf128mul ablk_helper cryptd aes_x86_64 sha256_generic
btrfs libcrc32c
[ 5260.068124] Pid: 3801, comm: btrfs-cleaner Not tainted 3.7.0-btrfs-next+ #2
[ 5260.068128] Call Trace:
[ 5260.068139]  [8103663a] warn_slowpath_common+0x74/0xa2
[ 5260.068172]  [a0062805] ? btrfs_tree_read_unlock+0x7d/0xa9 [btrfs]
[ 5260.068179]  [81036682] warn_slowpath_null+0x1a/0x1c
[ 5260.068185]  [81084813] __lock_acquire.isra.29+0xa44/0xab9
[ 5260.068210]  [a0062483] ? btrfs_tree_lock+0xf7/0x24c [btrfs]
[ 5260.068217]  [81084d8e] lock_acquire+0x81/0xff
[ 5260.068241]  [a0062483] ? btrfs_tree_lock+0xf7/0x24c [btrfs]
[ 5260.068248]  [8183d7b5] _raw_write_lock+0x31/0x40
[ 5260.068271]  [a0062483] ? btrfs_tree_lock+0xf7/0x24c [btrfs]
[ 5260.068295]  [a0062483] btrfs_tree_lock+0xf7/0x24c [btrfs]
[ 5260.068319]  [a004eb1d] ? find_extent_buffer+0x8f/0xd6 [btrfs]
[ 5260.068343]  [a004eaa4] ? find_extent_buffer+0x16/0xd6 [btrfs]
[ 5260.068360]  [a001ef79] do_walk_down+0xd2/0x4b6 [btrfs]
[ 5260.068378]  [a001e1ae] ? btrfs_block_rsv_check+0x29/0x7d [btrfs]
[ 5260.068394]  [a001e1ae] ? btrfs_block_rsv_check+0x29/0x7d [btrfs]
[ 5260.068411]  [a001f420] walk_down_tree+0xc3/0xef [btrfs]
[ 5260.068430]  [a0021f4f] btrfs_drop_snapshot+0x372/0x5c7 [btrfs]
[ 5260.068451]  [a0033ccc]
btrfs_clean_old_snapshots+0xa6/0x13a [btrfs]
[ 5260.068471]  [a002b1c0] ? cleaner_kthread+0x8d/0x102 [btrfs]
[ 5260.068490]  [a002b1d4] cleaner_kthread+0xa1/0x102 [btrfs]
[ 5260.068509]  [a002b133] ? btree_invalidatepage+0x73/0x73 [btrfs]
[ 5260.068515]  [81058333] kthread+0xea/0xef
[ 5260.068522]  [81058249] ? flush_kthread_work+0x19c/0x19c
[ 5260.068528]  [8184549c] ret_from_fork+0x7c/0xb0
[ 5260.068534]  [81058249] ? flush_kthread_work+0x19c/0x19c
[ 5260.068538] ---[ end trace 0caa5c9123c1e741 ]---


Re: segmentation-fault in btrfsck (git-version)

2012-12-15 Thread Mitch Harder
On Sat, Dec 15, 2012 at 1:40 PM, Hendrik Friedel hend...@friedels.name wrote:
 Hello Mitch, hello all,


 Since btrfs has significant improvements and fixes in each kernel

 release, and since very few of these changes are backported, it is
 recommended to use the latest kernels available.


 Ok, it's 3.7 now.


 The root ### inode # errors 400 are an indication that there is
 an inconsistency in the inode size.  There was a patch included in the
 3.1 or 3.2 kernel to address this issue

 (http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=f70a9a6b94af86fca069a7552ab672c31b457786).
   But I don't believe this patch fixed existing occurrences of this
 error.


 Apparently not. It's still there.


 At this point, the quickest solution for you may be to rebuild and
 reformat this RAID assembly, and restore this data from backups.


 Yepp, I did that. But in fact, some data is missing. It is not essential,
 but nice to have.


 If you don't have a backup of this data, and since your array seems to
 be working pretty well in a degraded state, this would be a really
 good time to look at a strategy of getting a backup of this data
 before doing many more attempts at rescue.


 Done. It's all safe on another ext4 drive.

 So, let's play ;-)
 Could you please help me try to restore the missing data?

 What I tried so far was:
 ./btrfs-restore /dev/sdc1 /mnt/restore/

 It worked, in the sense that it restored what I already had.
 What's odd as well is that btrfs scrub ran through without errors.
 So, the missing data could have been (accidentally) deleted by me. I
 don't think so, but I cannot rule it out.

 What I know is the (original) Path of the Data.


You could try btrfs-debug-tree, and search for any traces of your
file.  However, be ready to sift through a massive amount of output.


Re: Encryption

2012-12-12 Thread Mitch Harder
On Wed, Dec 12, 2012 at 11:12 AM,  merc1...@f-m.fm wrote:

 So there is no way to have filesystem encryption, while keeping
 snapshots?



I run btrfs on top of LUKS encryption on my laptop.  You should be
able to do the same.

You could then run rsync through ssh.  However, rsync will have no
knowledge of any blocks shared under subvolume snapshots.

Btrfs does not yet have internal encryption.


Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag

2012-12-12 Thread Mitch Harder
On Thu, Nov 1, 2012 at 6:21 AM, Liu Bo bo.li@oracle.com wrote:
 On Thu, Nov 01, 2012 at 08:08:52PM +0900, Itaru Kitayama wrote:
 Hi Liubo,

 I couldn't apply your V4 patch against the btrfs-next HEAD. Do you have
 a github branch which I can checkout?


 The current btrfs-next HEAD actually have included this v4 patch, so
 just pull btrfs-next and give it a shot :)


I'm still seeing similar issues using Josef's current btrfs-next
branch (which still includes the v4 version of the snapshot-aware
defrag patches).

[44507.850693] [ cut here ]
[44507.850728] WARNING: at fs/btrfs/inode.c:7755
btrfs_destroy_inode+0x231/0x2c4 [btrfs]()
[44507.850732] Hardware name: OptiPlex 745
[44507.850735] Modules linked in: iTCO_wdt iTCO_vendor_support lpc_ich
mfd_core lrw xts gf128mul ablk_helper cryptd aes_x86_64 sha256_generic
btrfs libcrc32c
[44507.850753] Pid: 15719, comm: umount Tainted: GW
3.7.0-btrfs-next+ #1
[44507.850756] Call Trace:
[44507.850766]  [810364da] warn_slowpath_common+0x74/0xa2
[44507.850770]  [81036522] warn_slowpath_null+0x1a/0x1c
[44507.850787]  [a0041e0e] btrfs_destroy_inode+0x231/0x2c4 [btrfs]
[44507.850793]  [81141670] destroy_inode+0x3c/0x5f
[44507.850797]  [811417b5] evict+0x122/0x1ac
[44507.850800]  [81142016] iput+0xed/0x169
[44507.850816]  [a0038c18] btrfs_run_delayed_iputs+0xd6/0xf6 [btrfs]
[44507.850831]  [a002db75] btrfs_commit_super+0x2c/0xfd [btrfs]
[44507.850845]  [a002f289] close_ctree+0x2c1/0x300 [btrfs]
[44507.850850]  [811424c9] ? evict_inodes+0x106/0x115
[44507.850861]  [a00070b1] btrfs_put_super+0x19/0x1b [btrfs]
[44507.850866]  [8112b321] generic_shutdown_super+0x5b/0xdc
[44507.850869]  [8112b424] kill_anon_super+0x16/0x24
[44507.850880]  [a000ad98] btrfs_kill_super+0x1a/0x8f [btrfs]
[44507.850884]  [8112b647] deactivate_locked_super+0x33/0x6c
[44507.850887]  [8112c25f] deactivate_super+0x4e/0x66
[44507.850892]  [81145e64] mntput_no_expire+0xf7/0x14d
[44507.850896]  [81146ced] sys_umount+0x63/0x37a
[44507.850901]  [8183e642] system_call_fastpath+0x16/0x1b
[44507.850905] ---[ end trace ba14fbf3de68a237 ]---
[44507.850907] [ cut here ]
[44507.850924] WARNING: at fs/btrfs/inode.c:7756
btrfs_destroy_inode+0x2b9/0x2c4 [btrfs]()
[44507.850927] Hardware name: OptiPlex 745
[44507.850930] Modules linked in: iTCO_wdt iTCO_vendor_support lpc_ich
mfd_core lrw xts gf128mul ablk_helper cryptd aes_x86_64 sha256_generic
btrfs libcrc32c
[44507.850947] Pid: 15719, comm: umount Tainted: GW
3.7.0-btrfs-next+ #1
[44507.850949] Call Trace:
[44507.850956]  [810364da] warn_slowpath_common+0x74/0xa2
[44507.850961]  [81036522] warn_slowpath_null+0x1a/0x1c
[44507.850978]  [a0041e96] btrfs_destroy_inode+0x2b9/0x2c4 [btrfs]
[44507.850982]  [81141670] destroy_inode+0x3c/0x5f
[44507.850986]  [811417b5] evict+0x122/0x1ac
[44507.850990]  [81142016] iput+0xed/0x169
[44507.851003]  [a0038c18] btrfs_run_delayed_iputs+0xd6/0xf6 [btrfs]
[44507.851033]  [a002db75] btrfs_commit_super+0x2c/0xfd [btrfs]
[44507.851048]  [a002f289] close_ctree+0x2c1/0x300 [btrfs]
[44507.851052]  [811424c9] ? evict_inodes+0x106/0x115
[44507.851063]  [a00070b1] btrfs_put_super+0x19/0x1b [btrfs]
[44507.851066]  [8112b321] generic_shutdown_super+0x5b/0xdc
[44507.851070]  [8112b424] kill_anon_super+0x16/0x24
[44507.851080]  [a000ad98] btrfs_kill_super+0x1a/0x8f [btrfs]
[44507.851084]  [8112b647] deactivate_locked_super+0x33/0x6c
[44507.851087]  [8112c25f] deactivate_super+0x4e/0x66
[44507.851091]  [81145e64] mntput_no_expire+0xf7/0x14d
[44507.851095]  [81146ced] sys_umount+0x63/0x37a
[44507.851099]  [8183e642] system_call_fastpath+0x16/0x1b
[44507.851101] ---[ end trace ba14fbf3de68a238 ]---
[44507.851104] [ cut here ]
[44507.851121] WARNING: at fs/btrfs/inode.c:7758
btrfs_destroy_inode+0x28d/0x2c4 [btrfs]()
[44507.851123] Hardware name: OptiPlex 745
[44507.851124] Modules linked in: iTCO_wdt iTCO_vendor_support lpc_ich
mfd_core lrw xts gf128mul ablk_helper cryptd aes_x86_64 sha256_generic
btrfs libcrc32c
[44507.851140] Pid: 15719, comm: umount Tainted: GW
3.7.0-btrfs-next+ #1
[44507.851142] Call Trace:
[44507.851148]  [810364da] warn_slowpath_common+0x74/0xa2
[44507.851152]  [81036522] warn_slowpath_null+0x1a/0x1c
[44507.851168]  [a0041e6a] btrfs_destroy_inode+0x28d/0x2c4 [btrfs]
[44507.851172]  [81141670] destroy_inode+0x3c/0x5f
[44507.851176]  [811417b5] evict+0x122/0x1ac
[44507.851180]  [81142016] iput+0xed/0x169
[44507.851195]  [a0038c18] btrfs_run_delayed_iputs+0xd6/0xf6 [btrfs]
[44507.851209]  [a002db75] btrfs_commit_super+0x2c/0xfd [btrfs]
[44507.851223]  [a002f289] 

Re: segmentation-fault in btrfsck (git-version)

2012-12-09 Thread Mitch Harder
On Sun, Dec 9, 2012 at 1:06 PM, Hendrik Friedel hend...@friedels.name wrote:
 Dear Mitch,

 thanks for your help and suggestion:

 It might be interesting for you to try a newer kernel, and use scrub
 on this volume if you have the two disks RAIDed.

 I have now scrubbed the Disk:
 ./btrfs scrub status /mnt/other/
 scrub status for a15eede9-1a92-47d8-940a-adc7cf97352d
 scrub started at Sun Dec  9 13:48:57 2012 and finished after 3372
 seconds
 total bytes scrubbed: 1.10TB with 0 errors


 That's odd, as in one folder, data is missing (I could have deleted it, but
 I'd be very surprised...)

 Also, when I run btrfsck, I get errors:
 On sdc1:
 root 261 inode 64370 errors 400
 root 261 inode 64373 errors 400

 root 261 inode 64375 errors 400
 root 261 inode 64376 errors 400
 found 1203899371520 bytes used err is 1
 total csum bytes: 1173983136
 total tree bytes: 1740640256
 total fs tree bytes: 280260608
 btree space waste bytes: 212383383
 file data blocks allocated: 28032005304320
  referenced 1190305632256
 Btrfs v0.20-rc1-37-g91d9eec

 On sdb1:
 root 261 inode 64373 errors 400

 root 261 inode 64375 errors 400
 root 261 inode 64376 errors 400
 found 1203899371520 bytes used err is 1
 total csum bytes: 1173983136
 total tree bytes: 1740640256
 total fs tree bytes: 280260608
 btree space waste bytes: 212383383
 file data blocks allocated: 28032005304320
  referenced 1190305632256
 Btrfs v0.20-rc1-37-g91d9eec



 And when I try to mount one of the two raided disks, I get:
 [ 1173.773861] device fsid a15eede9-1a92-47d8-940a-adc7cf97352d devid 1
 transid 140194 /dev/sdb1
 [ 1173.774695] btrfs: failed to read the system array on sdb1
 [ 1173.774854] btrfs: open_ctree failed

 while the other works:
 [ 1177.927096] device fsid a15eede9-1a92-47d8-940a-adc7cf97352d devid 2
 transid 140194 /dev/sdc1

 Do you have hints for me?
 The Kernel now is 3.3.7-030307-generic (anything more recent, I would have
 to compile myself, which I will do, if you suggest to)


Since btrfs has significant improvements and fixes in each kernel
release, and since very few of these changes are backported, it is
recommended to use the latest kernels available.

The root ### inode # errors 400 are an indication that there is
an inconsistency in the inode size.  There was a patch included in the
3.1 or 3.2 kernel to address this issue
(http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commit;h=f70a9a6b94af86fca069a7552ab672c31b457786).
 But I don't believe this patch fixed existing occurrences of this
error.

At this point, the quickest solution for you may be to rebuild and
reformat this RAID assembly, and restore this data from backups.

If you don't have a backup of this data, and since your array seems to
be working pretty well in a degraded state, this would be a really
good time to look at a strategy of getting a backup of this data
before doing many more attempts at rescue.


Re: segmentation-fault in btrfsck (git-version)

2012-12-06 Thread Mitch Harder
On Wed, Dec 5, 2012 at 2:50 PM, Hendrik Friedel hend...@friedels.name wrote:
 Dear all,

 thanks for developing btrfsck!
 Now, I'd like to contribute -as far as I can. I'm not a developer, but I do
 have some linux-experience.
 I've been using btrfsck on two 3TB HDDs (mirrored) for a while now under
 Kernel 3.0. Now it's corrupt. I had some hard resets of the machine -which
 might have contributed. I do have a backup of the data -at least of the
 important stuff. Some TV-Recordings are missing. The reason I am writing is,
 to support the development.

 Unfortunately, btrfsck (latest git-version) crashes with a segmentation
 fault, when trying to repair this.

 Here's the backtrace:
 root 261 inode 64375 errors 400
 root 261 inode 64376 errors 400
 btrfsck: disk-io.c:382: __commit_transaction: Assertion `!(!eb || eb->start
 != start)' failed.

 Program received signal SIGABRT, Aborted.
 0x7784c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
 (gdb)
 (gdb) backtrace
 #0  0x7784c425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
 #1  0x7784fb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
 #2  0x778450ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
 #3  0x77845192 in __assert_fail () from
 /lib/x86_64-linux-gnu/libc.so.6
 #4  0x0040d3ae in __commit_transaction (trans=0x62e010,
 root=0xb66ae0) at disk-io.c:382
 #5  0x0040d4d8 in btrfs_commit_transaction (trans=0x62e010,
 root=0xb66ae0) at disk-io.c:415
 #6  0x0040743d in main (ac=<optimized out>, av=<optimized out>) at
 btrfsck.c:3587


 Now, here's where my debugging knowledge ends. Are you interested in
 debugging this further, or is it a known bug?


Line 382 in disk-io.c is:

BUG_ON(!eb || eb->start != start);

So, basically, btrfsck is intentionally crashing because it doesn't
know how to handle this condition.

Future refinements of btrfsck will probably include proper error
messages for issues that can't be handled, or perhaps even fix the
error.

It might be interesting for you to try a newer kernel, and use scrub
on this volume if you have the two disks RAIDed.


Btrfs Slow Down (Metadata Starvation?)

2012-11-08 Thread Mitch Harder
One of my Btrfs partitions ran into a severe slowdown recently.
Operations that would normally complete in 20-30 seconds were now
requiring hours.

There were no errors or warnings in dmesg (Alt-SysRq-W is below, but
shows nothing out of the ordinary).  And if I took the partition
offline, it would pass btrfsck without error.  So far, I've found no
indications of corruption.

The kernel is a version 3.6.6 merged with the for-linus branch for
3.7.  I usually mount with compress-force=lzo, but no autodefrag or
other options.

The symptoms were consistent with some kind of corner case metadata starvation.

While under pressure, my 'btrfs df' would show something like the following:

# btrfs fi df /mnt/sabayon9/
Data: total=7.00GB, used=6.00GB
System: total=4.00MB, used=4.00KB
Metadata: total=768.00MB, used=737.65MB

For some reason, btrfs was not allocating any additional metadata space.
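
As a rough way to quantify the pressure, the sketch below (Python) turns
'btrfs fi df' output into a used/total percentage per chunk type. It assumes
the simple "Type: total=..., used=..." line format shown above; the function
name and the regular expression are my own and not part of btrfs-progs.

```python
import re

UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
LINE = re.compile(
    r"^(?P<kind>\w+): total=(?P<total>[\d.]+)(?P<tu>[KMGT]?B), "
    r"used=(?P<used>[\d.]+)(?P<uu>[KMGT]?B)$"
)


def usage_percent(text):
    """Map each chunk type to used/total as a percentage."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip the prompt line and anything unrecognised
        total = float(m["total"]) * UNITS[m["tu"]]
        used = float(m["used"]) * UNITS[m["uu"]]
        result[m["kind"]] = 100.0 * used / total
    return result


SLOW = """\
Data: total=7.00GB, used=6.00GB
System: total=4.00MB, used=4.00KB
Metadata: total=768.00MB, used=737.65MB
"""

if __name__ == "__main__":
    for kind, pct in usage_percent(SLOW).items():
        print(f"{kind}: {pct:.1f}% of allocated space in use")
```

For the figures above, metadata works out to roughly 96% of its allocation
in use, which fits the starvation theory.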

The partition is 25 GB, and not very full:
/dev/sda2  btrfs 25165824   7047816  18082844  29% /mnt/sabayon9

When I rebooted into a 3.4 kernel (which is merged with the Btrfs code
for 3.5), the slow down cleared after I mounted the partition, and
triggered an allocation of metadata up to 1 GB.

I would note that I tried my 3.5 vintage kernel (which is merged with
the Btrfs code for 3.6), and was unable to clear the issue.  This
tends to strengthen my suspicion that this is some kind of corner case
since this code has been out there for a while now.

Currently, I'm showing something like this with 'btrfs df'

# btrfs fi df /mnt/sabayon9/
Data: total=8.00GB, used=5.97GB
System: total=4.00MB, used=4.00KB
Metadata: total=1.00GB, used=722.65MB

Now, everything is operating normally in my 3.6.6 kernel.

I've saved an image of the partition in its 'slow-down' condition in
case it becomes desirable to test something in that condition.

I'm including my dmesg output of an Alt-SysRq-W operation, but I don't
see anything useful there.

[18697.498504] SysRq : Show Blocked State
[18697.498510]   taskPC stack   pid father
[18697.498551] btrfs-submit-1  D 0210 0  4236  2 0x
[18697.498556]  880123d53b70 0046 8801231908d0
880123d53fd8
[18697.498560]  4000 00012700 88012aaf4380
880124269680
[18697.498563]  0006 880125e18000 880125e18000

[18697.498567] Call Trace:
[18697.498576]  [812d8750] ? __blk_run_queue+0x1e/0x20
[18697.498580]  [812db573] ? queue_unplugged+0x83/0x99
[18697.498585]  [8161dcf4] schedule+0x64/0x66
[18697.498588]  [8161dd85] io_schedule+0x8f/0xce
[18697.498591]  [812dd520] get_request+0x559/0x5b0
[18697.498596]  [8104c79b] ? abort_exclusive_wait+0x8e/0x8e
[18697.498599]  [812de7de] blk_queue_bio+0x1b7/0x315
[18697.498602]  [812dcbc3] generic_make_request+0x9f/0xe1
[18697.498605]  [812dcce9] submit_bio+0xe4/0x103
[18697.498640]  [a006a145] run_scheduled_bios+0x28c/0x428 [btrfs]
[18697.498660]  [a006a2f6] pending_bios_fn+0x15/0x17 [btrfs]
[18697.498679]  [a0071840] worker_loop+0x15f/0x497 [btrfs]
[18697.498698]  [a00716e1] ? btrfs_queue_worker+0x272/0x272 [btrfs]
[18697.498702]  [8104c072] kthread+0x8b/0x93
[18697.498707]  [816205b4] kernel_thread_helper+0x4/0x10
[18697.498710]  [8104bfe7] ? kthread_freezable_should_stop+0x57/0x57
[18697.498714]  [816205b0] ? gs_change+0xb/0xb
[18697.498718] btrfs-transacti D 0002 0  4248  2 0x
[18697.498721]  880123d83b70 0046 
880123d83fd8
[18697.498725]  4000 00012700 88012aaf4380
880123d79680
[18697.498728]  a0049342 880123d83bf0 880123d83b10
810c9ee4
[18697.498732] Call Trace:
[18697.498748]  [a0049342] ? check_leaf+0x2d4/0x2d4 [btrfs]
[18697.498753]  [810c9ee4] ? release_pages+0x1b2/0x1c1
[18697.498772]  [a0063068] ? submit_one_bio+0x8a/0x94 [btrfs]
[18697.498776]  [8106c8db] ? ktime_get_ts+0x56/0xbc
[18697.498780]  [8109c171] ? delayacct_end+0x79/0x84
[18697.498784]  [810bf744] ? __lock_page+0x68/0x68
[18697.498787]  [8161dcf4] schedule+0x64/0x66
[18697.498790]  [8161dd85] io_schedule+0x8f/0xce
[18697.498793]  [810bf752] sleep_on_page+0xe/0x12
[18697.498796]  [8161c436] __wait_on_bit+0x48/0x7b
[18697.498799]  [810bf4b9] ? find_get_pages_tag+0xf4/0x130
[18697.498803]  [810bf97d] wait_on_page_bit+0x72/0x74
[18697.498806]  [8104c7d3] ? autoremove_wake_function+0x38/0x38
[18697.498810]  [810bfa4c] filemap_fdatawait_range+0x87/0x13e
[18697.498829]  [a0063c09] ? free_extent_state+0x7d/0x85 [btrfs]
[18697.498849]  [a0064631] ? clear_extent_bit+0x272/0x2aa [btrfs]
[18697.498866]  [a004f696] btrfs_wait_marked_extents+0x7d/0xce [btrfs]
[18697.498884]  

Re: Why btrfs inline small file by default?

2012-10-30 Thread Mitch Harder
On Tue, Oct 30, 2012 at 6:04 AM, ching lschin...@gmail.com wrote:
 Hi all,

 I am testing my btrfs root partition with max_inline=0, and 64k leaf size 
 for weeks and it seems that it is fine.


 AFAIK btrfs inline small files into metadata by default, I am curious why?

 If there is only a few small files, then there will be neither effect nor 
 benefit at all
 If there is a lot of small files, then the size of metadata will be 
 undesirable due to deduplication

 there are also some email threads related to problem of metadata inline (i 
 don't know whether they are fixed in recent kernel):
 http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg16295.html
 http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg05265.html

 How about turning off inline so that btrfs works better out of the box?

 ching


I did some rough benchmarking around this a few weeks ago.  I'll try
to clean up my method and post the results.

I was working with multiple copies and rsyncs of kernel sources, which
have many candidate files for inlining.

To my surprise, my btrfs benchmarks were always the same or faster
when I let btrfs inline the files, even though metadata was much
larger.


Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag

2012-10-30 Thread Mitch Harder
On Mon, Oct 29, 2012 at 8:20 PM, Liu Bo bo.li@oracle.com wrote:
 On 10/30/2012 04:06 AM, Mitch Harder wrote:
 On Sat, Oct 27, 2012 at 5:28 AM, Liu Bo bo.li@oracle.com wrote:
 This comes from one of btrfs's project ideas,
 As we defragment files, we break any sharing from other snapshots.
 The balancing code will preserve the sharing, and defrag needs to grow this
 as well.

 Now we're able to fill the blank with this patch, in which we make full use 
 of
 backref walking stuff.

 Here is the basic idea,
 o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
 o  at endio, after we finish updating fs tree, we use backref walking to 
 find
all parents of the ranges and re-link them with the new COWed file 
 layout by
adding corresponding backrefs.

 Originally patch by Li Zefan l...@cn.fujitsu.com
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
 v3->v4:
   - fix duplicated refs bugs detected by mounting with autodefrag, 
 thanks
 for the bug report from Mitch and Chris.


 I'm picking up many WARN_ON messages while testing this patch.

 I'm testing a snapshot script that uses kernel git sources along with
 some git manipulations.

 The kernel is a 3.6.4 kernel merged with the latest for-linus branch.

 I mounted with -o compress-force=lzo,autodefrag.

 I also have the second patch in this set (Btrfs: make snapshot-aware
 defrag as a mount option).  However, I did not mount with
 'snap_aware_defrag'.

 I did not find any corrupted data, and the partition passes a btrfsck
 without error after these warnings were observed.


 Hi Mitch,

 Well, good report, but I don't think it has anything to do with this 
 patch(since you
 didn't mount with 'snap_aware_defrag' :)


I've re-run my testing script with a combination of no compression
and lzo compression, combined with no further options, only -o
autodefrag, and -o autodefrag,snap_aware_defrag.

I only get the WARN_ONs when I run with autodefrag only (no snap_aware_defrag).

My logs are clean when I avoid all defrag options, or use both
autodefrag and snap_aware_defrag.

 After going through the below messages, the bug comes from the space side 
 where we
 must have mis-used our reservation somehow.

 So can you show me your script so that I can give it a shot to reproduce 
 locally?



Re: [PATCH 1/2 v4] Btrfs: snapshot-aware defrag

2012-10-29 Thread Mitch Harder
On Sat, Oct 27, 2012 at 5:28 AM, Liu Bo bo.li@oracle.com wrote:
 This comes from one of btrfs's project ideas,
 As we defragment files, we break any sharing from other snapshots.
 The balancing code will preserve the sharing, and defrag needs to grow this
 as well.

 Now we're able to fill the blank with this patch, in which we make full use of
 backref walking stuff.

 Here is the basic idea,
 o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
 o  at endio, after we finish updating fs tree, we use backref walking to find
all parents of the ranges and re-link them with the new COWed file layout 
 by
adding corresponding backrefs.

 Originally patch by Li Zefan l...@cn.fujitsu.com
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
 v3->v4:
   - fix duplicated refs bugs detected by mounting with autodefrag, thanks
 for the bug report from Mitch and Chris.


I'm picking up many WARN_ON messages while testing this patch.

I'm testing a snapshot script that uses kernel git sources along with
some git manipulations.

The kernel is a 3.6.4 kernel merged with the latest for-linus branch.

I mounted with -o compress-force=lzo,autodefrag.

I also have the second patch in this set (Btrfs: make snapshot-aware
defrag as a mount option).  However, I did not mount with
'snap_aware_defrag'.

I did not find any corrupted data, and the partition passes a btrfsck
without error after these warnings were observed.

Here's a summary of the WARN_ON messages:

$ cat local/dmesg-3.6.4-x+ | grep WARNING:
[  610.407561] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[  610.407757] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[  610.407929] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[  661.211849] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[  661.212004] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[  661.212236] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[  719.882942] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[  719.883112] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[  719.883232] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[  786.978869] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[  786.979003] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[  786.979140] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[  845.605176] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[  845.605323] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[  845.605445] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[  912.300307] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[  912.300454] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[  912.300577] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[  968.835873] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[  968.836032] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[  968.836156] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[ 1023.778160] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[ 1023.778316] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[ 1023.778435] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[ 1064.342768] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[ 1064.342914] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[ 1064.343112] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[ 1177.892047] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[ 1177.892189] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[ 1177.892312] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[ 1281.951715] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[ 1281.951857] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[ 1281.951978] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[ 1282.804376] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[ 1282.804524] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[ 1282.804645] WARNING: at fs/btrfs/inode.c:7782
btrfs_destroy_inode+0x26a/0x2e6 [btrfs]()
[ 1351.187114] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[ 1351.187263] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 
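
The grep output above repeats the same few WARN_ON sites many times over. A minimal sketch of how the repeats could be tallied per source location follows. It assumes the dmesg text has the wrapped two-line shape shown above; `tally_warnings` and `sample` are illustrative names, not part of any btrfs tooling.

```python
import re
from collections import Counter

# Matches "[  610.407561] WARNING: at fs/btrfs/inode.c:7779" followed by the
# function name, which may have been wrapped onto the next line.
WARN_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+WARNING: at (\S+:\d+)\s+(\S+)")


def tally_warnings(dmesg_text):
    """Count WARNING entries per (source location, function+offset)."""
    counts = Counter()
    for _timestamp, location, func in WARN_RE.findall(dmesg_text):
        counts[(location, func)] += 1
    return counts


# A few entries copied from the log above.
sample = """\
[  610.407561] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
[  610.407757] WARNING: at fs/btrfs/inode.c:7780
btrfs_destroy_inode+0x296/0x2e6 [btrfs]()
[  661.211849] WARNING: at fs/btrfs/inode.c:7779
btrfs_destroy_inode+0x2ac/0x2e6 [btrfs]()
"""

for (location, func), n in sorted(tally_warnings(sample).items()):
    print(f"{n:3d}  {location}  {func}")
```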

Re: block rsv returned -28 during balance

2012-10-10 Thread Mitch Harder
On Mon, Oct 1, 2012 at 1:28 AM, Roman Mamedov r...@romanrm.ru wrote:
 Hello,

 On a 3.6.0-rc7 kernel, I launched:

   # btrfs fi balance start -f -mconvert=single /mnt/tmp/

 Current situation:

 # df -h /mnt/tmp/
 Filesystem Size  Used Avail Use% Mounted on
 /dev/mapper/alpha-lv1  3.6T  2.7T  801G  78% /mnt/tmp

 # btrfs fi df /mnt/tmp/
 Data: total=3.00TB, used=2.66TB
 System: total=4.00MB, used=364.00KB
 Metadata, DUP: total=11.00GB, used=5.72GB
 Metadata: total=63.00GB, used=0.00

 There seems to be plenty of free space, but the balance seems to have stalled
 and the dmesg is being filled with messages like this:

 [ 2926.465406] btrfs: block rsv returned -28
 [ 2926.465411] [ cut here ]
 [ 2926.465446] WARNING: at /home/apw/COD/linux/fs/btrfs/extent-tree.c:6323 
 use_block_rsv+0x19f/0x1b0 [btrfs]()
 [ 2926.465450] Hardware name: VirtualBox
 [ 2926.465452] Modules linked in: joydev microcode parport_pc hid_generic 
 parport psmouse serio_raw pcspkr i2c_piix4 mac_hid xfs btrfs libcrc32c 
 zlib_deflate raid456 async_raid6_recov async_memcpy async_pq async_xor xor 
 async_tx raid6_pq usbhid hid e1000
 [ 2926.465517] Pid: 4682, comm: btrfs Tainted: GW
 3.6.0-030600rc7-generic #201209232235

I've just run into the same issue running a balance.  My kernel is a
3.6.1 kernel merged with the latest for-linus branch.  The dmesg log
is full of warnings, and the balance appears stuck.

Looking at the results of 'btrfs fi df ...', it almost seems like
btrfs is unable to allocate any more metadata space, even though there
is some space available.

# btrfs fi df /mnt/sabayon8/
Data: total=7.14GB, used=6.08GB
System: total=4.00MB, used=4.00KB
Metadata: total=1.00GB, used=973.32MB

# df -T /mnt/sabayon8/
Filesystem     Type  1K-blocks    Used Available Use% Mounted on
/dev/sdb5      btrfs  10008460 7373064   2537536  75% /mnt/sabayon8
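
To put a number on how full the metadata allocation is, the 'btrfs fi df' lines can be turned into used/total ratios. This is a rough sketch: it assumes the KB/MB/GB suffixes are 1024-based and that the output has the "Kind: total=..., used=..." shape shown above. `usage` is a hypothetical helper name.

```python
import re

# Assumption: the suffixes printed by this btrfs-progs version are 1024-based.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

LINE_RE = re.compile(
    r"^(?P<kind>[^:]+):\s*total=(?P<total>[\d.]+)(?P<tu>[KMGT]B),\s*"
    r"used=(?P<used>[\d.]+)(?P<uu>[KMGT]B)?"
)


def usage(fi_df_text):
    """Return {kind: used/total} for each line of 'btrfs fi df' output."""
    result = {}
    for line in fi_df_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        total = float(m["total"]) * UNITS[m["tu"]]
        # "used=0.00" can appear without a unit; treat that as plain bytes.
        used = float(m["used"]) * UNITS.get(m["uu"], 1)
        result[m["kind"]] = used / total
    return result


sample = """\
Data: total=7.14GB, used=6.08GB
System: total=4.00MB, used=4.00KB
Metadata: total=1.00GB, used=973.32MB
"""

for kind, fraction in usage(sample).items():
    print(f"{kind:10s} {fraction:6.1%}")
```

For the numbers above this puts the metadata allocation at roughly 95% used, while df still reports free space on the device.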


For reference here's an example of the dmesg warning:

[ 4070.726429] btrfs: block rsv returned -28
[ 4070.726431] [ cut here ]
[ 4070.726455] WARNING: at fs/btrfs/extent-tree.c:6359
btrfs_alloc_free_block+0x4ee/0x500 [btrfs]()
[ 4070.726531] Hardware name:
[ 4070.726533] Modules linked in: nvidia(PO) nvidia_agp xts gf128mul
ablk_helper cryptd sha256_generic btrfs libcrc32c xhci_hcd
[ 4070.726543] Pid: 8717, comm: btrfs Tainted: PW  O 3.6.1-git-local+ #1
[ 4070.726545] Call Trace:
[ 4070.726552]  [c1029952] warn_slowpath_common+0x72/0xa0
[ 4070.726569]  [f85d38de] ? btrfs_alloc_free_block+0x4ee/0x500 [btrfs]
[ 4070.726585]  [f85d38de] ? btrfs_alloc_free_block+0x4ee/0x500 [btrfs]
[ 4070.726590]  [c10299a2] warn_slowpath_null+0x22/0x30
[ 4070.726606]  [f85d38de] btrfs_alloc_free_block+0x4ee/0x500 [btrfs]
[ 4070.726628]  [f8603c0c] ? read_extent_buffer+0x9c/0x100 [btrfs]
[ 4070.726643]  [f85c0394] __btrfs_cow_block+0x144/0x590 [btrfs]
[ 4070.726731]  [f85ddb65] ? verify_parent_transid+0x55/0x1c0 [btrfs]
[ 4070.726746]  [f85c08b9] ? btrfs_cow_block+0xd9/0x230 [btrfs]
[ 4070.726765]  [f85fc799] ? mark_extent_buffer_accessed+0x59/0x70 [btrfs]
[ 4070.726780]  [f85c08b9] btrfs_cow_block+0xd9/0x230 [btrfs]
[ 4070.726801]  [f862815a] do_relocation+0x42a/0x4d0 [btrfs]
[ 4070.726818]  [f85d00fb] ? btrfs_block_rsv_add+0x6b/0x80 [btrfs]
[ 4070.726838]  [f862bb9a] relocate_tree_blocks+0x3fa/0x5a0 [btrfs]
[ 4070.726929]  [f862ca42] relocate_block_group+0x212/0x670 [btrfs]
[ 4070.726950]  [f862d030] btrfs_relocate_block_group+0x190/0x2e0 [btrfs]
[ 4070.726969]  [f8605a57] btrfs_relocate_chunk.isra.54+0x57/0x690 [btrfs]
[ 4070.726989]  [f85fa351] ? btrfs_get_token_64+0x61/0x100 [btrfs]
[ 4070.727008]  [f8602bf6] ? free_extent_buffer+0x26/0x70 [btrfs]
[ 4070.727045]  [f860ad41] btrfs_balance+0x9b1/0xf40 [btrfs]
[ 4070.727051]  [c12f14ae] ? cred_has_capability+0x7e/0xf0
[ 4070.727071]  [f861254b] btrfs_ioctl_balance+0xcb/0x330 [btrfs]
[ 4070.727163]  [f8614777] btrfs_ioctl+0x907/0x1840 [btrfs]
[ 4070.727168]  [c10bcd22] ? lru_cache_add_lru+0x22/0x40
[ 4070.727172]  [c10cf00f] ? handle_pte_fault+0x42f/0x5c0
[ 4070.727192]  [f8613e70] ? update_ioctl_balance_args+0x240/0x240 [btrfs]
[ 4070.727197]  [c10f5cf2] do_vfs_ioctl+0x82/0x570
[ 4070.727202]  [c12f1dba] ? inode_has_perm.isra.42.constprop.68+0x3a/0x50
[ 4070.727206]  [c12f43c6] ? selinux_file_ioctl+0x46/0xe0
[ 4070.727209]  [c10f624f] sys_ioctl+0x6f/0x80
[ 4070.727214]  [c176e693] sysenter_do_call+0x12/0x22
[ 4070.727217] ---[ end trace b39bb21a5ae11cb1 ]---


Two Issues with Btrfs Delayed Cleaner Process (linux-next)

2012-10-08 Thread Mitch Harder
I've run across two issues with the delayed cleaner process running a
kernel based on the 3.6.0 btrfs-next branch in Josef's git repository.

(1)  I'm getting an error when trying to list my subvolumes whenever
the cleaner thread is running:

# btrfs su li /mnt/benchmark/
ERROR: Failed to lookup path for root 0 - No such file or directory

As long as the cleaner thread is idle, I can run this command without error.

(2)  I ran into an issue on a slower x86 machine (AMD Athlon XP 2600+)
where the cleaner thread literally required an hour to finish deleting
a subvolume that contained the sources for a kernel I had previously
built.

The machine was responsive the whole time, and the cleaner thread
never required much more than 5-10% of the CPU, leaving ample idle
time.

Interestingly, every attempt to replicate this behaviour resulted in
the cleaner thread finishing in a few seconds.

My first issue replicates every time the cleaner thread is running.

I'll need to work on the second issue for a while to see if I can get
it to replicate.


Re: [PATCH 2/2 v3] Btrfs: snapshot-aware defrag

2012-10-04 Thread Mitch Harder
On Thu, Oct 4, 2012 at 9:22 AM, Liu Bo bo.li@oracle.com wrote:
 On 10/03/2012 10:02 PM, Chris Mason wrote:
 On Tue, Sep 25, 2012 at 07:07:53PM -0600, Liu Bo wrote:
 On 09/26/2012 01:39 AM, Mitch Harder wrote:
 On Mon, Sep 17, 2012 at 4:58 AM, Liu Bo bo.li@oracle.com wrote:
 This comes from one of btrfs's project ideas,
 As we defragment files, we break any sharing from other snapshots.
 The balancing code will preserve the sharing, and defrag needs to grow 
 this
 as well.

 Now we're able to fill the blank with this patch, in which we make full 
 use of
 backref walking stuff.

 Here is the basic idea,
 o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
 o  at endio, after we finish updating fs tree, we use backref walking to 
 find
all parents of the ranges and re-link them with the new COWed file 
 layout by
adding corresponding backrefs.

 Originally patch by Li Zefan l...@cn.fujitsu.com
 Signed-off-by: Liu Bo bo.li@oracle.com

 I'm hitting the WARN_ON in record_extent_backrefs() indicating a
 problem with the return value from iterate_inodes_from_logical().

 Me too.  It triggers reliably with mount -o autodefrag, and then crashes
 a in the next function ;)

 -chris


 Good news, I'm starting hitting the crash (a NULL pointer crash) ;)

 thanks,
 liubo

I'm also starting to hit this crash while balancing a test partition.

I guess this isn't surprising since both autodefrag and balancing make
use of relocation.


Re: tree root

2012-10-03 Thread Mitch Harder
On Wed, Oct 3, 2012 at 11:35 AM, Øystein Sættem Middelthun
oyst...@middelthun.no wrote:
 Hi!

 I have a broken btrfs unable to mount because it is unable to find the tree
 root. Using find-root I find the following:

 Well block 14102764707840 seems great, but generation doesn't match,
 have=109268, want=109269

 Because the filesystem was last in use with a pre 3.2-kernel I am unable to
 use mount -o recovery, but restore seems to work when I specify the previous
 tree-root. My problem is however that the btrfs is so large I have nowhere
 to temporarily put all the files. I am currently running kernel 3.5. Does
 mount have an option to manually tell it to use the tree root at block
 14102764707840?


If you do not have a suitable backup for these files, please make an
effort to do what you can with restore.  Some of the repair methods
out there have a possibility to make the situation worse.


Re: tree root

2012-10-03 Thread Mitch Harder
On Wed, Oct 3, 2012 at 5:11 PM, Øystein Sættem Middelthun
oyst...@middelthun.no wrote:
 On 10/03/2012 07:29 PM, Mitch Harder wrote:

 If you do not have a suitable backup for these files, please make an
 effort to do what you can with restore.  Some of the repair methods
 out there have a possibility to make the situation worse.


 We are talking about something like 50TB, so there is just no way I have the
 available space on other disks for temporary storage.

 So in effect you are saying that there are no other available options than a
 restore? If I understand correctly a feature along the lines of mount -o
 tree_root=14102764707840 /dev/ /path/ would solve my problem.

 The fs is unmountable because of a temporary loss of connection with an
 underlying disk controller, and I don't think the device has a lot of errors
 besides not being able to find the latest tree root.


You should probably try to supply some more information about your situation.

Was this btrfs volume built with RAID-1?

If so, we should be able to mount in degraded mode.

Even so, when I see the words "unable to find the tree root" and
"temporary loss of connection with an underlying disk controller"
along with the implication that you have no reliable backup of this
data, I worry that your situation is potentially precarious.

The possibility exists that recovering your data is your best option
(as opposed to restoring to previous working condition).

Using backup tree-roots and super-blocks has the potential to do
irreversible damage.


[PATCH] Btrfs: Remove orphaned comment.

2012-10-02 Thread Mitch Harder
Remove a comment that was orphaned by a previous commit which
removed the function associated with the comment.

See commit efd049fb26a162c3830fd3cb1001fdc09b147f3b

This left the comment in a confusing context that seemed to be
associated with another function.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/inode.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2c785c0..93e1351 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2441,12 +2441,6 @@ out_kfree:
 	return NULL;
 }
 
-/*
- * helper function for btrfs_finish_ordered_io, this
- * just reads in some of the csum leaves to prime them into ram
- * before we start the transaction.  It limits the amount of btree
- * reads required while inside the transaction.
- */
 /* as ordered data IO finishes, this gets called so we can finish
  * an ordered extent if the range of bytes in the file it covers are
  * fully written.
-- 
1.7.8.6



Re: [PATCH 2/2 v3] Btrfs: snapshot-aware defrag

2012-09-25 Thread Mitch Harder
On Mon, Sep 17, 2012 at 4:58 AM, Liu Bo bo.li@oracle.com wrote:
 This comes from one of btrfs's project ideas,
 As we defragment files, we break any sharing from other snapshots.
 The balancing code will preserve the sharing, and defrag needs to grow this
 as well.

 Now we're able to fill the blank with this patch, in which we make full use of
 backref walking stuff.

 Here is the basic idea,
 o  set the writeback ranges started by defragment with flag EXTENT_DEFRAG
 o  at endio, after we finish updating fs tree, we use backref walking to find
all parents of the ranges and re-link them with the new COWed file layout 
 by
adding corresponding backrefs.

 Originally patch by Li Zefan l...@cn.fujitsu.com
 Signed-off-by: Liu Bo bo.li@oracle.com

I'm hitting the WARN_ON in record_extent_backrefs() indicating a
problem with the return value from iterate_inodes_from_logical().

[ 6865.184782] [ cut here ]
[ 6865.184819] WARNING: at fs/btrfs/inode.c:2062
record_extent_backrefs+0xe5/0xe7 [btrfs]()
[ 6865.184823] Hardware name: OptiPlex 745
[ 6865.184825] Modules linked in: lpc_ich mfd_core xts gf128mul cryptd
aes_x86_64 sha256_generic btrfs libcrc32c
[ 6865.184841] Pid: 4239, comm: btrfs-endio-wri Not tainted 3.5.4-git-local+ #1
[ 6865.184844] Call Trace:
[ 6865.184856]  [81031d6a] warn_slowpath_common+0x74/0xa2
[ 6865.184862]  [81031db2] warn_slowpath_null+0x1a/0x1c
[ 6865.184884]  [a003356b] record_extent_backrefs+0xe5/0xe7 [btrfs]
[ 6865.184908]  [a003cf3a] btrfs_finish_ordered_io+0x131/0xa4b [btrfs]
[ 6865.184930]  [a003d869] finish_ordered_fn+0x15/0x17 [btrfs]
[ 6865.184951]  [a005882f] worker_loop+0x145/0x516 [btrfs]
[ 6865.184959]  [81059727] ? __wake_up_common+0x54/0x84
[ 6865.184983]  [a00586ea] ? btrfs_queue_worker+0x2d3/0x2d3 [btrfs]
[ 6865.184989]  [810516bb] kthread+0x93/0x98
[ 6865.184996]  [817d7934] kernel_thread_helper+0x4/0x10
[ 6865.185001]  [81051628] ? kthread_freezable_should_stop+0x6a/0x6a
[ 6865.185021]  [817d7930] ? gs_change+0xb/0xb
[ 6865.185025] ---[ end trace 26cc0e186efc79d8 ]---


I'm testing a 3.5.4 kernel merged with 3.6_rc patchset as well as the
send_recv patches and most of the btrfs-next patches.

I'm running into this issue when mounting with autodefrag, and running
some snapshot tests.

This may be related to a problem elsewhere, because I've been
encountering other backref issues even before testing this patch.


Re: ENOSPC design issues

2012-09-24 Thread Mitch Harder
On Thu, Sep 20, 2012 at 2:03 PM, Josef Bacik jba...@fusionio.com wrote:
 Hello,

 I'm going to look at fixing some of the performance issues that crop up 
 because
 of our reservation system.  Before I go and do a whole lot of work I want some
 feedback.

When I was trying to figure out the problem with gzip ENOSPC issues, I
spent some time debugging and following the flow through the
reserve_metadata_bytes() function in extent-tree.c.

My observation was that the accounting around
space_info->bytes_may_use did not appear to be tightly closed.  The
space_info->bytes_may_use value would grow large (often 3 or 4 times
greater than space_info->total), and the flow through
reserve_metadata_bytes() would stay in overcommit.

I was unsuccessful in figuring out how to rework or close the loop on
the accounting for space_info->bytes_may_use.

I noticed that btrfs seemed to work OK even though the value in
space_info->bytes_may_use appeared inexplicably large, and btrfs was
always in overcommit.

So, since you're asking for possibly 'crazy ideas', I suggest
considering finding a way to ignore space_info->bytes_may_use in
reserve_metadata_bytes().  Either make the overcommit the default
(which I found to approximate my real-life case anyhow), or have a
simple mechanism for quick fail-over to overcommit.

I doubt this will be any kind of comprehensive fix for ENOSPC issues,
but simplifying reserve_metadata_bytes() may make it easier to find
the other issues.
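
As a purely illustrative toy model (not the kernel's reserve_metadata_bytes() logic), the sketch below shows how a reservation counter can drift well above the total when releases do not fully match reservations, and how that interacts with an overcommit switch. The class, its fields, and the per-operation "leak" are all hypothetical; no such leak was actually identified.

```python
class SpaceInfo:
    """Toy stand-in for a metadata space_info; fields are illustrative only."""

    def __init__(self, total):
        self.total = total          # units allocated to this space
        self.bytes_used = 0         # units actually written
        self.bytes_may_use = 0      # outstanding reservations

    def reserve(self, nbytes, overcommit=False):
        """Reserve nbytes; return False (think -ENOSPC) if it does not fit."""
        fits = self.bytes_used + self.bytes_may_use + nbytes <= self.total
        if not fits and not overcommit:
            return False
        self.bytes_may_use += nbytes
        return True

    def release(self, nbytes):
        """Give back a reservation that was not consumed."""
        self.bytes_may_use -= nbytes


def simulate(leak_per_op, ops, overcommit):
    """Reserve 16 units per op but release only (16 - leak_per_op)."""
    info = SpaceInfo(total=1000)
    for _ in range(ops):
        if not info.reserve(16, overcommit=overcommit):
            return info, False
        info.release(16 - leak_per_op)
    return info, True


for overcommit in (False, True):
    info, ok = simulate(leak_per_op=4, ops=1000, overcommit=overcommit)
    print(f"overcommit={overcommit}: ok={ok}, "
          f"bytes_may_use={info.bytes_may_use}, total={info.total}")
```

In this model the strict path eventually refuses reservations even though nothing was ever written, while the overcommit path keeps going with bytes_may_use at several times the total.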


Re: [PATCH V3 2/2] Btrfs: fix the snapshot that should not exist

2012-09-18 Thread Mitch Harder
On Thu, Aug 2, 2012 at 6:46 AM, David Sterba d...@jikos.cz wrote:
...

 Fsck spits lots of errors:

 ref mismatch on [1133031424 4096] extent item 1, found 0
 Backref 1133031424 root 5 not referenced back 0x7d1f40
 Incorrect global backref count on 1133031424 found 1 wanted 0
 backpointer mismatch on [1133031424 4096]
 owner ref check failed [1133031424 4096]

 ref mismatch on [11213131776 16384] extent item 1, found 0
 Incorrect local backref count on 11213131776 root 5 owner 34509 offset 0 
 found 0 wanted 1 back 0x1424d8e0
 backpointer mismatch on [11213131776 16384]
 owner ref check failed [11213131776 16384]

 fs tree 260 refs 6 not found
 unresolved ref root 263 dir 256 index 4 namelen 14 name 
 snap2748615355 error 600
 unresolved ref root 267 dir 256 index 4 namelen 14 name 
 snap2748615355 error 600
 unresolved ref root 269 dir 256 index 4 namelen 14 name 
 snap2748615355 error 600
 unresolved ref root 273 dir 256 index 4 namelen 14 name 
 snap2748615355 error 600
 unresolved ref root 274 dir 256 index 4 namelen 14 name 
 snap2748615355 error 600
 unresolved ref root 276 dir 256 index 4 namelen 14 name 
 snap2748615355 error 600


 I've asked Josef to pull those patches out of btrfs-next, feel free to send 
 me any testing
 version if you can't reproduce it on your side.


I've run into similar errors after an unclean shutdown on a partition
where I make use of several subvolumes.

Some of the data in the subvolume is inaccessible, although the
original root volume seems OK.

So far, the partition is resisting my efforts to fix the errors.

This unclean shutdown occurred while using a 3.5.3 kernel merged with
the for-linus branch, so it did not contain any of Miao Xie's recent
patches to address this issue.

I've made an image of the corrupted volume if anybody has something
they'd like me to test.  But I'm primarily reporting this to let you
know I'm seeing errors similar to the ones thrown off by your test
case.

I'm going to look into merging the patches from Josef's btrfs-next to
see if the problem recurs.


Re: [PATCH] Btrfs: wait on async pages when shrinking delalloc

2012-09-06 Thread Mitch Harder
On Thu, Sep 6, 2012 at 3:51 PM, Josef Bacik jba...@fusionio.com wrote:
 Mitch reported a problem where you could get an ENOSPC error when untarring
 a kernel git tree onto a 16gb file system with compress-force=zlib.  This is
 because compression is a huge pain, it will return from ->writepages()
 without having actually created any ordered extents.  To get around this we
 check to see if the async submit counter is up, and if it is wait until it
 drops to 0 before doing our normal ordered wait dance.  With this patch I
 can now untar a kernel git tree onto a 16gb file system without getting
 ENOSPC errors.  Thanks,

 Signed-off-by: Josef Bacik jba...@fusionio.com

Thanks, this patch fixes the issues I was seeing with ENOSPC on zlib
compression.

I also did some rough testing for any performance regressions on lzo
and with no compression, and my benchmarks were all in the same range.
 I don't have any comparisons available for zlib since my benchmark
tests would always trigger ENOSPC errors.

I also checked Zach Brown's suggestion of dropping:

 +   if (atomic_read(&root->fs_info->async_delalloc_pages))

and just leaving:

 +   wait_event(root->fs_info->async_submit_wait,
 + !atomic_read(&root->fs_info->async_delalloc_pages));

This is because the wait_event macro should perform the same test
(although it will start a 'do' loop before making the same test).

This change also worked in my tests.

For reference, I pulled up the wait_event macro according to the Linux
Cross Reference site:

http://lxr.free-electrons.com/source/include/linux/wait.h#L205

/**
 * wait_event - sleep until a condition gets true
 * @wq: the waitqueue to wait on
 * @condition: a C expression for the event to wait for
 *
 * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
 * @condition evaluates to true. The @condition is checked each time
 * the waitqueue @wq is woken up.
 *
 * wake_up() has to be called after changing any variable that could
 * change the result of the wait condition.
 */
#define wait_event(wq, condition)                                       \
do {                                                                    \
        if (condition)                                                  \
                break;                                                  \
        __wait_event(wq, condition);                                    \
} while (0)
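
The same "test the condition before sleeping" pattern exists in userspace. As a loose analogy only (this is Python's threading.Condition, not the kernel wait queue), the sketch below shows why an outer guard around the wait is redundant when the wait primitive already checks the condition first. `AsyncPages` and its methods are made-up names.

```python
import threading


class AsyncPages:
    """Userspace stand-in for an async-pages counter plus its wait queue."""

    def __init__(self, pending=0):
        self._cond = threading.Condition()
        self._pending = pending

    def finish_one(self):
        with self._cond:
            self._pending -= 1
            self._cond.notify_all()   # plays the role of wake_up()

    def wait_until_drained(self, timeout=None):
        # wait_for() evaluates the predicate before sleeping, so an outer
        # "if self._pending:" guard would only repeat the same test.
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout)


# Counter already at zero: returns without sleeping.
idle = AsyncPages(pending=0)
print(idle.wait_until_drained(timeout=0))

# Counter above zero: sleeps until the workers have drained it.
busy = AsyncPages(pending=2)
workers = [threading.Thread(target=busy.finish_one) for _ in range(2)]
for w in workers:
    w.start()
print(busy.wait_until_drained(timeout=5))
for w in workers:
    w.join()
```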

Tested-by: Mitch Harder mitch.har...@sabayonlinux.org


Varying Leafsize and Nodesize in Btrfs

2012-08-30 Thread Mitch Harder
I've been trying out different leafsize/nodesize settings by
benchmarking some typical operations.

These changes had more impact than I expected.  Using a
leafsize/nodesize of either 8192 or 16384 provided a noticeable
improvement in my limited testing.

These results are similar to some that Chris Mason has already
reported:  https://oss.oracle.com/~mason/blocksizes/

I noticed that metadata allocation was more efficient with bigger
block sizes.  My data was git kernel sources, which will utilize
btrfs' inlining.  This may have tilted the scales.

Read operations seemed to benefit the most.  Write operations seemed
to get punished when the leafsize/nodesize was increased to 64K.

Are there any known downsides to using a leafsize/nodesize bigger than
the default 4096?


Time (seconds) to finish 7 simultaneous copy operations on a set of
Linux kernel git sources.

Leafsize/
Nodesize    Time (Std Dev%)
4096        124.7 (1.25%)
8192        115.2 (0.69%)
16384       114.8 (0.53%)
65536       130.5 (0.3%)


Time (seconds) to finish 'git status' on a set of Linux kernel git sources.

Leafsize/
Nodesize    Time (Std Dev%)
4096        13.2 (0.86%)
8192        11.2 (1.36%)
16384        9.0 (0.92%)
65536        8.5 (1.3%)


Time (seconds) to perform a git checkout of a different branch on a
set of Linux kernel sources.

Leafsize/
Nodesize    Time (Std Dev%)
4096        19.4 (1.1%)
8192        16.94 (3.1%)
16384       14.4 (0.6%)
65536       16.3 (0.8%)


Time (seconds) to perform 7 simultaneous rsync threads on the Linux
kernel git sources directories.

Leafsize/
Nodesize    Time (Std Dev%)
4096        410.3 (4.5%)
8192        289.8 (0.96%)
16384       250.7 (3.8%)
65536       227.0 (1.2%)


Used Metadata (MB) as reported by 'btrfs fi df'

Leafsize/
Nodesize    Size (Std Dev%)
4096        484 MB (0.13%)
8192        443 MB (0.2%)
16384       424 MB (0.2%)
65536       411 MB (0.2%)
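
For a quick comparison against the default, the tables above can be reduced to a percentage change relative to the 4096 row. The sketch below only restates the posted means; the dictionary keys and `reduction_vs_default` are illustrative names, and the standard deviations are not taken into account.

```python
# Mean results copied from the tables above, keyed by leafsize/nodesize.
BENCH = {
    "7x copy (s)":        {4096: 124.7, 8192: 115.2, 16384: 114.8, 65536: 130.5},
    "git status (s)":     {4096: 13.2, 8192: 11.2, 16384: 9.0, 65536: 8.5},
    "git checkout (s)":   {4096: 19.4, 8192: 16.94, 16384: 14.4, 65536: 16.3},
    "7x rsync (s)":       {4096: 410.3, 8192: 289.8, 16384: 250.7, 65536: 227.0},
    "used metadata (MB)": {4096: 484, 8192: 443, 16384: 424, 65536: 411},
}


def reduction_vs_default(results, baseline=4096):
    """Percent reduction versus the baseline size (negative means worse)."""
    base = results[baseline]
    return {size: 100.0 * (base - value) / base
            for size, value in results.items()}


for name, results in BENCH.items():
    changes = reduction_vs_default(results)
    row = "  ".join(f"{size}: {pct:+6.1f}%" for size, pct in changes.items())
    print(f"{name:20s} {row}")
```

A positive value means less time (or less metadata) than with 4096; the 64K copy case comes out negative, matching the write penalty noted above.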


Re: Btrfs Intermittent ENOSPC Issues

2012-08-27 Thread Mitch Harder
On Tue, Jul 31, 2012 at 2:37 PM, Mitch Harder
mitch.har...@sabayonlinux.org wrote:
 I've been working on running down intermittent ENOSPC issues.

 I can only seem to replicate ENOSPC errors when running zlib
 compression.  However, I have been seeing similar ENOSPC errors to a
 lesser extent when playing with the LZ4HC patches.


I've been spending most of my efforts on the specific areas of code
that are generating the ENOSPC error.  But I've been developing the
perception that the real problem is elsewhere.

I probably should have looked at this a while ago, but if I generate
an Alt-SysRq-W delayed tasks traceback during the intermittent periods
when ENOSPC errors are occurring, I'm seeing delays in other areas.

It may be that the ENOSPC errors are occurring due to a page lock that
is not clearing in another thread.

[12339.617366] SysRq : HELP : loglevel(0-9) reBoot Crash
terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I)
thaw-filesystems(J) saK show-backtrace-all-active-cpus(L)
show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P)
show-all-timers(Q) unRaw Sync show-task-states(T) Unmount
show-blocked-tasks(W) dump-ftrace-buffer(Z)
[12339.650620] SysRq : Show Blocked State
[12339.650624]   taskPC stack   pid father
[12339.650678] flush-btrfs-6   D 810c03bb 0  7162  2 0x
[12339.650681]  880126a83990 0046 880126a82000
8801266fad40
[12339.650684]  00012280 880126a83fd8 00012280
4000
[12339.650687]  880126a83fd8 00012280 880129af16a0
8801266fad40
[12339.650690] Call Trace:
[12339.650698]  [8106c6d0] ? ktime_get_ts+0xae/0xbb
[12339.650701]  [8106c6d0] ? ktime_get_ts+0xae/0xbb
[12339.650705]  [810c03bb] ? __lock_page+0x6d/0x6d
[12339.650708]  [8162da84] schedule+0x64/0x66
[12339.650710]  [8162db12] io_schedule+0x8c/0xcf
[12339.650713]  [810c03c9] sleep_on_page+0xe/0x12
[12339.650715]  [8162c159] __wait_on_bit_lock+0x46/0x8f
[12339.650717]  [810c0117] ? find_get_pages_tag+0xf8/0x134
[12339.650720]  [810c03b4] __lock_page+0x66/0x6d
[12339.650723]  [8104b7ff] ? autoremove_wake_function+0x39/0x39
[12339.650753]  [a0065f28]
extent_write_cache_pages.clone.16.clone.29+0x143/0x30c [btrfs]
[12339.650770]  [a0066303] extent_writepages+0x48/0x5d [btrfs]
[12339.650784]  [a0053019] ?
uncompress_inline.clone.33+0x15f/0x15f [btrfs]
[12339.650788]  [8105c8f4] ? update_curr+0x81/0x123
[12339.650802]  [a00528ac] btrfs_writepages+0x27/0x29 [btrfs]
[12339.650805]  [810c9975] do_writepages+0x20/0x29
[12339.650808]  [8112ec67]
__writeback_single_inode.clone.22+0x48/0x11c
[12339.650811]  [8112f1cf] writeback_sb_inodes+0x1f0/0x332
[12339.650813]  [810c870e] ? global_dirtyable_memory+0x1a/0x3b
[12339.650816]  [8112f389] __writeback_inodes_wb+0x78/0xb9
[12339.650818]  [8112f510] wb_writeback+0x146/0x23e
[12339.650820]  [810c891b] ? global_dirty_limits+0x2f/0x10f
[12339.650822]  [8112fdef] wb_do_writeback+0x195/0x1b0
[12339.650825]  [8112fe98] bdi_writeback_thread+0x8e/0x1f1
[12339.650827]  [8112fe0a] ? wb_do_writeback+0x1b0/0x1b0
[12339.650829]  [8112fe0a] ? wb_do_writeback+0x1b0/0x1b0
[12339.650832]  [8104b2ef] kthread+0x89/0x91
[12339.650835]  [816303f4] kernel_thread_helper+0x4/0x10
[12339.650837]  [8104b266] ? kthread_freezable_should_stop+0x57/0x57
[12339.650839]  [816303f0] ? gs_change+0xb/0xb
[12339.650842] tar D 88012683f8b8 0  7173   7152 0x
[12339.650845]  880126c0f9e8 0086 880126c0e000
8801267496a0
[12339.650848]  00012280 880126c0ffd8 00012280
4000
[12339.650851]  880126c0ffd8 00012280 880129af16a0
8801267496a0
[12339.650854] Call Trace:
[12339.650866]  [a0037b35] ?
block_rsv_release_bytes+0xc7/0x127 [btrfs]
[12339.650869]  [8103c073] ? lock_timer_base.clone.26+0x2b/0x50
[12339.650871]  [8162da84] schedule+0x64/0x66
[12339.650873]  [8162c075] schedule_timeout+0x22c/0x26a
[12339.650876]  [8103c038] ? run_timer_softirq+0x2d4/0x2d4
[12339.650878]  [8162c0f1] schedule_timeout_killable+0x1e/0x20
[12339.650890]  [a003dd0c]
reserve_metadata_bytes.clone.57+0x4ba/0x5e7 [btrfs]
[12339.650906]  [a0066b52] ? free_extent_buffer+0x68/0x6c [btrfs]
[12339.650918]  [a003e1a9] btrfs_block_rsv_add+0x2b/0x4d [btrfs]
[12339.650932]  [a004ff40] start_transaction+0x131/0x310 [btrfs]
[12339.650946]  [a0050386] btrfs_start_transaction+0x13/0x15 [btrfs]
[12339.650961]  [a005b10a] btrfs_create+0x3a/0x1e0 [btrfs]
[12339.650964]  [81120861] ? d_splice_alias+0xcc/0xd8
[12339.650966]  [811173aa] vfs_create+0x9c/0xf5
[12339.650968]  [81119786
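
When a SysRq-W dump is long, it can help to first list which tasks are blocked before reading the individual traces. A minimal sketch, assuming the task header lines have the "name D ... pid father ..." shape seen above; `blocked_tasks` is a hypothetical helper and the regular expression is tuned only to this dump.

```python
import re

# Task header lines look like:
# "[12339.650678] flush-btrfs-6   D 810c03bb 0  7162  2 0x"
TASK_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<task>\S+)\s+D\s+[0-9a-f]+\s+\d+\s+"
    r"(?P<pid>\d+)\s+(?P<father>\d+)",
    re.MULTILINE,
)


def blocked_tasks(dump_text):
    """Return (task, pid, parent pid) for each blocked (state D) task."""
    return [(m["task"], int(m["pid"]), int(m["father"]))
            for m in TASK_RE.finditer(dump_text)]


# Header and neighbouring lines copied from the dump above.
sample = """\
[12339.650620] SysRq : Show Blocked State
[12339.650678] flush-btrfs-6   D 810c03bb 0  7162  2 0x
[12339.650690] Call Trace:
[12339.650708]  [8162da84] schedule+0x64/0x66
[12339.650842] tar D 88012683f8b8 0  7173   7152 0x
[12339.650854] Call Trace:
"""

for task, pid, father in blocked_tasks(sample):
    print(f"{task} (pid {pid}, parent {father})")
```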

Re: cross-subvolume cp --reflink

2012-08-18 Thread Mitch Harder
On Fri, Aug 17, 2012 at 12:20 AM, Marc MERLIN m...@merlins.org wrote:
 On Thu, Aug 16, 2012 at 09:20:00PM -0700, james northrup wrote:
 dunno if this thread is dead, but im inclined to patch in cp --reflink
 to fdupes prog.  It  currently does provide a poor-man's dedupe via
 md5sum and hardlink, or delete.

 all the better if the distro-kernels can backport cross-snapshot
 reflinks sooner than later.

 So, I'd love for cp --reflink to bring back a deleted VM (huge file) from a
 snapshot back to trunk without duplicating it.
 But how would fdupes help? I can't hardlink between two snapshots, can I?

 gandalfthegreat:/mnt/btrfs_pool1# ln 
 usr_weekly_20120812_00\:02\:01/svn-commit.tmp  usr/test
 ln: failed to create hard link `usr/test' = 
 `usr_weekly_20120812_00:02:01/svn-commit.tmp': Invalid cross-device link

 So, is there anything user space can do without kernel support?


A cross-subvolume copy patch has made it into 3.6_rc

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=362a20c5e27614739c4

This patch will allow cp --reflink across subvolumes, as long as the
copy does not cross mount points.
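
A small sketch of driving such a copy from a script, assuming GNU cp with --reflink support and a kernel that carries the cross-subvolume patch. The helper names are made up, and the paths are simply the ones from the quoted example, used for illustration.

```python
import subprocess


def reflink_copy_cmd(src, dst):
    """Build a GNU cp command that demands a reflink instead of a data copy."""
    return ["cp", "--reflink=always", "--", src, dst]


def reflink_copy(src, dst):
    """Run the copy; raises CalledProcessError if cp cannot reflink."""
    subprocess.run(reflink_copy_cmd(src, dst), check=True)


# Illustrative paths only: a file in a snapshot, copied back into "usr".
cmd = reflink_copy_cmd(
    "/mnt/btrfs_pool1/usr_weekly_20120812_00:02:01/svn-commit.tmp",
    "/mnt/btrfs_pool1/usr/test",
)
print(" ".join(cmd))
```

Using --reflink=always rather than --reflink=auto makes the copy fail outright when the extents cannot be shared, so a silent full copy of a large file is avoided.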


Re: [GIT PULL] Update LZO compression

2012-08-16 Thread Mitch Harder
On Thu, Aug 16, 2012 at 5:17 PM, Andi Kleen a...@firstfloor.org wrote:
 On Thu, Aug 16, 2012 at 11:55:06AM -0700, james northrup wrote:
 looks like ARM results are inconclusive from a lot of folks without
 bandwidth to do a write-up, what about just plain STAGING status for ARM so
 the android tweakers can beat on it for a while?

 Staging only really works for new drivers, not for updating existing
 library functions like this.

 I suppose you could keep both and have the architecture select with a
 CONFIG.


I've been doing some rough benchmarking with the updated LZO in btrfs.

My tests primarily consist of timing some typical copying, git
manipulating, and running rsync using a set of kernel git sources.
Git sources are typically about 50% pack files which won't compress
very well, with the remainder being mostly highly compressible source
files.

Of course, any underlying speed improvement attributable only to LZO
is not shown by tests like this. But I thought it would be interesting
to see the impact in some typical real-world btrfs operations.

I was seeing between 3-9% improvement in speed with the new LZO.

Copying several directories of git sources showed the most
improvement, ~9%.  Typical git operations, such as a git checkout or
git status were only showing 3-5% improvement, which is close to the
noise level of my tests.  Running multiple rsync processes showed a 5%
improvement.

With only 10 trials (5 with each LZO), I can't say I would
statistically hang my hat on these numbers.

Given all the other stuff that is going on in my rough benchmarks, a
3-9% improvement from a single change is probably pretty good.


Re: [PATCH] Btrfs: do not allocate chunks as agressively

2012-08-15 Thread Mitch Harder
On Tue, Aug 14, 2012 at 3:22 PM, Josef Bacik jba...@fusionio.com wrote:
 Swinging this pendulum back the other way.  We've been allocating chunks up
 to 2% of the disk no matter how much we actually have allocated.  So instead
 fix this calculation to only allocate chunks if we have more than 80% of the
 space available allocated.  Please test this as it will likely cause all
 sorts of ENOSPC problems to pop up suddenly.  Thanks,

 Signed-off-by: Josef Bacik jba...@fusionio.com

I've been testing this patch with my multiple rsync test (On a 3.5.1
kernel merged with for-linus).

I tested without compression, and with lzo compression, and I haven't
run into any ENOSPC issues.  I still have ENOSPC issues with zlib,
with or without this patch.

I made a series of runs with and without this patch (on an
uncompressed, newly formatted partition), and some of the results were
not what I anticipated.

1) I found that *MORE* metadata space was being allocated with this
patch than when using an unpatched baseline kernel.  The total
allocated space was exactly the same in each run (I saw a slight
variation in the amount of used Metadata).

On the unpatched baseline kernel, at the end of the run, the 'btrfs fi
df' command would show:

# btrfs fi df /mnt/benchmark/
Data: total=10.01GB, used=6.99GB
System: total=4.00MB, used=4.00KB
Metadata: total=776.00MB, used=481.38MB

With this patch applied, the 'btrfs fi df' command would show:

# btrfs fi df /mnt/benchmark/
Data: total=10.01GB, used=6.99GB
System: total=4.00MB, used=4.00KB
Metadata: total=1.01GB, used=480.94MB


2)  The multiple rsyncs would run significantly faster with the patched kernel.

Unpatched baseline kernel:  Time to run 7 rsyncs:  348.3 sec (+/- 9.7 sec)
Patched kernel:             Time to run 7 rsyncs:  316.6 sec (+/- 6.5 sec)

Perhaps the extra allocated metadata space made things run better, or
perhaps something else was going on.
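
For reference, the two timings above can be compared with a few lines of
arithmetic.  The sketch below is not part of the original test scripts; the
struct and function names are made up for illustration, and it assumes the
+/- figures describe a symmetric spread around each mean.  Under that
assumption the patched kernel comes out roughly 9% faster, and the two
ranges do not overlap.

```c
#include <assert.h>
#include <stdbool.h>

/* One benchmark result: mean wall-clock time and its quoted spread. */
struct timing {
    double mean_sec;
    double spread_sec;   /* the "+/-" value; assumed symmetric */
};

/* Fractional speedup of `patched` over `baseline` (0.10 means 10% faster). */
double relative_speedup(struct timing baseline, struct timing patched)
{
    assert(baseline.mean_sec > 0.0);
    return (baseline.mean_sec - patched.mean_sec) / baseline.mean_sec;
}

/* True when the [mean - spread, mean + spread] ranges of a and b overlap. */
bool intervals_overlap(struct timing a, struct timing b)
{
    double a_lo = a.mean_sec - a.spread_sec, a_hi = a.mean_sec + a.spread_sec;
    double b_lo = b.mean_sec - b.spread_sec, b_hi = b.mean_sec + b.spread_sec;

    return a_lo <= b_hi && b_lo <= a_hi;
}
```

Calling relative_speedup() with {348.3, 9.7} as the baseline and
{316.6, 6.5} as the patched run gives about 0.091, and intervals_overlap()
returns false for the same pair.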


Re: [PATCH] Btrfs: fix race in run_clustered refs

2012-08-08 Thread Mitch Harder
On Wed, Aug 8, 2012 at 3:37 PM, Josef Bacik jba...@fusionio.com wrote:
 On Wed, Aug 08, 2012 at 01:49:06PM -0600, Arne Jansen wrote:
 run_clustered_refs runs all delayed refs for one head one by one. During
 the runs, the delayed_refs->lock is released. In this window, the ref_mod
 from the head does not match the sum of all refs below the head. When
 btrfs_lookup_extent_info is run in this window, it gives inconsistent
 results.
 The qgroups patch added code to put delayed refs back, thus opening this
 window very wide.
 This patch assures that head->ref_mod always matches the queued refs, but
 a window still remains where on-disk refs + delayed_refs miss the ref
 currently being run.

 Signed-off-by: Arne Jansen sensi...@gmx.net
 ---
  fs/btrfs/extent-tree.c |   17 +
  1 files changed, 17 insertions(+), 0 deletions(-)

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index e66dc9a..60d175a 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -2318,6 +2318,23 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
   ref->in_tree = 0;
   rb_erase(&ref->rb_node, &delayed_refs->root);
   delayed_refs->num_entries--;
 + if (locked_ref) {
 + /*
 +  * when we play the delayed ref, also correct the
 +  * ref_mod on head
 +  */
 + switch (ref->action) {
 + case BTRFS_ADD_DELAYED_REF:
 + case BTRFS_ADD_DELAYED_EXTENT:
 + locked_ref->node.ref_mod -= ref->ref_mod;
 + break;
 + case BTRFS_DROP_DELAYED_REF:
 + locked_ref->node.ref_mod += ref->ref_mod;
 + break;
 + default:
 + WARN_ON(1);
 + }
 + }
   spin_unlock(&delayed_refs->lock);

   ret = run_one_delayed_ref(trans, root, ref, extent_op,

 btrfs_lookup_extent_info takes the mutex on the head before it looks at its
 ref_mod, so it should always be consistent.  Maybe somebody else is messing
 with refs and not doing the same thing?  If that's the case we should fix them
 by doing the same thing, this isn't a fix.  Thanks,

 Josef

I understand from discussion on IRC that there may be updates to this
patch.  But, FWIW, this patch addresses the multiple rsync problem I
was seeing.
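
The bookkeeping in the quoted diff can be modelled in userspace.  This is
only a sketch under simplifying assumptions: the types and names (ref_head,
delayed_ref, dequeue_ref) are hypothetical stand-ins rather than the kernel
structures, and locking is ignored entirely.  It shows the invariant the
patch aims for: each time a ref is taken off the queue, the head's ref_mod
is adjusted so that it still equals the net change of the refs that remain
queued.

```c
#include <assert.h>
#include <stddef.h>

enum ref_action { ADD_REF, DROP_REF };

/* Hypothetical stand-in for one delayed ref. */
struct delayed_ref {
    enum ref_action action;
    int ref_mod;   /* size of the change, positive in this model */
    int queued;    /* 1 while the ref is still waiting to run */
};

/* Hypothetical stand-in for the ref head. */
struct ref_head {
    int ref_mod;   /* net change expected from all queued refs */
};

/* Net reference change implied by the refs that are still queued. */
int queued_sum(const struct delayed_ref *refs, size_t n)
{
    int sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (!refs[i].queued)
            continue;
        sum += (refs[i].action == ADD_REF) ? refs[i].ref_mod
                                           : -refs[i].ref_mod;
    }
    return sum;
}

/* Dequeue one ref and remove its contribution from the head in one step. */
void dequeue_ref(struct ref_head *head, struct delayed_ref *ref)
{
    assert(ref->queued);
    ref->queued = 0;
    if (ref->action == ADD_REF)
        head->ref_mod -= ref->ref_mod;
    else
        head->ref_mod += ref->ref_mod;
}
```

In this model, head.ref_mod == queued_sum(refs, n) holds before and after
every dequeue_ref() call.  It says nothing about the remaining window
mentioned in the patch description, where the ref currently being run is
counted neither on disk nor in the queue.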


Re: [PATCH] Btrfs: fix deadlock in wait_for_more_refs

2012-08-06 Thread Mitch Harder
On Mon, Aug 6, 2012 at 3:18 PM, Arne Jansen sensi...@gmx.net wrote:
 Commit a168650c introduced a waiting mechanism to prevent busy waiting in
 btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations,
 where a tree_mod_seq is held while waiting for the io to complete, while
 the end_io calls btrfs_run_delayed_refs.
 This whole mechanism is unnecessary. If not enough runnable refs are
 available to satisfy count, just return as count is more like a guideline
 than a strict requirement.
 In case we have to run all refs, commit transaction makes sure that no
 other threads are working in the transaction anymore, so we just assert
 here that no refs are blocked.


I've been testing this patch after manually merging on top of Josef's
"Btrfs: barrier before waitqueue_active" V2 patch.

With that arrangement, I've been unable to reproduce the deadlock on my system.

I'll continue banging away on it tomorrow, and let you know if I
attain a deadlock.

Also, let me know if you need me to test without including Josef's
added barriers.


Re: [PATCH] Btrfs: barrier before waitqueue_active

2012-08-01 Thread Mitch Harder
On Wed, Aug 1, 2012 at 3:25 PM, Josef Bacik jba...@fusionio.com wrote:
 We need an smp_mb() before waitqueue_active to avoid missing wakeups.
 Before Mitch was hitting a deadlock between the ordered flushers and the
 transaction commit because the ordered flushers were waiting for more refs
 and were never woken up, so those smp_mb()'s are the most important.
 Everything else I added for correctness sake and to avoid getting bitten by
 this again somewhere else.  Thanks,


This patch seems to make it tougher to hit a deadlock, but I'm still
encountering intermittent deadlocks using this patch when running
multiple rsync threads.

I've also tested Patch 2, and that has me hitting a deadlock even
quicker (when starting several copying threads).

I also found a slight performance hit using this patch.  On a 3.4.6
kernel (merged with the 3.5_rc for-linus branch), I would typically
complete my rsync test in ~265 seconds.  Also, I can't recall hitting
a deadlock on the 3.4.6 kernel (with 3.5_rc for-linus).  When using
this patch, the test would take ~310 seconds (when it didn't hit a
deadlock).

Here are the blocked tasks (Ctrl-SysRq-W) when using JUST this patch:

[ 1568.794030] SysRq : Show Blocked State
[ 1568.794101]   taskPC stack   pid father
[ 1568.794123] btrfs-endio-wri D 88012579c000 0  3845  2 0x
[ 1568.794128]  8801254f3c20 0046 8801254f2000 8801241b5a80
[ 1568.794132]  00012280 8801254f3fd8 00012280 4000
[ 1568.794136]  8801254f3fd8 00012280 880129af16a0 8801241b5a80
[ 1568.794140] Call Trace:
[ 1568.794179]  [a0068785] ? memcpy_extent_buffer+0x159/0x17a [btrfs]
[ 1568.794200]  [a0082ab7] ? find_ref_head+0xa3/0xc6 [btrfs]
[ 1568.794220]  [a008343c] ? btrfs_find_ref_cluster+0xdd/0x117 [btrfs]
[ 1568.794225]  [8162d58c] schedule+0x64/0x66
[ 1568.794241]  [a003fc86] btrfs_run_delayed_refs+0x269/0x3f0 [btrfs]
[ 1568.794246]  [8104b10e] ? wake_up_bit+0x2a/0x2a
[ 1568.794265]  [a004fdc4] __btrfs_end_transaction+0xca/0x283 [btrfs]
[ 1568.794283]  [a004ffda] btrfs_end_transaction+0x15/0x17 [btrfs]
[ 1568.794302]  [a00555da] btrfs_finish_ordered_io+0x2e4/0x334 [btrfs]
[ 1568.794306]  [8103b980] ? run_timer_softirq+0x2d4/0x2d4
[ 1568.794325]  [a005563f] finish_ordered_fn+0x15/0x17 [btrfs]
[ 1568.794344]  [a0070ef8] worker_loop+0x188/0x4e0 [btrfs]
[ 1568.794365]  [a0070d70] ? btrfs_queue_worker+0x275/0x275 [btrfs]
[ 1568.794384]  [a0070d70] ? btrfs_queue_worker+0x275/0x275 [btrfs]
[ 1568.794387]  [8104ac37] kthread+0x89/0x91
[ 1568.794391]  [8162fd74] kernel_thread_helper+0x4/0x10
[ 1568.794395]  [8104abae] ? kthread_freezable_should_stop+0x57/0x57
[ 1568.794398]  [8162fd70] ? gs_change+0xb/0xb
[ 1568.794400] btrfs-transacti D 88009912ba50 0  3851  2 0x
[ 1568.794403]  8801241cfc70 0046 8801241ce000 8801248cda80
[ 1568.794407]  00012280 8801241cffd8 00012280 4000
[ 1568.794411]  8801241cffd8 00012280 8801254b8000 8801248cda80
[ 1568.794415] Call Trace:
[ 1568.794436]  [a0066646] ? extent_writepages+0x53/0x5d [btrfs]
[ 1568.794455]  [a005357b] ? uncompress_inline.clone.33+0x15f/0x15f [btrfs]
[ 1568.794459]  [810c9ada] ? pagevec_lookup_tag+0x24/0x2e
[ 1568.794478]  [a0052e0e] ? btrfs_writepages+0x27/0x29 [btrfs]
[ 1568.794481]  [810c90b1] ? do_writepages+0x20/0x29
[ 1568.794485]  [8162d58c] schedule+0x64/0x66
[ 1568.794505]  [a0061547] btrfs_start_ordered_extent+0xde/0xfa [btrfs]
[ 1568.794508]  [8104b10e] ? wake_up_bit+0x2a/0x2a
[ 1568.794529]  [a0061984] ? btrfs_lookup_first_ordered_extent+0x65/0x99 [btrfs]
[ 1568.794549]  [a0061a6a] btrfs_wait_ordered_range+0xb2/0xda [btrfs]
[ 1568.794569]  [a0061bcc] btrfs_run_ordered_operations+0x13a/0x1c1 [btrfs]
[ 1568.794587]  [a004f5f5] btrfs_commit_transaction+0x287/0x960 [btrfs]
[ 1568.794606]  [a00502b1] ? start_transaction+0x2d5/0x310 [btrfs]
[ 1568.794609]  [8104b10e] ? wake_up_bit+0x2a/0x2a
[ 1568.794627]  [a004913b] transaction_kthread+0x187/0x258 [btrfs]
[ 1568.794644]  [a0048fb4] ? btrfs_alloc_root+0x42/0x42 [btrfs]
[ 1568.794661]  [a0048fb4] ? btrfs_alloc_root+0x42/0x42 [btrfs]
[ 1568.794664]  [8104ac37] kthread+0x89/0x91
[ 1568.794668]  [8162fd74] kernel_thread_helper+0x4/0x10
[ 1568.794671]  [8104abae] ? kthread_freezable_should_stop+0x57/0x57
[ 1568.794674]  [8162fd70] ? gs_change+0xb/0xb
[ 1568.794676] flush-btrfs-1   D 88012579c000 0  3857  2 0x
[ 1568.794680]  880037125670 0046 880037124000 8801254b8000
[ 1568.794684]  00012280 880037125fd8 00012280 4000
[ 

Btrfs Intermittent ENOSPC Issues

2012-07-31 Thread Mitch Harder
I've been working on running down intermittent ENOSPC issues.

I can only seem to replicate ENOSPC errors when running zlib
compression.  However, I have been seeing similar ENOSPC errors to a
lesser extent when playing with the LZ4HC patches.

I apologize for not following up on this sooner, but I had drifted
away from using zlib, and didn't notice there was still an issue.

My test case involves un-archiving linux git sources to a freshly
formatted btrfs partition, mounted with compress-force=zlib.  I am
using a 16 GB partition on a 250 GB Western Digital SATA Hard Disk.
My current kernel is x86_64 linux-3.5.0 merged with Chris' for-linus
branch (for 3.6_rc).  This includes Josef's "Btrfs: flush delayed
inodes if we're short on space" patch.

I haven't isolated a root cause, but here's the feedback I have so far.

(1)  My test case won't generate ENOSPC issues with lzo compression or
no compression.

(2)  I've inserted some trace_printk debugging statements to trace
back the call stack, and the ENOSPC errors only seem to occur on a new
transaction: vfs_create -> btrfs_create -> btrfs_start_transaction ->
start_transaction -> btrfs_block_rsv_add -> reserve_metadata_bytes.

(3)  The ENOSPC condition will usually clear in a few seconds,
allowing writes to proceed.

(4)  I've added a loop to the reserve_metadata_bytes() function to
loop back with 'flush_state = FLUSH_DELALLOC (1)' for 1024 retries.
This reduces and/or eliminates the ENOSPC errors, as if we're waiting
on something else that is trying to complete.

(5)  I've been heavily debugging the reserve_metadata_bytes()
function, and I'm seeing problems with the way
space_info->bytes_may_use is handled.  The space_info->bytes_may_use
value is important in determining if we're in an over-commit state.
But the space_info->bytes_may_use value is often increased arbitrarily
without any mechanism for correcting the value.  Subsequently,
space_info->bytes_may_use quickly increases in size to the point where
we are always in fallback allocation as if we're overcommitted.  In my
trials, it was hard to capture a point where space_info->bytes_may_use
wasn't larger than the available size.

(6)  Even though reserve_metadata_bytes() is almost always in fallback
overcommitted mode, it is still working pretty well, and I've
developed the perception that the problem is something that needs to
finish elsewhere.
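
The retry experiment in (4) and the over-commit check in (5) can be
illustrated with a small userspace model.  Everything here is an assumption
for illustration: the struct, the flush callback, and the function names are
hypothetical and far simpler than the kernel's reserve_metadata_bytes().
The sketch only shows the shape of the idea: if a reservation does not fit,
run a flush step and try again, up to a fixed retry limit, before returning
ENOSPC.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model of the space accounting. */
struct space_info {
    uint64_t total_bytes;
    uint64_t bytes_used;
    uint64_t bytes_may_use;   /* reservations not yet backed by real usage */
};

/* Hypothetical flush step that may release stale reservations. */
typedef void (*flush_fn)(struct space_info *si);

/* Example flush: pretend 1 MiB of stale reservations completes per call. */
void release_one_mib(struct space_info *si)
{
    const uint64_t mib = 1024 * 1024;

    si->bytes_may_use -= (si->bytes_may_use < mib) ? si->bytes_may_use : mib;
}

/* Try to reserve `bytes`; on failure flush and retry up to max_retries times. */
int reserve_with_retries(struct space_info *si, uint64_t bytes,
                         flush_fn flush, int max_retries)
{
    assert(max_retries >= 0);
    for (int attempt = 0; attempt <= max_retries; attempt++) {
        if (si->bytes_used + si->bytes_may_use + bytes <= si->total_bytes) {
            si->bytes_may_use += bytes;
            return 0;
        }
        if (attempt < max_retries)
            flush(si);   /* wait for other work to give space back */
    }
    return -ENOSPC;
}
```

With max_retries set to 0 the model fails immediately when the space is
over-committed; with a larger limit (1024 in the experiment described above)
it succeeds as soon as enough stale reservations have been released.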

Sorry for not having a patch to fix the issue.  I'll try to keep
banging on it as time allows.


[PATCH 0/1] Btrfs: Explicitly include vmalloc.h in send.c

2012-07-28 Thread Mitch Harder
When compiling without SMP and generic x86_64, I encountered the
following errors due to vmalloc.h not being implicitly included:

  CC  fs/btrfs/send.o
fs/btrfs/send.c: In function ‘fs_path_free’:
fs/btrfs/send.c:185:4: error: implicit declaration of function ‘vfree’
fs/btrfs/send.c: In function ‘fs_path_ensure_buf’:
fs/btrfs/send.c:215:4: error: implicit declaration of function ‘vmalloc’
fs/btrfs/send.c:215:12: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:225:12: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:233:13: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c: In function ‘iterate_dir_item’:
fs/btrfs/send.c:900:10: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:909:11: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c: In function ‘btrfs_ioctl_send’:
fs/btrfs/send.c:4462:17: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:4468:17: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:4474:2: error: implicit declaration of function ‘vzalloc’
fs/btrfs/send.c:4474:20: warning: assignment makes pointer from integer without a cast
fs/btrfs/send.c:4482:21: warning: assignment makes pointer from integer without a cast
make[2]: *** [fs/btrfs/send.o] Error 1
make[1]: *** [fs/btrfs] Error 2

If it makes sense, please feel free to include this minor change in with
other send/receive fixes.

Mitch Harder (1):
  Btrfs: Explicitly include vmalloc.h in send.c

 fs/btrfs/send.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

-- 
1.7.8.6



[PATCH 1/1] Btrfs: Explicitly include vmalloc.h in send.c

2012-07-28 Thread Mitch Harder
Certain architectures or platforms or combinations of CONFIG options
require an explicit #include <linux/vmalloc.h>.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/send.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index bf232c8..118e76d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -25,6 +25,7 @@
 #include <linux/posix_acl_xattr.h>
 #include <linux/radix-tree.h>
 #include <linux/crc32c.h>
+#include <linux/vmalloc.h>
 
 #include "send.h"
 #include "backref.h"
-- 
1.7.8.6



[PATCH v4] Btrfs: Check INCOMPAT flags on remount and add helper function

2012-07-24 Thread Mitch Harder
In support of the recently added capability to remount with lzo
compression, provide a helper function to check the compression
INCOMPAT flags when remounting with lzo compression, and set
the flags if necessary.

Also, implement the new helper function when defragmenting with
explicit lzo compression and when setting the default subvolume.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
v1->v2
- Remove extraneous formatting change.
v2->v3
- Consolidate into a single patch
- Convert helper function to a static inline function.
v3->v4
- Per feedback from Li Zefan, change function name from _chk_ to _set_
- Per feedback from David Sterba, make the helper function more generic.
- The more generic function can also be implemented in the INCOMPAT
  check made for setting the default subvolume.

 fs/btrfs/ctree.h |   17 +
 fs/btrfs/ioctl.c |   16 ++--
 fs/btrfs/super.c |1 +
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0ee2f8..5422e54 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3103,6 +3103,23 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, const char *function,
   unsigned int line, int errno);
 
+#define btrfs_set_fs_incompat(__fs_info, opt) \
+   __btrfs_set_fs_incompat((__fs_info), BTRFS_FEATURE_INCOMPAT_##opt)
+
+static inline void __btrfs_set_fs_incompat(struct btrfs_fs_info *fs_info,
+  u64 flag)
+{
+   struct btrfs_super_block *disk_super;
+   u64 features;
+
+   disk_super = fs_info->super_copy;
+   features = btrfs_super_incompat_flags(disk_super);
+   if (!(features & flag)) {
+   features |= flag;
+   btrfs_set_super_incompat_flags(disk_super, features);
+   }
+}
+
 #define btrfs_abort_transaction(trans, root, errno)\
 do {   \
__btrfs_abort_transaction(trans, root, __func__,\
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 17facea..0d5d079 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1042,11 +1042,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
  u64 newer_than, unsigned long max_to_defrag)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct btrfs_super_block *disk_super;
struct file_ra_state *ra = NULL;
unsigned long last_index;
u64 isize = i_size_read(inode);
-   u64 features;
u64 last_len = 0;
u64 skip = 0;
u64 defrag_end = 0;
@@ -1233,11 +1231,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
mutex_unlock(&inode->i_mutex);
}
 
-   disk_super = root->fs_info->super_copy;
-   features = btrfs_super_incompat_flags(disk_super);
if (range->compress_type == BTRFS_COMPRESS_LZO) {
-   features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
-   btrfs_set_super_incompat_flags(disk_super, features);
+   btrfs_set_fs_incompat(root->fs_info, COMPRESS_LZO);
}
 
ret = defrag_count;
@@ -2761,8 +2756,6 @@ static long btrfs_ioctl_default_subvol(struct file *file, void __user *argp)
struct btrfs_path *path;
struct btrfs_key location;
struct btrfs_disk_key disk_key;
-   struct btrfs_super_block *disk_super;
-   u64 features;
u64 objectid = 0;
u64 dir_id;
 
@@ -2813,12 +2806,7 @@ static long btrfs_ioctl_default_subvol(struct file *file, void __user *argp)
btrfs_mark_buffer_dirty(path->nodes[0]);
btrfs_free_path(path);
 
-   disk_super = root->fs_info->super_copy;
-   features = btrfs_super_incompat_flags(disk_super);
-   if (!(features & BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL)) {
-   features |= BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL;
-   btrfs_set_super_incompat_flags(disk_super, features);
-   }
+   btrfs_set_fs_incompat(root->fs_info, DEFAULT_SUBVOL);
btrfs_end_transaction(trans, root);
 
return 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 26da344..75ee2c7 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -401,6 +401,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
compress_type = "lzo";
info->compress_type = BTRFS_COMPRESS_LZO;
btrfs_set_opt(info->mount_opt, COMPRESS);
+   btrfs_set_fs_incompat(info, COMPRESS_LZO);
} else if (strncmp(args[0].from, "no", 2) == 0) {
compress_type = "no";
info->compress_type = BTRFS_COMPRESS_NONE;
-- 
1.7.8.6
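
The helper pattern in this patch, a macro that pastes a short option name
onto a common prefix and forwards to an inline function that sets the flag
only when it is missing, can be tried out in userspace.  The sketch below
uses hypothetical stand-in types and names (the *_model structs,
model_set_fs_incompat) and illustrative flag values; it is not the kernel
code.  The writes counter is added here purely to make the check-before-set
behaviour observable.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values; the real definitions live in fs/btrfs/ctree.h. */
#define FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1)
#define FEATURE_INCOMPAT_COMPRESS_LZO   (1ULL << 3)

/* Hypothetical stand-in for the in-memory super block copy. */
struct super_block_model {
    uint64_t incompat_flags;
    unsigned int writes;   /* counts how often the flags were stored */
};

/* Hypothetical stand-in for the fs_info structure. */
struct fs_info_model {
    struct super_block_model *super_copy;
};

/* Set `flag` only if it is not already present, so repeat calls are no-ops. */
static inline void model_set_fs_incompat_flag(struct fs_info_model *fs_info,
                                              uint64_t flag)
{
    struct super_block_model *sb;

    assert(fs_info && fs_info->super_copy);
    sb = fs_info->super_copy;
    if (!(sb->incompat_flags & flag)) {
        sb->incompat_flags |= flag;
        sb->writes++;
    }
}

/* `opt` is pasted onto the prefix, so callers write just COMPRESS_LZO. */
#define model_set_fs_incompat(fs_info, opt) \
    model_set_fs_incompat_flag((fs_info), FEATURE_INCOMPAT_##opt)
```

Calling model_set_fs_incompat(&info, COMPRESS_LZO) twice stores the flag
once; a later call with DEFAULT_SUBVOL adds the second bit without
disturbing the first.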


[PATCH 0/2] LZO INCOMPAT Checking

2012-07-20 Thread Mitch Harder
The following patches are against Josef's btrfs-next repository,
and depend on Arnd Hannemann's "Btrfs: allow mount -o remount,compress=no"
patch.

The method was based on a previous example of checking for
lzo INCOMPAT used by Li Zefan when defragmenting with explicit
compression (btrfs: Allow to specify compress method when defrag)
in ioctl.c.

The second patch uses the new function in the above referenced
existing check for lzo INCOMPAT performed when defragmenting
with explicit lzo compression.  This patch provides no
functional changes.

Mitch Harder (2):
  Btrfs: Check INCOMPAT flags on remount with lzo compression
  Btrfs: Use common function to check lzo INCOMPAT on defrag.

 fs/btrfs/ctree.h |1 +
 fs/btrfs/ioctl.c |7 +--
 fs/btrfs/super.c |   21 -
 3 files changed, 22 insertions(+), 7 deletions(-)

-- 
1.7.8.6



[PATCH 1/2] Btrfs: Check INCOMPAT flags on remount with lzo compression

2012-07-20 Thread Mitch Harder
In support of the recently added capability to remount with lzo
compression, check the compression INCOMPAT flags when remounting
with lzo compression, and set the flags if necessary.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/ctree.h |1 +
 fs/btrfs/super.c |   21 -
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0ee2f8..8bee032 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3094,6 +3094,7 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 
 /* super.c */
 int btrfs_parse_options(struct btrfs_root *root, char *options);
+void btrfs_chk_lzo_incompat(struct btrfs_root *root);
 int btrfs_sync_fs(struct super_block *sb, int wait);
 void btrfs_printk(struct btrfs_fs_info *fs_info, const char *fmt, ...);
 void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 26da344..4398fd2 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -401,11 +401,13 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
compress_type = "lzo";
info->compress_type = BTRFS_COMPRESS_LZO;
btrfs_set_opt(info->mount_opt, COMPRESS);
+   btrfs_chk_lzo_incompat(root);
} else if (strncmp(args[0].from, "no", 2) == 0) {
compress_type = "no";
info->compress_type = BTRFS_COMPRESS_NONE;
btrfs_clear_opt(info->mount_opt, COMPRESS);
-   btrfs_clear_opt(info->mount_opt, FORCE_COMPRESS);
+   btrfs_clear_opt(info->mount_opt,
+   FORCE_COMPRESS);
compress_force = false;
} else {
ret = -EINVAL;
@@ -587,6 +589,23 @@ out:
 }
 
 /*
+ * Check the INCOMPAT features in the super block, and set the
+ * LZO INCOMPAT flag if it has not been set.
+ */
+void btrfs_chk_lzo_incompat(struct btrfs_root *root)
+{
+   struct btrfs_super_block *disk_super;
+   u64 features;
+
+   disk_super = root->fs_info->super_copy;
+   features = btrfs_super_incompat_flags(disk_super);
+   if (!(features & BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)) {
+   features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+   btrfs_set_super_incompat_flags(disk_super, features);
+   }
+}
+
+/*
  * Parse mount options that are required early in the mount process.
  *
  * All other options will be parsed on much later in the mount process and
-- 
1.7.8.6



[PATCH 2/2] Btrfs: Use common function to check lzo INCOMPAT on defrag.

2012-07-20 Thread Mitch Harder
When defragmenting with explicit lzo compression, simplify
the check for lzo INCOMPAT by using the new common function
introduced to support remounting with lzo compression.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/ioctl.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 17facea..d5fd69e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1042,11 +1042,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
  u64 newer_than, unsigned long max_to_defrag)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct btrfs_super_block *disk_super;
struct file_ra_state *ra = NULL;
unsigned long last_index;
u64 isize = i_size_read(inode);
-   u64 features;
u64 last_len = 0;
u64 skip = 0;
u64 defrag_end = 0;
@@ -1233,11 +1231,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
mutex_unlock(&inode->i_mutex);
}
 
-   disk_super = root->fs_info->super_copy;
-   features = btrfs_super_incompat_flags(disk_super);
if (range->compress_type == BTRFS_COMPRESS_LZO) {
-   features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
-   btrfs_set_super_incompat_flags(disk_super, features);
+   btrfs_chk_lzo_incompat(root);
}
 
ret = defrag_count;
-- 
1.7.8.6



[PATCH v2 1/2] Btrfs: Check INCOMPAT flags on remount with lzo compression

2012-07-20 Thread Mitch Harder
In support of the recently added capability to remount with lzo
compression, check the compression INCOMPAT flags when remounting
with lzo compression, and set the flags if necessary.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
v1->v2:
- Remove extraneous formatting change.

 fs/btrfs/ctree.h |1 +
 fs/btrfs/super.c |   18 ++
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0ee2f8..8bee032 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3094,6 +3094,7 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 
 /* super.c */
 int btrfs_parse_options(struct btrfs_root *root, char *options);
+void btrfs_chk_lzo_incompat(struct btrfs_root *root);
 int btrfs_sync_fs(struct super_block *sb, int wait);
 void btrfs_printk(struct btrfs_fs_info *fs_info, const char *fmt, ...);
 void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 26da344..f3a5967 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -401,6 +401,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
compress_type = "lzo";
info->compress_type = BTRFS_COMPRESS_LZO;
btrfs_set_opt(info->mount_opt, COMPRESS);
+   btrfs_chk_lzo_incompat(root);
} else if (strncmp(args[0].from, "no", 2) == 0) {
compress_type = "no";
info->compress_type = BTRFS_COMPRESS_NONE;
@@ -587,6 +588,23 @@ out:
 }
 
 /*
+ * Check the INCOMPAT features in the super block, and set the
+ * LZO INCOMPAT flag if it has not been set.
+ */
+void btrfs_chk_lzo_incompat(struct btrfs_root *root)
+{
+   struct btrfs_super_block *disk_super;
+   u64 features;
+
+   disk_super = root->fs_info->super_copy;
+   features = btrfs_super_incompat_flags(disk_super);
+   if (!(features & BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)) {
+   features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+   btrfs_set_super_incompat_flags(disk_super, features);
+   }
+}
+
+/*
  * Parse mount options that are required early in the mount process.
  *
  * All other options will be parsed on much later in the mount process and
-- 
1.7.8.6



[PATCH v2 2/2] Btrfs: Use common function to check lzo INCOMPAT on defrag.

2012-07-20 Thread Mitch Harder
When defragmenting with explicit lzo compression, simplify
the check for lzo INCOMPAT by using the new common function
introduced to support remounting with lzo compression.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---
 fs/btrfs/ioctl.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 17facea..d5fd69e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1042,11 +1042,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
  u64 newer_than, unsigned long max_to_defrag)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct btrfs_super_block *disk_super;
struct file_ra_state *ra = NULL;
unsigned long last_index;
u64 isize = i_size_read(inode);
-   u64 features;
u64 last_len = 0;
u64 skip = 0;
u64 defrag_end = 0;
@@ -1233,11 +1231,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
mutex_unlock(&inode->i_mutex);
}
 
-   disk_super = root->fs_info->super_copy;
-   features = btrfs_super_incompat_flags(disk_super);
if (range->compress_type == BTRFS_COMPRESS_LZO) {
-   features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
-   btrfs_set_super_incompat_flags(disk_super, features);
+   btrfs_chk_lzo_incompat(root);
}
 
ret = defrag_count;
-- 
1.7.8.6



[PATCH v3 0/1] LZO INCOMPAT Checking

2012-07-20 Thread Mitch Harder
The following patch is against Josef's btrfs-next repository,
and depends on Arnd Hannemann's patch:
Btrfs: allow mount -o remount,compress=no

The method was based on a previous example of checking for
lzo INCOMPAT used by Li Zefan when defragmenting with explicit
compression (btrfs: Allow to specify compress method when defrag)
in ioctl.c.

Based on feedback on IRC, the two-patch series presented in the
previous version has been consolidated into a single patch, and
the helper function was converted to a static inline function.

Mitch Harder (1):
  Btrfs: Check INCOMPAT flags on remount and add helper function

 fs/btrfs/ctree.h |   13 +
 fs/btrfs/ioctl.c |7 +--
 fs/btrfs/super.c |1 +
 3 files changed, 15 insertions(+), 6 deletions(-)

-- 
1.7.8.6



[PATCH v3 1/1] Btrfs: Check INCOMPAT flags on remount and add helper function

2012-07-20 Thread Mitch Harder
In support of the recently added capability to remount with lzo
compression, provide a helper function to check the compression
INCOMPAT flags when remounting with lzo compression, and set
the flags if necessary.

Also, implement the new helper function when defragmenting with
explicit lzo compression.

Signed-off-by: Mitch Harder mitch.har...@sabayonlinux.org
---

v1->v2
- Remove extraneous formatting change.
v2->v3
- Consolidate into a single patch
- Convert helper function to a static inline function.

 fs/btrfs/ctree.h |   13 +
 fs/btrfs/ioctl.c |7 +--
 fs/btrfs/super.c |1 +
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a0ee2f8..3a1a700 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3103,6 +3103,19 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, const char *function,
   unsigned int line, int errno);
 
+static inline void btrfs_chk_lzo_incompat(struct btrfs_root *root)
+{
+   struct btrfs_super_block *disk_super;
+   u64 features;
+
+   disk_super = root-fs_info-super_copy;
+   features = btrfs_super_incompat_flags(disk_super);
+   if (!(features  BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO)) {
+   features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
+   btrfs_set_super_incompat_flags(disk_super, features);
+   }
+}
+
 #define btrfs_abort_transaction(trans, root, errno)\
 do {   \
__btrfs_abort_transaction(trans, root, __func__,\
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 17facea..d5fd69e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1042,11 +1042,9 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
   u64 newer_than, unsigned long max_to_defrag)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	struct btrfs_super_block *disk_super;
 	struct file_ra_state *ra = NULL;
 	unsigned long last_index;
 	u64 isize = i_size_read(inode);
-	u64 features;
 	u64 last_len = 0;
 	u64 skip = 0;
 	u64 defrag_end = 0;
@@ -1233,11 +1231,8 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
 		mutex_unlock(&inode->i_mutex);
 	}
 
-	disk_super = root->fs_info->super_copy;
-	features = btrfs_super_incompat_flags(disk_super);
 	if (range->compress_type == BTRFS_COMPRESS_LZO) {
-		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
-		btrfs_set_super_incompat_flags(disk_super, features);
+		btrfs_chk_lzo_incompat(root);
 	}
 
 	ret = defrag_count;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 26da344..32c2bd9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -401,6 +401,7 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 				compress_type = "lzo";
 				info->compress_type = BTRFS_COMPRESS_LZO;
 				btrfs_set_opt(info->mount_opt, COMPRESS);
+				btrfs_chk_lzo_incompat(root);
 			} else if (strncmp(args[0].from, "no", 2) == 0) {
 				compress_type = "no";
 				info->compress_type = BTRFS_COMPRESS_NONE;
-- 
1.7.8.6



Re: [PATCH v2] Btrfs: allow mount -o remount,compress=no

2012-07-19 Thread Mitch Harder
On Wed, Jul 18, 2012 at 8:28 PM, David Sterba d...@jikos.cz wrote:
 On Fri, Jul 13, 2012 at 10:19:14AM -0500, Mitch Harder wrote:
 I was testing the lz4(hc) patches, and I found that the compression
 INCOMPAT flags are not being updated using the method in this patch.

 The compression INCOMPAT flags are generally checked and updated in
 the open_ctree() function.

 But, on remount, open_ctree() is not called.

 This currently happens with lzo as well, right?


Yes, this will happen with lzo as implemented in the patch at the head
of this thread.


 My preference is to let remount succeed and set the incompat bit,
 possibly with a KERN_INFO message to syslog in case the bit is yet
 unseen by the volume.


Great.

I've put together a patch that does just that, and I've been testing
it to make sure it works as intended.

I'll finish it up and send it to the list tomorrow.

This patch will only address the lzo INCOMPAT from the remount
capabilities provided by the patch at the head of the thread.

A similar modification will be needed for lz4 patches that allow for remount.
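
For readers following along, the "let remount succeed and set the
incompat bit" policy can be sketched in user space roughly as below.
This is only an illustration, not the patch itself: struct demo_super,
demo_chk_lzo_incompat() and DEMO_INCOMPAT_COMPRESS_LZO are hypothetical
stand-ins for the kernel structures and flag, and fprintf() stands in
for a printk(KERN_INFO ...) call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO. */
#define DEMO_INCOMPAT_COMPRESS_LZO (1ULL << 3)

/* Hypothetical, pared-down superblock holding only the incompat flags. */
struct demo_super {
	uint64_t incompat_flags;
};

/*
 * Set the LZO incompat bit if the volume has not seen it yet.
 * Returns 1 if the bit was newly set (and a notice was logged),
 * 0 if it was already present.
 */
static int demo_chk_lzo_incompat(struct demo_super *sb)
{
	assert(sb != NULL);

	if (sb->incompat_flags & DEMO_INCOMPAT_COMPRESS_LZO)
		return 0;

	sb->incompat_flags |= DEMO_INCOMPAT_COMPRESS_LZO;
	/* In the kernel this would be a printk(KERN_INFO ...) message. */
	fprintf(stderr, "demo: setting LZO incompat flag on remount\n");
	return 1;
}
```

Calling demo_chk_lzo_incompat() a second time on the same structure
leaves the flags unchanged and logs nothing, which is the behaviour one
would want on repeated remounts.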


Re: [PATCH v2] Btrfs: allow mount -o remount,compress=no

2012-07-13 Thread Mitch Harder
On Thu, Jun 28, 2012 at 10:40 AM, David Sterba d...@jikos.cz wrote:
 On Tue, Jun 26, 2012 at 08:48:37AM +0200, Arnd Hannemann wrote:
 How should we proceed to get the above-mentioned patch
 (or the similar patch from Andrei Popa) merged?

 Josef picked the patch into btrfs-next, I see no problem including it
 in the next merge window patchset.


I was testing the lz4(hc) patches, and I found that the compression
INCOMPAT flags are not being updated using the method in this patch.

The compression INCOMPAT flags are generally checked and updated in
the open_ctree() function.

But, on remount, open_ctree() is not called.

I was going to test a patch to update the INCOMPAT flags similar to
the way lzo INCOMPAT is updated when specifying the compress method in
defragmentation.

http://kerneltrap.org/mailarchive/linux-btrfs/2010/11/18/6886194

But, let me know if it is preferred to just return -EINVAL when trying
to remount with a compression method that has an INCOMPAT not yet seen
by that volume.
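
To make the -EINVAL alternative concrete, here is a rough user-space
sketch of what a strict check might look like. It is an assumption for
discussion only: demo_remount_check_strict() and
DEMO_INCOMPAT_COMPRESS_LZO are hypothetical names, and the flags are
passed as plain integers rather than read from a real superblock.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO. */
#define DEMO_INCOMPAT_COMPRESS_LZO (1ULL << 3)

/*
 * Strict policy: refuse the remount unless every required incompat
 * bit has already been recorded on the volume.
 * Returns 0 if the remount may proceed, -EINVAL otherwise.
 */
static int demo_remount_check_strict(uint64_t incompat_flags,
				     uint64_t required)
{
	/* A zero mask would make the check vacuous. */
	assert(required != 0);

	if ((incompat_flags & required) != required)
		return -EINVAL;
	return 0;
}
```

With this policy a volume that has never used lzo would reject
remount,compress=lzo, whereas the defragmentation path sets the flag
on demand.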


Re: cannot remove files: rm gives no space left on device, 3.2.0-24, ubuntu

2012-06-17 Thread Mitch Harder
On Sun, Jun 17, 2012 at 3:04 AM, rupert THURNER
rupert.thur...@gmail.com wrote:
 On Sun, Jun 17, 2012 at 7:19 AM, Andrei Popa ierd...@gmail.com wrote:
 On Sun, 2012-06-17 at 06:14 +0200, rupert THURNER wrote:
  Will result in anything reported in 'dmesg' output?
  [ 6431.514454] device label 388gb-data devid 1 transid 1086 /dev/sda6
  [ 6431.514969] btrfs: disabling disk space caching
  [ 6431.514977] btrfs: force clearing of disk cache
 tried the same with kernel versions from
 http://kernel.ubuntu.com/~kernel-ppa/mainline/:
 * 3.2.20
 * 3.4.0
 with version 3.4.0, i could delete one tiny file, but only one. peter
 mentioned before to run the rm as root. yes, i did that, with all
 kernel versions, the error was the same all the time.

 Have you tried to delete the files with echo > file ? This will empty
 the file without requiring a new metadata allocation.

 thanks for the hint! i did with the original kernel, but now i tried
 it as root and with the 3.4.0 kernel as well. no space left on
 device. is there a  special kernel version or a special btrfs tool
 which allows to remove a file without writing more data?


Have you tried mounting with '-o nodatacow' yet?
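
For reference, the in-place truncation suggested in the quoted text can
also be done from a small C program, sketched below. This is only an
illustration: demo_empty_file() is a hypothetical helper built on the
standard open(2) O_TRUNC flag, and on a completely full volume it may
well fail with ENOSPC just as the shell redirection reportedly did.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Empty an existing file in place without unlinking it.
 * Returns 0 on success, -1 on error (errno is set by open/close).
 */
static int demo_empty_file(const char *path)
{
	int fd;

	assert(path != NULL);

	/* O_TRUNC drops the file's contents; no O_CREAT, so the file must exist. */
	fd = open(path, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return -1;
	return close(fd);
}
```

Unlike echo > file, this leaves the file at zero bytes rather than
writing a single newline.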

