Re: BTRFS fsck apparent errors

2012-07-04 Thread David Sterba
On Wed, Jul 04, 2012 at 07:40:05AM +0700, Fajar A. Nugraha wrote:
> Are there any known btrfs regression in 3.4? I'm using 3.4.0-3-generic
> from a ppa, but a normal mount - umount cycle seems MUCH longer
> compared to how it was on 3.2, and iostat shows the disk is
> read-IOPS-bound

Is it just mount/umount without any other activity? Is the fs
fragmented (or aged), almost full, has lots of files?

 
> # time mount LABEL=WD-root
>
> real  0m10.400s
> user  0m0.000s
> sys   0m0.060s
>
> # time umount /media/WD-root/
>
> real  0m22.419s
> user  0m0.000s
> sys   0m0.064s
>
> # /proc/10142/stack  <--- the PID of umount process

The process(es) actually doing the work are the btrfs workers, usual
suspects are btrfs-cache (free space cache) or btrfs-ino (inode cache)
that are writing the cache states back to disk.
I'm using iotop to observe such things.


david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS fsck apparent errors

2012-07-04 Thread Fajar A. Nugraha
On Wed, Jul 4, 2012 at 8:42 PM, David Sterba <d...@jikos.cz> wrote:
> On Wed, Jul 04, 2012 at 07:40:05AM +0700, Fajar A. Nugraha wrote:
>> Are there any known btrfs regression in 3.4? I'm using 3.4.0-3-generic
>> from a ppa, but a normal mount - umount cycle seems MUCH longer
>> compared to how it was on 3.2, and iostat shows the disk is
>> read-IOPS-bound
>
> Is it just mount/umount without any other activity?

Yes

> Is the fs
> fragmented

Not sure how to check that quickly

> (or aged),

Over 1 year, so yes

> almost full,

df says 83% used, so probably yes (depending on how you define almost)

~ $ df -h /media/WD-root
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdc2   922G  733G  155G  83% /media/WD-root

~ $ sudo btrfs fi df /media/WD-root/
Data: total=883.95GB, used=729.68GB
System, DUP: total=8.00MB, used=104.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=18.75GB, used=1.49GB
Metadata: total=8.00MB, used=0.00

> has lots of files?

it's a normal 1 TB usb disk, with docs, movies, vm images, etc. No
particular lots-of-small-files like maildir or anything like that.


>> # time umount /media/WD-root/
>>
>> real  0m22.419s
>> user  0m0.000s
>> sys   0m0.064s
>>
>> # /proc/10142/stack  <--- the PID of umount process
>
> The process(es) actually doing the work are the btrfs workers, usual
> suspects are btrfs-cache (free space cache) or btrfs-ino (inode cache)
> that are writing the cache states back to disk.

Not sure about that, since iostat shows it's mostly read, not write.
Will try iotop later.
I tested also with Chris' for-linus on top of 3.4, same result (really
long time to umount).

Reverting back to ubuntu's 3.2.0-26-generic, umount only took less than 1 s :P
So I guess I'm switching back to 3.2 for now.

-- 
Fajar


Re: BTRFS fsck apparent errors

2012-07-04 Thread David Sterba
On Wed, Jul 04, 2012 at 10:46:21PM +0700, Fajar A. Nugraha wrote:
> > Is it just mount/umount without any other activity?
> Yes
>
> > Is the fs
> > fragmented
> Not sure how to check that quickly
>
> > (or aged),
> Over 1 year, so yes
>
> > almost full,
> df says 83% used, so probably yes (depending on how you define almost)

That matches my expectation: an aged, fairly full filesystem could lead
to the mount/umount slowness due to fragmentation.

> > has lots of files?
>
> it's a normal 1 TB usb disk, with docs, movies, vm images, etc. No
> particular lots-of-small-files like maildir or anything like that.

So it's probably not an issue with inode_cache.

> > # time umount /media/WD-root/
> >
> > real  0m22.419s
> > user  0m0.000s
> > sys   0m0.064s
> >
> > # /proc/10142/stack  <--- the PID of umount process
> >
> > The process(es) actually doing the work are the btrfs workers, usual
> > suspects are btrfs-cache (free space cache) or btrfs-ino (inode cache)
> > that are writing the cache states back to disk.
>
> Not sure about that, since iostat shows it's mostly read, not write.
> Will try iotop later.
> I tested also with Chris' for-linus on top of 3.4, same result (really
> long time to umount).

It would be good to verify whether it's the btrfs-cache worker or not.
IIRC there were more writes than reads in that case, so I'm not sure
this is the right direction.

The 3.5 series or 3.4+for-linus has some changes wrt free space cache
(removed the 'ideal caching mode') that caused slow mounts, but that has
been fixed.

I've looked again at the umount process call stack, and it's waiting
for writing the btree_inode, which is the representation of the b-tree
nodes. It's quite possible that changes to the generic writeback code are
causing this. AFAIK the btree_inode does not behave as a normal file
inode regarding writeback. A good reference point is 3.2; non-trivial
writeback changes have been merged since.

Guessing now: if the mount causes e.g. an atime update, then this triggers
cow, dirties the btree_inode and needs to read data from disk, and
fragmentation slows this down. The number of cowed blocks is small compared
to the reads (and maybe generic readahead reads more than what's
actually needed for the cow operation ...).


david


Re: BTRFS fsck apparent errors

2012-07-03 Thread Hugo Mills
On Tue, Jul 03, 2012 at 05:10:13PM +0200, Swâmi Petaramesh wrote:
> A couple days ago, I have converted my Ubuntu Precise machine from
> ext4 to BTRFS using btrfs-convert.
[snip]
> After I had shifted, I tried to defragment and compress my FS using
> commands such as :
>
> find /mnt/STORAGEFS/STORAGE/ -exec btrfs fi defrag -clzo -v {} \;
>
> During execution of such commands, my kernel oopsed, so I restarted.
>
> Afterwards, I noticed that, during the execution of such a command,
> my FS free space was quickly dropping, where I would have expected
> it to increase...

   What you're seeing is the fact that you've still got the complete
ext4 filesystem and all of its data sitting untouched on the disk as
well. The defrag will have taken a complete new copy of the data but
not removed the ext4 copy.

   If you delete the conversion recovery directory (ext2_subvol), then
you'll see the space usage drop again. Of course, doing that will also
mean that you won't be able to roll back to ext4 without reformatting
and restoring from your backups. (You have got backups, right?)
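
   As a rough sketch of that cleanup (an illustration, not commands
quoted from this thread): `btrfs subvolume list` and `btrfs subvolume
delete` are the btrfs-progs subcommands involved. The mount point and the
subvolume name are taken as arguments, since the name of the saved image
depends on the conversion, and the `run` helper with its `DRY_RUN` switch
is a hypothetical convenience for previewing the commands before anything
is deleted.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN is set.
# (Hypothetical helper, not part of btrfs-progs.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# drop_conversion_image MOUNTPOINT NAME
# List the subvolumes under MOUNTPOINT, then delete the one called NAME.
# Deleting the saved image makes a rollback to ext4 impossible.
drop_conversion_image() {
    run btrfs subvolume list "$1" &&
    run btrfs subvolume delete "$1/$2"
}
```

   With DRY_RUN=1 set, `drop_conversion_image /mnt/STORAGEFS ext2_saved`
only prints the two commands it would run.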

> Once finished, I checked a couple of BTRFS FSes using btrfsck, but I
> interpret the results as having some errors :
>
> root@fnix:/# btrfsck /dev/VG1/DEBMINT
> checking extents
> checking fs roots
> root 256 inode 257 errors 800
> found 7814565888 bytes used err is 1
> total csum bytes: 6264636
> total tree bytes: 394928128
> total fs tree bytes: 365121536
> btree space waste bytes: 101451531
> file data blocks allocated: 20067590144
>  referenced 13270241280
> Btrfs Btrfs v0.19
>
> root@fnix:/# btrfsck /dev/VG1/STORAGE
> checking extents
> checking fs roots
> root 301 inode 10644 errors 1000
> root 301 inode 10687 errors 1000
> root 301 inode 10688 errors 1000
> root 301 inode 10749 errors 1000
> found 55683117056 bytes used err is 1
> total csum bytes: 54188580
> total tree bytes: 191500288
> total fs tree bytes: 103596032
> btree space waste bytes: 49730472
> file data blocks allocated: 55640522752
>  referenced 56466059264
> Btrfs Btrfs v0.19
>
> It doesn't seem that btrfsck attempts to fix these errors in any
> way... It just displays them.

   Correct, by default it just checks the filesystem. Just to be sure:
the filesystems in question weren't mounted, were they?

   I would also suggest using a 3.4 kernel. There's at least one FS
corruption bug known to exist in 3.2 that's been fixed in 3.4.
(Probably not what's happened in this case, but it's best to try to
avoid these kinds of issues).
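
   For reading the `errors 800` and `errors 1000` values in the output
quoted above: they look like bitmasks, and the sketch below assumes that
btrfsck prints them in hexadecimal (an assumption, not something stated
in this thread). It only lists which bit positions are set; what each bit
means would have to be looked up in the btrfsck source. The function name
`decode_err_bits` is made up for this illustration.

```shell
# decode_err_bits HEX
# Print the positions of the set bits in a hex mask, lowest bit first.
# Assumes the value after "errors" is hexadecimal without a 0x prefix.
decode_err_bits() {
    mask=$((0x$1))
    bit=0
    out=
    while [ "$mask" -gt 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out }$bit"
        fi
        mask=$((mask >> 1))
        bit=$((bit + 1))
    done
    echo "$out"
}
```

   Under that assumption, `decode_err_bits 800` prints 11 and
`decode_err_bits 1000` prints 12, so the two filesystems report two
different single-flag conditions.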

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- emacs: Eats Memory and Crashes. --- 




Re: BTRFS fsck apparent errors

2012-07-03 Thread David Sterba
On Tue, Jul 03, 2012 at 04:22:08PM +0100, Hugo Mills wrote:
>    Correct, by default it just checks the filesystem. Just to be sure:
> the filesystems in question weren't mounted, were they?

fsck will refuse to run on a mounted filesystem, though in the case of a
read-only mount running it might be useful during debugging. I'm using
this patch:

--- a/btrfsck.c
+++ b/btrfsck.c
@@ -3474,6 +3474,7 @@ static struct option long_options[] = {
 	{ "repair", 0, NULL, 0 },
 	{ "init-csum-tree", 0, NULL, 0 },
 	{ "init-extent-tree", 0, NULL, 0 },
+	{ "force", 0, NULL, 0 },
 	{ 0, 0, 0, 0}
 };

@@ -3484,12 +3485,13 @@ int main(int ac, char **av)
 	struct btrfs_fs_info *info;
 	struct btrfs_trans_handle *trans = NULL;
 	u64 bytenr = 0;
-	int ret;
+	int ret = 0;
 	int num;
 	int repair = 0;
 	int option_index = 0;
 	int init_csum_tree = 0;
 	int rw = 0;
+	int force = 0;

 	while(1) {
 		int c;
@@ -3516,6 +3518,9 @@ int main(int ac, char **av)
 				printf("Creating a new CRC tree\n");
 				init_csum_tree = 1;
 				rw = 1;
+			} else if (option_index == 4) {
+				printf("Skip mount checks\n");
+				force = 1;
 			}

 		}
@@ -3527,7 +3532,7 @@ int main(int ac, char **av)
 	radix_tree_init();
 	cache_tree_init(&root_cache);

-	if((ret = check_mounted(av[optind])) < 0) {
+	if(!force && (ret = check_mounted(av[optind])) < 0) {
 		fprintf(stderr, "Could not check mount status: %s\n",
 			strerror(-ret));
 		return ret;
 	} else if(ret) {
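
Before skipping the mount check like this, it can help to see how the
device is currently mounted. The following is a minimal sketch, not how
check_mounted() in btrfsck works: it scans a mounts table in the
/proc/mounts format and reports "unmounted", "ro" or "rw" for a device
path. The function name `mount_state` is made up, and it matches the
device string literally, so an LVM volume listed under /dev/mapper would
not match its /dev/VG/LV alias.

```shell
# mount_state DEVICE [MOUNTS_FILE]
# Print "unmounted", "ro" or "rw" for DEVICE, based on a mounts table in
# /proc/mounts format (device, mount point, type, options, ...).
# Any read-write mount of the device wins over read-only ones.
mount_state() {
    dev=$1
    table=${2:-/proc/mounts}
    state=unmounted
    while read -r src _mnt _type opts _rest; do
        [ "$src" = "$dev" ] || continue
        case ",$opts," in
            *,ro,*) state=ro ;;
            *)      state=rw; break ;;
        esac
    done < "$table"
    echo "$state"
}
```

Only the "ro" and "unmounted" results are states in which a check that
never writes could reasonably be run.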



Re: BTRFS fsck apparent errors

2012-07-03 Thread Zach Brown

On 07/03/2012 08:52 AM, David Sterba wrote:

> On Tue, Jul 03, 2012 at 04:22:08PM +0100, Hugo Mills wrote:
>> Correct, by default it just checks the filesystem. Just to be sure:
>> the filesystems in question weren't mounted, were they?
>
> fsck will refuse to run on a mounted filesystem, though in case of a
> read-only mount it might be useful during debugging, I'm using this
> patch
>
> --- a/btrfsck.c
> +++ b/btrfsck.c
> @@ -3474,6 +3474,7 @@ static struct option long_options[] = {
>   { "repair", 0, NULL, 0 },
>   { "init-csum-tree", 0, NULL, 0 },
>   { "init-extent-tree", 0, NULL, 0 },
> +   { "force", 0, NULL, 0 },


If we were to run with this, I think it should be called something other
than force.  fsck.ext* has trained people to think that 'forcing' a fsck
means doing a full repair pass even if the fs thinks that it was shut
down cleanly.

--read-only would be good if fsck was taught to not even try to write in
this mode.

- z


Re: BTRFS fsck apparent errors

2012-07-03 Thread David Sterba
On Tue, Jul 03, 2012 at 09:26:41AM -0700, Zach Brown wrote:
> On 07/03/2012 08:52 AM, David Sterba wrote:
> > --- a/btrfsck.c
> > +++ b/btrfsck.c
> > @@ -3474,6 +3474,7 @@ static struct option long_options[] = {
> >   { "repair", 0, NULL, 0 },
> >   { "init-csum-tree", 0, NULL, 0 },
> >   { "init-extent-tree", 0, NULL, 0 },
> > +   { "force", 0, NULL, 0 },
>
> If we were to run with this, I think it should be called something other
> than force.  fsck.ext* has trained people to think that 'forcing' a fsck
> means doing a full repair pass even if the fs thinks that it was shut
> down cleanly.

Agreed, it's not a good name and was rather a quick aid to myself, I
didn't put much thinking into the user interface as I usually do :)

> --read-only would be good if fsck was taught to not even try to write in
> this mode.

read-only mode is default and (hopefully) does no writes to the device,
this would require the --repair option so what you propose is sort of a
sanity check, right?


david


Re: BTRFS fsck apparent errors

2012-07-03 Thread Zach Brown



> read-only mode is default and (hopefully) does no writes to the device,
> this would require the --repair option so what you propose is sort of a
> sanity check, right?


Ah, I didn't realize that it didn't write without --repair.  Yeah,
making sure that people don't try to combine the repair and
read-from-mounted-devices options seems reasonable.
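
A sanity check of that kind could look roughly like the sketch below. It
is only an illustration of the rule, not proposed btrfsck code: --repair
is the existing option mentioned above, while --allow-mounted is a
hypothetical stand-in for whatever the "read from mounted devices" option
ends up being called, and `validate_opts` is a made-up name.

```shell
# validate_opts ARGS...
# Return non-zero if the argument list combines --repair with the
# (hypothetical) --allow-mounted option; all other arguments are ignored.
validate_opts() {
    repair=0
    allow_mounted=0
    for arg in "$@"; do
        case $arg in
            --repair)        repair=1 ;;
            --allow-mounted) allow_mounted=1 ;;
        esac
    done
    if [ "$repair" -eq 1 ] && [ "$allow_mounted" -eq 1 ]; then
        echo "refusing: --repair cannot be combined with --allow-mounted" >&2
        return 1
    fi
    return 0
}
```

Either option on its own passes; only the combination is rejected, which
keeps a writing repair away from a filesystem the kernel still has
mounted.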

- z


Re: BTRFS fsck apparent errors

2012-07-03 Thread Swâmi Petaramesh
On 03/07/2012 17:22, Hugo Mills wrote:
> What you're seeing is the fact that you've still got the complete ext4
> filesystem and all of its data sitting untouched on the disk as well.
> The defrag will have taken a complete new copy of the data but not
> removed the ext4 copy.
I thought about that... However, I had "btrfs su del" the ext2_saved
subvolume, so it is expected to have been deleted...

If not, how could I possibly delete it, now that I can't see it anymore ?


>> It doesn't seem that btrfsck attempts to fix these errors in any
>> way... It just displays them.
>
> Correct, by default it just checks the filesystem. Just to be sure:
> the filesystems in question weren't mounted, were they?

  
No, the filesystems weren't mounted... If btrfsck doesn't fix anything
by default, how could I ask it to fix? "man btrfsck" or "btrfsck -h" do
not show any option, only a device name...

TIA.

Kind regards.

-- 
Swâmi Petaramesh sw...@petaramesh.org http://petaramesh.org PGP 9076E32E



Re: BTRFS fsck apparent errors

2012-07-03 Thread Fajar A. Nugraha
On Tue, Jul 3, 2012 at 10:22 PM, Hugo Mills <h...@carfax.org.uk> wrote:
> On Tue, Jul 03, 2012 at 05:10:13PM +0200, Swâmi Petaramesh wrote:
>
>> After I had shifted, I tried to defragment and compress my FS using
>> commands such as :
>>
>> find /mnt/STORAGEFS/STORAGE/ -exec btrfs fi defrag -clzo -v {} \;
>>
>> During execution of such commands, my kernel oopsed, so I restarted.
>
> I would also suggest using a 3.4 kernel. There's at least one FS
> corruption bug known to exist in 3.2 that's been fixed in 3.4.


Are there any known btrfs regression in 3.4? I'm using 3.4.0-3-generic
from a ppa, but a normal mount - umount cycle seems MUCH longer
compared to how it was on 3.2, and iostat shows the disk is
read-IOPS-bound

# time mount LABEL=WD-root

real    0m10.400s
user    0m0.000s
sys     0m0.060s

# time umount /media/WD-root/

real    0m22.419s
user    0m0.000s
sys     0m0.064s

# /proc/10142/stack  <--- the PID of umount process
[<ffffffff8111dd1e>] sleep_on_page+0xe/0x20
[<ffffffff8111de88>] wait_on_page_bit+0x78/0x80
[<ffffffff8111e08c>] filemap_fdatawait_range+0x10c/0x1a0
[<ffffffffa00744eb>] btrfs_wait_marked_extents+0x6b/0xc0 [btrfs]
[<ffffffffa007457b>] btrfs_write_and_wait_marked_extents+0x3b/0x60 [btrfs]
[<ffffffffa00745cb>] btrfs_write_and_wait_transaction+0x2b/0x50 [btrfs]
[<ffffffffa0074e69>] btrfs_commit_transaction+0x759/0x960 [btrfs]
[<ffffffffa00700db>] btrfs_commit_super+0xbb/0x110 [btrfs]
[<ffffffffa0071490>] close_ctree+0x2a0/0x310 [btrfs]
[<ffffffffa004b6c9>] btrfs_put_super+0x19/0x20 [btrfs]
[<ffffffff811810b2>] generic_shutdown_super+0x62/0xf0
[<ffffffff811811d6>] kill_anon_super+0x16/0x30
[<ffffffffa004df3a>] btrfs_kill_super+0x1a/0x90 [btrfs]
[<ffffffff811816ac>] deactivate_locked_super+0x3c/0xa0
[<ffffffff81181f9e>] deactivate_super+0x4e/0x70
[<ffffffff8119df9c>] mntput_no_expire+0xdc/0x130
[<ffffffff8119f296>] sys_umount+0x66/0xe0
[<ffffffff8169e129>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

-- 
Fajar


Re: BTRFS fsck apparent errors

2012-07-03 Thread Dave Chinner
On Tue, Jul 03, 2012 at 07:37:42PM +0200, David Sterba wrote:
> On Tue, Jul 03, 2012 at 09:26:41AM -0700, Zach Brown wrote:
> > On 07/03/2012 08:52 AM, David Sterba wrote:
> > > --- a/btrfsck.c
> > > +++ b/btrfsck.c
> > > @@ -3474,6 +3474,7 @@ static struct option long_options[] = {
> > >   { "repair", 0, NULL, 0 },
> > >   { "init-csum-tree", 0, NULL, 0 },
> > >   { "init-extent-tree", 0, NULL, 0 },
> > > +   { "force", 0, NULL, 0 },
> >
> > If we were to run with this, I think it should be called something other
> > than force.  fsck.ext* has trained people to think that 'forcing' a fsck
> > means doing a full repair pass even if the fs thinks that it was shut
> > down cleanly.
>
> Agreed, it's not a good name and was rather a quick aid to myself, I
> didn't put much thinking into the user interface as I usually do :)

xfs_repair uses:

   -d     Repair dangerously. Allow xfs_repair to repair an
          XFS filesystem mounted read only. This is typically
          done on a root filesystem from single user mode,
          immediately followed by a reboot.

> > --read-only would be good if fsck was taught to not even try to write in
> > this mode.
>
> read-only mode is default and (hopefully) does no writes to the device,
> this would require the --repair option so what you propose is sort of a
> sanity check, right?

If you run fsck/repair on a mounted filesystem, and it changes the
block device (i.e. fixes something), the mounted filesystem does not
know about it and so may use stale metadata, and bad things will
happen. That's why it's called dangerous. ;)

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com