Re: Performance Issues

2014-09-20 Thread Marc Dietrich
On Friday, 19 September 2014, 13:51:22, Holger Hoffstätte wrote:
 
 On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:
 
  I have a particularly uncomplicated setup (a desktop PC with a hard
  disk) and I'm seeing particularly slow performance from btrfs.  A `git
  status` in the linux source tree takes about 46 seconds after dropping
  caches, whereas on other machines using ext4 this takes about 13s.  My
  mail client (evolution) also seems to perform particularly poorly on
  this setup, and my hunch is that it's spending a lot of time waiting on
  the filesystem.
 
 This is - unfortunately - a particular btrfs oddity/characteristic/flaw,
 whatever you want to call it. git relies a lot on fast stat() calls,
 and those seem to be particularly slow with btrfs esp. on rotational
 media. I have the same problem with rsync on a freshly mounted volume;
 it gets fast (quite so!) after the first run.

my favorite benchmark is ls -l /usr/bin:

ext4: 0.934s
btrfs:   21.814s
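If anyone wants to reproduce such numbers, here is a minimal cold-cache timing sketch (hedged: the drop_caches step needs root, and /usr/bin is just an example directory):

```shell
#!/bin/bash
# Time a stat()-heavy directory listing, cold-cache then warm-cache.
# `ls -l` stat()s every entry, which is exactly the slow part here.
DIR=${1:-/usr/bin}
sync
echo 3 > /proc/sys/vm/drop_caches 2>/dev/null \
    || echo "(not root: caches not dropped)"
time ls -l "$DIR" > /dev/null   # cold(ish) run
time ls -l "$DIR" > /dev/null   # warm run, for comparison
```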

also mounting large partitions (several hundred GB) takes a lot of time on btrfs.
Better to defer it during boot using e.g. noauto,x-systemd.automount.
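For reference, a sketch of what that deferred mount looks like in /etc/fstab (the UUID and mount point are placeholders):

```
# /etc/fstab: let systemd mount on first access instead of at boot
UUID=<your-fs-uuid>  /srv/big  btrfs  defaults,noauto,x-systemd.automount  0  0
```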

Marc




Re: general thoughts and questions + general and RAID5/6 stability?

2014-09-20 Thread Duncan
William Hanson posted on Fri, 19 Sep 2014 16:50:05 -0400 as excerpted:

 Hey guys...
 
 I was just crawling through the wiki and this list's archive to find
 answers about some questions. Actually many of them matching those
 which Christoph has asked here some time ago, though it seems no
 answers came up at all.

Seems his post slipped through the cracks, perhaps because it was too much 
at once for people to try to chew on.  Let's see if the second time around 
works better...

 
 On Sun, 2014-08-31 at 06:02 +0200, Christoph Anton Mitterer wrote:
 

 For some time now I have been considering using btrfs at a larger scale,
 basically in two scenarios:
 

 a) As the backend for data pools handled by dcache (dcache.org), where
 we run a Tier-2 in the higher PiB range for the LHC Computing Grid...
 
 For now that would be a rather boring use of btrfs (i.e. not really
 using any of its advanced features), and RAID functionality would
 still be provided by hardware (at least with the current hardware
 generations we have in use).

While that scale is simply out of my league, here's what I'd say if I 
were asked my own opinion.

I'd say btrfs isn't ready for that, basically for one reason.

Btrfs has stabilized quite a bit in the last year, and the scary warnings 
have now come off, but it's still not fully stable, and keeping backups 
of any data you value is still very strongly recommended.

The scenario above is talking high PiB scale.  Simply put, that's a 
**LOT** of data to keep backups of, or to lose all at once if you don't 
and something happens!  At that scale I'd look at something more mature, 
with a reputation for working well at that scale.  Xfs is what I'd be 
looking at.  That or possibly zfs.

People who value their data highly tend, for good reason, to be rather 
conservative when it comes to filesystems.  At that level and at the 
conservatism I'd guess it calls for, I'd say another two years, perhaps 
longer, given btrfs history and how much longer than expected every step 
has seemed to take.

 b) Personally, for my NAS. Here the main goal is not performance but
 rather data safety (i.e. I want something like RAID6 or better),
 security (i.e. it will be on top of dm-crypt/LUKS), and integrity.
 Hardware-wise I'll use a UPS as well as enterprise SATA disks, from
 different vendors and from different production lots.
 
 (Of course I'm aware that btrfs is experimental, and I would have
 regular backups)

[...]

 [1] So one issue I have is to determine the general stability of the
 different parts.

Raid5/6 is still out of the question at this point.  The operational code 
is there, but the recovery code is incomplete.  In effect, btrfs raid5/6 
must be treated as slow raid0 in terms of dependability, but with a free 
upgrade to raid5/6 when the code is complete (assuming the array survives 
that long in its raid0 stage): the operational code has been there all 
along, creating and writing the parity; it just can't yet reliably 
restore from it if called upon to do so.

So if you wouldn't be comfortable with the data on raid0, that is, with 
the idea of losing it all if you lose any of it, don't put it on btrfs 
raid5/6 at this point.  The situation is actually /somewhat/ better than 
that, but that's the reliability bottom line you should be planning for, 
and if raid0 reliability isn't appropriate for your data, neither is 
btrfs raid5/6 at this point.

Btrfs raid1 and raid10 modes, OTOH, are reasonably mature and ready for 
use, basically at the same level as single-device btrfs.  Which is to say, 
there's still active development and it's not /entirely/ stable yet, but a 
lot of people are using it without undue issues -- just keep those backups 
current and tested, and be prepared to use them if you need to.

For btrfs raid1 mode, it's worth pointing out that for btrfs, raid1 means 
two copies on different devices, no matter how many devices are in the 
array.  It's always two copies; more devices simply add more total 
capacity.

Similarly with btrfs raid10, the 1/mirror side of that 10 is always 
paired.  Stripes can be two or three or whatever width, but there's 
always only the two mirrors.
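As a sketch (device names are placeholders and mkfs is destructive, so treat this as illustration only):

```
# Three-device btrfs raid1: always exactly two copies of each block,
# so usable capacity is roughly half the total across all three devices.
mkfs.btrfs -m raid1 -d raid1 /dev/sdX /dev/sdY /dev/sdZ
mount /dev/sdX /mnt
btrfs filesystem df /mnt   # should report Data, RAID1 and Metadata, RAID1
```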

N-way-mirroring is on the roadmap, scheduled for introduction after 
raid5/6 is complete.  So it's coming, but given the time it has taken for 
raid5/6 and the fact that it's still not complete, reasonably reliable n-
way-mirroring could easily still be a year away or more.


Features: Most of the core btrfs features are reasonably stable, but some 
don't work so well together; see my just-previous post on a different 
thread about nocow and snapshots, for instance.  (Basically, setting nocow 
ends up being nearly useless in the face of frequent snapshots of an 
actively rewritten file.)
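For instance, the usual nocow recipe looks like this (a sketch; the path is a placeholder, +C only takes effect on new/empty files, and it disables checksumming for them):

```
mkdir /var/lib/vmimages
chattr +C /var/lib/vmimages   # new files created here inherit NOCOW
lsattr -d /var/lib/vmimages   # shows the 'C' attribute if it took effect
```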

Qgroups/quotas are an exception.  The qgroup code was recently rewritten, 
as the old approach simply wasn't working, and while it /should/ be more 
stable now, it's still very new 

Re: Performance Issues

2014-09-20 Thread Martin

On 20/09/14 09:23, Marc Dietrich wrote:
 On Friday, 19 September 2014, 13:51:22, Holger Hoffstätte wrote:
 
 On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:
 
 I have a particularly uncomplicated setup (a desktop PC with a
 hard disk) and I'm seeing particularly slow performance from
 btrfs.  A `git status` in the linux source tree takes about 46
 seconds after dropping caches, whereas on other machines using
 ext4 this takes about 13s.  My mail client (evolution) also
 seems to perform particularly poorly on this setup, and my
 hunch is that it's spending a lot of time waiting on the
 filesystem.
 
 This is - unfortunately - a particular btrfs
 oddity/characteristic/flaw, whatever you want to call it. git
 relies a lot on fast stat() calls, and those seem to be
 particularly slow with btrfs esp. on rotational media. I have the
 same problem with rsync on a freshly mounted volume; it gets fast
 (quite so!) after the first run.
 
 my favorite benchmark is ls -l /usr/bin:
 
 ext4: 0.934s
 btrfs:   21.814s


So... On my old low power slow Atom SSD ext4 system:

time ls -l /usr/bin

real    0m0.369s
user    0m0.048s
sys     0m0.128s

Repeated:

real    0m0.107s
user    0m0.040s
sys     0m0.044s

and that is for:

# ls -l /usr/bin | wc
   1384   13135   88972


On a comparatively super dual-core Athlon64 SSD three-disk btrfs
raid1 system:

real    0m0.103s
user    0m0.004s
sys     0m0.040s

Repeated:

real    0m0.027s
user    0m0.008s
sys     0m0.012s

For:

# ls -l /usr/bin | wc
   1449   13534   89024


And on an identical comparatively super dual-core Athlon64 HDD
'spinning rust' two-disk btrfs raid1 system:

real    0m0.101s
user    0m0.008s
sys     0m0.020s

Repeated:

real    0m0.020s
user    0m0.004s
sys     0m0.012s

For:

# ls -l /usr/bin | wc
   1161   10994   79350


So, no untoward concerns there.

Marc:

Are you on something really ancient and hopelessly fragmented into oblivion?



 also mounting large partitions (several hundred GB) takes a lot of time
 on btrfs.

I've noticed that too for some 16 TB btrfs raid1 mounts: btrfs is not
as fast to mount as ext4, but then again it is very much faster than
mounting ext4 when the fsck mount-count check trips!...

So, nothing untoward there.


For my usage, controlling fragmentation and having some automatic
mechanism to deal with pathological fragmentation for files such as
sqlite databases are greater concerns!

(Yes, there is the manual fix of NOCOW... I also put such horrors into
tmpfs and snapshot that... All well and good but all unnecessary admin
tasks!)


Regards,
Martin



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Using two mirrored drives separately

2014-09-20 Thread Leonid Bloch
Hi,

I am wondering: if I set up btrfs on two identical drives, with data
and metadata mirroring, will it be possible to use these drives
separately later on? Will just one of these drives work as a regular
btrfs-formatted single drive if connected to a different machine?

Thanks ahead,
Leonid.


Re: Using two mirrored drives separately

2014-09-20 Thread Piotr Szymaniak
On Sat, Sep 20, 2014 at 07:11:52PM +0300, Leonid Bloch wrote:
 I am wondering: if I set up btrfs on two identical drives, with data
 and metadata mirroring, will it be possible to use these drives
 separately later on? Will just one of these drives work as a regular
 btrfs-formatted single drive if connected to a different machine?

If by this you mean btrfs raid1, then yes, they will work this way. Just
both of them will have a missing device and, AFAIK, won't mount without
-o degraded. But then you can rebalance to non-raid1, or add another disk
and replace the missing device, on the new machines.
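A sketch of what that recovery looks like on the new machine (the device name is a placeholder; the balance profiles assume you want a plain single-device filesystem afterwards):

```
mount -o degraded /dev/sdX /mnt
# convert away from raid1 so the missing mirror is no longer required:
btrfs balance start -dconvert=single -mconvert=dup /mnt
btrfs device delete missing /mnt
```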


Piotr Szymaniak.




Re: Performance Issues

2014-09-20 Thread Marc Dietrich
Hi, 

On Saturday, 20 September 2014, 22:04:16, Wang Shilong wrote:
 Hi,
 
 just my two cents here.^_^
 
  On Friday, 19 September 2014, 13:51:22, Holger Hoffstätte wrote:
  On Fri, 19 Sep 2014 13:18:34 +0100, Rob Spanton wrote:
  I have a particularly uncomplicated setup (a desktop PC with a hard
  disk) and I'm seeing particularly slow performance from btrfs.  A `git
  status` in the linux source tree takes about 46 seconds after dropping
  caches, whereas on other machines using ext4 this takes about 13s.  My
  mail client (evolution) also seems to perform particularly poorly on
  this setup, and my hunch is that it's spending a lot of time waiting on
  the filesystem.
  
  This is - unfortunately - a particular btrfs oddity/characteristic/flaw,
  whatever you want to call it. git relies a lot on fast stat() calls,
  and those seem to be particularly slow with btrfs esp. on rotational
  media. I have the same problem with rsync on a freshly mounted volume;
  it gets fast (quite so!) after the first run.
  
  my favorite benchmark is ls -l /usr/bin:
  
  ext4: 0.934s
  btrfs:   21.814s
 
 I did a quick benchmark for this:
 
 The testing tool is something like the following; it creates 500,000
 ("50W") files and 500,000 directories under a freshly mkfs'd filesystem.
 btrfs is just a little slower than ext4:
 
 For ext4:
 real  0m9.295s
 user  0m2.252s
 sys   0m7.010s
 
 For btrfs:
 real  0m10.207s
 user  0m1.347s
 sys   0m8.353s
 
 And the test was done with a 20G VM disk (backed by a hard disk), with
 the latest kernel, compiled under the VM.

thanks for testing! However, I think a doubly-cached VM disk may not be a 
good test candidate.

 #!/bin/bash
 
 umount /dev/sdc
 #~/source/e2fsprogs/misc/mke2fs -F -O inline_data /dev/sdc > /dev/null
 mkfs.ext4 -F /dev/sdc > /dev/null
 mount /dev/sdc /mnt
 ./mdtest  -d /mnt/ext4 -n 50 -C > /dev/null
 echo 3 > /proc/sys/vm/drop_caches
 time ls -l /mnt/ext4/\#test-dir.0/mdtest_tree.0 > /dev/null
 
 umount /dev/sdc
 mkfs.btrfs -f /dev/sdc > /dev/null
 mount /dev/sdc /mnt
 ./mdtest  -d /mnt/btrfs -n 50 -C > /dev/null
 echo 3 > /proc/sys/vm/drop_caches
 time ls -l /mnt/btrfs/\#test-dir.0/mdtest_tree.0 > /dev/null

ok, 50W (500,000) is much more than my 5000 files in /usr/bin, so ext4 needs 
a bit more time. Also, a fresh new btrfs may not reflect the same state as an 
ageing one. Unfortunately, I haven't found a method yet to find out the 
fragmentation of a directory, so the question is why btrfs is that fast in 
your case 

I did a small experiment:

mkdir /usr/bin2; cd /usr/bin2
for i in ../bin/*; do ln $i; done
echo 3 > /proc/sys/vm/drop_caches
time ls -l > /dev/null
real    0m5.935s
user    0m0.063s
sys     0m0.344s

better, so I think this is partly due to heavy fragmentation of the original 
directory where even defrag does not help, but

btrfs fi defrag bin2
echo 3 > /proc/sys/vm/drop_caches
time ls -l > /dev/null
real    0m8.059s
user    0m0.080s
sys     0m0.381s

and

btrfs fi defrag -clzo bin2
echo 3 > /proc/sys/vm/drop_caches
time ls -l > /dev/null
real    0m12.524s
user    0m0.072s
sys     0m0.461s

times are +/- 1s in repeated tests.

so defragging seems to hurt in this case.

 So here I think btrfs is not as much slower than ext4 as you think, at
 least for 'ls', which means directory-reading performance…
 
 Here I think there are several ways to improve or avoid such things:
 
 1. Create a separate subvolume or separate partition for /usr, and use
 the noatime mount option if possible. (I remember Marc MARLIEN gave some
 good examples of this.)

I have relatime, nodiratime and also compress=lzo.

 2. Run the defrag command to reduce fragmentation.

I do this once per week on all directories via a cron job:
find / -xdev -type d -print -exec btrfs filesystem defragment -c '{}' \;

 The reason for doing these is that a directory like /usr/bin is
 regularly accessed, which triggers widespread COW in btrfs; that can
 cause serious fragmentation and thus bad performance.

and compression may also not benefit it. 

 Another factor: by default, btrfs keeps all files mixed together in a
 single fs B-tree, which means all read/write locks walk through the same
 tree; that can cause lock contention.
 
 So using a separate subvolume (its own tree) could improve the locking a
 bit, IMO.

well, I have /, /var, and /usr on the same partition. So not so much 
additional data (/home, /usr/src, /opt have their own partition).

 BTW, next time someone reports a problem, it would be nice to give
 detailed information, for example: kernel version, how many
 subvolumes/snapshots, btrfs filesystem configuration, and usage (e.g. by
 running btrfs fi show and btrfs fi df). This information is useful for
 others to reproduce and analyze the issue...

ok, for the sake of completeness:
no subvolumes/snapshots (but two merged partitions),
nodiratime,relatime,compress=lzo,space_cache,autodefrag,
kernel 3.17-rc5 (+ btrfs-progs 3.17.x),
rotating media

# btrfs file show /
Label: 'root'  uuid: 

[PATCH 1/4] Properly cast to avoid compiler warnings, fixes FTBFS on alpha and ia64.

2014-09-20 Thread Dimitri John Ledkov
Bug-Debian: http://bugs.debian.org/539433
Bug-Debian: http://bugs.debian.org/583768
Authors:
 Luca Bruno lu...@debian.org
 Alexander Kurtz kurtz.a...@googlemail.com
 Daniel Baumann daniel.baum...@progress-technologies.net

Signed-off-by: Dimitri John Ledkov x...@debian.org
---
 btrfs-convert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 71b7bd6..3673050 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -2441,7 +2441,7 @@ static int do_rollback(const char *devname)
 	ext2_root = btrfs_read_fs_root(root->fs_info, &key);
 	if (!ext2_root || IS_ERR(ext2_root)) {
 		fprintf(stderr, "unable to open subvol %llu\n",
-			key.objectid);
+			(unsigned long long) key.objectid);
 		goto fail;
 	}
 
-- 
2.1.0.rc1



[PATCH 3/4] Fixing unaligned memory accesses.

2014-09-20 Thread Dimitri John Ledkov
From: Shawn Landen shawnland...@gmail.com

Bug-Debian: http://bugs.debian.org/656955
Signed-off-by: Dimitri John Ledkov x...@debian.org
---
 ctree.h   | 18 ++
 volumes.c |  5 +++--
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/ctree.h b/ctree.h
index fa73c4a..92c6ad3 100644
--- a/ctree.h
+++ b/ctree.h
@@ -19,6 +19,8 @@
 #ifndef __BTRFS__
 #define __BTRFS__
 
+#include <stdint.h>
+
 #if BTRFS_FLAT_INCLUDES
 #include "list.h"
 #include "kerncompat.h"
@@ -1191,13 +1193,17 @@ struct btrfs_root {
 static inline u##bits btrfs_##name(const struct extent_buffer *eb)	\
 {									\
 	const struct btrfs_header *h = (struct btrfs_header *)eb->data;	\
-	return le##bits##_to_cpu(h->member);				\
+	uint##bits##_t t;						\
+	memcpy(&t, &h->member, sizeof(h->member));			\
+	return le##bits##_to_cpu(t);					\
 }									\
 static inline void btrfs_set_##name(struct extent_buffer *eb,		\
 				    u##bits val)			\
 {									\
 	struct btrfs_header *h = (struct btrfs_header *)eb->data;	\
-	h->member = cpu_to_le##bits(val);				\
+	uint##bits##_t t;						\
+	t = cpu_to_le##bits(val);					\
+	memcpy(&h->member, &t, sizeof(h->member));			\
 }
 
 #define BTRFS_SETGET_FUNCS(name, type, member, bits)			\
@@ -1219,11 +1225,15 @@ static inline void btrfs_set_##name(struct extent_buffer *eb,	\
 #define BTRFS_SETGET_STACK_FUNCS(name, type, member, bits)		\
 static inline u##bits btrfs_##name(const type *s)			\
 {									\
-	return le##bits##_to_cpu(s->member);				\
+	uint##bits##_t t;						\
+	memcpy(&t, &s->member, sizeof(s->member));			\
+	return le##bits##_to_cpu(t);					\
 }									\
 static inline void btrfs_set_##name(type *s, u##bits val)		\
 {									\
-	s->member = cpu_to_le##bits(val);				\
+	uint##bits##_t t;						\
+	t = cpu_to_le##bits(val);					\
+	memcpy(&s->member, &t, sizeof(s->member));			\
 }
 
 BTRFS_SETGET_FUNCS(device_type, struct btrfs_dev_item, type, 64);
diff --git a/volumes.c b/volumes.c
index 388c94e..102380b 100644
--- a/volumes.c
+++ b/volumes.c
@@ -472,10 +472,11 @@ static int find_next_chunk(struct btrfs_root *root, u64 objectid, u64 *offset)
 	if (found_key.objectid != objectid)
 		*offset = 0;
 	else {
+		u64 t;
 		chunk = btrfs_item_ptr(path->nodes[0], path->slots[0],
 				       struct btrfs_chunk);
-		*offset = found_key.offset +
-			btrfs_chunk_length(path->nodes[0], chunk);
+		t = found_key.offset +
+			btrfs_chunk_length(path->nodes[0], chunk);
+		memcpy(offset, &t, sizeof(found_key.offset));
 	}
 	ret = 0;
-- 
2.1.0.rc1



[PATCH 2/4] Fixes FTBFS with --no-add-needed.

2014-09-20 Thread Dimitri John Ledkov
From: Luk Claes l...@debian.org

Bug-Debian: http://bugs.debian.org/554059
Signed-off-by: Dimitri John Ledkov x...@debian.org
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e721e99..441e925 100644
--- a/Makefile
+++ b/Makefile
@@ -26,7 +26,7 @@ TESTS = fsck-tests.sh convert-tests.sh
 INSTALL = install
 prefix ?= /usr/local
 bindir = $(prefix)/bin
-lib_LIBS = -luuid -lblkid -lm -lz -llzo2 -L.
+lib_LIBS = -luuid -lblkid -lm -lz -llzo2 -lcom_err -L.
 libdir ?= $(prefix)/lib
 incdir = $(prefix)/include/btrfs
 LIBS = $(lib_LIBS) $(libs_static)
-- 
2.1.0.rc1



[PATCH 4/4] Default to acting like fsck.

2014-09-20 Thread Dimitri John Ledkov
Inspect argv[0]; if we are not called as "btrfs", then assume we are
called to act like fsck.

Bug-Debian: http://bugs.debian.org/712078
Signed-off-by: Dimitri John Ledkov x...@debian.org
---
 btrfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/btrfs.c b/btrfs.c
index e83349c..e8a87ac 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -222,7 +222,7 @@ int main(int argc, char **argv)
 	else
 		bname = argv[0];
 
-	if (!strcmp(bname, "btrfsck")) {
+	if (strcmp(bname, "btrfs") != 0) {
 		argv[0] = "check";
 	} else {
 		argc--;
-- 
2.1.0.rc1



deleting a dead device

2014-09-20 Thread Russell Coker
On a system running the Debian 3.14.15-2 kernel I added a new drive to a 
RAID-1 array.  My aim was to add a device and remove one of the old devices.

Sep 21 11:26:51 server kernel: [2070145.375221] BTRFS: lost page write due to 
I/O error on /dev/sdc3
Sep 21 11:26:51 server kernel: [2070145.375225] BTRFS: bdev /dev/sdc3 errs: wr 
269, rd 0, flush 0, corrupt 0, gen 0
Sep 21 11:27:21 server kernel: [2070175.517691] BTRFS: lost page write due to 
I/O error on /dev/sdc3
Sep 21 11:27:21 server kernel: [2070175.517699] BTRFS: bdev /dev/sdc3 errs: wr 
270, rd 0, flush 0, corrupt 0, gen 0
Sep 21 11:27:21 server kernel: [2070175.517712] BTRFS: lost page write due to 
I/O error on /dev/sdc3
Sep 21 11:27:21 server kernel: [2070175.517715] BTRFS: bdev /dev/sdc3 errs: wr 
271, rd 0, flush 0, corrupt 0, gen 0
Sep 21 11:27:51 server kernel: [2070205.665947] BTRFS: lost page write due to 
I/O error on /dev/sdc3
Sep 21 11:27:51 server kernel: [2070205.665955] BTRFS: bdev /dev/sdc3 errs: wr 
272, rd 0, flush 0, corrupt 0, gen 0
Sep 21 11:27:51 server kernel: [2070205.665967] BTRFS: lost page write due to 
I/O error on /dev/sdc3
Sep 21 11:27:51 server kernel: [2070205.665971] BTRFS: bdev /dev/sdc3 errs: wr 
273, rd 0, flush 0, corrupt 0, gen 0

Anyway, the new drive turned out to have some errors: writes failed and I've 
got a heap of errors such as the above.  The errors started immediately after 
adding the drive, and the system wasn't actively writing to the filesystem.  So 
very few (if any) writes made it to the device.

# btrfs device delete /dev/sdc3 /
ERROR: error removing the device '/dev/sdc3' - Invalid argument

It seems that I can't remove the device because removing requires writing.

# btrfs device delete /dev/sdc3 /
ERROR: error removing the device '/dev/sdc3' - No such file or directory
# btrfs device stats /
[/dev/sda3].write_io_errs   0
[/dev/sda3].read_io_errs0
[/dev/sda3].flush_io_errs   0
[/dev/sda3].corruption_errs 57
[/dev/sda3].generation_errs 0
[/dev/sdb3].write_io_errs   0
[/dev/sdb3].read_io_errs0
[/dev/sdb3].flush_io_errs   0
[/dev/sdb3].corruption_errs 0
[/dev/sdb3].generation_errs 0
[/dev/sdc3].write_io_errs   267
[/dev/sdc3].read_io_errs0
[/dev/sdc3].flush_io_errs   0
[/dev/sdc3].corruption_errs 0
[/dev/sdc3].generation_errs 0

The drive is attached by USB so I turned off the USB device and then got the 
above result.  So it still seems impossible to remove the device even though 
it's physically not present.  I've connected a new USB disk which is now 
/dev/sdd, so it seems that BTRFS is keeping the name /dev/sdc locked.

Should there be a way to fix this without rebooting or anything?

Also, as an aside: while the stats about write errors are useful, in this case 
it would be really good if there were a count of successful writes; it would be 
useful to know whether the successful write count was close to 0.  My 
understanding of the BTRFS design is that there would be no performance penalty 
for adding counts of the number of successful reads and writes to the 
superblock.  Could this be done?

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Bloghttp://doc.coker.com.au/


device delete progress

2014-09-20 Thread Russell Coker
We need a way to determine the progress of a device delete operation.  
Also, for a balance of a RAID-1 that has more than 2 devices, it would be 
good to know how much space is used on each device.

Could btrfs fi df be extended to show information separately for each device?

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Bloghttp://doc.coker.com.au/


Bug on Kernel Bugzilla I Found Today

2014-09-20 Thread nick
Greeting to the Btrfs Developers and Community,
Today I found a new bug on the kernel bugzilla related to btrfs, and am
wondering whether it has been fixed yet. Due to my limited knowledge of the
btrfs code base, I lack the knowledge to fix bugs like this myself, so I
will paste below a link to the reported bug page on the kernel bugzilla.
It would be very helpful for my continued learning about the btrfs file
system if someone could CC me on the patches and/or discussion of this bug;
that would be greatly appreciated. Hopefully I can eventually help fix bugs
like this too, and aid in future btrfs development.
Cheers and Thanks for the Help with Xfs Tests,
Nick  
Link to the Bug:
https://bugzilla.kernel.org/show_bug.cgi?id=84631


[PATCH] btrfs: fix ABBA deadlock in btrfs_dev_replace_finishing()

2014-09-20 Thread Eryu Guan
btrfs_map_bio() first calls btrfs_bio_counter_inc_blocked(), which checks
the fs state and increases the bio_counter, then calls __btrfs_map_block(),
which takes the dev_replace lock.

On the other hand, btrfs_dev_replace_finishing() takes the dev_replace lock
first, then sets the fs state to BTRFS_FS_STATE_DEV_REPLACING and waits for
the bio_counter to reach zero.

The deadlock can be reproduced easily by running replace and fsstress at
the same time, e.g.

mkfs -t btrfs -f /dev/sdb1 /dev/sdb2
mount /dev/sdb1 /mnt/btrfs
fsstress -d /mnt/btrfs -n 100 -p 2 -l 0 &  # fsstress from ltp supports -l option
i=0
while btrfs replace start -Bf /dev/sdb2 /dev/sdb3 /mnt/btrfs && \
      btrfs replace start -Bf /dev/sdb3 /dev/sdb2 /mnt/btrfs; do
	echo === loop $i ===
	let i=$i+1
done

This was introduced by

c404e0d Btrfs: fix use-after-free in the finishing procedure of the device 
replace

Signed-off-by: Eryu Guan guane...@gmail.com
---

Tested by the reproducer and xfstests, no new failure found.

But I found a kmem_cache leak if I remove the btrfs module after my new test
case[1], which does fsstress + replace + subvolume create/mount/umount/delete
at the same time.

BUG btrfs_extent_state (Tainted: GB ): Objects remaining in 
btrfs_extent_state on kmem_cache_close()
..
kmem_cache_destroy btrfs_extent_state: Slab cache still has objects
CPU: 3 PID: 9503 Comm: modprobe Tainted: GB  3.17.0-rc5+ #12
Hardware name: Hewlett-Packard ProLiant DL388eGen8, BIOS P73 06/01/2012
  8dd09c52 880411c37eb0 81642f7a
 8800b9a19300 880411c37ed0 8118ce89 
 a05dcd20 880411c37ee0 a056a80f 880411c37ef0
Call Trace:
 [81642f7a] dump_stack+0x45/0x56
 [8118ce89] kmem_cache_destroy+0xf9/0x100
 [a056a80f] extent_io_exit+0x1f/0x50 [btrfs]
 [a05c3ae3] exit_btrfs_fs+0x2c/0x549 [btrfs]
 [810efda2] SyS_delete_module+0x162/0x200
 [81013bb7] ? do_notify_resume+0x97/0xb0
 [8164af69] system_call_fastpath+0x16/0x1b

The test would hang before the fix. I'm not sure if it's related to the fix
(seems not), please help review.

Thanks,
Eryu Guan

[1] http://www.spinics.net/lists/linux-btrfs/msg37625.html

 fs/btrfs/dev-replace.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index eea26e1..5dfd292 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -510,6 +510,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	/* keep away write_all_supers() during the finishing procedure */
 	mutex_lock(&root->fs_info->chunk_mutex);
 	mutex_lock(&root->fs_info->fs_devices->device_list_mutex);
+	btrfs_rm_dev_replace_blocked(fs_info);
 	btrfs_dev_replace_lock(dev_replace);
 	dev_replace->replace_state =
 		scrub_ret ? BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED
@@ -567,12 +568,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	btrfs_kobj_rm_device(fs_info, src_device);
 	btrfs_kobj_add_device(fs_info, tgt_device);
 
-	btrfs_rm_dev_replace_blocked(fs_info);
-
 	btrfs_rm_dev_replace_srcdev(fs_info, src_device);
 
-	btrfs_rm_dev_replace_unblocked(fs_info);
-
 	/*
 	 * this is again a consistent state where no dev_replace procedure
 	 * is running, the target device is part of the filesystem, the
@@ -581,6 +578,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	 * belong to this filesystem.
 	 */
 	btrfs_dev_replace_unlock(dev_replace);
+	btrfs_rm_dev_replace_unblocked(fs_info);
 	mutex_unlock(&root->fs_info->fs_devices->device_list_mutex);
 	mutex_unlock(&root->fs_info->chunk_mutex);
 
 
-- 
1.8.3.1
