Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Clemens Eisserer
Hi,

I've observed a rather strange behaviour while trying to mount two
identical copies of the same image to different mount points.
Each modification to one image is also performed in the second one.

Example:
dd if=/dev/sda? of=image1 bs=1M
cp image1 image2
mount -o loop image1 m1
mount -o loop image2 m2

touch m2/hello
ls -la m1  //will now also include a file called hello

Is this behaviour intentional and known or should I create a bug-report?
I've deleted quite a bunch of files on my production system because of this...

Thanks, Clemens
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Fajar A. Nugraha
On Thu, Jun 20, 2013 at 3:47 PM, Clemens Eisserer linuxhi...@gmail.com wrote:
 Hi,

 I've observed a rather strange behaviour while trying to mount two
 identical copies of the same image to different mount points.
 Each modification to one image is also performed in the second one.

 Example:
 dd if=/dev/sda? of=image1 bs=1M
 cp image1 image2
 mount -o loop image1 m1
 mount -o loop image2 m2

 touch m2/hello
 ls -la m1  //will now also include a file called hello

What do you get if you unmount BOTH m1 and m2, and THEN mount m1
again? Is the file still there?


 Is this behaviour intentional and known or should I create a bug-report?
 I've deleted quite a bunch of files on my production system because of this...

I'm pretty sure this is a known behavior in btrfs.
http://markmail.org/message/i522sdkrhlxhw757#query:+page:1+mid:ksdi5d4v26eqgxpi+state:results

-- 
Fajar


Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Hugo Mills
On Thu, Jun 20, 2013 at 10:47:53AM +0200, Clemens Eisserer wrote:
 Hi,
 
 I've observed a rather strange behaviour while trying to mount two
 identical copies of the same image to different mount points.
 Each modification to one image is also performed in the second one.
 
 Example:
 dd if=/dev/sda? of=image1 bs=1M
 cp image1 image2
 mount -o loop image1 m1
 mount -o loop image2 m2
 
 touch m2/hello
 ls -la m1  //will now also include a file called hello
 
 Is this behaviour intentional and known or should I create a bug-report?

   It's known, and not desired behaviour. The problem is that you've
ended up with two filesystems with the same UUID, and the FS code gets
rather confused about that. The same problem exists with LVM snapshots
(or other block-device-layer copies).

   The solution is a combination of a tool to scan an image and change
the UUID (offline), and of some code in the kernel that detects when
it's being told about a duplicate image (rather than an additional
device in the same FS). Neither of these has been written yet, I'm
afraid.
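
Until such tooling exists, two images can be checked for a UUID clash before mounting by comparing the fsid fields in their superblocks. The following is only a minimal sketch, assuming the usual on-disk layout (primary superblock at 64 KiB, fsid at byte 0x20 of the superblock, magic `_BHRfS_M` at byte 0x40). The function names are made up for illustration and are not part of btrfs-progs.

```python
import uuid

SUPERBLOCK_OFFSET = 0x10000   # assumed: primary btrfs superblock at 64 KiB
FSID_OFFSET = 0x20            # assumed: 16-byte fsid follows the 32-byte checksum
MAGIC_OFFSET = 0x40           # assumed: 8-byte magic string
BTRFS_MAGIC = b"_BHRfS_M"


def parse_fsid(superblock: bytes) -> uuid.UUID:
    """Return the filesystem UUID stored in a raw btrfs superblock."""
    if superblock[MAGIC_OFFSET:MAGIC_OFFSET + 8] != BTRFS_MAGIC:
        raise ValueError("not a btrfs superblock")
    return uuid.UUID(bytes=superblock[FSID_OFFSET:FSID_OFFSET + 16])


def read_fsid(image_path: str) -> uuid.UUID:
    """Read the primary superblock of an image file and return its fsid."""
    with open(image_path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        return parse_fsid(f.read(4096))


def images_share_fsid(path_a: str, path_b: str) -> bool:
    """True if both images carry the same filesystem UUID."""
    return read_fsid(path_a) == read_fsid(path_b)
```

For the example in this thread, `images_share_fsid("image1", "image2")` would be expected to return True, since image2 is a byte-for-byte copy of image1. That is the situation in which mounting both at once should be avoided.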

 I've deleted quite a bunch of files on my production system because of this...

   Oops. I'm sorry to hear that. :(

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Welcome to Rivendell,  Mr Anderson... ---  




Re: Filesystem somewhat destroyed - need help for recovery/fixing

2013-06-20 Thread Alexander Skwar
Hi

On Mon, Jun 17, 2013 at 11:43 PM, Alexander Skwar
alexanders.mailinglists+nos...@gmail.com wrote:
 Hello Josef

 On Mon, Jun 17, 2013 at 11:21 PM, Josef Bacik jba...@fusionio.com wrote:

 Pull down my tree

 git://github.com/josefbacik/btrfs-progs.git

 and build and run the fsck in there and see if it's a bit more friendly.

 I just gave it a try, but wasn't successful, it seems… Kernel still
 crashes.
 Maybe checkout the screenphotos at http://goo.gl/DWkRH or
 http://imgur.com/a/00pTx

Any other ideas about what I might be able to do to
revive my btrfs filesystem?



Alexander
--
=Google+ = http://plus.skwar.me ==
= Chat (Jabber/Google Talk) = a.sk...@gmail.com ==


Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Gabriel de Perthuis
On Thu, 20 Jun 2013 10:16:22 +0100, Hugo Mills wrote:
 On Thu, Jun 20, 2013 at 10:47:53AM +0200, Clemens Eisserer wrote:
 Hi,
 
 I've observed a rather strange behaviour while trying to mount two
 identical copies of the same image to different mount points.
 Each modification to one image is also performed in the second one.

 touch m2/hello
 ls -la m1  //will now also include a file called hello
 
 Is this behaviour intentional and known or should I create a bug-report?
 
It's known, and not desired behaviour. The problem is that you've
 ended up with two filesystems with the same UUID, and the FS code gets
 rather confused about that. The same problem exists with LVM snapshots
 (or other block-device-layer copies).
 
The solution is a combination of a tool to scan an image and change
 the UUID (offline), and of some code in the kernel that detects when
 it's being told about a duplicate image (rather than an additional
 device in the same FS). Neither of these has been written yet, I'm
 afraid.

To clarify, the loop devices are properly distinct, but the first
device ends up mounted twice.

I've had a look at the vfs code, and it doesn't seem to be uuid-aware,
which makes sense because the uuid is a property of the superblock and
the fs structure doesn't expose it.  It's a Btrfs problem.

Instead of redirecting to a different block device, Btrfs could and
should refuse to mount an already-mounted superblock when the block
device doesn't match, somewhere in or below btrfs_mount.  Registering
extra, distinct superblocks for an already mounted raid is a different
matter, but that isn't done through the mount syscall anyway.

 I've deleted quite a bunch of files on my production system because of 
 this...
 
Oops. I'm sorry to hear that. :(
 
Hugo.




Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Hugo Mills
On Thu, Jun 20, 2013 at 10:22:07AM +, Gabriel de Perthuis wrote:
 On Thu, 20 Jun 2013 10:16:22 +0100, Hugo Mills wrote:
  On Thu, Jun 20, 2013 at 10:47:53AM +0200, Clemens Eisserer wrote:
  Hi,
  
  I've observed a rather strange behaviour while trying to mount two
  identical copies of the same image to different mount points.
  Each modification to one image is also performed in the second one.
 
  touch m2/hello
  ls -la m1  //will now also include a file called hello
  
  Is this behaviour intentional and known or should I create a bug-report?
  
 It's known, and not desired behaviour. The problem is that you've
  ended up with two filesystems with the same UUID, and the FS code gets
  rather confused about that. The same problem exists with LVM snapshots
  (or other block-device-layer copies).
  
 The solution is a combination of a tool to scan an image and change
  the UUID (offline), and of some code in the kernel that detects when
  it's being told about a duplicate image (rather than an additional
  device in the same FS). Neither of these has been written yet, I'm
  afraid.
 
 To clarify, the loop devices are properly distinct, but the first
 device ends up mounted twice.
 
 I've had a look at the vfs code, and it doesn't seem to be uuid-aware,
 which makes sense because the uuid is a property of the superblock and
 the fs structure doesn't expose it.  It's a Btrfs problem.

   Yes, it is. (I didn't intend, however obliquely, to imply that it
wasn't).

 Instead of redirecting to a different block device, Btrfs could and
 should refuse to mount an already-mounted superblock when the block
 device doesn't match, somewhere in or below btrfs_mount.  Registering
 extra, distinct superblocks for an already mounted raid is a different
 matter, but that isn't done through the mount syscall anyway.

   The problem here is that you could quite legitimately mount
/dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with
UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are
both part of the same filesystem. So you can't simply prevent mounting
based on the device that the mount's being done with.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I know of three kinds: hot, ---   
cool,  and what-time-does-the-tune-start?




Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Gabriel de Perthuis
 Instead of redirecting to a different block device, Btrfs could and
 should refuse to mount an already-mounted superblock when the block
 device doesn't match, somewhere in or below btrfs_mount.  Registering
 extra, distinct superblocks for an already mounted raid is a different
 matter, but that isn't done through the mount syscall anyway.
 
The problem here is that you could quite legitimately mount
 /dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with
 UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are
 both part of the same filesystem. So you can't simply prevent mounting
 based on the device that the mount's being done with.

Okay.  The check should rely on a list of known block devices
for a given filesystem uuid.




Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Hugo Mills
On Thu, Jun 20, 2013 at 10:41:53AM +, Gabriel de Perthuis wrote:
  Instead of redirecting to a different block device, Btrfs could and
  should refuse to mount an already-mounted superblock when the block
  device doesn't match, somewhere in or below btrfs_mount.  Registering
  extra, distinct superblocks for an already mounted raid is a different
  matter, but that isn't done through the mount syscall anyway.
  
 The problem here is that you could quite legitimately mount
  /dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with
  UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are
  both part of the same filesystem. So you can't simply prevent mounting
  based on the device that the mount's being done with.
 
 Okay.  The check should rely on a list of known block devices
 for a given filesystem uuid.

   And this is where we fail currently -- that list is held by the
btrfs module in the kernel, and is constructed on the basis of what
btrfs dev scan finds by looking at superblocks on block devices.
Currently, there's no method implemented for determining whether a
block device with a legitimate btrfs superblock on it is a duplicate
of another device, or whether it's a newly-discovered device which is
part of an as-yet incompletely specified multi-device FS.

   I think it should be possible to look up the device ID as well, and
complain (loudly, to the user, and in the kernel) at btrfs dev scan
time if we see duplicates. That would deal with the problem at the
earliest point of confusion.
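
The check described above can be modelled in a few lines. This is a toy sketch only, not the kernel's data structure: the class name, the return values and the idea of keying on (fsid, devid) are assumptions made for illustration, following the suggestion to look at the device ID in addition to the filesystem UUID.

```python
from dataclasses import dataclass, field


@dataclass
class ScanRegistry:
    """Toy model of the per-fsid device list built up during a device scan."""
    # fsid -> {devid -> path of the block device first seen with that devid}
    devices: dict = field(default_factory=dict)

    def register(self, fsid: str, devid: int, path: str) -> str:
        """Classify a scanned device as 'new', 'known' or 'duplicate'."""
        members = self.devices.setdefault(fsid, {})
        seen = members.get(devid)
        if seen is None:
            members[devid] = path      # another member of a multi-device FS
            return "new"
        if seen == path:
            return "known"             # the same device scanned again
        return "duplicate"             # same fsid and devid on a different device
```

With this model, registering /dev/sda as devid 1 and /dev/sdb as devid 2 of the same fsid yields "new" both times (a legitimate two-device filesystem). A loop device carrying a copy of devid 1 yields "duplicate", which is the point at which a scan could complain.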

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I know of three kinds: hot, ---   
cool,  and what-time-does-the-tune-start?




[PATCH 1/4] Btrfs-progs: fix misuse of skinny metadata in btrfs-image

2013-06-20 Thread Liu Bo
With skinny metadata, key.offset stores the tree level rather than the extent length.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 btrfs-image.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/btrfs-image.c b/btrfs-image.c
index 739ae35..e5ff795 100644
--- a/btrfs-image.c
+++ b/btrfs-image.c
@@ -798,9 +798,9 @@ static int copy_from_extent_tree(struct metadump_struct *metadump,
 
 		bytenr = key.objectid;
 		if (key.type == BTRFS_METADATA_ITEM_KEY)
-			num_bytes = key.offset;
-		else
 			num_bytes = extent_root->leafsize;
+		else
+			num_bytes = key.offset;
 
 		if (btrfs_item_size_nr(leaf, path->slots[0]) > sizeof(*ei)) {
 			ei = btrfs_item_ptr(leaf, path->slots[0],
-- 
1.7.7
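
The selection logic this patch corrects can be restated outside the C code. This is a minimal sketch; the function name is hypothetical, and the key type values (168 for an extent item, 169 for a skinny metadata item) are assumed from the btrfs on-disk format.

```python
BTRFS_EXTENT_ITEM_KEY = 168     # assumed value: key.offset is the extent length
BTRFS_METADATA_ITEM_KEY = 169   # assumed value: key.offset is the tree level


def extent_num_bytes(key_type: int, key_offset: int, leafsize: int) -> int:
    """Return the size in bytes of the extent described by an extent-tree key."""
    if key_type == BTRFS_METADATA_ITEM_KEY:
        # Skinny metadata: the offset holds a level, so the size is the leaf size.
        return leafsize
    # Regular extent item: the offset holds the length directly.
    return key_offset
```

Before the fix the two branches were swapped, so a skinny metadata item at level 1 would have been treated as a 1-byte extent.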



[PATCH 2/4] Btrfs-progs: skip open devices which is missing

2013-06-20 Thread Liu Bo
A device can be added to the device list without getting a name, so we may
access illegal addresses when opening devices by name.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 volumes.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/volumes.c b/volumes.c
index 8285240..a06896d 100644
--- a/volumes.c
+++ b/volumes.c
@@ -186,6 +186,10 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, int flags)
 
 	list_for_each(cur, head) {
 		device = list_entry(cur, struct btrfs_device, dev_list);
+		if (!device->name) {
+			printk("no name for device %llu, skip it now\n", device->devid);
+			continue;
+		}
 
 		fd = open(device->name, flags);
 		if (fd < 0) {
-- 
1.7.7
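
The same guard, restated as a small self-contained sketch. The `Device` class and the `opener` callback are hypothetical stand-ins for struct btrfs_device and open(); only the skip-if-nameless behaviour mirrors the patch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Device:
    devid: int
    name: Optional[str]   # None models a device registered without a path


def open_devices(devices: List[Device], opener: Callable[[str], object]) -> dict:
    """Open every device that has a name and return {devid: handle}."""
    handles = {}
    for dev in devices:
        if dev.name is None:
            # Mirrors the patch: report the device and move on instead of
            # passing a missing name to the opener.
            print(f"no name for device {dev.devid}, skip it now")
            continue
        handles[dev.devid] = opener(dev.name)
    return handles
```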



[PATCH 0/4] multiple disks restore support of btrfs-image

2013-06-20 Thread Liu Bo
Patches 1-3 are bug fixes in several places.

Patch 4 adds support to btrfs-image for restoring onto multiple disks.

Liu Bo (4):
  Btrfs-progs: fix misuse of skinny metadata in btrfs-image
  Btrfs-progs: skip open devices which is missing
  Btrfs-progs: delete fs_devices itself from fs_uuid list before
freeing
  Btrfs-progs: exhance btrfs-image to restore image onto multiple disks

 btrfs-image.c |  298 ++---
 ctree.h   |1 +
 disk-io.c |   91 +-
 disk-io.h |5 +
 volumes.c |4 +
 5 files changed, 339 insertions(+), 60 deletions(-)

-- 
1.7.7



[PATCH 4/4] Btrfs-progs: exhance btrfs-image to restore image onto multiple disks

2013-06-20 Thread Liu Bo
This adds a 'btrfs-image -m' option, which lets us restore an image
built from a multi-disk btrfs onto several disks at once.

This aims to address the following case:
$ mkfs.btrfs -m raid0 sda sdb
$ btrfs-image sda image.file
$ btrfs-image -r image.file sdc
-
Here we can only restore the metadata onto sdc, and we can only mount
sdc in degraded mode because we provide no information about the other
disk.  Also, the filesystem was built as RAID0 and we have only one
disk, so after mounting sdc we end up in read-only mode.

This is annoying for people (like me) who try to restore an image
only to find they cannot make it work.

So this will make your life easier; just type
$ btrfs-image -m image.file sdc sdd
-
and all the metadata is restored at the same offsets as on the
original disks (of course, the target disks must be large enough, at
least the size of the original disks).

This also works with raid5 and raid6 metadata images.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 btrfs-image.c |  294 ++---
 ctree.h   |1 +
 disk-io.c |   90 +-
 disk-io.h |5 +
 4 files changed, 332 insertions(+), 58 deletions(-)

diff --git a/btrfs-image.c b/btrfs-image.c
index e5ff795..6ca4589 100644
--- a/btrfs-image.c
+++ b/btrfs-image.c
@@ -119,6 +119,9 @@ struct mdrestore_struct {
 	int done;
 	int error;
 	int old_restore;
+	int fixup_offset;
+	int multi_devices;
+	struct btrfs_fs_info *info;
 };
 
 static void csum_block(u8 *buf, size_t len)
@@ -1233,33 +1236,67 @@ static void *restore_worker(void *data)
 			size = async->bufsize;
 		}
 
-		if (async->start == BTRFS_SUPER_INFO_OFFSET) {
-			if (mdres->old_restore) {
-				update_super_old(outbuf);
-			} else {
-				ret = update_super(outbuf);
+		if (!mdres->multi_devices) {
+			if (async->start == BTRFS_SUPER_INFO_OFFSET) {
+				if (mdres->old_restore) {
+					update_super_old(outbuf);
+				} else {
+					ret = update_super(outbuf);
+					if (ret)
+						err = ret;
+				}
+			} else if (!mdres->old_restore) {
+				ret = fixup_chunk_tree_block(mdres, async,
+							     outbuf, size);
 				if (ret)
 					err = ret;
 			}
-		} else if (!mdres->old_restore) {
-			ret = fixup_chunk_tree_block(mdres, async, outbuf, size);
-			if (ret)
-				err = ret;
 		}
 
-		ret = pwrite64(outfd, outbuf, size, async->start);
-		if (ret < size) {
-			if (ret < 0) {
-				fprintf(stderr, "Error writing to device %d\n",
-					errno);
-				err = errno;
-			} else {
-				fprintf(stderr, "Short write\n");
-				err = -EIO;
+		if (!mdres->fixup_offset) {
+			ret = pwrite64(outfd, outbuf, size, async->start);
+			if (ret != size) {
+				if (ret < 0) {
+					fprintf(stderr, "Error writing to device %d\n",
+						errno);
+					err = errno;
+				} else {
+					fprintf(stderr, "Short write\n");
+					err = -EIO;
+				}
+			}
+		} else if (async->start != BTRFS_SUPER_INFO_OFFSET) {
+			u64 cur_off;
+			size_t cur_size;
+			struct extent_buffer *eb;
+
+			cur_size = size;
+			cur_off = 0;
+			while (cur_size > 0) {
+				eb = read_tree_block(mdres->info->chunk_root,
+						     async->start + cur_off,
+						     mdres->leafsize, 0);
+				BUG_ON(!eb); /* we should have eb now */
+
+				if (memcmp(eb->data, outbuf + cur_off,
+					   mdres->leafsize)) {
+					printk("%s: eb %llu NOT same with outbuf\n", __func__, eb->start);
+ 

[PATCH 3/4] Btrfs-progs: delete fs_devices itself from fs_uuid list before freeing

2013-06-20 Thread Liu Bo
Otherwise we will access illegal addresses while searching the fs_uuid list.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 disk-io.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 21b410d..2892300 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1277,6 +1277,7 @@ static int close_all_devices(struct btrfs_fs_info *fs_info)
 		kfree(device->label);
 		kfree(device);
 	}
+	list_del(&fs_info->fs_devices->list);
 	kfree(fs_info->fs_devices);
 	return 0;
 }
-- 
1.7.7



Re: [PATCH 4/4] Btrfs-progs: exhance btrfs-image to restore image onto multiple disks

2013-06-20 Thread Josef Bacik
On Thu, Jun 20, 2013 at 08:05:30PM +0800, Liu Bo wrote:
 This adds a 'btrfs-image -m' option, which let us restore an image that
 is built from a btrfs of multiple disks onto several disks altogether.
 
 This aims to address the following case,
 $ mkfs.btrfs -m raid0 sda sdb
 $ btrfs-image sda image.file
 $ btrfs-image -r image.file sdc
 -
 so we can only restore metadata onto sdc, and another thing is we can
 only mount sdc with degraded mode as we don't provide informations of
 another disk.  And, it's built as RAID0 and we have only one disk,
 so after mount sdc we'll get into readonly mode.
 

Um that shouldn't be happening, the restore will mask out the RAID parts of the
chunk tree and it should work just fine.  Are you using the most recent version
of btrfs-image?  If this is happening it's a bug and we need to fix it, but I've
restored several file systems from users with raid0/10 file systems onto a
single disk and it's worked just fine.  Thanks,

Josef


Re: [PATCH 4/4] Btrfs-progs: exhance btrfs-image to restore image onto multiple disks

2013-06-20 Thread Josef Bacik
On Thu, Jun 20, 2013 at 08:24:32AM -0400, Josef Bacik wrote:
 On Thu, Jun 20, 2013 at 08:05:30PM +0800, Liu Bo wrote:
  This adds a 'btrfs-image -m' option, which let us restore an image that
  is built from a btrfs of multiple disks onto several disks altogether.
  
  This aims to address the following case,
  $ mkfs.btrfs -m raid0 sda sdb
  $ btrfs-image sda image.file
  $ btrfs-image -r image.file sdc
  -
  so we can only restore metadata onto sdc, and another thing is we can
  only mount sdc with degraded mode as we don't provide informations of
  another disk.  And, it's built as RAID0 and we have only one disk,
  so after mount sdc we'll get into readonly mode.
  
 
 Um that shouldn't be happening, the restore will mask out the RAID parts of 
 the
 chunk tree and it should work just fine.  Are you using the most recent 
 version
 of btrfs-image?  If this is happening it's a bug and we need to fix it, but 
 I've
 restored several file systems from users with raid0/10 file systems onto a
 single disk and it's worked just fine.  Thanks,
 

Well apparently I've been hallucinating because it definitely doesn't work.  I'd
rather fix the device tree so it only restores onto one disk, since the raid
level shouldn't matter and it does in fact get masked out.  So the only thing
left would be to fix the device tree so the only device it knows about is the
device we're restoring to.  Thanks,

Josef


Re: [PATCH 4/4] Btrfs-progs: exhance btrfs-image to restore image onto multiple disks

2013-06-20 Thread Chris Mason
Quoting Josef Bacik (2013-06-20 08:24:32)
 On Thu, Jun 20, 2013 at 08:05:30PM +0800, Liu Bo wrote:
  This adds a 'btrfs-image -m' option, which let us restore an image that
  is built from a btrfs of multiple disks onto several disks altogether.
  
  This aims to address the following case,
  $ mkfs.btrfs -m raid0 sda sdb
  $ btrfs-image sda image.file
  $ btrfs-image -r image.file sdc
  -
  so we can only restore metadata onto sdc, and another thing is we can
  only mount sdc with degraded mode as we don't provide informations of
  another disk.  And, it's built as RAID0 and we have only one disk,
  so after mount sdc we'll get into readonly mode.
  
 
 Um that shouldn't be happening, the restore will mask out the RAID parts of 
 the
 chunk tree and it should work just fine.  Are you using the most recent 
 version
 of btrfs-image?  If this is happening it's a bug and we need to fix it, but 
 I've
 restored several file systems from users with raid0/10 file systems onto a
 single disk and it's worked just fine.  Thanks,

I just pushed my current merge of Josef's patches into my master branch.
Please base on that.

Josef, this should only be missing the enospc log, please go ahead and
rebase/double check.

-chris



Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Hugo Mills
On Thu, Jun 20, 2013 at 08:22:12AM -0500, Kevin O'Kelley wrote:
 Thank you for your reply. I appreciate it. Unfortunately this issue
 is a deal killer for us. The ability to take very fast snapshots and
 replicate them to another site is key for us. We just can't use Btrfs
 with this setup. That's too bad. Good luck and thank you.

   If you want to make fast atomic incremental copies of btrfs to a
remote system, then btrfs send/receive may be what you're looking for.

   Hugo.

 Sent from my iPhone
 
 On Jun 20, 2013, at 5:56 AM, Hugo Mills h...@carfax.org.uk wrote:
 
  On Thu, Jun 20, 2013 at 10:41:53AM +, Gabriel de Perthuis wrote:
  Instead of redirecting to a different block device, Btrfs could and
  should refuse to mount an already-mounted superblock when the block
  device doesn't match, somewhere in or below btrfs_mount.  Registering
  extra, distinct superblocks for an already mounted raid is a different
  matter, but that isn't done through the mount syscall anyway.
  
The problem here is that you could quite legitimately mount
  /dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with
  UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are
  both part of the same filesystem. So you can't simply prevent mounting
  based on the device that the mount's being done with.
  
  Okay.  The check should rely on a list of known block devices
  for a given filesystem uuid.
  
And this is where we fail currently -- that list is held by the
  btrfs module in the kernel, and is constructed on the basis of what
  btrfs dev scan finds by looking at superblocks on block devices.
  Currently, there's no method implemented for determining whether a
  block device with a legitimate btrfs superblock on it is a duplicate
  of another device, or whether it's a newly-discovered device which is
  part of an as-yet incompletely specified multi-device FS.
  
I think it should be possible to look up the device ID as well, and
  complain (loudly, to the user, and in the kernel) at btrfs dev scan
  time if we see duplicates. That would deal with the problem at the
  earliest point of confusion.
  
Hugo.
  

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Computer Science is not about computers,  any more than --- 
 astronomy is about telescopes.  




Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Gabriel
 Thank you for your reply. I appreciate it. Unfortunately this issue is a deal 
 killer for us. The ability to take very fast snapshots and replicate them to 
 another site is key for us. We just can't use Btrfs with this setup. That's 
 too bad. Good luck and thank you.

The issue we were discussing is: how to fail early when there are 
duplicate UUIDs.
Duplicate UUIDs will never be supported.
If *your* problem has to do with fast snapshots and fast replication, 
that's supported, see btrfs send/receive.
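
A rough sketch of how an incremental replication with send/receive might be scripted. The host name and paths are placeholders, and the snapshots are assumed to be read-only and to already exist. The only btrfs invocations used are `btrfs send -p <parent> <snapshot>` and `btrfs receive <directory>`.

```python
import subprocess
from typing import List, Tuple


def incremental_send_argv(parent_snap: str, new_snap: str,
                          remote_host: str, remote_dir: str) -> Tuple[List[str], List[str]]:
    """Build the sender and receiver command lines for an incremental send."""
    send = ["btrfs", "send", "-p", parent_snap, new_snap]
    receive = ["ssh", remote_host, "btrfs", "receive", remote_dir]
    return send, receive


def replicate(parent_snap: str, new_snap: str,
              remote_host: str, remote_dir: str) -> int:
    """Pipe 'btrfs send' into 'btrfs receive' on the remote host."""
    send_argv, recv_argv = incremental_send_argv(
        parent_snap, new_snap, remote_host, remote_dir)
    sender = subprocess.Popen(send_argv, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(recv_argv, stdin=sender.stdout)
    sender.stdout.close()   # so the sender notices if the receiver exits early
    recv_rc = receiver.wait()
    send_rc = sender.wait()
    return send_rc or recv_rc   # non-zero if either side failed
```

Because only the differences between the parent and the new snapshot are streamed, no block-level copy of the filesystem is made and no duplicate UUID ever appears on the receiving side.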



Re: Two identical copies of an image mounted result in changes to both images if only one is modified

2013-06-20 Thread Kevin O'Kelley
Thank you for your reply. I appreciate it. Unfortunately this issue is a deal 
killer for us. The ability to take very fast snapshots and replicate them to 
another site is key for us. We just can't use Btrfs with this setup. That's too 
bad. Good luck and thank you.

Sent from my iPhone

On Jun 20, 2013, at 5:56 AM, Hugo Mills h...@carfax.org.uk wrote:

 On Thu, Jun 20, 2013 at 10:41:53AM +, Gabriel de Perthuis wrote:
 Instead of redirecting to a different block device, Btrfs could and
 should refuse to mount an already-mounted superblock when the block
 device doesn't match, somewhere in or below btrfs_mount.  Registering
 extra, distinct superblocks for an already mounted raid is a different
 matter, but that isn't done through the mount syscall anyway.
 
   The problem here is that you could quite legitimately mount
 /dev/sda (with UUID=AA1234) on, say, /mnt/fs-a, and /dev/sdb (with
 UUID=AA1234) on /mnt/fs-b -- _provided_ that /dev/sda and /dev/sdb are
 both part of the same filesystem. So you can't simply prevent mounting
 based on the device that the mount's being done with.
 
 Okay.  The check should rely on a list of known block devices
 for a given filesystem uuid.
 
   And this is where we fail currently -- that list is held by the
 btrfs module in the kernel, and is constructed on the basis of what
 btrfs dev scan finds by looking at superblocks on block devices.
 Currently, there's no method implemented for determining whether a
 block device with a legitimate btrfs superblock on it is a duplicate
 of another device, or whether it's a newly-discovered device which is
 part of an as-yet incompletely specified multi-device FS.
 
   I think it should be possible to look up the device ID as well, and
 complain (loudly, to the user, and in the kernel) at btrfs dev scan
 time if we see duplicates. That would deal with the problem at the
 earliest point of confusion.
 
   Hugo.
 
 -- 
 === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- I know of three kinds: hot, ---   
cool,  and what-time-does-the-tune-start?


Re: [PATCH 4/4] Btrfs-progs: exhance btrfs-image to restore image onto multiple disks

2013-06-20 Thread Liu Bo
On Thu, Jun 20, 2013 at 08:39:19AM -0400, Josef Bacik wrote:
 On Thu, Jun 20, 2013 at 08:24:32AM -0400, Josef Bacik wrote:
  On Thu, Jun 20, 2013 at 08:05:30PM +0800, Liu Bo wrote:
   This adds a 'btrfs-image -m' option, which let us restore an image that
   is built from a btrfs of multiple disks onto several disks altogether.
   
   This aims to address the following case,
   $ mkfs.btrfs -m raid0 sda sdb
   $ btrfs-image sda image.file
   $ btrfs-image -r image.file sdc
   -
   so we can only restore metadata onto sdc, and another thing is we can
   only mount sdc with degraded mode as we don't provide informations of
   another disk.  And, it's built as RAID0 and we have only one disk,
   so after mount sdc we'll get into readonly mode.
   
  
  Um that shouldn't be happening, the restore will mask out the RAID parts of 
  the
  chunk tree and it should work just fine.  Are you using the most recent 
  version
  of btrfs-image?  If this is happening it's a bug and we need to fix it, but 
  I've
  restored several file systems from users with raid0/10 file systems onto a
  single disk and it's worked just fine.  Thanks,
  
 
 Well apparently I've been hallucinating because it definitely doesn't work.  
 I'd
 rather fix the device tree so it only restores onto one disk, since the raid
 level shouldn't matter and it does in fact get masked out.  So the only thing
 left would be to fix the device tree so the only device it knows about is the
 device we're restoring to.  Thanks,

Um, I believe that'd work and it's not hard, but I'm afraid that way we
wouldn't be able to debug bugs related to raid types?

thanks,
liubo


Re: [PATCH 4/4] Btrfs-progs: exhance btrfs-image to restore image onto multiple disks

2013-06-20 Thread Liu Bo
On Thu, Jun 20, 2013 at 08:39:19AM -0400, Josef Bacik wrote:
 On Thu, Jun 20, 2013 at 08:24:32AM -0400, Josef Bacik wrote:
  On Thu, Jun 20, 2013 at 08:05:30PM +0800, Liu Bo wrote:
   This adds a 'btrfs-image -m' option, which lets us restore an image that
   is built from a btrfs of multiple disks onto several disks altogether.
   
   This aims to address the following case,
   $ mkfs.btrfs -m raid0 sda sdb
   $ btrfs-image sda image.file
   $ btrfs-image -r image.file sdc
   -
   so we can only restore metadata onto sdc, and another thing is we can
   only mount sdc in degraded mode as we don't provide information about
   the other disk.  And, it's built as RAID0 and we have only one disk,
   so after mounting sdc we'll get into readonly mode.
   
  
  Um that shouldn't be happening, the restore will mask out the RAID parts of
  the chunk tree and it should work just fine.  Are you using the most recent
  version of btrfs-image?  If this is happening it's a bug and we need to fix
  it, but I've restored several file systems from users with raid0/10 file
  systems onto a single disk and it's worked just fine.  Thanks,
  
 
 Well apparently I've been hallucinating because it definitely doesn't work.
 I'd rather fix the device tree so it only restores onto one disk, since the
 raid level shouldn't matter and it does in fact get masked out.  So the only
 thing left would be to fix the device tree so the only device it knows about
 is the device we're restoring to.  Thanks,

I just checked the latest progs code. In
commit ef2a8889ef813ba77061f6a92f4954d047a78932
Btrfs-progs: make image restore with the original device offsets,
we went through a lot of pain and spent a great amount of effort to map
logical offsets to physical offsets.

But with this patch, we build on the disks we're restoring to the same
logical-to-physical mapping that exists on the disks that generated the
image file, so we can get rid of the pain caused by those mapping
issues.

thanks,
liubo


Re: [PATCH] Btrfs: use a percpu to keep track of possibly pinned bytes

2013-06-20 Thread Zach Brown
 @@ -3380,6 +3382,10 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
  	if (!found)
  		return -ENOMEM;
  
 +	ret = percpu_counter_init(&found->total_bytes_pinned, 0);
 +	if (ret)
 +		return ret;
 +

Leaks *found if percpu_counter_init() fails.

 -	if (space_info->bytes_pinned + delayed_rsv->size < bytes) {
 +	bytes_pinned = percpu_counter_sum(&space_info->total_bytes_pinned);
 +	if (bytes_pinned + delayed_rsv->size < bytes) {

This stood out as being different from the rest of the comparisons.

Why manually sum the counters instead of letting _compare() optimize it
away if it can?  _compare(, bytes - delayed_rsv->size)?

- z
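
The leak noted above can be illustrated in userspace. This is only a sketch under stated assumptions: struct space_info, counter_init() and the live_space_infos counter are hypothetical stand-ins, not the kernel code; the point is merely that the early return must free what was allocated before it.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel structures. */
struct space_info {
	long *total_bytes_pinned;
};

static int live_space_infos;	/* allocations not yet freed */

/* Stand-in for an init step that can fail, like percpu_counter_init(). */
static int counter_init(long **counter, int simulate_failure)
{
	if (simulate_failure)
		return -ENOMEM;
	*counter = calloc(1, sizeof(**counter));
	return *counter ? 0 : -ENOMEM;
}

static int create_space_info(struct space_info **out, int simulate_failure)
{
	struct space_info *found = calloc(1, sizeof(*found));
	int ret;

	if (!found)
		return -ENOMEM;
	live_space_infos++;

	ret = counter_init(&found->total_bytes_pinned, simulate_failure);
	if (ret) {
		/* Without this free, 'found' would leak on the error path. */
		free(found);
		live_space_infos--;
		return ret;
	}

	*out = found;
	return 0;
}

static void destroy_space_info(struct space_info *info)
{
	free(info->total_bytes_pinned);
	free(info);
	live_space_infos--;
}
```

Calling create_space_info() with simulate_failure set should leave live_space_infos at zero, which is the property the review comment asks for.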


[PATCH] Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc

2013-06-20 Thread Josef Bacik
try_to_writeback_inodes_sb_nr returns 1 if writeback is already underway, which
is completely fraking useless for us as we need to make sure pages are actually
written before we go and check if there are ordered extents.  So replace this
with an open coding of try_to_writeback_inodes_sb_nr minus the writeback
underway check so that we are sure to actually have flushed some dirty pages out
and will have ordered extents to use.  With this patch xfstests generic/273 now
passes.  Thanks,

Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/extent-tree.c |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 806801a..16da187 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3941,12 +3941,11 @@ static void btrfs_writeback_inodes_sb_nr(struct btrfs_root *root,
 					 unsigned long nr_pages)
 {
 	struct super_block *sb = root->fs_info->sb;
-	int started;
 
-	/* If we can not start writeback, just sync all the delalloc file. */
-	started = try_to_writeback_inodes_sb_nr(sb, nr_pages,
-						WB_REASON_FS_FREE_SPACE);
-	if (!started) {
+	if (down_read_trylock(&sb->s_umount)) {
+		writeback_inodes_sb_nr(sb, nr_pages, WB_REASON_FS_FREE_SPACE);
+		up_read(&sb->s_umount);
+	} else {
 		/*
 		 * We needn't worry the filesystem going from r/w to r/o though
 		 * we don't acquire ->s_umount mutex, because the filesystem
-- 
1.7.7.6
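
The shape of the change above (try the lock, do the real work under it, otherwise take the fallback path) can be sketched in userspace. This is an illustration only: umount_lock, lock_try(), lock_release() and the two counters are hypothetical stand-ins built on C11 atomics, not the kernel's rwsem or writeback API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for sb->s_umount; a plain try-lock, not an rwsem. */
static atomic_flag umount_lock = ATOMIC_FLAG_INIT;

static unsigned long pages_written;	/* work done by the locked path */
static unsigned long full_syncs;	/* times the fallback path ran */

static bool lock_try(void)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&umount_lock);
}

static void lock_release(void)
{
	atomic_flag_clear(&umount_lock);
}

static void flush_nr(unsigned long nr_pages)
{
	if (lock_try()) {
		/* Lock held: write back the requested number of pages. */
		pages_written += nr_pages;
		lock_release();
	} else {
		/* Lock busy: fall back to syncing everything. */
		full_syncs++;
	}
}
```

Either branch performs a flush; there is no outcome in which the caller returns without any writeback having been started, which is what the commit message is after.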



Re: [PATCH 5/5] Btrfs-progs: Validate super block checksum

2013-06-20 Thread Filipe David Manana
Ping.

Is there any reason why the btrfs progs (except for btrfs-show-super)
don't validate the super block's checksum?

thanks
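
For readers skimming the quoted patch below, the check it adds can be sketched standalone. The assumptions here are mine: that the CRC32 checksum type is CRC-32C with the usual all-ones initial value and final inversion, that the checksum sits little-endian in the first four bytes, and that SB_SIZE and CSUM_SIZE match BTRFS_SUPER_INFO_SIZE and BTRFS_CSUM_SIZE. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SB_SIZE   4096	/* assumed value of BTRFS_SUPER_INFO_SIZE */
#define CSUM_SIZE 32	/* assumed value of BTRFS_CSUM_SIZE */

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const unsigned char *data, size_t len)
{
	uint32_t crc = ~(uint32_t)0;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

static void store_le32(unsigned char *dst, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		dst[i] = (unsigned char)(v >> (8 * i));
}

/* Checksum everything after the checksum field and store it up front. */
static void sb_stamp_csum(unsigned char *sb)
{
	store_le32(sb, crc32c(sb + CSUM_SIZE, SB_SIZE - CSUM_SIZE));
}

/* Returns 0 when the stored checksum matches, 1 otherwise. */
static int sb_check_csum(const unsigned char *sb)
{
	unsigned char expect[4];

	store_le32(expect, crc32c(sb + CSUM_SIZE, SB_SIZE - CSUM_SIZE));
	return memcmp(sb, expect, sizeof(expect)) != 0;
}
```

The sketch leaves out the "generation below 10, older mkfs" exemption and the unsupported-algorithm check that the patch also carries.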

On Mon, Jun 10, 2013 at 8:51 PM, Filipe David Borba Manana
fdman...@gmail.com wrote:
 After finding a super block in a device also validate its
 checksum. This validation is done in the kernel but it was
 missing in btrfs-progs.

 The function btrfs_check_super_csum() is imported from the
 file fs/btrfs/disk-io.c in the kernel source tree.

 Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
 ---
  disk-io.c |   76 
 +
  1 file changed, 62 insertions(+), 14 deletions(-)

 diff --git a/disk-io.c b/disk-io.c
 index bd9cf4e..edd4d52 100644
 --- a/disk-io.c
 +++ b/disk-io.c
 @@ -1085,47 +1085,95 @@ struct btrfs_root *open_ctree_fd(int fp, const char *path, u64 sb_bytenr,
  	return info->fs_root;
  }

 +static int btrfs_check_super_csum(char *raw_disk_sb)
 +{
 +	struct btrfs_super_block *disk_sb =
 +		(struct btrfs_super_block *)raw_disk_sb;
 +	u16 csum_type = btrfs_super_csum_type(disk_sb);
 +	int ret = 0;
 +
 +	if (csum_type == BTRFS_CSUM_TYPE_CRC32) {
 +		u32 crc = ~(u32)0;
 +		const int csum_size = sizeof(crc);
 +		char result[csum_size];
 +
 +		/*
 +		 * The super_block structure does not span the whole
 +		 * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space
 +		 * is filled with zeros and is included in the checkum.
 +		 */
 +		crc = btrfs_csum_data(NULL, raw_disk_sb + BTRFS_CSUM_SIZE,
 +				crc, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE);
 +		btrfs_csum_final(crc, result);
 +
 +		if (memcmp(raw_disk_sb, result, csum_size))
 +			ret = 1;
 +
 +		if (ret && btrfs_super_generation(disk_sb) < 10) {
 +			fprintf(stderr, "btrfs: super block crcs don't match, "
 +				"older mkfs detected\n");
 +			ret = 0;
 +		}
 +	}
 +
 +	if (csum_type >= ARRAY_SIZE(btrfs_csum_sizes)) {
 +		fprintf(stderr, "btrfs: unsupported checksum algorithm %u\n",
 +			csum_type);
 +		ret = 1;
 +	}
 +
 +	return ret;
 +}
 +
  int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr)
  {
  	u8 fsid[BTRFS_FSID_SIZE];
  	int fsid_is_initialized = 0;
 -	struct btrfs_super_block buf;
 +	char buf[BTRFS_SUPER_INFO_SIZE];
 +	struct btrfs_super_block *tmp_sb;
  	int i;
  	int ret;
  	u64 transid = 0;
  	u64 bytenr;

  	if (sb_bytenr != BTRFS_SUPER_INFO_OFFSET) {
 -		ret = pread64(fd, &buf, sizeof(buf), sb_bytenr);
 +		ret = pread64(fd, buf, sizeof(buf), sb_bytenr);
  		if (ret < sizeof(buf))
  			return -1;

 -		if (btrfs_super_bytenr(&buf) != sb_bytenr ||
 -		    buf.magic != cpu_to_le64(BTRFS_MAGIC))
 +		tmp_sb = (struct btrfs_super_block *)buf;
 +
 +		if (btrfs_super_bytenr(tmp_sb) != sb_bytenr ||
 +		    tmp_sb->magic != cpu_to_le64(BTRFS_MAGIC) ||
 +		    btrfs_check_super_csum(buf))
  			return -1;

 -		memcpy(sb, &buf, sizeof(*sb));
 +		memcpy(sb, buf, sizeof(*sb));
  		return 0;
  	}

  	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
  		bytenr = btrfs_sb_offset(i);
 -		ret = pread64(fd, &buf, sizeof(buf), bytenr);
 +		ret = pread64(fd, buf, sizeof(buf), bytenr);
  		if (ret < sizeof(buf))
  			break;

 -		if (btrfs_super_bytenr(&buf) != bytenr )
 +		tmp_sb = (struct btrfs_super_block *)buf;
 +
 +		if (btrfs_super_bytenr(tmp_sb) != bytenr )
  			continue;
  		/* if magic is NULL, the device was removed */
 -		if (buf.magic == 0 && i == 0)
 +		if (tmp_sb->magic == 0 && i == 0)
  			return -1;
 -		if (buf.magic != cpu_to_le64(BTRFS_MAGIC))
 +		if (tmp_sb->magic != cpu_to_le64(BTRFS_MAGIC))
 +			continue;
 +		if (btrfs_check_super_csum(buf))
  			continue;

  		if (!fsid_is_initialized) {
 -			memcpy(fsid, buf.fsid, sizeof(fsid));
 +			memcpy(fsid, tmp_sb->fsid, sizeof(fsid));
  			fsid_is_initialized = 1;
 -		} else if (memcmp(fsid, buf.fsid, sizeof(fsid))) {
 +		} else if (memcmp(fsid, tmp_sb->fsid, sizeof(fsid))) {
  			/*
  			 * the superblocks (the original one and
 

Re: [PATCH] Btrfs: use a percpu to keep track of possibly pinned bytes

2013-06-20 Thread Josef Bacik
On Thu, Jun 20, 2013 at 09:26:15AM -0700, Zach Brown wrote:
  @@ -3380,6 +3382,10 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
   	if (!found)
   		return -ENOMEM;
   
  +	ret = percpu_counter_init(&found->total_bytes_pinned, 0);
  +	if (ret)
  +		return ret;
  +
 
 Leaks *found if percpu_counter_init() fails.
 

Right thanks.

  -	if (space_info->bytes_pinned + delayed_rsv->size < bytes) {
  +	bytes_pinned = percpu_counter_sum(&space_info->total_bytes_pinned);
  +	if (bytes_pinned + delayed_rsv->size < bytes) {
 
 This stood out as being different from the rest of the comparisons.
 
 Why manually sum the counters instead of letting _compare() optimize it
 away if it can?  _compare(, bytes - delayed_rsv->size)?
 

Cause negative numbers bother me?

Josef
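
The two forms under discussion can be compared in a small userspace sketch. It assumes the comparison in the quoted hunk tests for a shortfall (pinned plus reserved is less than the request); counter_compare() is a hypothetical stand-in for the _compare() helper mentioned above, and the function names are illustrative only.

```c
#include <stdint.h>

/* Stand-in for a counter comparison: returns <0, 0 or >0. */
static int counter_compare(int64_t sum, int64_t rhs)
{
	return (sum > rhs) - (sum < rhs);
}

/* Form used in the patch: sum the counter first, then add and compare. */
static int short_by_sum(uint64_t pinned, uint64_t rsv_size, uint64_t bytes)
{
	return pinned + rsv_size < bytes;
}

/* Form suggested in review: move rsv_size to the right-hand side. */
static int short_by_compare(uint64_t pinned, uint64_t rsv_size, uint64_t bytes)
{
	/* Signed on purpose: rsv_size may exceed bytes, giving a negative rhs. */
	int64_t rhs = (int64_t)bytes - (int64_t)rsv_size;

	return counter_compare((int64_t)pinned, rhs) < 0;
}
```

Both forms agree as long as the subtraction is done in signed arithmetic. Done in unsigned arithmetic, bytes - rsv_size would wrap to a huge value whenever the reservation is larger than the request, which is presumably the "negative numbers" concern.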


[PATCH v5 6/8] Btrfs: introduce uuid-tree-gen field

2013-06-20 Thread Stefan Behrens
In order to be able to detect the case that a filesystem is mounted
with an old kernel, add a uuid-tree-gen field, as the free space
cache does. It is part of the super block and written with
each commit. Old kernels do not know this field and don't update it.

Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
---
 fs/btrfs/ctree.h       | 5 ++++-
 fs/btrfs/transaction.c | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 89b2d78..424c38d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -481,9 +481,10 @@ struct btrfs_super_block {
char label[BTRFS_LABEL_SIZE];
 
__le64 cache_generation;
+   __le64 uuid_tree_generation;
 
/* future expansion */
-   __le64 reserved[31];
+   __le64 reserved[30];
u8 sys_chunk_array[BTRFS_SYSTEM_CHUNK_ARRAY_SIZE];
struct btrfs_root_backup super_roots[BTRFS_NUM_BACKUP_ROOTS];
 } __attribute__ ((__packed__));
@@ -2847,6 +2848,8 @@ BTRFS_SETGET_STACK_FUNCS(super_csum_type, struct 
btrfs_super_block,
 csum_type, 16);
 BTRFS_SETGET_STACK_FUNCS(super_cache_generation, struct btrfs_super_block,
 cache_generation, 64);
+BTRFS_SETGET_STACK_FUNCS(super_uuid_tree_generation, struct btrfs_super_block,
+uuid_tree_generation, 64);
 
 static inline int btrfs_super_csum_size(struct btrfs_super_block *s)
 {
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 00ae884..1ae9621 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1370,6 +1370,7 @@ static void update_super_roots(struct btrfs_root *root)
 	super->root_level = root_item->level;
 	if (btrfs_test_opt(root, SPACE_CACHE))
 		super->cache_generation = root_item->generation;
+	super->uuid_tree_generation = root_item->generation;
 }
 
 int btrfs_transaction_in_commit(struct btrfs_fs_info *info)
-- 
1.8.3



[PATCH v5 3/8] Btrfs: create UUID tree if required

2013-06-20 Thread Stefan Behrens
This tree is not created by mkfs.btrfs. Therefore when a filesystem
is mounted writable and the UUID tree does not exist, this tree is
created if required. The tree is also added to the fs_info structure
and initialized, but this commit does not yet read or write UUID tree
elements.

Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
---
 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/disk-io.c | 34 ++
 fs/btrfs/extent-tree.c |  3 +++
 fs/btrfs/volumes.c | 26 ++
 fs/btrfs/volumes.h |  1 +
 5 files changed, 65 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 04447b6..1dac165 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1305,6 +1305,7 @@ struct btrfs_fs_info {
struct btrfs_root *fs_root;
struct btrfs_root *csum_root;
struct btrfs_root *quota_root;
+   struct btrfs_root *uuid_root;
 
/* the log root tree is a directory of all the other log roots */
struct btrfs_root *log_root_tree;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3c2886c..1db446a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1580,6 +1580,9 @@ struct btrfs_root *btrfs_read_fs_root_no_name(struct btrfs_fs_info *fs_info,
 	if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID)
 		return fs_info->quota_root ? fs_info->quota_root :
 					     ERR_PTR(-ENOENT);
+	if (location->objectid == BTRFS_UUID_TREE_OBJECTID)
+		return fs_info->uuid_root ? fs_info->uuid_root :
+					    ERR_PTR(-ENOENT);
 again:
 	root = btrfs_lookup_fs_root(fs_info, location->objectid);
 	if (root)
@@ -2037,6 +2040,12 @@ static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root)
 		info->quota_root->node = NULL;
 		info->quota_root->commit_root = NULL;
 	}
+	if (info->uuid_root) {
+		free_extent_buffer(info->uuid_root->node);
+		free_extent_buffer(info->uuid_root->commit_root);
+		info->uuid_root->node = NULL;
+		info->uuid_root->commit_root = NULL;
+	}
 	if (chunk_root) {
 		free_extent_buffer(info->chunk_root->node);
 		free_extent_buffer(info->chunk_root->commit_root);
@@ -2097,11 +2106,13 @@ int open_ctree(struct super_block *sb,
struct btrfs_root *chunk_root;
struct btrfs_root *dev_root;
struct btrfs_root *quota_root;
+   struct btrfs_root *uuid_root;
struct btrfs_root *log_tree_root;
int ret;
int err = -EINVAL;
int num_backups_tried = 0;
int backup_index = 0;
+   bool create_uuid_tree = false;
 
 	tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info);
 	chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info);
@@ -2695,6 +2706,18 @@ retry_root_backup:
 		fs_info->quota_root = quota_root;
 	}
 
+	location.objectid = BTRFS_UUID_TREE_OBJECTID;
+	uuid_root = btrfs_read_tree_root(tree_root, &location);
+	if (IS_ERR(uuid_root)) {
+		ret = PTR_ERR(uuid_root);
+		if (ret != -ENOENT)
+			goto recovery_tree_root;
+		create_uuid_tree = true;
+	} else {
+		uuid_root->track_dirty = 1;
+		fs_info->uuid_root = uuid_root;
+	}
+
 	fs_info->generation = generation;
 	fs_info->last_trans_committed = generation;
 
@@ -2881,6 +2904,17 @@ retry_root_backup:
 
btrfs_qgroup_rescan_resume(fs_info);
 
+	if (create_uuid_tree) {
+		pr_info("btrfs: creating UUID tree\n");
+		ret = btrfs_create_uuid_tree(fs_info);
+		if (ret) {
+			pr_warn("btrfs: failed to create the UUID tree %d\n",
+				ret);
+			close_ctree(tree_root);
+			return ret;
+		}
+	}
+
return 0;
 
 fail_qgroup:
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 6d5c5f7..1c4694a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4308,6 +4308,9 @@ static struct btrfs_block_rsv *get_block_rsv(
 	if (root == root->fs_info->csum_root && trans->adding_csums)
 		block_rsv = trans->block_rsv;
 
+	if (root == root->fs_info->uuid_root)
+		block_rsv = trans->block_rsv;
+
 	if (!block_rsv)
 		block_rsv = root->block_rsv;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c58bf19..d4c7955 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3411,6 +3411,32 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
return 0;
 }
 
+int btrfs_create_uuid_tree(struct btrfs_fs_info *fs_info)
+{
+   struct btrfs_trans_handle *trans;
+	struct btrfs_root *tree_root = fs_info->tree_root;
+   struct btrfs_root *uuid_root;
+
+   /*
+* 1 - root 

[PATCH v5 1/8] Btrfs: introduce a tree for items that map UUIDs to something

2013-06-20 Thread Stefan Behrens
Mapping UUIDs to subvolume IDs is an expensive operation today.
The current algorithm even has quadratic cost (in the number of
existing subvolumes), which means that it takes minutes to
send/receive a single subvolume if 10,000 subvolumes exist. Even
linear cost would be wasteful, because the data structures that
allow mapping UUIDs to subvolume IDs are rebuilt every time a
btrfs send/receive instance is started.

It is much more efficient to maintain a searchable persistent data
structure in the filesystem, one that is updated whenever a
subvolume/snapshot is created and deleted, and when the received
subvolume UUID is set by the btrfs-receive tool.

Therefore kernel code is added with this commit that is able to
maintain data structures in the filesystem that allow to quickly
search for a given UUID and to retrieve data that is assigned to
this UUID, like which subvolume ID is related to this UUID.

This commit adds a new tree to hold UUID-to-data mapping items. The
key of the items is the full UUID plus the key type BTRFS_UUID_KEY.
Multiple data blocks can be stored for a given UUID, a type/length/
value scheme is used.
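
The type/length/value scheme can be sketched in userspace as follows. The layout assumed here (a little-endian 16-bit type, a 32-bit count, then that many 64-bit values) follows struct btrfs_uuid_item from this patch; the helper names tlv_append(), tlv_find(), get_le() and put_le() are hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout per entry: le16 type, le32 len, then len le64 values. */
#define HDR_SIZE 6

static uint64_t get_le(const unsigned char *p, int nbytes)
{
	uint64_t v = 0;

	for (int i = nbytes - 1; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static void put_le(unsigned char *p, uint64_t v, int nbytes)
{
	for (int i = 0; i < nbytes; i++) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Append one entry; returns bytes written. Caller guarantees space. */
static size_t tlv_append(unsigned char *buf, uint16_t type,
			 const uint64_t *vals, uint32_t n)
{
	put_le(buf, type, 2);
	put_le(buf + 2, n, 4);
	for (uint32_t i = 0; i < n; i++)
		put_le(buf + HDR_SIZE + 8 * (size_t)i, vals[i], 8);
	return HDR_SIZE + 8 * (size_t)n;
}

/*
 * Find the first entry of the wanted type. Returns a pointer to its values
 * and stores the count in *n, or returns NULL if absent or truncated.
 */
static const unsigned char *tlv_find(const unsigned char *buf, size_t size,
				     uint16_t want, uint32_t *n)
{
	size_t off = 0;

	while (size - off >= HDR_SIZE) {
		uint16_t type = (uint16_t)get_le(buf + off, 2);
		uint32_t len = (uint32_t)get_le(buf + off + 2, 4);
		size_t body = 8 * (size_t)len;

		off += HDR_SIZE;
		if (body > size - off)
			return NULL;	/* entry runs past the end of the item */
		if (type == want) {
			*n = len;
			return buf + off;
		}
		off += body;
	}
	return NULL;
}
```

Because every step advances by at least the header size and bails out on an over-long entry, a damaged length field ends the walk rather than looping.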

Now follows the lengthy justification, why a new tree was added
instead of using the existing root tree:

The first approach was to not create another tree that holds UUID
items. Instead, the items should just go into the top root tree.
Unfortunately this confused the algorithm to assign the objectid
of subvolumes and snapshots. The reason is that
btrfs_find_free_objectid() calls btrfs_find_highest_objectid() for
the first created subvol or snapshot after mounting a filesystem,
and this function simply searches for the largest used objectid in
the root tree keys to pick the next objectid to assign. Of course,
the UUID keys have always been the ones with the highest offset
value, and the next assigned subvol ID was wastefully huge.
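
A toy model of the problem described above, as a sketch only: next_objectid() is a hypothetical stand-in for the "largest used objectid plus one" behaviour, and FIRST_FREE_OBJECTID is assumed to be 256.

```c
#include <stddef.h>
#include <stdint.h>

#define FIRST_FREE_OBJECTID 256	/* assumed first ID handed to subvolumes */

/* Mimics "take the largest objectid in the tree and add one". */
static uint64_t next_objectid(const uint64_t *objectids, size_t n)
{
	uint64_t highest = FIRST_FREE_OBJECTID - 1;

	for (size_t i = 0; i < n; i++)
		if (objectids[i] > highest)
			highest = objectids[i];
	return highest + 1;
}
```

With only subvolume roots in the tree (say 5, 256 and 257) the next ID is 258. As soon as one key derived from a UUID is stored alongside them, its essentially random 64-bit objectid becomes the maximum and the next subvolume ID jumps far beyond it.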

To use any other existing tree did not look proper. To apply a
workaround such as setting the objectid to zero in the UUID item
key and to implement collision handling would either add
limitations (in case of a btrfs_extend_item() approach to handle
the collisions) or a lot of complexity and source code (in case a
key would be looked up that is free of collisions). Adding new code
that introduces limitations is not good, and adding code that is
complex and lengthy for no good reason is also not good. That's the
justification why a completely new tree was introduced.

Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
---
 fs/btrfs/Makefile|   3 +-
 fs/btrfs/ctree.h |  50 ++
 fs/btrfs/uuid-tree.c | 480 +++
 3 files changed, 532 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 3932224..a550dfc 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -8,7 +8,8 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
-  reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o
+  reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
+  uuid-tree.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 76e4983..04447b6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -91,6 +91,9 @@ struct btrfs_ordered_sum;
 /* holds quota configuration and tracking */
 #define BTRFS_QUOTA_TREE_OBJECTID 8ULL
 
+/* for storing items that use the BTRFS_UUID_KEY */
+#define BTRFS_UUID_TREE_OBJECTID 9ULL
+
 /* for storing balance parameters in the root tree */
 #define BTRFS_BALANCE_OBJECTID -4ULL
 
@@ -953,6 +956,18 @@ struct btrfs_dev_replace_item {
__le64 num_uncorrectable_read_errors;
 } __attribute__ ((__packed__));
 
+/* for items that use the BTRFS_UUID_KEY */
+#define BTRFS_UUID_ITEM_TYPE_SUBVOL	0 /* for UUIDs assigned to subvols */
+#define BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL	1 /* for UUIDs assigned to
+						   * received subvols */
+
+/* a sequence of such items is stored under the BTRFS_UUID_KEY */
+struct btrfs_uuid_item {
+	__le16 type;	/* refer to BTRFS_UUID_ITEM_TYPE* defines above */
+	__le32 len;	/* number of following 64bit values */
+	__le64 subid[0];	/* sequence of subids */
+} __attribute__ ((__packed__));
+
 /* different types of block groups (and chunks) */
 #define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
 #define BTRFS_BLOCK_GROUP_SYSTEM	(1ULL << 1)
@@ -1922,6 +1937,17 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_DEV_REPLACE_KEY  250
 
 /*
+ * Stores items that allow to quickly map UUIDs to something else.
+ * These 

[PATCH v5 4/8] Btrfs: maintain subvolume items in the UUID tree

2013-06-20 Thread Stefan Behrens
When a new subvolume or snapshot is created, a new UUID item is added
to the UUID tree. Such items are removed when the subvolume is deleted.
The ioctl to set the received subvolume UUID is also touched and will
now also add this received UUID into the UUID tree together with the
ID of the subvolume. The latter is also done when read-only snapshots
are created which inherit all the send/receive information from the
parent subvolume.

User mode programs use the BTRFS_IOC_TREE_SEARCH ioctl to search and
read in the UUID tree.

Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
---
 fs/btrfs/ctree.h   |  1 +
 fs/btrfs/ioctl.c   | 74 +++---
 fs/btrfs/transaction.c | 19 -
 3 files changed, 83 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1dac165..f2751e7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3678,6 +3678,7 @@ extern const struct dentry_operations 
btrfs_dentry_operations;
 long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
 void btrfs_update_iflags(struct inode *inode);
 void btrfs_inherit_iflags(struct inode *inode, struct inode *dir);
+int btrfs_is_empty_uuid(u8 *uuid);
 int btrfs_defrag_file(struct inode *inode, struct file *file,
  struct btrfs_ioctl_defrag_range_args *range,
  u64 newer_than, unsigned long max_pages);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0e17a30..4e0c292 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -363,6 +363,13 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, 
void __user *arg)
return 0;
 }
 
+int btrfs_is_empty_uuid(u8 *uuid)
+{
+   static char empty_uuid[BTRFS_UUID_SIZE] = {0};
+
+   return !memcmp(uuid, empty_uuid, BTRFS_UUID_SIZE);
+}
+
 static noinline int create_subvol(struct inode *dir,
  struct dentry *dentry,
  char *name, int namelen,
@@ -396,7 +403,7 @@ static noinline int create_subvol(struct inode *dir,
 	 * of create_snapshot().
 	 */
 	ret = btrfs_subvolume_reserve_metadata(root, &block_rsv,
-					       7, &qgroup_reserved);
+					       8, &qgroup_reserved);
 	if (ret)
 		return ret;
 
@@ -518,9 +525,13 @@ static noinline int create_subvol(struct inode *dir,
 	ret = btrfs_add_root_ref(trans, root->fs_info->tree_root,
 				 objectid, root->root_key.objectid,
 				 btrfs_ino(dir), index, name, namelen);
-
 	BUG_ON(ret);
 
+	ret = btrfs_insert_uuid_subvol_item(trans, root->fs_info->uuid_root,
+					    root_item.uuid, objectid);
+	if (ret)
+		btrfs_abort_transaction(trans, root, ret);
+
 fail:
 	trans->block_rsv = NULL;
 	trans->bytes_reserved = 0;
@@ -573,9 +584,10 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
 	 * 1 - root item
 	 * 2 - root ref/backref
 	 * 1 - root of snapshot
+	 * 1 - UUID item
 	 */
 	ret = btrfs_subvolume_reserve_metadata(BTRFS_I(dir)->root,
-					&pending_snapshot->block_rsv, 7,
+					&pending_snapshot->block_rsv, 8,
 					&pending_snapshot->qgroup_reserved);
 	if (ret)
 		goto out;
@@ -2213,6 +2225,27 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
 			goto out_end_trans;
 		}
 	}
+
+	ret = btrfs_del_uuid_subvol_item(trans, root->fs_info->uuid_root,
+					 dest->root_item.uuid,
+					 dest->root_key.objectid);
+	if (ret && ret != -ENOENT) {
+		btrfs_abort_transaction(trans, root, ret);
+		err = ret;
+		goto out_end_trans;
+	}
+	if (!btrfs_is_empty_uuid(dest->root_item.received_uuid)) {
+		ret = btrfs_del_uuid_received_subvol_item(trans,
+				root->fs_info->uuid_root,
+				dest->root_item.received_uuid,
+				dest->root_key.objectid);
+		if (ret && ret != -ENOENT) {
+			btrfs_abort_transaction(trans, root, ret);
+			err = ret;
+			goto out_end_trans;
+		}
+	}
+
 out_end_trans:
 	trans->block_rsv = NULL;
 	trans->bytes_reserved = 0;
@@ -2424,7 +2457,6 @@ static long btrfs_ioctl_dev_info(struct btrfs_root *root, void __user *arg)
 	struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
int ret = 0;
char *s_uuid = NULL;
-   char empty_uuid[BTRFS_UUID_SIZE] = {0};
 
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -2433,7 

[PATCH v5 7/8] Btrfs: check UUID tree during mount if required

2013-06-20 Thread Stefan Behrens
If the filesystem was mounted with an old kernel that was not
aware of the UUID tree, this is detected by looking at the
uuid_tree_generation field of the superblock (similar to how
the free space cache does it). If a mismatch is detected
at mount time, a thread is started that does two things:
1. Iterate through the UUID tree, check each entry, delete those
   entries that are not valid anymore (i.e., the subvol does not
   exist anymore or the value changed).
2. Iterate through the root tree, for each found subvolume, add
   the UUID tree entries for the subvolume (if they are not
   already there).

This mechanism is also used to handle and repair errors that
happened during the initial creation and filling of the tree.
The update of the uuid_tree_generation field (which indicates
that the state of the UUID tree is up to date) is blocked until
all create and repair operations are successfully completed.
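
The generation check described above can be modelled in a few lines. This is a userspace sketch under stated assumptions: struct super, commit() and uuid_tree_mount_action() are hypothetical names, and the real code keeps these values in the on-disk super block and in fs_info.

```c
#include <stdbool.h>
#include <stdint.h>

enum uuid_tree_action { UUID_TREE_CREATE, UUID_TREE_CHECK, UUID_TREE_OK };

/* Toy super block holding only the two generation fields of interest. */
struct super {
	uint64_t generation;
	uint64_t uuid_tree_generation;
};

/* Decide at mount time what to do with the UUID tree. */
static enum uuid_tree_action uuid_tree_mount_action(bool tree_exists,
						    const struct super *sb)
{
	if (!tree_exists)
		return UUID_TREE_CREATE;
	if (sb->generation != sb->uuid_tree_generation)
		return UUID_TREE_CHECK;	/* someone committed without updating */
	return UUID_TREE_OK;
}

/*
 * One commit. A kernel that knows the field, and whose UUID tree is known
 * to be up to date, copies the generation; any other commit leaves the
 * field behind.
 */
static void commit(struct super *sb, bool uuid_tree_up_to_date)
{
	sb->generation++;
	if (uuid_tree_up_to_date)
		sb->uuid_tree_generation = sb->generation;
}
```

A commit made with the flag cleared stands for both cases in the text: an old kernel that does not know the field, and a new kernel whose create or repair pass has not finished yet. Either way the next mount sees a mismatch and runs the check.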

Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
---
 fs/btrfs/ctree.h   |   4 ++
 fs/btrfs/disk-io.c |  17 +-
 fs/btrfs/transaction.c |   3 +-
 fs/btrfs/uuid-tree.c   | 156 +
 fs/btrfs/volumes.c |  82 ++
 fs/btrfs/volumes.h |   1 +
 6 files changed, 261 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 424c38d..817894d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1648,6 +1648,7 @@ struct btrfs_fs_info {
atomic_t mutually_exclusive_operation_running;
 
struct completion uuid_tree_rescan_completion;
+   unsigned int update_uuid_tree_gen:1;
 };
 
 /*
@@ -3453,6 +3454,9 @@ void btrfs_update_root_times(struct btrfs_trans_handle 
*trans,
 struct btrfs_root *root);
 
 /* uuid-tree.c */
+int btrfs_uuid_tree_iterate(struct btrfs_fs_info *fs_info,
+   int (*check_func)(struct btrfs_fs_info *, u8 *, u16,
+ u64));
 int btrfs_lookup_uuid_subvol_item(struct btrfs_root *uuid_root, u8 *uuid,
  u64 *subvol_id);
 int btrfs_insert_uuid_subvol_item(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a52504b..7508b3a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2112,7 +2112,8 @@ int open_ctree(struct super_block *sb,
int err = -EINVAL;
int num_backups_tried = 0;
int backup_index = 0;
-   bool create_uuid_tree = false;
+   bool create_uuid_tree;
+   bool check_uuid_tree;
 
 	tree_root = fs_info->tree_root = btrfs_alloc_root(fs_info);
 	chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info);
@@ -2714,9 +2715,13 @@ retry_root_backup:
 		if (ret != -ENOENT)
 			goto recovery_tree_root;
 		create_uuid_tree = true;
+		check_uuid_tree = false;
 	} else {
 		uuid_root->track_dirty = 1;
 		fs_info->uuid_root = uuid_root;
+		create_uuid_tree = false;
+		check_uuid_tree =
+		    generation != btrfs_super_uuid_tree_generation(disk_super);
 	}
 
 	fs_info->generation = generation;
@@ -2914,7 +2919,17 @@ retry_root_backup:
 			close_ctree(tree_root);
 			return ret;
 		}
+	} else if (check_uuid_tree) {
+		pr_info("btrfs: checking UUID tree\n");
+		ret = btrfs_check_uuid_tree(fs_info);
+		if (ret) {
+			pr_warn("btrfs: failed to check the UUID tree %d\n",
+				ret);
+			close_ctree(tree_root);
+			return ret;
+		}
 	} else {
+		fs_info->update_uuid_tree_gen = 1;
 		complete_all(&fs_info->uuid_tree_rescan_completion);
 	}
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 1ae9621..cf07548 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1370,7 +1370,8 @@ static void update_super_roots(struct btrfs_root *root)
 	super->root_level = root_item->level;
 	if (btrfs_test_opt(root, SPACE_CACHE))
 		super->cache_generation = root_item->generation;
-	super->uuid_tree_generation = root_item->generation;
+	if (root->fs_info->update_uuid_tree_gen)
+		super->uuid_tree_generation = root_item->generation;
 }
 
 int btrfs_transaction_in_commit(struct btrfs_fs_info *info)
diff --git a/fs/btrfs/uuid-tree.c b/fs/btrfs/uuid-tree.c
index 3939a54..59697d1 100644
--- a/fs/btrfs/uuid-tree.c
+++ b/fs/btrfs/uuid-tree.c
@@ -415,6 +415,162 @@ out:
return ret;
 }
 
+static int btrfs_uuid_iter_rem(struct btrfs_root *uuid_root, u8 *uuid,
+  u16 sub_item_type, u64 subid)
+{
+   struct btrfs_trans_handle *trans;
+   int ret;
+
+   /* 1 - for the uuid item */
+   trans = 

[PATCH v5 0/8] Btrfs: introduce a tree for UUID to subvol ID mapping

2013-06-20 Thread Stefan Behrens
Mapping UUIDs to subvolume IDs is an expensive operation today.
The current algorithm even has quadratic cost (in the number of
existing subvolumes), which means that it takes minutes to
send/receive a single subvolume if 10,000 subvolumes exist. Even
linear cost would be wasteful, because the data structures that
allow mapping UUIDs to subvolume IDs are rebuilt every time a
btrfs send/receive instance is started.

So the issue to address is that Btrfs send / receive does not work
as it is today when a high number of subvolumes exist.

The table below shows the time it takes on my testbox to send _one_
empty subvolume depending on the number of subvolumes that exist in
the filesystem.

# of subvols  | without    | with
in filesystem | UUID tree  | UUID tree
--------------+------------+----------
            2 |  0m00.004s | 0m00.003s
         1000 |  0m07.010s | 0m00.004s
         2000 |  0m28.210s | 0m00.004s
         3000 |  1m04.872s | 0m00.004s
         4000 |  1m56.059s | 0m00.004s
         5000 |  3m00.489s | 0m00.004s
         6000 |  4m27.376s | 0m00.004s
         7000 |  6m08.938s | 0m00.004s
         8000 |  7m54.020s | 0m00.004s
         9000 | 10m05.108s | 0m00.004s
        10000 | 12m47.406s | 0m00.004s
        11000 | 15m05.800s | 0m00.004s
        12000 | 18m00.170s | 0m00.004s
        13000 | 21m39.438s | 0m00.004s
        14000 | 24m54.681s | 0m00.004s
        15000 | 28m09.096s | 0m00.004s
        16000 | 33m08.856s | 0m00.004s
        17000 | 37m10.562s | 0m00.004s
        18000 | 41m44.727s | 0m00.004s
        19000 | 46m14.335s | 0m00.004s
        20000 | 51m55.100s | 0m00.004s
        21000 | 56m54.346s | 0m00.004s
        22000 | 62m53.466s | 0m00.004s
        23000 | 66m57.328s | 0m00.004s
        24000 | 73m59.687s | 0m00.004s
        25000 | 81m24.476s | 0m00.004s
        26000 | 87m11.478s | 0m00.004s
        27000 | 92m59.225s | 0m00.004s

Or as a chart:
http://btrfs.giantdisaster.de/Btrfs-send-recv-perf.pdf

It is much more efficient to maintain a searchable persistent data
structure in the filesystem, one that is updated whenever a
subvolume/snapshot is created and deleted, and when the received
subvolume UUID is set by the btrfs-receive tool.

Therefore kernel code is added that is able to maintain data
structures in the filesystem that allow to quickly search for a
given UUID and to retrieve the subvol ID.

Now follows the lengthy justification, why a new tree was added
instead of using the existing root tree:

The first approach was to not create another tree that holds UUID
items. Instead, the items should just go into the top root tree.
Unfortunately this confused the algorithm to assign the objectid
of subvolumes and snapshots. The reason is that
btrfs_find_free_objectid() calls btrfs_find_highest_objectid() for
the first created subvol or snapshot after mounting a filesystem,
and this function simply searches for the largest used objectid in
the root tree keys to pick the next objectid to assign. Of course,
the UUID keys have always been the ones with the highest offset
value, and the next assigned subvol ID was wastefully huge.

Using any other existing tree did not look appropriate either. A
workaround such as setting the objectid to zero in the UUID item key
and implementing collision handling would either add limitations (with
a btrfs_extend_item() approach to handle the collisions) or add a lot
of complexity and source code (if a collision-free key had to be looked
up). Adding new code that introduces limitations is not good, and
adding code that is complex and lengthy for no good reason is not good
either. That is the justification for introducing a completely new
tree.

v1 -> v2:
- All review comments from David Sterba, Josef Bacik and Jan Schmidt
  are addressed.
  The biggest change was to add a mechanism that handles the case that
  the filesystem is mounted with an older kernel. Now that case is
  detected when the filesystem is mounted with a newer kernel again,
  and the UUID tree is updated in the background.

v2 -> v3:
- All review comments from Liu Bo are addressed:
  - shrunk the size of the uuid_item.
  - fixed the issue that the uuid-tree was not using the transaction
    block reserve.

v3 -> v4:
- Fixed a bug. A corrupted UUID tree entry could have caused an endless
  loop in the check+rescan thread.

v4 -> v5:
- At the request of several people, the way a umount waits for the
  completion of the uuid tree rescan thread was changed. Now a
  struct completion is used instead of a struct semaphore.

Stefan Behrens (8):
  Btrfs: introduce a tree for items that map UUIDs to something
  Btrfs: support printing UUID tree elements
  Btrfs: create UUID tree if required
  Btrfs: maintain subvolume items in the UUID tree
  Btrfs: fill UUID tree initially
  Btrfs: introduce uuid-tree-gen field
  Btrfs: check UUID tree during mount if required
  Btrfs: add mount option to force UUID tree checking

 

[PATCH v5 5/8] Btrfs: fill UUID tree initially

2013-06-20 Thread Stefan Behrens
When the UUID tree is initially created, a task is spawned that
walks through the root tree. For each subvolume root_item found,
the uuid and received_uuid entries are added to the UUID tree.
This operation is quick enough that, if somebody wants to unmount
the filesystem while the task is still running, the unmount is
simply delayed until the UUID tree building task has finished.

Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
---
 fs/btrfs/ctree.h   |   2 +
 fs/btrfs/disk-io.c |   6 +++
 fs/btrfs/volumes.c | 148 -
 3 files changed, 155 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f2751e7..89b2d78 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1645,6 +1645,8 @@ struct btrfs_fs_info {
struct btrfs_dev_replace dev_replace;
 
atomic_t mutually_exclusive_operation_running;
+
+   struct completion uuid_tree_rescan_completion;
 };
 
 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1db446a..a52504b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2288,6 +2288,7 @@ int open_ctree(struct super_block *sb,
init_rwsem(&fs_info->extent_commit_sem);
init_rwsem(&fs_info->cleanup_work_sem);
init_rwsem(&fs_info->subvol_sem);
+   init_completion(&fs_info->uuid_tree_rescan_completion);
fs_info->dev_replace.lock_owner = 0;
atomic_set(&fs_info->dev_replace.nesting_level, 0);
mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
@@ -2913,6 +2914,8 @@ retry_root_backup:
close_ctree(tree_root);
return ret;
}
+   } else {
+   complete_all(&fs_info->uuid_tree_rescan_completion);
}
 
return 0;
@@ -3543,6 +3546,9 @@ int close_ctree(struct btrfs_root *root)
fs_info->closing = 1;
smp_mb();
 
+   /* wait for the uuid_scan task to finish */
+   wait_for_completion(&fs_info->uuid_tree_rescan_completion);
+
/* pause restriper - we want to resume on mount */
btrfs_pause_balance(fs_info);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d4c7955..e2e2bbc 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3411,11 +3411,145 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info)
return 0;
 }
 
+static int btrfs_uuid_scan_kthread(void *data)
+{
+   struct btrfs_fs_info *fs_info = data;
+   struct btrfs_root *root = fs_info->tree_root;
+   struct btrfs_key key;
+   struct btrfs_key max_key;
+   struct btrfs_path *path = NULL;
+   int ret = 0;
+   struct extent_buffer *eb;
+   int slot;
+   struct btrfs_root_item root_item;
+   u32 item_size;
+   struct btrfs_trans_handle *trans;
+
+   path = btrfs_alloc_path();
+   if (!path) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   key.objectid = 0;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = 0;
+
+   max_key.objectid = (u64)-1;
+   max_key.type = BTRFS_ROOT_ITEM_KEY;
+   max_key.offset = (u64)-1;
+
+   path->keep_locks = 1;
+
+   while (1) {
+   ret = btrfs_search_forward(root, &key, &max_key, path, 0);
+   if (ret) {
+   if (ret > 0)
+   ret = 0;
+   break;
+   }
+
+   if (key.type != BTRFS_ROOT_ITEM_KEY ||
+   (key.objectid < BTRFS_FIRST_FREE_OBJECTID &&
+key.objectid != BTRFS_FS_TREE_OBJECTID) ||
+   key.objectid > BTRFS_LAST_FREE_OBJECTID)
+   goto skip;
+
+   eb = path->nodes[0];
+   slot = path->slots[0];
+   item_size = btrfs_item_size_nr(eb, slot);
+   if (item_size < sizeof(root_item))
+   goto skip;
+
+   trans = NULL;
+   read_extent_buffer(eb, &root_item,
+  btrfs_item_ptr_offset(eb, slot),
+  (int)sizeof(root_item));
+   if (btrfs_root_refs(&root_item) == 0)
+   goto skip;
+   if (!btrfs_is_empty_uuid(root_item.uuid)) {
+   /*
+* 1 - subvol uuid item
+* 1 - received_subvol uuid item
+*/
+   trans = btrfs_start_transaction(fs_info->uuid_root, 2);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   break;
+   }
+   ret = btrfs_insert_uuid_subvol_item(trans,
+   fs_info->uuid_root,
+   root_item.uuid,
+   key.objectid);
+   if (ret < 0) 

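The unmount handshake in the patch above (init_completion() at mount,
complete_all() when the scan is finished or not needed,
wait_for_completion() in close_ctree()) can be modelled in userspace
roughly as follows. This is only a sketch under assumptions: it uses
POSIX mutexes and condition variables instead of the kernel's struct
completion, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct completion. */
struct demo_completion {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	bool            done;
};

/* Counterpart of init_completion(): nothing has completed yet. */
static void demo_init_completion(struct demo_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Counterpart of complete_all(): mark as done and wake every waiter. */
static void demo_complete_all(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Counterpart of wait_for_completion(): block until the completion
 * has been signalled; return at once if it already was.
 */
static void demo_wait_for_completion(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}
```

In this model the scan thread would call demo_complete_all() as its last
step, a mount that starts no scan would call it right away (like the
else branch in open_ctree()), and the unmount path would call
demo_wait_for_completion() before tearing anything down.
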
[PATCH v5 8/8] Btrfs: add mount option to force UUID tree checking

2013-06-20 Thread Stefan Behrens
This should never be needed, but since all the functions to check
and rebuild the UUID tree already exist, a mount option is added
that forces this check and rebuild procedure.

Signed-off-by: Stefan Behrens sbehr...@giantdisaster.de
---
 fs/btrfs/ctree.h   | 1 +
 fs/btrfs/disk-io.c | 3 ++-
 fs/btrfs/super.c   | 8 +++-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 817894d..ea1adf6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1986,6 +1986,7 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_MOUNT_CHECK_INTEGRITY (1 << 20)
 #define BTRFS_MOUNT_CHECK_INTEGRITY_INCLUDING_EXTENT_DATA (1 << 21)
 #define BTRFS_MOUNT_PANIC_ON_FATAL_ERROR   (1 << 22)
+#define BTRFS_MOUNT_RESCAN_UUID_TREE   (1 << 23)
 
 #define btrfs_clear_opt(o, opt) ((o) &= ~BTRFS_MOUNT_##opt)
 #define btrfs_set_opt(o, opt)  ((o) |= BTRFS_MOUNT_##opt)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7508b3a..e76554b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2919,7 +2919,8 @@ retry_root_backup:
close_ctree(tree_root);
return ret;
}
-   } else if (check_uuid_tree) {
+   } else if (check_uuid_tree ||
+  btrfs_test_opt(tree_root, RESCAN_UUID_TREE)) {
pr_info("btrfs: checking UUID tree\n");
ret = btrfs_check_uuid_tree(fs_info);
if (ret) {
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 8eb6191..191f281 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -320,7 +320,7 @@ enum {
Opt_enospc_debug, Opt_subvolrootid, Opt_defrag, Opt_inode_cache,
Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
Opt_check_integrity, Opt_check_integrity_including_extent_data,
-   Opt_check_integrity_print_mask, Opt_fatal_errors,
+   Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
Opt_err,
 };
 
@@ -360,6 +360,7 @@ static match_table_t tokens = {
{Opt_check_integrity, "check_int"},
{Opt_check_integrity_including_extent_data, "check_int_data"},
{Opt_check_integrity_print_mask, "check_int_print_mask=%d"},
+   {Opt_rescan_uuid_tree, "rescan_uuid_tree"},
{Opt_fatal_errors, "fatal_errors=%s"},
{Opt_err, NULL},
 };
@@ -554,6 +555,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
case Opt_space_cache:
btrfs_set_opt(info->mount_opt, SPACE_CACHE);
break;
+   case Opt_rescan_uuid_tree:
+   btrfs_set_opt(info->mount_opt, RESCAN_UUID_TREE);
+   break;
case Opt_no_space_cache:
printk(KERN_INFO "btrfs: disabling disk space caching\n");
btrfs_clear_opt(info->mount_opt, SPACE_CACHE);
@@ -928,6 +932,8 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
seq_puts(seq, ",space_cache");
else
seq_puts(seq, ",nospace_cache");
+   if (btrfs_test_opt(root, RESCAN_UUID_TREE))
+   seq_puts(seq, ",rescan_uuid_tree");
if (btrfs_test_opt(root, CLEAR_CACHE))
seq_puts(seq, ",clear_cache");
if (btrfs_test_opt(root, USER_SUBVOL_RM_ALLOWED))
-- 
1.8.3

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
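
For readers unfamiliar with how such a mount option is represented, the
patch above boils down to one more bit in a flag word plus an extra
condition on the mount path. The sketch below models that in userspace.
The bit positions follow the patch, but the DEMO_/demo_ names are
hypothetical, and demo_test_opt() takes the flag word directly, whereas
the patch passes a root to btrfs_test_opt().

```c
#include <assert.h>

/* Bit positions as in the patch; the DEMO_ names are illustrative. */
#define DEMO_MOUNT_PANIC_ON_FATAL_ERROR (1UL << 22)
#define DEMO_MOUNT_RESCAN_UUID_TREE     (1UL << 23)

/* Same token-pasting pattern as btrfs_set_opt()/btrfs_clear_opt(). */
#define demo_set_opt(o, opt)   ((o) |= DEMO_MOUNT_##opt)
#define demo_clear_opt(o, opt) ((o) &= ~DEMO_MOUNT_##opt)
#define demo_test_opt(o, opt)  (((o) & DEMO_MOUNT_##opt) != 0)

/*
 * Mirrors the condition added in open_ctree(): check the UUID tree if
 * the generation check asked for it or the mount option forces it.
 */
static int demo_should_check_uuid_tree(unsigned long mount_opt,
				       int check_uuid_tree)
{
	return check_uuid_tree || demo_test_opt(mount_opt, RESCAN_UUID_TREE);
}
```

Setting the bit corresponds to passing rescan_uuid_tree in the mount
options; the show_options side then only has to test the same bit.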


Re: [PATCH v5 1/8] Btrfs: introduce a tree for items that map UUIDs to something

2013-06-20 Thread Zach Brown

 +/* for items that use the BTRFS_UUID_KEY */
 +#define BTRFS_UUID_ITEM_TYPE_SUBVOL  0 /* for UUIDs assigned to subvols */
 +#define BTRFS_UUID_ITEM_TYPE_RECEIVED_SUBVOL 1 /* for UUIDs assigned to
 +* received subvols */
 +
 +/* a sequence of such items is stored under the BTRFS_UUID_KEY */
 +struct btrfs_uuid_item {
 + __le16 type;/* refer to BTRFS_UUID_ITEM_TYPE* defines above */
 + __le32 len; /* number of following 64bit values */
 + __le64 subid[0];/* sequence of subids */
 +} __attribute__ ((__packed__));
 +

[...]

  /*
 + * Stores items that allow to quickly map UUIDs to something else.
 + * These items are part of the filesystem UUID tree.
 + * The key is built like this:
 + * (UUID_upper_64_bits, BTRFS_UUID_KEY, UUID_lower_64_bits).
 + */
 +#if BTRFS_UUID_SIZE != 16
 +#error UUID items require BTRFS_UUID_SIZE == 16!
 +#endif
 +#define BTRFS_UUID_KEY   251

Why do we need this btrfs_uuid_item structure?  Why not set the key type
to either _SUBVOL or _RECEIVED_SUBVOL instead of embedding structs with
those types under items with the constant BTRFS_UUID_KEY?  Then use the
item size to determine the number of u64 subids.  Then the item has a
simple array of u64s in the data which will be a lot easier to work
with.
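
A rough userspace sketch of the layout proposed here, under several
assumptions: the two key type values and every demo_* name are
hypothetical, and treating the first 8 UUID bytes (read little-endian)
as the "upper" half is a guess that the quoted patch does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-type keys replacing the single BTRFS_UUID_KEY. */
#define DEMO_UUID_KEY_SUBVOL          251
#define DEMO_UUID_KEY_RECEIVED_SUBVOL 252

/* Simplified stand-in for a tree key: (objectid, type, offset). */
struct demo_key {
	uint64_t objectid;
	uint8_t  type;
	uint64_t offset;
};

/* Read 8 bytes as a little-endian u64, independent of host byte order. */
static uint64_t demo_get_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Build (first UUID half, type, second UUID half) as the search key. */
static void demo_uuid_to_key(const uint8_t uuid[16], uint8_t type,
			     struct demo_key *key)
{
	assert(key != NULL);
	key->objectid = demo_get_le64(uuid);
	key->type = type;
	key->offset = demo_get_le64(uuid + 8);
}

/* Without a per-item header, the subid count follows from the size. */
static size_t demo_subid_count(uint32_t item_size)
{
	return item_size / sizeof(uint64_t);
}
```

With the type carried in the key, the item payload is a bare array of
little-endian u64 subids, so a 24-byte item holds exactly three of them
and no type or length field has to be parsed or validated.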

 +/* btrfs_uuid_item */
 +BTRFS_SETGET_FUNCS(uuid_type, struct btrfs_uuid_item, type, 16);
 +BTRFS_SETGET_FUNCS(uuid_len, struct btrfs_uuid_item, len, 32);
 +BTRFS_SETGET_STACK_FUNCS(stack_uuid_type, struct btrfs_uuid_item, type, 16);
 +BTRFS_SETGET_STACK_FUNCS(stack_uuid_len, struct btrfs_uuid_item, len, 32);

This would all go away.

 +/*
 + * One key is used to store a sequence of btrfs_uuid_item items.
 + * Each item in the sequence contains a type information and a sequence of
 + * ids (together with the information about the size of the sequence of ids).
 + * {[btrfs_uuid_item type0 {id0, id1, ..., idN}],
 + *  ...,
 + *  [btrfs_uuid_item typeZ {id0, id1, ..., idN}]}
 + *
 + * It is forbidden to put multiple items with the same type under the same key.
 + * Instead the sequence of ids is extended and used to store any additional
 + * ids for the same item type.

This constraint, and the cost of ensuring it and repairing violations,
would go away.

 +static struct btrfs_uuid_item *btrfs_match_uuid_item_type(
 + struct btrfs_path *path, u16 type)
 +{
 + struct extent_buffer *eb;
 + int slot;
 + struct btrfs_uuid_item *ptr;
 + u32 item_size;
 +
 + eb = path->nodes[0];
 + slot = path->slots[0];
 + ptr = btrfs_item_ptr(eb, slot, struct btrfs_uuid_item);
 + item_size = btrfs_item_size_nr(eb, slot);
 + do {
 + u16 sub_item_type;
 + u64 sub_item_len;
 +
 + if (item_size < sizeof(*ptr)) {
 + pr_warn("btrfs: uuid item too short (%lu < %d)!\n",
 + (unsigned long)item_size, (int)sizeof(*ptr));
 + return NULL;
 + }
 + item_size -= sizeof(*ptr);
 + sub_item_type = btrfs_uuid_type(eb, ptr);
 + sub_item_len = btrfs_uuid_len(eb, ptr);
 + if (sub_item_len * sizeof(u64) > item_size) {
 + pr_warn("btrfs: uuid item too short (%llu > %lu)!\n",
 + (unsigned long long)(sub_item_len *
 +  sizeof(u64)),
 + (unsigned long)item_size);
 + return NULL;
 + }
 + if (sub_item_type == type)
 + return ptr;
 + item_size -= sub_item_len * sizeof(u64);
 + ptr = 1 + (struct btrfs_uuid_item *)
 + (((char *)ptr) + (sub_item_len * sizeof(u64)));
 + } while (item_size);

 +static int btrfs_uuid_tree_lookup_prepare(struct btrfs_root *uuid_root,
 +   u8 *uuid, u16 type,
 +   struct btrfs_path *path,
 +   struct btrfs_uuid_item **ptr)
 +{
 + int ret;
 + struct btrfs_key key;
 +
 + if (!uuid_root) {
 + WARN_ON_ONCE(1);
 + ret = -ENOENT;
 + goto out;
 + }
 +
 + btrfs_uuid_to_key(uuid, &key);
 +
 + ret = btrfs_search_slot(NULL, uuid_root, &key, path, 0, 0);
 + if (ret < 0)
 + goto out;
 + if (ret > 0) {
 + ret = -ENOENT;
 + goto out;
 + }
 +
 + *ptr = btrfs_match_uuid_item_type(path, type);
 + if (!*ptr) {
 + ret = -ENOENT;
 + goto out;
 + }
 +
 + ret = 0;
 +
 +out:
 + return ret;
 +}

All of this is replaced with the simple search_slot in the caller.

 + offset = (unsigned long)ptr;
 + while (sub_item_len > 0) {
 + u64 data;
 +
 + read_extent_buffer(eb, &data, offset, sizeof(data));
 + data = le64_to_cpu(data);
 + 

Re: hang on 3.9, 3.10-rc5

2013-06-20 Thread Sage Weil
On Wed, 19 Jun 2013, Sage Weil wrote:
 Hi Chris,
 
 On Tue, 18 Jun 2013, Chris Mason wrote:
  [...]
  Very long way of saying I think we're one release_path short.  Sage, I
  haven't tested this at all yet, I was hoping to trigger it first.
  
  diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
  index c276ac9..c1954b3 100644
  --- a/fs/btrfs/tree-log.c
  +++ b/fs/btrfs/tree-log.c
  @@ -3730,6 +3730,7 @@ next_slot:
   log_extents:
  if (fast_search) {
  btrfs_release_path(dst_path);
  +   btrfs_release_path(path);
  ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
  if (ret) {
  err = ret;
 
 This seems to be doing the trick.  I'll keep testing overnight, but so far 
 so good!

...and it's still holding up well in QA.

Thanks, Chris!
sage


Re: hang on 3.9, 3.10-rc5

2013-06-20 Thread Chris Mason
Quoting Sage Weil (2013-06-20 17:56:19)
 On Wed, 19 Jun 2013, Sage Weil wrote:
  Hi Chris,
  
  On Tue, 18 Jun 2013, Chris Mason wrote:
   [...]
   Very long way of saying I think we're one release_path short.  Sage, I
   haven't tested this at all yet, I was hoping to trigger it first.
   
   diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
   index c276ac9..c1954b3 100644
   --- a/fs/btrfs/tree-log.c
   +++ b/fs/btrfs/tree-log.c
   @@ -3730,6 +3730,7 @@ next_slot:
log_extents:
   if (fast_search) {
   btrfs_release_path(dst_path);
   +   btrfs_release_path(path);
   ret = btrfs_log_changed_extents(trans, root, inode, dst_path);
   if (ret) {
   err = ret;
  
  This seems to be doing the trick.  I'll keep testing overnight, but so far 
  so good!
 
 ...and it's still holding up well in QA.

Awesome, thanks for getting the traces for us.  Looks like this one has
been around since v3.7, so I'm not going to try and sneak it into the
3.10 final.  I'll have it in the next merge window and for stable.

-chris



Re: hang on 3.9, 3.10-rc5

2013-06-20 Thread Sage Weil
On Thu, 20 Jun 2013, Chris Mason wrote:
 Quoting Sage Weil (2013-06-20 17:56:19)
  On Wed, 19 Jun 2013, Sage Weil wrote:
   Hi Chris,
   
   On Tue, 18 Jun 2013, Chris Mason wrote:
[...]
Very long way of saying I think we're one release_path short.  Sage, I
haven't tested this at all yet, I was hoping to trigger it first.

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c276ac9..c1954b3 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3730,6 +3730,7 @@ next_slot:
 log_extents:
if (fast_search) {
btrfs_release_path(dst_path);
+   btrfs_release_path(path);
ret = btrfs_log_changed_extents(trans, root, inode, 
dst_path);
if (ret) {
err = ret;
   
   This seems to be doing the trick.  I'll keep testing overnight, but so 
   far 
   so good!
  
  ...and it's still holding up well in QA.
 
 Awesome, thanks for getting the traces for us.  Looks like this one has
 been around since v3.7, so I'm not going to try and sneak it into the
 3.10 final.  I'll have it in the next merge window and for stable.

Weird, these same tests have been running on it nightly for ages and it 
seems like these failures just started with 3.9.  Perhaps some other 
change made it hang when it didn't before?

In any case, thanks!
sage


Re: hang on 3.9, 3.10-rc5

2013-06-20 Thread Chris Mason
Quoting Sage Weil (2013-06-20 21:00:21)
 On Thu, 20 Jun 2013, Chris Mason wrote:
  Awesome, thanks for getting the traces for us.  Looks like this one has
  been around since v3.7, so I'm not going to try and sneak it into the
  3.10 final.  I'll have it in the next merge window and for stable.
 
 Weird, these same tests have been running on it nightly for ages and it 
 seems like these failures just started with 3.9.  Perhaps some other 
 change made it hang when it didn't before?
 

It's always possible, there are a ton of moving pieces here.  The
wait_event you were hung on was waiting for crcs to finish, and that
part at least isn't new.

Somewhat unrelated, but are you still using notreelog?

-chris



Re: hang on 3.9, 3.10-rc5

2013-06-20 Thread Sage Weil
On Thu, 20 Jun 2013, Chris Mason wrote:
 Quoting Sage Weil (2013-06-20 21:00:21)
  On Thu, 20 Jun 2013, Chris Mason wrote:
   Awesome, thanks for getting the traces for us.  Looks like this one has
   been around since v3.7, so I'm not going to try and sneak it into the
   3.10 final.  I'll have it in the next merge window and for stable.
  
  Weird, these same tests have been running on it nightly for ages and it 
  seems like these failures just started with 3.9.  Perhaps some other 
  change made it hang when it didn't before?
  
 
 It's always possible, there are a ton of moving pieces here.  The
 wait_event you were hung on was waiting for crcs to finish, and that
 part at least isn't new.

K.  There was also a shift of writes to leveldb (which does the mmap 
thing), so that may explain the change in behavior.

 Somewhat unrelated, but are you still using notreelog?

Nope, just noatime.

Thanks-
sage


Re: [PATCH 4/4] Btrfs-progs: exhance btrfs-image to restore image onto multiple disks

2013-06-20 Thread Chris Mason
Quoting Liu Bo (2013-06-20 08:05:30)
 This adds a 'btrfs-image -m' option, which let us restore an image that
 is built from a btrfs of multiple disks onto several disks altogether.

I'd like to pull this in, could you please rebase it against my current
master?

Thanks!

-chris


Re: hang on 3.9, 3.10-rc5

2013-06-20 Thread Chris Mason
Quoting Jon Nelson (2013-06-18 13:19:04)
 Josef Bacik jbacik at fusionio.com writes:
 
  
  On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
   I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this 
   is a known problem?  In this case there is no powercycling; just a 
   regular 
   ceph-osd workload.
 
 ..
 
 
 I'm able to cause a complete kernel hang by defrag'ing even one 
 file on 3.9.X (3.9.0 through 3.9.4, so far).

I'm not able to reproduce this here.  Could you please capture the
output from sysrq-w during the hang?  (It will all go to your dmesg)

-chris



Re: [PATCH 4/4] Btrfs-progs: exhance btrfs-image to restore image onto multiple disks

2013-06-20 Thread Liu Bo
On Thu, Jun 20, 2013 at 09:10:24PM -0400, Chris Mason wrote:
 Quoting Liu Bo (2013-06-20 08:05:30)
  This adds a 'btrfs-image -m' option, which let us restore an image that
  is built from a btrfs of multiple disks onto several disks altogether.
 
 I'd like to pull this in, could you please rebase it against my current
 master?

Yeah, I'll rebase it now.

thanks,
liubo

 
 Thanks!
 
 -chris


Re: hang on 3.9, 3.10-rc5

2013-06-20 Thread Jon Nelson
Is this what you are looking for?
After this, the CPU gets stuck and I have to reboot.


[360491.932226] [ cut here ]
[360491.932261] kernel BUG at
/home/abuild/rpmbuild/BUILD/kernel-desktop-3.9.6/linux-3.9/fs/btrfs/ctree.c:1144!
[360491.932312] invalid opcode:  [#1] PREEMPT SMP
[360491.932344] Modules linked in: xfs nilfs2 jfs usb_storage
nls_iso8859_1 nls_cp437 vfat fat mmc_block nfsv4 auth_rpcgss nfs
fscache lockd sunrpc tun snd_usb_audio snd_usbmidi_lib snd_rawmidi
snd_seq_device fuse xt_tcpudp xt_pkttype xt_LOG xt_limit af_packet
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT
iptable_raw xt_CT iptable_filter ip6table_mangle
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4
nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter
ip6_tables x_tables arc4 iwldvm mac80211 snd_hda_codec_hdmi
snd_hda_codec_conexant iTCO_wdt iTCO_vendor_support mperf coretemp
kvm_intel kvm snd_hda_intel snd_hda_codec microcode snd_hwdep snd_pcm
thinkpad_acpi snd_timer joydev pcspkr sr_mod snd tpm_tis iwlwifi cdrom
e1000e wmi tpm battery ac tpm_bios cfg80211 sdhci_pci soundcore ptp
sdhci snd_page_alloc i2c_i801 rfkill pps_core mmc_core lpc_ich
mfd_core tcp_westwood sg autofs4 btrfs raid6_pq zlib_deflate xor
libcrc32c sha256_generic dm_crypt dm_mod crc32_pclmul
ghash_clmulni_intel crc32c_intel aesni_intel ablk_helper cryptd lrw
aes_x86_64 xts gf128mul thermal i915 drm_kms_helper drm i2c_algo_bit
video button processor thermal_sys scsi_dh_rdac scsi_dh_hp_sw
scsi_dh_emc scsi_dh_alua scsi_dh ata_generic ata_piix
[360491.933095] CPU 3
[360491.933110] Pid: 22166, comm: btrfs-endio-wri Not tainted
3.9.6-1.g8ead728-desktop #1 LENOVO 4239CTO/4239CTO
[360491.933161] RIP: 0010:[a022075b]  [a022075b]
__tree_mod_log_rewind+0x23b/0x240 [btrfs]
[360491.933225] RSP: 0018:88014cad7888  EFLAGS: 00010297
[360491.933253] RAX:  RBX: 880065705b40 RCX:
88014cad7828
[360491.933289] RDX: 0247cc54 RSI: 88014949e821 RDI:
8801bf916640
[360491.933326] RBP: 880073048490 R08: 1000 R09:
88014cad7838
[360491.933362] R10:  R11:  R12:
8801c0556640
[360491.933398] R13: 0001731e R14: 003d R15:
880158b76340
[360491.933435] FS:  () GS:88021e2c()
knlGS:
[360491.933476] CS:  0010 DS:  ES:  CR0: 80050033
[360491.933506] CR2: 7f0a360017f8 CR3: 00015be03000 CR4:
000407e0
[360491.933543] DR0:  DR1:  DR2:

[360491.933579] DR3:  DR6: 0ff0 DR7:
0400
[360491.933617] Process btrfs-endio-wri (pid: 22166, threadinfo
88014cad6000, task 8801a5f8a6c0)
[360491.933662] Stack:
[360491.933674]  88011b573800 88010126e7f0 0001
1600
[360491.933719]  880073048490 a0228d45 6db6db6db6db6db7
8801c0556640
[360491.933763]  88020e764000 8800 0001731e
880197cb8158
[360491.933807] Call Trace:
[360491.933848]  [a0228d45] btrfs_search_old_slot+0x635/0x950 [btrfs]
[360491.933909]  [a02a1ec6]
__resolve_indirect_refs+0x156/0x640 [btrfs]
[360491.934044]  [a02a2e0c] find_parent_nodes+0x95c/0x1050 [btrfs]
[360491.934176]  [a02a3592] btrfs_find_all_roots+0x92/0x100 [btrfs]
[360491.934307]  [a02a401e] iterate_extent_inodes+0x16e/0x370 [btrfs]
[360491.934440]  [a02a42b8]
iterate_inodes_from_logical+0x98/0xc0 [btrfs]
[360491.934572]  [a024c1c8] record_extent_backrefs+0x68/0xe0 [btrfs]
[360491.934652]  [a0256d80]
btrfs_finish_ordered_io+0x150/0x990 [btrfs]
[360491.934739]  [a0276ef3] worker_loop+0x153/0x560 [btrfs]
[360491.934833]  [810697c3] kthread+0xb3/0xc0
[360491.934864]  [815dc6bc] ret_from_fork+0x7c/0xb0
[360491.934896] DWARF2 unwinder stuck at ret_from_fork+0x7c/0xb0
[360491.934925]
[360491.934934] Leftover inexact backtrace:
[360491.934934]
[360491.934965]  [81069710] ? kthread_create_on_node+0x120/0x120
[360491.934999] Code: c1 48 63 43 58 48 89 c2 48 c1 e2 05 48 8d 54 10
65 48 63 43 2c 48 89 c6 48 c1 e6 05 48 8d 74 30 65 e8 3a af 04 00 e9
b3 fe ff ff 0f 0b 0f 0b 90 41 57 41 56 41 55 41 54 55 48 89 fd 53 48
83 ec
[360491.935188] RIP  [a022075b]
__tree_mod_log_rewind+0x23b/0x240 [btrfs]
[360491.935233]  RSP 88014cad7888
[360491.946047] ---[ end trace 1475a0830dcadf9c ]---
[360491.946051] note: btrfs-endio-wri[22166] exited with preempt_count 1

On Thu, Jun 20, 2013 at 8:11 PM, Chris Mason chris.ma...@fusionio.com wrote:
 Quoting Jon Nelson (2013-06-18 13:19:04)
 Josef Bacik jbacik at fusionio.com writes:

 
  On Tue, Jun 11, 2013 at 11:43:30AM -0400, Sage Weil wrote:
   I'm also seeing this hang regularly with both 3.9 and 3.10-rc5.  Is this
   is a known problem?  In this case there is no powercycling; just a 
   regular
   ceph-osd 

Re: hang on 3.9, 3.10-rc5

2013-06-20 Thread Chris Murphy

On Jun 20, 2013, at 7:46 PM, Jon Nelson jnel...@jamponi.net wrote:

 Is this what you are looking for?

If you're able to reproduce while you're remoted in via ssh, then if you get 
the dmesg at least you won't have to spend time trying to save it somewhere 
since you'll have it on the remote system's terminal window.

https://www.kernel.org/doc/Documentation/sysrq.txt

So basically:
echo w > /proc/sysrq-trigger
dmesg




Chris Murphy
