Re: UI issues around RAID1

2009-11-17 Thread jim owens

Andrey Kuzmin wrote:

On Tue, Nov 17, 2009 at 12:48 AM, jim owens jow...@hp.com wrote:

But as we have said many times... if we have different
raid types active on different files, any attempt to make


Late question, but could you please explain it a bit further (or point
me to the respective discussion archive)? Did I get it right that btrfs
supports per-file raid topology? Or is it per-(sub)volume?


The design of btrfs actually allows each extent inside a file
to have a different raid type.  This probably will never happen
unless a file is written, we add disks and mount with a new
raid type, and then modify part of the file. (This may not
behave how I think, but I plan to test it someday soon.)

There is a flag on the file to allow per-file raid setting
via ioctl/fcntl.  The typical use for this would be to
make a file DUPlicate type on a simple disk.  DUPlicate acts
like a raid 1 mirror on a single drive and is the default raid
type for metadata extents.

[disclaimer] btrfs is still in development and Chris might
say it does not (or will not in the future) work like I think.


Having df report raid-adjusted numbers instead of the current raw
total storage numbers is sometimes going to give wrong answers.


I have always thought that space (both physical and logical) used by a
file-system could be accounted for correctly whatever topology, or
mixture thereof, is in effect, the only point worth discussing being the
accounting overhead. Free space, under variable topology, can of course
only be reliably reported as raw (or as an 'if you use this topology,
then you have this logical capacity left' list).


So we know the raw free blocks, but can not guarantee
how many raw blocks per new user write-block will be
consumed because we do not know what topology will be
in effect for a new write.

We could cheat and use worst-case topology numbers, assuming
all writes use the current default raid.  Of course
this ignores DUP unless it is set on the whole filesystem.

And we also have the problem of metadata, which is dynamic,
allocated in large chunks, and DUP by default: how do we
account for that in worst-case calculations?

The worst case is probably wrong, but it may be more useful
for people who want to know when they will run out of space. Or at
least it might make some of our ENOSPC complaints go away :)

Only raw and worst-case can be explained to users, and
which one we report is up to Chris.  Today we report raw.

After spending 10 years on a multi-volume filesystem that
had (unsolvable) confusing df output, I'm just of the
opinion that nothing we do will make everyone happy.

But feel free to run a patch proposal by Chris.

jim
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RFC: Btrfs snapshots feature proposal for Fedora 13

2009-11-17 Thread Chris Ball
Hi,

I've written up a draft of a Fedora 13 feature proposal for
filesystem rollback using Btrfs snapshots that are automatically
created by yum:

   https://fedoraproject.org/wiki/Features/SystemRollbackWithBtrfs

It'd be great to get feedback on whether this is a good idea, and how
the UI interaction should work.  We're also discussing it in this
fedora-devel thread:

   http://thread.gmane.org/gmane.linux.redhat.fedora.devel/123695

Some comments I've already received from the thread:

* People want the UI to allow independent active snapshots per
  filesystem (i.e. btrfs /home is the live filesystem, and btrfs / is
  an older snapshot).

* Several people think that the ZFS Time Slider patches to nautilus¹
  look good, and want that for btrfs.  Sounds plausible, but I'm
  more interested in first working on ways to let developers feel
  comfortable upgrading to the development version of Fedora each
  day with the possibility of reverting.

* Instead of inventing a new system-config-blah, this should probably
  be part of Palimpsest².

* Perhaps we should encourage people using the Fedora installer with
  btrfs to create a rootfs separate to their /home, so that they can
  rollback rootfs snapshots without affecting their homedir.

Thanks!

- Chris.

¹:  http://blogs.sun.com/erwann/entry/zfs_on_the_desktop_zfs
http://blogs.sun.com/erwann/entry/time_slider_screencast
http://blogs.sun.com/erwann/entry/new_time_slider_features_in

²:  http://library.gnome.org/users/palimpsest/stable/intro.html.en
-- 
Chris Ball   c...@laptop.org
One Laptop Per Child


Re: RFC: Btrfs snapshots feature proposal for Fedora 13

2009-11-17 Thread Goffredo Baroncelli
On Tuesday 17 November 2009, Chris Ball wrote:
 Hi,
 
 I've written up a draft of a Fedora 13 feature proposal for
 filesystem rollback using Btrfs snapshots that are automatically
 created by yum:
 
https://fedoraproject.org/wiki/Features/SystemRollbackWithBtrfs
 
 It'd be great to get feedback on whether this is a good idea, and how
 the UI interaction should work.  We're also discussing it in this
 fedora-devel thread:
 
http://thread.gmane.org/gmane.linux.redhat.fedora.devel/123695
 
 Some comments I've already received from the thread:
 
 * People want the UI to allow independent active snapshots per
   filesystem (i.e. btrfs /home is the live filesystem, and btrfs / is
   an older snapshot).

On the basis of some empirical tests, I discovered that in btrfs a snapshot 
doesn't affect the other subvolume(s). If / (root) and /home are 
different subvolumes, a snapshot of / (root) doesn't affect the /home 
content, and vice versa. 

So if the root and the /home directory (or better, the users' directories) are 
separate subvolumes, you have the required behavior.

 
 * Several people think that the ZFS Time Slider patches to nautilus¹
   look good, and want that for btrfs.  Sounds plausible, but I'm
   more interested in first working on ways to let developers feel
   comfortable upgrading to the development version of Fedora each
   day with the possibility of reverting.
 
 * Instead of inventing a new system-config-blah, this should probably
   be part of Palimpsest².
 
 * Perhaps we should encourage people using the Fedora installer with
   btrfs to create a rootfs separate to their /home, so that they can
   rollback rootfs snapshots without affecting their homedir.

On the basis of my tests, I think it is sufficient to create a subvolume for 
the root ('/') and one for /home (or a specific subvolume for every user). 
Then it is possible to snapshot and time-slide every subvolume without 
affecting the others.

I would like to add a comment of my own: I think that snapshot is not the 
best name for what btrfs snapshots do. I think a better term is branch. 

For example, the btrfs snapshot capability may be used not only for recovering 
from a mistake, but also for maintaining different 
configurations...

 Thanks!
 
 - Chris.

BR
G.Baroncelli
 
 ¹:  http://blogs.sun.com/erwann/entry/zfs_on_the_desktop_zfs
 http://blogs.sun.com/erwann/entry/time_slider_screencast
 http://blogs.sun.com/erwann/entry/new_time_slider_features_in
 
 ²:  http://library.gnome.org/users/palimpsest/stable/intro.html.en
 -- 
 Chris Ball   c...@laptop.org
 One Laptop Per Child
 


-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) kreijackATinwind.it
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512




Re: UI issues around RAID1

2009-11-17 Thread Andrey Kuzmin
On Tue, Nov 17, 2009 at 6:25 PM, jim owens jow...@hp.com wrote:
 snip
 So we know the raw free blocks, but can not guarantee
 how many raw blocks per new user write-block will be
 consumed because we do not know what topology will be
 in effect for a new write.

 We could cheat and use worst-case topology numbers
 if all writes are the current default raid.  Of course
 this ignores DUP unless it is set on the whole filesystem.

 And we also have the problem of metadata - which is dynamic
 and allocated in large chunks and has a DUP type, how do we
 account for that in worst-case calculations.

 The worst-case is probably wrong but may be more useful to
 people to know when they will run out of space. Or at least
 it might make some of our ENOSPC complaints go away :)

 Only raw and worst-case can be explained to users and
 which we report is up to Chris.  Today we report raw.

 After spending 10 years on a multi-volume filesystem that
 had (unsolvable) confusing df output, I'm just of the
 opinion that nothing we do will make everyone happy.

df is user-centric, and therefore is naturally expected to return
used/available _logical_ capacity (how this translates to used
physical space is up to file-system-specific tools to find
out/report). Returning raw is counter-intuitive and causes surprise
similar to that of Roland.

With topology configuration this flexible (down to per-file), the only
option I see for df to return available logical capacity is to compute
the latter from the file-system object for which df is invoked. For
instance, 'df /path/to/some/file' could return the logical capacity for
the mountpoint where some-file resides, computed from the underlying
physical capacity available _and_ the topology for this file. 'df
/mount-point' would, under this implementation, return the available
logical capacity assuming the default topology for the referenced
file-system.

As to used logical space accounting, this is file-system-specific and
I'm not yet familiar enough with the btrfs code-base to argue for any
approach.

Regards,
Andrey

 But feel free to run a patch proposal by Chris.

 jim



Somewhat borked filesystem in need of fixing

2009-11-17 Thread Nuutti Kotivuori
Hello,

I recently told here about a broken filesystem after btrfs-convert and
friends, and I was informed that this was a known bug, fixed in later
releases. I should've reformatted the partition and copied in the files
anew, but I didn't, since this isn't a crucial production system for me.

So, after a day or two, I got into a situation where I had done a hibernate
and several suspend/resume cycles and had exhausted the swap space, so the
OOM killer was running wild.

And I managed to get an endless stream of errors to the system log that
looked more or less like this:

  parent transid verify failed on 39620608 wanted 5946 found 5944

I don't know if I'd be able to reproduce the problem, even if I spent a lot
of time on it, but at least for now this was just a one-off case.

So I rebooted the machine, and got the same stream of errors upon
mounting the filesystem. 500 per second. So I tried fsck, and it gave
the same error and promptly segmentation faulted. btrfs-image did the
same I think (though I'm not sure about this one).

Anyhow, I'd love to have a more reliable btrfsck that is able to fix
all kinds of corruption, even the kinds that are never supposed to
happen. So, personally, I'd like to work on this issue until btrfsck is
able to fix the filesystem into perfect working order - or at least into
such a good state that files can be copied off of it.

However, if you'd rather debug the mount problems, that can be done as
well - though that's more of a problem since I will have to do it in a
virtual machine as to not mess up my server.

Anyhow, long story short, here's the error reported by btrfsck:

---
$ sudo ./btrfsck /dev/mapper/perspire-root
parent transid verify failed on 39620608 wanted 5946 found 5944
Segmentation fault
---

And here's valgrind telling what goes awry:

---
parent transid verify failed on 39620608 wanted 5946 found 5944
==17536== Invalid read of size 4
==17536==at 0x40F9AB: btrfs_print_leaf (ctree.h:1411)
==17536==by 0x40C066: btrfs_lookup_extent_info (extent-tree.c:1450)
==17536==by 0x4023F2: check_extents (btrfsck.c:2509)
==17536==by 0x405004: main (btrfsck.c:2829)
==17536==  Address 0xc4 is not stack'd, malloc'd or (recently) free'd
---

This is with the current head of btrfs-progs-unstable.

Thank you in advance,
-- Naked


Re: Somewhat borked filesystem in need of fixing

2009-11-17 Thread Yan, Zheng
On Wed, Nov 18, 2009 at 4:48 AM, Nuutti Kotivuori na...@iki.fi wrote:
 Hello,

 I recently told here about a broken filesystem after btrfs-convert and
 friends, and I was informed that this was a known bug and fixed in later
 releases. I should've reformatted the partition and copied in the files
 anew, but I didn't, since this isn't a crucial production system for me.

 So, after a day or two, I got in a case where I had done a hibernate,
 several suspend/resume cycles and exhausted the swap space so OOM killer
 was running wild and stuff.

 And I managed to get an endless stream of errors to the system log that
 looked more or less like this:

  parent transid verify failed on 39620608 wanted 5946 found 5944

 I don't know if I'd be able to reproduce the problem, if spending a lot
 of time on it, but atleast for now, this was just a one time off case.

 So I rebooted the machine, and got the same stream of errors upon
 mounting the filesystem. 500 per second. So I tried fsck, and it gave
 the same error and promptly segmentation faulted. btrfs-image did the
 same I think (though I'm not sure about this one).

 Anyhow, I'd love to have a more reliable btrfsck, that is able to fix
 all kinds of corruptions, even then ones that are never supposed to
 happen. So, personally, I'd like to work on this issue until btrfsck is
 able to fix the filesystem in to perfect working order - or atleast in
 such a good state, that files can be copied off from it.

 However, if you'd rather debug the mount problems, that can be done as
 well - though that's more of a problem since I will have to do it in a
 virtual machine as to not mess up my server.

 Anyhow, long story short, here's the error reported by btrfsck:

 ---
 $ sudo ./btrfsck /dev/mapper/perspire-root
 parent transid verify failed on 39620608 wanted 5946 found 5944
 Segmentation fault
 ---

 And here's valgrind telling what goes awry:

 ---
 parent transid verify failed on 39620608 wanted 5946 found 5944
 ==17536== Invalid read of size 4
 ==17536==    at 0x40F9AB: btrfs_print_leaf (ctree.h:1411)
 ==17536==    by 0x40C066: btrfs_lookup_extent_info (extent-tree.c:1450)
 ==17536==    by 0x4023F2: check_extents (btrfsck.c:2509)
 ==17536==    by 0x405004: main (btrfsck.c:2829)
 ==17536==  Address 0xc4 is not stack'd, malloc'd or (recently) free'd
 ---

 This is with the current head of btrfs-progs-unstable.


You can try mounting the FS in read-only mode and copying the files out.
If you still get that error, try making verify_parent_transid() in disk-io.c
always return 0. That's all we can do for now.

Yan, Zheng


Re: [PATCH] Snapshot/subvolume listing feature

2009-11-17 Thread TARUISI Hiroaki
The patches for the snapshot/subvolume listing feature have
been revised.

With these patches, the ioctl searches for subvolumes
under only one tree, and btrfsctl repeatedly builds the
parameters and calls the ioctl until all subvolumes
are listed.

The internal logic was changed, but the user interface
(the option and the contents of the list) remains the same
as in the patch I posted before.

Regards,
taruisi

TARUISI Hiroaki wrote:
 Thank you for your advice.
 
 I'm aware of the redundant search, but I didn't think of a
 getdents-like interface.
 
 I'll remake it without the redundant search.
 
 Regards,
 taruisi
 
 Yan, Zheng wrote:
 2009/11/16 TARUISI Hiroaki taruishi.hir...@jp.fujitsu.com:
 I made Snapshot/subvolume listing feature.

 This feature consists of two patches, for kernel(ioctl),
 and for progs(btrfsctl). I send these two patches as response
 of this mail soon.

 New option '-l' is introduced to btrfsctl for listing.

 If this option is specified, btrfsctl calls a new ioctl. The new ioctl
 searches the root tree and enumerates subtrees. For each subtree, the
 ioctl finds the directory path to the tree root, and enumerates
 further descendants until no more subtrees are found.

 MANPAGE-like option description and examples are as follows.

  OPTIONS
-l _file_
List all snapshot/subvolume directories under a tree
which _file_ belongs to.

  EXAMPLES
# btrfsctl -l /work/btrfs
Base path = /work/btrfs/
No.Tree ID  Subvolume Relative Path
 1 256  ss1/
 2 257  ss2/
 3 258  svs1/ss1/
 4 259  svs1/ss2/
 5 260  svs2/ss1/
 6 261  svs2/ss2/
 7 262  ss3/
 8 263  ss4/
 9 264  sv_pool/
10 265  sv_pool/ss01/
11 266  sv_pool/ss02/
12 267  sv_pool/ss03/
13 268  sv_pool/ss04/
14 269  sv_pool/ss05/
15 270  sv_pool/ss06/
16 271  sv_pool/ss07/
17 272  sv_pool/ss08/
18 273  sv_pool/ss09/
19 274  sv_pool/ss10/
  operation complete
  Btrfs v0.19-9-gd67dad2

 Thank you for doing this.

 I had a quick look at the patches. It seems the ioctl returns the full path
 to each subvolume and uses a sequence ID to indicate the progress
 of the listing. Every time the ioctl is called, it builds the full list of
 subvolumes, then skips entries that were already returned.  I think the API is
 suboptimal; a getdents-like API is better. (The ioctl would only list subvolumes
 within a given subvolume, and the user program would call the ioctl recursively
 to list all subvolumes.)

 Yan, Zheng


[PATCH] Subvolume listing feature for ioctl.

2009-11-17 Thread TARUISI Hiroaki
A new ioctl to list the subvolumes/snapshots under the
tree that a specified file belongs to is introduced.

Signed-off-by: TARUISI Hiroaki taruishi.hir...@jp.fujitsu.com
---
 fs/btrfs/ioctl.c |  283 +++
 fs/btrfs/ioctl.h |   29 +
 2 files changed, 312 insertions(+)

Index: b/fs/btrfs/ioctl.c
===
--- a/fs/btrfs/ioctl.c  2009-11-12 23:47:05.0 +0900
+++ b/fs/btrfs/ioctl.c  2009-11-18 13:51:05.0 +0900
@@ -48,6 +48,7 @@
 #include "print-tree.h"
 #include "volumes.h"
 #include "locking.h"
+#include "ctree.h"

 /* Mask out flags that are inappropriate for the given type of inode. */
 static inline __u32 btrfs_mask_flags(umode_t mode, __u32 flags)
@@ -738,6 +739,286 @@ out:
return ret;
 }

+/*
+  Search INODE_REFs to identify the path name of the 'dirid' directory
+  in the 'tree_id' tree, and set the path name in 'name'.
+*/
+static noinline int btrfs_search_path_in_tree(struct btrfs_fs_info *info,
+   u64 tree_id, u64 dirid, char *name)
+{
+   struct btrfs_root *root;
+   struct btrfs_key key;
+   char *name_stack, *ptr;
+   int ret = -1;
+   int slot;
+   int len;
+   int total_len = 0;
+   struct btrfs_inode_ref *iref;
+   struct extent_buffer *l;
+   struct btrfs_path *path;
+
+   if (dirid == BTRFS_FIRST_FREE_OBJECTID) {
+   name[0]='\0';
+   ret = 0;
+   goto out_direct;
+   }
+
+   path = btrfs_alloc_path();
+   name_stack = kzalloc(BTRFS_PATH_NAME_MAX+1, GFP_NOFS);
+   ptr = &name_stack[BTRFS_PATH_NAME_MAX];
+
+   key.objectid = tree_id;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+   root = btrfs_read_fs_root_no_name(info, &key);
+
+   key.objectid = dirid;
+   key.type = BTRFS_INODE_REF_KEY;
+   key.offset = 0;
+
+   while (1) {
+   ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+   if (ret < 0)
+   goto out;
+
+   l = path->nodes[0];
+   slot = path->slots[0];
+   btrfs_item_key_to_cpu(l, &key, slot);
+
+   if (ret > 0 && (key.objectid != dirid ||
+   key.type != BTRFS_INODE_REF_KEY))
+   goto out;
+
+   iref = btrfs_item_ptr(l, slot, struct btrfs_inode_ref);
+   len = btrfs_inode_ref_name_len(l, iref);
+   ptr -= len + 1;
+   total_len += len + 1;
+   if (ptr < name_stack)
+   goto out;
+
+   *(ptr + len) = '/';
+   read_extent_buffer(l, ptr, (unsigned long)(iref + 1), len);
+
+   if (key.offset == BTRFS_FIRST_FREE_OBJECTID)
+   break;
+
+   btrfs_release_path(root, path);
+   key.objectid = key.offset;
+   key.offset = 0;
+   dirid = key.objectid;
+
+   }
+   if (ptr < name_stack)
+   goto out;
+   strncpy(name, ptr, total_len);
+   name[total_len]='\0';
+   ret = 0;
+out:
+   btrfs_release_path(root, path);
+   kfree(path);
+   kfree(name_stack);
+
+out_direct:
+   return ret;
+}
+
+static inline char *btrfs_path_ptr(struct btrfs_ioctl_subvol_leaf *l,
+   int nr)
+{
+   return ((char *)l + l->items[nr].path_offset);
+}
+
+/*
+  Helper function to search tree root directory which contains
+  specified dentry.
+  This function is used in btrfs_ioctl_snap_listing function,
+  to notify root directory(different from the directory what
+  user specified) to user.
+*/
+static noinline struct dentry *btrfs_walkup_dentry_to_root(struct dentry *d)
+{
+   u64 ino;
+   struct dentry *dent = d;
+
+   ino = dent->d_inode->i_ino;
+   while (ino != BTRFS_FIRST_FREE_OBJECTID) {
+   dent = dent->d_parent;
+   ino = dent->d_inode->i_ino;
+   }
+   return dent;
+}
+
+/*
+  Create a list of Snapshot/Subvolume in specified tree.
+  Target tree is specified by struct file.
+*/
+static noinline int btrfs_ioctl_snap_listing(struct file *file,
+void __user *arg)
+{
+   struct btrfs_ioctl_subvol_leaf *leaf;
+   struct btrfs_ioctl_subvol_args *svol;
+   int rest, offset, idx, name_len, i;
+   struct btrfs_root *tree_root;
+   struct btrfs_root_ref *ref;
+   struct extent_buffer *l;
+   struct btrfs_path *path = NULL;
+   struct btrfs_key key;
+   u64 dirid;
+   char *work_path, *f_path, *name;
+   int err, ret = 0, slot = 0;
+   LIST_HEAD(pending_subvols);
+   struct path vfs_path;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   work_path = kzalloc(BTRFS_PATH_NAME_MAX + 1, GFP_NOFS);
+   if (!work_path) 

[PATCH] Subvolume listing feature for btrfsctl.

2009-11-17 Thread TARUISI Hiroaki
A new feature to list the subvolumes/snapshots under the
tree that a specified file belongs to is added to btrfsctl.

This patch should be applied together with the corresponding
kernel patch.

Signed-off-by: TARUISI Hiroaki taruishi.hir...@jp.fujitsu.com
---
 btrfsctl.c |  191 +
 ioctl.h|   32 ++
 2 files changed, 223 insertions(+)

Index: b/btrfsctl.c
===
--- a/btrfsctl.c2009-11-16 11:18:57.0 +0900
+++ b/btrfsctl.c2009-11-18 12:09:37.0 +0900
@@ -47,6 +47,7 @@ static void print_usage(void)
 {
printf("usage: btrfsctl [ -d file|dir] [ -s snap_name subvol|tree ]\n");
printf("[-r size] [-A device] [-a] [-c] [-D dir .]\n");
+   printf("[-l dir]\n");
printf("\t-d filename: defragments one file\n");
printf("\t-d directory: defragments the entire Btree\n");
printf("\t-s snap_name dir: creates a new snapshot of dir\n");
@@ -56,6 +57,7 @@ static void print_usage(void)
printf("\t-a: scans all devices for Btrfs filesystems\n");
printf("\t-c: forces a single FS sync\n");
printf("\t-D: delete snapshot\n");
+   printf("\t-l file: listing snapshot/subvolume under a subvolume\n");
printf("%s\n", BTRFS_BUILD_VERSION);
exit(1);
 }
@@ -88,6 +90,169 @@ static int open_file_or_dir(const char *
}
return fd;
 }
+
+static noinline int btrfs_gather_under_one_subvolume(int fd,
+   unsigned long command,
+   struct btrfs_ioctl_subvol_args *svargs,
+   u64 tree_id,
+   struct list_head *list,
+   int len)
+{
+   u64 last_tree = 0ULL;
+   int i, ret = 1, local_size;
+
+   while (ret > 0) {
+
+   svargs->leaf = malloc(len);
+   if (!svargs->leaf)
+   return -1;
+   svargs->len = len;
+   svargs->leaf->len = len;
+   svargs->leaf->nritems = 0;
+   svargs->leaf->last_tree = last_tree;
+   svargs->leaf->parent_tree = tree_id;
+
+again:
+   ret = ioctl(fd, command, svargs);
+   if (ret < 0) {
+   free(svargs->leaf);
+   svargs->leaf = NULL;
+   return -1;
+   }
+   if (svargs->leaf->nritems == 0) {
+   free(svargs->leaf);
+   if (ret > 0) {
+   local_size = (svargs->next_len + 1) * 2 +
+   offsetof(struct btrfs_ioctl_subvol_leaf, items) +
+   sizeof(struct btrfs_ioctl_subvol_items) * 2;
+   svargs->leaf = malloc(local_size);
+   if (!svargs->leaf)
+   return -1;
+   svargs->len = local_size;
+   svargs->leaf->len = local_size;
+   svargs->leaf->last_tree = last_tree;
+   svargs->leaf->parent_tree = tree_id;
+   goto again;
+   }
+   } else {
+   for (i = 0; i < svargs->leaf->nritems; i++)
+   INIT_LIST_HEAD(&svargs->leaf->items[i].children);
+   list_add_tail(&svargs->leaf->brother, list);
+   last_tree = svargs->leaf->last_tree;
+   }
+   }
+   return 0;
+}
+
+int btrfs_gather_subvolumes(int fd, unsigned long command,
+   struct btrfs_ioctl_subvol_args *svargs,
+   u64 tree_id, struct list_head *list_top, int len)
+{
+   struct btrfs_ioctl_subvol_leaf *cur;
+   int i;
+
+   if (btrfs_gather_under_one_subvolume(fd, command, svargs, tree_id,
+   list_top, len))
+   return -1;
+   list_for_each_entry(cur, list_top, brother) {
+   for (i = 0; i < cur->nritems; i++) {
+   if (btrfs_gather_subvolumes(fd, command, svargs,
+   cur->items[i].tree_id,
+   &cur->items[i].children, len))
+   return -1;
+   }
+   }
+   return 0;
+}
+
+int btrfs_free_subvolume_info(struct list_head *list_top)
+{
+   struct btrfs_ioctl_subvol_leaf *cur, *tmp;
+   int i;
+
+   list_for_each_entry_safe(cur, tmp, list_top, brother) {
+   for (i = 0; i < cur->nritems; i++) {
+   if (!list_empty(&cur->items[i].children))
+   btrfs_free_subvolume_info(&cur->items[i].children);
+   }
+   list_del(&cur->brother);
+   free(cur);
+   }
+   return 0;
+}
+
+int btrfs_show_subvolume(struct list_head *list_top, char *path,
+   int *seq)
+{
+   int nr = *seq, i;
+   int base_path_len, path_len;
+   struct