Re: Moving top level to a subvolume

2012-06-13 Thread C Anthony Risinger
On Fri, Jun 8, 2012 at 2:40 PM, Arne Jansen sensi...@gmx.net wrote:
 On 06/08/2012 09:24 PM, Matthew Hawn wrote:
 I just converted my root filesystem to btrfs with btrfs-convert.  However, 
 since I am running Ubuntu, I would like to have the same subvolume structure 
 as a default install. How do I move the top-level subvolume (where all my 
 files currently are) to another subvolume?

 Just snapshot the root subvol and continue working in the snapshot.

... yeah but that solution totally sucks when you:

a) have a lot of data
b) need to do this via script
c) ???

... because in a), data will be *copied* the slow way, and in b) you
leave a bunch of junk lying around in the old root that will rot
unless you `rm -rf` it ... and idk about you, but issuing what is very
near to that command on someone else's machine -- via script -- makes
me REALLY uneasy ;-)

i have asked this exact question at least 4 times specifically, and
referenced it probably 8-10, in the last 3 years or more.  i needed it
then.  i still need it now.  but since i never got an answer up/down
or around, i gave up and told people to `rm -rf` themselves ...

http://markmail.org/message/7hj5ioqrztkeerqv

... that's from May of 2010, but i don't think it's the first.

so, would it be possible to implement this, or could someone kindly (and
briefly!) explain why it cannot be done?

1. people install stuff to the top-level
2. top-level is unmanageable
3.  problem

in my case i wrote an initramfs hook that implemented rollback
functionality, but there was no way for me to cleanly -- and safely
-- rotate the user's setup to one that DOES NOT have user items in
the top-level volume.

-- 

C Anthony
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Moving top level to a subvolume

2012-06-13 Thread Arne Jansen
On 13.06.2012 09:04, C Anthony Risinger wrote:
 On Fri, Jun 8, 2012 at 2:40 PM, Arne Jansen sensi...@gmx.net wrote:
 On 06/08/2012 09:24 PM, Matthew Hawn wrote:
 I just converted my root filesystem to btrfs with btrfs-convert.  However, 
 since I am running Ubuntu, I would like to have the same subvolume 
 structure as a default install. How do I move the top-level subvolume 
 (where all my files currently are) to another subvolume?

 Just snapshot the root subvol and continue working in the snapshot.
 
 ... yeah but that solution totally sucks when you:
 
 a) have a lot of data
 b) need to do this via script
 c) ???
 
 ... because in a), data will be *copied* the slow way, and in b) you
 leave a bunch of junk lying around in the old root that will rot
 unless you `rm -rf` it ... and idk about you, but issuing what is very
 near to that command on someone else's machine -- via script -- makes
 me REALLY uneasy ;-)

well, don't put data in the top level in the first place. Yes, you have
to remove the content of the subvol / by rm -rf, but I don't really see
the problem with it.
What I don't understand is why you think data will be copied.

 
 i have asked this exact question at least 4 times specifically, and
 referenced it probably 8-10, in the last 3 years or more.  i needed it
 then.  i still need it now.  but since i never got an answer up/down
 or around, i gave up and told people to `rm -rf` themselves ...
 
 http://markmail.org/message/7hj5ioqrztkeerqv
 
 ... that's from May of 2010, but i don't think it's the first.
 
 so, would it be possible to implement this, or could someone kindly (and
 briefly!) explain why it cannot be done?

The default subvol ('/') has the special number 5 and is expected to
always be around. All other subvols get numbers starting with 256.
Creating a new 5 and internally renumbering the old 5 isn't easy, because
each tree block has an owner recorded in it. Also, all backreferences
have the root number in them. If you have to touch each tree block, you
might as well choose the snapshot/rm -rf approach.

 
 1. people install stuff to the top-level
 2. top-level is unmanageable
 3.  problem
 
 in my case i wrote an initramfs hook that implemented rollback
 functionality, but there was no way for me to cleanly -- and safely
 -- rotate the user's setup to one that DOES NOT have user items in
 the top-level volume.

Can't you instead add code to the installer that warns a user if he wants
to install into the default subvol?
Or you could hack mkfs.btrfs to always create an additional subvol.
Even making / readonly except for creating mountpoints could be possible.
Just some random ideas...

-Arne

 



Re: Moving top level to a subvolume

2012-06-13 Thread Duncan
Fajar A. Nugraha posted on Wed, 13 Jun 2012 08:49:47 +0700 as excerpted:

 As for "lose their filesystems", are there recent cases that use one of
 the three distros above, and are purely btrfs's fault? The ones I can
 remember (from posts to this list) were broken on earlier kernels, or
 caused by bad disks.

I tried btrfs during the 3.4 cycle for a bit, and didn't lose the whole 
filesystem, but definitely found it not up to the standard of robustness 
I'm used to from reiserfs, my previous (and now once again current) 
filesystem and Chris's former project.

My system's old and has a bit of a problem with overheating in the 
Phoenix summer, so has been suffering SATA resets (not the disk, the sata 
chipset most likely, and/or issues with the graphics overheating since 
I'm using an AMD 8xxx chipset with AGPGART split between IOMMU for 
storage I/O and graphics) and full system freezes.

Not only did I have way more stuff disappearing or being zeroed out than 
on reiserfs (in default data=ordered mode), but in one case I had a 
segment disappear out of the middle of a file, and in another, I had 
firefox's crash-resume-file /content/ show up as what SHOULD have been an 
entirely unrelated configuration file.

Naturally I had backups to restore from, and if it wasn't for the 
freezes, it would have likely been fine, but it's exactly this sort of 
corner-case that filesystems need to be able to deal with, and what 
bothered me wasn't disappearing or zeroed out last few seconds of work 
with well documented explanations, but having random segments of files 
that I hadn't changed (whether the app was rewriting them with the same 
data's another question) in some time disappear, and having one file's 
content show up with an entirely unrelated name.  I thought that's the 
sort of thing btrfs checksums were supposed to detect and effectively 
zero out, but...

I decided that's /too/ experimental for me ATM, especially with not-quite-
stable hardware (it's worth noting that I survived bad memory and the 
related crashes on reiserfs, without /that/ sort of damage, at least not 
since data=ordered mode!), so am back on reiserfs for now, anyway.  I'll 
likely try again next year sometime...

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Btrfs and data nocow per inode basis

2012-06-13 Thread Liu Bo
On 06/13/2012 05:10 AM, Ted Ts'o wrote:

 On Tue, Jun 12, 2012 at 04:44:23PM -0400, Chris Mason wrote:
 On Tue, Jun 12, 2012 at 01:15:27PM -0600, Ted Ts'o wrote:
 It appears the NOCOW_FL flag is currently a no-op in the 3.2 kernel?
 It's not a noop, but it is only setting the NODATACOW flag.  It needs to
 set the nodatasum flag as well, just like the mount -o nodatacow mount
 option does.

 I'll fix this up on the kernel side, thanks Ted.
 


ohh, that's my fault...sorry.

 Here's the final patch to e2fsprogs that will be going into 1.42.4:
 


This commit lacks the related usage update; I'll send a patch for it :)

thanks,
liubo

 commit 5a23c93aeb65d61892a47f8f27bffad38f4759ea
 Author: Theodore Ts'o ty...@mit.edu
 Date:   Tue Jun 12 17:09:39 2012 -0400
 
 lsattr, chattr: add support for btrfs's No_COW flag
 
 Signed-off-by: Theodore Ts'o ty...@mit.edu
 
 diff --git a/lib/e2p/pf.c b/lib/e2p/pf.c
 index f03193c..e2f8ce5 100644
 --- a/lib/e2p/pf.c
 +++ b/lib/e2p/pf.c
 @@ -49,6 +49,7 @@ static struct flags_name flags_array[] = {
   { EXT2_TOPDIR_FL, "T", "Top_of_Directory_Hierarchies" },
   { EXT4_EXTENTS_FL, "e", "Extents" },
   { EXT4_HUGE_FILE_FL, "h", "Huge_file" },
 + { FS_NOCOW_FL, "C", "No_COW" },
   { 0, NULL, NULL }
  };
  
 diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
 index f46a1a9..fb3f7cc 100644
 --- a/lib/ext2fs/ext2_fs.h
 +++ b/lib/ext2fs/ext2_fs.h
 @@ -301,6 +301,7 @@ struct ext2_dx_countlimit {
  #define EXT4_EXTENTS_FL  0x00080000 /* Inode uses extents */
  #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
  /* EXT4_EOFBLOCKS_FL 0x00400000 was here */
 +#define FS_NOCOW_FL  0x00800000 /* Do not cow file */
  #define EXT4_SNAPFILE_FL 0x01000000  /* Inode is a snapshot */
  #define EXT4_SNAPFILE_DELETED_FL 0x04000000  /* Snapshot is being deleted */
  #define EXT4_SNAPFILE_SHRUNK_FL  0x08000000  /* Snapshot shrink has completed */
 diff --git a/misc/chattr.1.in b/misc/chattr.1.in
 index 92f6d70..5a57d2c 100644
 --- a/misc/chattr.1.in
 +++ b/misc/chattr.1.in
 @@ -64,6 +64,15 @@ this file compresses data before storing them on the disk. 
  Note: please
  make sure to read the bugs and limitations section at the end of this
  document.
  .PP
 +A file with the 'C' attribute set will not be subject to copy-on-write
 +updates.  This flag is only supported on file systems which perform
 +copy-on-write.  (Note: For btrfs, the 'C' flag should be only
 +set on new or empty files.  If it is set on a file which already has
 +data blocks, it is undefined when the blocks assigned to the file will
 +be fully stable.  If the 'C' flag is set on a directory, it will have no
 +effect on the directory, but new files created in that directory will
 +have the No_COW attribute.)
 +.PP
  When a directory with the `D' attribute set is modified,
  the changes are written synchronously on the disk; this is equivalent to
  the `dirsync' mount option applied to a subset of the files.
 @@ -159,8 +168,7 @@ maintained by Theodore Ts'o ty...@alum.mit.edu.
  .SH BUGS AND LIMITATIONS
  The `c', 's',  and `u' attributes are not honored 
  by the ext2 and ext3 filesystems as implemented in the current mainline
 -Linux kernels.These attributes may be implemented
 -in future versions of the ext2 and ext3 filesystems.
 +Linux kernels.
  .PP
  The `j' option is only useful if the filesystem is mounted as ext3.
  .PP
 diff --git a/misc/chattr.c b/misc/chattr.c
 index 8a2d61f..141ea6e 100644
 --- a/misc/chattr.c
 +++ b/misc/chattr.c
 @@ -107,6 +107,7 @@ static const struct flags_char flags_array[] = {
   { EXT2_UNRM_FL, 'u' },
   { EXT2_NOTAIL_FL, 't' },
   { EXT2_TOPDIR_FL, 'T' },
 + { FS_NOCOW_FL, 'C' },
   { 0, 0 }
  };
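
For readers who want to see what `chattr +C` amounts to, here is a minimal sketch in Python, not part of the e2fsprogs patch. It assumes 64-bit Linux (the two ioctl request numbers below are the `_IOR('f', 1, long)` and `_IOW('f', 2, long)` encodings for that ABI) and takes `FS_NOCOW_FL` as 0x00800000 from `<linux/fs.h>`. The helper names `with_nocow` and `create_nocow_file` are made up for illustration. Following the man page note in the patch, the flag is set on a freshly created, still empty file.

```python
import fcntl
import os
import struct

# Assumption: request numbers as encoded on 64-bit Linux.
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_NOCOW_FL = 0x00800000  # "Do not cow file"


def with_nocow(flags):
    """Return the inode flag word with the No_COW bit added."""
    return flags | FS_NOCOW_FL


def create_nocow_file(path):
    """Create an empty file and set the 'C' attribute before any data is written."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    try:
        # The kernel reads and writes a 32-bit int through this pointer.
        raw = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("i", 0))
        flags = struct.unpack("i", raw)[0]
        fcntl.ioctl(fd, FS_IOC_SETFLAGS, struct.pack("i", with_nocow(flags)))
    finally:
        os.close(fd)
```

Calling `create_nocow_file()` on a filesystem that does not support the flag is expected to raise `OSError`; the path passed in is whatever the caller chooses.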
  


[PATCH] E2fsprogs: add missing usage for No_COW

2012-06-13 Thread Liu Bo
Add the missing usage for No_COW since we now support the No_COW flag.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 misc/chattr.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/misc/chattr.c b/misc/chattr.c
index 141ea6e..24254cc 100644
--- a/misc/chattr.c
+++ b/misc/chattr.c
@@ -83,7 +83,7 @@ static unsigned long sf;
 static void usage(void)
 {
 	fprintf(stderr,
-		_("Usage: %s [-RVf] [-+=AacDdeijsSu] [-v version] files...\n"),
+		_("Usage: %s [-RVf] [-+=AacDdeijsSuC] [-v version] files...\n"),
 		program_name);
 	exit(1);
 }
-- 
1.6.5.2



Re: [PATCH] E2fsprogs: add missing usage for No_COW

2012-06-13 Thread Roman Mamedov
On Wed, 13 Jun 2012 15:47:13 +0800
Liu Bo liubo2...@cn.fujitsu.com wrote:

 Add the missing usage for No_COW since we now support the No_COW flag.
 
 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  misc/chattr.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/misc/chattr.c b/misc/chattr.c
 index 141ea6e..24254cc 100644
 --- a/misc/chattr.c
 +++ b/misc/chattr.c
 @@ -83,7 +83,7 @@ static unsigned long sf;
  static void usage(void)
  {
   fprintf(stderr,
 - _("Usage: %s [-RVf] [-+=AacDdeijsSu] [-v version] files...\n"),
 + _("Usage: %s [-RVf] [-+=AacDdeijsSuC] [-v version] files...\n"),
   program_name);
   exit(1);
  }

These were sorted alphabetically so the better way would be to use AaCcDdeijsSu

-- 
With respect,
Roman

~~~
Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free.




RE: Bug in btrfs-debug-tree for two or more devices.

2012-06-13 Thread Santosh Hosamani


-----Original Message-----
From: linux-btrfs-ow...@vger.kernel.org 
[mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Hugo Mills
Sent: Wednesday, June 13, 2012 1:37 AM
To: Santosh Hosamani
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Bug in btrfs-debug-tree for two or more devices.

On Tue, Jun 12, 2012 at 06:53:00AM +0000, Santosh Hosamani wrote:

 Hi btrfs folks,
 I am working on the btrfs filesystem, looking at how it manages free 
 space. I found that btrfs maintains a ctree which records the physical 
 location of the chunks and stripes of the filesystem.
 btrfs-debug-tree also gives information on the chunk tree.

 I created btrfs on a single device and on two devices. I have attached the 
 btrfs-debug-tree output for both.
 For a single device, the sum of all the chunk lengths adds up to the total 
 used bytes, which is the expected behavior.

 But for two devices, the sum of all the chunk lengths does not add up to the 
 total bytes used. Am I missing something?

   Without actually seeing the details of your technique and expectations, I 
shall make a guess that you're not accounting for the double-counting of RAID-1 
metadata. In other words, you will find that all of the metadata device extents 
(or chunks) will appear twice -- once on each device.

   Actually, this isn't quite right either -- what you really need to do is 
look at the RAID-1, RAID-10 and DUP bits in the chunk flags, add up all of 
those chunks, divide by two, and then add in the remaining
(RAID-0 and single) chunks. That total should then add up to the total value of 
allocated space that you get from the output of btrfs fi df.


chunk tree
leaf 20971520 items 8 free space 3023 generation 4 owner 3
fs uuid 23f86d1e-038a-4f5b-b87c-2ba78018135c
chunk uuid db672366-6801-4f83-99ef-2087a60bb394
    item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 3897 itemsize 98
        dev item devid 1 total_bytes 3221225472 bytes used 673579008
    item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 3799 itemsize 98
        dev item devid 2 total_bytes 3221225472 bytes used 652607488
    item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) itemoff 3719 itemsize 80
        chunk length 4194304 owner 2 type 2 num_stripes 1
            stripe 0 devid 1 offset 0
    item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 4194304) itemoff 3639 itemsize 80
        chunk length 8388608 owner 2 type 4 num_stripes 1
            stripe 0 devid 1 offset 4194304
    item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 12582912) itemoff 3559 itemsize 80
        chunk length 8388608 owner 2 type 1 num_stripes 1
            stripe 0 devid 1 offset 12582912
    item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 3447 itemsize 112
        chunk length 8388608 owner 2 type 18 num_stripes 2
            stripe 0 devid 2 offset 1048576
            stripe 1 devid 1 offset 20971520
    item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 3335 itemsize 112
        chunk length 322109440 owner 2 type 20 num_stripes 2
            stripe 0 devid 2 offset 9437184
            stripe 1 devid 1 offset 29360128
    item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 351469568) itemoff 3223 itemsize 112
        chunk length 644218880 owner 2 type 9 num_stripes 2
            stripe 0 devid 2 offset 331546624
            stripe 1 devid 1 offset 351469568

The chunk tree will tell me where the physical stripes are, right, 
irrespective of the RAID type? Correct me if I am wrong.
If not, how can you know which blocks are occupied and which blocks are 
free?
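
As a cross-check of the accounting discussed here, the sketch below recomputes each device's "bytes used" from the chunk items in the dump. The profile bit values are the btrfs block-group flags (RAID0 = 8, RAID1 = 16, DUP = 32, RAID10 = 64). The per-stripe rule is an assumption stated in the comments, and the names `CHUNKS`, `stripe_bytes` and `bytes_used_per_device` are made up for this example.

```python
# Block-group profile bits found in the "type" field of a chunk item.
RAID0, RAID1, DUP, RAID10 = 0x8, 0x10, 0x20, 0x40

# (length, type, devid of each stripe), copied from the chunk items in the dump.
CHUNKS = [
    (4194304, 2, [1]),
    (8388608, 4, [1]),
    (8388608, 1, [1]),
    (8388608, 18, [2, 1]),
    (322109440, 20, [2, 1]),
    (644218880, 9, [2, 1]),
]


def stripe_bytes(length, chunk_type, num_stripes):
    """Physical bytes that one stripe of a chunk occupies on its device.

    Assumption: RAID0 splits the chunk evenly over its stripes, RAID10 splits
    it over mirrored pairs, and single/DUP/RAID1 stripes each hold a full copy.
    """
    if chunk_type & RAID0:
        return length // num_stripes
    if chunk_type & RAID10:
        return length // (num_stripes // 2)
    return length


def bytes_used_per_device(chunks):
    """Sum the physical stripe sizes per devid."""
    used = {}
    for length, chunk_type, devids in chunks:
        per_stripe = stripe_bytes(length, chunk_type, len(devids))
        for devid in devids:
            used[devid] = used.get(devid, 0) + per_stripe
    return used


logical_total = sum(length for length, _, _ in CHUNKS)
physical = bytes_used_per_device(CHUNKS)
print(logical_total, physical)
```

With these rules the per-device sums come out equal to the two dev items' "bytes used" values, while the plain sum of chunk lengths is smaller by exactly the two RAID-1 chunks, which are stored twice.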

Basically, what I want to do is this:
get the used blocks of all the devices, create a bitmap of them, and zero 
out all the free blocks, without overwriting any used block.
I should then be able to mount the filesystem without any error.
How do I achieve that?

 Also I notice that for the second device the superblock location 0x10000 is 
 not considered as used.

 I would be really grateful if you folks can answer my query.

 I have run these tests on SLES11-sp2-x86 Kernel 3.0.13.0.27-default

   This is pretty old, but shouldn't affect the results. It will cause 
reliability problems if you try running it seriously.

   Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- There's a Martian war machine outside -- they want to talk ---
to you about a cure for the common cold.





RE: Bug in btrfs-debug-tree for two or more devices.

2012-06-13 Thread Santosh Hosamani


-----Original Message-----
From: linux-btrfs-ow...@vger.kernel.org 
[mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Randy Barlow
Sent: Tuesday, June 12, 2012 8:28 PM
To: linux-btrfs@vger.kernel.org
Subject: Re: Bug in btrfs-debug-tree for two or more devices.

On Tuesday, June 12, 2012 06:53:00 AM Santosh Hosamani wrote:
 Kernel 3.0.13.0.27-default

This kernel is very old for btrfs. Can you try with at least Linux 3.4?

I have installed the 3.4.2 kernel but I am still facing the same issue. Maybe 
my understanding of how to calculate the used blocks is wrong.
If someone could help me understand, that would be great.

--
R


[PATCH v2] E2fsprogs: add missing usage for No_COW

2012-06-13 Thread Liu Bo
Add the missing usage for No_COW since we now support the No_COW flag.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
v1-v2: sort options alphabetically, thanks to Roman Mamedov.

 misc/chattr.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/misc/chattr.c b/misc/chattr.c
index 141ea6e..24254cc 100644
--- a/misc/chattr.c
+++ b/misc/chattr.c
@@ -83,7 +83,7 @@ static unsigned long sf;
 static void usage(void)
 {
 	fprintf(stderr,
-		_("Usage: %s [-RVf] [-+=AacDdeijsSu] [-v version] files...\n"),
+		_("Usage: %s [-RVf] [-+=AaCcDdeijsSu] [-v version] files...\n"),
 		program_name);
 	exit(1);
 }
-- 
1.6.5.2



Re: Moving top level to a subvolume

2012-06-13 Thread Fajar A. Nugraha
On Wed, Jun 13, 2012 at 2:23 PM, Duncan 1i5t5.dun...@cox.net wrote:
 Fajar A. Nugraha posted on Wed, 13 Jun 2012 08:49:47 +0700 as excerpted:

 As for "lose their filesystems", are there recent cases that use one of
 the three distros above, and are purely btrfs's fault? The ones I can
 remember (from posts to this list) were broken on earlier kernels, or
 caused by bad disks.

 My system's old and has a bit of a problem with overheating in the
 Phoenix summer, so has been suffering SATA resets

 it's exactly this sort of
 corner-case that filesystems need to be able to deal with

IIRC XFS had corruption problems when used on top of LVM (or other
block device that doesn't support barriers correctly), while using
ext2/3/4 on the same block device will be fine. Yet XFS doesn't have
the mark of "unstable, highly experimental, do not use". People simply
use the right (for them) fs for the right job.

My point is yes, btrfs is new. And it's being developed at a much faster
rate than any other more-mature fs out there. And there are known
cases of data loss in certain corner-case configurations, on buggy
hardware, and/or on old kernel versions. But when used in the correct
environment, btrfs can be a good choice, even for critical data.

Of course IF the data were REALLY critical, and I REALLY needed btrfs'
features, and it were in an enterprise environment, I would've bought
support from Oracle Linux (or SLES 12, when it's out, or whatever
enterprise distro supports btrfs and sells support contracts) so I
can have someone to turn to in case of problems, and (in some cases)
transfer the risk/blame :D

-- 
Fajar


Re: Moving top level to a subvolume

2012-06-13 Thread C Anthony Risinger
On Wed, Jun 13, 2012 at 2:21 AM, Arne Jansen sensi...@gmx.net wrote:
 On 13.06.2012 09:04, C Anthony Risinger wrote:

 a) have a lot of data
 b) need to do this via script
 c) ???

 ... because in a), data will be *copied* the slow way, and in b) you
 leave a bunch of junk lying around in the old root that will rot
 unless you `rm -rf` it ... [...]

 What I don't understand is why you think data will be copied.

at one point i tried to create a new subvol and `mv` files there, and
it took quite some time to complete
(cross-link-device-what-have-you?), but maybe things changed ... will
try it out.

 [...]

 so, would it be possible to implement this, or could someone kindly (and
 briefly!) explain why it cannot be done?

 The default subvol ('/') has the special number 5 and is expected to
 always be around. All other subvols get numbers starting with 256.
 Creating a new 5 and internally renumbering the old 5 isn't easy, because
 each tree block has an owner recorded in it. Also, all backreferences
 have the root number in them. If you have to touch each tree block, you
 might as well choose the snapshot/rm -rf approach.

ok this makes sense thanks, the last sentence especially ... top-level
volume is different.  it's identical to other subvols in 99% of ways
save one-gotcha-little-1%.  couldn't we shield ourselves a little
better?

 1. people install stuff to the top-level
 2. top-level is unmanageable
 3.  problem
 [...]

 Can't you instead add code to the installer that warns a user if he wants
 to install into the default subvol?


 Just some random ideas...

i would like to see #5 cut off from natural access: accessible by an
_explicit_ manual mount only, cannot be made default, and cannot be
removed. maybe btrfs manages a proxy/facade subvol, say, #10, settable
by `--flag-origin` or `{insert-here}` option -- a symlink to subvol?
if, at absolutely any time or whatever reason, #10 pointer should not
exist, immediately snapshot #5 and update.

#5 -> #10 -> #256+ ?

... this might allow the root to be replaced.  default is set to #10
proxy volume when FS is initialized.

 [...]
 Or you could hack mkfs.btrfs to always create an additional subvol.
 Even making / readonly except for creating mountpoint could be possible.

^ yeah, this sounds like exactly what i'm thinking, differing
mainly on who does the work... i just want a guaranteed way of
replacing the logical root, at #10.  the physical root at #5 is
more-or-less indestructible and off limits, and never available except
as a template.

... i am new to postgresql, but their template0/template1 feels
related to solving problems like this.

-- 

C Anthony


Re: Moving top level to a subvolume

2012-06-13 Thread Fajar A. Nugraha
On Wed, Jun 13, 2012 at 4:44 PM, C Anthony Risinger anth...@xtfx.me wrote:
 On Wed, Jun 13, 2012 at 2:21 AM, Arne Jansen sensi...@gmx.net wrote:
 On 13.06.2012 09:04, C Anthony Risinger wrote:

 ... because in a), data will be *copied* the slow way

 What I don't understand is why you think data will be copied.

 at one point i tried to create a new subvol and `mv` files there, and
 it took quite some time to complete
 (cross-link-device-what-have-you?), but maybe things changed ... will
 try it out.

IIRC it hasn't. Not in upstream anyway. Some distros (e.g. opensuse)
carry their own patch which allows cross-subvolume links (cp --reflink
...).

But it shouldn't matter anyway, since you can SNAPSHOT the old subvol
(even root subvol), instead of creating a new subvol. Which means
nothing needs to be copied.

You'd still have to do rm manually though.
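
To make the snapshot-then-clean-up approach from this thread concrete, here is a dry-run sketch that only builds the commands and never executes them. It assumes the top-level volume is mounted at the path passed in and that the new subvolume is called `@` (the name Ubuntu uses for its root subvolume). `plan_move_top_level` is a made-up helper, and the plan deliberately leaves out `btrfs subvolume set-default`, which needs the new subvolume's ID.

```python
import os


def plan_move_top_level(top_level, new_subvol="@"):
    """Return the commands (as argv lists) for snapshot + cleanup, without running them."""
    target = os.path.join(top_level, new_subvol)
    # Step 1: snapshot the top-level subvolume; no file data is copied.
    plan = [["btrfs", "subvolume", "snapshot", top_level, target]]
    # Step 2: remove the originals, which now also live inside the snapshot.
    for entry in sorted(os.listdir(top_level)):
        if entry != new_subvol:
            plan.append(["rm", "-rf", "--", os.path.join(top_level, entry)])
    return plan
```

Printing each argv list for review before running anything keeps the `rm -rf` step under human control, which is the part of the procedure people in this thread were uneasy about scripting. Entries that are themselves subvolumes would need `btrfs subvolume delete` rather than `rm`.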

-- 
Fajar


[PATCH 0/4 v2][RFC] apply rwlock for extent state

2012-06-13 Thread Liu Bo
This patchset addresses one of the project ideas, "RBtree lock contention":
"Btrfs uses a number of rbtrees to index in-memory data structures.
Some of these are dominated by reads, and the lock contention from searching
them is showing up in profiles.  We need to look into an RCU and sequence
counter combination to allow lockless reads."

The goal is to use RCU, but we treat that as a long-term goal; for now we use
an rwlock until we find a mature RCU structure for lockless reads.

So what we need to do is to make the code RCU friendly, and the idea mainly
comes from Chris Mason:
Quoted:
I think the extent_state code can be much more RCU friendly if we separate
the operations on the tree from operations on the individual state.
In general, we can gain a lot of performance if we are able to reduce
the write locks taken at endio time.  Especially for reads, these are
critical.

The patchset is also available in:
git://github.com/liubogithub/btrfs-work.git rwlock-for-extent-state



I've run it through xfstests, and no bugs have turned up so far.

I made a simple test to show the difference on my box:
$ cat 6_FIO/fio-4thread-4M-sync-read
[global]
group_reporting
thread
numjobs=4
bs=4M
rw=read
sync=0
ioengine=sync
directory=/mnt/btrfs/

[READ]
filename=foobar
size=4000M

*results:*
                         w/o patch    w/ patch
READ bandwidth (aggrb)   849MB/s      971MB/s

MORE TESTS ARE WELCOME!

v1-v2: drop changes on invalidatepage() and rebase to the latest btrfs 
upstream.

Liu Bo (4):
  Btrfs: use radix tree for checksum
  Btrfs: merge adjacent states as much as possible
  Btrfs: use large extent range for read and its endio
  Btrfs: apply rwlock for extent state

 fs/btrfs/extent_io.c |  712 +++---
 fs/btrfs/extent_io.h |5 +-
 fs/btrfs/inode.c |7 +-
 3 files changed, 568 insertions(+), 156 deletions(-)



[PATCH 2/4] Btrfs: merge adjacent states as much as possible

2012-06-13 Thread Liu Bo
In order to reduce write locks, we merge adjacent states as much as possible.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c |   47 +++
 1 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2923ede..081fe13 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -276,29 +276,36 @@ static void merge_state(struct extent_io_tree *tree,
 	if (state->state & (EXTENT_IOBITS | EXTENT_BOUNDARY))
 		return;
 
-	other_node = rb_prev(&state->rb_node);
-	if (other_node) {
+	while (1) {
+		other_node = rb_prev(&state->rb_node);
+		if (!other_node)
+			break;
 		other = rb_entry(other_node, struct extent_state, rb_node);
-		if (other->end == state->start - 1 &&
-		    other->state == state->state) {
-			merge_cb(tree, state, other);
-			state->start = other->start;
-			other->tree = NULL;
-			rb_erase(&other->rb_node, &tree->state);
-			free_extent_state(other);
-		}
+		if (other->end != state->start - 1 ||
+		    other->state != state->state)
+			break;
+
+		merge_cb(tree, state, other);
+		state->start = other->start;
+		other->tree = NULL;
+		rb_erase(&other->rb_node, &tree->state);
+		free_extent_state(other);
 	}
-	other_node = rb_next(&state->rb_node);
-	if (other_node) {
+
+	while (1) {
+		other_node = rb_next(&state->rb_node);
+		if (!other_node)
+			break;
 		other = rb_entry(other_node, struct extent_state, rb_node);
-		if (other->start == state->end + 1 &&
-		    other->state == state->state) {
-			merge_cb(tree, state, other);
-			state->end = other->end;
-			other->tree = NULL;
-			rb_erase(&other->rb_node, &tree->state);
-			free_extent_state(other);
-		}
+		if (other->start != state->end + 1 ||
+		    other->state != state->state)
+			break;
+
+		merge_cb(tree, state, other);
+		state->end = other->end;
+		other->tree = NULL;
+		rb_erase(&other->rb_node, &tree->state);
+		free_extent_state(other);
 	}
 }
 
-- 
1.6.5.2
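
The effect of the two loops in this patch can be modelled outside the kernel. The sketch below is an illustration only: it uses a sorted Python list of `(start, end, bits)` tuples in place of the rbtree, and `merge_state` here is a stand-in that shares only its name with the kernel function. It keeps absorbing the previous and next neighbours for as long as they are byte-adjacent and carry the same state bits, rather than looking at just one neighbour on each side.

```python
def merge_state(states, i):
    """Merge states[i] with every adjacent neighbour that has the same bits.

    `states` is a list of (start, end, bits) tuples sorted by start.
    Returns the index of the merged entry.
    """
    start, end, bits = states[i]

    # Walk backwards while the previous range ends right before ours.
    while i > 0:
        p_start, p_end, p_bits = states[i - 1]
        if p_end != start - 1 or p_bits != bits:
            break
        start = p_start
        del states[i - 1]
        i -= 1

    # Walk forwards while the next range starts right after ours.
    while i + 1 < len(states):
        n_start, n_end, n_bits = states[i + 1]
        if n_start != end + 1 or n_bits != bits:
            break
        end = n_end
        del states[i + 1]

    states[i] = (start, end, bits)
    return i
```

Three adjacent 4K ranges with identical bits collapse into one entry in a single call, which the pre-patch single-neighbour check could not do.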



[PATCH 1/4] Btrfs: use radix tree for checksum

2012-06-13 Thread Liu Bo
We used to attach a checksum to an extent state covering a 4K range for read
endio, but now we want to use larger ranges for performance, so instead we
create a radix tree for checksums, where each item holds the checksum of 4K of data.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c |   84 --
 fs/btrfs/extent_io.h |2 +
 fs/btrfs/inode.c |7 +---
 3 files changed, 23 insertions(+), 70 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2c8f7b2..2923ede 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -117,10 +117,12 @@ void extent_io_tree_init(struct extent_io_tree *tree,
 {
 	tree->state = RB_ROOT;
 	INIT_RADIX_TREE(&tree->buffer, GFP_ATOMIC);
+	INIT_RADIX_TREE(&tree->csum, GFP_ATOMIC);
 	tree->ops = NULL;
 	tree->dirty_bytes = 0;
 	spin_lock_init(&tree->lock);
 	spin_lock_init(&tree->buffer_lock);
+	spin_lock_init(&tree->csum_lock);
 	tree->mapping = mapping;
 }
 
@@ -703,15 +705,6 @@ static void cache_state(struct extent_state *state,
 	}
 }
 
-static void uncache_state(struct extent_state **cached_ptr)
-{
-	if (cached_ptr && (*cached_ptr)) {
-		struct extent_state *state = *cached_ptr;
-		*cached_ptr = NULL;
-		free_extent_state(state);
-	}
-}
-
 /*
  * set some bits on a range in the tree.  This may require allocations or
  * sleeping, so the gfp mask is used to indicate what is allowed.
@@ -1666,56 +1659,32 @@ out:
  */
 int set_state_private(struct extent_io_tree *tree, u64 start, u64 private)
 {
-	struct rb_node *node;
-	struct extent_state *state;
 	int ret = 0;
 
-	spin_lock(&tree->lock);
-	/*
-	 * this search will find all the extents that end after
-	 * our range starts.
-	 */
-	node = tree_search(tree, start);
-	if (!node) {
-		ret = -ENOENT;
-		goto out;
-	}
-	state = rb_entry(node, struct extent_state, rb_node);
-	if (state->start != start) {
-		ret = -ENOENT;
-		goto out;
-	}
-	state->private = private;
-out:
-	spin_unlock(&tree->lock);
+	spin_lock(&tree->csum_lock);
+	ret = radix_tree_insert(&tree->csum, (unsigned long)start,
+			       (void *)((unsigned long)private << 1));
+	BUG_ON(ret);
+	spin_unlock(&tree->csum_lock);
 	return ret;
 }
 
 int get_state_private(struct extent_io_tree *tree, u64 start, u64 *private)
 {
-	struct rb_node *node;
-	struct extent_state *state;
-	int ret = 0;
+	void **slot = NULL;
 
-	spin_lock(&tree->lock);
-	/*
-	 * this search will find all the extents that end after
-	 * our range starts.
-	 */
-	node = tree_search(tree, start);
-	if (!node) {
-		ret = -ENOENT;
-		goto out;
-	}
-	state = rb_entry(node, struct extent_state, rb_node);
-	if (state->start != start) {
-		ret = -ENOENT;
-		goto out;
+	spin_lock(&tree->csum_lock);
+	slot = radix_tree_lookup_slot(&tree->csum, (unsigned long)start);
+	if (!slot) {
+		spin_unlock(&tree->csum_lock);
+		return -ENOENT;
 	}
-	*private = state->private;
-out:
-	spin_unlock(&tree->lock);
-	return ret;
+	*private = (u64)(*slot) >> 1;
+
+	radix_tree_delete(&tree->csum, (unsigned long)start);
+	spin_unlock(&tree->csum_lock);
+
+	return 0;
 }
 
 /*
@@ -2294,7 +2263,6 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 	do {
 		struct page *page = bvec->bv_page;
 		struct extent_state *cached = NULL;
-		struct extent_state *state;
 
 		pr_debug("end_bio_extent_readpage: bi_vcnt=%d, idx=%d, err=%d, "
 			 "mirror=%ld\n", bio->bi_vcnt, bio->bi_idx, err,
@@ -2313,21 +2281,10 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 		if (++bvec <= bvec_end)
 			prefetchw(&bvec->bv_page->flags);
 
-		spin_lock(&tree->lock);
-		state = find_first_extent_bit_state(tree, start, EXTENT_LOCKED);
-		if (state && state->start == start) {
-			/*
-			 * take a reference on the state, unlock will drop
-			 * the ref
-			 */
-			cache_state(state, &cached);
-		}
-		spin_unlock(&tree->lock);
-
 		mirror = (int)(unsigned long)bio->bi_bdev;
 		if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) {
 			ret = tree->ops->readpage_end_io_hook(page, start, end,
-							      state, mirror);
+							      NULL, mirror);
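
As a rough illustration of the encoding used by set_state_private() and
get_state_private() in the patch above, here is a minimal userspace sketch.
It assumes a 64-bit build and a 32-bit checksum. The helper names
csum_to_slot() and slot_to_csum() are made up for this example, and the
reason suggested for the shift (keeping the lowest bit of the stored value
clear for the radix tree's own use) is an assumption, since the patch does
not state it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch, not kernel code. Names are hypothetical.
 * Pack a 32-bit checksum into a pointer-sized slot value, shifted left
 * by one bit as the patch does, so the lowest bit of the slot stays 0.
 */
static void *csum_to_slot(uint32_t csum)
{
	/* Assumption: pointers are at least 64 bits, so no bits are lost. */
	assert(sizeof(uintptr_t) >= 8);
	return (void *)((uintptr_t)csum << 1);
}

/* Undo the shift to recover the checksum from the slot value. */
static uint32_t slot_to_csum(void *slot)
{
	return (uint32_t)((uintptr_t)slot >> 1);
}
```

Note that the patch also deletes the entry on lookup, so each stored
checksum can be read back only once.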

[PATCH 3/4] Btrfs: use large extent range for read and its endio

2012-06-13 Thread Liu Bo
Use a larger extent state range for both readpages and read endio, so that
we lock and unlock less often and avoid most split operations. This reduces
the number of write locks taken at endio time.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c |  201 +-
 1 files changed, 182 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 081fe13..bb66e3c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2258,18 +2258,26 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
 	struct bio_vec *bvec = bio->bi_io_vec;
 	struct extent_io_tree *tree;
+	struct extent_state *cached = NULL;
 	u64 start;
 	u64 end;
 	int whole_page;
 	int mirror;
 	int ret;
+	u64 up_start, up_end, un_start, un_end;
+	int up_first, un_first;
+	int for_uptodate[bio->bi_vcnt];
+	int i = 0;
+
+	up_start = un_start = (u64)-1;
+	up_end = un_end = 0;
+	up_first = un_first = 1;
 
 	if (err)
 		uptodate = 0;
 
 	do {
 		struct page *page = bvec->bv_page;
-		struct extent_state *cached = NULL;
 
 		pr_debug("end_bio_extent_readpage: bi_vcnt=%d, idx=%d, err=%d, "
 			 "mirror=%ld\n", bio->bi_vcnt, bio->bi_idx, err,
@@ -2280,11 +2288,6 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 			bvec->bv_offset;
 		end = start + bvec->bv_len - 1;
 
-		if (bvec->bv_offset == 0 && bvec->bv_len == PAGE_CACHE_SIZE)
-			whole_page = 1;
-		else
-			whole_page = 0;
-
 		if (++bvec <= bvec_end)
 			prefetchw(&bvec->bv_page->flags);
 
@@ -2337,14 +2340,71 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 			}
 		}
 
+		if (uptodate)
+			for_uptodate[i++] = 1;
+		else
+			for_uptodate[i++] = 0;
+
 		if (uptodate && tree->track_uptodate) {
-			set_extent_uptodate(tree, start, end, &cached,
-					    GFP_ATOMIC);
+			if (up_first) {
+				up_start = start;
+				up_end = end;
+				up_first = 0;
+			} else {
+				if (up_start == end + 1) {
+					up_start = start;
+				} else if (up_end == start - 1) {
+					up_end = end;
+				} else {
+					set_extent_uptodate(
+							tree, up_start, up_end,
+							&cached, GFP_ATOMIC);
+					up_start = start;
+					up_end = end;
+				}
+			}
 		}
-		unlock_extent_cached(tree, start, end, &cached, GFP_ATOMIC);
+
+		if (un_first) {
+			un_start = start;
+			un_end = end;
+			un_first = 0;
+		} else {
+			if (un_start == end + 1) {
+				un_start = start;
+			} else if (un_end == start - 1) {
+				un_end = end;
+			} else {
+				unlock_extent_cached(tree, un_start, un_end,
+						     &cached, GFP_ATOMIC);
+				un_start = start;
+				un_end = end;
+			}
+		}
+	} while (bvec <= bvec_end);
+
+	cached = NULL;
+	if (up_start < up_end)
+		set_extent_uptodate(tree, up_start, up_end, &cached,
+				    GFP_ATOMIC);
+	if (un_start < un_end)
+		unlock_extent_cached(tree, un_start, un_end, &cached,
+				     GFP_ATOMIC);
+
+	i = 0;
+	bvec = bio->bi_io_vec;
+	do {
+		struct page *page = bvec->bv_page;
+
+		tree = &BTRFS_I(page->mapping->host)->io_tree;
+
+		if (bvec->bv_offset == 0 && bvec->bv_len == PAGE_CACHE_SIZE)
+			whole_page = 1;
+		else
+			whole_page = 0;
 
 		if (whole_page) {
-			if (uptodate) {
+			if (for_uptodate[i++]) {
 				SetPageUptodate(page);
 			} else {
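
The batching idea in the hunk above (grow one pending range while the
per-page ranges are contiguous, and only touch the extent tree when a gap
appears) can be sketched in userspace roughly as follows. struct range_batch
and the batch_*() helpers are hypothetical names for this illustration; the
flush counter merely stands in for the calls to set_extent_uptodate() or
unlock_extent_cached().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of the pending range kept by the patch. */
struct range_batch {
	uint64_t start;
	uint64_t end;
	int empty;	/* 1 until the first range is added */
	int flushes;	/* how many times a pending range was emitted */
};

static void batch_init(struct range_batch *b)
{
	b->start = UINT64_MAX;
	b->end = 0;
	b->empty = 1;
	b->flushes = 0;
}

/* Add [start, end]: extend the pending range if contiguous, else flush. */
static void batch_add(struct range_batch *b, uint64_t start, uint64_t end)
{
	assert(start <= end);
	if (b->empty) {
		b->start = start;
		b->end = end;
		b->empty = 0;
	} else if (b->start == end + 1) {
		b->start = start;	/* new range sits just before */
	} else if (b->end + 1 == start) {
		b->end = end;		/* new range sits just after */
	} else {
		b->flushes++;		/* stand-in for one tree operation */
		b->start = start;
		b->end = end;
	}
}

/* Emit whatever is still pending; returns the total number of flushes. */
static int batch_finish(struct range_batch *b)
{
	if (!b->empty) {
		b->flushes++;
		b->empty = 1;
	}
	return b->flushes;
}
```

With three contiguous 4K pages followed by one page elsewhere, this model
performs two flushes instead of four, which is the saving the commit
message is after.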

[PATCH 4/4] Btrfs: apply rwlock for extent state

2012-06-13 Thread Liu Bo
We used to protect both the extent state tree and an individual state's state
by tree->lock, but this is an obstacle to lockless reads.

So we separate them here:
o tree->lock protects the tree
o state->lock protects the state.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c |  380 --
 fs/btrfs/extent_io.h |3 +-
 2 files changed, 336 insertions(+), 47 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bb66e3c..4c6b743 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -27,7 +27,7 @@ static struct kmem_cache *extent_buffer_cache;
 static LIST_HEAD(buffers);
 static LIST_HEAD(states);
 
-#define LEAK_DEBUG 0
+#define LEAK_DEBUG 1
 #if LEAK_DEBUG
 static DEFINE_SPINLOCK(leak_lock);
 #endif
@@ -120,7 +120,7 @@ void extent_io_tree_init(struct extent_io_tree *tree,
 	INIT_RADIX_TREE(&tree->csum, GFP_ATOMIC);
 	tree->ops = NULL;
 	tree->dirty_bytes = 0;
-	spin_lock_init(&tree->lock);
+	rwlock_init(&tree->lock);
 	spin_lock_init(&tree->buffer_lock);
 	spin_lock_init(&tree->csum_lock);
 	tree->mapping = mapping;
@@ -146,6 +146,7 @@ static struct extent_state *alloc_extent_state(gfp_t mask)
 #endif
 	atomic_set(&state->refs, 1);
 	init_waitqueue_head(&state->wq);
+	spin_lock_init(&state->lock);
 	trace_alloc_extent_state(state, mask, _RET_IP_);
 	return state;
 }
@@ -281,6 +282,7 @@ static void merge_state(struct extent_io_tree *tree,
 		if (!other_node)
 			break;
 		other = rb_entry(other_node, struct extent_state, rb_node);
+		/* FIXME: need other->lock? */
 		if (other->end != state->start - 1 ||
 		    other->state != state->state)
 			break;
@@ -297,6 +299,7 @@ static void merge_state(struct extent_io_tree *tree,
 		if (!other_node)
 			break;
 		other = rb_entry(other_node, struct extent_state, rb_node);
+		/* FIXME: need other->lock? */
 		if (other->start != state->end + 1 ||
 		    other->state != state->state)
 			break;
@@ -364,7 +367,10 @@ static int insert_state(struct extent_io_tree *tree,
 		return -EEXIST;
 	}
 	state->tree = tree;
+
+	spin_lock(&state->lock);
 	merge_state(tree, state);
+	spin_unlock(&state->lock);
 	return 0;
 }
 
@@ -410,6 +416,23 @@ static int split_state(struct extent_io_tree *tree, struct extent_state *orig,
 	return 0;
 }
 
+static struct extent_state *
+alloc_extent_state_atomic(struct extent_state *prealloc)
+{
+	if (!prealloc)
+		prealloc = alloc_extent_state(GFP_ATOMIC);
+
+	return prealloc;
+}
+
+enum extent_lock_type {
+	EXTENT_READ    = 0,
+	EXTENT_WRITE   = 1,
+	EXTENT_RLOCKED = 2,
+	EXTENT_WLOCKED = 3,
+	EXTENT_LAST    = 4,
+};
+
 static struct extent_state *next_state(struct extent_state *state)
 {
 	struct rb_node *next = rb_next(&state->rb_node);
@@ -426,13 +449,17 @@ static struct extent_state *next_state(struct extent_state *state)
  * If no bits are set on the state struct after clearing things, the
  * struct is freed and removed from the tree
  */
-static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
-					    struct extent_state *state,
-					    int *bits, int wake)
+static int __clear_state_bit(struct extent_io_tree *tree,
+			     struct extent_state *state,
+			     int *bits, int wake, int check)
 {
-	struct extent_state *next;
 	int bits_to_clear = *bits & ~EXTENT_CTLBITS;
 
+	if (check) {
+		if ((state->state & ~bits_to_clear) == 0)
+			return 1;
+	}
+
 	if ((bits_to_clear & EXTENT_DIRTY) && (state->state & EXTENT_DIRTY)) {
 		u64 range = state->end - state->start + 1;
 		WARN_ON(range > tree->dirty_bytes);
@@ -442,7 +469,17 @@ static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
 	state->state &= ~bits_to_clear;
 	if (wake)
 		wake_up(&state->wq);
+	return 0;
+}
+
+static struct extent_state *
+try_free_or_merge_state(struct extent_io_tree *tree, struct extent_state *state)
+{
+	struct extent_state *next = NULL;
+
+	BUG_ON(!spin_is_locked(&state->lock));
 	if (state->state == 0) {
+		spin_unlock(&state->lock);
 		next = next_state(state);
 		if (state->tree) {
 			rb_erase(&state->rb_node, &tree->state);
@@ -453,18 +490,17 @@ static struct extent_state *clear_state_bit(struct extent_io_tree *tree,
 		}
 	} else {
 		merge_state(tree, state);
+		spin_unlock(&state->lock);
 		next = next_state(state);
  

Re: Massive metadata size increase after upgrade from 3.2.18 to 3.4.1

2012-06-13 Thread Anand Jain


 Did you try a balance? (There is also a balance option
 to pick the least utilized metadata chunks.)

 In the long run, once you understand your files and their
 sizes, tuning with the mount option metadata_ratio
 might help.

 But I'm not sure how the metadata expanded to 84.38GB.
 Was there any major delete operation on the filesystem?

thanks, Anand
  


On 13/06/12 01:38, Calvin Walton wrote:
> On Sat, 2012-06-09 at 01:38 +0600, Roman Mamedov wrote:
>> Hello,
>>
>> Before the upgrade (on 3.2.18):
>>
>> Metadata, DUP: total=9.38GB, used=5.94GB
>>
>> After the FS has been mounted once with 3.4.1:
>>
>> Data: total=3.44TB, used=2.67TB
>> System, DUP: total=8.00MB, used=412.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, DUP: total=84.38GB, used=5.94GB
>>
>> Where did my 75 GB of free space just went?
>
> Btrfs tries to keep a certain ratio of allocated data space to allocated
> metadata space at all times, in order to ensure that there is always
> some free metadata space available. In 3.3 (I believe, but haven't
> actually checked...) this ratio was increased, since people were still
> complaining about btrfs reporting out of space errors too soon.
>
> On a filesystem containing (a relatively small number of) large files,
> it probably over-allocates the metadata space, which is what you're
> seeing. I'm not sure if the ratio is tunable.
>
> But better to have a bit of unused metadata space than to get 'out of
> space' errors once you've filled your disk and you're trying to delete
> some files!




Re: [PATCH 1/2] Btrfs: use rcu to protect device-name V2

2012-06-13 Thread Stefan Behrens
On Wed, 13 Jun 2012 00:35:26 +0200, David Sterba wrote:
> On Tue, Jun 12, 2012 at 03:50:41PM -0400, Josef Bacik wrote:
>> +++ b/fs/btrfs/check-integrity.c
>> @@ -93,6 +93,7 @@
>>  #include "print-tree.h"
>>  #include "locking.h"
>>  #include "check-integrity.h"
>> +#include "rcu-string.h"
>>  
>>  #define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x1
>>  #define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x1
>> @@ -843,13 +844,14 @@ static int btrfsic_process_superblock_dev_mirror(
>>  		superblock_tmp->never_written = 0;
>>  		superblock_tmp->mirror_num = 1 + superblock_mirror_num;
>>  		if (state->print_mask & BTRFSIC_PRINT_MASK_SUPERBLOCK_WRITE)
>> -			printk(KERN_INFO "New initial S-block (bdev %p, %s)"
>> -				     " @%llu (%s/%llu/%d)\n",
>> -				     superblock_bdev, device->name,
>> -				     (unsigned long long)dev_bytenr,
>> -				     dev_state->name,
>> -				     (unsigned long long)dev_bytenr,
>> -				     superblock_mirror_num);
>> +			printk_in_rcu(KERN_INFO "New initial S-block (bdev %p, "
>
> can you please add the 'btrfs: ' prefixes?

Please, no additional btrfs prefix in the check-integrity printk lines
that are enabled with the print_mask option. If they are enabled, it is
for btrfs debugging, so the context is known. And you get
thousands of these lines...


 
>> +				     "%s) @%llu (%s/%llu/%d)\n",
>> +				     superblock_bdev,
>> +				     rcu_str_deref(device->name),
>> +				     (unsigned long long)dev_bytenr,
>> +				     dev_state->name,
>> +				     (unsigned long long)dev_bytenr,
>> +				     superblock_mirror_num);
>>  		list_add(&superblock_tmp->all_blocks_node,
>>  			 &state->all_blocks_list);
>>  		btrfsic_block_hashtable_add(superblock_tmp,


Computing size of snapshots approximately

2012-06-13 Thread Jan-Hendrik Palic

Hi,

We are using several LVM volumes with btrfs on a server. We want to keep 
nightly snapshots for some days as an alternative to backups.


Now I want to get the size of the snapshots in detail. Therefore I 
played with


  btrfs subvolume find-new $snapshot $gen-id.

I know that this is quite complicated and not implemented, 
so I am trying to go my own way:


Assume there are two snapshots of one subvolume, snap1 and snap2. 
Get the find-new information for each snapshot with $gen-id=1 
and save it into two separate files. A diff of these files shows the 
changes between snap1 and snap2, right?


Ok.

There are three operations on a filesystem, I think,

1. copy a file on the filesystem
2. change a file on the filesystem
3. delete a file on the filesystem

Am I right to assume that operations 1 and 2 do not change the size of a 
snapshot much, and that a delete operation increases the size of a 
snapshot by the size of the deleted files?


If so, it would be enough for me to get the files deleted between two 
snapshots, and their sizes. But is there another way to get 
this information besides btrfs subvolume find-new? Perhaps it makes 
sense to use an ioctl for it? What about the upcoming send/receive 
feature?


Are there any hints?

Many thanks in advance.

Jan


Re: [PATCH 1/2] Btrfs: use rcu to protect device-name V2

2012-06-13 Thread Josef Bacik
On Wed, Jun 13, 2012 at 12:35:26AM +0200, David Sterba wrote:
> On Tue, Jun 12, 2012 at 03:50:41PM -0400, Josef Bacik wrote:
> > +++ b/fs/btrfs/check-integrity.c
> > @@ -93,6 +93,7 @@
> >  #include "print-tree.h"
> >  #include "locking.h"
> >  #include "check-integrity.h"
> > +#include "rcu-string.h"
> >  
> >  #define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x1
> >  #define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x1
> > @@ -843,13 +844,14 @@ static int btrfsic_process_superblock_dev_mirror(
> >  		superblock_tmp->never_written = 0;
> >  		superblock_tmp->mirror_num = 1 + superblock_mirror_num;
> >  		if (state->print_mask & BTRFSIC_PRINT_MASK_SUPERBLOCK_WRITE)
> > -			printk(KERN_INFO "New initial S-block (bdev %p, %s)"
> > -				     " @%llu (%s/%llu/%d)\n",
> > -				     superblock_bdev, device->name,
> > -				     (unsigned long long)dev_bytenr,
> > -				     dev_state->name,
> > -				     (unsigned long long)dev_bytenr,
> > -				     superblock_mirror_num);
> > +			printk_in_rcu(KERN_INFO "New initial S-block (bdev %p, "
>
> can you please add the 'btrfs: ' prefixes?
>

No, I'm not changing the output of print statements in this patch, I'll leave
that up to the Strato guys.

> > +				     "%s) @%llu (%s/%llu/%d)\n",
> > +				     superblock_bdev,
> > +				     rcu_str_deref(device->name),
> > +				     (unsigned long long)dev_bytenr,
> > +				     dev_state->name,
> > +				     (unsigned long long)dev_bytenr,
> > +				     superblock_mirror_num);
> >  		list_add(&superblock_tmp->all_blocks_node,
> >  			 &state->all_blocks_list);
> >  		btrfsic_block_hashtable_add(superblock_tmp,
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index e39a3b9..7d658f2 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -44,6 +44,7 @@
> >  #include "free-space-cache.h"
> >  #include "inode-map.h"
> >  #include "check-integrity.h"
> > +#include "rcu-string.h"
> >  
> >  static struct extent_io_ops btree_extent_io_ops;
> >  static void end_workqueue_fn(struct btrfs_work *work);
> > @@ -2575,8 +2576,9 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
> >  		struct btrfs_device *device = (struct btrfs_device *)
> >  			bh->b_private;
> >  
> > -		printk_ratelimited(KERN_WARNING "lost page write due to "
> > -				   "I/O error on %s\n", device->name);
> > +		printk_in_rcu(KERN_WARNING "lost page write due to "
>
> here
>
> > +			      "I/O error on %s\n",
> > +			      rcu_str_deref(device->name));
> >  		/* note, we dont' set_buffer_write_io_error because we have
> >  		 * our own ways of dealing with the IO errors
> >  		 */
> > diff --git a/fs/btrfs/rcu-string.h b/fs/btrfs/rcu-string.h
> > new file mode 100644
> > index 000..2fbb56b
> > --- /dev/null
> > +++ b/fs/btrfs/rcu-string.h
> > @@ -0,0 +1,56 @@
> > +/*
> > + * Copyright (C) 2012 Red Hat.  All rights reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public
> > + * License v2 as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public
> > + * License along with this program; if not, write to the
> > + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
> > + * Boston, MA 021110-1307, USA.
> > + */
> > +
> > +struct rcu_string {
> > +	struct rcu_head rcu;
> > +	char str[0];
> > +};
> > +
> > +static inline struct rcu_string *rcu_string_strdup(const char *src, gfp_t mask)
> > +{
> > +	size_t len = strlen(src);
> > +	struct rcu_string *ret = kzalloc(sizeof(struct rcu_string) +
> > +					 (len * sizeof(char)), mask);
>
> len + 1 ? or is the devname not null-terminated?

Oh hey strlen doesn't include the null how about that.  I will fix, thanks.
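
A minimal userspace sketch of the corrected allocation, for illustration
only. It assumes calloc() as a stand-in for kzalloc() and leaves out the
rcu_head; struct name_string and name_string_dup() are hypothetical names,
not the kernel API, and the len field is added just to make the result easy
to check.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for struct rcu_string; names are hypothetical. */
struct name_string {
	size_t len;
	char str[];	/* flexible array member, the C99 form of str[0] */
};

static struct name_string *name_string_dup(const char *src)
{
	size_t len;
	struct name_string *ret;

	assert(src != NULL);
	len = strlen(src);	/* does not count the trailing NUL */

	/* Reserve len + 1 bytes; calloc zero-fills, so str[len] is '\0'. */
	ret = calloc(1, sizeof(*ret) + len + 1);
	if (!ret)
		return NULL;

	ret->len = len;
	memcpy(ret->str, src, len);
	return ret;
}
```

With only len bytes reserved, as in the quoted version, the copied name
would have no room for a terminator and any "%s" print of it could read
past the allocation.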

 
> > +	if (!ret)
> > +		return ret;
> > +	strncpy(ret->str, src, len);
> > +	return ret;
> > +}
> > +
> > +static inline void rcu_string_free(struct rcu_string *str)
> > +{
> > +	if (str)
> > +		kfree_rcu(str, rcu);
> > +}
> > +
> > +#define printk_in_rcu(fmt, ...) do {	\
> > +	rcu_read_lock();		\
> > +	printk(fmt, ##__VA_ARGS__);	\
>
> drop the ##
> see http://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html
>
>   #define eprintf(...) fprintf (stderr, __VA_ARGS__)
>

Well everybody else in the kernel does it that way, but if this works I'll
change it.
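
For reference, a small userspace sketch of the form suggested in the gcc
documentation, where the format string is folded into the variadic part so
that __VA_ARGS__ is never empty and no "##" is needed. format_locked() and
lock_depth are hypothetical stand-ins for printk_in_rcu() and the RCU read
lock, and snprintf() replaces printk() so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the RCU read-side critical section. */
static int lock_depth;

/*
 * The format string is part of "...", so __VA_ARGS__ always holds at
 * least one argument and no GNU "##" comma swallowing is required.
 */
#define format_locked(buf, size, ...) do {		\
	lock_depth++;					\
	snprintf((buf), (size), __VA_ARGS__);		\
	lock_depth--;					\
} while (0)
```

The trade-off is that the macro no longer names its fmt parameter; with a
named fmt and plain __VA_ARGS__, a call that passes only a format string
would leave a trailing comma in the expansion.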

> > +	rcu_read_unlock();		\
> > +} while (0)

Re: Computing size of snapshots approximately

2012-06-13 Thread Hugo Mills
On Wed, Jun 13, 2012 at 02:15:33PM +0200, Jan-Hendrik Palic wrote:
> Hi,
>
> we using on a server several lvm volumes with btrfs. We want to use
> nightly build snapshots for some days as an alternative to backups.
>
> Now I want to get the size of the snapshots in detail.

   There are basically two figures you can get for each snapshot.
These values may differ wildly. Which one do you want?

(A) The first, larger, value is the total computed size of the
   files in the subvolume. This is what du returns.

(B) The second, smaller, value is the amount of space that would be
   freed by deleting the subvolume. (Alternatively, this is the amount
   of data in the subvolume which is not shared with some other
   subvolume). It is currently a difficult process to work out this
   value in general, but the qgroups patch set will track this
   information automatically, and expose an API that will allow you to
   retrieve it.

   The qgroups patches aren't complete yet.

> Therefore I
> played with
>
>   btrfs subvolume find-new $snapshot $gen-id.
>
> And I know, that this is quite complicated and not implemented.
> Therefore I try to go my own way:
>
> Now assume there are two snapshots of one subvolume, snap1 and
> snap2. Further get the find-new informations of these snapshots with
> $gen-id=1 and save them into different files. A diff of these files
> shows the changes between snap1 and snap2, right?
>
> Ok.
>
> There are three operations on a filesystem, I think,
>
> 1. copy a file on the filesystem
> 2. change a file on the filesystem
> 3. delete a file on the filesystem
>
> Am I right to assume, that operation 1 and 2 are not change much the
> size of a snapshot and the delete operation let increase the size of
> a snapshot in the size of the deleted files?

   It depends on which measure of the two above you're trying to use,
and whether the subvolume (and file) you're modifying still has
extents shared with some other subvolume.

1. Copying a file (without --reflink) will increase both the (A) and
   the (B) size of the snapshot. Copying a file with --reflink will
   increase (A) and leave (B) much the same.

2. Changing a file will, obviously, cause (A) to change by the
   difference between the old file and the new. If that file shares no
   extents with anything else, then (B) will also change by that
   amount. Otherwise, if it shares extents with anything else (another
   subvolume, or a reflink copy), then (B) will increase by the amount
   of data modified.

3. Deleting a file will reduce (A) by the size of the file. (B) will
   reduce by the size of non-shared extents owned by that file.
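
A toy model of the two figures (A) and (B), assuming each extent simply
carries a count of the subvolumes that reference it. struct extent,
referenced_bytes() and exclusive_bytes() are invented for this sketch; real
btrfs accounting (back references, partially shared extents, sharing inside
one subvolume) is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one extent and how many subvolumes reference it. */
struct extent {
	uint64_t bytes;
	unsigned int refs;
};

/* (A): everything the subvolume references, roughly what du adds up. */
static uint64_t referenced_bytes(const struct extent *e, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += e[i].bytes;
	return total;
}

/* (B): bytes freed by deleting the subvolume, i.e. unshared extents. */
static uint64_t exclusive_bytes(const struct extent *e, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		assert(e[i].refs >= 1);
		if (e[i].refs == 1)
			total += e[i].bytes;
	}
	return total;
}
```

In this model, deleting a file in another subvolume drops the refs of the
extents it shared, which is how a read-only snapshot's (B) figure can grow
without the snapshot itself changing.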

   Note that btrfs sub find-new will not allow you to track file
deletions.

> If it is so, it would be enough for me to get the deletions of files
> between two snapshots and their size. But is there another way to
> get these informations beside btrfs subvolume find-new? Perhaps it
> makes sense to use ioctl for it? What about the send/receive
> feature, which is upcoming?
>
> Are there any hints?

   Wait for qgroups to land, because that actually does it the right
way, and will avoid you having to track all kinds of awkward (and
hard-to-find) corner cases.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Summoning his Cosmic Powers, and glowing slightly ---
from his toes... 




Re: [PATCH 1/2] Btrfs: use rcu to protect device-name V2

2012-06-13 Thread Stefan Behrens
On Wed, 13 Jun 2012 09:14:27 -0400, Josef Bacik wrote:
> On Wed, Jun 13, 2012 at 12:35:26AM +0200, David Sterba wrote:
>> On Tue, Jun 12, 2012 at 03:50:41PM -0400, Josef Bacik wrote:
>>> @@ -4694,8 +4716,11 @@ int btrfs_init_dev_stats(struct btrfs_fs_info *fs_info)
>>>  		key.offset = device->devid;
>>>  		ret = btrfs_search_slot(NULL, dev_root, &key, path, 0, 0);
>>>  		if (ret) {
>>> -			printk(KERN_WARNING "btrfs: no dev_stats entry found for device %s (devid %llu) (OK on first mount after mkfs)\n",
>>> -			       device->name, (unsigned long long)device->devid);
>>> +			printk_in_rcu(KERN_WARNING "btrfs: no dev_stats entry "
>>> +				      "found for device %s (devid %llu) (OK on"
>>> +				      " first mount after mkfs)\n",
>>
>> breaking printk strings hurts when grepping for a message
>>
>>> +				      rcu_str_deref(device->name),
>>> +				      (unsigned long long)device->devid);
>>>  			__btrfs_reset_dev_stats(device);
>>>  			device->dev_stats_valid = 1;
>>>  			btrfs_release_path(path);
>>> @@ -4747,8 +4772,9 @@ static int update_dev_stat_item(struct btrfs_trans_handle *trans,
>>>  	BUG_ON(!path);
>>>  	ret = btrfs_search_slot(trans, dev_root, &key, path, -1, 1);
>>>  	if (ret < 0) {
>>> -		printk(KERN_WARNING "btrfs: error %d while searching for dev_stats item for device %s!\n",
>>> -		       ret, device->name);
>>> +		printk_in_rcu(KERN_WARNING "btrfs: error %d while searching "
>>> +			      "for dev_stats item for device %s!\n", ret,
>>
>> and here as well
>>
>>> +			      rcu_str_deref(device->name));
>>>  		goto out;
>>>  	}
>>>  
>>> @@ -4757,8 +4783,9 @@ static int update_dev_stat_item(struct btrfs_trans_handle *trans,
>>>  		/* need to delete old one and insert a new one */
>>>  		ret = btrfs_del_item(trans, dev_root, path);
>>>  		if (ret != 0) {
>>> -			printk(KERN_WARNING "btrfs: delete too small dev_stats item for device %s failed %d!\n",
>>> -			       device->name, ret);
>>> +			printk_in_rcu(KERN_WARNING "btrfs: delete too small "
>>> +				      "dev_stats item for device %s failed %d!\n",
>>
>> here
>>
>>> +				      rcu_str_deref(device->name), ret);
>>>  			goto out;
>>>  		}
>>>  		ret = 1;
>>> @@ -4770,8 +4797,9 @@ static int update_dev_stat_item(struct btrfs_trans_handle *trans,
>>>  		ret = btrfs_insert_empty_item(trans, dev_root, path,
>>>  					      &key, sizeof(*ptr));
>>>  		if (ret < 0) {
>>> -			printk(KERN_WARNING "btrfs: insert dev_stats item for device %s failed %d!\n",
>>> -			       device->name, ret);
>>> +			printk_in_rcu(KERN_WARNING "btrfs: insert dev_stats "
>>> +				      "item for device %s failed %d!\n",
>>
>> here
>>
>>> +				      rcu_str_deref(device->name), ret);
>>>  			goto out;
>>>  		}
>>>  	}
>>
>> mostly minor things, but please fix them.
>
> I'm breaking them for the 80 char limit, it happens for all long messages, we're
> all used to it.  I'll fix up the other things.  Thanks,
>
> Josef

The last sentence of chapter 2 of Documentation/CodingStyle is quite
unambiguous. Here is the full quote of that chapter:

Chapter 2: Breaking long lines and strings

Coding style is all about readability and maintainability using commonly
available tools.

The limit on the length of lines is 80 columns and this is a strongly
preferred limit.

Statements longer than 80 columns will be broken into sensible chunks, unless
exceeding 80 columns significantly increases readability and does not hide
information. Descendants are always substantially shorter than the parent and
are placed substantially to the right. The same applies to function headers
with a long argument list. However, never break user-visible strings such as
printk messages, because that breaks the ability to grep for them.


Re: [PATCH 1/2] Btrfs: use rcu to protect device-name V2

2012-06-13 Thread Josef Bacik
On Wed, Jun 13, 2012 at 03:49:07PM +0200, Stefan Behrens wrote:
> On Wed, 13 Jun 2012 09:14:27 -0400, Josef Bacik wrote:
> > On Wed, Jun 13, 2012 at 12:35:26AM +0200, David Sterba wrote:
> >> On Tue, Jun 12, 2012 at 03:50:41PM -0400, Josef Bacik wrote:
> >>> @@ -4694,8 +4716,11 @@ int btrfs_init_dev_stats(struct btrfs_fs_info *fs_info)
> >>>  		key.offset = device->devid;
> >>>  		ret = btrfs_search_slot(NULL, dev_root, &key, path, 0, 0);
> >>>  		if (ret) {
> >>> -			printk(KERN_WARNING "btrfs: no dev_stats entry found for device %s (devid %llu) (OK on first mount after mkfs)\n",
> >>> -			       device->name, (unsigned long long)device->devid);
> >>> +			printk_in_rcu(KERN_WARNING "btrfs: no dev_stats entry "
> >>> +				      "found for device %s (devid %llu) (OK on"
> >>> +				      " first mount after mkfs)\n",
> >>
> >> breaking printk strings hurts when grepping for a message
> >>
> >>> +				      rcu_str_deref(device->name),
> >>> +				      (unsigned long long)device->devid);
> >>>  			__btrfs_reset_dev_stats(device);
> >>>  			device->dev_stats_valid = 1;
> >>>  			btrfs_release_path(path);
> >>> @@ -4747,8 +4772,9 @@ static int update_dev_stat_item(struct btrfs_trans_handle *trans,
> >>>  	BUG_ON(!path);
> >>>  	ret = btrfs_search_slot(trans, dev_root, &key, path, -1, 1);
> >>>  	if (ret < 0) {
> >>> -		printk(KERN_WARNING "btrfs: error %d while searching for dev_stats item for device %s!\n",
> >>> -		       ret, device->name);
> >>> +		printk_in_rcu(KERN_WARNING "btrfs: error %d while searching "
> >>> +			      "for dev_stats item for device %s!\n", ret,
> >>
> >> and here as well
> >>
> >>> +			      rcu_str_deref(device->name));
> >>>  		goto out;
> >>>  	}
> >>>  
> >>> @@ -4757,8 +4783,9 @@ static int update_dev_stat_item(struct btrfs_trans_handle *trans,
> >>>  		/* need to delete old one and insert a new one */
> >>>  		ret = btrfs_del_item(trans, dev_root, path);
> >>>  		if (ret != 0) {
> >>> -			printk(KERN_WARNING "btrfs: delete too small dev_stats item for device %s failed %d!\n",
> >>> -			       device->name, ret);
> >>> +			printk_in_rcu(KERN_WARNING "btrfs: delete too small "
> >>> +				      "dev_stats item for device %s failed %d!\n",
> >>
> >> here
> >>
> >>> +				      rcu_str_deref(device->name), ret);
> >>>  			goto out;
> >>>  		}
> >>>  		ret = 1;
> >>> @@ -4770,8 +4797,9 @@ static int update_dev_stat_item(struct btrfs_trans_handle *trans,
> >>>  		ret = btrfs_insert_empty_item(trans, dev_root, path,
> >>>  					      &key, sizeof(*ptr));
> >>>  		if (ret < 0) {
> >>> -			printk(KERN_WARNING "btrfs: insert dev_stats item for device %s failed %d!\n",
> >>> -			       device->name, ret);
> >>> +			printk_in_rcu(KERN_WARNING "btrfs: insert dev_stats "
> >>> +				      "item for device %s failed %d!\n",
> >>
> >> here
> >>
> >>> +				      rcu_str_deref(device->name), ret);
> >>>  			goto out;
> >>>  		}
> >>>  	}
> >>
> >> mostly minor things, but please fix them.
> >
> > I'm breaking them for the 80 char limit, it happens for all long messages, we're
> > all used to it.  I'll fix up the other things.  Thanks,
> >
> > Josef
>
> The last sentence of chapter 2 of Documentation/CodingStyle is quite
> unambiguous. Here is the full quote of that chapter:
>
> Chapter 2: Breaking long lines and strings
>
> Coding style is all about readability and maintainability using commonly
> available tools.
>
> The limit on the length of lines is 80 columns and this is a strongly
> preferred limit.
>
> Statements longer than 80 columns will be broken into sensible chunks, unless
> exceeding 80 columns significantly increases readability and does not hide
> information. Descendants are always substantially shorter than the parent and
> are placed substantially to the right. The same applies to function headers
> with a long argument list. However, never break user-visible strings such as
> printk messages, because that breaks the ability to grep for them.

Ah never seen that part of it, I will leave them alone then.  Thanks,

Josef


Re: Computing size of snapshots approximately

2012-06-13 Thread Jan-Hendrik Palic

Hi Hugo, hi all,

On 13.06.2012 15:27, Hugo Mills wrote:

> On Wed, Jun 13, 2012 at 02:15:33PM +0200, Jan-Hendrik Palic wrote:
>> Hi,
>>
>> we using on a server several lvm volumes with btrfs. We want to use
>> nightly build snapshots for some days as an alternative to backups.
>>
>> Now I want to get the size of the snapshots in detail.
>
> There are basically two figures you can get for each snapshot.
> These values may differ wildly. Which one do you want?
>
> (A) The first, larger, value is the total computed size of the
>     files in the subvolume. This is what du returns.
>
> (B) The second, smaller, value is the amount of space that would be
>     freed by deleting the subvolume. (Alternatively, this is the amount
>     of data in the subvolume which is not shared with some other
>     subvolume). It is currently a difficult process to work out this
>     value in general, but the qgroups patch set will track this
>     information automatically, and expose an API that will allow you to
>     retrieve it.
>
> The qgroups patches aren't complete yet.


Sorry, I forgot to mention that. I want the size that is freed 
when I delete a snapshot. The other assumption I forgot to mention, sorry, 
is that the snapshots do not change: users only get read-only access to 
them.


[...]

>> There are three operations on a filesystem, I think,
>>
>> 1. copy a file on the filesystem
>> 2. change a file on the filesystem
>> 3. delete a file on the filesystem
>>
>> Am I right to assume, that operation 1 and 2 are not change much the
>> size of a snapshot and the delete operation let increase the size of
>> a snapshot in the size of the deleted files?
>
> It depends on which measure of the two above you're trying to use,
> and whether the subvolume (and file) you're modifying still has
> extents shared with some other subvolume.


Sure, and honestly, this is the point where the complexity explodes 
for me. ;-)



> 1. Copying a file (without --reflink) will increase both the (A) and
>    the (B) size of the snapshot. Copying a file with --reflink will
>    increase (A) and leave (B) much the same.


Yep.


2. Changing a file will, obviously, cause (A) to change by the
difference between the old file and the new. If that file shares no
extents with anything else, then (B) will also change by that
amount. Otherwise, if it shares extents with anything else (another
subvolume, or a reflink copy), then (B) will increase by the amount
of data modified.


Yep.


3. Deleting a file will reduce (A) by the size of the file. (B) will
reduce by the size of non-shared extents owned by that file.


Yep.

I think I have got the right idea. Thanks for the explanation.
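
The two figures can be illustrated with a small toy model. This is only a sketch under stated assumptions: extents are modelled as named blobs with a size, a subvolume is a mapping from file name to a list of extent ids, and the names extents_of, total_size and exclusive_size are made up for this example. It does not query btrfs in any way.

```python
def extents_of(files):
    """Return the set of extent ids referenced by any file in a subvolume."""
    return {extent for extents in files.values() for extent in extents}


def total_size(files, extent_sizes):
    """(A): sum of the file sizes in the subvolume."""
    return sum(extent_sizes[e] for extents in files.values() for e in extents)


def exclusive_size(name, subvols, extent_sizes):
    """(B): bytes held only by this subvolume, i.e. freed by deleting it."""
    mine = extents_of(subvols[name])
    others = set()
    for other_name, files in subvols.items():
        if other_name != name:
            others |= extents_of(files)
    return sum(extent_sizes[e] for e in mine - others)


# Hypothetical data: two files, then a snapshot sharing every extent.
extent_sizes = {"e1": 100, "e2": 50}
live = {"file_A": ["e1"], "file_B": ["e2"]}
subvols = {
    "live": live,
    "snap_A": {path: list(ext) for path, ext in live.items()},
}

before = (total_size(subvols["snap_A"], extent_sizes),
          exclusive_size("snap_A", subvols, extent_sizes))

# Deleting file_A in the live subvolume leaves snap_A as the only owner of e1.
del subvols["live"]["file_A"]

after = (total_size(subvols["snap_A"], extent_sizes),
         exclusive_size("snap_A", subvols, extent_sizes))
```

In this model the (A) size of snap_A stays the same across the deletion, while its (B) size grows by the size of the extents that are no longer shared, which is the effect asked about for operation 3.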


Note that btrfs sub find-new will not allow you to track file
deletions.


Yep, I got this too. But you can get them indirectly with a diff.

You have a subvolume with a file_A on it.
Taking a snapshot snap_A of this subvolume shows that file in the
btrfs sub find-new output.


Now delete file_A on this subvolume and take a new snapshot, call it
snap_B.
The btrfs sub find-new output doesn't show it anymore. So a diff of the
two outputs, from snap_A to snap_B, gives you the deleted file.


It is a crude way, but I think it works.
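
The diff idea can be sketched as follows. Note the assumptions: this version compares plain directory listings of two mounted snapshots rather than btrfs sub find-new output, the helper names list_files and deleted_between are invented for this example, and the reported sizes are file sizes in the sense of (A), not the space that would actually be freed.

```python
import os


def list_files(root):
    """Map each file path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            # lstat so that symlinks are reported themselves, not their targets
            sizes[os.path.relpath(full, root)] = os.lstat(full).st_size
    return sizes


def deleted_between(old_snapshot, new_snapshot):
    """Return {path: size} for files present in the old snapshot only."""
    old = list_files(old_snapshot)
    new = list_files(new_snapshot)
    return {path: size for path, size in old.items() if path not in new}
```

Calling deleted_between("/path/to/snap_A", "/path/to/snap_B") (both paths hypothetical) would return the files that disappeared between the two snapshots. Walking both trees is slow on large filesystems, so this is a stopgap at best.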


If so, it would be enough for me to get the file deletions between two
snapshots and their sizes. But is there another way to get this
information besides btrfs subvolume find-new? Perhaps it makes sense to
use an ioctl for it? What about the upcoming send/receive feature?

Are there any hints?

Wait for qgroups to land, because that actually does it the right
way, and will avoid you having to track all kinds of awkward (and
hard-to-find) corner cases.


Thanks for the hint, I will have a look for that.

Best regards,
Jan


[PATCH] Btrfs: use rcu to protect device->name V3

2012-06-13 Thread Josef Bacik
Al pointed out that we can just toss out the old name on a device and add a
new one arbitrarily, so anybody who uses device->name in printk could
possibly use free'd memory.  Instead of adding locking around all of this he
suggested doing it with RCU, so I've introduced a struct rcu_string that
does just that and have gone through and protected all accesses to
device->name that aren't under the uuid_mutex with rcu_read_lock().  This
protects us and I will use it for dealing with removing the device that we
used to mount the file system in a later patch.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
V2->V3:
-fixed rcu_string_strdup to get the null character
-fixed __VA_ARGS__ usage
-undid 80 char line wrapping
-moved some rcu_strings into blocks
 fs/btrfs/check-integrity.c |   16
 fs/btrfs/disk-io.c         |   10
 fs/btrfs/extent_io.c       |    7
 fs/btrfs/ioctl.c           |   13
 fs/btrfs/rcu-string.h      |   56
 fs/btrfs/scrub.c           |   30
 fs/btrfs/volumes.c         |   92
 fs/btrfs/volumes.h         |    2
 8 files changed, 162 insertions(+), 64 deletions(-)
 create mode 100644 fs/btrfs/rcu-string.h

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 9cebb1f..da6e936 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -93,6 +93,7 @@
 #include "print-tree.h"
 #include "locking.h"
 #include "check-integrity.h"
+#include "rcu-string.h"
 
 #define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x1
 #define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x1
@@ -843,13 +844,14 @@ static int btrfsic_process_superblock_dev_mirror(
 		superblock_tmp->never_written = 0;
 		superblock_tmp->mirror_num = 1 + superblock_mirror_num;
 		if (state->print_mask & BTRFSIC_PRINT_MASK_SUPERBLOCK_WRITE)
-			printk(KERN_INFO "New initial S-block (bdev %p, %s)"
-			       " @%llu (%s/%llu/%d)\n",
-			       superblock_bdev, device->name,
-			       (unsigned long long)dev_bytenr,
-			       dev_state->name,
-			       (unsigned long long)dev_bytenr,
-			       superblock_mirror_num);
+			printk_in_rcu(KERN_INFO "New initial S-block (bdev %p, %s)"
+				     " @%llu (%s/%llu/%d)\n",
+				     superblock_bdev,
+				     rcu_str_deref(device->name),
+				     (unsigned long long)dev_bytenr,
+				     dev_state->name,
+				     (unsigned long long)dev_bytenr,
+				     superblock_mirror_num);
 		list_add(&superblock_tmp->all_blocks_node,
 			 &state->all_blocks_list);
 		btrfsic_block_hashtable_add(superblock_tmp,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e39a3b9..43bd7b9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -44,6 +44,7 @@
 #include "free-space-cache.h"
 #include "inode-map.h"
 #include "check-integrity.h"
+#include "rcu-string.h"
 
 static struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
@@ -2575,8 +2576,9 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
 		struct btrfs_device *device = (struct btrfs_device *)
 			bh->b_private;
 
-		printk_ratelimited(KERN_WARNING "lost page write due to "
-				   "I/O error on %s\n", device->name);
+		printk_ratelimited_in_rcu(KERN_WARNING "lost page write due to "
+					  "I/O error on %s\n",
+					  rcu_str_deref(device->name));
 		/* note, we dont' set_buffer_write_io_error because we have
 		 * our own ways of dealing with the IO errors
 		 */
@@ -2749,8 +2751,8 @@ static int write_dev_flush(struct btrfs_device *device, int wait)
 		wait_for_completion(&device->flush_wait);
 
 		if (bio_flagged(bio, BIO_EOPNOTSUPP)) {
-			printk("btrfs: disabling barriers on dev %s\n",
-			       device->name);
+			printk_in_rcu("btrfs: disabling barriers on dev %s\n",
+				      rcu_str_deref(device->name));
 			device->nobarriers = 1;
 		}
 		if (!bio_flagged(bio, BIO_UPTODATE)) {
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2c8f7b2..aaa12c1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -20,6 +20,7 @@
 #include "volumes.h"
 #include "check-integrity.h"
 #include "locking.h"
+#include "rcu-string.h"
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache 

Re: ceph-on-btrfs inline-cow regression fix for 3.4.3

2012-06-13 Thread Chris Mason
On Tue, Jun 12, 2012 at 09:46:26PM -0600, Alexandre Oliva wrote:
 Hi, Greg,
 
 There's a btrfs regression in 3.4 that's causing a lot of grief to
 ceph-on-btrfs users like myself.  This small and nice patch cures it.
 It's in Linus' master already.  I've been running it on top of 3.4.2,
 and it would be very convenient for me if this could be in 3.4.3.

Ack, this can definitely go to 3.4-stable.  Thanks Alexandre.

-chris


Re: [PATCH 1/4] Btrfs: use radix tree for checksum

2012-06-13 Thread Zach Brown



  int set_state_private(struct extent_io_tree *tree, u64 start, u64 private)
  {

[...]

+   ret = radix_tree_insert(&tree->csum, (unsigned long)start,
+                           (void *)((unsigned long)private << 1));


Will this fail for 64bit files on 32bit hosts?


+   BUG_ON(ret);


I wonder if we can patch BUG_ON() to break the build if its only
argument is ret.

- z


Re: Moving top level to a subvolume

2012-06-13 Thread Goffredo Baroncelli
On 06/13/2012 09:21 AM, Arne Jansen wrote:
 On 13.06.2012 09:04, C Anthony Risinger wrote:
 On Fri, Jun 8, 2012 at 2:40 PM, Arne Jansen sensi...@gmx.net wrote:
 On 06/08/2012 09:24 PM, Matthew Hawn wrote:
 I just converted my root filesystem to btrfs with btrfs-convert.  However, 
 since I am running Ubuntu, I would like to have the same subvolume 
 structure as a default install,. How do I move the top-level subvolume 
 (where all my files currently are) to another subvolume?

 Just snapshot the root subvol and continue working in the snapshot.

 ... yeah but that solution totally sucks when you:

 a) have a lot of data
 b) need to do this via script
 c) ???

 ... because in a), data will *copied* the slow way, and in b) you
 leave a bunch of junk laying around in the old root that will rot
 unless you `rm -rf` it ... and idk about you, but issuing what is very
 near to that command on someone else's machine -- via script -- makes
 me REALLY uneasy ;-)
 
 well, don't put data in the top level in the first place. Yes, you have
 to remove the content of the subvol / by rm -rf, but I don't really see
 the problem with it.

It is slow. You have to change a lot of metadata (each shared metadata
block has to be unshared, and then one copy will be deleted).

 What I don't understand is why you think data will be copied.
 

 i have asked this exact question at least 4 times specifically, and
 referenced it probably 8-10, in the last 3 years or more.  i needed it
 then.  i still need it now.  but since i never got an answer up/down
 or around, i gave up and told people to `rm -rf`themselves ...

 http://markmail.org/message/7hj5ioqrztkeerqv

 ... that's from May of 2010, but i don't think it's the first.

 so, would it possible to implement this, or could someone kindly (and
 briefly!) explain why it cannot be done?
 
 The default subvol ('/') has the special number 5 and is expected to
 always be around. All other subvols get numbers starting with 256.
 Creating a new 5 and internally renumbering the old 5 isn't easy, because
 each tree block has an owner recorded in it. Also, all backreferences
 have the root number in them. If you have to touch each tree block, you
 can as well choose the snapshot/rm -rf approach.

I don't know the internals of btrfs very well. Do you think that it is
possible to move/swap the root subvolume?

 

[...]

 Or you could hack mkfs.btrfs to always create an additional subvol.

Which can be the default one: so nobody should complain. I



 Even making / readonly except for creating mountpoint could be possible.
 Just some random ideas...
 
 -Arne
 

 


Re: Leaving Oracle

2012-06-13 Thread Chris Mason
On Sun, Jun 10, 2012 at 12:01:28PM -0600, David Pottage wrote:
 On 07/06/12 02:04, Chris Mason wrote:
  Hello everyone,
 
  Oracle has been a fantastic place to work, and I really appreciate their
  support for my projects.  But, I've decided to take a new position at
  Fusion-io.  I will start the new job on Monday, June 11.
 Congratulations.
  Fusion-io really believes in open source, and I'm excited to help
  them shape the future of high performance storage.
 
 Are you sure about that?
 
 I installed one of their IO Drive SSD cards in one of my employer's 
 servers, and while the driver source code was supplied, the licence was 
 definitely not open source. (See http://www.fusionio.com/legal/eula/)
  4.1 General Restrictions.  [...] you will not, and will not 
  permit or authorize third parties to: (a) reproduce, modify, 
  translate, enhance, decompile, disassemble, reverse engineer, or 
  create derivative works of the Software; 
 

Hi everyone,

Circling back around to this, now that I'm up and running again.

Most of your storage is hidden behind some kind of closed source
firmware.  With Fusion-io, you get a closed driver, and that has its own
long standing debates that won't get resolved here.

Fusion-io has a strong track record of contributing to Linux, and I'm
sure we'll keep hiring more developers that are well known in the
community.

Of course, Btrfs is a GPL project, and all the future work in Btrfs is
going to stay GPL.

The great thing about Fusion-io is they are very actively trying to
engage higher parts of the storage stack to take advantage of the
hardware.  Since these features need to be in upstream filesystems,
we'll have to hammer out nice generic apis to take advantage of
them.

(This is my favorite kind of "we" that really means Jens Axboe)

Anyone who wants to support a backend for the apis is welcome to do so,
and I'm sure they will change over time as we all figure out what works
best.

Long story short, yes, I am sure that Fusion-io cares about open source.
Oracle too, since a few people misread that line as a dig at Oracle.

-chris



Re: KVM on top of BTRFS

2012-06-13 Thread Ernst Sjöstrand
Hi,

you can't beat the benchmarks that Serge Hallyn did, really thorough!

http://s3hh.wordpress.com/2012/05/02/first-round-of-kvm-performance-tests/

Regards
//Ernst

2012/6/12 steamraven steamra...@yahoo.com:

 Seems a little unfair on btrfs to just to look at absolutes in this context.
 Prior reports said that the fs ground to a halt,
 it isn't doing that by any stretch.


 I agree.  What I am mostly looking for is the best setup
 for using KVM snapshots:

 KVM qcow2 on top of something like ext4  or
 raw on top of btrfs



 I haven't let any of these installs complete and used it as intended.
 So that's what I intend to do next; after all one doesn't install every day.


 I am going to try to benchmark a couple variations and flags
 qcow2 on ext4  (noatime)
 raw on btrfs (defaults)
 raw on btrfs (noatime,space_cache)
 raw on btrfs (noatime,nospace_cache)
 raw on btrfs (noatime,nodatacow)

 Any other options that might be good to try?




Re: KVM on top of BTRFS

2012-06-13 Thread steamraven
Ernst Sjöstrand ernstp at gmail.com writes:

 
 Hi,
 
 you can't beat the benchmarks that Serge Hallyn did, really thorough!
 
 http://s3hh.wordpress.com/2012/05/02/first-round-of-kvm-performance-tests/

They do seem very thorough. Unfortunately, they are kvm on top of ext4 and
 he was mainly checking caching parameters and storage formats.  I am 
looking at BTRFS options and comparing them against qcow2 on ext4.

Matt  



Re: [PATCH 1/4] Btrfs: use radix tree for checksum

2012-06-13 Thread Liu Bo
On 06/14/2012 12:07 AM, Zach Brown wrote:

 
   int set_state_private(struct extent_io_tree *tree, u64 start, u64
 private)
   {
 [...]
 +ret = radix_tree_insert(&tree->csum, (unsigned long)start,
 +                        (void *)((unsigned long)private << 1));
 
 Will this fail for 64bit files on 32bit hosts?


In theory it could fail, but crc32c returns a u32, so private is originally
a u32, and it would be OK on 32-bit hosts.
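
The cast being discussed can be illustrated outside the kernel. The sketch below is only a model: it assumes that unsigned long is 32 bits wide on the host in question and mimics the C cast with a mask. The helper name to_ulong32 and the sample values are invented for the example, and nothing here is taken from the patch itself.

```python
MASK32 = 0xFFFFFFFF  # assumed width of unsigned long on a 32-bit host


def to_ulong32(value):
    """Mimic casting a u64 to a 32-bit unsigned long: keep the low 32 bits."""
    return value & MASK32


# A crc32c-sized value fits in 32 bits, so the cast does not change it.
crc = 0xDEADBEEF
crc_survives = to_ulong32(crc) == crc

# Two 64-bit offsets that differ only above bit 31 become the same value.
offset_low = 0x1000
offset_high = (1 << 32) + 0x1000
offsets_collide = to_ulong32(offset_low) == to_ulong32(offset_high)
```

Here crc_survives and offsets_collide both come out true: a 32-bit checksum passes through the cast unchanged, while a start offset beyond 4 GiB would be truncated. Whether the second case can actually occur with this patch is not settled in this thread.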

 
 +BUG_ON(ret);
 
 I wonder if we can patch BUG_ON() to break the build if its only
 argument is ret.
 


why?

thanks,
liubo

 - z


Re: [PATCH] E2fsprogs: add missing usage for No_COW

2012-06-13 Thread Ted Ts'o
On Wed, Jun 13, 2012 at 03:47:13PM +0800, Liu Bo wrote:
 Add the missing usage for No_COW since we've supported No_COW flag.
 
 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com

Applied, thanks.

- Ted


Re: [PATCH 2/2] Btrfs: implement ->show_devname V2

2012-06-13 Thread Miao Xie
On Tue, 12 Jun 2012 15:50:42 -0400, Josef Bacik wrote:
 Because btrfs can remove the device that was mounted we need to have a
 ->show_devname so that in this case we can print out some other device in
 the file system to /proc/mounts.  So if there are multiple devices in a btrfs
 file system we will just print the device with the lowest devid that we can
 find.  This will make everything consistent and deal with device removal
 properly.  The drawback is if you mount with a device that is higher than
 the lowest devid it won't show up as the mounted device in /proc/mounts,
 but this is a small price to pay. This was inspired by Miao Xie's patch.
 Thanks,
 
 Signed-off-by: Josef Bacik jo...@redhat.com

Reviewed-by: Miao Xie mi...@cn.fujitsu.com

 ---
 V1->V2: Dropped the mounted tracking stuff since it doesn't work right if you
 mount the same thing twice
  fs/btrfs/super.c |   33 +
  1 files changed, 33 insertions(+), 0 deletions(-)
 
 diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
 index 85cef50..0874dba 100644
 --- a/fs/btrfs/super.c
 +++ b/fs/btrfs/super.c
 @@ -54,6 +54,7 @@
  #include "version.h"
  #include "export.h"
  #include "compression.h"
 +#include "rcu-string.h"
  
  #define CREATE_TRACE_POINTS
  #include <trace/events/btrfs.h>
 @@ -1472,12 +1473,44 @@ static int btrfs_unfreeze(struct super_block *sb)
   return 0;
  }
  
 +static int btrfs_show_devname(struct seq_file *m, struct dentry *root)
 +{
 +	struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb);
 +	struct btrfs_fs_devices *cur_devices;
 +	struct btrfs_device *dev, *first_dev = NULL;
 +	struct list_head *head;
 +	struct rcu_string *name;
 +
 +	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 +	cur_devices = fs_info->fs_devices;
 +	while (cur_devices) {
 +		head = &cur_devices->devices;
 +		list_for_each_entry(dev, head, dev_list) {
 +			if (!first_dev || dev->devid < first_dev->devid)
 +				first_dev = dev;
 +		}
 +		cur_devices = cur_devices->seed;
 +	}
 +
 +	if (first_dev) {
 +		rcu_read_lock();
 +		name = rcu_dereference(first_dev->name);
 +		seq_escape(m, name->str, " \t\n\\");
 +		rcu_read_unlock();
 +	} else {
 +		WARN_ON(1);
 +	}
 +	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 +	return 0;
 +}
 +
  static const struct super_operations btrfs_super_ops = {
   .drop_inode = btrfs_drop_inode,
   .evict_inode= btrfs_evict_inode,
   .put_super  = btrfs_put_super,
   .sync_fs= btrfs_sync_fs,
   .show_options   = btrfs_show_options,
 + .show_devname   = btrfs_show_devname,
   .write_inode= btrfs_write_inode,
   .alloc_inode= btrfs_alloc_inode,
   .destroy_inode  = btrfs_destroy_inode,
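
The selection rule in the quoted patch (walk the device list of the filesystem and of each seed filesystem, and keep the device with the lowest devid) can be modelled outside the kernel. This is a rough sketch only: the Device and FsDevices classes and the function lowest_devid_device are invented for the illustration, and the locking and RCU handling of the real code are left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    devid: int
    name: str


@dataclass
class FsDevices:
    devices: List[Device]
    seed: Optional["FsDevices"] = None  # next set of devices in the seed chain


def lowest_devid_device(fs_devices):
    """Return the device with the lowest devid across the whole seed chain."""
    first = None
    cur = fs_devices
    while cur is not None:
        for dev in cur.devices:
            if first is None or dev.devid < first.devid:
                first = dev
        cur = cur.seed
    return first
```

With hypothetical devices /dev/sdb (devid 2) and /dev/sdc (devid 3) on the mounted filesystem and /dev/sda (devid 1) on a seed, the function returns /dev/sda regardless of which device was named at mount time. That is the drawback the patch description accepts.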



Re: btrfs: filenames collide with snapshot/subvolume names

2012-06-13 Thread Ben Hutchings
Γιώργος (Giorgos?) reports:
 Namely, being inside a snapshot directory, I can't create a file/directory
 with the name of the snapshot directory.
 
 For example, inside /mnt/aSnap, I can't create a file named 'aSnap', so I'm
 filing this bug report.

It seems that the snapshot directory is partially created before the
snapshot is taken, so that the snapshot directory half-exists (can be
looked up, but doesn't appear in listings) inside the snapshot itself.

This doesn't seem to be the recommended way to organise subvolumes, but
it seems like it should at least result in a coherent filesystem within
each subvolume.

Ben.

 Below follows full reproduction of this behavior:
 
 aris tmp # dd if=/dev/zero of=FILE bs=4k seek=`echo 5*1024*1024 | bc` count=1
 1+0 records in
 1+0 records out
 4096 bytes (4.1 kB) copied, 1.8695e-05 s, 219 MB/s
 aris tmp # losetup /dev/loop0 FILE 
 aris tmp # losetup -a
 /dev/loop0: [fe01]:263872 (/tmp/FILE)
 aris tmp # mkfs.btrfs -Ltest /dev/loop0
 
 WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
 WARNING! - see http://btrfs.wiki.kernel.org before using
 
 fs created label test on /dev/loop0
 nodesize 4096 leafsize 4096 sectorsize 4096 size 20.00GB
 Btrfs Btrfs v0.19
 aris tmp # mount /dev/loop0 /mnt/
 aris tmp # cd /mnt
 aris mnt # ls -la
 total 8
 dr-xr-xr-x  1 root root    0 Mar  8 12:07 .
 drwxr-xr-x 24 root root 4096 Mar  8 11:41 ..
 aris mnt # mkdir dir1
 aris mnt # mkdir dir2
 aris mnt # mkdir dir3
 aris mnt # l
 total 0
 drwxr-xr-x 1 root root 0 Mar  8 12:08 dir1
 drwxr-xr-x 1 root root 0 Mar  8 12:08 dir2
 drwxr-xr-x 1 root root 0 Mar  8 12:08 dir3
 aris mnt # btrfs subvolume snapshot /mnt/ /mnt/aSnap
 Create a snapshot of '/mnt/' in '/mnt/aSnap'
 aris mnt # cd /mnt/aSnap/
 aris aSnap # ls -la
 total 8
 dr-xr-xr-x 1 root root 34 Mar  8 12:08 .
 dr-xr-xr-x 1 root root 34 Mar  8 12:08 ..
 drwxr-xr-x 1 root root  0 Mar  8 12:08 dir1
 drwxr-xr-x 1 root root  0 Mar  8 12:08 dir2
 drwxr-xr-x 1 root root  0 Mar  8 12:08 dir3
 aris aSnap # date > aSnap
 bash: aSnap: Is a directory

-- 
Ben Hutchings
Computers are not intelligent.  They only think they are.

