Re: [PATCH] Btrfs: make lzo the default compression scheme

2011-05-27 Thread Sander
Li Zefan wrote (ao):
 As the lzo compression feature has been established for quite
 a while, we are now ready to replace zlib with lzo as the default
 compression scheme.

Please be aware that grub2 currently can't load files from a btrfs with
lzo compression (on debian sid/experimental at least).

Just found out the hard way after a kernel upgrade on a system with no
separate /boot partition :-)

Found this: https://bugs.archlinux.org/task/23901

Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: strange btrfs sub list output

2011-05-27 Thread Stephane Chazelas
2011-05-26 22:22:03 +0100, Stephane Chazelas:
[...]
 I get a btrfs sub list output that I don't understand:
 
 # btrfs sub list /backup/
 ID 257 top level 5 path u1/linux/lvm+btrfs/storage/data/data
 ID 260 top level 5 path u2/linux/lvm/linux/var/data
 ID 262 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-11
 ID 263 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-07
 ID 264 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-07
 ID 265 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-07
 ID 266 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-26
 ID 267 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-08
 ID 268 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-22
 ID 269 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-15
 ID 270 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-14
 ID 271 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-14
 ID 272 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-14
 ID 273 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-29
 ID 274 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-01-26
 ID 275 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-03-07
 ID 276 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-04-01
 ID 277 top level 5 path u2/linux/lvm/linux/home/data
 ID 278 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-27
 ID 279 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-27
 ID 280 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-27
 ID 281 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/data
 ID 282 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/snapshots/2011-05-19
 ID 283 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/data
 ID 284 top level 5 path u6:10022/vm+xfs@u8/xvda1/g8/v3/data
 ID 286 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/snapshots/2011-05-24
 ID 287 top level 285 path data
 ID 288 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/data
 ID 289 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-03-11
 ID 290 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/data
 ID 291 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/snapshots/2011-05-11
 ID 292 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-05-11
[...]
 There is no /backup/data directory. There is however a
 /backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 that
 contains the same thing as what I get if I mount the fs with
 subvolid=287. And I did do a btrfs sub snap data
 snapshots/2011-03-30 there.
 
 What could be the cause of that? How to fix it?
 
 In case that matters, there used to be more components in the
 path of u6:10022/vm+xfs@u8/xvda1/g8/v3/data.
[...]

I tried deleting the
/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30
subvolume (what seems to be id 287) and I get:

# btrfs sub delete snapshots/2011-03-30
Delete subvolume '/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30'
ERROR: cannot delete '/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30'

With a strace, it tells me:

ioctl(3, 0x5000940f, 0x7fffc7841a80)= -1 ENOTEMPTY (Directory not empty)

Then I realised that there was a data directory in there and
that snapshots/2011-03-30 was actually id 285 (which doesn't
appear in the btrfs sub list) and snapshots/2011-03-30/data is
id 287.
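
A minimal sketch of how such a missing parent could be spotted mechanically, assuming the "ID N top level M path P" line format shown above. The helper names parse_sub_list and unlisted_parents are made up for illustration, and treating 5 as the top-level tree follows the explanation given later in this thread:

```python
import re

# Matches one line of `btrfs sub list` output as quoted in this thread.
LINE_RE = re.compile(r"^\s*ID (\d+) top level (\d+) path (.+)$")

def parse_sub_list(text):
    """Return a list of (subvol_id, top_level_id, path) tuples."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            entries.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return entries

def unlisted_parents(entries, fs_tree_id=5):
    """Top-level IDs that are referenced but have no line of their own."""
    listed = {subvol_id for subvol_id, _, _ in entries}
    return sorted({top for _, top, _ in entries
                   if top != fs_tree_id and top not in listed})

# Three lines copied from the listing above.
sample = """\
ID 284 top level 5 path u6:10022/vm+xfs@u8/xvda1/g8/v3/data
ID 286 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/snapshots/2011-05-24
ID 287 top level 285 path data
"""
print(unlisted_parents(parse_sub_list(sample)))  # [285]
```

This only reports the symptom (a parent ID with no entry of its own); it says nothing about why the entry is missing.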

What do those top-level IDs mean by the way?

Then I was able to delete snapshots/2011-03-30/data, but
snapshots/2011-03-30 still didn't appear in the list.

Then I was able to delete snapshots/2011-03-30 and recreate it,
and this time it was fine.

Still don't know what happened there.

-- 
Stephane



Re: strange btrfs sub list output

2011-05-27 Thread Andreas Philipp
On 27.05.2011 10:01, Stephane Chazelas wrote:
 2011-05-26 22:22:03 +0100, Stephane Chazelas: [...]
 I get a btrfs sub list output that I don't understand:

 # btrfs sub list /backup/
 ID 257 top level 5 path u1/linux/lvm+btrfs/storage/data/data
 ID 260 top level 5 path u2/linux/lvm/linux/var/data
 ID 262 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-11
 ID 263 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-07
 ID 264 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-07
 ID 265 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-07
 ID 266 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-10-26
 ID 267 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-08
 ID 268 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-11-22
 ID 269 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-15
 ID 270 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-14
 ID 271 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-14
 ID 272 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-14
 ID 273 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2010-12-29
 ID 274 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-01-26
 ID 275 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-03-07
 ID 276 top level 5 path u1/linux/lvm+btrfs/storage/data/snapshots/2011-04-01
 ID 277 top level 5 path u2/linux/lvm/linux/home/data
 ID 278 top level 5 path u2/linux/lvm/linux/home/snapshots/2011-04-27
 ID 279 top level 5 path u2/linux/lvm/linux/root/snapshots/2011-04-27
 ID 280 top level 5 path u2/linux/lvm/linux/var/snapshots/2011-04-27
 ID 281 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/data
 ID 282 top level 5 path u3:10022/vm+xfs@u9/xvda1/g1/v4/snapshots/2011-05-19
 ID 283 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/data
 ID 284 top level 5 path u6:10022/vm+xfs@u8/xvda1/g8/v3/data
 ID 286 top level 5 path u5/vm+xfs@u9/xvda1/g1/v5/snapshots/2011-05-24
 ID 287 top level 285 path data
 ID 288 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/data
 ID 289 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-03-11
 ID 290 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/data
 ID 291 top level 5 path u4/vm+xfs@u9/xvda1/g1/v2/snapshots/2011-05-11
 ID 292 top level 5 path u4/vm+xfs@u9/xvda1/g1/v1/snapshots/2011-05-11
 [...]
 There is no /backup/data directory. There is however a
 /backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30 that
 contains the same thing as what I get if I mount the fs with
 subvolid=287. And I did do a btrfs sub snap data
 snapshots/2011-03-30 there.

 What could be the cause of that? How to fix it?

 In case that matters, there used to be more components in the
 path of u6:10022/vm+xfs@u8/xvda1/g8/v3/data.
 [...]

 I tried deleting the
 /backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30
 subvolume (what seems to be id 287) and I get:

 # btrfs sub delete snapshots/2011-03-30
 Delete subvolume '/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30'
 ERROR: cannot delete '/backup/u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30'

 With a strace, it tells me:

 ioctl(3, 0x5000940f, 0x7fffc7841a80) = -1 ENOTEMPTY (Directory not empty)

 Then I realised that there was a data directory in there and
 that snapshots/2011-03-30 was actually id 285 (which doesn't appear
 in the btrfs sub list) and snapshots/2011-03-30/data is id 287.

 What do those top-level IDs mean by the way?

The top-level ID associated with a subvolume is NOT the ID of this
particular subvolume but of the subvolume containing it. Since the
root/initial (sub-)volume has always ID 0, the subvolumes of depth
1 will all have top-level ID set to 0. You need those top-level IDs to
correctly mount a specific subvolume by name.

# mount /dev/dummy -o subvol=<subvolume>,subvolrootid=<top-level ID> /mountpoint

Of course, you do need them, if you specify the subvolume to mount by
its ID.

Cheers,
Andreas Philipp


 Then I was able to delete snapshots/2011-03-30/data, but
 snapshots/2011-03-30 still didn't appear in the list.

 Then I was able to delete snapshots/2011-03-30 and recreate it,
 and this time it was fine.

 Still don't know what happened there.




Re: strange btrfs sub list output

2011-05-27 Thread Stephane Chazelas
2011-05-27 10:21:03 +0200, Andreas Philipp:
[...]
  What do those top-level IDs mean by the way?
 The top-level ID associated with a subvolume is NOT the ID of this
 particular subvolume but of the subvolume containing it. Since the
 root/initial (sub-)volume has always ID 0, the subvolumes of depth
 1 will all have top-level ID set to 0. You need those top-level IDs to
 correctly mount a specific subvolume by name.
 
 # mount /dev/dummy -o subvol=subvolume,subvolrootid=top-level ID
 /mountpoint
 
 Of course, you do need them, if you specify the subvolume to mount by
 its ID.
[...]

Thanks Andreas for pointing that subvolrootid (might be worth
adding it to
https://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options
BTW).

In my case, on a freshly made btrfs file system, subvolumes have
top-level 5 (and neither a volume with ID 0 nor one with ID 5 appears
in the btrfs sub list output).

All the top-levels are 5, and I don't even know how to create a
subvolume with a different top-level there, so I wonder how that
subvol that I had created with

btrfs sub snap data snapshots/2011-03-30

ended up being a subvolume with ID 285 that doesn't appear in
the btrfs sub list and contains a subvolume of path data
in there (with its top-level being 285). All the other
subvolumes and snapshots I've created in the exact same way are
created with a top-level 5 and have an entry in btrfs sub list
and don't have subvolumes of their own.

-- 
Stephane


Re: strange btrfs sub list output

2011-05-27 Thread Stephane Chazelas
By the way, is there a way to derive the subvolume ID from the
stat(2) st_dev?

# btrfs sub list .
ID 256 top level 5 path a
ID 257 top level 5 path b
# zstat +dev . a b
. 27
a 28
b 29

Are the dev numbers allocated in the same order as the
subvolids? Would there be any /sys, /proc, ioctl interface to
get this kind of information?
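
For anyone wanting to compare the two sets of numbers without zsh's zstat, here is a rough Python equivalent. It only collects the st_dev values returned by stat(2); it does not assume or establish any relationship between them and the subvolume IDs. The function name st_dev_table is made up for illustration:

```python
import os
import sys

def st_dev_table(paths):
    """Map each path to the st_dev value reported by stat(2)."""
    return {p: os.stat(p).st_dev for p in paths}

if __name__ == "__main__":
    # Pass directories on the command line (e.g. ". a b"); defaults to ".".
    paths = sys.argv[1:] or ["."]
    for path, dev in st_dev_table(paths).items():
        # os.major/os.minor split the device number into its two halves.
        print(path, dev, f"({os.major(dev)}:{os.minor(dev)})")
```

Running it next to btrfs sub list on the same directories would show whether the device numbers track the subvolume IDs on a given system.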

-- 
Stephane
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: strange btrfs sub list output

2011-05-27 Thread Hugo Mills
On Fri, May 27, 2011 at 09:47:33AM +0100, Stephane Chazelas wrote:
 2011-05-27 10:21:03 +0200, Andreas Philipp:
 [...]
   What do those top-level IDs mean by the way?
  The top-level ID associated with a subvolume is NOT the ID of this
  particular subvolume but of the subvolume containing it. Since the
  root/initial (sub-)volume has always ID 0, the subvolumes of depth
  1 will all have top-level ID set to 0. You need those top-level IDs to
  correctly mount a specific subvolume by name.
  
  # mount /dev/dummy -o subvol=subvolume,subvolrootid=top-level ID
  /mountpoint
  
  Of course, you do need them, if you specify the subvolume to mount by
  its ID.
 [...]
 
 Thanks Andreas for pointing that subvolrootid (might be worth
 adding it to
 https://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options
 BTW).
 
 In my case, on a freshly made btrfs file system, subvolumes have
 top-level 5. (and neither volume with id 0 or 5 appear in the
 btrfs sub list).
 
 All the top-levels are 5, and I don't even know how to create a
 subvolume with a different top-level there, so I wonder how that
 subvol that I had created with

   Actually, top-level subvolume ID=0 is a fiction. Internally, each
subvolume is a separate FS tree (an FS tree in btrfs is a btree
containing all of the inode and directory information for some
subvolume). These trees are all referred to by a tree called the root
tree, which indexes all of the btrees in the filesystem.

   The root tree has a unique reference ID for each tree that it
points to: most of the trees (extent tree, device tree, etc) have
fixed and well-known IDs smaller than 256. The FS tree for the
top-level subvolume -- the one that doesn't show up on a subvolume
list -- always has ID 5. Hence the containing subvolume for most of
your subvolumes is 5. The FS trees for the non-top-level subvolumes
have IDs starting at 256 and increasing monotonically.

   Internally, there's a bit of a fiddle in the API, where a request
for a subvolume ID of 0 is (sometimes) translated to an ID of 5. It's
not always done, I think, and those cases where a subvol ID of 0
doesn't get you the top-level subvolume should be treated as bugs.
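
   As a rough illustration of the numbering described above (a sketch
only, not the kernel's code; the names below are made up, and the
translation of 0 to 5 is modelled as if it were always applied):

```python
FS_TREE_ID = 5          # FS tree of the top-level subvolume, as described above
FIRST_SUBVOL_ID = 256   # ordinary subvolumes and snapshots start here

def resolve_subvolid(requested):
    """Hypothetical helper: treat a requested ID of 0 as the top-level tree."""
    return FS_TREE_ID if requested == 0 else requested

def describe(tree_id):
    """Classify a root-tree reference ID by the ranges given above."""
    if tree_id == FS_TREE_ID:
        return "top-level subvolume"
    if tree_id < FIRST_SUBVOL_ID:
        return "reserved internal tree"
    return "subvolume or snapshot"

for requested in (0, 5, 2, 285):
    resolved = resolve_subvolid(requested)
    print(requested, "->", resolved, describe(resolved))
```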

   That's all rather dense, and probably too much information. Hope
it's helpful, though.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- A linked list is still a binary tree.  Just a very unbalanced ---  
 one.  -- dragon 




Re: strange btrfs sub list output

2011-05-27 Thread Andreas Philipp

On 27.05.2011 11:12, Hugo Mills wrote:
 On Fri, May 27, 2011 at 09:47:33AM +0100, Stephane Chazelas wrote:
 2011-05-27 10:21:03 +0200, Andreas Philipp:
 [...]
 What do those top-level IDs mean by the way?
 The top-level ID associated with a subvolume is NOT the ID of this
 particular subvolume but of the subvolume containing it. Since the
 root/initial (sub-)volume has always ID 0, the subvolumes of depth
 1 will all have top-level ID set to 0. You need those top-level IDs to
 correctly mount a specific subvolume by name.

 # mount /dev/dummy -o subvol=subvolume,subvolrootid=top-level ID
 /mountpoint

 Of course, you do need them, if you specify the subvolume to mount by
 its ID.
 [...]

 Thanks Andreas for pointing that subvolrootid (might be worth
 adding it to
 https://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options
 BTW).

 In my case, on a freshly made btrfs file system, subvolumes have
 top-level 5. (and neither volume with id 0 or 5 appear in the
 btrfs sub list).

 All the top-levels are 5, and I don't even know how to create a
 subvolume with a different top-level there, so I wonder how that
 subvol that I had created with

 Actually, top-level subvolume ID=0 is a fiction. Internally, each
 subvolume is a separate FS tree (an FS tree in btrfs is a btree
 containing all of the inode and directory information for some
 subvolume). These trees are all referred to by a tree called the root
 tree, which indexes all of the btrees in the filesystem.

 The root tree has a unique reference ID for each tree that it
 points to: most of the trees (extent tree, device tree, etc) have
 fixed and well-known IDs smaller than 256. The FS tree for the
 top-level subvolume -- the one that doesn't show up on a subvolume
 list -- always has ID 5. Hence the containing subvolume for most of
 your subvolumes is 5. The FS trees for the non-top-level subvolumes
 have IDs starting at 256 and increasing monotonically.

 Internally, there's a bit of a fiddle in the API, where a request
 for a subvolume ID of 0 is (sometimes) translated to an ID of 5. It's
 not always done, I think, and those cases where a subvol ID of 0
 doesn't get you the top-level subvolume should be treated as bugs.

Thank you for all this information. I once had such a situation,
where mounting with subvolid=0 did not mount the top-level subvolume.
I will try to recreate it with a recent kernel.

Thanks,
Andreas


 That's all rather dense, and probably too much information. Hope
 it's helpful, though.

 Hugo.



Re: strange btrfs sub list output

2011-05-27 Thread Stephane Chazelas
2011-05-27 10:12:24 +0100, Hugo Mills:
[skipped useful clarification]
 
That's all rather dense, and probably too much information. Hope
 it's helpful, though.
[...]

It is, thanks.

How would one end up in a situation where the output of btrfs
sub list . has:

ID 287 top level 285 path data

How could a subvolume 285 become a top level?

How does one get a subvolume with a top-level other than 5?

-- 
Stephane


Re: strange btrfs sub list output

2011-05-27 Thread Hugo Mills
On Fri, May 27, 2011 at 10:30:29AM +0100, Stephane Chazelas wrote:
 2011-05-27 10:12:24 +0100, Hugo Mills:
 [skipped useful clarification]
  
 That's all rather dense, and probably too much information. Hope
  it's helpful, though.
 [...]
 
 It is, thanks.
 
 How would one end up in a situation where the output of btrfs
 sub list . has:
 
 ID 287 top level 285 path data
 
 How could a subvolume 285 become a top level?

 How does one get a subvolume with a top-level other than 5?

   This just means that subvolume 287 was created (somewhere) inside
subvolume 285.

   Due to the way that the FS trees and subvolumes work, there's no
global namespace structure in btrfs; that is, there's no single data
structure that represents the entirety of the file/directory hierarchy
in the filesystem. Instead, it's broken up into these sub-namespaces
called subvolumes, and we only record parent/child relationships for
each subvolume separately. The full path you get from btrfs subv
list is reconstructed from that information in userspace(*).

   Hugo.

(*) Here's how it does it:

The userspace tool gets a list of every subvolume by looking at the FS
tree. It uses the corresponding back-refs to get the inode that
represents each of those FS trees inside its parent:

Subvol  inode   in subvol
256      991    5
257      896    256
258     1073    257

From the inode numbers, it can then recursively walk back up the
directory path to the top of each subvolume:

Subvol  inode   in subvol   relative path
256      991    5           henry
257      896    256         edward/mary
258     1073    257         elizabeth

From that, it can then reconstruct the full pathnames, by walking back
up the subvolume tree:

subvol 258 is elizabeth in 257
   is edward/mary/elizabeth in 256
   is henry/edward/mary/elizabeth in 5
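
The last step can be sketched in a few lines of Python. This is only
an illustration of the walk described above, not the code in the
userspace tool: the table is the worked example (inode column left
out, since the relative paths are taken as already resolved), and the
function name resolve_path is made up.

```python
TOP_LEVEL = 5

# subvol ID -> (containing subvol ID, path relative to the containing subvol)
subvols = {
    256: (5, "henry"),
    257: (256, "edward/mary"),
    258: (257, "elizabeth"),
}

def resolve_path(subvol_id, table, top_level=TOP_LEVEL):
    """Walk up the parent links, collecting relative paths.

    Returns (root_id, path). root_id is top_level when the walk reaches
    the top; otherwise it is the first ID with no entry in the table.
    """
    parts = []
    current = subvol_id
    while current != top_level and current in table:
        parent, rel = table[current]
        parts.append(rel)
        current = parent
    return current, "/".join(reversed(parts))

print(resolve_path(258, subvols))  # (5, 'henry/edward/mary/elizabeth')
```

If a parent has no entry of its own, the walk in this sketch stops
early and returns that parent's ID together with the partial path.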

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- A linked list is still a binary tree.  Just a very unbalanced ---  
 one.  -- dragon 




Re: strange btrfs sub list output

2011-05-27 Thread Andreas Philipp

On 27.05.2011 11:45, Hugo Mills wrote:
 On Fri, May 27, 2011 at 10:30:29AM +0100, Stephane Chazelas wrote:
 2011-05-27 10:12:24 +0100, Hugo Mills:
 [skipped useful clarification]

 That's all rather dense, and probably too much information. Hope
 it's helpful, though.
 [...]

 It is, thanks.

 How would one end up in a situation where the output of btrfs
 sub list . has:

 ID 287 top level 285 path data

 How could a subvolume 285 become a top level?

 How does one get a subvolume with a top-level other than 5?

 This just means that subvolume 287 was created (somewhere) inside
 subvolume 285.

 Due to the way that the FS trees and subvolumes work, there's no
 global namespace structure in btrfs; that is, there's no single data
 structure that represents the entirety of the file/directory hierarchy
 in the filesystem. Instead, it's broken up into these sub-namespaces
 called subvolumes, and we only record parent/child relationships for
 each subvolume separately. The full path you get from btrfs subv
 list is reconstructed from that information in userspace(*).

 Hugo.

 (*) Here's how it does it:

 The userspace tool gets a list of every subvolume by looking at the FS
 tree. It uses the corresponding back-refs to get the inode that
 represents each of those FS trees inside its parent:

 Subvol inode in subvol
 256 991 5
 257 896 256
 258 1073 257

 From the inode numbers, it can then recursively walk back up the
 directory path to the top of each subvolume:

 Subvol inode in subvol relative path
 256 991 5 henry
 257 896 256 edward/mary
 258 1073 257 elizabeth

 From that, it can then reconstruct the full pathnames, by walking back
 up the subvolume tree:

 subvol 258 is elizabeth in 257
 is edward/mary/elizabeth in 256
 is henry/edward/mary/elizabeth in 5

Just one (hopefully) short question: A line in the output of btrfs
subvolume list like
ID 257 top level 5 path test1/test1.1
says that the subvolume with name test1.1 (the last segment of the
path) and ID 257 has the path test1/test1.1 starting at the top level
subvolume which has ID 5?

Thanks,
Andreas


Re: strange btrfs sub list output

2011-05-27 Thread Hugo Mills
On Fri, May 27, 2011 at 12:06:44PM +0200, Andreas Philipp wrote:
 
 On 27.05.2011 11:45, Hugo Mills wrote:
  On Fri, May 27, 2011 at 10:30:29AM +0100, Stephane Chazelas wrote:
  2011-05-27 10:12:24 +0100, Hugo Mills:
  [skipped useful clarification]
 
  That's all rather dense, and probably too much information. Hope
  it's helpful, though.
  [...]
 
  It is, thanks.
 
  How would one end up in a situation where the output of btrfs
  sub list . has:
 
  ID 287 top level 285 path data
 
  How could a subvolume 285 become a top level?
 
  How does one get a subvolume with a top-level other than 5?
 
  This just means that subvolume 287 was created (somewhere) inside
  subvolume 285.
 
  Due to the way that the FS trees and subvolumes work, there's no
  global namespace structure in btrfs; that is, there's no single data
  structure that represents the entirety of the file/directory hierarchy
  in the filesystem. Instead, it's broken up into these sub-namespaces
  called subvolumes, and we only record parent/child relationships for
  each subvolume separately. The full path you get from btrfs subv
  list is reconstructed from that information in userspace(*).
 
  Hugo.
 
  (*) Here's how it does it:
 
  The userspace tool gets a list of every subvolume by looking at the FS
  tree. It uses the corresponding back-refs to get the inode that
  represents each of those FS trees inside its parent:
 
  Subvol inode in subvol
  256 991 5
  257 896 256
  258 1073 257
 
  From the inode numbers, it can then recursively walk back up the
  directory path to the top of each subvolume:
 
  Subvol inode in subvol relative path
  256 991 5 henry
  257 896 256 edward/mary
  258 1073 257 elizabeth
 
  From that, it can then reconstruct the full pathnames, by walking back
  up the subvolume tree:
 
  subvol 258 is elizabeth in 257
  is edward/mary/elizabeth in 256
  is henry/edward/mary/elizabeth in 5
 Just one (hopefully) short question: A line in the output of btrfs
 subvolume list like
 ID 257 top level 5 path test1/test1.1
 says that the subvolume with name test1.1 (the last segment of the
 path) and ID 257 has the path test1/test1.1 starting at the top level
 subvolume which has ID 5 ?

   Yes. IIRC, the paths reported by btrfs subv list are full paths
back to the top level directory of the filesystem. In the case of your
example, that's the same thing.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Welcome to Rivendell,  Mr Anderson... ---  




Re: strange btrfs sub list output

2011-05-27 Thread Stephane Chazelas
2011-05-27 10:45:23 +0100, Hugo Mills:
[...]
  How could a subvolume 285 become a top level?
 
  How does one get a subvolume with a top-level other than 5?
 
This just means that subvolume 287 was created (somewhere) inside
 subvolume 285.
 
Due to the way that the FS trees and subvolumes work, there's no
 global namespace structure in btrfs; that is, there's no single data
 structure that represents the entirety of the file/directory hierarchy
 in the filesystem. Instead, it's broken up into these sub-namespaces
 called subvolumes, and we only record parent/child relationships for
 each subvolume separately. The full path you get from btrfs subv
 list is reconstructed from that information in userspace(*).
[...]

Thanks, I can understand that. What I don't get is how one
creates a subvol with a top-level other than 5. I might be
missing the obvious, though.

If I do:

btrfs sub create A
btrfs sub create A/B
btrfs sub snap A A/B/C

A, A/B, A/B/C have their top-level being 5. How would I get a
new snapshot to be a child of A/B for instance?

In my case, 285, was not appearing in the btrfs sub list output,
287 was a child of 285 with path data while all I did was
create a snapshot of 284 (path
u6:10022/vm+xfs@u8/xvda1/g8/v3/data in vol 5) in
u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30

So I did manage to get a volume with a parent other than 5, but
I did not ask for it.

-- 
Stephane


Re: strange btrfs sub list output

2011-05-27 Thread Hugo Mills
On Fri, May 27, 2011 at 12:30:10PM +0100, Stephane Chazelas wrote:
 2011-05-27 10:45:23 +0100, Hugo Mills:
 [...]
   How could a subvolume 285 become a top level?
  
   How does one get a subvolume with a top-level other than 5?
  
 This just means that subvolume 287 was created (somewhere) inside
  subvolume 285.
  
 Due to the way that the FS trees and subvolumes work, there's no
  global namespace structure in btrfs; that is, there's no single data
  structure that represents the entirety of the file/directory hierarchy
  in the filesystem. Instead, it's broken up into these sub-namespaces
  called subvolumes, and we only record parent/child relationships for
  each subvolume separately. The full path you get from btrfs subv
  list is reconstructed from that information in userspace(*).
 [...]
 
 Thanks, I can understand that. What I don't get is how one
 creates a subvol with a top-level other than 5. I might be
 missing the obvious, though.
 
 If I do:
 
 btrfs sub create A
 btrfs sub create A/B
 btrfs sub snap A A/B/C
 
 A, A/B, A/B/C have their top-level being 5. How would I get a
 new snapshot to be a child of A/B for instance?

   Hm. OK, that's not doing what I thought it was, then. I'll have to
look at the code to work out what that top-level output actually is,
then. (Won't be for a few hours, until I get home from work).

 In my case, 285, was not appearing in the btrfs sub list output,
 287 was a child of 285 with path data while all I did was
 create a snapshot of 284 (path
 u6:10022/vm+xfs@u8/xvda1/g8/v3/data in vol 5) in
 u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30
 
 So I did manage to get a volume with a parent other than 5, but
 I did not ask for it.
 

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- I hate housework. You make the beds, you wash the dishes, and ---  
   six months later you have to start all over again.




Re: strange btrfs sub list output

2011-05-27 Thread Andreas Philipp

On 27.05.2011 13:30, Stephane Chazelas wrote:
 2011-05-27 10:45:23 +0100, Hugo Mills: [...]
 How could a subvolume 285 become a top level?

 How does one get a subvolume with a top-level other than 5?

 This just means that subvolume 287 was created (somewhere)
 inside subvolume 285.

 Due to the way that the FS trees and subvolumes work, there's no
 global namespace structure in btrfs; that is, there's no single
 data structure that represents the entirety of the file/directory
 hierarchy in the filesystem. Instead, it's broken up into these
 sub-namespaces called subvolumes, and we only record parent/child
 relationships for each subvolume separately. The full path you
 get from btrfs subv list is reconstructed from that information
 in userspace(*).
 [...]

 Thanks, I can understand that. What I don't get is how one creates
 a subvol with a top-level other than 5. I might be missing the
 obvious, though.

 If I do:

 btrfs sub create A
 btrfs sub create A/B
 btrfs sub snap A A/B/C

 A, A/B, A/B/C have their top-level being 5. How would I get a new
 snapshot to be a child of A/B for instance?

 In my case, 285, was not appearing in the btrfs sub list output,
 287 was a child of 285 with path data while all I did was create
 a snapshot of 284 (path u6:10022/vm+xfs@u8/xvda1/g8/v3/data in vol
 5) in u6:10022/vm+xfs@u8/xvda1/g8/v3/snapshots/2011-03-30

 So I did manage to get a volume with a parent other than 5, but I
 did not ask for it.

Reconsidering the explanations of btrfs subvolume list in this thread,
I get the impression that a line in the output of btrfs subvolume list
with a top level other than 5 indicates that the backrefs from one
subvolume to its parent are broken.

What's your opinion on this?

Thanks,
Andreas
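
The userspace reconstruction Hugo describes above can be sketched as follows.
This is not the btrfs-progs code: the struct subvol_ref record, the
resolve_path() helper and the sample IDs are hypothetical, and the sketch
assumes each subvolume records only its parent ID and its name relative to
that parent. A minimal illustration in C:

```c
#include <stddef.h>
#include <string.h>

#define FS_TREE_ID 5UL  /* the "top level 5" seen in the listings above */

/* Hypothetical record: one backref from a subvolume to its parent. */
struct subvol_ref {
    unsigned long id;
    unsigned long parent_id;
    const char *name;   /* path relative to the parent subvolume */
};

static const struct subvol_ref *find_ref(const struct subvol_ref *refs,
                                         size_t n, unsigned long id)
{
    for (size_t i = 0; i < n; i++)
        if (refs[i].id == id)
            return &refs[i];
    return NULL;
}

/*
 * Rebuild the full path of subvolume `id` by walking parent links up to
 * FS_TREE_ID. Returns 0 on success, -1 if a backref is missing, the chain
 * loops, or the buffer is too small.
 */
static int resolve_path(const struct subvol_ref *refs, size_t n,
                        unsigned long id, char *buf, size_t buflen)
{
    size_t pos = buflen;    /* the path is assembled from the end of buf */
    size_t hops = 0;

    if (buflen == 0)
        return -1;
    buf[--pos] = '\0';

    while (id != FS_TREE_ID) {
        const struct subvol_ref *r = find_ref(refs, n, id);
        size_t len;
        int need_sep;

        if (!r || hops++ > n)
            return -1;      /* broken or cyclic chain of backrefs */

        len = strlen(r->name);
        need_sep = buf[pos] != '\0';    /* a component already follows */
        if (len + (size_t)need_sep > pos)
            return -1;      /* buffer too small */
        if (need_sep)
            buf[--pos] = '/';
        pos -= len;
        memcpy(buf + pos, r->name, len);

        id = r->parent_id;
    }

    memmove(buf, buf + pos, buflen - pos);
    return 0;
}
```

With the hypothetical entries 257 (parent 5, name A), 258 (parent 257, name B)
and 259 (parent 258, name C), resolving 259 walks 259, 258, 257 and yields
A/B/C. An ID that has no recorded backref makes the walk fail instead of
producing a path.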



[PATCH] Btrfs: try to only do one btrfs_search_slot in do_setxattr

2011-05-27 Thread Josef Bacik
I've been watching how many btrfs_search_slot() calls we do, and I noticed that
when we create a file with selinux enabled we were doing 2 each time we initialize
the security context.  That's because we look up the xattr first so we can delete
it if we're setting a new value on an existing xattr.  But in the create case we
don't have any xattrs, so the extra lookup is completely useless.  So
re-arrange things so that we only look up first if we specifically have
XATTR_REPLACE.  That way in the basic case we only do 1 search, and in the more
complicated case we do the normal 2 lookups.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/xattr.c |   54 ++
 1 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 72ab029..4857a87 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -102,43 +102,41 @@ static int do_setxattr(struct btrfs_trans_handle *trans,
if (!path)
return -ENOMEM;
 
-   /* first lets see if we already have this xattr */
-   di = btrfs_lookup_xattr(trans, root, path, inode->i_ino, name,
-   strlen(name), -1);
-   if (IS_ERR(di)) {
-   ret = PTR_ERR(di);
-   goto out;
-   }
-
-   /* ok we already have this xattr, lets remove it */
-   if (di) {
-   /* if we want create only exit */
-   if (flags & XATTR_CREATE) {
-   ret = -EEXIST;
+   if (flags & XATTR_REPLACE) {
+   di = btrfs_lookup_xattr(trans, root, path, inode->i_ino, name,
+   strlen(name), -1);
+   if (IS_ERR(di)) {
+   ret = PTR_ERR(di);
+   goto out;
+   } else if (!di) {
+   ret = -ENODATA;
goto out;
}
-
ret = btrfs_delete_one_dir_name(trans, root, path, di);
-   BUG_ON(ret);
-   btrfs_release_path(root, path);
-
-   /* if we don't have a value then we are removing the xattr */
-   if (!value)
+   if (ret)
goto out;
-   } else {
btrfs_release_path(root, path);
-
-   if (flags & XATTR_REPLACE) {
-   /* we couldn't find the attr to replace */
-   ret = -ENODATA;
-   goto out;
-   }
}
 
-   /* ok we have to create a completely new xattr */
+again:
ret = btrfs_insert_xattr_item(trans, root, path, inode->i_ino,
  name, name_len, value, size);
-   BUG_ON(ret);
+   if (ret == -EEXIST) {
+   if (flags & XATTR_CREATE)
+   goto out;
+   di = btrfs_match_dir_item_name(root, path, name, name_len);
+   ret = btrfs_delete_one_dir_name(trans, root, path, di);
+   if (ret)
+   goto out;
+
+   /*
+* We have a value to set, so go back and try to insert it now.
+*/
+   if (value) {
+   btrfs_release_path(root, path);
+   goto again;
+   }
+   }
 out:
btrfs_free_path(path);
return ret;
-- 
1.7.2.3
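
The control flow of the patch above (insert first, and only search beforehand
when XATTR_REPLACE is set) can be illustrated outside the kernel. The sketch
below is a toy in-memory model, not btrfs code: struct sk_store, the sk_*
helpers and the SK_* flag values are all hypothetical, the searches counter
merely stands in for btrfs_search_slot() calls, and removal (a NULL value) is
left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SK_XATTR_CREATE  0x1    /* hypothetical stand-in for XATTR_CREATE */
#define SK_XATTR_REPLACE 0x2    /* hypothetical stand-in for XATTR_REPLACE */
#define SK_MAX 8

struct sk_store {
    const char *names[SK_MAX];
    const char *values[SK_MAX];
    int searches;               /* counts tree searches in this toy model */
};

/* Look a name up; costs one search. Returns the slot or -1. */
static int sk_find(struct sk_store *s, const char *name)
{
    s->searches++;
    for (int i = 0; i < SK_MAX; i++)
        if (s->names[i] && strcmp(s->names[i], name) == 0)
            return i;
    return -1;
}

/*
 * Insert a name; costs one search. On -EEXIST, *slot is left pointing at
 * the existing item, loosely mirroring how the path still points at the
 * item after a failed insert in the patch.
 */
static int sk_insert(struct sk_store *s, const char *name,
                     const char *value, int *slot)
{
    int free_slot = -1;

    s->searches++;
    for (int i = 0; i < SK_MAX; i++) {
        if (!s->names[i]) {
            if (free_slot < 0)
                free_slot = i;
        } else if (strcmp(s->names[i], name) == 0) {
            *slot = i;
            return -EEXIST;
        }
    }
    if (free_slot < 0)
        return -ENOSPC;
    s->names[free_slot] = name;
    s->values[free_slot] = value;
    *slot = free_slot;
    return 0;
}

static void sk_delete(struct sk_store *s, int slot)
{
    s->names[slot] = NULL;
    s->values[slot] = NULL;
}

/* Set an attribute; value must be non-NULL in this sketch. */
static int sk_setxattr(struct sk_store *s, const char *name,
                       const char *value, int flags)
{
    int ret, slot;

    if (flags & SK_XATTR_REPLACE) {
        /* Only the replace case pays for a lookup up front. */
        slot = sk_find(s, name);
        if (slot < 0)
            return -ENODATA;
        sk_delete(s, slot);
    }
again:
    ret = sk_insert(s, name, value, &slot);
    if (ret == -EEXIST) {
        if (flags & SK_XATTR_CREATE)
            return ret;
        /* Overwrite: drop the old item and try the insert again. */
        sk_delete(s, slot);
        goto again;
    }
    return ret;
}
```

In this model, creating a new attribute with no flags costs one search, while
an SK_XATTR_REPLACE of an existing attribute costs two, which is the trade-off
the commit message describes.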



[PATCH] Btrfs: fix bitmap regression

2011-05-27 Thread Josef Bacik
In cleaning up the clustering code I accidentally introduced a regression by
adding bitmap entries to the cluster rb tree.  The problem is if we've maxed out
the number of bitmaps we can have for the block group we can only add free space
to the bitmaps, but since the bitmap is on the cluster we can't find it and we
try to create another one.  This would result in a panic because the total
bitmaps was bigger than the max bitmaps that were allowed.  This patch fixes
this by checking to see if we have a cluster, and then looking at the cluster rb
tree to see if it has a bitmap entry and if it does and that space belongs to
that bitmap, go ahead and add it to that bitmap.

I could hit this panic every time with an fs_mark test within a couple of
minutes.  With this patch I no longer hit the panic and fs_mark goes to
completion.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/free-space-cache.c |   75 +-
 1 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index a827c97..dac2546 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -1365,12 +1365,28 @@ again:
return 0;
 }
 
+static u64 add_bytes_to_bitmap(struct btrfs_block_group_cache *block_group,
+  struct btrfs_free_space *info, u64 offset,
+  u64 bytes)
+{
+   u64 bytes_to_set = 0;
+   u64 end;
+
+   end = info->offset + (u64)(BITS_PER_BITMAP * block_group->sectorsize);
+
+   bytes_to_set = min(end - offset, bytes);
+
+   bitmap_set_bits(block_group, info, offset, bytes_to_set);
+
+   return bytes_to_set;
+
+}
 static int insert_into_bitmap(struct btrfs_block_group_cache *block_group,
  struct btrfs_free_space *info)
 {
struct btrfs_free_space *bitmap_info;
int added = 0;
-   u64 bytes, offset, end;
+   u64 bytes, offset, bytes_added;
int ret;
 
/*
@@ -1405,6 +1421,44 @@ static int insert_into_bitmap(struct btrfs_block_group_cache *block_group,
bytes = info->bytes;
offset = info->offset;
 
+   /*
+* Since we link bitmaps right into the cluster we need to see if we
+* have a cluster here, and if so and it has our bitmap we need to add
+* the free space to that bitmap.
+*/
+   if (!list_empty(&block_group->cluster_list)) {
+   struct btrfs_free_cluster *cluster;
+   struct rb_node *node;
+   struct btrfs_free_space *entry;
+
+   cluster = list_entry(block_group->cluster_list.next,
+struct btrfs_free_cluster,
+block_group_list);
+   spin_lock(&cluster->lock);
+   node = rb_first(&cluster->root);
+   if (!node) {
+   spin_unlock(&cluster->lock);
+   goto again;
+   }
+
+   entry = rb_entry(node, struct btrfs_free_space, offset_index);
+   if (!entry->bitmap) {
+   spin_unlock(&cluster->lock);
+   goto again;
+   }
+
+   if (entry->offset == offset_to_bitmap(block_group, offset)) {
+   bytes_added = add_bytes_to_bitmap(block_group, entry,
+ offset, bytes);
+   bytes -= bytes_added;
+   offset += bytes_added;
+   }
+   spin_unlock(&cluster->lock);
+   if (!bytes) {
+   ret = 1;
+   goto out;
+   }
+   }
 again:
bitmap_info = tree_search_offset(block_group,
 offset_to_bitmap(block_group, offset),
@@ -1414,21 +1468,10 @@ again:
goto new_bitmap;
}
 
-   end = bitmap_info->offset +
-   (u64)(BITS_PER_BITMAP * block_group->sectorsize);
-
-   if (offset >= bitmap_info->offset && offset + bytes > end) {
-   bitmap_set_bits(block_group, bitmap_info, offset,
-   end - offset);
-   bytes -= end - offset;
-   offset = end;
-   added = 0;
-   } else if (offset >= bitmap_info->offset && offset + bytes <= end) {
-   bitmap_set_bits(block_group, bitmap_info, offset, bytes);
-   bytes = 0;
-   } else {
-   BUG();
-   }
+   bytes_added = add_bytes_to_bitmap(block_group, bitmap_info, offset, bytes);
+   bytes -= bytes_added;
+   offset += bytes_added;
+   added = 0;
 
if (!bytes) {
ret = 1;
-- 
1.7.2.3
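
The new add_bytes_to_bitmap() helper boils down to clamping a free range to
the end of the bitmap that covers its start. The arithmetic can be sketched on
its own as below. The SK_* sizes are assumptions chosen for illustration (one
4096-byte page of bits, 4096-byte sectors), the sk_* names are hypothetical,
and the sketch assumes the block group starts at offset 0 so that a bitmap's
start is simply the offset rounded down to the bitmap span.

```c
#include <stdint.h>

/* Assumed sizes, for illustration only. */
#define SK_BITS_PER_BITMAP (4096ULL * 8)
#define SK_SECTORSIZE      4096ULL
#define SK_BITMAP_SPAN     (SK_BITS_PER_BITMAP * SK_SECTORSIZE)

/*
 * How many of `bytes` starting at `offset` fit into the bitmap that begins
 * at `bitmap_offset`: the same min(end - offset, bytes) as in the patch.
 * The caller must ensure offset lies inside that bitmap.
 */
static uint64_t sk_bytes_to_set(uint64_t bitmap_offset, uint64_t offset,
                                uint64_t bytes)
{
    uint64_t end = bitmap_offset + SK_BITMAP_SPAN;
    uint64_t room = end - offset;

    return room < bytes ? room : bytes;
}

/*
 * Count how many bitmaps a free range touches by applying the clamp
 * repeatedly, the way the caller advances offset and shrinks bytes.
 */
static unsigned sk_count_bitmaps(uint64_t offset, uint64_t bytes)
{
    unsigned n = 0;

    while (bytes) {
        /* Simplified stand-in for offset_to_bitmap(). */
        uint64_t bitmap_offset = offset - (offset % SK_BITMAP_SPAN);
        uint64_t added = sk_bytes_to_set(bitmap_offset, offset, bytes);

        bytes -= added;
        offset += added;
        n++;
    }
    return n;
}
```

Under these assumed sizes one bitmap spans 128 MiB, so a range of 8192 bytes
that starts 4096 bytes before a bitmap boundary is split into two 4096-byte
pieces, one per bitmap.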



Re: kernel BUG at fs/btrfs/inode.c:2271!

2011-05-27 Thread Marco Neubauer

On 25.05.2011, at 21:25, Josef Bacik wrote:
 
 Hrm well that's doubly weird, the root should be right so it should be
 able to find the orphan item to delete it for the bad inode, and why the
 hell are we looping on that orphan item?  Remove the previous patch I
 gave you and apply this one instead and run with this please and provide
 me the log.  Sorry in advance, it will likely give you a giant log file
 again.  Thanks,

The system hasn't crashed so far, and all I got in the log was the following
output.
Is this the expected, correct behavior?

May 26 03:10:13 mainframe kernel: found orphan item for 909457 on 267
May 26 03:10:13 mainframe kernel: lookup of inode was from disk
May 26 03:10:13 mainframe kernel: inode needs to be truncated
May 26 03:10:13 mainframe kernel: drop is 0
May 26 03:10:13 mainframe kernel: found orphan item for 909415 on 267
May 26 03:10:13 mainframe kernel: lookup of inode was from disk
May 26 03:10:13 mainframe kernel: inode needs to be unlinked
May 26 03:10:13 mainframe kernel: drop is 1
May 26 03:10:13 mainframe kernel: found orphan item for 909414 on 267
May 26 03:10:13 mainframe kernel: lookup of inode was from disk
May 26 03:10:13 mainframe kernel: inode needs to be unlinked
May 26 03:10:13 mainframe kernel: drop is 1
May 26 03:10:13 mainframe kernel: found orphan item for 899452 on 267
May 26 03:10:13 mainframe kernel: lookup of inode was from disk
May 26 03:10:13 mainframe kernel: inode needs to be unlinked
May 26 03:10:13 mainframe kernel: drop is 1
May 26 03:10:13 mainframe kernel: found orphan item for 899451 on 267
May 26 03:10:13 mainframe kernel: lookup of inode was from disk
May 26 03:10:13 mainframe kernel: inode needs to be unlinked
May 26 03:10:13 mainframe kernel: drop is 1
May 26 03:10:13 mainframe kernel: btrfs: unlinked 4 orphans
May 26 03:10:13 mainframe kernel: btrfs: truncated 1 orphans
May 26 03:10:46 mainframe kernel: drop is 0

I'll keep an eye on it.

-marco





[GIT PULL] Btrfs updates

2011-05-27 Thread Chris Mason
Hi everyone,

I always thought that I'd be retired and with my flying car at the
beach by the time 3.0 came out, but I've set up the for-linus branch of
the btrfs-unstable tree for pulling:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git for-linus

This pull request is probably the biggest I've sent, but it isn't
a code dump into our shiny new .0 rc.  The bulk of the changes are
three separate projects that have been going on for 6-8 months:

A new btrfs inode allocation cache (Li Zefan)
Delayed metadata insertion into the btree (Miao Xie)
Device scrubbing (Arne Jansen)

On top of that Dave Sterba kicked in a series of code cleanups and Josef
Bacik did some really nice tuning.  The short log lists a few other
cleanups and fixes too.

I coded up a mount -o autodefrag that will detect random writes
into existing files and kick off background defragging.  It is well
suited to bdb or sqlite databases, but not virtualization images or big
databases (yet).  Once I make sure it doesn't defrag files over and over
again, I'll move this toward the default.

David Sterba (17) commits (+310/-3253):
btrfs: rename variables clashing with global function names (+53/-53)
btrfs: use printk_ratelimited instead of printk_ratelimit (+10/-24)
btrfs: drop unused parameter from btrfs_release_path (+160/-160)
btrfs: drop unused parameter from extent_map_tree_init (+5/-7)
btrfs: drop unused argument from extent_io_tree_init (+10/-12)
btrfs: remove nested duplicate variable declarations (+0/-4)
btrfs: drop gfp parameter from alloc_extent_buffer (+7/-9)
btrfs: drop gfp parameter from find_extent_buffer (+4/-6)
btrfs: drop gfp parameter from alloc_extent_map (+16/-17)
btrfs: use unsigned type for single bit bitfield (+4/-4)
btrfs: remove old unused commented out code (+1/-2071)
btrfs: Document a mutex lock/unlock sequence (+12/-0)
btrfs: make functions static when possible (+7/-7)
btrfs: unify checking of IS_ERR and null (+17/-17)
btrfs: remove unused function prototypes (+0/-43)
btrfs: remove all unused functions (+1/-817)
btrfs: fix dereference before check (+3/-2)

Li Zefan (8) commits (+1449/-665):
Btrfs: Make the code for reading/writing free space cache generic (+204/-154)
Btrfs: setup free ino caching in a more asynchronous way (+22/-6)
Btrfs: Support reading/writing on disk free ino cache (+236/-19)
Btrfs: Remove unused btrfs_block_group_free_space() (+0/-16)
Btrfs: Make free space cache code generic (+271/-223)
Btrfs: Cache free inode numbers in memory (+500/-53)
Btrfs: Always use 64bit inode number (+208/-182)
Btrfs: Use bitmap_set/clear() (+8/-12)

Xiao Guangrong (7) commits (+134/-59):
Btrfs: allocate extent state and check the result properly (+26/-8)
Btrfs: using rcu lock in the reader side of devices list (+72/-36)
Btrfs: fix the race between reading and updating devices (+9/-0)
Btrfs: fix the race between remove dev and alloc chunk (+6/-0)
Btrfs: fix bh leak on __btrfs_open_devices path (+1/-0)
Btrfs: fix unsafe usage of merge_state (+14/-8)
Btrfs: drop unnecessary device lock (+6/-7)

Arne Jansen (6) commits (+1822/-361):
btrfs scrub: don't coalesce pages that are logically discontiguous (+2/-1)
btrfs: move btrfs_cmp_device_free_bytes to super.c (+26/-28)
btrfs: quasi-round-robin for chunk allocation (+177/-305)
btrfs: add readonly flag (+16/-12)
btrfs: heed alloc_start (+1/-4)
btrfs: scrub (+1600/-11)

Tsutomu Itoh (5) commits (+43/-36):
Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item (+2/-17)
Btrfs: return error code to caller when btrfs_previous_item fails (+3/-2)
Btrfs: return error code to caller when btrfs_del_item fails (+19/-11)
Btrfs: return error to caller if read_one_inode() fails (+18/-6)
Btrfs: check return value of btrfs_inc_extent_ref() (+1/-0)

Chris Mason (4) commits (+689/-144):
Btrfs: update the delayed inode code to use the btrfs_ino helper. (+7/-6)
Btrfs: use the device_list_mutex during write_dev_supers (+2/-2)
Btrfs: return -ENOMEM in clear_extent_bit (+2/-1)
Btrfs: add mount -o auto_defrag (+678/-135)

Sergei Trofimovich (3) commits (+7/-3):
btrfs: don't spin in shrink_delalloc if there is nothing to free (+4/-0)
btrfs: fix typo 'testeing' -> 'testing' (+2/-2)
btrfs: typo: 'btrfS' -> 'btrfs' (+1/-1)

Jan Schmidt (1) commits (+169/-2):
btrfs: new ioctls for scrub

liubo (1) commits (+3/-0):
Btrfs: do not flush csum items of unchanged file data during treelog

Miao Xie (1) commits (+2074/-91):
btrfs: implement delayed inode items operation

Julia Lawall (1) commits (+4/-1):
fs/btrfs: Add missing btrfs_free_path

Andi Kleen (1) commits (+0/-4):
BTRFS: Remove unused node_lock

Ilya Dryomov (1) commits (+80/-207):
btrfs scrub: make fixups sync

Jamey Sharp (1) commits (+0/-43):
btrfs: Delete 

[PATCH] Btrfs: fix the allocator loop logic

2011-05-27 Thread Josef Bacik
I was testing with empty_cluster = 0 to try and reproduce a problem and kept
hitting early enospc panics.  This was because our loop logic was a little
confused.  So this is what I did:

1) Make the loop variable the ultimate decider on whether we should loop again
instead of checking to see if we had an uncached bg, empty size or empty
cluster.

2) Increment loop before checking to see what we are on to make the loop
definitions make more sense.

3) If we are on the chunk alloc loop don't set empty_size/empty_cluster to 0
unless we didn't actually allocate a chunk.  If we did allocate a chunk we
should be able to easily setup a new cluster so clearing
empty_size/empty_cluster makes us less efficient.

This kept me from hitting panics while trying to reproduce the other problem.
Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/extent-tree.c |   48 +---
 1 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b4f67e8..cb4bbc9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5360,9 +5360,7 @@ loop:
 * LOOP_NO_EMPTY_SIZE, set empty_size and empty_cluster to 0 and try
 *  again
 */
-   if (!ins->objectid && loop < LOOP_NO_EMPTY_SIZE &&
-   (found_uncached_bg || empty_size || empty_cluster ||
-allowed_chunk_alloc)) {
+   if (!ins->objectid && loop < LOOP_NO_EMPTY_SIZE) {
index = 0;
if (loop == LOOP_FIND_IDEAL && found_uncached_bg) {
found_uncached_bg = false;
@@ -5402,32 +5400,36 @@ loop:
goto search;
}
 
-   if (loop < LOOP_CACHING_WAIT) {
-   loop++;
-   goto search;
-   }
+   loop++;
 
if (loop == LOOP_ALLOC_CHUNK) {
-   empty_size = 0;
-   empty_cluster = 0;
-   }
+  if (allowed_chunk_alloc) {
+   ret = do_chunk_alloc(trans, root, num_bytes +
+2 * 1024 * 1024, data,
+CHUNK_ALLOC_LIMITED);
+   allowed_chunk_alloc = 0;
+   if (ret == 1)
+   done_chunk_alloc = 1;
+   } else if (!done_chunk_alloc &&
+  space_info->force_alloc ==
+  CHUNK_ALLOC_NO_FORCE) {
+   space_info->force_alloc = CHUNK_ALLOC_LIMITED;
+   }
 
-   if (allowed_chunk_alloc) {
-   ret = do_chunk_alloc(trans, root, num_bytes +
-2 * 1024 * 1024, data,
-CHUNK_ALLOC_LIMITED);
-   allowed_chunk_alloc = 0;
-   done_chunk_alloc = 1;
-   } else if (!done_chunk_alloc &&
-  space_info->force_alloc == CHUNK_ALLOC_NO_FORCE) {
-   space_info->force_alloc = CHUNK_ALLOC_LIMITED;
+  /*
+   * We didn't allocate a chunk, go ahead and drop the
+   * empty size and loop again.
+   */
+  if (!done_chunk_alloc)
+  loop = LOOP_NO_EMPTY_SIZE;
}
 
-   if (loop < LOOP_NO_EMPTY_SIZE) {
-   loop++;
-   goto search;
+   if (loop == LOOP_NO_EMPTY_SIZE) {
+   empty_size = 0;
+   empty_cluster = 0;
}
-   ret = -ENOSPC;
+
+   goto search;
} else if (!ins->objectid) {
ret = -ENOSPC;
} else if (ins->objectid) {
-- 
1.7.2.3
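
The three points in the commit message above describe a small state machine,
which can be sketched in isolation. This is a simplified model, not the kernel
function: the enum sk_loop stages and their ordering, struct sk_state and
sk_advance() are hypothetical, the chunk_alloc_ok argument stands in for
do_chunk_alloc() returning 1, and the LOOP_FIND_IDEAL special case and the
force_alloc handling are left out.

```c
/* Hypothetical stage names; the ordering is an assumption of this sketch. */
enum sk_loop {
    SK_FIND_IDEAL,
    SK_CACHING_NOWAIT,
    SK_CACHING_WAIT,
    SK_ALLOC_CHUNK,
    SK_NO_EMPTY_SIZE
};

struct sk_state {
    int loop;
    int allowed_chunk_alloc;
    int done_chunk_alloc;
    unsigned long empty_size;
    unsigned long empty_cluster;
};

/*
 * Called after a search pass found nothing. Returns 1 if the caller should
 * search again, 0 if every stage is exhausted (the caller reports -ENOSPC).
 */
static int sk_advance(struct sk_state *st, int chunk_alloc_ok)
{
    /* 1) The loop variable alone decides whether another pass happens. */
    if (st->loop >= SK_NO_EMPTY_SIZE)
        return 0;

    /* 2) Increment first, then look at which stage we are now in. */
    st->loop++;

    if (st->loop == SK_ALLOC_CHUNK) {
        if (st->allowed_chunk_alloc) {
            st->allowed_chunk_alloc = 0;
            if (chunk_alloc_ok)
                st->done_chunk_alloc = 1;
        }
        /* 3) No new chunk: skip ahead and drop the empty size. */
        if (!st->done_chunk_alloc)
            st->loop = SK_NO_EMPTY_SIZE;
    }

    if (st->loop == SK_NO_EMPTY_SIZE) {
        st->empty_size = 0;
        st->empty_cluster = 0;
    }
    return 1;
}
```

Starting from SK_CACHING_WAIT with chunk allocation allowed, a successful
allocation leaves empty_size and empty_cluster untouched for the next pass,
whereas a failed one jumps straight to SK_NO_EMPTY_SIZE and clears both.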



Re: kernel BUG at fs/btrfs/inode.c:2271!

2011-05-27 Thread Josef Bacik
On 05/27/2011 03:23 PM, Marco Neubauer wrote:
 
 On 25.05.2011, at 21:25, Josef Bacik wrote:

 Hrm well that's doubly weird, the root should be right so it should be
 able to find the orphan item to delete it for the bad inode, and why the
 hell are we looping on that orphan item?  Remove the previous patch I
 gave you and apply this one instead and run with this please and provide
 me the log.  Sorry in advance, it will likely give you a giant log file
 again.  Thanks,
 
 The system hasn't crashed so far, and all I got in the log was the following
 output.
 Is this the expected, correct behavior?
 

Well I was expecting a panic message and such, but if you didn't get any
of that I'm stumped.  I guess this means you've gotten past the mount
section and are able to run for a while before it blows up?  Thanks,

Josef


Re: [GIT PULL] Btrfs updates

2011-05-27 Thread Chester
One question. Will the autodefrag option be snapshot aware? Would
enabling this option double the amount of used space if there is a
snapshot present?


Re: [PATCH] Btrfs: make lzo the default compression scheme

2011-05-27 Thread C Anthony Risinger
On Fri, May 27, 2011 at 2:41 AM, Fajar A. Nugraha l...@fajar.net wrote:
 On Fri, May 27, 2011 at 2:32 PM, Sander san...@humilis.net wrote:
 Li Zefan wrote (ao):
 As the lzo compression feature has been established for quite
 a while, we are now ready to replace zlib with lzo as the default
 compression scheme.

 Please be aware that grub2 currently can't load files from a btrfs with
 lzo compression (on debian sid/experimental at least).

 Just found out the hard way after a kernel upgrade on a system with no
 separate /boot partition :-)

 Found this: https://bugs.archlinux.org/task/23901

 IIRC what matters is compression actually used by the files.
 If /boot/grub/* and kernel/initrd is not compressed, or compressed
 with zlib, then grub2 can read it just fine, even when the filesystem
 is usually mounted with -o compress=lzo (I'm using Ubuntu Natty).

 I think the move to use lzo compression by default is a good thing, since:
 - it's superior performance-wise to zlib
 - btrfs is not really recommended (yet) for production uses, so it's
 valid enough to assume users brave enough to use btrfs will know the
 necessary workarounds (like having separate /boot, or temporary
 remount with -o compress=zlib when upgrading kernel)
 - even if by accident you ended with unbootable system due to lzo, you
 can fix it using livecd and btrfs filesystem defragment to force
 the needed files to be uncompressed/compressed with zlib.

i'd agree with the LZO default and everything else you've said, but i
was bitten by this too :-)

in my case however, i was using syslinux, and even though /boot was
not compressed syslinux still failed with something like:

Found compressed data! cannot continue!

... or similar, i don't recall exactly.  funny thing is, if i typed
out the full kernel boot line (which was super annoying for about a
week until i updated to a separate /boot) the system would start up
just fine ... so i don't know if syslinux was checking the incompat
bit or what, but it failed even though the files themselves were
technically ok.

something for others to keep in mind at the least.

C Anthony