Re: btrfs: filenames collide with snapshot/subvolume names
Γιώργος (Giorgos?) reports:
> Namely, being inside a snapshot directory, I can't create a file/directory
> with the name of the snapshot directory.
>
> For example, inside /mnt/aSnap, I can't create a file named 'aSnap', so I'm
> filling this bug report.

It seems that the snapshot directory is partially created before the
snapshot is taken, so that the snapshot directory half-exists (can be
looked up, but doesn't appear in listings) inside the snapshot itself.

This doesn't seem to be the recommended way to organise subvolumes, but it
seems like it should at least result in a coherent filesystem within each
subvolume.

Ben.

> Below follows full reproduction of this behavior:
>
> aris tmp # dd if=/dev/zero of=FILE bs=4k seek=`echo 5*1024*1024 | bc` count=1
> 1+0 records in
> 1+0 records out
> 4096 bytes (4.1 kB) copied, 1.8695e-05 s, 219 MB/s
> aris tmp # losetup /dev/loop0 FILE
> aris tmp # losetup -a
> /dev/loop0: [fe01]:263872 (/tmp/FILE)
> aris tmp # mkfs.btrfs -Ltest /dev/loop0
>
> WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
> WARNING! - see http://btrfs.wiki.kernel.org before using
>
> fs created label test on /dev/loop0
> nodesize 4096 leafsize 4096 sectorsize 4096 size 20.00GB
> Btrfs Btrfs v0.19
> aris tmp # mount /dev/loop0 /mnt/
> aris tmp # cd /mnt
> aris mnt # ls -la
> total 8
> dr-xr-xr-x 1 root root 0 Mar 8 12:07 .
> drwxr-xr-x 24 root root 4096 Mar 8 11:41 ..
> aris mnt # mkdir dir1
> aris mnt # mkdir dir2
> aris mnt # mkdir dir3
> aris mnt # l
> total 0
> drwxr-xr-x 1 root root 0 Mar 8 12:08 dir1
> drwxr-xr-x 1 root root 0 Mar 8 12:08 dir2
> drwxr-xr-x 1 root root 0 Mar 8 12:08 dir3
> aris mnt # btrfs subvolume snapshot /mnt/ /mnt/aSnap
> Create a snapshot of '/mnt/' in '/mnt/aSnap'
> aris mnt # cd /mnt/aSnap/
> aris aSnap # ls -la
> total 8
> dr-xr-xr-x 1 root root 34 Mar 8 12:08 .
> dr-xr-xr-x 1 root root 34 Mar 8 12:08 ..
> drwxr-xr-x 1 root root 0 Mar 8 12:08 dir1
> drwxr-xr-x 1 root root 0 Mar 8 12:08 dir2
> drwxr-xr-x 1 root root 0 Mar 8 12:08 dir3
> aris aSnap # date > aSnap
> bash: aSnap: Is a directory

--
Ben Hutchings
Computers are not intelligent. They only think they are.
Re: [PATCH 2/2] Btrfs: implement ->show_devname V2
On Tue, 12 Jun 2012 15:50:42 -0400, Josef Bacik wrote:
> Because btrfs can remove the device that was mounted we need to have a
> ->show_devname so that in this case we can print out some other device in
> the file system to /proc/mount. So if there are multiple devices in a btrfs
> file system we will just print the device with the lowest devid that we can
> find. This will make everything consistent and deal with device removal
> properly. The drawback is if you mount with a device that is higher than
> the lowest devicd it won't show up as the mounted device in /proc/mounts,
> but this is a small price to pay. This was inspired by Miao Xie's patch.
> Thanks,
>
> Signed-off-by: Josef Bacik

Reviewed-by: Miao Xie

> ---
> V1->V2: Dropped the mounted tracking stuff since it doesn't work right if you
> mount the same thing twice
> fs/btrfs/super.c | 33 +
> 1 files changed, 33 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 85cef50..0874dba 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -54,6 +54,7 @@
>  #include "version.h"
>  #include "export.h"
>  #include "compression.h"
> +#include "rcu-string.h"
>
>  #define CREATE_TRACE_POINTS
>  #include
> @@ -1472,12 +1473,44 @@ static int btrfs_unfreeze(struct super_block *sb)
>  	return 0;
>  }
>
> +static int btrfs_show_devname(struct seq_file *m, struct dentry *root)
> +{
> +	struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb);
> +	struct btrfs_fs_devices *cur_devices;
> +	struct btrfs_device *dev, *first_dev = NULL;
> +	struct list_head *head;
> +	struct rcu_string *name;
> +
> +	mutex_lock(&fs_info->fs_devices->device_list_mutex);
> +	cur_devices = fs_info->fs_devices;
> +	while (cur_devices) {
> +		head = &cur_devices->devices;
> +		list_for_each_entry(dev, head, dev_list) {
> +			if (!first_dev || dev->devid < first_dev->devid)
> +				first_dev = dev;
> +		}
> +		cur_devices = cur_devices->seed;
> +	}
> +
> +	if (first_dev) {
> +		rcu_read_lock();
> +		name = rcu_dereference(first_dev->name);
> +		seq_escape(m, name->str, " \t\n\\");
> +		rcu_read_unlock();
> +	} else {
> +		WARN_ON(1);
> +	}
> +	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> +	return 0;
> +}
> +
>  static const struct super_operations btrfs_super_ops = {
>  	.drop_inode	= btrfs_drop_inode,
>  	.evict_inode	= btrfs_evict_inode,
>  	.put_super	= btrfs_put_super,
>  	.sync_fs	= btrfs_sync_fs,
>  	.show_options	= btrfs_show_options,
> +	.show_devname	= btrfs_show_devname,
>  	.write_inode	= btrfs_write_inode,
>  	.alloc_inode	= btrfs_alloc_inode,
>  	.destroy_inode	= btrfs_destroy_inode,

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
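The device selection described in the patch (walk the device list, then each chained seed list, and keep the lowest devid) can be sketched in userspace C. This is only an illustration under stated assumptions: `struct dev` and `struct dev_list` are made-up stand-ins for the kernel's `btrfs_device` and `btrfs_fs_devices`, plain arrays replace `list_head`, and the mutex and RCU handling from the patch are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for btrfs_device / btrfs_fs_devices. */
struct dev {
	uint64_t devid;
	const char *name;
};

struct dev_list {
	const struct dev *devices;    /* devices in this set */
	size_t count;                 /* number of entries in devices */
	const struct dev_list *seed;  /* next (seed) device set, or NULL */
};

/*
 * Walk the device set and every seed set chained behind it, returning
 * the device with the lowest devid, or NULL if no device exists.
 */
const struct dev *lowest_devid(const struct dev_list *cur)
{
	const struct dev *first = NULL;

	for (; cur; cur = cur->seed) {
		for (size_t i = 0; i < cur->count; i++) {
			const struct dev *d = &cur->devices[i];

			if (!first || d->devid < first->devid)
				first = d;
		}
	}
	return first;
}
```

In this model, a filesystem mounted through devid 3 whose seed set contains devid 1 would report the seed device's name, which matches the drawback the patch description mentions.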
Re: [PATCH] E2fsprogs: add missing usage for No_COW
On Wed, Jun 13, 2012 at 03:47:13PM +0800, Liu Bo wrote:
> Add the missing usage for No_COW since we've supported No_COW flag.
>
> Signed-off-by: Liu Bo

Applied, thanks.

- Ted
Re: [PATCH 1/4] Btrfs: use radix tree for checksum
On 06/14/2012 12:07 AM, Zach Brown wrote:
>
>> int set_state_private(struct extent_io_tree *tree, u64 start, u64
>> private)
>> {
> [...]
>> +ret = radix_tree_insert(&tree->csum, (unsigned long)start,
>> + (void *)((unsigned long)private<< 1));
>
> Will this fail for 64bit files on 32bit hosts?

In theory it will fail, but crc32c returns a u32, so private is originally
a u32, and it will be OK on 32-bit hosts.

>
>> +BUG_ON(ret);
>
> I wonder if we can patch BUG_ON() to break the build if its only
> argument is "ret".
>

why?

thanks,
liubo

> - z
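The quoted `radix_tree_insert()` call contains two separate casts to `unsigned long`, one on `start` and one on `private`. A small userspace model may help when reasoning about each of them on a 32-bit host. It assumes `uint32_t` as a stand-in for a 32-bit `unsigned long`; the helper names are invented for this sketch and do not appear in the patch. Whether either effect matters in practice depends on parts of the patch that are not shown in this thread.

```c
#include <stdint.h>

/* Stand-in for `unsigned long` on a 32-bit host (an assumption of this sketch). */
typedef uint32_t ulong32;

/* Models (unsigned long)start: the tree index keeps only the low 32 bits. */
ulong32 index_for(uint64_t start)
{
	return (ulong32)start;
}

/* Models (unsigned long)private << 1: bit 31 is shifted out of the word. */
ulong32 stored_value(uint64_t private_val)
{
	return (ulong32)((ulong32)private_val << 1);
}

/* What a reader would get back after undoing the shift. */
ulong32 loaded_value(ulong32 stored)
{
	return stored >> 1;
}
```

Under this model, two offsets that are 4 GiB apart produce the same index, and a 32-bit value with its top bit set does not round-trip through the shift.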
Re: KVM on top of BTRFS
Ernst Sjöstrand writes:
>
> Hi,
>
> you can't beat the benchmarks that Serge Hallyn did, really thorough!
>
> http://s3hh.wordpress.com/2012/05/02/first-round-of-kvm-performance-tests/

They do seem very thorough. Unfortunately, they are kvm on top of ext4 and
he was mainly checking caching parameters and storage formats. I am looking
at BTRFS options and comparing them against qcow2 on ext4.

Matt
Re: KVM on top of BTRFS
Hi,

you can't beat the benchmarks that Serge Hallyn did, really thorough!

http://s3hh.wordpress.com/2012/05/02/first-round-of-kvm-performance-tests/

Regards
//Ernst

2012/6/12 steamraven:
>>
>> Seems a little unfair on btrfs to just to look at absolutes in this context.
>> Prior reports said that the fs ground to a halt,
>> it isn't doing that by any stretch.
>
>
> I agree. What I am mostly looking for is the best setup
> for using KVM snapshots:
>
> KVM qcow2 on top of something like ext4 or
> raw on top of btrfs
>
>
>>
>> I haven't let any of these installs complete and used it as intended.
>> So that's what I intend to do next; after all one doesn't install every day.
>>
>
> I am going to try to benchmark a couple variations and flags
> qcow2 on ext4 (noatime)
> raw on btrfs (defaults)
> raw on btrfs (noatime,space_cache)
> raw on btrfs (noatime,nospace_cache)
> raw on btrfs (noatime,nodatacow)
>
> Any other options that might be good to try?
Re: Leaving Oracle
On Sun, Jun 10, 2012 at 12:01:28PM -0600, David Pottage wrote:
> On 07/06/12 02:04, Chris Mason wrote:
> > Hello everyone,
> >
> > Oracle has been a fantastic place to work, and I really appreciate their
> > support for my projects. But, I've decided to take a new position at
> > Fusion-io. I will start the new job on Monday, June 11.
> Congratulations.
> > Fusion-io really believes in open source, and I'm excited to help
> > them shape the future of high performance storage.
>
> Are you sure about that?
>
> I installed one of their IO Drive SSD cards in one of my employer's
> servers, and while the driver source code was supplied, the licence was
> definitely not open source. (See http://www.fusionio.com/legal/eula/)
>
> > 4.1 General Restrictions. [...] you will not, and will not
> > permit or authorize third parties to: (a) reproduce, modify,
> > translate, enhance, decompile, disassemble, reverse engineer, or
> > create derivative works of the Software;
>

Hi everyone,

Circling back around to this, now that I'm up and running again.

Most of your storage is hidden behind some kind of closed source firmware.
With Fusion-io, you get a closed driver, and that has its own long standing
debates that won't get resolved here.

Fusion-io has a strong track record of contributing to Linux, and I'm sure
we'll keep hiring more developers that are well known in the community. Of
course, Btrfs is a GPL project, and all the future work in Btrfs is going to
stay GPL.

The great thing about Fusion-io is they are very actively trying to engage
higher parts of the storage stack to take advantage of the hardware. Since
these features need to be in upstream filesystems, we'll have to hammer out
nice generic apis to take advantage of them. (This is my favorite kind of we
that really means Jens Axboe)

Anyone who wants to support a backend for the apis is welcome to do so, and
I'm sure they will change over time as we all figure out what works best.

Long story short, yes, I am sure that Fusion-io cares about open source.
Oracle too, since a few people misread that line as a dig at Oracle.

-chris
Re: [PATCH v2] E2fsprogs: add missing usage for No_COW
On Wed, Jun 13, 2012 at 04:56:42PM +0800, Liu Bo wrote:
> Add the missing usage for No_COW since we've supported No_COW flag.
>
> Signed-off-by: Liu Bo

Applied, although I changed the commit description to read:

    chattr: add the -C option to the usage message

- Ted
Re: Moving top level to a subvolume
Fajar A. Nugraha posted on Wed, 13 Jun 2012 16:08:40 +0700 as excerpted: >> My system's old and has a bit of a problem with overheating in the >> Phoenix summer, so has been suffering SATA resets > >> it's exactly this sort of corner-case that filesystems need to be able >> to deal with > > IIRC XFS had corruption problems when used on top of LVM (or other block > device that doesn't support barriers correctly), while using ext2/3/4 on > the same block device will be "fine". Yet XFS doesn't have the mark of > "unstable, highly experimental, do not use". People simply use the right > (for them) fs for the right job. It /does/ have a reputation of "don't use for a normal home system or other location without a suitably dependable UPS on fully stable hardware", however. (From what I've read however, much like reiserfs has been for me here after data=ordered, xfs is vastly improved now, and said reputation likely no longer applies.) If btrfs is to replace ext3/4, then that sort of reputation isn't what its devs will be shooting for. We have at least the two filesystems (reiserfs and xfs) with serious reputation problems from their earlier life, and my big concern is that if enough people fail to consider btrfs' current development state and end up with data loss as a result, btrfs will end up with a very similar reputation, which will similarly live many years longer than the reality that created it. But the other two weren't shooting for the ext* mantle, and btrfs is. If its reputation is damaged to that extent and it becomes the assumed Linux default as ext3/4 is now, that will be the Linux reputation. > My point is yes, btrfs is new. And it's being developed at much faster > rate than any other more-mature fs out there. And there are known cases > of data loss on certain configuration of corner cases/"buggy" hardware > and/or old version of kernel. But when used in the correct environment, > btrfs can be a good choice, even for critical data. 
Of course, critical data is backed up, or by definition you don't REALLY consider it so critical after all. (There was a time when drives were small enough and expensive enough, as were the alternatives, that wasn't the case. That time is long gone, for first and second world usage, anyway. Even a home user with a single drive can split it into multiple partitions, with backup data on separate partitions, at least, with the real critical data on a thumb-drive too.) But while people can and do unfortunately go without backups and are often lucky, doing so on btrfs at this stage in its development is "doubly insane", to channel Linus (actually being nice that day =:^) describing a recent commit, in his revert. But there's obviously "doubly insane" people out there! =:^\ > Of course IF the data were REALLY critical, and I REALLY need btrfs' > features, and it were on an enterprise environment, I would've bought > support from oracle linux (or SLES 12, when it's out, or whatever > enterprise distro supporting btrfs which sells support contract) so I > can have someone to turn to in case of problems, and (in some cases) > transfer the risk/blame :D That's a good point. Of course, as many people find out, such "support" often /does/ really come down to "someone else to blame". Yes, they'll help with recovery if it comes to it, but if your $100K+ an hour business is down in the mean time... and you didn't have those backups and at /least/ "cold" failover... they'll be finding someone to blame as well! -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Moving top level to a subvolume
On 06/13/2012 09:21 AM, Arne Jansen wrote:
> On 13.06.2012 09:04, C Anthony Risinger wrote:
>> On Fri, Jun 8, 2012 at 2:40 PM, Arne Jansen wrote:
>>> On 06/08/2012 09:24 PM, Matthew Hawn wrote:
>>>> I just converted my root filesystem to btrfs with btrfs-convert.
>>>> However, since I am running Ubuntu, I would like to have the same
>>>> subvolume structure as a default install,. How do I move the top-level
>>>> subvolume (where all my files currently are) to another subvolume?
>>>
>>> Just snapshot the root subvol and continue working in the snapshot.
>>
>> ... yeah but that solution totally sucks when you:
>>
>> a) have a lot of data
>> b) need to do this via script
>> c) ???
>>
>> ... because in a), data will *copied* the slow way, and in b) you
>> leave a bunch of junk laying around in the old root that will rot
>> unless you `rm -rf` it ... and idk about you, but issuing what is very
>> near to that command on someone else's machine -- via script -- makes
>> me REALLY uneasy ;-)
>
> well, don't put data in the top level in the first place. Yes, you have
> to remove the content of the subvol / by rm -rf, but I don't really see
> the problem with it.

It is slow. You have to change a lot of metadata (each shared metadata
block has to be unshared, and then one copy will be deleted).

> What I don't understand is why you think data will be copied.
>
>>
>> i have asked this exact question at least 4 times specifically, and
>> referenced it probably 8-10, in the last 3 years or more. i needed it
>> then. i still need it now. but since i never got an answer up/down
>> or around, i gave up and told people to `rm -rf`themselves ...
>>
>> http://markmail.org/message/7hj5ioqrztkeerqv
>>
>> ... that's from May of 2010, but i don't think it's the first.
>>
>> so, would it possible to implement this, or could someone kindly (and
>> briefly!) explain why it cannot be done?
>
> The default subvol ('/') has the special number 5 and is expected to
> always be around. All other subvols get numbers starting with 256.
> Creating a new 5 and internally renumbering the old 5 isn't easy, because
> each tree block has an owner recorded in it. Also, all backreferences
> have the root number in them. If you have to touch each tree block, you
> can as well choose the snapshot/rm -rf approach.

I don't know the internals of btrfs very well. Do you think it is possible
to move/swap the root subvolume?

>
>> [...]
> Or you could hack mkfs.btrfs to always create an additional subvol.

Which could be the default one, so nobody should complain.

> Even making / readonly except for creating mountpoint could be possible.
> Just some random ideas...
>
> -Arne
Re: [PATCH 1/4] Btrfs: use radix tree for checksum
> int set_state_private(struct extent_io_tree *tree, u64 start, u64 private)
> {
[...]
> + ret = radix_tree_insert(&tree->csum, (unsigned long)start,
> + (void *)((unsigned long)private<< 1));

Will this fail for 64bit files on 32bit hosts?

> + BUG_ON(ret);

I wonder if we can patch BUG_ON() to break the build if its only
argument is "ret".

- z
Re: ceph-on-btrfs inline-cow regression fix for 3.4.3
On Tue, Jun 12, 2012 at 09:46:26PM -0600, Alexandre Oliva wrote:
> Hi, Greg,
>
> There's a btrfs regression in 3.4 that's causing a lot of grief to
> ceph-on-btrfs users like myself. This small and nice patch cures it.
> It's in Linus' master already. I've been running it on top of 3.4.2,
> and it would be very convenient for me if this could be in 3.4.3.

Ack, this can definitely go to 3.4-stable. Thanks Alexandre.

-chris
[PATCH] Btrfs: use rcu to protect device->name V3
Al pointed out that we can just toss out the old name on a device and add a new one arbitrarily, so anybody who uses device->name in printk could possibly use free'd memory. Instead of adding locking around all of this he suggested doing it with RCU, so I've introduced a struct rcu_string that does just that and have gone through and protected all accesses to device->name that aren't under the uuid_mutex with rcu_read_lock(). This protects us and I will use it for dealing with removing the device that we used to mount the file system in a later patch. Thanks, Signed-off-by: Josef Bacik --- V2->V3: -fixed rcu_string_strdup to get the null character -fixed __VA_ARGS__ usage -undid 80 char line wrapping -moved some rcu_strings into blocks fs/btrfs/check-integrity.c | 16 --- fs/btrfs/disk-io.c | 10 +++-- fs/btrfs/extent_io.c |7 ++- fs/btrfs/ioctl.c | 13 +- fs/btrfs/rcu-string.h | 56 ++ fs/btrfs/scrub.c | 30 +-- fs/btrfs/volumes.c | 92 +++ fs/btrfs/volumes.h |2 +- 8 files changed, 162 insertions(+), 64 deletions(-) create mode 100644 fs/btrfs/rcu-string.h diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index 9cebb1f..da6e936 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -93,6 +93,7 @@ #include "print-tree.h" #include "locking.h" #include "check-integrity.h" +#include "rcu-string.h" #define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x1 #define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x1 @@ -843,13 +844,14 @@ static int btrfsic_process_superblock_dev_mirror( superblock_tmp->never_written = 0; superblock_tmp->mirror_num = 1 + superblock_mirror_num; if (state->print_mask & BTRFSIC_PRINT_MASK_SUPERBLOCK_WRITE) - printk(KERN_INFO "New initial S-block (bdev %p, %s)" - " @%llu (%s/%llu/%d)\n", - superblock_bdev, device->name, - (unsigned long long)dev_bytenr, - dev_state->name, - (unsigned long long)dev_bytenr, - superblock_mirror_num); + printk_in_rcu(KERN_INFO "New initial S-block (bdev %p, %s)" +" @%llu (%s/%llu/%d)\n", +superblock_bdev, 
+rcu_str_deref(device->name), +(unsigned long long)dev_bytenr, +dev_state->name, +(unsigned long long)dev_bytenr, +superblock_mirror_num); list_add(&superblock_tmp->all_blocks_node, &state->all_blocks_list); btrfsic_block_hashtable_add(superblock_tmp, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index e39a3b9..43bd7b9 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -44,6 +44,7 @@ #include "free-space-cache.h" #include "inode-map.h" #include "check-integrity.h" +#include "rcu-string.h" static struct extent_io_ops btree_extent_io_ops; static void end_workqueue_fn(struct btrfs_work *work); @@ -2575,8 +2576,9 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate) struct btrfs_device *device = (struct btrfs_device *) bh->b_private; - printk_ratelimited(KERN_WARNING "lost page write due to " - "I/O error on %s\n", device->name); + printk_ratelimited_in_rcu(KERN_WARNING "lost page write due to " + "I/O error on %s\n", + rcu_str_deref(device->name)); /* note, we dont' set_buffer_write_io_error because we have * our own ways of dealing with the IO errors */ @@ -2749,8 +2751,8 @@ static int write_dev_flush(struct btrfs_device *device, int wait) wait_for_completion(&device->flush_wait); if (bio_flagged(bio, BIO_EOPNOTSUPP)) { - printk("btrfs: disabling barriers on dev %s\n", - device->name); + printk_in_rcu("btrfs: disabling barriers on dev %s\n", + rcu_str_deref(device->name)); device->nobarriers = 1; } if (!bio_flagged(bio, BIO_UPTODATE)) { diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 2c8f7b2..aaa12c1 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -20,6 +20,7 @@ #include "volumes.h" #include "check-integrity.h" #include "locking.h" +#include "rcu-string.h" static struct kmem_cache *extent_st
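The contents of `rcu-string.h` are not shown in this excerpt, so the following is only a userspace sketch of the general idea: a heap-allocated string object that is replaced as a whole rather than edited in place. The names `struct name_string`, `name_string_dup()` and `name_string_replace()` are hypothetical and chosen for this illustration; the kernel header may look quite different. The `len + 1` in the allocation reflects the changelog item about copying the terminating null character.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue; not the layout of the kernel's rcu_string. */
struct name_string {
	size_t len;  /* length without the terminating NUL */
	char str[];  /* flexible array member holding the text */
};

/* Duplicate src into a freshly allocated object, NUL terminator included. */
struct name_string *name_string_dup(const char *src)
{
	size_t len = strlen(src);
	struct name_string *ns = malloc(sizeof(*ns) + len + 1);

	if (!ns)
		return NULL;
	ns->len = len;
	memcpy(ns->str, src, len + 1);  /* +1 copies the terminating NUL */
	return ns;
}

/*
 * Publish a new name and hand back the old object. With RCU, the caller
 * would defer freeing the old object until current readers have finished;
 * this single-threaded sketch leaves that step to the caller.
 */
struct name_string *name_string_replace(struct name_string **slot,
					struct name_string *new_name)
{
	struct name_string *old = *slot;

	*slot = new_name;
	return old;
}
```

The design point is that readers only ever see a complete old string or a complete new one, which is what makes deferred freeing sufficient to avoid the use-after-free described in the patch description.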
Re: Computing size of snapshots approximatly
Hi Hugo, hi all, On 13.06.2012 15:27, Hugo Mills wrote: On Wed, Jun 13, 2012 at 02:15:33PM +0200, Jan-Hendrik Palic wrote: Hi, we using on a server several lvm volumes with btrfs. We want to use nightly build snapshots for some days as an alternative to backups. Now I want to get the size of the snapshots in detail. There are basically two figures you can get for each snapshot. These values may differ wildly. Which one do you want? (A) The first, larger, value is the total computed size of the files in the subvolume. This is what du returns. (B) The second, smaller, value is the amount of space that would be freed by deleting the subvolume. (Alternatively, this is the amount of data in the subvolume which is not shared with some other subvolume). It is currently a difficult process to work out this value in general, but the qgroups patch set will track this information automatically, and expose an API that will allow you to retrieve it. The qgroups patches aren't complete yet. Sorry, that I forgot to mention that. I want the size which I will get, if I delete a snapshot. The next assumption I forgot, sorry, was, that the snapshot are not changing. The user only get readonly access to the snapshots. [...] There are three operations on a filesystem, I think, 1. copy a file on the filesystem 2. change a file on the filesystem 3. delete a file on the filesystem Am I right to assume, that operation 1 and 2 are not change much the size of a snapshot and the delete operation let increase the size of a snapshot in the size of the deleted files? It depends on which measure of the two above you're trying to use, and whether the subvolume (and file) you're modifying still has extents shared with some other subvolume. Sure, and honestly, this is the point, where the complexity is exploding for me. ,-) 1. Copying a file (without --reflink) will increase both the (A) and the (B) size of the snapshot. Copying a file with --reflink will increase (A) and leave (B) much the same. 
Yep. 2. Changing a file will, obviously, cause (A) to change by the difference between the old file and the new. If that file shares no extents with anything else, then (B) will also change by that amount. Otherwise, if it shares extents with anything else (another subvolume, or a reflink copy), then (B) will increase by the amount of data modified. Yep. 3. Deleting a file will reduce (A) by the size of the file. (B) will reduce by the size of non-shared extents owned by that file. Yep. I think, I got the right thought. Thanks for the explanation. Note that btrfs sub find-new will not allow you to track file deletions. Yep, I got this to. But you can get them not directly by a diff. You have a subvolume with a file_A on it. Taking a snapshot snap_A of this subvolume let show the existence of that file in the btrfs sub find-new output. Now delete the fila_A on this subvolume and take a new snapshot, call it snap_B. The btrfs sub find-new output doesn't show it anymore, right. So, a diff of the both outputs, from snap_A to snap_B gives you the deleted file. It is a cruel way, but I think, that it is working. If it is so, it would be enough for me to get the deletions of files between two snapshots and their size. But is there another way to get these informations beside btrfs subvolume find-new? Perhaps it makes sense to use ioctl for it? What about the send/receive feature, which is upcoming? Are there any hints? Wait for qgroups to land, because that actually does it the right way, and will avoid you having to track all kinds of awkward (and hard-to-find) corner cases. Thanks for the hint, I will have a look for that. Best regards, Jan -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
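The two figures (A) and (B) from this thread can be illustrated with a toy model. The sketch below is not how btrfs or the qgroups patches compute anything; it assumes a made-up `struct extent` that carries a bitmask of the subvolumes referencing it, and it ignores reflinks inside a single subvolume as well as partially referenced extents.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: one extent plus a bitmask of the subvolumes that reference it. */
struct extent {
	uint64_t bytes;
	uint32_t owners;  /* bit i set => subvolume i references this extent */
};

/* (A): everything reachable from the subvolume, shared or not. */
uint64_t referenced_bytes(const struct extent *e, size_t n, unsigned subvol)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (e[i].owners & (UINT32_C(1) << subvol))
			total += e[i].bytes;
	return total;
}

/* (B): only extents that no other subvolume references. */
uint64_t exclusive_bytes(const struct extent *e, size_t n, unsigned subvol)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (e[i].owners == (UINT32_C(1) << subvol))
			total += e[i].bytes;
	return total;
}
```

In this model, deleting a file from the live subvolume clears that subvolume's bit on the file's extents. If a read-only snapshot still references them, the snapshot's (A) stays the same while its (B) grows by the size of those extents, which is the effect the original question asks about.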
Re: [PATCH 1/2] Btrfs: use rcu to protect device->name V2
On Wed, Jun 13, 2012 at 03:49:07PM +0200, Stefan Behrens wrote: > On Wed, 13 Jun 2012 09:14:27 -0400, Josef Bacik wrote: > > On Wed, Jun 13, 2012 at 12:35:26AM +0200, David Sterba wrote: > >> On Tue, Jun 12, 2012 at 03:50:41PM -0400, Josef Bacik wrote: > >>> @@ -4694,8 +4716,11 @@ int btrfs_init_dev_stats(struct btrfs_fs_info > >>> *fs_info) > >>> key.offset = device->devid; > >>> ret = btrfs_search_slot(NULL, dev_root, &key, path, 0, 0); > >>> if (ret) { > >>> - printk(KERN_WARNING "btrfs: no dev_stats entry found > >>> for device %s (devid %llu) (OK on first mount after mkfs)\n", > >>> -device->name, (unsigned long long)device->devid); > >>> + printk_in_rcu(KERN_WARNING "btrfs: no dev_stats entry " > >>> + "found for device %s (devid %llu) (OK on" > >>> + " first mount after mkfs)\n", > >> > >> breaking printk strings hurts when grepping for a message > >> > >>> + rcu_str_deref(device->name), > >>> + (unsigned long long)device->devid); > >>> __btrfs_reset_dev_stats(device); > >>> device->dev_stats_valid = 1; > >>> btrfs_release_path(path); > >>> @@ -4747,8 +4772,9 @@ static int update_dev_stat_item(struct > >>> btrfs_trans_handle *trans, > >>> BUG_ON(!path); > >>> ret = btrfs_search_slot(trans, dev_root, &key, path, -1, 1); > >>> if (ret < 0) { > >>> - printk(KERN_WARNING "btrfs: error %d while searching for > >>> dev_stats item for device %s!\n", > >>> -ret, device->name); > >>> + printk_in_rcu(KERN_WARNING "btrfs: error %d while searching " > >>> + "for dev_stats item for device %s!\n", ret, > >> > >> and here as well > >> > >>> + rcu_str_deref(device->name)); > >>> goto out; > >>> } > >>> > >>> @@ -4757,8 +4783,9 @@ static int update_dev_stat_item(struct > >>> btrfs_trans_handle *trans, > >>> /* need to delete old one and insert a new one */ > >>> ret = btrfs_del_item(trans, dev_root, path); > >>> if (ret != 0) { > >>> - printk(KERN_WARNING "btrfs: delete too small dev_stats > >>> item for device %s failed %d!\n", > >>> -device->name, ret); > >>> + 
printk_in_rcu(KERN_WARNING "btrfs: delete too small " > >>> + "dev_stats item for device %s failed > >>> %d!\n", > >> > >> here > >> > >>> + rcu_str_deref(device->name), ret); > >>> goto out; > >>> } > >>> ret = 1; > >>> @@ -4770,8 +4797,9 @@ static int update_dev_stat_item(struct > >>> btrfs_trans_handle *trans, > >>> ret = btrfs_insert_empty_item(trans, dev_root, path, > >>> &key, sizeof(*ptr)); > >>> if (ret < 0) { > >>> - printk(KERN_WARNING "btrfs: insert dev_stats item for > >>> device %s failed %d!\n", > >>> -device->name, ret); > >>> + printk_in_rcu(KERN_WARNING "btrfs: insert dev_stats " > >>> + "item for device %s failed %d!\n", > >> > >> here > >> > >>> + rcu_str_deref(device->name), ret); > >>> goto out; > >>> } > >>> } > >> > >> mostly minor things, but please fix them. > >> > > > > I'm breaking them for the 80 char limit, it happens for all long messages, > > we're > > all used to it. I'll fix up the other things. Thanks, > > > > Josef > > The last sentence of chapter 2 of Documentation/CodingStyle is quite > unambiguous. Here is the full quote of that chapter: > > Chapter 2: Breaking long lines and strings > > Coding style is all about readability and maintainability using commonly > available tools. > > The limit on the length of lines is 80 columns and this is a strongly > preferred limit. > > Statements longer than 80 columns will be broken into sensible chunks, > unless > exceeding 80 columns significantly increases readability and does not hide > information. Descendants are always substantially shorter than the > parent and > are placed substantially to the right. The same applies to function headers > with a long argument list. However, never break user-visible strings such as > printk messages, because that breaks the ability to grep for them. Ah never seen that part of it, I will leave them alone then. 
Thanks,

Josef
Re: [PATCH 1/2] Btrfs: use rcu to protect device->name V2
On Wed, 13 Jun 2012 09:14:27 -0400, Josef Bacik wrote: > On Wed, Jun 13, 2012 at 12:35:26AM +0200, David Sterba wrote: >> On Tue, Jun 12, 2012 at 03:50:41PM -0400, Josef Bacik wrote: >>> @@ -4694,8 +4716,11 @@ int btrfs_init_dev_stats(struct btrfs_fs_info >>> *fs_info) >>> key.offset = device->devid; >>> ret = btrfs_search_slot(NULL, dev_root, &key, path, 0, 0); >>> if (ret) { >>> - printk(KERN_WARNING "btrfs: no dev_stats entry found >>> for device %s (devid %llu) (OK on first mount after mkfs)\n", >>> - device->name, (unsigned long long)device->devid); >>> + printk_in_rcu(KERN_WARNING "btrfs: no dev_stats entry " >>> + "found for device %s (devid %llu) (OK on" >>> + " first mount after mkfs)\n", >> >> breaking printk strings hurts when grepping for a message >> >>> + rcu_str_deref(device->name), >>> + (unsigned long long)device->devid); >>> __btrfs_reset_dev_stats(device); >>> device->dev_stats_valid = 1; >>> btrfs_release_path(path); >>> @@ -4747,8 +4772,9 @@ static int update_dev_stat_item(struct >>> btrfs_trans_handle *trans, >>> BUG_ON(!path); >>> ret = btrfs_search_slot(trans, dev_root, &key, path, -1, 1); >>> if (ret < 0) { >>> - printk(KERN_WARNING "btrfs: error %d while searching for >>> dev_stats item for device %s!\n", >>> - ret, device->name); >>> + printk_in_rcu(KERN_WARNING "btrfs: error %d while searching " >>> + "for dev_stats item for device %s!\n", ret, >> >> and here as well >> >>> + rcu_str_deref(device->name)); >>> goto out; >>> } >>> >>> @@ -4757,8 +4783,9 @@ static int update_dev_stat_item(struct >>> btrfs_trans_handle *trans, >>> /* need to delete old one and insert a new one */ >>> ret = btrfs_del_item(trans, dev_root, path); >>> if (ret != 0) { >>> - printk(KERN_WARNING "btrfs: delete too small dev_stats >>> item for device %s failed %d!\n", >>> - device->name, ret); >>> + printk_in_rcu(KERN_WARNING "btrfs: delete too small " >>> + "dev_stats item for device %s failed >>> %d!\n", >> >> here >> >>> + rcu_str_deref(device->name), ret); >>> 
goto out; >>> } >>> ret = 1; >>> @@ -4770,8 +4797,9 @@ static int update_dev_stat_item(struct >>> btrfs_trans_handle *trans, >>> ret = btrfs_insert_empty_item(trans, dev_root, path, >>> &key, sizeof(*ptr)); >>> if (ret < 0) { >>> - printk(KERN_WARNING "btrfs: insert dev_stats item for >>> device %s failed %d!\n", >>> - device->name, ret); >>> + printk_in_rcu(KERN_WARNING "btrfs: insert dev_stats " >>> + "item for device %s failed %d!\n", >> >> here >> >>> + rcu_str_deref(device->name), ret); >>> goto out; >>> } >>> } >> >> mostly minor things, but please fix them. >> > > I'm breaking them for the 80 char limit, it happens for all long messages, > we're > all used to it. I'll fix up the other things. Thanks, > > Josef The last sentence of chapter 2 of Documentation/CodingStyle is quite unambiguous. Here is the full quote of that chapter: Chapter 2: Breaking long lines and strings Coding style is all about readability and maintainability using commonly available tools. The limit on the length of lines is 80 columns and this is a strongly preferred limit. Statements longer than 80 columns will be broken into sensible chunks, unless exceeding 80 columns significantly increases readability and does not hide information. Descendants are always substantially shorter than the parent and are placed substantially to the right. The same applies to function headers with a long argument list. However, never break user-visible strings such as printk messages, because that breaks the ability to grep for them. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
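A minimal user-space sketch of the point made in the CodingStyle quote above. It assumes nothing about the kernel beyond what the thread shows: the message text is copied from the dev_stats warning in the quoted patch, and MSG_SPLIT, MSG_WHOLE, same_format() and format_warning() are hypothetical names invented for this illustration. The compiler concatenates adjacent string literals, so both spellings produce the same format string; only the source differs, and a grep for "insert dev_stats item" matches the unbroken spelling alone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macros: the same message, once split for 80 columns and once whole. */
#define MSG_SPLIT "btrfs: insert dev_stats " \
		  "item for device %s failed %d!\n"
#define MSG_WHOLE "btrfs: insert dev_stats item for device %s failed %d!\n"

/* Returns 1 when both spellings compile to the same format string. */
static int same_format(void)
{
	return strcmp(MSG_SPLIT, MSG_WHOLE) == 0;
}

/* Formats the warning into buf (snprintf stands in for printk here). */
static int format_warning(char *buf, size_t len, const char *dev, int err)
{
	return snprintf(buf, len, MSG_WHOLE, dev, err);
}
```

Since the generated string is identical either way, the only cost of keeping the message on one line is the long source line, which is the trade-off the quoted chapter settles in favour of grep.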
Re: Computing size of snapshots approximately
On Wed, Jun 13, 2012 at 02:15:33PM +0200, Jan-Hendrik Palic wrote:
> Hi,
>
> we using on a server several lvm volumes with btrfs. We want to use
> nightly build snapshots for some days as an alternative to backups.
>
> Now I want to get the size of the snapshots in detail.

There are basically two figures you can get for each snapshot. These values may differ wildly. Which one do you want?

(A) The first, larger, value is the total computed size of the files in the subvolume. This is what du returns.

(B) The second, smaller, value is the amount of space that would be freed by deleting the subvolume. (Alternatively, this is the amount of data in the subvolume which is not shared with some other subvolume). It is currently a difficult process to work out this value in general, but the qgroups patch set will track this information automatically, and expose an API that will allow you to retrieve it. The qgroups patches aren't complete yet.

> Therefore I played with
>
> btrfs subvolume find-new $snapshot $gen-id.
> And I know, that this is quite complicated and not implemented.
> Therefore I try to go my own way:
>
> Now assume there are two snapshots of one subvolume, snap1 and
> snap2. Further get the find-new informations of these snapshots with
> $gen-id=1 and save them into different files. A diff of these files
> shows the changes between snap1 and snap2, right?
>
> Ok.
>
> There are three operations on a filesystem, I think,
>
> 1. copy a file on the filesystem
> 2. change a file on the filesystem
> 3. delete a file on the filesystem
>
> Am I right to assume, that operation 1 and 2 are not change much the
> size of a snapshot and the delete operation let increase the size of
> a snapshot in the size of the deleted files?

It depends on which measure of the two above you're trying to use, and whether the subvolume (and file) you're modifying still has extents shared with some other subvolume.

1. Copying a file (without --reflink) will increase both the (A) and the (B) size of the snapshot. Copying a file with --reflink will increase (A) and leave (B) much the same.

2. Changing a file will, obviously, cause (A) to change by the difference between the old file and the new. If that file shares no extents with anything else, then (B) will also change by that amount. Otherwise, if it shares extents with anything else (another subvolume, or a reflink copy), then (B) will increase by the amount of data modified.

3. Deleting a file will reduce (A) by the size of the file. (B) will reduce by the size of non-shared extents owned by that file.

Note that btrfs sub find-new will not allow you to track file deletions.

> If it is so, it would be enough for me to get the deletions of files
> between two snapshots and their size. But is there another way to
> get these informations beside btrfs subvolume find-new? Perhaps it
> makes sense to use ioctl for it? What about the send/receive
> feature, which is upcoming?
>
> Are there any hints?

Wait for qgroups to land, because that actually does it the right way, and will avoid you having to track all kinds of awkward (and hard-to-find) corner cases.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Summoning his Cosmic Powers, and glowing slightly ---
from his toes...
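The difference between the (A) and (B) figures above can be illustrated with a toy model. This is only a sketch under loud assumptions: struct toy_extent, size_total() and size_exclusive() are hypothetical names, and real btrfs reference tracking (and the qgroups accounting mentioned above) is considerably more involved than one counter per extent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy extent (hypothetical): size in bytes and number of subvolumes referencing it. */
struct toy_extent {
	unsigned long long bytes;
	unsigned int refs;
};

/* (A): total size of all data reachable from the subvolume, as du would report. */
static unsigned long long size_total(const struct toy_extent *e, size_t n)
{
	unsigned long long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += e[i].bytes;
	return sum;
}

/* (B): bytes freed by deleting the subvolume, i.e. extents nobody else references. */
static unsigned long long size_exclusive(const struct toy_extent *e, size_t n)
{
	unsigned long long sum = 0;

	for (size_t i = 0; i < n; i++) {
		assert(e[i].refs >= 1);	/* the subvolume itself holds one reference */
		if (e[i].refs == 1)
			sum += e[i].bytes;
	}
	return sum;
}
```

In this model a reflink copy adds an extent entry with refs of 2, which raises size_total() but not size_exclusive(), matching point 1 above.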
Re: [PATCH 1/2] Btrfs: use rcu to protect device->name V2
On Wed, Jun 13, 2012 at 12:35:26AM +0200, David Sterba wrote: > On Tue, Jun 12, 2012 at 03:50:41PM -0400, Josef Bacik wrote: > > +++ b/fs/btrfs/check-integrity.c > > @@ -93,6 +93,7 @@ > > #include "print-tree.h" > > #include "locking.h" > > #include "check-integrity.h" > > +#include "rcu-string.h" > > > > #define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x1 > > #define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x1 > > @@ -843,13 +844,14 @@ static int btrfsic_process_superblock_dev_mirror( > > superblock_tmp->never_written = 0; > > superblock_tmp->mirror_num = 1 + superblock_mirror_num; > > if (state->print_mask & BTRFSIC_PRINT_MASK_SUPERBLOCK_WRITE) > > - printk(KERN_INFO "New initial S-block (bdev %p, %s)" > > - " @%llu (%s/%llu/%d)\n", > > - superblock_bdev, device->name, > > - (unsigned long long)dev_bytenr, > > - dev_state->name, > > - (unsigned long long)dev_bytenr, > > - superblock_mirror_num); > > + printk_in_rcu(KERN_INFO "New initial S-block (bdev %p," > > can you please add the 'btrfs: ' prefixes? > No, I'm not changing the output of print statements in this patch, I'll leave that up to the Strato guys. 
> > +" %s) @%llu (%s/%llu/%d)\n", > > +superblock_bdev, > > +rcu_str_deref(device->name), > > +(unsigned long long)dev_bytenr, > > +dev_state->name, > > +(unsigned long long)dev_bytenr, > > +superblock_mirror_num); > > list_add(&superblock_tmp->all_blocks_node, > > &state->all_blocks_list); > > btrfsic_block_hashtable_add(superblock_tmp, > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > index e39a3b9..7d658f2 100644 > > --- a/fs/btrfs/disk-io.c > > +++ b/fs/btrfs/disk-io.c > > @@ -44,6 +44,7 @@ > > #include "free-space-cache.h" > > #include "inode-map.h" > > #include "check-integrity.h" > > +#include "rcu-string.h" > > > > static struct extent_io_ops btree_extent_io_ops; > > static void end_workqueue_fn(struct btrfs_work *work); > > @@ -2575,8 +2576,9 @@ static void btrfs_end_buffer_write_sync(struct > > buffer_head *bh, int uptodate) > > struct btrfs_device *device = (struct btrfs_device *) > > bh->b_private; > > > > - printk_ratelimited(KERN_WARNING "lost page write due to " > > - "I/O error on %s\n", device->name); > > + printk_in_rcu(KERN_WARNING "lost page write due to " > > here > > > + "I/O error on %s\n", > > + rcu_str_deref(device->name)); > > /* note, we dont' set_buffer_write_io_error because we have > > * our own ways of dealing with the IO errors > > */ > > diff --git a/fs/btrfs/rcu-string.h b/fs/btrfs/rcu-string.h > > new file mode 100644 > > index 000..2fbb56b > > --- /dev/null > > +++ b/fs/btrfs/rcu-string.h > > @@ -0,0 +1,56 @@ > > +/* > > + * Copyright (C) 2012 Red Hat. All rights reserved. > > + * > > + * This program is free software; you can redistribute it and/or > > + * modify it under the terms of the GNU General Public > > + * License v2 as published by the Free Software Foundation. > > + * > > + * This program is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU > > + * General Public License for more details. > > + * > > + * You should have received a copy of the GNU General Public > > + * License along with this program; if not, write to the > > + * Free Software Foundation, Inc., 59 Temple Place - Suite 330, > > + * Boston, MA 021110-1307, USA. > > + */ > > + > > +struct rcu_string { > > + struct rcu_head rcu; > > + char str[0]; > > +}; > > + > > +static inline struct rcu_string *rcu_string_strdup(const char *src, gfp_t > > mask) > > +{ > > + size_t len = strlen(src); > > + struct rcu_string *ret = kzalloc(sizeof(struct rcu_string) + > > +(len * sizeof(char)), mask); > > len + 1 ? or is the devname not null-terminated? Oh hey strlen doesn't include the null how about that. I will fix, thanks. > > > + if (!ret) > > + return ret; > > + strncpy(ret->str, src, len); > > + return ret; > > +} > > + > > +static inline void rcu_string_free(struct rcu_string *str) > > +{ > > + if (str) > > + kfree_rcu(str, rcu); > > +} > > + > > +#define printk_in_rcu(fmt, ...) do { \ > > + rcu_read_lock();\ > > + printk(fmt, ##__VA_ARGS__); \ > > drop
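The "len + 1" remark in the review above can be sketched in user space. This is not the patch's code: struct toy_string and toy_string_strdup() are hypothetical stand-ins (no rcu_head, calloc instead of kzalloc), shown only to illustrate that strlen() excludes the terminator, so the allocation must reserve one extra byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* User-space stand-in (hypothetical) for the patch's struct rcu_string. */
struct toy_string {
	size_t len;
	char str[];	/* flexible array member holding the copy */
};

/* Duplicate src, reserving len + 1 bytes so the terminating NUL fits. */
static struct toy_string *toy_string_strdup(const char *src)
{
	size_t len = strlen(src);
	struct toy_string *ret = calloc(1, sizeof(*ret) + len + 1);

	if (!ret)
		return NULL;
	ret->len = len;
	memcpy(ret->str, src, len);	/* calloc already zeroed the final byte */
	return ret;
}
```

Allocating only len bytes, as in the quoted hunk, would leave no room for the terminator, so any later "%s" formatting of the string would read past the end of the allocation.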
Computing size of snapshots approximately
Hi,

on one of our servers we use several LVM volumes with btrfs. We want to keep nightly snapshots for a few days as an alternative to backups.

Now I want to get the size of the snapshots in detail. To that end I played with

btrfs subvolume find-new $snapshot $gen-id

I know that this is quite complicated and not implemented, so I am trying to go my own way.

Assume there are two snapshots of one subvolume, snap1 and snap2. Get the find-new information for each of these snapshots with $gen-id=1 and save it to separate files. A diff of these files shows the changes between snap1 and snap2, right?

OK. There are three operations on a filesystem, I think:

1. copy a file on the filesystem
2. change a file on the filesystem
3. delete a file on the filesystem

Am I right to assume that operations 1 and 2 do not change the size of a snapshot much, and that a delete operation increases the size of a snapshot by the size of the deleted files?

If so, it would be enough for me to get the files deleted between two snapshots and their sizes. But is there another way to get this information besides btrfs subvolume find-new? Perhaps it makes sense to use an ioctl for it? What about the upcoming send/receive feature?

Are there any hints?

Many thanks in advance.

Jan
Re: [PATCH 1/2] Btrfs: use rcu to protect device->name V2
On Wed, 13 Jun 2012 00:35:26 +0200, David Sterba wrote: > On Tue, Jun 12, 2012 at 03:50:41PM -0400, Josef Bacik wrote: >> +++ b/fs/btrfs/check-integrity.c >> @@ -93,6 +93,7 @@ >> #include "print-tree.h" >> #include "locking.h" >> #include "check-integrity.h" >> +#include "rcu-string.h" >> >> #define BTRFSIC_BLOCK_HASHTABLE_SIZE 0x1 >> #define BTRFSIC_BLOCK_LINK_HASHTABLE_SIZE 0x1 >> @@ -843,13 +844,14 @@ static int btrfsic_process_superblock_dev_mirror( >> superblock_tmp->never_written = 0; >> superblock_tmp->mirror_num = 1 + superblock_mirror_num; >> if (state->print_mask & BTRFSIC_PRINT_MASK_SUPERBLOCK_WRITE) >> -printk(KERN_INFO "New initial S-block (bdev %p, %s)" >> - " @%llu (%s/%llu/%d)\n", >> - superblock_bdev, device->name, >> - (unsigned long long)dev_bytenr, >> - dev_state->name, >> - (unsigned long long)dev_bytenr, >> - superblock_mirror_num); >> +printk_in_rcu(KERN_INFO "New initial S-block (bdev %p," > > can you please add the 'btrfs: ' prefixes? Please no additional "btrfs" prefix in the check-integrity printk lines that are enabled with the print_mask option. If they are enabled, then for btrfs debugging, and then the context is known. And you get thousands of these lines... > >> + " %s) @%llu (%s/%llu/%d)\n", >> + superblock_bdev, >> + rcu_str_deref(device->name), >> + (unsigned long long)dev_bytenr, >> + dev_state->name, >> + (unsigned long long)dev_bytenr, >> + superblock_mirror_num); >> list_add(&superblock_tmp->all_blocks_node, >> &state->all_blocks_list); >> btrfsic_block_hashtable_add(superblock_tmp, -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Massive metadata size increase after upgrade from 3.2.18 to 3.4.1
Did you try a balance? (There is also a balance option to pick the least-utilized metadata chunks.) In the long run, once you understand your files and their sizes, tuning with the metadata_ratio mount option might help. But I am not sure how the metadata allocation expanded to 84.38GB. Was there any major delete operation on the filesystem?

thanks, Anand

On 13/06/12 01:38, Calvin Walton wrote:
> On Sat, 2012-06-09 at 01:38 +0600, Roman Mamedov wrote:
>> Hello,
>>
>> Before the upgrade (on 3.2.18):
>>
>> Metadata, DUP: total=9.38GB, used=5.94GB
>>
>> After the FS has been mounted once with 3.4.1:
>>
>> Data: total=3.44TB, used=2.67TB
>> System, DUP: total=8.00MB, used=412.00KB
>> System: total=4.00MB, used=0.00
>> Metadata, DUP: total=84.38GB, used=5.94GB
>>
>> Where did my 75 GB of free space just went?
>
> Btrfs tries to keep a certain ratio of allocated data space to
> allocated metadata space at all times, in order to ensure that there is
> always some free metadata space available. In 3.3 (I believe, but
> haven't actually checked...) this ratio was increased, since people
> were still complaining about btrfs reporting out of space errors too
> soon.
>
> On a filesystem containing (a relatively small number of) large files,
> it probably over-allocates the metadata space, which is what you're
> seeing. I'm not sure if the ratio is tunable.
>
> But better to have a bit of unused metadata space than to get 'out of
> space' errors once you've filled your disk and you're trying to delete
> some files!
[PATCH 3/4] Btrfs: use large extent range for read and its endio
we use larger extent state range for both readpages and read endio, so that we can lock or unlock less and avoid most of split ops, then we'll reduce write locks taken at endio time. Signed-off-by: Liu Bo --- fs/btrfs/extent_io.c | 201 +- 1 files changed, 182 insertions(+), 19 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 081fe13..bb66e3c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2258,18 +2258,26 @@ static void end_bio_extent_readpage(struct bio *bio, int err) struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1; struct bio_vec *bvec = bio->bi_io_vec; struct extent_io_tree *tree; + struct extent_state *cached = NULL; u64 start; u64 end; int whole_page; int mirror; int ret; + u64 up_start, up_end, un_start, un_end; + int up_first, un_first; + int for_uptodate[bio->bi_vcnt]; + int i = 0; + + up_start = un_start = (u64)-1; + up_end = un_end = 0; + up_first = un_first = 1; if (err) uptodate = 0; do { struct page *page = bvec->bv_page; - struct extent_state *cached = NULL; pr_debug("end_bio_extent_readpage: bi_vcnt=%d, idx=%d, err=%d, " "mirror=%ld\n", bio->bi_vcnt, bio->bi_idx, err, @@ -2280,11 +2288,6 @@ static void end_bio_extent_readpage(struct bio *bio, int err) bvec->bv_offset; end = start + bvec->bv_len - 1; - if (bvec->bv_offset == 0 && bvec->bv_len == PAGE_CACHE_SIZE) - whole_page = 1; - else - whole_page = 0; - if (++bvec <= bvec_end) prefetchw(&bvec->bv_page->flags); @@ -2337,14 +2340,71 @@ static void end_bio_extent_readpage(struct bio *bio, int err) } } + if (uptodate) + for_uptodate[i++] = 1; + else + for_uptodate[i++] = 0; + if (uptodate && tree->track_uptodate) { - set_extent_uptodate(tree, start, end, &cached, - GFP_ATOMIC); + if (up_first) { + up_start = start; + up_end = end; + up_first = 0; + } else { + if (up_start == end + 1) { + up_start = start; + } else if (up_end == start - 1) { + up_end = end; + } else { + set_extent_uptodate( + tree, up_start, up_end, + &cached, GFP_ATOMIC); + 
up_start = start; + up_end = end; + } + } } - unlock_extent_cached(tree, start, end, &cached, GFP_ATOMIC); + + if (un_first) { + un_start = start; + un_end = end; + un_first = 0; + } else { + if (un_start == end + 1) { + un_start = start; + } else if (un_end == start - 1) { + un_end = end; + } else { + unlock_extent_cached(tree, un_start, un_end, +&cached, GFP_ATOMIC); + un_start = start; + un_end = end; + } + } + } while (bvec <= bvec_end); + + cached = NULL; + if (up_start < up_end) + set_extent_uptodate(tree, up_start, up_end, &cached, + GFP_ATOMIC); + if (un_start < un_end) + unlock_extent_cached(tree, un_start, un_end, &cached, +GFP_ATOMIC); + + i = 0; + bvec = bio->bi_io_vec; + do { + struct page *page = bvec->bv_page; + + tree = &BTRFS_I(page->mapping->host)->io_tree; + + if (bvec->bv_offset == 0 && bvec->bv_len == PAGE_CACHE_SIZE) + whole_page = 1; + else + whole_page = 0; if (whole_page) { - if (uptodate) { + if (for_uptodate[i++]) { SetPageUptodate(page); } else {
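The range bookkeeping in this patch (up_start/up_end and un_start/un_end) can be sketched in user space as follows. This is an illustration under stated assumptions, not the patch itself: struct range and coalesce() are hypothetical names, the finished runs are written to an output array instead of being passed to set_extent_uptodate() or unlock_extent_cached(), and the last run is flushed unconditionally, which is a choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Byte range with an inclusive end, like the start/end pairs in the patch. */
struct range {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Grow the current run while each new range touches it on either side;
 * otherwise emit the run and start a new one. Returns the number of runs
 * written to out, which must have room for n entries.
 */
static size_t coalesce(const struct range *in, size_t n, struct range *out)
{
	size_t runs = 0;
	struct range cur;

	if (n == 0)
		return 0;
	cur = in[0];
	for (size_t i = 1; i < n; i++) {
		if (cur.start == in[i].end + 1) {
			cur.start = in[i].start;	/* new range sits just before the run */
		} else if (cur.end + 1 == in[i].start) {
			cur.end = in[i].end;		/* new range sits just after the run */
		} else {
			out[runs++] = cur;		/* gap: flush and restart */
			cur = in[i];
		}
	}
	out[runs++] = cur;				/* flush the final run */
	return runs;
}
```

With one call per run instead of one per page, the number of lock and unlock operations on the extent tree drops to the number of discontiguous runs in the bio, which is the saving the commit message describes.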
[PATCH 4/4] Btrfs: apply rwlock for extent state
We used to protect both extent state tree and an individual state's state by tree->lock, but this can be an obstacle of lockless read. So we seperate them here: o tree->lock protects the tree o state->lock protects the state. Signed-off-by: Liu Bo --- fs/btrfs/extent_io.c | 380 -- fs/btrfs/extent_io.h |3 +- 2 files changed, 336 insertions(+), 47 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index bb66e3c..4c6b743 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -27,7 +27,7 @@ static struct kmem_cache *extent_buffer_cache; static LIST_HEAD(buffers); static LIST_HEAD(states); -#define LEAK_DEBUG 0 +#define LEAK_DEBUG 1 #if LEAK_DEBUG static DEFINE_SPINLOCK(leak_lock); #endif @@ -120,7 +120,7 @@ void extent_io_tree_init(struct extent_io_tree *tree, INIT_RADIX_TREE(&tree->csum, GFP_ATOMIC); tree->ops = NULL; tree->dirty_bytes = 0; - spin_lock_init(&tree->lock); + rwlock_init(&tree->lock); spin_lock_init(&tree->buffer_lock); spin_lock_init(&tree->csum_lock); tree->mapping = mapping; @@ -146,6 +146,7 @@ static struct extent_state *alloc_extent_state(gfp_t mask) #endif atomic_set(&state->refs, 1); init_waitqueue_head(&state->wq); + spin_lock_init(&state->lock); trace_alloc_extent_state(state, mask, _RET_IP_); return state; } @@ -281,6 +282,7 @@ static void merge_state(struct extent_io_tree *tree, if (!other_node) break; other = rb_entry(other_node, struct extent_state, rb_node); + /* FIXME: need other->lock? */ if (other->end != state->start - 1 || other->state != state->state) break; @@ -297,6 +299,7 @@ static void merge_state(struct extent_io_tree *tree, if (!other_node) break; other = rb_entry(other_node, struct extent_state, rb_node); + /* FIXME: need other->lock? 
*/ if (other->start != state->end + 1 || other->state != state->state) break; @@ -364,7 +367,10 @@ static int insert_state(struct extent_io_tree *tree, return -EEXIST; } state->tree = tree; + + spin_lock(&state->lock); merge_state(tree, state); + spin_unlock(&state->lock); return 0; } @@ -410,6 +416,23 @@ static int split_state(struct extent_io_tree *tree, struct extent_state *orig, return 0; } +static struct extent_state * +alloc_extent_state_atomic(struct extent_state *prealloc) +{ + if (!prealloc) + prealloc = alloc_extent_state(GFP_ATOMIC); + + return prealloc; +} + +enum extent_lock_type { + EXTENT_READ= 0, + EXTENT_WRITE = 1, + EXTENT_RLOCKED = 2, + EXTENT_WLOCKED = 3, + EXTENT_LAST= 4, +}; + static struct extent_state *next_state(struct extent_state *state) { struct rb_node *next = rb_next(&state->rb_node); @@ -426,13 +449,17 @@ static struct extent_state *next_state(struct extent_state *state) * If no bits are set on the state struct after clearing things, the * struct is freed and removed from the tree */ -static struct extent_state *clear_state_bit(struct extent_io_tree *tree, - struct extent_state *state, - int *bits, int wake) +static int __clear_state_bit(struct extent_io_tree *tree, +struct extent_state *state, +int *bits, int wake, int check) { - struct extent_state *next; int bits_to_clear = *bits & ~EXTENT_CTLBITS; + if (check) { + if ((state->state & ~bits_to_clear) == 0) + return 1; + } + if ((bits_to_clear & EXTENT_DIRTY) && (state->state & EXTENT_DIRTY)) { u64 range = state->end - state->start + 1; WARN_ON(range > tree->dirty_bytes); @@ -442,7 +469,17 @@ static struct extent_state *clear_state_bit(struct extent_io_tree *tree, state->state &= ~bits_to_clear; if (wake) wake_up(&state->wq); + return 0; +} + +static struct extent_state * +try_free_or_merge_state(struct extent_io_tree *tree, struct extent_state *state) +{ + struct extent_state *next = NULL; + + BUG_ON(!spin_is_locked(&state->lock)); if (state->state == 0) { + 
spin_unlock(&state->lock); next = next_state(state); if (state->tree) { rb_erase(&state->rb_node, &tree->state); @@ -453,18 +490,17 @@ static struct extent_state *clear_state_bit(struct extent_io_tree *tree, } } else { merge_state(tree, state); + spin_unlock(&state->lock);
[PATCH 1/4] Btrfs: use radix tree for checksum
We used to issue a checksum to an extent state of 4K range for read endio, but now we want to use larger range for performance optimization, so instead we create a radix tree for checksum, where an item stands for checksum of 4K data. Signed-off-by: Liu Bo --- fs/btrfs/extent_io.c | 84 -- fs/btrfs/extent_io.h |2 + fs/btrfs/inode.c |7 +--- 3 files changed, 23 insertions(+), 70 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 2c8f7b2..2923ede 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -117,10 +117,12 @@ void extent_io_tree_init(struct extent_io_tree *tree, { tree->state = RB_ROOT; INIT_RADIX_TREE(&tree->buffer, GFP_ATOMIC); + INIT_RADIX_TREE(&tree->csum, GFP_ATOMIC); tree->ops = NULL; tree->dirty_bytes = 0; spin_lock_init(&tree->lock); spin_lock_init(&tree->buffer_lock); + spin_lock_init(&tree->csum_lock); tree->mapping = mapping; } @@ -703,15 +705,6 @@ static void cache_state(struct extent_state *state, } } -static void uncache_state(struct extent_state **cached_ptr) -{ - if (cached_ptr && (*cached_ptr)) { - struct extent_state *state = *cached_ptr; - *cached_ptr = NULL; - free_extent_state(state); - } -} - /* * set some bits on a range in the tree. This may require allocations or * sleeping, so the gfp mask is used to indicate what is allowed. @@ -1666,56 +1659,32 @@ out: */ int set_state_private(struct extent_io_tree *tree, u64 start, u64 private) { - struct rb_node *node; - struct extent_state *state; int ret = 0; - spin_lock(&tree->lock); - /* -* this search will find all the extents that end after -* our range starts. 
-*/ - node = tree_search(tree, start); - if (!node) { - ret = -ENOENT; - goto out; - } - state = rb_entry(node, struct extent_state, rb_node); - if (state->start != start) { - ret = -ENOENT; - goto out; - } - state->private = private; -out: - spin_unlock(&tree->lock); + spin_lock(&tree->csum_lock); + ret = radix_tree_insert(&tree->csum, (unsigned long)start, + (void *)((unsigned long)private << 1)); + BUG_ON(ret); + spin_unlock(&tree->csum_lock); return ret; } int get_state_private(struct extent_io_tree *tree, u64 start, u64 *private) { - struct rb_node *node; - struct extent_state *state; - int ret = 0; + void **slot = NULL; - spin_lock(&tree->lock); - /* -* this search will find all the extents that end after -* our range starts. -*/ - node = tree_search(tree, start); - if (!node) { - ret = -ENOENT; - goto out; - } - state = rb_entry(node, struct extent_state, rb_node); - if (state->start != start) { - ret = -ENOENT; - goto out; + spin_lock(&tree->csum_lock); + slot = radix_tree_lookup_slot(&tree->csum, (unsigned long)start); + if (!slot) { + spin_unlock(&tree->csum_lock); + return -ENOENT; } - *private = state->private; -out: - spin_unlock(&tree->lock); - return ret; + *private = (u64)(*slot) >> 1; + + radix_tree_delete(&tree->csum, (unsigned long)start); + spin_unlock(&tree->csum_lock); + + return 0; } /* @@ -2294,7 +2263,6 @@ static void end_bio_extent_readpage(struct bio *bio, int err) do { struct page *page = bvec->bv_page; struct extent_state *cached = NULL; - struct extent_state *state; pr_debug("end_bio_extent_readpage: bi_vcnt=%d, idx=%d, err=%d, " "mirror=%ld\n", bio->bi_vcnt, bio->bi_idx, err, @@ -2313,21 +2281,10 @@ static void end_bio_extent_readpage(struct bio *bio, int err) if (++bvec <= bvec_end) prefetchw(&bvec->bv_page->flags); - spin_lock(&tree->lock); - state = find_first_extent_bit_state(tree, start, EXTENT_LOCKED); - if (state && state->start == start) { - /* -* take a reference on the state, unlock will drop -* the ref -*/ - 
cache_state(state, &cached); - } - spin_unlock(&tree->lock); - mirror = (int)(unsigned long)bio->bi_bdev; if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) { ret = tree->ops->readpage_end_io_hook(page, start, end, - state, mirror); +
[PATCH 2/4] Btrfs: merge adjacent states as much as possible
In order to reduce write locks, we do merge_state as much as possible.

Signed-off-by: Liu Bo
---
 fs/btrfs/extent_io.c | 47 +++
 1 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2923ede..081fe13 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -276,29 +276,36 @@ static void merge_state(struct extent_io_tree *tree,
 	if (state->state & (EXTENT_IOBITS | EXTENT_BOUNDARY))
 		return;
 
-	other_node = rb_prev(&state->rb_node);
-	if (other_node) {
+	while (1) {
+		other_node = rb_prev(&state->rb_node);
+		if (!other_node)
+			break;
 		other = rb_entry(other_node, struct extent_state, rb_node);
-		if (other->end == state->start - 1 &&
-		    other->state == state->state) {
-			merge_cb(tree, state, other);
-			state->start = other->start;
-			other->tree = NULL;
-			rb_erase(&other->rb_node, &tree->state);
-			free_extent_state(other);
-		}
+		if (other->end != state->start - 1 ||
+		    other->state != state->state)
+			break;
+
+		merge_cb(tree, state, other);
+		state->start = other->start;
+		other->tree = NULL;
+		rb_erase(&other->rb_node, &tree->state);
+		free_extent_state(other);
 	}
-	other_node = rb_next(&state->rb_node);
-	if (other_node) {
+
+	while (1) {
+		other_node = rb_next(&state->rb_node);
+		if (!other_node)
+			break;
 		other = rb_entry(other_node, struct extent_state, rb_node);
-		if (other->start == state->end + 1 &&
-		    other->state == state->state) {
-			merge_cb(tree, state, other);
-			state->end = other->end;
-			other->tree = NULL;
-			rb_erase(&other->rb_node, &tree->state);
-			free_extent_state(other);
-		}
+		if (other->start != state->end + 1 ||
+		    other->state != state->state)
+			break;
+
+		merge_cb(tree, state, other);
+		state->end = other->end;
+		other->tree = NULL;
+		rb_erase(&other->rb_node, &tree->state);
+		free_extent_state(other);
 	}
 }
 
--
1.6.5.2
[PATCH 0/4 v2][RFC] apply rwlock for extent state
This patchset addresses one of the project ideas, RBtree lock contention:

"Btrfs uses a number of rbtrees to index in-memory data structures. Some of these are dominated by reads, and the lock contention from searching them is showing up in profiles. We need to look into an RCU and sequence counter combination to allow lockless reads."

The goal is to use RCU, but we treat that as a long-term goal; until we find a mature RCU structure for lockless reads, we use an rwlock instead.

So what we need to do is make the code RCU friendly, and the idea mainly comes from Chris Mason:

"I think the extent_state code can be much more RCU friendly if we separate the operations on the tree from operations on the individual state. In general, we can gain a lot of performance if we are able to reduce the write locks taken at endio time. Especially for reads, these are critical."

The patchset is also available at:

git://github.com/liubogithub/btrfs-work.git rwlock-for-extent-state

I've run it through xfstests, and no bugs have shown up so far.

I made a simple test to show the difference on my box:

$ cat 6_FIO/fio-4thread-4M-sync-read
[global]
group_reporting
thread
numjobs=4
bs=4M
rw=read
sync=0
ioengine=sync
directory=/mnt/btrfs/

[READ]
filename=foobar
size=4000M

*results:*
                          w/o patch    w patch
READ bandwidth (aggrb)    849MB/s      971MB/s

MORE TESTS ARE WELCOME!

v1->v2: drop changes on invalidatepage() and rebase to the latest btrfs upstream.

Liu Bo (4):
  Btrfs: use radix tree for checksum
  Btrfs: merge adjacent states as much as possible
  Btrfs: use large extent range for read and its endio
  Btrfs: apply rwlock for extent state

 fs/btrfs/extent_io.c | 712 +++---
 fs/btrfs/extent_io.h |   5 +-
 fs/btrfs/inode.c     |   7 +-
 3 files changed, 568 insertions(+), 156 deletions(-)
Re: Moving top level to a subvolume
On Wed, Jun 13, 2012 at 4:44 PM, C Anthony Risinger wrote:
> On Wed, Jun 13, 2012 at 2:21 AM, Arne Jansen wrote:
>> On 13.06.2012 09:04, C Anthony Risinger wrote:
>>> ... because in a), data will *copied* the slow way
>
>> What I don't understand is why you think data will be copied.
>
> at one point i tried to create a new subvol and `mv` files there, and
> it took quite some time to complete
> (cross-link-device-what-have-you?), but maybe things changed ... will
> try it out.

IIRC it hasn't. Not in upstream anyway. Some distros (e.g. opensuse) carry their own patch which allows cross-subvolume links (cp --reflink ...).

But it shouldn't matter anyway, since you can SNAPSHOT the old subvol (even root subvol), instead of creating a new subvol. Which means nothing needs to be copied. You'd still have to do "rm" manually though.

--
Fajar
Re: Moving top level to a subvolume
On Wed, Jun 13, 2012 at 2:21 AM, Arne Jansen wrote:
> On 13.06.2012 09:04, C Anthony Risinger wrote:
>>
>> a) have a lot of data
>> b) need to do this via script
>> c) ???
>>
>> ... because in a), data will be *copied* the slow way, and in b) you
>> leave a bunch of junk laying around in the old root that will rot
>> unless you `rm -rf` it ... [...]
>
> What I don't understand is why you think data will be copied.

at one point i tried to create a new subvol and `mv` files there, and
it took quite some time to complete (cross-link-device-what-have-you?),
but maybe things changed ... will try it out.

>> [...]
>>
>> so, would it be possible to implement this, or could someone kindly (and
>> briefly!) explain why it cannot be done?
>
> The default subvol ('/') has the special number 5 and is expected to
> always be around. All other subvols get numbers starting with 256.
> Creating a new 5 and internally renumbering the old 5 isn't easy, because
> each tree block has an owner recorded in it. Also, all backreferences
> have the root number in them. If you have to touch each tree block, you
> can as well choose the snapshot/rm -rf approach.

ok this makes sense thanks, the last sentence especially ... top-level
volume is different. it's identical to other subvols in 99% of ways
save one-gotcha-little-1%. couldn't we shield ourselves a little
better?

>> 1. people install stuff to the top-level
>> 2. top-level is unmanageable
>> 3. problem
>> [...]
>
> Can't you instead add code to the installer that warns a user if he wants
> to install into the default subvol?
> Just some random ideas...

i would like to see #5 cut off from natural access: accessible by an
_explicit_ manual mount only, cannot be made default, and cannot be
removed. maybe btrfs manages a proxy/facade subvol, say, #10, settable
by `--flag-origin` or `{insert-here}` option -- a symlink to subvol?
if, at absolutely any time or for whatever reason, the #10 pointer should
not exist, immediately snapshot #5 and update.

#5 -> #10 -> #256+ ?

... this might allow the root to be "replaced". default is set to the #10
proxy volume when the FS is initialized.

> [...]
> Or you could hack mkfs.btrfs to always create an additional subvol.
> Even making / readonly except for creating mountpoint could be possible.

^ yeah, this sounds like exactly what i'm thinking, differing mainly on
who does the work... i just want a guaranteed way of replacing the
"logical root", at #10. the "physical root" at #5 is more-or-less
indestructible and off limits, and never available except as a template.

... i am new to postgresql, but their template0/template1 feels related
to solving problems like this.

-- 
C Anthony
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Moving top level to a subvolume
On Wed, Jun 13, 2012 at 2:23 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Fajar A. Nugraha posted on Wed, 13 Jun 2012 08:49:47 +0700 as excerpted:
>
>> As for "lose their filesystems", are there recent ones that uses one of
>> the three distros above, and is purely btrfs "fault"? The ones I can
>> remember (from the post to this list) were broken on earlier kernels, or
>> caused by bad disks.

> My system's old and has a bit of a problem with overheating in the
> Phoenix summer, so has been suffering SATA resets

> it's exactly this sort of
> corner-case that filesystems need to be able to deal with

IIRC XFS had corruption problems when used on top of LVM (or other block
device that doesn't support barriers correctly), while using ext2/3/4 on
the same block device will be "fine". Yet XFS doesn't have the mark of
"unstable, highly experimental, do not use". People simply use the right
(for them) fs for the right job.

My point is yes, btrfs is new. And it's being developed at a much faster
rate than any other more-mature fs out there. And there are known cases
of data loss on certain configurations, in corner cases, on "buggy"
hardware, and/or with old kernel versions. But when used in the correct
environment, btrfs can be a good choice, even for critical data.

Of course IF the data were REALLY critical, and I REALLY needed btrfs'
features, and it were in an enterprise environment, I would've bought
support from Oracle Linux (or SLES 12, when it's out, or whatever
enterprise distro supports btrfs and sells support contracts) so I can
have someone to turn to in case of problems, and (in some cases) transfer
the risk/blame :D

-- 
Fajar
[PATCH v2] E2fsprogs: add missing usage for No_COW
Add the missing usage for No_COW since we've supported No_COW flag.

Signed-off-by: Liu Bo
---
v1->v2: sort options alphabetically, thanks to Roman Mamedov.

 misc/chattr.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/misc/chattr.c b/misc/chattr.c
index 141ea6e..24254cc 100644
--- a/misc/chattr.c
+++ b/misc/chattr.c
@@ -83,7 +83,7 @@ static unsigned long sf;
 static void usage(void)
 {
 	fprintf(stderr,
-		_("Usage: %s [-RVf] [-+=AacDdeijsSu] [-v version] files...\n"),
+		_("Usage: %s [-RVf] [-+=AaCcDdeijsSu] [-v version] files...\n"),
 		program_name);
 	exit(1);
 }
-- 
1.6.5.2
RE: Bug in btrfs-debug-tree for two or more devices.
-----Original Message-----
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Randy Barlow
Sent: Tuesday, June 12, 2012 8:28 PM
To: linux-btrfs@vger.kernel.org
Subject: Re: Bug in btrfs-debug-tree for two or more devices.

On Tuesday, June 12, 2012 06:53:00 AM Santosh Hosamani wrote:
> Kernel 3.0.13.0.27-default

This kernel is very old for btrfs. Can you try with at least Linux 3.4?

I have installed the 3.4.2 kernel, but I am still facing the same issue.
Maybe my understanding of how to calculate the used blocks is wrong. It
would be great if someone could help me understand it.

-- 
R
RE: Bug in btrfs-debug-tree for two or more devices.
-----Original Message-----
From: linux-btrfs-ow...@vger.kernel.org [mailto:linux-btrfs-ow...@vger.kernel.org] On Behalf Of Hugo Mills
Sent: Wednesday, June 13, 2012 1:37 AM
To: Santosh Hosamani
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Bug in btrfs-debug-tree for two or more devices.

On Tue, Jun 12, 2012 at 06:53:00AM +0000, Santosh Hosamani wrote:
>
> Hi btrfs folks,
> I am working on btrfs filesystem on how it manages the free
> space. And found out btrfs maintain a ctree which manages the physical
> location of the chunks and stripes of the filesystem.
> Btrfs-debug-tree also gives the information on the chunk tree
>
> I created btrfs on single device and two devices. I have attached the output of
> both on running btrfs-debug-tree.
> For single device sum of all the length in the chunks will add up to the total
> used bytes which is expected behavior.
>
> But for two devices sum of all lengths in the chunks does not add to the
> total bytes. Am I missing something?

   Without actually seeing the details of your technique and
expectations, I shall make a guess that you're not accounting for the
double-counting of RAID-1 metadata. In other words, you will find that
all of the metadata device extents (or chunks) will appear twice --
once on each device.

   Actually, this isn't quite right either -- what you really need to
do is look at the RAID-1, RAID-10 and DUP bits in the chunk flags, add
up all of those chunks, divide by two, and then add in the remaining
(RAID-0 and single) chunks. That total should then add up to the total
value of allocated space that you get from the output of "btrfs fi df".

>> chunk tree
leaf 20971520 items 8 free space 3023 generation 4 owner 3
fs uuid 23f86d1e-038a-4f5b-b87c-2ba78018135c
chunk uuid db672366-6801-4f83-99ef-2087a60bb394
	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 3897 itemsize 98
		dev item devid 1 total_bytes 3221225472 bytes used 673579008
	item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 3799 itemsize 98
		dev item devid 2 total_bytes 3221225472 bytes used 652607488
	item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) itemoff 3719 itemsize 80
		chunk length 4194304 owner 2 type 2 num_stripes 1
			stripe 0 devid 1 offset 0
	item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 4194304) itemoff 3639 itemsize 80
		chunk length 8388608 owner 2 type 4 num_stripes 1
			stripe 0 devid 1 offset 4194304
	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 12582912) itemoff 3559 itemsize 80
		chunk length 8388608 owner 2 type 1 num_stripes 1
			stripe 0 devid 1 offset 12582912
	item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 3447 itemsize 112
		chunk length 8388608 owner 2 type 18 num_stripes 2
			stripe 0 devid 2 offset 1048576
			stripe 1 devid 1 offset 20971520
	item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 3335 itemsize 112
		chunk length 322109440 owner 2 type 20 num_stripes 2
			stripe 0 devid 2 offset 9437184
			stripe 1 devid 1 offset 29360128
	item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 351469568) itemoff 3223 itemsize 112
		chunk length 644218880 owner 2 type 9 num_stripes 2
			stripe 0 devid 2 offset 331546624
			stripe 1 devid 1 offset 351469568

The chunk tree will tell me where the physical stripes are, right,
irrespective of the RAID type? Correct me if I am wrong. If not, how
would you know which blocks are occupied and which are free?

Basically, what I want to do is get the used blocks of all the devices,
create a bitmap from them, and zero out all the free blocks without
overwriting any used block. I should then be able to mount the
filesystem without any error. How do I achieve that?

> Also I notice that for the second device the superblock location 0x10000 is
> not considered as used.
>
> I would be really grateful if you folks can answer my query.
>
> I have run these tests on SLES11-sp2-x86 Kernel 3.0.13.0.27-default

   This is pretty old, but shouldn't affect the results. It will cause
reliability problems if you try running it seriously.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- "There's a Martian war machine outside -- they want to talk ---
           to you about a cure for the common cold."
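The per-device "bytes used" figures in the dump above can be cross-checked against the chunk items. What follows is only a sketch in Python, not code from btrfs-progs. The helper names (`stripe_length`, `bytes_used_per_device`) are made up for illustration, and the striping rules are an assumption about the on-disk format: a RAID-0 chunk is split evenly across its stripes, a RAID-10 chunk across half of them, and single, DUP and RAID-1 stripes each hold the full chunk length. The chunk list is copied by hand from the dump.

```python
# Block-group type bits of the btrfs on-disk format.
DATA, SYSTEM, METADATA = 1, 2, 4
RAID0, RAID1, DUP, RAID10 = 8, 16, 32, 64

# (logical chunk length, type, devids that hold a stripe), copied by hand
# from items 2..7 of the btrfs-debug-tree dump in this message.
CHUNKS = [
    (4194304, 2, [1]),          # SYSTEM, single
    (8388608, 4, [1]),          # METADATA, single
    (8388608, 1, [1]),          # DATA, single
    (8388608, 18, [2, 1]),      # SYSTEM | RAID1
    (322109440, 20, [2, 1]),    # METADATA | RAID1
    (644218880, 9, [2, 1]),     # DATA | RAID0
]


def stripe_length(length, chunk_type, num_stripes):
    """Bytes that one stripe of a chunk occupies on its device.

    Assumption: RAID0 divides the chunk across all stripes, RAID10 across
    half of them, and every other profile stores a full copy per stripe.
    """
    if chunk_type & RAID0:
        return length // num_stripes
    if chunk_type & RAID10:
        return length // (num_stripes // 2)
    return length


def bytes_used_per_device(chunks):
    """Sum the physical space each device contributes to the chunks."""
    used = {}
    for length, chunk_type, devids in chunks:
        per_stripe = stripe_length(length, chunk_type, len(devids))
        for devid in devids:
            used[devid] = used.get(devid, 0) + per_stripe
    return used


if __name__ == "__main__":
    for devid, used in sorted(bytes_used_per_device(CHUNKS).items()):
        print(f"devid {devid}: {used} bytes in chunks")
    print("sum of logical chunk lengths:", sum(c[0] for c in CHUNKS))
```

With these numbers the per-device sums come out equal to the two dev items (673579008 and 652607488), while the plain sum of chunk lengths is smaller by exactly the two RAID-1 chunks, which is the double-counting Hugo describes. Note that the chunk tree on its own is probably not a complete "do not overwrite" map for the bitmap idea: the superblock copies are not described by chunk items, which would explain the observation about 0x10000 on the second device.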
Re: [PATCH] E2fsprogs: add missing usage for No_COW
On Wed, 13 Jun 2012 15:47:13 +0800 Liu Bo wrote:

> Add the missing usage for No_COW since we've supported No_COW flag.
>
> Signed-off-by: Liu Bo
> ---
>  misc/chattr.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/misc/chattr.c b/misc/chattr.c
> index 141ea6e..24254cc 100644
> --- a/misc/chattr.c
> +++ b/misc/chattr.c
> @@ -83,7 +83,7 @@ static unsigned long sf;
>  static void usage(void)
>  {
>  	fprintf(stderr,
> -		_("Usage: %s [-RVf] [-+=AacDdeijsSu] [-v version] files...\n"),
> +		_("Usage: %s [-RVf] [-+=AacDdeijsSuC] [-v version] files...\n"),
>  		program_name);
>  	exit(1);
>  }

These were sorted alphabetically, so the better way would be to use
AaCcDdeijsSu

-- 
With respect,
Roman

~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
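The ordering Roman asks for can be stated as a small rule: place the new letter in front of the first existing letter that compares equal or greater when case is ignored. The Python sketch below only illustrates that rule; `insert_option` is a hypothetical helper, not anything in e2fsprogs, and it assumes the existing string is already in case-insensitive order.

```python
from bisect import bisect_left


def insert_option(options, letter):
    """Insert `letter` into `options`, keeping case-insensitive order.

    Assumes `options` is already ordered when case is ignored. The new
    letter goes before any existing letter with the same lowercase form.
    """
    keys = [c.lower() for c in options]
    pos = bisect_left(keys, letter.lower())
    return options[:pos] + letter + options[pos:]


if __name__ == "__main__":
    print(insert_option("AacDdeijsSu", "C"))
```

Applied to the old usage string and the new 'C' flag, this gives the string suggested above rather than the v1 result with 'C' appended at the end.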
[PATCH] E2fsprogs: add missing usage for No_COW
Add the missing usage for No_COW since we've supported No_COW flag.

Signed-off-by: Liu Bo
---
 misc/chattr.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/misc/chattr.c b/misc/chattr.c
index 141ea6e..24254cc 100644
--- a/misc/chattr.c
+++ b/misc/chattr.c
@@ -83,7 +83,7 @@ static unsigned long sf;
 static void usage(void)
 {
 	fprintf(stderr,
-		_("Usage: %s [-RVf] [-+=AacDdeijsSu] [-v version] files...\n"),
+		_("Usage: %s [-RVf] [-+=AacDdeijsSuC] [-v version] files...\n"),
 		program_name);
 	exit(1);
 }
-- 
1.6.5.2
Re: Btrfs and data nocow per inode basis
On 06/13/2012 05:10 AM, Ted Ts'o wrote:
> On Tue, Jun 12, 2012 at 04:44:23PM -0400, Chris Mason wrote:
>> On Tue, Jun 12, 2012 at 01:15:27PM -0600, Ted Ts'o wrote:
>>> It appears the NOCOW_FL flag is currently a no-op in the 3.2 kernel?
>> It's not a noop, but it is only setting the NODATACOW flag. It needs to
>> set the nodatasum flag as well, just like the mount -o nodatacow mount
>> option does.
>>
>> I'll fix this up on the kernel side, thanks Ted.
>

ohh, that's my fault...sorry.

> Here's the final patch to e2fsprogs that will be going into 1.42.4:
>

This commit lacks the related usage update, I'll send a patch for it :)

thanks,
liubo

> commit 5a23c93aeb65d61892a47f8f27bffad38f4759ea
> Author: Theodore Ts'o
> Date: Tue Jun 12 17:09:39 2012 -0400
>
>     lsattr, chattr: add support for btrfs's No_COW flag
>
>     Signed-off-by: "Theodore Ts'o"
>
> diff --git a/lib/e2p/pf.c b/lib/e2p/pf.c
> index f03193c..e2f8ce5 100644
> --- a/lib/e2p/pf.c
> +++ b/lib/e2p/pf.c
> @@ -49,6 +49,7 @@ static struct flags_name flags_array[] = {
>  	{ EXT2_TOPDIR_FL, "T", "Top_of_Directory_Hierarchies" },
>  	{ EXT4_EXTENTS_FL, "e", "Extents" },
>  	{ EXT4_HUGE_FILE_FL, "h", "Huge_file" },
> +	{ FS_NOCOW_FL, "C", "No_COW" },
>  	{ 0, NULL, NULL }
>  };
>
> diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
> index f46a1a9..fb3f7cc 100644
> --- a/lib/ext2fs/ext2_fs.h
> +++ b/lib/ext2fs/ext2_fs.h
> @@ -301,6 +301,7 @@ struct ext2_dx_countlimit {
>  #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */
>  #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */
>  /* EXT4_EOFBLOCKS_FL 0x00400000 was here */
> +#define FS_NOCOW_FL 0x00800000 /* Do not cow file */
>  #define EXT4_SNAPFILE_FL 0x01000000 /* Inode is a snapshot */
>  #define EXT4_SNAPFILE_DELETED_FL 0x04000000 /* Snapshot is being deleted */
>  #define EXT4_SNAPFILE_SHRUNK_FL 0x08000000 /* Snapshot shrink has completed */
> diff --git a/misc/chattr.1.in b/misc/chattr.1.in
> index 92f6d70..5a57d2c 100644
> --- a/misc/chattr.1.in
> +++ b/misc/chattr.1.in
> @@ -64,6 +64,15 @@ this file compresses data before storing them on the disk. Note: please
>  make sure to read the bugs and limitations section at the end of this
>  document.
>  .PP
> +A file with the 'C' attribute set will not be subject to copy-on-write
> +updates. This flag is only supported on file systems which perform
> +copy-on-write. (Note: For btrfs, the 'C' flag should be only
> +set on new or empty files. If it is set on a file which already has
> +data blocks, it is undefined when the blocks assigned to the file will
> +be fully stable. If the 'C' flag is set on a directory, it will have no
> +effect on the directory, but new files created in that directory will
> +the No_COW attribute.)
> +.PP
>  When a directory with the `D' attribute set is modified,
>  the changes are written synchronously on the disk; this is equivalent to
>  the `dirsync' mount option applied to a subset of the files.
> @@ -159,8 +168,7 @@ maintained by Theodore Ts'o .
>  .SH BUGS AND LIMITATIONS
>  The `c', 's', and `u' attributes are not honored
>  by the ext2 and ext3 filesystems as implemented in the current mainline
> -Linux kernels.  These attributes may be implemented
> -in future versions of the ext2 and ext3 filesystems.
> +Linux kernels.
>  .PP
>  The `j' option is only useful if the filesystem is mounted as ext3.
>  .PP
> diff --git a/misc/chattr.c b/misc/chattr.c
> index 8a2d61f..141ea6e 100644
> --- a/misc/chattr.c
> +++ b/misc/chattr.c
> @@ -107,6 +107,7 @@ static const struct flags_char flags_array[] = {
>  	{ EXT2_UNRM_FL, 'u' },
>  	{ EXT2_NOTAIL_FL, 't' },
>  	{ EXT2_TOPDIR_FL, 'T' },
> +	{ FS_NOCOW_FL, 'C' },
>  	{ 0, 0 }
>  };
>
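For readers who want to see what the new 'C' letter amounts to at the system-call level, here is a rough Python sketch. It is not the e2fsprogs code, which is C. `updated_flags` and `set_nocow` are hypothetical names, the `_ioc` helper assumes the ioctl number layout used on common Linux architectures such as x86 and ARM, and the FS_NOCOW_FL value is the one from the ext2_fs.h hunk quoted above.

```python
import fcntl
import os
import struct

FS_NOCOW_FL = 0x00800000  # value taken from the ext2_fs.h hunk above


def _ioc(direction, nr):
    # Assumed Linux _IOC() layout (x86, ARM): dir<<30 | size<<16 | type<<8 | nr.
    # The flag ioctls are declared with a 'long' argument and type 'f'.
    return (direction << 30) | (struct.calcsize("l") << 16) | (ord("f") << 8) | nr


FS_IOC_GETFLAGS = _ioc(2, 1)  # _IOR('f', 1, long)
FS_IOC_SETFLAGS = _ioc(1, 2)  # _IOW('f', 2, long)


def updated_flags(flags, op, mask):
    """Apply chattr's '+', '-' or '=' operator to an inode flag word."""
    if op == "+":
        return flags | mask
    if op == "-":
        return flags & ~mask
    if op == "=":
        return mask
    raise ValueError(f"unknown operator: {op!r}")


def set_nocow(path):
    """Hypothetical helper: roughly what `chattr +C path` asks the kernel for."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(struct.pack("i", 0))
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf)      # kernel fills in an int
        flags = struct.unpack("i", buf)[0]
        new_flags = updated_flags(flags, "+", FS_NOCOW_FL)
        fcntl.ioctl(fd, FS_IOC_SETFLAGS, struct.pack("i", new_flags))
    finally:
        os.close(fd)
```

Following the man page text in the patch, `set_nocow` would only make sense on a new or empty file, or on a directory whose future files should inherit the attribute. Whether the kernel then also disables checksums for such files is the separate fix Chris mentions.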
Re: Moving top level to a subvolume
Fajar A. Nugraha posted on Wed, 13 Jun 2012 08:49:47 +0700 as excerpted:

> As for "lose their filesystems", are there recent ones that uses one of
> the three distros above, and is purely btrfs "fault"? The ones I can
> remember (from the post to this list) were broken on earlier kernels, or
> caused by bad disks.

I tried btrfs during the 3.4 cycle for a bit, and didn't lose the whole
filesystem, but definitely found it not up to my usual standard of
robustness, which is set by my previous, and now current again,
filesystem: Chris's former project, reiserfs.

My system's old and has a bit of a problem with overheating in the
Phoenix summer, so has been suffering SATA resets (not the disk, the sata
chipset most likely, and/or issues with the graphics overheating since
I'm using an AMD 8xxx chipset with AGPGART split between IOMMU for
storage I/O and graphics) and full system freezes. Not only did I have
way more stuff disappearing or being zeroed out than on reiserfs (in
default data=ordered mode), but in one case I had a segment disappear out
of the middle of a file, and in another, I had firefox's
crash-resume-file /content/ show up as what SHOULD have been an entirely
unrelated configuration file.

Naturally I had backups to restore from, and if it wasn't for the
freezes, it would have likely been fine, but it's exactly this sort of
corner-case that filesystems need to be able to deal with, and what
bothered me wasn't disappearing or zeroed out last few seconds of work
with well documented explanations, but having random segments of files
that I hadn't changed (whether the app was rewriting them with the same
data's another question) in some time disappear, and having one file's
content show up with an entirely unrelated name. I thought that's the
sort of thing btrfs checksums were supposed to detect and effectively
zero out, but...

I decided that's /too/ experimental for me ATM, especially with
not-quite-stable hardware (it's worth noting that I survived bad memory
and the related crashes on reiserfs, without /that/ sort of damage, at
least not since data=ordered mode!), so am back on reiserfs for now,
anyway. I'll likely try again next year sometime...

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Moving top level to a subvolume
On 13.06.2012 09:04, C Anthony Risinger wrote:
> On Fri, Jun 8, 2012 at 2:40 PM, Arne Jansen wrote:
>> On 06/08/2012 09:24 PM, Matthew Hawn wrote:
>>> I just converted my root filesystem to btrfs with btrfs-convert. However,
>>> since I am running Ubuntu, I would like to have the same subvolume
>>> structure as a default install. How do I move the top-level subvolume
>>> (where all my files currently are) to another subvolume?
>>
>> Just snapshot the root subvol and continue working in the snapshot.
>
> ... yeah but that solution totally sucks when you:
>
> a) have a lot of data
> b) need to do this via script
> c) ???
>
> ... because in a), data will be *copied* the slow way, and in b) you
> leave a bunch of junk laying around in the old root that will rot
> unless you `rm -rf` it ... and idk about you, but issuing what is very
> near to that command on someone else's machine -- via script -- makes
> me REALLY uneasy ;-)

well, don't put data in the top level in the first place. Yes, you have
to remove the content of the subvol / by rm -rf, but I don't really see
the problem with it. What I don't understand is why you think data will
be copied.

>
> i have asked this exact question at least 4 times specifically, and
> referenced it probably 8-10, in the last 3 years or more. i needed it
> then. i still need it now. but since i never got an answer up/down
> or around, i gave up and told people to `rm -rf` themselves ...
>
> http://markmail.org/message/7hj5ioqrztkeerqv
>
> ... that's from May of 2010, but i don't think it's the first.
>
> so, would it be possible to implement this, or could someone kindly (and
> briefly!) explain why it cannot be done?

The default subvol ('/') has the special number 5 and is expected to
always be around. All other subvols get numbers starting with 256.
Creating a new 5 and internally renumbering the old 5 isn't easy, because
each tree block has an owner recorded in it. Also, all backreferences
have the root number in them. If you have to touch each tree block, you
can as well choose the snapshot/rm -rf approach.

>
> 1. people install stuff to the top-level
> 2. top-level is unmanageable
> 3. problem
>
> in my case i wrote an initramfs hook that implemented rollback
> functionality, but there was no way for me to cleanly -- and safely
> -- "rotate" the user's setup to one that DOES NOT have user items in
> the top-level volume.

Can't you instead add code to the installer that warns a user if he wants
to install into the default subvol?
Or you could hack mkfs.btrfs to always create an additional subvol.
Even making / readonly except for creating mountpoint could be possible.
Just some random ideas...

-Arne

>
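The snapshot/rm -rf approach discussed above can be written down as a short command plan. The Python sketch below only builds the argument lists and runs nothing. `relocate_plan` is a hypothetical helper, the paths and the subvolume name are placeholders, and the only btrfs command assumed is `btrfs subvolume snapshot <source> <dest>`. As far as I understand it, the snapshot shares the existing extents instead of copying them, whereas `mv` between subvolumes falls back to copy-and-delete because a rename across subvolumes is refused as cross-device, which would account for the slow `mv` reported earlier in the thread.

```python
def relocate_plan(mountpoint, new_subvol, entries):
    """Build, but do not run, the commands for the snapshot/rm -rf approach.

    mountpoint: where the top-level subvolume is mounted (placeholder path)
    new_subvol: name of the snapshot that becomes the new working root
    entries:    names that currently live directly in the top level
    """
    base = mountpoint.rstrip("/")
    target = f"{base}/{new_subvol}"
    # Step 1: snapshot the top level into a subvolume inside itself.
    plan = [["btrfs", "subvolume", "snapshot", mountpoint, target]]
    # Step 2: remove the old top-level entries, never the new snapshot.
    for name in sorted(entries):
        if name != new_subvol:
            plan.append(["rm", "-rf", "--", f"{base}/{name}"])
    return plan


if __name__ == "__main__":
    for argv in relocate_plan("/mnt", "newroot", ["dir1", "dir2", "dir3"]):
        print(" ".join(argv))
```

Keeping the plan as data makes it possible to print and review it before anything destructive runs, which addresses some of the unease about scripting `rm -rf` on someone else's machine. Pointing the system at the new subvolume (mount options or the default subvolume setting) is a separate step that this sketch leaves out.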
Re: Moving top level to a subvolume
On Fri, Jun 8, 2012 at 2:40 PM, Arne Jansen wrote:
> On 06/08/2012 09:24 PM, Matthew Hawn wrote:
>> I just converted my root filesystem to btrfs with btrfs-convert. However,
>> since I am running Ubuntu, I would like to have the same subvolume structure
>> as a default install. How do I move the top-level subvolume (where all my
>> files currently are) to another subvolume?
>
> Just snapshot the root subvol and continue working in the snapshot.

... yeah but that solution totally sucks when you:

a) have a lot of data
b) need to do this via script
c) ???

... because in a), data will be *copied* the slow way, and in b) you
leave a bunch of junk laying around in the old root that will rot
unless you `rm -rf` it ... and idk about you, but issuing what is very
near to that command on someone else's machine -- via script -- makes
me REALLY uneasy ;-)

i have asked this exact question at least 4 times specifically, and
referenced it probably 8-10, in the last 3 years or more. i needed it
then. i still need it now. but since i never got an answer up/down
or around, i gave up and told people to `rm -rf` themselves ...

http://markmail.org/message/7hj5ioqrztkeerqv

... that's from May of 2010, but i don't think it's the first.

so, would it be possible to implement this, or could someone kindly (and
briefly!) explain why it cannot be done?

1. people install stuff to the top-level
2. top-level is unmanageable
3. problem

in my case i wrote an initramfs hook that implemented rollback
functionality, but there was no way for me to cleanly -- and safely
-- "rotate" the user's setup to one that DOES NOT have user items in
the top-level volume.

-- 
C Anthony