Re: Synching a Backup Server
On 07/01/11 16:20, Hubert Kario wrote:
> I usually create subvolumes in the btrfs root volume:
>
> /mnt/btrfs/
> |- server-a
> |- server-b
> \- server-c
>
> then create snapshots of these directories:
>
> /mnt/btrfs/
> |- server-a
> |- server-b
> |- server-c
> |- snapshots-server-a
> |  |- @GMT-2010.12.21-16.48.09
> |  \- @GMT-2010.12.22-16.45.14
> |- snapshots-server-b
> \- snapshots-server-c
>
> This way I can use the shadow_copy module for samba to publish the snapshots to windows clients.

Can you post some actual commands to do this part? I am extremely confused about btrfs subvolumes vs the root filesystem and mounting, particularly in relation to the default subvolume.

For instance, if I create the initial filesystem using mkfs.btrfs and then mount it on /mnt/btrfs, is there already a default subvolume, or do I have to make one? What happens when you unmount the whole filesystem and then come back?

The wiki also makes the following statement:

*Note:* to be mounted the subvolume or snapshot have to be in the root of the btrfs filesystem.

but you seem to have snapshots one layer down from the root.

I am trying to use this method for my offsite backups - to a large spare SATA disk loaded via a USB port. I want to create the main filesystem (and possibly a subvolume - this is where I start to get confused) and rsync my current daily backup files to it. I would then also take a snapshot with a time label (just so I get the correct time, rather than doing it at the next cycle, as explained below). I would transport this disk offsite. I would repeat this in a month's time with a totally different disk.

In a couple of months' time, when I come to recycle the first disk for my offsite backup, I would mount the retrieved disk (and again I am confused - mount the complete filesystem or the subvolume?), rsync the various backup files from my server again (--inplace? - is this necessary), and take another snapshot.
I am hoping that this would effectively allow me to leave the snapshot I took last time in place; because not everything will have changed, it won't have used much space, so effectively I can keep quite a long stream of backup snapshots offsite. Eventually of course the disk will start to become full, but I assume I can reclaim the space by deleting some of the old snapshots.

-- 
Alan Chandler
http://www.chandlerfamily.org.uk
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
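[For reference, the "actual commands" asked for above might look roughly like this. This is a sketch only: the device name, mount point, and server names are assumptions, and the @GMT-... snapshot name format simply follows the shadow_copy convention Hubert shows.]

```shell
# create the filesystem and mount its root (default) subvolume
mkfs.btrfs /dev/sdb1
mount /dev/sdb1 /mnt/btrfs

# one subvolume per server, plus an ordinary directory to hold its snapshots
btrfs subvolume create /mnt/btrfs/server-a
mkdir /mnt/btrfs/snapshots-server-a

# ...rsync the server's data into /mnt/btrfs/server-a...

# after each backup cycle, snapshot the subvolume with a GMT time label
btrfs subvolume snapshot /mnt/btrfs/server-a \
    "/mnt/btrfs/snapshots-server-a/@GMT-$(date -u +%Y.%m.%d-%H.%M.%S)"

# old snapshots can be deleted later to reclaim space
btrfs subvolume delete "/mnt/btrfs/snapshots-server-a/@GMT-2010.12.21-16.48.09"
```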
Re: Various Questions
I'd rather not do the copy again unless necessary, as it took a day. Directories look identical, but who knows? I'm going to try and figure out how to do a file-by-file CRC check, for peace of mind.

On Sat 08 January 2011 17:26:25 Freddie Cash wrote:
> On Sat, Jan 8, 2011 at 5:25 AM, Carl Cook cac...@quantum-sci.com wrote:
>> In addition to the questions below, if anyone has a chance could you advise on why my destination drive has more data than the source after this command:
>>
>> # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home
>> sending incremental file list
>
> What happens if you delete /home, then run the command again, but without the *? You generally don't use wildcards for the source or destination when using rsync. You just tell it which directory to start in.
>
> If you do an ls /home and ls /media/disk are they different?
Re: Synching a Backup Server
On Sun, Jan 9, 2011 at 6:46 PM, Alan Chandler a...@chandlerfamily.org.uk wrote:
>> then create snapshots of these directories:
>>
>> /mnt/btrfs/
>> |- server-a
>> |- server-b
>> |- server-c
>> |- snapshots-server-a
>> |  |- @GMT-2010.12.21-16.48.09
>> |  \- @GMT-2010.12.22-16.45.14
>> |- snapshots-server-b
>> \- snapshots-server-c
>
> For instance, if I create the initial file system using mkfs.btrfs and then mount it on /mnt/btrfs is there already a default subvolume? or do I have to make one?

From the btrfs FAQ: A subvolume is like a directory - it has a name, there's nothing on it when it is created, and it can hold files and other directories. There's at least one subvolume in every Btrfs filesystem, the default subvolume.

The equivalent in Ext4 would be a filesystem. Each subvolume behaves as an individual filesystem.

> What happens when you unmount the whole filesystem and then come back

Whatever subvolumes and snapshots you already have will still be there.

> The wiki also makes the following statement *Note:* to be mounted the subvolume or snapshot have to be in the root of the btrfs filesystem. but you seem to have snapshots at one layer down from the root.

By default, when you do something like

mount /dev/sdb1 /mnt/btrfs

the default subvolume will be mounted under /mnt/btrfs. Snapshots and subvolumes will be visible as subdirectories under it, regardless of whether they're in the root or several directories under it. Most likely this is enough for what you need; no need to mess with mounting subvolumes.

Mounting subvolumes allows you to see a particular subvolume directly WITHOUT having to see the default subvolume or other subvolumes. This is particularly useful when you use btrfs as / or /home and want to roll back to a previous snapshot.
So assuming snapshots-server-b above is a snapshot, you can run

mount /dev/sdb1 /mnt/btrfs -o subvol=snapshots-server-b

and what previously was in /mnt/btrfs/snapshots-server-b will now be accessible under /mnt/btrfs directly, and you can NOT see what was previously under /mnt/btrfs/snapshots-server-c.

Also on a side note, you CAN mount subvolumes not located in the root of the btrfs filesystem using subvolid instead of subvol. It might require a newer kernel/btrfs-progs version though (works fine in Ubuntu maverick.)

-- 
Fajar
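[Putting Fajar's options side by side - a sketch, with the device name, mount points, and the subvolid value (which you would read from `btrfs subvolume list`) all assumed:]

```shell
# mount the default subvolume; subvolumes and snapshots show up as subdirectories
mount /dev/sdb1 /mnt/btrfs

# mount one snapshot directly by name (the name form needs it in the fs root)
mount /dev/sdb1 /mnt/snap -o subvol=snapshots-server-b

# mount any subvolume, wherever it lives, by its numeric id
btrfs subvolume list /mnt/btrfs           # look up the id, e.g. 256
mount /dev/sdb1 /mnt/snap -o subvolid=256
```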
Re: Various Questions
On 09/01/11 13:37, Fajar A. Nugraha wrote:
> On Sun, Jan 9, 2011 at 8:16 PM, Carl Cook cac...@quantum-sci.com wrote:
>> I'd rather not do the copy again unless necessary, as it took a day. Directories look identical, but who knows? I'm going to try and figure out how to do a file-by-file CRC check, for peace of mind.
>
> try du --apparent-size -slh
>
> It should rule out any differences caused by sparse files and hardlinks.

> On Sat 08 January 2011 17:26:25 Freddie Cash wrote:
>> On Sat, Jan 8, 2011 at 5:25 AM, Carl Cook cac...@quantum-sci.com wrote:
>>> In addition to the questions below, if anyone has a chance could you advise on why my destination drive has more data than the source after this command:
>>>
>>> # rsync --hard-links --delete --inplace --archive --numeric-ids /media/disk/* /home

Are you SURE you didn't get the command mixed up? The last argument to rsync should be the destination. Your command looks like you're copying things to /home.

What is also important is the use of * - it means all the . files at the top level are NOT being copied.

rsync is clever enough to notice if you have the / at the end of the source to know whether you want the directory to be put into the destination or the contents of the directory. The / at the end of the source means copy the contents.

This could be the reason why the destination has more data than the source (I am not sure of the exact scope of --delete): --delete may not be deleting /home/.* files, if there are any there.

-- 
Alan Chandler
http://www.chandlerfamily.org.uk
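[Alan's point about * is easy to demonstrate; in this sketch a temporary directory stands in for /media/disk:]

```shell
# a shell glob such as /media/disk/* never matches dot-files,
# so rsync would not even see them as sources
dir=$(mktemp -d)
touch "$dir/.profile" "$dir/data.img"
set -- "$dir"/*
echo "$# match(es): $(basename "$1")"   # prints: 1 match(es): data.img
```

Only data.img is passed along; .profile is silently skipped. That is why `rsync -a src/ dst` (trailing slash, no wildcard) copies everything, dot-files included, while `rsync -a src/* dst` does not.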
Re: Atomic file data replace API
On Sat, Jan 8, 2011 at 10:43 PM, Thomas Bellman bell...@nsc.liu.se wrote:
> So, basically database transactions with an isolation level of committed read, for file operations. That's something I have wanted for a long time, especially if I also get a rollback() operation, but have never heard of any Unix that implemented it.

True, that's why this feature request is here. Note that it's (ATM) only about single-file data replace.

> A separate commit() operation would be better than conflating it with close(). And as I said, we want a rollback() as well. And a process that terminates without committing the transaction that it is performing, should have the transaction automatically rolled back.

What could you do between commit and close?

> I only have a very shallow knowledge about the internals of the Linux kernel in regards to filesystems, but I suspect that this could be implemented almost entirely within the VFS, and not need to touch the actual filesystems, as long as you are satisfied with a limited amount of transaction space (what fits in RAM + swap).

I'm looking forward to your implementation. :-)

> Even though I suspect that it would be a rather large undertaking to implement...

I have no plans to work on an implementation.

-- 
Olaf
Re: Synching a Backup Server
On 09/01/11 13:54, Fajar A. Nugraha wrote:
> By default, when you do something like mount /dev/sdb1 /mnt/btrfs the default subvolume will be mounted under /mnt/btrfs. Snapshots and subvolumes will be visible as subdirectories under it, regardless whether it's in the root or several directories under it. Most likely this is enough for what you need, no need to mess with mounting subvolumes. Mounting subvolumes allows you to see a particular subvolume directly WITHOUT having to see the default subvolume or other subvolumes. This is particularly useful when you use btrfs as / or /home and want to rollback to a previous snapshot. So assuming snapshots-server-b above is a snapshot, you can run

I think I'm starting to get it now. It's the fact that subvolumes can be snapshotted etc. without mounting them that is the difference. I guess I am too used to thinking like LVM, and I was thinking subvolumes were like an LV. They are, but not quite the same.

-- 
Alan Chandler
http://www.chandlerfamily.org.uk
Swap file on btrfs fails (swapfile has holes)
Hi, I have just recently installed Debian squeeze with a root filesystem on btrfs [1]. I have noticed however that I cannot set up a swap file stored on the btrfs volume:

dd if=/dev/zero of=/var/swap bs=16M count=4
mkswap /var/swap
chmod 0 /var/swap
swapon /var/swap

[ 01751.879759] swapon: swapfile has holes
swapon: /var/swap: swapon failed: Invalid argument

For now I've set up a swap file under /boot (ext4), but is swap on btrfs expected to work? I'm using the stock Debian linux kernel 2.6.32-5-amd64.

1: This took a little bit of work - http://lists.debian.org/debian-user/2011/01/msg00217.html

-- 
Paul Richards
@pauldoo
Re: Synching a Backup Server
On Sun, Jan 9, 2011 at 7:32 AM, Alan Chandler a...@chandlerfamily.org.uk wrote:
> I think I start to get it now. Its the fact that subvolumes can be snapshotted etc without mounting them that is the difference. I guess I am too used to thinking like LVM and I was thinking subvolumes were like an LV. They are, but not quite the same.

Let's see if I can match up the terminology and layers a bit:

LVM Physical Volume == Btrfs disk                      == ZFS disk / vdevs
LVM Volume Group    == Btrfs filesystem                == ZFS storage pool
LVM Logical Volume  == Btrfs subvolume                 == ZFS volume
'normal' filesystem == Btrfs subvolume (when mounted)  == ZFS filesystem

Does that look about right?

LVM: A physical volume is the lowest layer in LVM; physical volumes are combined into a volume group, which is then split up into logical volumes and formatted with a filesystem.

Btrfs: A bunch of disks are formatted into a btrfs filesystem, which is then split up into subvolumes (subvolumes are auto-formatted with a btrfs filesystem).

ZFS: A bunch of disks are combined into virtual devices, then combined into a ZFS storage pool, which can be split up into either volumes formatted with any filesystem, or ZFS filesystems.

Just curious: why all the new terminology in btrfs for things that already existed? And why are old terms overloaded with new meanings? I don't think I've seen a write-up about that anywhere (or I don't remember it if I have).

Perhaps it's time to start looking at separating the btrfs pool-creation tools out of mkfs (or renaming mkfs.btrfs), since you're really building a storage pool, and not a filesystem. It would prevent a lot of confusion with new users. It's great that there's a separate btrfs tool for manipulating btrfs setups, but mkfs.btrfs is just wrong for creating the btrfs setup.

-- 
Freddie Cash
fjwc...@gmail.com
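[The three layering models above can be sketched side by side. Device names and sizes here are invented, and this is only an illustration of the mapping, not a tested recipe:]

```shell
# LVM: three explicit layers, then a filesystem on top
pvcreate /dev/sdb1 /dev/sdc1
vgcreate vg0 /dev/sdb1 /dev/sdc1
lvcreate -L 100G -n data vg0
mkfs.ext4 /dev/vg0/data

# btrfs: one step creates the "pool"; subvolumes are carved out after mounting
mkfs.btrfs /dev/sdb1 /dev/sdc1
mount /dev/sdb1 /mnt/btrfs
btrfs subvolume create /mnt/btrfs/data

# ZFS: pool first, then filesystems within it
zpool create tank mirror sdb1 sdc1
zfs create tank/data
```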
Re: Atomic file data replace API
Olaf van der Spek wrote:
> On Sat, Jan 8, 2011 at 10:43 PM, Thomas Bellman bell...@nsc.liu.se wrote:
>> So, basically database transactions with an isolation level of committed read, for file operations. That's something I have wanted for a long time, especially if I also get a rollback() operation, but have never heard of any Unix that implemented it.
>
> True, that's why this feature request is here. Note that it's (ATM) only about single file data replace.

That particular problem was solved with the introduction of the rename(2) system call in 4.2BSD a bit more than a quarter of a century ago. There is no need to introduce another, less flexible, API for doing the same thing.

>> A separate commit() operation would be better than conflating it with close(). And as I said, we want a rollback() as well. And a process that terminates without committing the transaction that it is performing, should have the transaction automatically rolled back.
>
> What could you do between commit and close?

More write() operations, of course. Just like you can continue with more transactions after a COMMIT WORK call without having to close and re-open the database in SQL.

/Bellman
Re: Atomic file data replace API
On Sun, Jan 9, 2011 at 7:56 PM, Thomas Bellman bell...@nsc.liu.se wrote:
>> True, that's why this feature request is here. Note that it's (ATM) only about single file data replace.
>
> That particular problem was solved with the introduction of the rename(2) system call in 4.2BSD a bit more than a quarter of a century ago. There is no need to introduce another, less flexible, API for doing the same thing.

You might want to read about the problems with that workaround.

>> What could you do between commit and close?
>
> More write() operations, of course. Just like you can continue with more transactions after a COMMIT WORK call without having to close and re-open the database in SQL.

The transaction is defined as beginning with open and ending with close.

-- 
Olaf
Re: Atomic file data replace API
On 01/09/2011 01:56 PM, Thomas Bellman wrote:
> That particular problem was solved with the introduction of the rename(2) system call in 4.2BSD a bit more than a quarter of a century ago. There is no need to introduce another, less flexible, API for doing the same thing.

I'm curious if there are any BSD specifications that state that rename() has this behavior. Ted Ts'o has been claiming that POSIX does not require this behavior in the face of a crash, and that as a result, an application that relies on such behavior is broken and needs to fsync() before rename(). This, of course, makes replacing numerous files much slower - glacially so on btrfs.

There has been a great deal of discussion on the dpkg mailing lists about it, since plenty of people are upset that dpkg runs much slower these days than it used to, because it now calls fsync() before rename() in order to avoid breakage on ext4. You can read more, including the rationale of why POSIX does not require this behavior, at http://lwn.net/Articles/323607/.

I still say that preserving the order of the writes and rename is the only sane thing to do, whether POSIX requires it or not.
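[The fsync-before-rename pattern under discussion, approximated in shell. The file names are made up, and `sync` taking a file argument is an assumption that holds on reasonably recent GNU coreutils; in C the same steps would be write(), fsync(), rename():]

```shell
# atomically replace a config file: write a sibling temp file,
# flush it to disk, then rename over the original
dir=$(mktemp -d)
cfg="$dir/app.conf"
printf 'old contents\n' > "$cfg"

printf 'new contents\n' > "$cfg.tmp"
sync "$cfg.tmp"        # flush the new data first (the fsync step)
mv -f "$cfg.tmp" "$cfg"  # rename(2): readers see either old or new, never a mix
cat "$cfg"               # prints: new contents
```

A crash before the mv leaves the old file intact; a crash after it leaves the fully-written new file. Skipping the flush step is exactly the ordering question argued about above.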
Re: Synching a Backup Server
On 09/01/11 18:30, Hugo Mills wrote:
> No, subvolumes are a part of the whole filesystem. In btrfs, there is only one filesystem. There are 6 main B-trees that store metadata in btrfs (plus a couple of others). One of those is the filesystem tree (or FS tree), which contains all the metadata associated with the normal POSIX directory/file namespace (basically all the inode and xattr data). When you create a subvolume, a new FS tree is created, but it shares *all* of the other btrfs B-trees. There is only one filesystem, but there may be distinct namespaces within that filesystem that can be mounted as if they were filesystems. Think of it more like NFSv4, where there's one overall namespace exported per server, but clients can mount subsections of it.

I think this explanation is still missing the key piece that has confused me despite trying very hard to understand it by reading the wiki.

You talk about distinct namespaces, but what I learnt from further up the thread is that this namespace is also inside the namespace that makes up the whole filesystem. I mount the whole filesystem, and all my subvolumes are automatically there (at least that is what I find in practice). It's this duality of namespace that is the difficult concept. I am still not sure if there is a default subvolume, and the other subvolumes are defined within its namespace, or whether there is an overall filesystem namespace and subvolumes defined within it - in which case, if you mount the default subvolume, you would then lose the overall filesystem namespace and hence no longer see the subvolumes.

I find the wiki also confusing because it talks about subvolumes having to be at the first level of the filesystem, but again further up this thread there is an example, used for real, of one not being at the first level but one level down inside a directory.
What it means is that I don't have a mental picture of how this all works, from which all use cases could then be worked out. I think it would be helpful if the wiki contained some of the use cases that we have been talking about in this thread - but with more detailed information, like the actual commands used to mount the filesystems like this, and information as to in what circumstances you would perform each action.

> The main awkward piece of btrfs terminology is the use of RAID to describe btrfs's replication strategies. It's not RAID, and thinking of it in RAID terms is causing lots of confusion. Most of the other things in btrfs are, I think, named relatively sanely.

I don't find this AS confusing, although there is still information missing which I asked about in another post that wasn't answered. I still can't understand if it's possible to initialise a filesystem in degraded mode. If you create the filesystem with -m RAID1 and -d RAID1 but only have one device, it implies that it writes two copies of both metadata and data to that one device. However, if you successfully create the filesystem on two devices and then fail one and mount it -o degraded, it appears to suggest it will only write the one copy.

I was considering how to migrate from an existing mdadm RAID1/LVM arrangement. I suppose I could fail one device of the md pair and initialise the btrfs filesystem with this one device as the first half of a RAID1 mirror and a USB stick as the other, then remove the USB stick and mount the filesystem -o degraded. Copy data to it from the still-working half via the available LV, then dispose of the mdadm device completely and add in the freed-up device using btrfs device add.

-- 
Alan Chandler
http://www.chandlerfamily.org.uk
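[The migration Alan proposes might be sketched as below. This is untested and the device names are invented; the thread itself leaves open whether the degraded mount behaves as hoped, so treat it as a thought experiment, not a recipe:]

```shell
# free one half of the md mirror
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

# build a btrfs RAID1 across the freed device and a sacrificial USB stick
mkfs.btrfs -m raid1 -d raid1 /dev/sdb1 /dev/sdc1   # /dev/sdc1 = USB stick

# pull the stick, then mount the survivor degraded and copy the data over
mount /dev/sdb1 /mnt/new -o degraded
rsync -aHAX /mnt/old/ /mnt/new/

# retire md0/LVM, then add the freed second device and re-mirror
btrfs device add /dev/sda1 /mnt/new
btrfs filesystem balance /mnt/new
```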
Re: Synching a Backup Server
On Sun, Jan 09, 2011 at 08:57:12PM +0000, Alan Chandler wrote:
> On 09/01/11 18:30, Hugo Mills wrote:
>> No, subvolumes are a part of the whole filesystem. In btrfs, there is only one filesystem. There are 6 main B-trees that store metadata in btrfs (plus a couple of others). One of those is the filesystem tree (or FS tree), which contains all the metadata associated with the normal POSIX directory/file namespace (basically all the inode and xattr data). When you create a subvolume, a new FS tree is created, but it shares *all* of the other btrfs B-trees. There is only one filesystem, but there may be distinct namespaces within that filesystem that can be mounted as if they were filesystems. Think of it more like NFSv4, where there's one overall namespace exported per server, but clients can mount subsections of it.
>
> I think this explanation is still missing the key piece that has confused me despite trying very hard to understand it by reading the wiki. You talk about distinct namespaces, but what I learnt from further up the thread is that this namespace is also inside the namespace that makes up the whole filesystem. I mount the whole filesystem, and all my subvolumes are automatically there (at least that is what I find in practice). It's this duality of namespace that is the difficult concept. I am still not sure if there is a default subvolume, and the other subvolumes are defined within its namespace, or whether there is an overall filesystem namespace and subvolumes defined within it - and if you mount the default subvolume you would then lose the overall filesystem namespace and hence no longer see the subvolumes.

There is a root subvolume namespace (subvolid=0), which may contain files, directories, and other subvolumes. This root subvolume is what you see when you mount a newly-created btrfs filesystem. The default subvolume is simply what you get when you mount the filesystem without a subvol or subvolid parameter to mount.

Initially, the default subvolume is set to be the root subvolume. If another subvolume is set to be the default, then the root subvolume can only be mounted with the subvolid=0 mount option.

> I find the wiki also confusing because it talks about subvolumes having to be at the first level of the filesystem, but again further up this thread there is an example which is used for real of it not being at the first level, but at one level down inside a directory.

Try it, see what happens, and fix the wiki where it's wrong? :) Or at least say what page this is on, and I can try the experiment and fix it later...

> What it means is that I don't have a mental picture of how this all works, and all use cases could then be worked out by following this mental picture. I think it would be helpful if the wiki contained some of the use cases that we have been talking about in this thread - but with more detailed information - like the actual commands used to mount the filesystems like this, and information as to in what circumstances you would perform each action.

I've written a chunk of text about how btrfs's storage, RAID and subvolumes work. At the moment, though, the wiki is somewhat broken and I can't actually create the page to put it on... There's also a page of recipes[1], which is probably the place that the examples you mentioned should go.

>> The main awkward piece of btrfs terminology is the use of RAID to describe btrfs's replication strategies. It's not RAID, and thinking of it in RAID terms is causing lots of confusion. Most of the other things in btrfs are, I think, named relatively sanely.
>
> I don't find this AS confusing, although there is still information missing which I asked in another post that wasn't answered. I still can't understand if it's possible to initialise a filesystem in degraded mode. If you create the filesystem so that -m RAID1 and -d RAID1 but only have one device - it implies that it writes two copies of both metadata and data to that one device. However if you successfully create the filesystem on two devices and then fail one and mount it -o degraded it appears to suggest it will only write the one copy.

From trying it a while ago, I don't think it is possible to create a filesystem in degraded mode. Again, I'll try it again when I have the time to do some experimentation and see what actually happens.

Hugo.

[1] https://btrfs.wiki.kernel.org/index.php/UseCases

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- A clear conscience. Where did you get this taste for luxuries, Bernard? ---
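[Hugo's root-vs-default distinction in command form - a sketch, with the device name and the subvolume id 256 (which you would read from `btrfs subvolume list`) as assumptions:]

```shell
mount /dev/sdb1 /mnt/btrfs                 # mounts the default subvolume
btrfs subvolume list /mnt/btrfs            # note each subvolume's numeric id

# make subvolume 256 what plain mounts get from now on
btrfs subvolume set-default 256 /mnt/btrfs

# the root subvolume namespace then remains reachable only by id
mount /dev/sdb1 /mnt/root -o subvolid=0
```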
Re: Synching a Backup Server
On 09/01/11 22:01, Hugo Mills wrote:
>> I find the wiki also confusing because it talks about subvolumes having to be at the first level of the filesystem, but again further up this thread there is an example which is used for real of it not being at the first level, but at one level down inside a directory.
>
> Try it, see what happens, and fix the wiki where it's wrong? :) Or at least say what page this is on, and I can try the experiment and fix it later...

I don't have an account right now, but the page it's on is here:

https://btrfs.wiki.kernel.org/index.php/Getting_started#Basic_Filesystem_Commands

...

> From trying it a while ago, I don't think it is possible to create a filesystem in degraded mode. Again, I'll try it again when I have the time to do some experimentation and see what actually happens.

As I wondered before, it might be possible to fake it by using something like a USB stick initially and then failing it, and replacing it with the real device when ready. If that's possible, then perhaps functionality to do it without faking it could be added to the to-do list. It sure would be useful in migrating from an mdadm/LVM setup.

-- 
Alan Chandler
http://www.chandlerfamily.org.uk
Re: Synching a Backup Server
On Mon, Jan 10, 2011 at 5:01 AM, Hugo Mills hugo-l...@carfax.org.uk wrote:
> There is a root subvolume namespace (subvolid=0), which may contain files, directories, and other subvolumes. This root subvolume is what you see when you mount a newly-created btrfs filesystem.

Is there a detailed explanation in the wiki about subvolid=0? What does top level 5 in the output of btrfs subvolume list mean (I thought 5 was the subvolid for the root subvolume)?

# btrfs subvolume list /
ID 256 top level 5 path maverick-base
ID 257 top level 5 path kernel-2.6.37

> The default subvolume is simply what you get when you mount the filesystem without a subvol or subvolid parameter to mount. Initially, the default subvolume is set to be the root subvolume. If another subvolume is set to be the default, then the root subvolume can only be mounted with the subvolid=0 mount option.

... and mounting with either subvolid=5 or subvolid=0 gives the same result in my case.

-- 
Fajar
Re: [GIT PULL] [RFC PATCH 0/4] btrfs: Implement delayed directory name index insertion and deletion
Hi Miao,

As you suggested, in btrfs_recover_log_trees() the items to modify in the transaction are not known before entering a tree, so we can use the global block reservation for it.

Signed-off-by: Itaru Kitayama kitay...@cl.bb4u.ne.jp
---
 fs/btrfs/tree-log.c | 2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 054744a..7df8c7b 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3081,6 +3081,8 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)

 	trans = btrfs_start_transaction(fs_info->tree_root, 0);

+	trans->block_rsv = &fs_info->global_block_rsv;
+
 	wc.trans = trans;
 	wc.pin = 1;
-- 
1.7.3.4

On Thu, 06 Jan 2011 14:47:41 +0800 Miao Xie mi...@cn.fujitsu.com wrote:
> Hi, Kitayama-san
>
> Firstly, thanks for your test.
>
> On Sat, 1 Jan 2011 00:43:41 +0900, Itaru Kitayama wrote:
>> Hi Miao,
>>
>> The HEAD of the perf-improve branch fails to boot on my virtual machine. The system calls btrfs_delete_delayed_dir_index() with the trans block_rsv set to NULL, and thus selects, in get_block_rsv(), empty_block_rsv, whose reserve is 0 (and size is also 0), which leads to ENOSPC. I wonder whether the patch below reserves enough metadata to finish btrfs_recover_log_trees() without going to ENOSPC. I appreciate your review.
>>
>> Signed-off-by: Itaru Kitayama kitay...@cl.bb4u.ne.jp
>>
>> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
>> index 054744a..f26326b 100644
>> --- a/fs/btrfs/tree-log.c
>> +++ b/fs/btrfs/tree-log.c
>> @@ -3079,7 +3079,7 @@ int btrfs_recover_log_trees(struct btrfs_root *log_root_tree)
>>  	path = btrfs_alloc_path();
>>  	BUG_ON(!path);
>> -	trans = btrfs_start_transaction(fs_info->tree_root, 0);
>> +	trans = btrfs_start_transaction(fs_info->tree_root, 4);
>
> I don't think this change is right, because we don't know how many leaves we may change when doing log tree replay, so we can't set the second argument to 4.
> And I think the original code is right, because the space reservation is used to avoid filesystem operations being broken by other users hogging all of the free space. But this function is invoked when we mount a filesystem; at that time, no other user can access the filesystem, so we can use all of the free space, and thus we needn't reserve any free space for log tree replay.
>
> I don't understand the log tree very well; maybe there is something wrong with what I said. If what I said above is right, we should look for another way to fix this problem. (I'm making the second version of this patchset now, and I'll fix it in it. So if your patch is right, I'll want to add it into my patchset.)
>
> Thanks again for your test.
> Miao

>>  	wc.trans = trans;
>>  	wc.pin = 1;
>>
>> Here's the log:
>>
>> kernel BUG at fs/btrfs/tree-log.c:678!
>> invalid opcode: [#1] SMP
>> last sysfs file: /sys/devices/virtual/bdi/btrfs-1/uevent
>> CPU 1
>> Modules linked in: floppy mptspi mptscsih mptbase scsi_transport_spi [last unloaded: scsi_wait_scan]
>> Pid: 308, comm: mount Not tainted 2.6.36-perf-improve+ #1 440BX Desktop Reference Platform/VMware Virtual Platform
>> RIP: 0010:[811eb161] [811eb161] drop_one_dir_item+0xd6/0xfb
>> RSP: 0018:88007a5a5858 EFLAGS: 00010286
>> RAX: ffe4 RBX: 88007d2b7800 RCX: 880037e8b240
>> RDX: RSI: eac3ae68 RDI: 0206
>> RBP: 88007a5a58c8 R08: 005e6760 R09: 88007a5a55e8
>> R10: 88007a5a55e0 R11: 88007a5a5648 R12: 880037e8b120
>> R13: 880037e98cc0 R14: 88007b371c90 R15: 0005
>> FS: 7f37b63c4800() GS:88000204() knlGS:
>> CS: 0010 DS: ES: CR0: 80050033
>> CR2: 7f37b55f0190 CR3: 7a4d9000 CR4: 06e0
>> DR0: DR1: DR2: DR3:
>> DR6: 0ff0 DR7: 0400
>> Process mount (pid: 308, threadinfo 88007a5a4000, task 88007a5c9720)
>> Stack:
>>  0005 0d75 880037e98550 880037dcf000 0 0016e730
>>  0001 0100 005e7b60 0 88007a46d000 880037e8b120
>>  88007d2b7800 880037e98550
>> Call Trace:
>>  [811ec339] add_inode_ref+0x32a/0x403
>>  [811ec59a] replay_one_buffer+0x188/0x209
>>  [811bafef] ? verify_parent_transid+0x36/0xf9
>>  [811e8eb9] walk_up_log_tree+0x109/0x1d1
>>  [811ec412] ? replay_one_buffer+0x0/0x209
>>  [811e930f] walk_log_tree+0x9b/0x187
>>  [811eaf73] btrfs_recover_log_trees+0x18a/0x2a2
>>  [811ec412] ? replay_one_buffer+0x0/0x209
>>  [811bb123] ? btree_read_extent_buffer_pages+0x71/0xaf
>>  [811becfe] open_ctree+0xf8f/0x12c6
>>  [811a69b4] btrfs_get_sb+0x225/0x459
>>  [810fe143] ? __kmalloc_track_caller+0x13a/0x14c
>>  [8110d458]