Re: What to do about subvolumes?
Josef Bacik wrote:
> This is a huge topic in and of itself, but Christoph mentioned wanting to have an idea of what we wanted to do with it, so I'm putting it here. There are really 2 things here
>
> 1) Limiting the size of subvolumes. This is really easy for us, just create a subvolume and at creation time set a maximum size it can grow to and not let it go farther than that. Nice, simple and straightforward.

I'd love to be able to limit the size of a subvolume. Here the size comprises all blocks this subvolume refers to.
But at least as important to me is a mode where one can build groups of subvolumes and snapshots and define a quota for the complete group. Again, the size here comprises all blocks any of the subvolumes/snapshots refer to. If a block is referred to more than once, it counts only once.
A subvolume/snapshot can be configured to be part of multiple groups.
With this I can do interesting things:
 a) The user pays only for the space he occupies, not for read-only snapshots
 b) The user pays for his space and for all the snapshots
 c) The user pays for his space and snapshots, but not for snapshots generated for internal backup purposes
 d) Hierarchical quotas. I can limit /home and set an additional quota on each homedir

Thanks,
Arne
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: What to do about subvolumes?
Josef Bacik wrote:
> 1) Scrap the 256 inode number thing. Instead we'll just put a flag in the inode to say "Hey, I'm a subvolume" and then we can do all of the appropriate magic that way. This unfortunately will be an incompatible format change, but the sooner we get this addressed the easier it will be in the long run. Obviously when I say format change I mean via the incompat bits we have, so old fs's won't be broken and such.
>
> 2) Do something like NFS's referral mounts when we cd into a subvolume. Now we just do dentry trickery, but that doesn't make the boundary between subvolumes clear, so it will confuse people (and samba) when they walk into a subvolume and all of a sudden the inode numbers are the same as in the directory behind them. With doing the referral mount thing, each subvolume appears to be its own mount and that way things like NFS and samba will work properly.

What about the alternative and allocating inode numbers globally? The only problem would be with snapshots as they share the inum with the source, but one could just remap inode numbers in snapshots by sparing some bits at the top of this 64 bit field.

Having one mount per subvolume/snapshot is the cleaner solution, but quickly leads to situations where you have _lots_ of mounts, especially when you export them via NFS and mount it somewhere else. I've seen a machine which had to handle 100,000 mounts from a zfs server. This definitely brings its own problems, so I'd love to see a full fs exported as a single mount. This will also keep output from tools like iostat (for nfs mounts) and df readable.

Thanks,
Arne
Re: What to do about subvolumes?
Excerpts from Arne Jansen's message of 2010-12-02 04:49:39 -0500:
> Josef Bacik wrote:
>> 1) Scrap the 256 inode number thing. Instead we'll just put a flag in the inode to say "Hey, I'm a subvolume" [...]
>>
>> 2) Do something like NFS's referral mounts when we cd into a subvolume. [...]
>
> What about the alternative and allocating inode numbers globally? The only problem would be with snapshots as they share the inum with the source, but one could just remap inode numbers in snapshots by sparing some bits at the top of this 64 bit field.

The global inode number is possible, it's just another btree that must be maintained on disk in order to map which inodes are free and which ones aren't. It also needs to have a reference count on each inode, since each snapshot effectively increases the reference count on every file and directory it contains.

The cost of maintaining that reference count is very very high.

-chris
Re: kernel BUG at fs/btrfs/inode.c:806
Excerpts from Johannes Hirte's message of 2010-12-01 08:11:01 -0500:
> On one of my machines with btrfs I got this bug:
>
> entry offset 29085974528, bytes 4096, bitmap no
> entry offset 29162995712, bytes 20480, bitmap yes
> entry offset 29171744768, bytes 4096, bitmap no
> block group has cluster?: no
> 0 blocks of free space at or bigger than bytes is
> block group 29834084352 has 1073741824 bytes, 1072648192 used 0 pinned 0 reserved

Well, you've had an ENOSPC explosion.

> The block group messages were way more, too much for the dmesg log buffer. Kernel is a 2.6.37-rc3+ without the latest btrfs-fixes. The bug occurred when compiling openoffice.org. After the bug a 'df -h' showed:
>
> df -h:
> Filesystem            Size  Used Avail Use% Mounted on
> rootfs                 21G   17G  770M  96% /
> /dev/root              21G   17G  770M  96% /
> rc-svcdir             1.0M  108K  916K  11% /lib/rc/init.d
> udev                   10M  116K  9.9M   2% /dev
> shm                  1013M     0 1013M   0% /dev/shm
> /dev/sda2              66G   46G   20G  71% /home
> /dev/sdb1              75G   56G   19G  75% /mnt/windows

Which of these filesystems were you compiling on?

-chris
[PATCH] fs: btrfs: Shuffle preprocessor macros
The function btree_migratepage will be extended from the baseclass only when CONFIG_MIGRATION option is enabled. So, it's useful to define/build this function only when that config option is enabled. Fixes an unused function compiler warning when CONFIG_MIGRATION is not enabled and also removes a "return -ENOSYS" statement, whose scenario will not happen.

Signed-off-by: Sankar P <sankar.curios...@gmail.com>
---
 fs/btrfs/disk-io.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c547cca..7199239 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -696,6 +696,7 @@ static int btree_submit_bio_hook(struct inode *inode, int rw, struct bio *bio,
 				   __btree_submit_bio_done);
 }
 
+#ifdef CONFIG_MIGRATION
 static int btree_migratepage(struct address_space *mapping,
 			struct page *newpage, struct page *page)
 {
@@ -712,12 +713,9 @@ static int btree_migratepage(struct address_space *mapping,
 	if (page_has_private(page) &&
 	    !try_to_release_page(page, GFP_KERNEL))
 		return -EAGAIN;
-#ifdef CONFIG_MIGRATION
 	return migrate_page(mapping, newpage, page);
-#else
-	return -ENOSYS;
-#endif
 }
+#endif
 
 static int btree_writepage(struct page *page, struct writeback_control *wbc)
 {
-- 
1.7.1
Re: Fsck, parent transid verify failed
Excerpts from Tommy Jonsson's message of 2010-12-01 06:00:56 -0500:
> Hi folks! Been using btrfs for quite a while now, worked great until now. Got power-loss on my machine and now i have the "parent transid verify failed on X wanted X found X" problem. So I can't get it to mount. My btrfs is spread over sda (2tb), sdc (2tb), sdd (1tb).
>
> Is this something that an offline fsck could fix? If so, is the fsck-util being developed? Is there a way to mount the FS in a read-only mode or something to rescue the data?

Which kernel are you on? Unless you formatted with -m raid0, the current git tree should be able to read this FS by using the second copy of the metadata.

-chris
Re: [RFC PATCH 4/4] btrfs: implement delayed dir index insertion and deletion
Excerpts from Miao Xie's message of 2010-12-01 03:09:35 -0500:
> Compared with Ext3/4, the performance of file creation and deletion on btrfs is very poor. The reason is that btrfs must do a lot of b+ tree insertions, such as inode item, directory name item, directory name index and so on.
>
> If we can do some delayed b+ tree insertion or deletion, we can improve the performance, so we made this patch which implemented delayed directory name index insertion and deletion.

Many thanks for working on this. It's a difficult problem and these patches look very clean. I think you can get more improvement if you also do this delayed scheme for the inode items themselves.

The hard part of these delayed implementations is always the throttling,

> +	if (delayed_root->count >= root->leafsize / sizeof(*dir_item))
> +		btrfs_run_delayed_dir_index(trans, root, NULL,
> +					    BTRFS_DELAYED_INSERT_ITEM, 0);
> +

Have you experimented with other values here?

I need to take a hard look at the locking and do some benchmarking on larger machines. I'm a little worried about increased lock contention, but I think we can get around it by breaking up the rbtrees a little later on if we need to.

-chris
Re: kernel BUG at fs/btrfs/inode.c:806
On Thursday 02 December 2010 17:19:56 Chris Mason wrote:
> Excerpts from Johannes Hirte's message of 2010-12-01 08:11:01 -0500:
>> On one of my machines with btrfs I got this bug:
>> [...]
>
> Which of these filesystems were you compiling on?

On /. It's a gentoo system and the bug happened during an 'emerge openoffice'. The compilation is usually done under /var/tmp/portage.

regards,
  Johannes
Re: kernel BUG at fs/btrfs/inode.c:806
On Thursday 02 December 2010 17:52:50 Johannes Hirte wrote:
> On Thursday 02 December 2010 17:19:56 Chris Mason wrote:
>> [...]
>> Which of these filesystems were you compiling on?
>
> On /. It's a gentoo system and the bug happened during an 'emerge openoffice'. The compilation is usually done under /var/tmp/portage.

Btw, I was able to reproduce this with a second try to emerge openoffice.

regards,
  Johannes
Re: What to do about subvolumes?
On 02/12/10 16:11, Chris Mason wrote:
> Excerpts from Arne Jansen's message of 2010-12-02 04:49:39 -0500:
>> Josef Bacik wrote:
>>> 1) Scrap the 256 inode number thing. [...]
>>>
>>> 2) Do something like NFS's referral mounts when we cd into a subvolume. [...]
>>
>> What about the alternative and allocating inode numbers globally? The only problem would be with snapshots as they share the inum with the source, but one could just remap inode numbers in snapshots by sparing some bits at the top of this 64 bit field.
>
> The global inode number is possible, it's just another btree that must be maintained on disk in order to map which inodes are free and which ones aren't. It also needs to have a reference count on each inode, since each snapshot effectively increases the reference count on every file and directory it contains.
>
> The cost of maintaining that reference count is very very high.

A couple of years ago I was suffering from the problem of different files having the same inode number on Netapp servers. On a Netapp device, if you snapshot a volume then the files in the snapshot have the same inode number as the original, even if the original changes. (Netapp snapshots are read only.)

This means that if you attempt to see what has changed since your last snapshot using a command line such as:

diff src/file.c .snapshots/hourly.12/src.file.c

then the diff tool will tell you that the files are the same even if they are different, because it is assuming that files with the same inode number will have identical contents.

Therefore I think it is a bad idea if potentially different files on btrfs can have the same inode number. It will break all sorts of tools.

Instead of maintaining a big complicated reference count of used inode numbers, could btrfs use bit masks to create the userland visible inode number from the subvolume id and the real internal inode number? Something like:

userland_inode = ( volume_id << 48 ) | internal_inode;

Please forgive me if this is impossible, or if that C snippet is syntactically incorrect. I am not a filesystem or kernel developer, and I have not coded in C for many years.

-- 
David Pottage
800 GByte free, but no space left
Hello,

I have new problems. I use 2 disks (1.5 TByte and 2.0 TByte) under 1 LABEL (for my video collection, nearly all files have more than 1 GByte):

Label: MM2  uuid: ad7c0668-316c-4a79-ba00-3b505b9d99b4
	Total devices 2 FS bytes used 2.38TB
	devid 2 size 1.35TB used 1.35TB path /dev/sdc3
	devid 1 size 1.81TB used 1.35TB path /dev/sdf2

(btrfs-show uses TiByte, so its figures are about 10% lower than the same sizes in TByte)

Btrfs Btrfs v0.19

Filesystem           1K-blocks       Used  Available Use% Mounted on
/dev/sdc3           3400799848 2559596740  841203108  76% /srv/MM

When I add some more videos, writing gets slower and slower, and then the system refuses with "no space left" ...

Label: MM2  uuid: ad7c0668-316c-4a79-ba00-3b505b9d99b4
	Total devices 2 FS bytes used 2.40TB
	devid 2 size 1.35TB used 1.35TB path /dev/sdc3
	devid 1 size 1.81TB used 1.35TB path /dev/sdf2

Btrfs Btrfs v0.19

Filesystem           1K-blocks       Used  Available Use% Mounted on
/dev/sdc3           3400799848 2585340332  815459516  77% /srv/MM

When I try to rename files I get "no space left". When I delete some files and then try again to rename, the system doesn't really rename but first copies to the new name and then deletes the old file. Same behaviour when I try to move a file from one directory to another.

Where is the bottleneck?

Best regards!
Helmut
Re: kernel BUG at fs/btrfs/inode.c:806
Excerpts from Johannes Hirte's message of 2010-12-02 12:02:16 -0500:
> On Thursday 02 December 2010 17:52:50 Johannes Hirte wrote:
>> [...]
>> On /. It's a gentoo system and the bug happened during an 'emerge openoffice'. The compilation is usually done under /var/tmp/portage.
>
> Btw, I was able to reproduce this with a second try to emerge openoffice.

Ok, there is one related fix in the git tree right now that you don't have. I'm not 100% sure it'll fix this, but it can't hurt.

-chris
Re: hunt for 2.6.37 dm-crypt+ext4 corruption? (was: Re: dm-crypt barrier support is effective)
On Wed, Dec 1, 2010 at 10:23 PM, Mike Snitzer <snit...@redhat.com> wrote:
> On Wed, Dec 01 2010 at 3:45pm -0500,
> Milan Broz <mb...@redhat.com> wrote:
>
>> On 12/01/2010 08:34 PM, Jon Nelson wrote:
>>> Perhaps this is useful: for myself, I found that when I started using 2.6.37rc3, postgresql started having a *lot* of problems with corruption. Specifically, I noted zeroed pages, corruption in headers, all sorts of stuff on /newly created/ tables, especially during index creation. I had a fairly high hit rate of failure. I backed off to 2.6.34.7 and have *zero* problems (in fact, prior to 2.6.37rc3, I had never had a corruption issue with postgresql). I ran on 2.6.36 for a few weeks as well, without issue.
>>>
>>> I am using kcrypt with lvm on top of that, and ext4 on top of that.
>>
>> With unpatched dmcrypt (IOW with Linus' git)? Then it must be ext4 or dm-core problem because there were no patches for dm-crypt...
>
> Matt and Jon,
>
> If you'd be up to it: could you try testing your dm-crypt+ext4 corruption reproducers against the following two 2.6.37-rc commits:
>
> 1) 1de3e3df917459422cb2aecac440febc8879d410
> then
> 2) bd2d0210cf22f2bd0cef72eb97cf94fc7d31d8cc
>
> Then, depending on results of no corruption for those commits, bonus points for testing the same commits but with Andi and Milan's latest dm-crypt cpu scalability patch applied too:
> https://patchwork.kernel.org/patch/365542/
>
> Thanks!
> Mike

Yeah sure, I'll have to set up another testing system (on a separate partition / volume group) for its own, so that will take some time; first tests will probably be run on the weekend. Thanks for those pointers!

I took a look at git-web: do you think 5a87b7a5da250c9be6d757758425dfeaf8ed3179 might be relevant, too? The others seem rather minor compared to those you posted.

Afaik the last time I ran vanilla 2.6.37-rc* (which was probably around rc1) I saw no corruption at all, but I'll give it a test-run without the dm-crypt patch anyway.

Thanks

Regards

Matt
Re: disk space caching generation missmatch
On Thu, Dec 2, 2010 at 2:34 PM, Josef Bacik <jo...@redhat.com> wrote:
> +	if (!ret) {
> +		spin_lock(&block_gruop->lock);
> +		block_group->disk_cache_state = BTRFS_DC_SETUP;
> +		spin_unlock(&block_group->lock);
> +	}

misspelling: block_gruop -> block_group

just noticed this glancing thru...

C Anthony
Re: Fsck, parent transid verify failed
Excerpts from Tommy Jonsson's message of 2010-12-02 16:45:39 -0500:
> I can't remember if i used -m raid0. I think i just used mkfs.btrfs /dev/sda then btrfs device add /dev/sdb and same for sdc. I am sure that i didn't explicitly use -m raid1 or raid10. Is there a way that i can check this?

The defaults will maintain raid1 as you add more drives. We can check it with btrfs-debug-tree from the git repository. But, more below.

> If i do have raid0 for both metadata and data is there anything i can do? I've been looking at the source but haven't got my head around it yet. What would happen if i just ignore/bypass the transid error?
>
> The error:
> [265889.197279] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda
> [265889.198266] btrfs: use compression
> [265889.647817] parent transid verify failed on 2721514774528 wanted 39651 found 39649
> [265889.672632] btrfs: open_ctree failed
>
> Or could i update the metadata to want 39649?

The first thing I would try is:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git

Build the latest tools, then:

btrfsck -s 1 /dev/xxx
btrfsck -s 2 /dev/xxx

If either of these work we have an easy way to get it mounted. Just let me know.

-chris
Re: Fsck, parent transid verify failed
$ btrfsck -s 1 /dev/sda
using SB copy 1, bytenr 67108864
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

$ btrfsck -s 2 /dev/sda
using SB copy 2, bytenr 274877906944
parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

-tommy

On Thu, Dec 2, 2010 at 10:50 PM, Chris Mason <chris.ma...@oracle.com> wrote:
> Excerpts from Tommy Jonsson's message of 2010-12-02 16:45:39 -0500:
>> [...]
> Build the latest tools, then:
>
> btrfsck -s 1 /dev/xxx
> btrfsck -s 2 /dev/xxx
>
> If either of these work we have an easy way to get it mounted. Just let me know.
>
> -chris
Re: Fsck, parent transid verify failed
Tried btrfs-debug-tree /dev/sda and btrfs-debug-tree -e /dev/sda:

parent transid verify failed on 2721514774528 wanted 39651 found 39649
btrfs-debug-tree: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.

dmesg said:
[268375.903581] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 2 transid 39650 /dev/sdd
[268375.904241] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 1 transid 39651 /dev/sdc
[268375.904526] device fsid 734a485d12c77872-9b0b5aa408670db4 devid 3 transid 39651 /dev/sda

-tommy

On Thu, Dec 2, 2010 at 10:59 PM, Tommy Jonsson <quaz...@gmail.com> wrote:
> $ btrfsck -s 1 /dev/sda
> using SB copy 1, bytenr 67108864
> parent transid verify failed on 2721514774528 wanted 39651 found 39649
> [...]
Re: disk space caching generation missmatch
On Thursday 02 December 2010 21:34:10 Josef Bacik wrote:
> On Wed, Dec 01, 2010 at 10:40:29PM +0100, Johannes Hirte wrote:
>> On Wednesday 01 December 2010 22:22:45 Johannes Hirte wrote:
>>> On Wednesday 01 December 2010 21:03:13 Josef Bacik wrote:
>>>> On Wed, Dec 01, 2010 at 08:56:14PM +0100, Johannes Hirte wrote:
>>>>> On Wednesday 01 December 2010 18:40:18 Josef Bacik wrote:
>>>>>> On Wed, Dec 01, 2010 at 05:46:14PM +0100, Johannes Hirte wrote:
>>>>>>> After enabling disk space caching I've observed several log entries like this:
>>>>>>>
>>>>>>> btrfs: free space inode generation (0) did not match free space cache generation (169594) for block group 15464398848
>>>>>>>
>>>>>>> I'm not sure, but it seems this happens on every reboot. Is this something to worry about?
>>>>>>
>>>>>> So that usually means 1 of a couple of things
>>>>>>
>>>>>> 1) You didn't have space for us to save the free space cache
>>>>>> 2) When trying to write out the cache we hit one of those cases where we would deadlock so we couldn't write the cache out
>>>>>>
>>>>>> It's nothing to worry about, it's doing what it is supposed to. However I'd like to know why we're not able to write out the cache. Are you running close to full? Thanks,
>>>>>>
>>>>>> Josef
>>>>>
>>>>> I think there should be enough free space:
>
> Ok it doesn't look like there's an actual problem, we're just being sub-optimal. Take out the other patch and apply this one, boot into that kernel and then reboot and then give me the dmesg.
Here it comes:

Initializing cgroup subsys cpuset
Linux version 2.6.37-rc4-space-cache-dbg-00022-g620731b-dirty (r...@netbook) (gcc version 4.5.1 (Gentoo 4.5.1-r1 p1.3, pie-0.4.5) ) #126 SMP PREEMPT Fri Dec 3 00:40:04 CET 2010
Atom PSE erratum detected, BIOS microcode update recommended
BIOS-provided physical RAM map:
 BIOS-e820: - 0009f800 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000dc000 - 000e4000 (reserved)
 BIOS-e820: 000e8000 - 0010 (reserved)
 BIOS-e820: 0010 - 7f6d (usable)
 BIOS-e820: 7f6d - 7f6e2000 (ACPI data)
 BIOS-e820: 7f6e2000 - 7f6e3000 (ACPI NVS)
 BIOS-e820: 7f6e3000 - 8000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec1 (reserved)
 BIOS-e820: fed0 - fed00400 (reserved)
 BIOS-e820: fed14000 - fed1a000 (reserved)
 BIOS-e820: fed1c000 - fed9 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: ff00 - 0001 (reserved)
NX (Execute Disable) protection: active
DMI present.
DMI: M912/M912, BIOS R02 05/04/2009
e820 update range: - 0001 (usable) == (reserved)
e820 remove range: 000a - 0010 (usable)
last_pfn = 0x7f6d0 max_arch_pfn = 0x100
MTRR default type: uncachable
MTRR fixed ranges enabled:
  0-9 write-back
  A-B uncachable
  C-C write-protect
  D-D uncachable
  E-F write-protect
MTRR variable ranges enabled:
  0 base 0 mask 08000 write-back
  1 base 07F70 mask 0FFF0 uncachable
  2 base 07F80 mask 0FF80 uncachable
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Scanning 0 areas for low memory corruption
initial memory mapped : 0 - 01a0
init_memory_mapping: -37bfe000
 00 - 0037bfe000 page 4k
kernel direct mapping tables up to 37bfe000 @ 183f000-1a0
ACPI: RSDP 000f7e40 00024 (v02 GBT )
ACPI: XSDT 7f6dc705 00084 (v01 GBTGBTUACPI 0604 LTP )
ACPI: FACP 7f6e1bd2 000F4 (v03 INTEL CALISTGA 0604 ALAN 0001)
ACPI: DSDT 7f6dd907 04257 (v01 INTEL CALISTGA 0604 INTL 20050624)
ACPI: FACS 7f6e2fc0 00040
ACPI: APIC 7f6e1cc6 00068 (v01 INTEL CALISTGA 0604 LOHR 005A)
ACPI: HPET 7f6e1d2e 00038 (v01 INTEL CALISTGA 0604 LOHR 005A)
ACPI: MCFG 7f6e1d66 0003C (v01 INTEL CALISTGA 0604 LOHR 005A)
ACPI: SLIC 7f6e1da2 00176 (v01 GBTGBTUACPI 0604 TBD 0001)
ACPI: TCPA 7f6e1f18 00032 (v01 PTLTD CALISTGA 0604 PTL 0001)
ACPI: TMOR 7f6e1f4a 00026 (v01 PTLTD 0604 PTL 0003)
ACPI: APIC 7f6e1f70 00068 (v01 PTLTD ? APIC 0604 LTP )
ACPI: BOOT 7f6e1fd8 00028 (v01 PTLTD $SBFTBL$ 0604 LTP 0001)
ACPI: SSDT 7f6dcd25 0025F (v01 PmRef Cpu0Tst 3000 INTL 20050624)
ACPI: SSDT 7f6dcc7f 000A6 (v01 PmRef Cpu1Tst 3000 INTL 20050624)
ACPI: SSDT 7f6dc789 004F6 (v02 PmRefCpuPm 3000 INTL 20050624)
ACPI: BIOS bug: multiple APIC/MADT found, using 0
ACPI: If acpi_apic_instance=2 works better, notify linux-a...@vger.kernel.org
ACPI: Local APIC address
Re: disk space caching generation missmatch
Did you fix that typo I posted?
C Anthony [mobile]

On Dec 2, 2010, at 6:05 PM, Johannes Hirte johannes.hi...@fem.tu-ilmenau.de wrote:
On Thursday 02 December 2010 21:34:10 Josef Bacik wrote:
On Wed, Dec 01, 2010 at 10:40:29PM +0100, Johannes Hirte wrote:
On Wednesday 01 December 2010 22:22:45 Johannes Hirte wrote:
On Wednesday 01 December 2010 21:03:13 Josef Bacik wrote:
On Wed, Dec 01, 2010 at 08:56:14PM +0100, Johannes Hirte wrote:
On Wednesday 01 December 2010 18:40:18 Josef Bacik wrote:
On Wed, Dec 01, 2010 at 05:46:14PM +0100, Johannes Hirte wrote:

After enabling disk space caching I've observed several log entries like this:

btrfs: free space inode generation (0) did not match free space cache generation (169594) for block group 15464398848

I'm not sure, but it seems this happens on every reboot. Is this something to worry about?

So that usually means one of a couple of things:
1) You didn't have space for us to save the free space cache.
2) When trying to write out the cache, we hit one of those cases where we would deadlock, so we couldn't write the cache out.

It's nothing to worry about; it's doing what it is supposed to. However, I'd like to know why we're not able to write out the cache. Are you running close to full?
Thanks, Josef

I think there should be enough free space:

Ok, it doesn't look like there's an actual problem; we're just being sub-optimal. Take out the other patch and apply this one, boot into that kernel, then reboot and give me the dmesg.
Here it comes:

Initializing cgroup subsys cpuset
Linux version 2.6.37-rc4-space-cache-dbg-00022-g620731b-dirty (r...@netbook) (gcc version 4.5.1 (Gentoo 4.5.1-r1 p1.3, pie-0.4.5) ) #126 SMP PREEMPT Fri Dec 3 00:40:04 CET 2010
Atom PSE erratum detected, BIOS microcode update recommended
BIOS-provided physical RAM map:
 BIOS-e820: - 0009f800 (usable)
 BIOS-e820: 0009f800 - 000a (reserved)
 BIOS-e820: 000dc000 - 000e4000 (reserved)
 BIOS-e820: 000e8000 - 0010 (reserved)
 BIOS-e820: 0010 - 7f6d (usable)
 BIOS-e820: 7f6d - 7f6e2000 (ACPI data)
 BIOS-e820: 7f6e2000 - 7f6e3000 (ACPI NVS)
 BIOS-e820: 7f6e3000 - 8000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec1 (reserved)
 BIOS-e820: fed0 - fed00400 (reserved)
 BIOS-e820: fed14000 - fed1a000 (reserved)
 BIOS-e820: fed1c000 - fed9 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: ff00 - 0001 (reserved)
NX (Execute Disable) protection: active
DMI present.
DMI: M912/M912, BIOS R02 05/04/2009
e820 update range: - 0001 (usable) == (reserved)
e820 remove range: 000a - 0010 (usable)
last_pfn = 0x7f6d0 max_arch_pfn = 0x100
MTRR default type: uncachable
MTRR fixed ranges enabled:
  0-9 write-back
  A-B uncachable
  C-C write-protect
  D-D uncachable
  E-F write-protect
MTRR variable ranges enabled:
  0 base 0 mask 08000 write-back
  1 base 07F70 mask 0FFF0 uncachable
  2 base 07F80 mask 0FF80 uncachable
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Scanning 0 areas for low memory corruption
initial memory mapped : 0 - 01a0
init_memory_mapping: -37bfe000
 00 - 0037bfe000 page 4k
kernel direct mapping tables up to 37bfe000 @ 183f000-1a0
ACPI: RSDP 000f7e40 00024 (v02 GBT )
ACPI: XSDT 7f6dc705 00084 (v01 GBTGBTUACPI 0604 LTP )
ACPI: FACP 7f6e1bd2 000F4 (v03 INTEL CALISTGA 0604 ALAN 0001)
ACPI: DSDT 7f6dd907 04257 (v01 INTEL CALISTGA 0604 INTL 20050624)
ACPI: FACS 7f6e2fc0 00040
ACPI: APIC 7f6e1cc6 00068 (v01 INTEL CALISTGA 0604 LOHR 005A)
ACPI: HPET 7f6e1d2e 00038 (v01 INTEL CALISTGA 0604 LOHR 005A)
ACPI: MCFG 7f6e1d66 0003C (v01 INTEL CALISTGA 0604 LOHR 005A)
ACPI: SLIC 7f6e1da2 00176 (v01 GBTGBTUACPI 0604 TBD 0001)
ACPI: TCPA 7f6e1f18 00032 (v01 PTLTD CALISTGA 0604 PTL 0001)
ACPI: TMOR 7f6e1f4a 00026 (v01 PTLTD 0604 PTL 0003)
ACPI: APIC 7f6e1f70 00068 (v01 PTLTD ? APIC 0604 LTP )
ACPI: BOOT 7f6e1fd8 00028 (v01 PTLTD $SBFTBL$ 0604 LTP 0001)
ACPI: SSDT 7f6dcd25 0025F (v01 PmRef Cpu0Tst 3000 INTL 20050624)
ACPI: SSDT 7f6dcc7f 000A6 (v01 PmRef Cpu1Tst 3000 INTL 20050624)
ACPI: SSDT 7f6dc789 004F6 (v02 PmRefCpuPm 3000 INTL 20050624)
ACPI: BIOS bug: multiple APIC/MADT found, using 0
ACPI: If acpi_apic_instance=2 works better, notify linux-a...@vger.kernel.org
ACPI: Local APIC address 0xfee0
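The warning Johannes quoted compares two generation numbers, and Josef says it is nothing to worry about. A minimal Python model of such a check, assuming (as a simplification, not taken from the btrfs source) that a stale cache is ignored and free space is recomputed instead; the class and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical, simplified model; these classes do not mirror btrfs's on-disk structures.
@dataclass
class FreeSpaceInode:
    generation: int  # generation recorded on the cache inode (0 in the reported log line)

@dataclass
class FreeSpaceCacheHeader:
    generation: int  # generation stored alongside the cached entries
    entries: List[Tuple[int, int]]  # (offset, bytes) free extents

def load_free_space_cache(block_group: int, inode: FreeSpaceInode,
                          header: FreeSpaceCacheHeader) -> Optional[List[Tuple[int, int]]]:
    """Return the cached entries, or None if the cache is stale and must be rebuilt."""
    if inode.generation != header.generation:
        # Stale cache: warn and let the caller fall back to the slow path.
        print(f"free space inode generation ({inode.generation}) did not match "
              f"free space cache generation ({header.generation}) for block group {block_group}")
        return None
    return header.entries

result = load_free_space_cache(15464398848, FreeSpaceInode(generation=0),
                               FreeSpaceCacheHeader(generation=169594, entries=[]))
print(result)  # None, printed after the warning line
```

With the numbers from the log, a cache inode whose generation stays at 0 can never match the header, so in this model every load would take the fallback path, which would be consistent with the message showing up on every reboot.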
Re: disk space caching generation missmatch
On Friday 03 December 2010 01:44:49 C Anthony Risinger wrote:
Did you fix that typo I posted?
C Anthony [mobile]

Yes, without the fix it wouldn't compile.

regards, Johannes
Re: kernel BUG at fs/btrfs/inode.c:806
On Thursday 02 December 2010 20:21:30 Chris Mason wrote:
Excerpts from Johannes Hirte's message of 2010-12-02 12:02:16 -0500:
On Thursday 02 December 2010 17:52:50 Johannes Hirte wrote:
On Thursday 02 December 2010 17:19:56 Chris Mason wrote:
Excerpts from Johannes Hirte's message of 2010-12-01 08:11:01 -0500:

On one of my machines with btrfs I got this bug:

entry offset 29085974528, bytes 4096, bitmap no
entry offset 29162995712, bytes 20480, bitmap yes
entry offset 29171744768, bytes 4096, bitmap no
block group has cluster?: no
0 blocks of free space at or bigger than bytes is
block group 29834084352 has 1073741824 bytes, 1072648192 used 0 pinned 0 reserved

Well, you've had an ENOSPC explosion.

There were far more block group messages, too many for the dmesg log buffer. The kernel is 2.6.37-rc3+ without the latest btrfs-fixes. The bug occurred while compiling openoffice.org. After the bug, 'df -h' showed:

Filesystem      Size  Used Avail Use% Mounted on
rootfs           21G   17G  770M  96% /
/dev/root        21G   17G  770M  96% /
rc-svcdir       1.0M  108K  916K  11% /lib/rc/init.d
udev             10M  116K  9.9M   2% /dev
shm            1013M     0 1013M   0% /dev/shm
/dev/sda2        66G   46G   20G  71% /home
/dev/sdb1        75G   56G   19G  75% /mnt/windows

Which of these filesystems were you compiling on?

On /. It's a Gentoo system and the bug happened during an 'emerge openoffice'. The compilation is usually done under /var/tmp/portage.

Btw, I was able to reproduce this with a second try to emerge openoffice.

Ok, there is one related fix in the git tree right now that you don't have. I'm not 100% sure it'll fix this, but it can't hurt.
-chris

Unfortunately it didn't fix the bug. The system crashed again while emerging openoffice.

regards, Johannes
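The last line of the free space dump already shows how little room that block group had left. A small Python sketch that parses such a line and does the arithmetic; the regular expression assumes the exact wording quoted in this report, and treating free space as size minus used, pinned and reserved is a simplification, not btrfs's full accounting:

```python
import re

# Matches the "block group ... has ... bytes" summary line as quoted in this report.
BG_LINE = re.compile(
    r"block group (?P<start>\d+) has (?P<size>\d+) bytes, "
    r"(?P<used>\d+) used (?P<pinned>\d+) pinned (?P<reserved>\d+) reserved"
)

def block_group_headroom(line: str) -> dict:
    """Parse one summary line and compute the bytes not yet accounted for."""
    m = BG_LINE.search(line)
    if m is None:
        raise ValueError("not a block group summary line")
    size = int(m["size"])
    accounted = int(m["used"]) + int(m["pinned"]) + int(m["reserved"])
    free = size - accounted
    return {
        "start": int(m["start"]),
        "size": size,
        "free": free,
        "free_pages_4k": free // 4096,  # assuming 4 KiB blocks
        "percent_used": 100.0 * accounted / size,
    }

info = block_group_headroom(
    "block group 29834084352 has 1073741824 bytes, 1072648192 used 0 pinned 0 reserved")
print(info["free"], info["free_pages_4k"])  # 1093632 267
```

So this 1 GiB block group had roughly 1 MiB (267 blocks of 4 KiB) left, about 99.9% used, on a filesystem that df reported as 96% full.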
Re: What to do about subvolumes?
On 12/02/2010 04:49 AM, Arne Jansen wrote:
What about the alternative and allocating inode numbers globally? The only problem would be with snapshots as they share the inum with the source, but one could just remap inode numbers in snapshots by sparing some bits at the top of this 64 bit field.

I was wondering this as well. Why give each subvol its own inode number space? To avoid breaking assumptions of various programs, if they each have their own inode space, they must each have a unique st_dev. How are inode numbers currently allocated, and why wouldn't it be simple to just have a single pool of inode numbers for all subvols? It seems obvious to me that snapshots start out inheriting the inode numbers of the original subvol, but must be given a new st_dev.
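The "assumptions of various programs" are mostly that the pair (st_dev, st_ino) identifies a file; du, tar and rsync -H rely on it to detect hard links. A hypothetical Python model, not btrfs code, showing a snapshot copy that shares st_dev and inode number with its original being counted as one file, and Arne's top-bits remapping keeping them apart. SNAP_BITS = 16 and the device number are arbitrary assumptions:

```python
from typing import Iterable, Tuple

FileKey = Tuple[int, int]  # (st_dev, st_ino)

def count_distinct_files(keys: Iterable[FileKey]) -> int:
    """Count files the way a hard-link-aware tool would: one per (st_dev, st_ino)."""
    return len(set(keys))

SNAP_BITS = 16            # hypothetical: top bits reserved for a subvolume/snapshot id
INO_BITS = 64 - SNAP_BITS

def remap_ino(subvol_id: int, ino: int) -> int:
    """Pack a subvolume id into the top bits of a 64-bit inode number."""
    if not 0 <= ino < (1 << INO_BITS):
        raise ValueError("inode number does not fit in the remaining bits")
    if not 0 <= subvol_id < (1 << SNAP_BITS):
        raise ValueError("subvolume id does not fit in the reserved bits")
    return (subvol_id << INO_BITS) | ino

dev = 0x12  # hypothetical st_dev shared by every subvolume
print(count_distinct_files([(dev, 257), (dev, 257)]))  # 1: original and snapshot merged
print(count_distinct_files([(dev, remap_ino(5, 257)), (dev, remap_ino(256, 257))]))  # 2
```

The alternative discussed in the thread, a distinct st_dev per subvolume, fixes the same collision from the other side of the pair. Reserving top bits instead keeps one st_dev but caps the inode numbers available inside each subvolume (2^48 with the split assumed here).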
Replacing corrupted files/directories
Is there a recommended way to replace a corrupted file or directory on btrfs? The use case I'm thinking of is handling filesystem corruption by restoring only the corrupted files from backup. For a corrupted file, it seems like deleting the file and replacing it with the copy from the backup works, but I don't know if this is necessarily the best way - the nature of copy-on-write means that there's still corrupt data lying around on the filesystem, right? And for directories, it's tricky, because (afaik) there's no way to delete a directory without first walking it; not good if it's corrupted. I think this is probably also relevant to something like Ceph or CRFS, where redundancy exists, but isn't managed by btrfs directly.
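Taken literally, the delete-and-copy approach from the question could be sketched in Python with only the standard library. The helper name is hypothetical. Directory removal goes through shutil.rmtree, which has to list every entry first, the very step the question expects to fail on a corrupted directory. The sketch also does nothing about the old extents the question worries about:

```python
import os
import shutil
from typing import Iterable

def restore_from_backup(live_root: str, backup_root: str, rel_paths: Iterable[str]) -> None:
    """Replace each relative path under live_root with its copy under backup_root (hypothetical helper)."""
    for rel in rel_paths:
        live = os.path.join(live_root, rel)
        backup = os.path.join(backup_root, rel)
        # Remove the current (corrupted) version. rmtree must walk the directory,
        # because rmdir() only succeeds on an empty directory.
        if os.path.isdir(live) and not os.path.islink(live):
            shutil.rmtree(live)
        elif os.path.lexists(live):
            os.unlink(live)
        # Recreate the parent if needed, then copy the known-good version back.
        os.makedirs(os.path.dirname(live) or ".", exist_ok=True)
        if os.path.isdir(backup) and not os.path.islink(backup):
            shutil.copytree(backup, live, symlinks=True)
        else:
            shutil.copy2(backup, live, follow_symlinks=False)
```

A call would pass the live and backup roots plus the list of damaged relative paths, for example one file and one directory. Files come back with copy2 so timestamps and permission bits are preserved.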
Re: 800 GByte free, but no space left
On Thu, Dec 2, 2010 at 10:23 AM, Helmut Hullen hul...@t-online.de wrote:
Btrfs Btrfs v0.19

btrfs in the kernel has been version 0.19 for a *long* time. The version number there may never change. How do you encode a feature mask in a version number? Some features may be in one tree but not upstreamed all together and other such minutiae. What you need to do is use a more recent kernel than 2.6.32 if you want to use btrfs (modulo backports, but let's not talk about that right now). So if you're using a kernel older than 2.6.36, then you should probably upgrade.
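The feature-mask point can be illustrated with a small Python model. The btrfs superblock does carry compat, compat_ro and incompat flag fields, but the flag names, values and mount check below are hypothetical and simplified:

```python
# Hypothetical feature bits, for illustration only (not btrfs's real flag values).
FEATURE_A = 1 << 0
FEATURE_B = 1 << 1
FEATURE_C = 1 << 2

# Incompatible features a hypothetical kernel understands.
SUPPORTED_INCOMPAT = FEATURE_A | FEATURE_B

def can_mount(incompat_flags: int, supported: int = SUPPORTED_INCOMPAT) -> bool:
    """Refuse to mount if the filesystem sets any incompat bit this kernel does not know."""
    unknown = incompat_flags & ~supported
    return unknown == 0

print(can_mount(FEATURE_A | FEATURE_B))  # True
print(can_mount(FEATURE_A | FEATURE_C))  # False: C was never merged into this tree
```

Two kernels can both report "v0.19" while understanding different flag sets. A single number cannot say "has feature C but not feature B", which is exactly the backport and partial-upstreaming situation described above; a bitmask can.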
Re: 800 GByte free, but no space left
Hello, Mike,

on 02.12.10 you wrote:

Btrfs Btrfs v0.19

btrfs in the kernel has been version 0.19 for a *long* time. The version number there may never change. How do you encode a feature mask in a version number? Some features may be in one tree but not upstreamed all together and other such minutiae.

Sorry, I forgot: kernel 2.6.35.8, btrfs-git from 20101117.

Best regards!
Helmut