Re: A rescue tool for btrfs
On 17/12/2010, at 00.18, Michael Niederle wrote:
> Hi! Last week I crashed a btrfs file system. I didn't lose a lot of data because I had current backups of most data and a full backup from a month ago. But I thought it would be a nice idea to have a rescue tool! Currently I have a first release of this tool (surely buggy and running on little endian architectures only).

Thank you for writing this tool. It was able to save some 80% of all data from a very broken btrfs filesystem. I could mount the filesystem, but it would hang after just a few accesses with thousands of "parent transid verify failed" messages, and btrfsck just exited immediately with some huge negative number as the only indication of what was wrong. Using the btrfsck -s $number option also didn't help, but your tool seemed to do the job just fine.

I still have the broken filesystem, as I'm interested to see what Chris Mason's new btrfsck code can do with it, so if anybody is interested in further debugging I can probably help with that.

Regards,
Bryan Østergaard
-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
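The tool is described as running on little endian architectures only. A common way to lift that restriction in C is to decode every on-disk field byte by byte instead of casting the raw buffer to a struct, since the btrfs on-disk format is little-endian. The sketch below is not taken from the rescue tool: `get_le64()` and `sb_magic_ok()` are hypothetical helpers, and the superblock layout they rely on (8-byte magic `_BHRfS_M` at offset 0x40 within the superblock) is an assumption to check against the btrfs headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a little-endian 64-bit on-disk field byte by byte, so the result
 * does not depend on the host CPU's byte order. */
static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* The 8 ASCII bytes "_BHRfS_M" read as a little-endian u64 (assumed to be
 * the btrfs superblock magic). */
#define SKETCH_BTRFS_MAGIC 0x4D5F53665248425FULL

/* Hypothetical check on a buffer holding a copy of the superblock; the
 * magic is assumed to sit at offset 0x40 inside it. */
static int sb_magic_ok(const uint8_t *sb, size_t len)
{
    if (len < 0x48)
        return 0;
    return get_le64(sb + 0x40) == SKETCH_BTRFS_MAGIC;
}
```

A caller would read the superblock (assumed here to be the 4096 bytes at offset 65536 of the device) with pread() and pass that buffer to sb_magic_ok(); the same get_le64() then works for every other 64-bit on-disk field, on any host.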
[BUG] can not allocate space for caching data
Hi, Chris

There is something wrong with this patch:

commit 83a50de97fe96aca82389e061862ed760ece2283
Author: Chris Mason <chris.ma...@oracle.com>
Date:   Mon Dec 13 15:06:46 2010 -0500

    Btrfs: prevent RAID level downgrades when space is low

    The extent allocator has code that allows us to fill allocations from any available block group, even if it doesn't match the raid level we've requested.

    This was put in because adding a new drive to a filesystem made with the default mkfs options actually upgrades the metadata from single spindle dup to full RAID1.

    But, the code also allows us to allocate from a raid0 chunk when we really want a raid1 or raid10 chunk. This can cause big trouble because mkfs creates a small (4MB) raid0 chunk for data and metadata which then goes unused for raid1/raid10 installs.

    The allocator will happily wander in and allocate from that chunk when things get tight, which is not correct.

    The fix here is to make sure that we provide duplication when the caller has asked for it. It does allow the dups to be any raid level, which preserves the dup->raid1 upgrade abilities.

    Signed-off-by: Chris Mason <chris.ma...@oracle.com>

Btrfs counts the space in single chunks and raid0 chunks in the space information, so when we use btrfs_check_data_free_space() to check whether there is room for file data, it may return success, and we write the data into the cache. However, the extent allocator then cannot allocate any space to store that cached data, and the file system panics.

I think we should either subtract that space from the space information, or split the space information into two types: one that manages the chunks with duplication, and one that manages the other chunks.
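The mismatch described above can be shown with a toy model: the reservation check counts every chunk, while the allocator (after the RAID-downgrade fix) only accepts chunks that provide duplication. The struct and functions below are hypothetical simplifications for illustration, not the kernel's btrfs_space_info or btrfs_check_data_free_space().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much-simplified model of one space_info. */
struct toy_sinfo {
    uint64_t dup_bytes;    /* capacity in chunks with duplication (raid1/raid10/dup) */
    uint64_t nodup_bytes;  /* capacity in single/raid0 chunks, e.g. the small mkfs chunk */
    uint64_t used;         /* bytes already allocated */
};

/* Reservation check as described in the report: every chunk counts. */
static bool toy_reserve_ok(const struct toy_sinfo *s, uint64_t bytes)
{
    return s->used + bytes <= s->dup_bytes + s->nodup_bytes;
}

/* Allocator after "prevent RAID level downgrades": duplicated chunks only. */
static bool toy_alloc_ok(const struct toy_sinfo *s, uint64_t bytes)
{
    return s->used + bytes <= s->dup_bytes;
}
```

With illustrative numbers (100 MiB of raid1 capacity, the 4 MiB raid0 chunk from mkfs, 99 MiB already used), a 2 MiB buffered write passes toy_reserve_ok() but fails toy_alloc_ok(), which is the gap between "the data was cached" and "the allocator cannot place it".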
Re: [BUG] can not allocate space for caching data
Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:
> Hi, Chris
> There is something wrong with this patch:
> commit 83a50de97fe96aca82389e061862ed760ece2283
> [...]
> I think we subtract that space from the space information, or split the space information into two types, one is used to manage the chunks with duplication, the other manages the other chunks.

Ok, do you have a test case that triggers this? I'll work out a patch. Yan Zheng's original idea of 'the chunks should be readonly' should help us deduct them from the total.
-chris
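Yan Zheng's idea, as Chris summarizes it, is to mark the chunks that must not be used read-only so that their space is deducted from the total. In a toy model (hypothetical names, only loosely following the counters visible in set_block_group_ro()), the deduction looks like this:

```c
#include <stdint.h>

/* Hypothetical toy counters, loosely named after the space_info fields. */
struct toy_space {
    uint64_t total_bytes;
    uint64_t bytes_used;
    uint64_t bytes_readonly;
};

/* Marking a chunk read-only moves its free bytes into bytes_readonly,
 * so they no longer count as available. */
static void toy_mark_ro(struct toy_space *s, uint64_t chunk_free_bytes)
{
    s->bytes_readonly += chunk_free_bytes;
}

/* Free space as a reservation check would see it. */
static uint64_t toy_available(const struct toy_space *s)
{
    uint64_t taken = s->bytes_used + s->bytes_readonly;

    return taken >= s->total_bytes ? 0 : s->total_bytes - taken;
}
```

With illustrative numbers (104 MiB total, 99 MiB used), marking a 4 MiB chunk read-only drops the reported free space from 5 MiB to 1 MiB, so a check based on toy_available() no longer promises space the allocator will refuse to hand out.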
Re: [BUG] can not allocate space for caching data
On Mon, 20 Dec 2010 07:44:06 -0500, Chris Mason wrote:
> Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:
>> Hi, Chris
>> There is something wrong with this patch:
>> commit 83a50de97fe96aca82389e061862ed760ece2283
>> [...]
>
> Ok, do you have a test case that triggers this? I'll work out a patch.
> Yan Zheng's original idea of 'the chunks should be readonly' should help us deduct them from the total.

# mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=99 (fill the file system)
# umount /mnt
# mount /dev/sda9 /mnt
# dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=1000
# sync

Thanks
Miao
Re: [BUG] can not allocate space for caching data
Excerpts from Miao Xie's message of 2010-12-20 08:13:14 -0500:
> On Mon, 20 Dec 2010 07:44:06 -0500, Chris Mason wrote:
>> Excerpts from Miao Xie's message of 2010-12-20 07:25:10 -0500:
>>> There is something wrong with this patch:
>>> commit 83a50de97fe96aca82389e061862ed760ece2283
>>> Btrfs: prevent RAID level downgrades when space is low
>>> [...]
> # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
> # mount /dev/sda9 /mnt
> # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=99 (fill the file system)
> # umount /mnt
> # mount /dev/sda9 /mnt
> # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=1000
> # sync

Looks like we've got an off-by-one bug in set_block_group_ro(), which is why our block group isn't getting set to ro. With this patch, we're properly setting the block group ro, and the ENOSPC accounting is done correctly. It should also be able to replace my commit above.

Please take a look. Zheng, does this look correct to you?
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 227e581..6f7d758 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7970,13 +7970,14 @@ static int set_block_group_ro(struct btrfs_block_group_cache *cache)
 
 	if (sinfo->bytes_used + sinfo->bytes_reserved + sinfo->bytes_pinned +
 	    sinfo->bytes_may_use + sinfo->bytes_readonly +
-	    cache->reserved_pinned + num_bytes < sinfo->total_bytes) {
+	    cache->reserved_pinned + num_bytes <= sinfo->total_bytes) {
 		sinfo->bytes_readonly += num_bytes;
 		sinfo->bytes_reserved += cache->reserved_pinned;
 		cache->reserved_pinned = 0;
 		cache->ro = 1;
 		ret = 0;
 	}
+
 	spin_unlock(&cache->lock);
 	spin_unlock(&sinfo->lock);
 	return ret;
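The effect of the one-character change can be isolated. Below, `committed` stands for the sum of the space_info counters plus cache->reserved_pinned in the condition above, and `num_bytes` for the block group's free space; the two functions are hypothetical stand-ins, not kernel code. When every byte is accounted for, as presumably happens on a filesystem filled by the reproducer, the sum equals total_bytes exactly, and only the `<=` version lets the block group become read-only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition before the patch: strictly less than. */
static bool can_set_ro_old(uint64_t committed, uint64_t num_bytes, uint64_t total)
{
    return committed + num_bytes < total;
}

/* Condition after the patch: an exact fit is accepted. */
static bool can_set_ro_new(uint64_t committed, uint64_t num_bytes, uint64_t total)
{
    return committed + num_bytes <= total;
}
```

The two versions only disagree at the boundary where the sum equals total, which is exactly the full-filesystem case.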
mini bug report
Hi,

I use btrfs on my laptop, 2.6.37-rc2 from kernel.org, under dm-crypt, as /. I use space_cache and compression (not forced).

Today, my computer froze. At reboot, the kernel could not mount the filesystem. The dmesg output, which I didn't save, mentioned a null dereference.

After that I rebooted into 2.6.34, which was not very happy: mount errors (space cache?). Now I am on 2.6.37-rc2 again, which seems to work. So I guess it might come from space_cache.

I will write it down if this comes back. However, if anybody has any clue about which specific part of dmesg should be reported, it would be very helpful!

Cheers,
-- 
Xavier Nicollet
Re: mini bug report
Excerpts from Xavier Nicollet's message of 2010-12-20 10:58:01 -0500:
> Hi, I use btrfs on my laptop, 2.6.37-rc2 from kernel.org, under dm-crypt, as /. I use space_cache and compression (not forced).
> [...]
> So I guess it might come from space_cache.

These sound like the free space caching bugs that Josef fixed. If you pull down the latest, I think we've got it nailed.

-chris
[TRIVIAL][PATCH] Improve error handling in the btrfs command
Hi Chris,

below is a trivial patch which aims to improve the error reporting of the btrfs command.

You can pull from

	http://cassiopea.homelinux.net/git/btrfs-progs-unstable.git branch strerror

I changed every

	printf("some-error")

to something like:

	e = errno;
	fprintf(stderr, "ERROR: - %s", strerror(e));

so that:

1) all errors are reported to standard error;
2) the error string returned by the system is printed at the end of the message.

The change is quite simple: I replaced every printf("some-error") with the lines above and didn't touch anything else. I also integrated a missing printf based on Ben's patch.

This patch makes the btrfs command more user friendly :-)

Regards
G.Baroncelli

 btrfs-list.c |   40
 btrfs_cmds.c |   77
 utils.c      |    6
 3 files changed, 89 insertions(+), 34 deletions(-)

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreij...@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E  C054 BF04 F161 3DC5 0512

diff --git a/btrfs-list.c b/btrfs-list.c
index 93766a8..abcc2f4 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -265,7 +265,7 @@ static int resolve_root(struct root_lookup *rl, struct root_info *ri)
 static int lookup_ino_path(int fd, struct root_info *ri)
 {
 	struct btrfs_ioctl_ino_lookup_args args;
-	int ret;
+	int ret, e;
 
 	if (ri->path)
 		return 0;
@@ -275,9 +275,11 @@ static int lookup_ino_path(int fd, struct root_info *ri)
 	args.objectid = ri->dir_id;
 
 	ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, &args);
+	e = errno;
 	if (ret) {
-		fprintf(stderr, "ERROR: Failed to lookup path for root %llu\n",
-			(unsigned long long)ri->ref_tree);
+		fprintf(stderr, "ERROR: Failed to lookup path for root %llu - %s\n",
+			(unsigned long long)ri->ref_tree,
+			strerror(e));
 		return ret;
 	}
 
@@ -320,15 +322,18 @@ static u64 find_root_gen(int fd)
 	unsigned long off = 0;
 	u64 max_found = 0;
 	int i;
+	int e;
 
 	memset(&ino_args, 0, sizeof(ino_args));
 	ino_args.objectid = BTRFS_FIRST_FREE_OBJECTID;
 
 	/* this ioctl fills in ino_args->treeid */
 	ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, &ino_args);
+	e = errno;
 	if (ret) {
-		fprintf(stderr, "ERROR: Failed to lookup path for dirid %llu\n",
-			(unsigned long long)BTRFS_FIRST_FREE_OBJECTID);
+		fprintf(stderr, "ERROR: Failed to lookup path for dirid %llu - %s\n",
+			(unsigned long long)BTRFS_FIRST_FREE_OBJECTID,
+			strerror(e));
 		return 0;
 	}
 
@@ -351,8 +356,10 @@ static u64 find_root_gen(int fd)
 
 	while (1) {
 		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+		e = errno;
 		if (ret < 0) {
-			fprintf(stderr, "ERROR: can't perform the search\n");
+			fprintf(stderr, "ERROR: can't perform the search - %s\n",
+				strerror(e));
 			return 0;
 		}
 		/* the ioctl returns the number of item it found in nr_items */
@@ -407,14 +414,16 @@ static char *__ino_resolve(int fd, u64 dirid)
 	struct btrfs_ioctl_ino_lookup_args args;
 	int ret;
 	char *full;
+	int e;
 
 	memset(&args, 0, sizeof(args));
 	args.objectid = dirid;
 
 	ret = ioctl(fd, BTRFS_IOC_INO_LOOKUP, &args);
+	e = errno;
 	if (ret) {
-		fprintf(stderr, "ERROR: Failed to lookup path for dirid %llu\n",
-			(unsigned long long)dirid);
+		fprintf(stderr, "ERROR: Failed to lookup path for dirid %llu - %s\n",
+			(unsigned long long)dirid, strerror(e) );
 		return ERR_PTR(ret);
 	}
 
@@ -472,6 +481,7 @@ static char *ino_resolve(int fd, u64 ino, u64 *cache_dirid, char **cache_name)
 	struct btrfs_ioctl_search_header *sh;
 	unsigned long off = 0;
 	int namelen;
+	int e;
 
 	memset(&args, 0, sizeof(args));
 
@@ -490,8 +500,10 @@ static char *ino_resolve(int fd, u64 ino, u64 *cache_dirid, char **cache_name)
 	sk->nr_items = 1;
 
 	ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+	e = errno;
 	if (ret < 0) {
-		fprintf(stderr, "ERROR: can't perform the search\n");
+		fprintf(stderr, "ERROR: can't perform the search - %s\n",
+			strerror(e));
 		return NULL;
 	}
 	/* the ioctl returns the number of item it found in nr_items */
@@ -550,6 +562,7 @@ int list_subvols(int fd)
 	char *name;
 	u64 dir_id;
 	int i;
+	int e;
 
 	root_lookup_init(&root_lookup);
 
@@ -578,8 +591,10 @@ int list_subvols(int fd)
 
 	while(1) {
 		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+		e = errno;
 		if (ret < 0) {
-			fprintf(stderr, "ERROR: can't perform the search\n");
+			fprintf(stderr, "ERROR: can't perform the search - %s\n",
+				strerror(e));
 			return ret;
 		}
 		/* the ioctl returns the number of item it found in nr_items */
@@ -747,6 +762,7 @@ int find_updated_files(int fd, u64 root_id, u64 oldest_gen)
 	u64 found_gen;
 	u64 max_found = 0;
 	int i;
+	int e;
 	u64 cache_dirid = 0;
 	u64 cache_ino = 0;
 	char *cache_dir_name = NULL;
@@ -773,8 +789,10 @@ int find_updated_files(int fd, u64 root_id, u64 oldest_gen)
 	max_found = find_root_gen(fd);
 	while(1) {
 		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
+		e = errno;
 		if (ret < 0) {
-			fprintf(stderr,
Re: [PATCH] Improve error handling in filesystem df
Hi Ben,

I integrated your patch into mine (see my next email). However, I changed the argument of the strerror() function from the ioctl return code to the errno variable.

Regards
G.Baroncelli

On Sunday, 19 December, 2010, Ben Gamari wrote:
> The return values of ioctl weren't being printed to stderr on failure, causing the command to silently fail, resulting in a very confused user.
>
> Signed-off-by: Ben Gamari <bgamari.f...@gmail.com>
> ---
>  btrfs_cmds.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/btrfs_cmds.c b/btrfs_cmds.c
> index 8031c58..45da2bd 100644
> --- a/btrfs_cmds.c
> +++ b/btrfs_cmds.c
> @@ -857,6 +857,7 @@ int do_df_filesystem(int nargs, char **argv)
>  
>  	ret = ioctl(fd, BTRFS_IOC_SPACE_INFO, sargs);
>  	if (ret) {
> +		fprintf(stderr, "ERROR: can't query '%s' for free space (%s)\n", path, strerror(-ret));
>  		free(sargs);
>  		return ret;
>  	}
> @@ -875,6 +876,7 @@ int do_df_filesystem(int nargs, char **argv)
>  
>  	ret = ioctl(fd, BTRFS_IOC_SPACE_INFO, sargs);
>  	if (ret) {
> +		fprintf(stderr, "ERROR: can't query '%s' for free space (%s)\n", path, strerror(-ret));
>  		free(sargs);
>  		return ret;
>  	}
> -- 
> 1.7.1
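The reason errno is the right argument: ioctl(2) returns -1 on failure and leaves the actual error code in errno, so strerror(-ret) always evaluates strerror(1), which on Linux is EPERM ("Operation not permitted"), whatever actually went wrong. A small sketch, assuming Linux; FIONREAD on descriptor -1 is used only because it is guaranteed to fail with EBADF:

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Issue an ioctl that cannot succeed (-1 is never a valid descriptor),
 * capture errno immediately, and return ioctl's own return value. */
static int failing_ioctl(int *saved_errno)
{
    int n = 0;
    int ret = ioctl(-1, FIONREAD, &n);

    *saved_errno = errno;   /* saved before any other call can change it */
    return ret;
}
```

Here -ret is 1 while the saved errno is EBADF, so a message built from strerror(-ret) would report the wrong reason.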
Re: [TRIVIAL][PATCH] Improve error handling in the btrfs command
On Monday, 20 December, 2010, you (Chris Samuel) wrote:
> On 21/12/10 07:06, Goffredo Baroncelli wrote:
> > below is enclosed a trivial patch, which has the aim to improve the error reporting of the btrfs command.
>
> Any reason to not just use perror() ?

Sometimes I needed to add other info, so perror(3) may not be sufficient.
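When extra context is needed, the pattern from the patch (capture errno immediately, then format the message and append strerror()) can be wrapped in one variadic helper. `report_error()` is a hypothetical name, not part of btrfs-progs; this is a sketch of the pattern, not the code in the patch.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Print "ERROR: <formatted message> - <strerror text>" to 'out'.
 * saved_errno must be captured by the caller right after the failing
 * call, because the stdio calls below may themselves modify errno. */
static void report_error(FILE *out, int saved_errno, const char *fmt, ...)
{
    va_list ap;

    fputs("ERROR: ", out);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fprintf(out, " - %s\n", strerror(saved_errno));
}
```

A call site would then read `ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args); if (ret < 0) { int e = errno; report_error(stderr, e, "can't perform the search"); }`, with any extra details passed through the format string.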
Re: [TRIVIAL][PATCH] Improve error handling in the btrfs command
On 21/12/10 07:06, Goffredo Baroncelli wrote:
> below is enclosed a trivial patch, which has the aim to improve the error reporting of the btrfs command.

Any reason to not just use perror()?

-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: btrfs: 21 minutes to read 1.2M file directory
Sigh, wrong btrfs address on the original. Apologies for the double-post.

On Mon, Dec 20, 2010 at 02:24:46PM -0800, Andy Isaacson wrote:
I have a directory with 1.2M files in it, which makes readdir very slow on btrfs with cold caches (although it's reasonably fast with hot caches, as in the first example below):

% time find /btr/foo > /btr/foo.list
find /btr/foo > /btr/foo.list  4.10s user 7.97s system 36% cpu 33.275 total
% head /btr/foo.list
/btr/foo
/btr/foo/1281373625.777.fg.jpg
/btr/foo/1281373625.777.bg.jpg
/btr/foo/1281373625.948.fg.jpg
/btr/foo/1281373625.948.bg.jpg
/btr/foo/1281373626.096.fg.jpg
/btr/foo/1281373626.096.bg.jpg
/btr/foo/1281373626.218.fg.jpg
/btr/foo/1281373626.218.bg.jpg
/btr/foo/1281373626.350.fg.jpg
% wc !$
wc /btr/foo.list
 1216  1216 401499940 /btr/foo.list
% wc -l /btr/foo.list
1216 /btr/foo.list
% sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
vm.drop_caches = 3
vm.drop_caches = 0
% time find /btr/foo > /btr/foo.list.2
find /btr/foo > /btr/foo.list.2  5.62s user 24.54s system 2% cpu 21:40.90 total
% uname -a
Linux pyron 2.6.36-rc7-00149-g29979aa #71 SMP Wed Oct 13 09:42:57 PDT 2010 x86_64 GNU/Linux

Interestingly, while readdir is busy I'm only seeing IO on sdb even though the btrfs is on 3 targets:

Label: btr  uuid: 1271de53-b3d2-4d68-9d48-b19487e1c982
        Total devices 3 FS bytes used 555.13GB
        devid    1 size 18.65GB used 18.64GB path /dev/sda2
        devid    3 size 512.00GB used 44.13GB path /dev/sdc1
        devid    2 size 512.00GB used 511.76GB path /dev/sdb1

iostat -k 1 | grep sdb tells me:

Device:      tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdb       173.00       692.00         0.00        692          0
sdb       185.00       740.00         0.00        740          0
sdb       198.00       792.00         0.00        792          0
sdb       177.00       712.00         0.00        712          0

I updated to a recent git and it's still slow (my test hasn't completed yet, 19 minutes in):

Linux pyron 2.6.37-rc6-11882-g55ec86f #72 SMP Mon Dec 20 13:34:38 PST 2010 x86_64 GNU/Linux

The devices are:

[1.834527] ata1.00: ATA-7: INTEL SSDSA2M040G2GC, 2CV102HD, max UDMA/133
[1.834816] ata1.00: 78165360 sectors, multi 1: LBA48 NCQ (depth 31/32)
[1.835369] ata1.00: configured for UDMA/133
[1.835776] scsi 0:0:0:0: Direct-Access ATA INTEL SSDSA2M040 2CV1 PQ: 0 ANSI: 5
...
[2.904919] ata3.00: ATA-8: ST31500341AS, CC1H, max UDMA/133
[2.905206] ata3.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[2.947393] ata3.00: configured for UDMA/133
[2.947850] scsi 2:0:0:0: Direct-Access ATA ST31500341AS CC1H PQ: 0 ANSI: 5
...
[3.989664] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[4.018524] ata5.00: ATA-8: ST31500341AS, CC1H, max UDMA/133
[4.018811] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[4.060838] ata5.00: configured for UDMA/133
[4.061205] scsi 4:0:0:0: Direct-Access ATA ST31500341AS CC1H PQ: 0 ANSI: 5

The host is an Intel(R) Core(TM) i7 CPU 930 @2.80GHz with 12GB RAM.

Thanks,
-andy
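To time the directory read by itself, without find's own processing or the write to /btr/foo.list, a single opendir()/readdir() pass is enough. This is a hypothetical measurement helper, not what was used above; for cold-cache numbers, caches would be dropped first as in the report (`sysctl -w vm.drop_caches=3`, as root).

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Count the entries in 'path' (excluding "." and "..") with one
 * opendir()/readdir() pass and store the elapsed wall-clock time in
 * *seconds. Returns -1 if the directory cannot be opened. */
static long count_dir_entries(const char *path, double *seconds)
{
    struct timespec t0, t1;
    struct dirent *de;
    long n = 0;
    DIR *d;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    d = opendir(path);
    if (!d)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            n++;
    }
    closedir(d);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *seconds = (double)(t1.tv_sec - t0.tv_sec) +
               (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return n;
}
```

Calling count_dir_entries("/btr/foo", &secs) would give a readdir-only figure that can be set against the 33-second warm and 21-minute cold find runs.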
Re: [TRIVIAL][PATCH] Improve error handling in the btrfs command
On 21/12/10 09:53, Goffredo Baroncelli wrote:
> Some time I needed to add other info, so perror(3) may not be sufficient..

Ah, of course, and you cannot rely on safely snprintf()'ing something into the string that would get passed to perror(), because that could easily change errno if something went wrong internally.

All the best,
Chris
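If perror() is still preferred, the problem Chris describes can be avoided by saving errno before any other library call and restoring it just before perror(). `perror_ctx()` is a hypothetical helper sketching that; the C standard allows library functions such as vsnprintf() to change errno even when they succeed, which is why the save and restore are there.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Format a context string, then report it with perror(). errno is saved
 * on entry and restored right before perror(), since vsnprintf() may
 * change errno. Returns the saved errno value. */
static int perror_ctx(const char *fmt, ...)
{
    int saved = errno;
    char buf[256];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);   /* truncates safely if too long */
    va_end(ap);

    errno = saved;
    perror(buf);
    return saved;
}
```

perror() then prints the context, a colon and the message for the restored errno, for example `can't query '/mnt' for free space: No such file or directory` when errno was ENOENT.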
Re: [BUG] can not allocate space for caching data
On Mon, Dec 20, 2010 at 11:41 PM, Chris Mason <chris.ma...@oracle.com> wrote:
> Excerpts from Miao Xie's message of 2010-12-20 08:13:14 -0500:
> [...]
> Looks like we've got an off by one bug in set_block_group_ro, which is why our block group isn't getting set to ro. With this patch, we're properly setting the block group ro, and the enospc accounting is done correctly. It should also be able to replace my commit above.
>
> Please take a look, Zheng does this look correct to you?
Looks good to me,

Yan, Zheng
Re: btrfs: 21 minutes to read 1.2M file directory
On Tue, Dec 21, 2010 at 12:24 AM, Andy Isaacson <a...@hexapodia.org> wrote:
> I have a directory with 1.2M files in it, which makes readdir very slow on btrfs with cold caches (although it's reasonably fast with hot caches as in the first example below):

Sounds like:

Bug 21562 - btrfs is dead slow due to fragmentation
https://bugzilla.kernel.org/show_bug.cgi?id=21562

-- 
Felipe Contreras
Scary OOPS when playing with --bind, --move, and friends
hello,

i really need to stop recklessly doing this stuff to my laptop... i'm finishing a new initramfs hook to support many features of btrfs; when considering how i was going to mount the target subvol as / for the booting system, i decided to play with --bind and --move.

in short, everything works fine until you --bind across a subvol via the special folders created when one takes a snapshot, or --bind the special folder itself. the --bind succeeds, and everything initially appears to work fine...

this is nearly the exact process i did; it should reproduce :-( (i'm scared to do it again...):

-
# mkdir -p sand/root sand/bind
# cd sand
# mount -o subvolid=0 /dev/sda root
# mount --bind root/<subvol of my current root>/home/anthony bind
# touch bind/TEST
you can now see TEST at ~/TEST and bind/TEST
# vim bind/TEST
did it work?
:wq
you can see the edited version ONLY in the one you edited... the other is still 0 bytes
# vim ~/anthony/TEST
1 wtf, why not?
:wq
machine panics, X is instantly replaced by an oopsie screen; machine locked
-

i don't know why i decided to stupidly edit the bad version, even though something was clearly wrong. at any rate, this was about 15 minutes ago... the machine booted back up alright after a hard reboot, hooray for that, but methinks there is probably some corruption in there now... meh.

i don't know what it means, but when the two versions desynced (it could have been like this before, but i didn't notice until after the desync), `ls -l` reported a `0` right after the permissions:

-rw-r--r-- 0 anthony users 8 Dec 20 21:41 TEST

all other files report `1`. since /dev and /proc etc. have different numbers, this appears to have something to do with the mount or device?

i panicked when the kernel did, and i forgot to write down the message, but the trace had `vfs_rename` and `tomoyo_???`... sorry for the bad memory. vim was attempting to move a temporary file over the top of the misbehaving file, hence the rename.
i'm on 2.6.36.2

the `directory as a subvol` thing seems to be a little finicky :-) did i do something incorrect? should this kind of operation be supported? it seems to work fine so long as i stay on the same subvol.

thanks,

C Anthony
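In `ls -l` output, the number right after the permission bits is the hard-link count (st_nlink), not a device number, so the different values seen under /dev and /proc are link counts too. A regular file that can still be reached by a name normally has a count of at least 1, which makes the 0 on TEST a useful detail for the bug report. A minimal way to read the same field (link_count() is a hypothetical helper):

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Return the hard-link count (st_nlink) that `ls -l` prints right after
 * the permission bits, or -1 if stat() fails. */
static long link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

Running it on bind/TEST and ~/TEST right after the desync would record both counts without relying on `ls` output.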
Re: Scary OOPS when playing with --bind, --move, and friends
On Tue, Dec 21, 2010 at 10:51 AM, C Anthony Risinger <anth...@extof.me> wrote:
> in short, everything works fine until you --bind across a subvol via the special folders created when one takes a snapshot,
> # mount --bind root/<subvol of my current root>/home/anthony bind
> # touch bind/TEST
> you can now see TEST at ~/TEST and bind/TEST

bind/ is a mounted snapshot, right? If yes, then when you touch bind/TEST, it should also appear in root/<subvol of my current root>/home/anthony/TEST, and NOT in root/home/anthony/TEST or /home/anthony/TEST.

> i'm on 2.6.36.2

Try 2.6.35 or later. I tested something similar under ubuntu maverick (2.6.35-24-generic) and it works just fine.

-- 
Fajar
Re: Scary OOPS when playing with --bind, --move, and friends
On Mon, Dec 20, 2010 at 10:16 PM, Fajar A. Nugraha <l...@fajar.net> wrote:
> On Tue, Dec 21, 2010 at 10:51 AM, C Anthony Risinger <anth...@extof.me> wrote:
>> i'm on 2.6.36.2
>
> Try 2.6.35 or later. I tested something similar under ubuntu maverick (2.6.35-24-generic) and it works just fine.

Last I checked, 2.6.36 came after 2.6.35. :)
Re: Scary OOPS when playing with --bind, --move, and friends
On Tue, Dec 21, 2010 at 11:16 AM, Fajar A. Nugraha <l...@fajar.net> wrote:
> On Tue, Dec 21, 2010 at 10:51 AM, C Anthony Risinger <anth...@extof.me> wrote:
>> i'm on 2.6.36.2
>
> Try 2.6.35 or later. I tested something similar under ubuntu maverick (2.6.35-24-generic) and it works just fine.

Sorry, hit send too soon. I thought you wrote 2.6.32 :P

Still curious about your test scenario, though. Can you double-check it? A write on the snapshot should not appear on the parent filesystem.

-- 
Fajar
Re: Scary OOPS when playing with --bind, --move, and friends
On Mon, Dec 20, 2010 at 10:19 PM, Fajar A. Nugraha <l...@fajar.net> wrote:
> [...]
> Still curious about your test scenario though. Can you double check it? A write on the snapshot should not appear on the parent filesystem.

sorry, maybe i wasn't very clear; my current root is a subvol... the directory i was --bind mounting corresponded to /home/anthony: / and root/<subvol of my current root> are the same, so it should show up in my /home/anthony directory.

if i mount the subvol by id, then --bind mount, it works as expected; only when crossing the magic barrier doesn't things seem to freak out.

i actually reproduced it twice, but this time i didn't write to the files :-)

C Anthony
Re: Scary OOPS when playing with --bind, --move, and friends
On Mon, Dec 20, 2010 at 10:25 PM, C Anthony Risinger <anth...@extof.me> wrote:
> On Mon, Dec 20, 2010 at 10:19 PM, Fajar A. Nugraha <l...@fajar.net> wrote:
>> Still curious about your test scenario though. Can you double check it? A write on the snapshot should not appear on the parent filesystem.
>
> sorry maybe i wasn't very clear; my current root is a subvol... the directory i was --bind mounting corresponded to /home/anthony: / and root/<subvol of my current root> are the same; so it should show up in my /home/anthony directory.
>
> if mount the subvol by id, then --bind mount, it works as expected; only when crossing the magic barrier doesn't things seem to freak out.

s/doesn't/do/g

to be exact, it looks like this:

-
(subvolid)  source                                          mount                      [options]
(262)       /dev/sda                                        /
(__0)       /dev/sda                                        /home/anthony/sand/root    [subvolid=0]
(???)       /home/anthony/sand/root/vols/262/home/anthony   /home/anthony/sand/bind    [--bind]
-

all my subvolumes are kept in a vols directory in the btrfs root, so my / and the --bind mount were supposed to be referencing the same location. additionally, TEST showed up in both locations... it was the editing part that blew up.

NOTE, however, that the subvol (id 262) itself was _never_ actually mounted here; it was accessed thru the btrfs root mounted at `root`. i think this is the crux of the problem; --bind doesn't seem to know that the directory it was binding isn't 100% within the mount point it resides under.

C Anthony
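For reference, the last two rows of the mount table correspond to two mount(2) calls: a normal btrfs mount of the top level with `subvolid=0`, and an MS_BIND mount whose source path crosses into subvolume 262 through that top-level mount. The function below is a hypothetical sketch (it needs root, and the paths are placeholders for the ones in the table); since this sequence oopsed 2.6.36.2 in the report, it should only be tried on a scratch filesystem.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical sketch of the reported sequence. Requires root.
 * Returns 0 on success or -errno from the first mount(2) call that fails. */
static int mount_toplevel_then_bind(const char *dev, const char *top,
                                    const char *inner_dir, const char *bind_target)
{
    /* Row "(__0)": mount the btrfs top level, i.e. subvolid=0. */
    if (mount(dev, top, "btrfs", 0, "subvolid=0") != 0)
        return -errno;

    /* Row "(???)": bind-mount a directory inside subvolume 262 that is
     * reached through the top-level mount, not through its own mount. */
    if (mount(inner_dir, bind_target, NULL, MS_BIND, NULL) != 0)
        return -errno;

    return 0;
}
```

With the table's values, the call would be mount_toplevel_then_bind("/dev/sda", "/home/anthony/sand/root", "/home/anthony/sand/root/vols/262/home/anthony", "/home/anthony/sand/bind"); bind-mounting subvolume 262 after mounting it directly by id is the variant reported to behave as expected.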