[PATCH] btrfs-progs: Fix an extent buffer leak in count_csum_range().
Commit f495a2ac6611 ("btrfs-progs: fsck: remove unfriendly BUG_ON() for searching tree failure") causes a large number of extent buffer leaks when btrfsck hits csum mismatches. This is caused by a misplaced btrfs_release_path(), which is skipped whenever ret < 0. Fix it by releasing the path before the error check.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 cmds-check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index d2d218a..5b644cf 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -1186,9 +1186,9 @@ static int count_csum_range(struct btrfs_root *root, u64 start,
 		path.slots[0]++;
 	}
 out:
+	btrfs_release_path(&path);
 	if (ret < 0)
 		return ret;
-	btrfs_release_path(&path);
 	return 0;
 }
-- 
2.2.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Uncorrectable errors on RAID-1?
On Sun, Jan 4, 2015 at 9:18 PM, Phillip Susi ps...@ubuntu.com wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 01/03/2015 12:31 AM, Chris Murphy wrote:
>> This is architecture astronaut territory.
>>
>> The system only has a terrible response for two reasons: 1. The user spec'd the wrong hardware for the use case; 2. The distro isn't automatically leveraging existing ways to mitigate that user mistake by changing either SCT ERC on the drives, or the SCSI command timer for each block device.
>
> No, it has terrible response because the kernel either waits an unreasonable time or fails the drive and kicks it out of the array instead of trying to repair it.

It's a default that works for more use cases than not. The kernel isn't dynamically self-configuring, and it isn't even the kernel's job to take the first step, which is to enable and correctly set SCT ERC on each drive.

I think treating the large pile of possible causes for a drive freezing on a command as read errors (after the link reset) is a bad idea. But since it's your idea, and I'm not a kernel developer, you should propose it on linux-raid@ instead of arguing with me.

> Blaming the user for not buying better hardware is not an appropriate response for the kernel failing so badly to handle commonly available hardware that doesn't behave in the most ideal way.

"Hi, I'm a good and knowledgeable sysadmin. I buy hardware that's explicitly stated in the company's marketing data sheet as being incompatible with my use case. This is someone else's fault."

Sounds like buck passing.

>> Now, even though that solution *might* mean long recoveries on occasion, it's still better than link reset behavior which is what we have today because it causes the underlying problem to be fixed by md/dm/Btrfs once the read error is reported. But no distro has implemented this $500 man hour solution. Instead you're suggesting a $500,000 fix that will take hundreds of man hours and end user testing to find all the edge cases. It's like, seriously, WTF?
>
> Seriously? Treating a timeout the same way you treat an unrecoverable media error is no herculean task.

So you keep saying. But the best practice is already known and tested, and can be done with a startup script. Yet no distro does this for the user, even though it's much, much simpler than what you're proposing, and actually fixes both sources of the problem.

That it is, in your opinion, an imperfect fix is not relevant. It's still better behavior than what we have today, and yet still no distro does this, thereby tacitly preferring the status quo. And if the current behavior is good enough that no one has taken action to automatically implement the known best-practice workaround of the day, why should kernel developers give two shits about this idea?

Sounds like more buck passing.

>> http://www.seagate.com/files/www-content/support-content/documentation/product-manuals/en-us/Enterprise/Savvio/Savvio%2015K.3/100629381e.pdf
>>
>> That's a high end SAS drive. It's default is to retry up to 20 times, which takes ~1.4 seconds, per sector.
>
> But also note how it says 20 retries on a 15,000 rpm drive only takes 80 milliseconds, not 1.4 seconds. 15,000 rpm / 60 seconds per minute = 250 rotations/retries per second.

The PDF contains a table saying 20 retries takes 1.4 seconds. I didn't compute this number myself, it's in the bloody manufacturer's own documentation. Obviously the ECC is doing things that take more than one revolution of the spindle.

>> Maybe you'd prefer seeing these big, cheap, green drives have shorter ERC times, with a commensurate reality check with their unrecoverable error rate, which right now is already two orders magnitude higher than enterprise SAS drives. So what if this means that rate is 3 or 4 orders magnitude higher?
>
> 20 retries vs. 200 retries does not reduce the URE rate by orders of magnitude; more like 1% *maybe*. 200 vs 2000 makes no measurable difference at all.

I see, well I guess you prefer believing in fraud and conspiracy theories, by multiple companies, to screw users over, while they admit the incompatibility of the intended use case on their data sheets.

-- 
Chris Murphy
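The "startup script" workaround mentioned above could look roughly like the following sketch. It is not taken from the thread: the 7.0 second ERC value, the 180 second command timer, and the function name `mitigation_for` are assumptions chosen for illustration, and the script only prints the command it would run instead of touching a device. `smartctl -l scterc,READ,WRITE` (values in tenths of a second) and `/sys/block/<dev>/device/timeout` are the usual smartmontools and Linux SCSI interfaces for the two settings being discussed.

```shell
#!/bin/sh
# Dry-run sketch: print the mitigation for one drive instead of applying it.
# Both values below are illustrative assumptions, not taken from the thread.
ERC_DECISECONDS=70        # SCT ERC limit in tenths of a second (7.0 s)
CMD_TIMEOUT_SECONDS=180   # SCSI command timer for drives without SCT ERC

# mitigation_for DEV HAS_ERC
#   DEV      kernel device name, e.g. sda
#   HAS_ERC  "yes" if the drive accepts SCT ERC settings, anything else if not
mitigation_for() {
    dev=$1
    has_erc=$2
    if [ "$has_erc" = "yes" ]; then
        # Have the drive give up on a bad sector before the kernel's timer expires.
        echo "smartctl -l scterc,${ERC_DECISECONDS},${ERC_DECISECONDS} /dev/${dev}"
    else
        # No ERC support: give the drive longer before the kernel resets the link.
        echo "echo ${CMD_TIMEOUT_SECONDS} > /sys/block/${dev}/device/timeout"
    fi
}

mitigation_for sda yes
mitigation_for sdb no
```

Whether a given drive supports SCT ERC would have to be probed per device; that step is deliberately left out here.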
Re: ignoring bad blocks
On Sun, Jan 04, 2015 at 01:45:41AM -0700, Chris Murphy wrote:
> On Sat, Jan 3, 2015 at 10:40 PM, Dyweni - BTRFS y4bwxfpc4...@dyweni.com wrote:
>> Hi All,
>>
>> Can BTRFS ignore bad blocks as they are discovered? I want to try BTRFS on some older drives, but they all have a few bad blocks.
>
> Not currently, and I don't see it in the project ideas list. Right now on Btrfs you will just get write errors, but I'm uncertain if it just tries a new sector and continues on (indirectly not use the bad sector but also not keeping track of it either)? The unreliable disk features are still project ideas.

Bad block lists are a thing of the past: as you hinted, drives automatically remap bad blocks so that the filesystem doesn't have to deal with them. If you have a questionable drive, you can indeed simply dd 0's over it before you use it with btrfs.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: ignoring bad blocks
On Sat, Jan 3, 2015 at 10:40 PM, Dyweni - BTRFS y4bwxfpc4...@dyweni.com wrote:
> Hi All,
>
> Can BTRFS ignore bad blocks as they are discovered? I want to try BTRFS on some older drives, but they all have a few bad blocks.

Not currently, and I don't see it in the project ideas list. Right now on Btrfs you will just get write errors, but I'm uncertain whether it just tries a new sector and continues on (indirectly not using the bad sector, but not keeping track of it either). The unreliable disk features are still project ideas.

If the drives no longer have reserve sectors, then technically they're toast. That's indicated by write failures in dmesg.

Two workarounds:

- Use ext4 with mkfs.ext4 -c, which builds a bad blocks list and then won't use those sectors.
- mdadm 3.1+ also has an option to build a bad blocks list, but I don't know if raid0 or linear/concat are supported: http://thread.gmane.org/gmane.linux.raid/34883

If you haven't tried it, badblocks -wvs will (destructively) write over the entire block device, and the drive firmware should detect persistent write failures automatically and remap the LBA to a reserve sector, removing the bad sector from use. This is transparent to everything outside the drive. Once the reserve sectors are depleted, the drive will report write failures.

-- 
Chris Murphy
Re: possible bug in balance
lu...@plaintext.sk writes:
> Hello,
>
> luvar@blackdawn:~$ sudo time btrfs balance start -dconvert=raid1 -dusage=20 /home/luvar/programs/
>
> Am I doing something forbidden (I have not see any structure where raid type is stored per file/subvolume item), or I just hit some problem? What should I try?

btrfs doesn't yet support per-subvolume RAID1 levels. I'm not sure how it should behave with your command line. It probably tries to rebalance the whole filesystem.

> Than I wanted to convert to raid1 also some data (with balance filter) and try if there is some speedup when reading files (starting programs)...

Though I can already tell that no, there won't be a speedup, as the btrfs scheduler chooses the device to access by using the process id as a seed. Therefore a single thread is never able to use 100% of the RAID1 read capability. Perhaps in the future there will be more sophisticated schedulers.

You may try to use MD raid1 for extra speed, but you would lose the automatic error recovery of btrfs (though you would still notice if data gets corrupted).

> [ 8159.300427] attempt to access beyond end of device
> [ 8159.300434] sdb2: rw=1041, want=480110048, limit=473956352
> [ 8159.300440] btrfs: bdev /dev/sdb2 errs: wr 638628, rd 65867, flush 0, corrupt 0, gen 0

I have noticed that 'attempt to access beyond end of device' typically indicates (with other file systems, I haven't seen that with btrfs) that the partition table and the filesystem size don't match. Typically such a situation could occur when one modifies the partition table after creating the file system, though I'm sure there are other ways to get into such a situation.

You may find the filesystem size with btrfs filesystem show and the partition sizes with cat /proc/partitions (multiply by block size = 1024 bytes).

Should the partition sizes and filesystem sizes match, I would be quite certain this would indeed be a btrfs bug. But,

> root@blackdawn:/home/luvar# uname -a
> Linux blackdawn 3.13.0-30-generic #55-Ubuntu SMP Fri Jul 4 21:40:53 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> root@blackdawn:/home/luvar# btrfs v
> Btrfs v0.20-rc1-189-g704a08c

should this turn out to be a bug, I'm certain trying a more recent kernel version is a terrific idea :). 3.18.x or 3.17.y where y>2 (I think those were the two versions that were bad in the 3.17 series..). They won't have support for raid1'ing a subvolume either, though, as far as I know.

Remember backups :).

-- 
_ / __// /__ __ http://www.modeemi.fi/~flux/\ \ / /_ / // // /\ \/ /\ / /_/ /_/ \___/ /_/\_\@modeemi.fi \/
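To put the kernel message above into numbers, here is a small arithmetic sketch. It assumes that the `want` and `limit` fields are counted in 512-byte sectors, which is how the block layer normally reports them; the function names are made up for this example, and the `#blocks` value passed to `partition_bytes` is simply `limit` divided by two, not a figure from the report.

```shell
#!/bin/sh
SECTOR_BYTES=512   # assumption: want/limit are counted in 512-byte sectors

# overrun_bytes WANT LIMIT
# How far past the end of the device the failed request reached, in bytes.
overrun_bytes() {
    echo $(( ($1 - $2) * SECTOR_BYTES ))
}

# partition_bytes BLOCKS
# Partition size in bytes from the #blocks column of /proc/partitions (1024-byte blocks).
partition_bytes() {
    echo $(( $1 * 1024 ))
}

overrun_bytes 480110048 473956352    # prints 3150692352 (about 2.9 GiB past the end)
partition_bytes 236978176            # prints 242665652224 (the device size implied by limit)
```

Under that assumption the filesystem is trying to write almost 3 GiB beyond the end of sdb2, which fits the "partition table and filesystem size don't match" explanation better than an off-by-a-few-sectors rounding issue.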
Re: [PATCH 2/2] E2fsprogs: add compress and cow support in chattr, lsattr
Lutz Vieweg l...@5t9.de writes:
> Maybe chattr +C could print a warning if a file to change attributes for is > 0 bytes long?

This may only affect btrfs. The old ext2? ext3? compression patches were able to compress pre-existing files. I don't know how other filesystems behave in this regard.

-- 
_ / __// /__ __ http://www.modeemi.fi/~flux/\ \ / /_ / // // /\ \/ /\ / /_/ /_/ \___/ /_/\_\@modeemi.fi \/
Re: [PATCH] Fixing quota error when removing files from a limit exceeded subvols
On Sat, Jan 3, 2015 at 10:29 PM, Khaled Ahmed khaled@gmail.com wrote:
> Hi Yang,
>
> This is how to reproduce the bug,
>
> [root@algodev ~]# uname -r
> 3.18.0+
> [root@algodev ~]# btrfs version
> Btrfs v3.18-2-g6938452-dirty
> [root@algodev ~]# btrfs quota enable LOOP/
> [root@algodev ~]# btrfs qgroup show LOOP/
> qgroupid rfer  excl
> 0/5      16384 16384
> [root@algodev ~]# btrfs subvol create LOOP/subvol1
> Create subvolume 'LOOP/subvol1'
> [root@algodev ~]# btrfs qgroup limit 1g LOOP/subvol1/
> [root@algodev ~]# btrfs qgroup show LOOP/
> qgroupid rfer  excl
> 0/5      16384 16384
> 0/258    16384 16384
> [root@algodev ~]# dd if=/dev/zero of=LOOP/subvol1/bigfile
> dd: writing to ‘LOOP/subvol1/bigfile’: Disk quota exceeded
> 2097018+0 records in
> 2097017+0 records out
> 1073672704 bytes (1.1 GB) copied, 10.0759 s, 107 MB/s
> [root@algodev ~]# rm -f LOOP/subvol1/bigfile
> rm: cannot remove ‘LOOP/subvol1/bigfile’: Disk quota exceeded

Hi Ahmed,

Okay, thanks for your example.

a). I guess your problem is getting an EDQUOT when removing a file here. That happens because we need to reserve some metadata in the transaction of btrfs_unlink().

b). I think your patch will not solve your problem. The root cause is that quota in btrfs currently accounts data and metadata together.

c). I admit getting an EDQUOT is strange when you did not write anything but only removed a file. I have a plan in my TODO list to make qgroup limit and account the size in three modes: data, metadata, and both. Then, in this case, if you only limit the size of data, you will not get an EDQUOT any more.

Thanks
Yang

> [root@algodev ~]#
>
> Best Regards,
> ~Khaled Ahmed
>
> On Jan 3, 2015, at 4:09 AM, Dongsheng Yang dongsheng081...@gmail.com wrote:
>
>> Hi Khaled,
>>
>> Could you give use more description about the problem this patch is trying to solve? Maybe an example will help a lot to understand it.
>>
>> Thanx
>>
>> On Fri, Jan 2, 2015 at 7:48 AM, Khaled Ahmed khaled@gmail.com wrote:
>>> Signed-off-by: Khaled Ahmed khaled@gmail.com
>>> ---
>>>  fs/btrfs/qgroup.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index 48b60db..b85200d 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -2408,14 +2408,14 @@ int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes)
>>>  		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>>>  		    qg->reserved + (s64)qg->rfer + num_bytes >
>>> -		    qg->max_rfer) {
>>> +		    qg->max_rfer - 1 ) {
>>>  			ret = -EDQUOT;
>>>  			goto out;
>>>  		}
>>>  		if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>>>  		    qg->reserved + (s64)qg->excl + num_bytes >
>>> -		    qg->max_excl) {
>>> +		    qg->max_excl - 1) {
>>>  			ret = -EDQUOT;
>>>  			goto out;
>>>  		}
>>> --
>>> 2.1.0
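The arithmetic behind points a) and b) can be sketched as follows. This is a simplified model, not the kernel code: `would_exceed` only mirrors the `reserved + rfer + num_bytes > max_rfer` comparison from the quoted patch, it treats the bytes dd reported as the whole of `rfer`, and the two reservation sizes (16 KiB and 128 KiB) are hypothetical values picked to show both outcomes.

```shell
#!/bin/sh
# would_exceed RESERVED RFER NUM_BYTES MAX_RFER
# Succeeds (exit 0) when the reservation would push the qgroup past its limit,
# mirroring the comparison in the quoted btrfs_qgroup_reserve() hunk.
would_exceed() {
    [ $(( $1 + $2 + $3 )) -gt "$4" ]
}

LIMIT=$(( 1024 * 1024 * 1024 ))   # "btrfs qgroup limit 1g"
WRITTEN=$(( 2097017 * 512 ))      # records dd reported as written, 512 bytes each

echo "$WRITTEN"                   # prints 1073672704
echo $(( LIMIT - WRITTEN ))       # prints 69120 (bytes of headroom left)

# Hypothetical reservation sizes; the real unlink reservation is not given in the thread.
for num_bytes in 16384 131072; do
    if would_exceed 0 "$WRITTEN" "$num_bytes" "$LIMIT"; then
        echo "$num_bytes: EDQUOT"
    else
        echo "$num_bytes: ok"
    fi
done
```

With only about 68 KiB of headroom left after dd fails, any reservation larger than that is refused, regardless of whether the operation will eventually free space. Relaxing the comparison by one byte, as the patch does, does not change that picture.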
Re: [PATCH 2/2] E2fsprogs: add compress and cow support in chattr, lsattr
On Sunday, 4 January 2015, 12:40:59, Erkki Seppala wrote:
> Lutz Vieweg l...@5t9.de writes:
>> Maybe chattr +C could print a warning if a file to change attributes for is > 0 bytes long?
>
> This may only affect btrfs. The old ext2? ext3? compression patches were able to compress pre-existing files. I don't know how other filesystems behave in this regard.

+C is no-cow, -c is compression.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
Re: Debian/Jessie 3.16.7-ckt2-1 kernel error
Hi Petr,

On 2014/12/28 0:36, Petr Janecek wrote:
> Hello Satoru and all,
>
> that Oct. report was the only time I've experienced the error, so I don't have much to add. I can try to answer your questions:
>
>> Here are my questions.
>>
>> 1. Is your system btrfs scrub clean?
>
> yes,
>
>> 2. Is this message shown every boot time?
>
> no, I have seen them only during one boot
>
>> 3. Is this message shown only in boot?
>
> As in my Oct. email http://permalink.gmane.org/gmane.comp.file-systems.btrfs/39721 I've seen a similar one after creating a subvolume on a new fs. But it was during the same boot.
>
>> 4. When this message is started to be shown?
>> 5. Do you have any trouble, change your operation or configuration just before the answer of Q4 ?
>
> a disk was added to the fs and balance has been run. The balance crashed, as in https://bugzilla.kernel.org/show_bug.cgi?id=64961 (probably unrelated). After reboot, I've seen the messages.
>
>> Additional questions.
>>
>> Q5. Could you give me your kernel configuration? At least, could you tell me whether your kernel enabled CONFIG_PREEMPT or not?
>
> CONFIG_PREEMPT_VOLUNTARY=y
>
>> Q6. If the answer of Q1 is correct, please give me the file system image which can be captured by the following command.
>
> Sorry, the fs's are long gone. I continued to run similar workloads on that test box, but these errors never appeared again.

Thank you for giving me information. So, further investigation of this problem seems to be hard. Please give us the above-mentioned information if this problem happens again.

Thanks,
Satoru

> Regards,
>
> Petr
Re: [PATCH v3] btrfs-progs: Documentation: add T/P/E description for resize cmd
On Fri, 2015-01-02 at 17:21 +0100, David Sterba wrote:
> On Fri, Jan 02, 2015 at 05:12:04PM +0100, David Sterba wrote:
>> On Thu, Jan 01, 2015 at 08:27:55PM -0700, Chris Murphy wrote:
>>> Small problem with the rendering of this commit d4ef1a06f8be623ae94e4d498c306e8dd1605bef, when I use 'man btrfs filesystem' the above portion looks like this:
>>>
>>> 'K', 'M', 'G', 'T', 'P', or 'E\',
>>>
>>> I'm not sure why there's a trailing slash after the E.
>>
>> Me neither, but it looks like a bug in the asciidoc processing.
>
> Seems that only the first ' has to be quoted, and consumes the next unquoted ' as a pair, so with the last \' the next one is missing and is printed verbatim:
>
> Fixed by:
>
> -units designators: \'K\', \'M\', \'G\', \'T\', \'P\', or \'E\', which represent
> +units designators: \'K', \'M', \'G', \'T', \'P', or \'E', which represent

Oh, sorry, I missed this problem, thanks for fixing it.

-Gui
Re: [PATCH v2] xfstests: btrfs: fix up 001.out
On Fri, Jan 02, 2015 at 09:04:29PM +0800, Anand Jain wrote:
> The subvol delete output has changed with btrfs-progs

It would be better to point out since which btrfs-progs version the output has changed.

> -Delete subvolume 'SCRATCH_MNT/snap'
> +Delete subvolume (no-commit): 'SCRATCH_MNT/snap'
>
> so fix 001 failing.
>
> Signed-off-by: Anand Jain anand.j...@oracle.com
>
> v2: Thanks Filipe for mentioning now we have _run_btrfs_util_prog. and commit update

I think a better way to fix this is to update the _filter_btrfs_subvol_delete filter.

Right now the filter deletes the message about the transaction commit:

	sed -e "/Transaction commit: none (default)/d"

Just adding another -e to sed to delete the "(no-commit):" part is fine.

Thanks,
Eryu

> ---
>  tests/btrfs/001     | 2 +-
>  tests/btrfs/001.out | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/tests/btrfs/001 b/tests/btrfs/001
> index 8258d06..a7747c8 100755
> --- a/tests/btrfs/001
> +++ b/tests/btrfs/001
> @@ -99,7 +99,7 @@ echo Listing subvolumes
>  $BTRFS_UTIL_PROG subvolume list $SCRATCH_MNT | awk '{ print $NF }'
>
>  # Delete the snapshot
> -$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/snap | _filter_btrfs_subvol_delete
> +_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
>  echo List root dir
>  ls $SCRATCH_MNT
>  _scratch_remount
> diff --git a/tests/btrfs/001.out b/tests/btrfs/001.out
> index c782bde..43e8c56 100644
> --- a/tests/btrfs/001.out
> +++ b/tests/btrfs/001.out
> @@ -33,7 +33,6 @@ subvol
>  Listing subvolumes
>  snap
>  subvol
> -Delete subvolume 'SCRATCH_MNT/snap'
>  List root dir
>  subvol
>  List root dir
> --
> 2.0.0.153.g79d
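Eryu's suggestion of one more `-e` expression could be sketched like this. The function name `filter_subvol_delete` is a stand-in invented for this example (it is not the actual xfstests helper), and the second expression is just one possible way to drop the "(no-commit):" part so that both old and new btrfs-progs output match the existing 001.out.

```shell
#!/bin/sh
# Hypothetical stand-in for the xfstests filter discussed above.
filter_subvol_delete() {
    # 1st expression: drop the "Transaction commit" line (what the filter already does).
    # 2nd expression: strip " (no-commit):" so the line matches the old golden output.
    sed -e "/Transaction commit: none (default)/d" \
        -e "s/ (no-commit)://"
}

printf '%s\n' \
    "Transaction commit: none (default)" \
    "Delete subvolume (no-commit): 'SCRATCH_MNT/snap'" | filter_subvol_delete
# prints: Delete subvolume 'SCRATCH_MNT/snap'
```

Output from an older btrfs-progs, which lacks the "(no-commit):" part, passes through the second expression unchanged.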
Re: Uncorrectable errors on RAID-1?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 01/03/2015 12:31 AM, Chris Murphy wrote:
> It's not a made to order hard drive industry. Maybe one day you'll be able to 3D print your own with its own specs.

And wookies did not live on endor. What's your point?

> Sticking fingers in your ears doesn't change the fact there's a measurable difference in support requirements.

Sure, just don't misrepresent one requirement for another. Just because I don't care about a warranty from the hardware manufacturer does not mean I have no right to expect the kernel to perform *reasonably* on that hardware.

> This is architecture astronaut territory.
>
> The system only has a terrible response for two reasons: 1. The user spec'd the wrong hardware for the use case; 2. The distro isn't automatically leveraging existing ways to mitigate that user mistake by changing either SCT ERC on the drives, or the SCSI command timer for each block device.

No, it has terrible response because the kernel either waits an unreasonable time or fails the drive and kicks it out of the array instead of trying to repair it. Blaming the user for not buying better hardware is not an appropriate response for the kernel failing so badly to handle commonly available hardware that doesn't behave in the most ideal way.

> Now, even though that solution *might* mean long recoveries on occasion, it's still better than link reset behavior which is what we have today because it causes the underlying problem to be fixed by md/dm/Btrfs once the read error is reported. But no distro has implemented this $500 man hour solution. Instead you're suggesting a $500,000 fix that will take hundreds of man hours and end user testing to find all the edge cases. It's like, seriously, WTF?

Seriously? Treating a timeout the same way you treat an unrecoverable media error is no herculean task.

> Ok well I think that's hubris unless you're a hard drive engineer. You're referring to how drives behaved over a decade ago, when bad sectors were persistent rather than remapped, and we had to scan the drive at format time to build a map so the bad ones wouldn't be used by the filesystem.

Remapping has nothing to do with it: we are talking about *read* errors, which do not trigger a remap.

> http://www.seagate.com/files/www-content/support-content/documentation/product-manuals/en-us/Enterprise/Savvio/Savvio%2015K.3/100629381e.pdf
>
> That's a high end SAS drive. It's default is to retry up to 20 times, which takes ~1.4 seconds, per sector.

But also note how it says 20 retries on a 15,000 rpm drive only takes 80 milliseconds, not 1.4 seconds. 15,000 rpm / 60 seconds per minute = 250 rotations/retries per second.

> Maybe you'd prefer seeing these big, cheap, green drives have shorter ERC times, with a commensurate reality check with their unrecoverable error rate, which right now is already two orders magnitude higher than enterprise SAS drives. So what if this means that rate is 3 or 4 orders magnitude higher?

20 retries vs. 200 retries does not reduce the URE rate by orders of magnitude; more like 1% *maybe*. 200 vs 2000 makes no measurable difference at all.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBCgAGBQJUqhCxAAoJENRVrw2cjl5RhDYH/RLbHXEPyjK4j6u33ElOyS5S
W5/nfiT1ZZjVAFxJwD0y/gt2L61hB1PQdlUjBm2NayExfCXn3sEuccAxvjMDrvsL
dFJOV8G/7GBbUfsD0uBustG5639QGc30bRzuiw/URT77zNf+T6+5SmTPSC3Oaj3j
fCcDdiKCwNcYiUF3/Q3gdh4XVI8wgoABHC2S/GqvRB+FmmqD6Yt6yG50TG5sPBzq
zSUSxWjOPwVinZOlPfCUCFr3buw+yzg5fclcvaNRStJM38gtK0UGgeIHFgCViHtN
0xNRCKWMu3XkfjfOI/cYVor79K4sQlz9K83Ja/UAMrOtopdlKjn9N04oIiPdsbg=
=u/i9
-----END PGP SIGNATURE-----
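The rotational arithmetic in this exchange can be checked with a few lines of shell. The function name `ms_for_retries` is made up for this sketch, and the model behind it (each retry costs exactly one revolution of the platter) is the assumption the 80 ms figure rests on, not something the drive manual states.

```shell
#!/bin/sh
# ms_for_retries RPM RETRIES
# Milliseconds needed for RETRIES attempts if each attempt costs exactly one revolution.
ms_for_retries() {
    rpm=$1
    retries=$2
    echo $(( retries * 60 * 1000 / rpm ))
}

echo $(( 15000 / 60 ))      # prints 250 (revolutions per second)
ms_for_retries 15000 20     # prints 80
echo $(( 1400 / 20 ))       # prints 70 (ms per retry implied by the quoted ~1.4 s for 20 retries)
```

One revolution at 15,000 rpm takes 4 ms, so the quoted ~1.4 s figure corresponds to roughly 17 revolutions per retry. The two numbers in the thread therefore differ in what they assume a single "retry" costs, not in the arithmetic.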
[PATCH v3 0/3] Btrfs: Enhancement for qgroup.
Hi Josef and others,

This patch set enhances qgroup.

[1/3]: fixes a qgroup leak that occurs when we exceed the quota limit. It has been reviewed by Josef.
[2/3]: introduces a new counter in qgroup to close a window in which a user can exceed the qgroup limit. It looks good to Josef.
[3/3]: a new patch to fix a bug reported by Satoru.

BTW, I have some other plans for qgroup in my TODO list:

Kernel:
a). Adjust the counters in the parent qgroup when we move a child qgroup. Currently, when we move a qgroup, the parent qgroup is not updated at the same time, which leaves wrong numbers in the qgroup.
b). Add an ioctl to show the qgroup info. The command "btrfs qgroup show" shows the qgroup info read from the qgroup tree, but some information is only in memory and has not yet been synced to the device, so it can show outdated numbers.
c). Limit and account size in 3 modes: data, metadata, and both. qgroup currently accounts the size of data and metadata together, but to a user the data size is the most useful.
d). Remove a subvolume-related qgroup when the subvolume is deleted and there is no other reference to it.

user-tool:
a). Add the units B/K/M/G to btrfs qgroup show.
b). Get the information via ioctl rather than reading it from the btree. The old way will be kept as a fallback for compatibility.

Any comments and suggestions are welcome. :)

Yang

Dongsheng Yang (3):
  Btrfs: qgroup: free reserved in exceeding quota.
  Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use.
  Btrfs: qgroup, Account data space in more proper timings.

 fs/btrfs/extent-tree.c | 41 +++---
 fs/btrfs/file.c        |  9 ---
 fs/btrfs/inode.c       | 18 -
 fs/btrfs/qgroup.c      | 68 +++---
 fs/btrfs/qgroup.h      |  4 +++
 5 files changed, 117 insertions(+), 23 deletions(-)

-- 
1.8.4.2
[PATCH v3 3/3] Btrfs: qgroup, Account data space in more proper timings.
Currently, when writing data, ->reserved is accounted in fill_delalloc(), but ->may_use is released in clear_bit_hook(), which is called by btrfs_finish_ordered_io(). That is too late: between fill_delalloc() and btrfs_finish_ordered_io(), the data is accounted twice by qgroup, which causes unexpected -EDQUOT errors.

Example:
	# btrfs quota enable /root/btrfs-auto-test/
	# btrfs subvolume create /root/btrfs-auto-test//sub
	Create subvolume '/root/btrfs-auto-test/sub'
	# btrfs qgroup limit 1G /root/btrfs-auto-test//sub
	# dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=150
	dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded
	681353+0 records in
	681352+0 records out
	697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s

We get -EDQUOT after only 698 MB, although the limit is 1G.

This patch moves the btrfs_qgroup_reserve/free() calls for data from btrfs_delalloc_reserve/release_metadata() to btrfs_check_data_free_space() and btrfs_free_reserved_data_space(). The counter in qgroup is then updated at the same time as the counter in space_info, which removes the unexpected -EDQUOT.
Reported-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c | 16 +++++++++-------
 fs/btrfs/file.c        |  9 ---------
 2 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index d1a7ce0..67c2e28 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3774,12 +3774,16 @@ commit_trans:
 					      data_sinfo->flags, bytes, 1);
 		return -ENOSPC;
 	}
+	ret = btrfs_qgroup_reserve(root, bytes);
+	if (ret)
+		goto out;
 	data_sinfo->bytes_may_use += bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 1);
+out:
 	spin_unlock(&data_sinfo->lock);
 
-	return 0;
+	return ret;
 }
 
 /*
@@ -3796,6 +3800,7 @@ void btrfs_free_reserved_data_space(struct inode *inode, u64 bytes)
 	data_sinfo = root->fs_info->data_sinfo;
 	spin_lock(&data_sinfo->lock);
 	WARN_ON(data_sinfo->bytes_may_use < bytes);
+	btrfs_qgroup_free(root, bytes);
 	data_sinfo->bytes_may_use -= bytes;
 	trace_btrfs_space_reservation(root->fs_info, "space_info",
 				      data_sinfo->flags, bytes, 0);
@@ -5191,8 +5196,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	spin_unlock(&BTRFS_I(inode)->lock);
 
 	if (root->fs_info->quota_enabled) {
-		ret = btrfs_qgroup_reserve(root, num_bytes +
-					   nr_extents * root->nodesize);
+		ret = btrfs_qgroup_reserve(root, nr_extents * root->nodesize);
 		if (ret)
 			goto out_fail;
 	}
@@ -5200,8 +5204,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush);
 	if (unlikely(ret)) {
 		if (root->fs_info->quota_enabled)
-			btrfs_qgroup_free(root, num_bytes +
-						nr_extents * root->nodesize);
+			btrfs_qgroup_free(root, nr_extents * root->nodesize);
 		goto out_fail;
 	}
 
@@ -5319,8 +5322,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
 	trace_btrfs_space_reservation(root->fs_info, "delalloc",
 				      btrfs_ino(inode), to_free, 0);
 	if (root->fs_info->quota_enabled) {
-		btrfs_qgroup_free(root, num_bytes +
-					dropped * root->nodesize);
+		btrfs_qgroup_free(root, dropped * root->nodesize);
 	}
 
 	btrfs_block_rsv_release(root, root->fs_info->delalloc_block_rsv,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e409025..0ab1333 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2527,7 +2527,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 {
 	struct inode *inode = file_inode(file);
 	struct extent_state *cached_state = NULL;
-	struct btrfs_root *root = BTRFS_I(inode)->root;
 	u64 cur_offset;
 	u64 last_byte;
 	u64 alloc_start;
@@ -2555,11 +2554,6 @@ static long btrfs_fallocate(struct file *file, int mode,
 	ret = btrfs_check_data_free_space(inode, alloc_end - alloc_start);
 	if (ret)
 		return ret;
-	if (root->fs_info->quota_enabled) {
-		ret = btrfs_qgroup_reserve(root, alloc_end - alloc_start);
-		if (ret)
-			goto out_reserve_fail;
-	}
 	mutex_lock(&inode->i_mutex);
 	ret = inode_newsize_ok(inode,
Re: [PATCH v2] xfstests: btrfs: fix up 001.out
On 01/05/2015 11:25 AM, Eryu Guan wrote:
> On Fri, Jan 02, 2015 at 09:04:29PM +0800, Anand Jain wrote:
>> The subvol delete output has changed with btrfs-progs
>
> Better to point out that since which btrfs-progs version the output changed.

The fix here does not depend on the output string, so it does not matter.

>> -Delete subvolume 'SCRATCH_MNT/snap'
>> +Delete subvolume (no-commit): 'SCRATCH_MNT/snap'
>>
>> so fix 001 failing.
>>
>> Signed-off-by: Anand Jain anand.j...@oracle.com
>>
>> v2: Thanks Filipe for mentioning now we have _run_btrfs_util_prog. and commit update
>
> I think a better way to fix this is to update the _filter_btrfs_subvol_delete filter
>
> Right now the filter does delete message about transaction commit:
>
> 	sed -e "/Transaction commit: none (default)/d"
>
> Just adding another -e to sed to delete the (no-commit): part is fine.

In this case, checking for the output string has been fundamentally wrong for a long time.

Thanks, Anand

> Thanks,
> Eryu
>
>> ---
>>  tests/btrfs/001     | 2 +-
>>  tests/btrfs/001.out | 1 -
>>  2 files changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/tests/btrfs/001 b/tests/btrfs/001
>> index 8258d06..a7747c8 100755
>> --- a/tests/btrfs/001
>> +++ b/tests/btrfs/001
>> @@ -99,7 +99,7 @@ echo Listing subvolumes
>>  $BTRFS_UTIL_PROG subvolume list $SCRATCH_MNT | awk '{ print $NF }'
>>
>>  # Delete the snapshot
>> -$BTRFS_UTIL_PROG subvolume delete $SCRATCH_MNT/snap | _filter_btrfs_subvol_delete
>> +_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
>>  echo List root dir
>>  ls $SCRATCH_MNT
>>  _scratch_remount
>> diff --git a/tests/btrfs/001.out b/tests/btrfs/001.out
>> index c782bde..43e8c56 100644
>> --- a/tests/btrfs/001.out
>> +++ b/tests/btrfs/001.out
>> @@ -33,7 +33,6 @@ subvol
>>  Listing subvolumes
>>  snap
>>  subvol
>> -Delete subvolume 'SCRATCH_MNT/snap'
>>  List root dir
>>  subvol
>>  List root dir
>> --
>> 2.0.0.153.g79d
[PATCH v3 1/3] Btrfs: qgroup: free reserved in exceeding quota.
When we exceed the quota limit while writing, we free the reserved extents that have to be dropped, but we do not free the corresponding accounting in the qgroup. This means that each time we exceed the quota while writing, some space is left in qg->reserved that we can no longer use. If this goes on, all of the space will be eaten up.

Signed-off-by: Dongsheng Yang yangds.f...@cn.fujitsu.com
Reviewed-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/extent-tree.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a80b971..88b4e32 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5275,8 +5275,11 @@ out_fail:
 		to_free = 0;
 	}
 	spin_unlock(&BTRFS_I(inode)->lock);
-	if (dropped)
+	if (dropped) {
+		if (root->fs_info->quota_enabled)
+			btrfs_qgroup_free(root, dropped * root->nodesize);
 		to_free += btrfs_calc_trans_metadata_size(root, dropped);
+	}
 
 	if (to_free) {
 		btrfs_block_rsv_release(root, block_rsv, to_free);
-- 
1.8.4.2
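The leak described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: struct toy_qgroup, toy_reserve(), toy_free() and toy_write() are hypothetical names made up for illustration and are not kernel code. The model reserves metadata first, then fails on a later reservation, and shows what happens to the reserved counter when the failure path does or does not give the first reservation back.

```c
#include <stdint.h>

/* Toy stand-in for a qgroup; NOT the kernel's qgroup structure. */
struct toy_qgroup {
	uint64_t limit;    /* maximum number of bytes that may be reserved */
	uint64_t reserved; /* bytes currently reserved (always <= limit) */
};

/* Reserve bytes against the limit; 0 on success, -1 when over quota. */
static int toy_reserve(struct toy_qgroup *qg, uint64_t bytes)
{
	if (bytes > qg->limit - qg->reserved)
		return -1;
	qg->reserved += bytes;
	return 0;
}

/* Give back a reservation, clamping at zero. */
static void toy_free(struct toy_qgroup *qg, uint64_t bytes)
{
	qg->reserved -= bytes < qg->reserved ? bytes : qg->reserved;
}

/*
 * Hypothetical write path: reserve metadata, then reserve data.
 * When the data reservation fails and free_on_failure is 0, the
 * metadata reservation stays accounted in qg->reserved forever.
 */
static int toy_write(struct toy_qgroup *qg, uint64_t meta, uint64_t data,
		     int free_on_failure)
{
	if (toy_reserve(qg, meta))
		return -1;
	if (toy_reserve(qg, data)) {
		if (free_on_failure)
			toy_free(qg, meta);
		return -1;
	}
	return 0;
}
```

In this model, every failed toy_write() with free_on_failure set to 0 permanently shrinks the usable quota by meta bytes, which is the pattern the patch addresses by calling btrfs_qgroup_free() on the failure path.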
[PATCH] btrfs-progs: Allow debug-tree to be executed on regular file.
Commit 1bad43fbe002 (btrfs-progs: refine btrfs-debug-tree error prompt when a mount point given) added a check to btrfs-debug-tree that restricts it to running on block devices only, but the command can also be used on a regular file, so add regular file support to the check.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 btrfs-debug-tree.c |  5 +++--
 utils.c            | 21 +++++++++++++++++----
 utils.h            |  1 +
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/btrfs-debug-tree.c b/btrfs-debug-tree.c
index 9cdb35f..0815fe1 100644
--- a/btrfs-debug-tree.c
+++ b/btrfs-debug-tree.c
@@ -180,8 +180,9 @@ int main(int ac, char **av)
 		print_usage();
 
 	ret = check_arg_type(av[optind]);
-	if (ret != BTRFS_ARG_BLKDEV) {
-		fprintf(stderr, "'%s' is not a block device\n", av[optind]);
+	if (ret != BTRFS_ARG_BLKDEV && ret != BTRFS_ARG_REG) {
+		fprintf(stderr, "'%s' is not a block device or regular file\n",
+			av[optind]);
 		exit(1);
 	}
 
diff --git a/utils.c b/utils.c
index af0a8fe..3ca8229 100644
--- a/utils.c
+++ b/utils.c
@@ -854,13 +854,23 @@ int is_mount_point(const char *path)
 	return ret;
 }
 
+static int is_reg_file(const char *path)
+{
+	struct stat statbuf;
+
+	if (stat(path, &statbuf) < 0)
+		return -errno;
+	return S_ISREG(statbuf.st_mode);
+}
+
 /*
  * This function checks if the given input parameter is
  * an uuid or a path
- * return -1: some error in the given input
- * return 0: unknow input
- * return 1: given input is uuid
- * return 2: given input is path
+ * return <0 : some error in the given input
+ * return BTRFS_ARG_UNKNOWN:	unknown input
+ * return BTRFS_ARG_UUID:	given input is uuid
+ * return BTRFS_ARG_MNTPOINT:	given input is path
+ * return BTRFS_ARG_REG:	given input is regular file
  */
 int check_arg_type(const char *input)
 {
@@ -877,6 +887,9 @@ int check_arg_type(const char *input)
 		if (is_mount_point(path) == 1)
 			return BTRFS_ARG_MNTPOINT;
 
+		if (is_reg_file(path))
+			return BTRFS_ARG_REG;
+
 		return BTRFS_ARG_UNKNOWN;
 	}
 
diff --git a/utils.h b/utils.h
index 3950491..142f3f9 100644
--- a/utils.h
+++ b/utils.h
@@ -35,6 +35,7 @@
 #define BTRFS_ARG_MNTPOINT	1
 #define BTRFS_ARG_UUID		2
 #define BTRFS_ARG_BLKDEV	3
+#define BTRFS_ARG_REG		4
 
 #define BTRFS_UUID_UNPARSED_SIZE	37
 
-- 
2.2.1
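The stat()-based check in the patch can be sketched as a standalone classifier. The following is a minimal illustration, not btrfs-progs code: enum path_kind, classify_path() and is_valid_image_source() are hypothetical names chosen for this example. One point worth keeping in mind with a helper that returns a negative errno on failure is that a negative value is also non-zero, so a caller testing plain truthiness cannot tell "stat() failed" from "is a regular file". The sketch keeps the outcomes separate.

```c
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical result codes for this sketch; NOT the BTRFS_ARG_* values. */
enum path_kind {
	PATH_KIND_OTHER = 0,  /* exists, but neither block device nor regular file */
	PATH_KIND_BLKDEV = 1, /* block device */
	PATH_KIND_REG = 2,    /* regular file */
};

/*
 * Classify a path by its file type.
 * Returns a path_kind value on success, or a negative errno when
 * stat() fails (for example, when the path does not exist).
 */
static int classify_path(const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -errno;
	if (S_ISBLK(st.st_mode))
		return PATH_KIND_BLKDEV;
	if (S_ISREG(st.st_mode))
		return PATH_KIND_REG;
	return PATH_KIND_OTHER;
}

/*
 * Accept only block devices and regular files. A stat() failure is
 * negative, so it matches neither accepted kind and is rejected.
 */
static int is_valid_image_source(const char *path)
{
	int kind = classify_path(path);

	return kind == PATH_KIND_BLKDEV || kind == PATH_KIND_REG;
}
```

A caller could run is_valid_image_source(av[optind]) before opening the file system, and use classify_path() directly when it wants to report the errno from a failed stat().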