Re: write corruption due to bio cloning on raid5/6
Janos Toth F. posted on Sun, 30 Jul 2017 03:39:10 +0200 as excerpted:

[OT but related topic continues...]

> I still get shivers if I need to resize a filesystem, due to the
> memories of those early tragic experiences when I never won the lottery
> on the "trial and error" runs but lost filesystems with both hands and
> learned what wide-spread silent corruption is and how you can refresh
> your backups with corrupted copies... Let's not take me back to those
> early days, please. I don't want to live in a cave anymore. Thank you
> modern filesystems (and their authors). :)
>
> And on that note... Assuming I had interference problems, it was caused
> by my human mistake/negligence. I can always make similar or bigger
> human mistakes, independent of disk-level segregation. (For example, no
> amount of partitions will save any data if I accidentally wipe the
> entire drive with dd, or if I have it security-locked by the controller
> and lose the passwords, etc...)

I was glad to say goodbye to MSDOS/MBR style partitions as well, but just
as happy to enthusiastically endorse GPT/EFI style partitions, with their
effectively unlimited partition numbers (128 allowed at the default table
size), no primary/logical partition stuff to worry about, partition (as
opposed to filesystem-in-the-partition) labels/names, integrity checksums,
and a second copy of the table at the other end of the device. =:^)

And while all admins have their fat-finger or fat-head, aka brown-bag,
experiences, I've never erased the wrong partition, tho I can certainly
remember being /very/ careful the first couple times I did partitioning,
back in the 90s on MSDOS.

Thankfully, these days even ssds are "reasonably" priced, and spinning
rust is the trivial cost of perhaps a couple meals out, so as long as
there are backups on other physical devices, getting even the device name
wrong simply means losing perhaps your working copy instead of redoing
the layout of one of your backups.
And of course you can see the existing layout on the device before you
repartition it, and if it's not what you expected or there are any other
problems, you just back out without doing that final writeout of the new
partition table.

FWIW, my last brown-bag was writing and running a script as root, with a
variable-name typo that left the variable empty in an "rm -rf $varname/".
I caught and stopped it after it had emptied /bin, while it was in /etc,
I believe. Luckily I could boot to the (primary) backup.

But meanwhile, two experiences that set in concrete the practicality of
separate filesystems on their own partitions, for me:

1) Back on MS, in the IE4-beta era, I was running the public beta when
the MSIE devs decided that for performance reasons they needed to write
directly to the IE cache index on disk, bypassing the usual filesystem
methods. What they didn't think about, however, was IE's new integration
into the Explorer shell, meaning it was running all the time. So along
come people running the beta, running their scheduled defrag, which
decides the index is fragmented and moves it out from under the (of
course still running) Explorer shell. The next time IE direct-writes to
what WAS the cache index, it's overwriting whatever file defrag moved to
that spot after it moved the cache file out. The eventual fix was to set
the system attribute on the cache index, so the defragger wouldn't touch
it.

I know a number of people running that beta who lost important files to
that, when those files got moved into the old on-disk location of the
cache index file and overwritten by IE when it direct-wrote to what it
/thought/ was still the on-disk location of its index file. But I was
fine, never in any danger, because IE's "Temporary Internet Files" cache
was on a dedicated tempfiles filesystem. So the only files it overwrote
for me were temporary in any case.

2) Some years ago, during a Phoenix summer, my AC went out.
I was in a trailer at the time, so without the AC it got hot pretty
quickly, and I was away, with the computer left on, at the time it went
out. The high in the shade that day was about 47C/117F, and the trailer
was in the sun, so it easily hit 55-60C/131-140F inside. The computer was
obviously going to be hotter than that, and the spinning disk in the
computer hotter still, so it easily hit 70C/158F or higher.

The CPU shut down of course, and was fine when I turned it back on after
a cooldown. The disk... not so fine. I'm sure it physically head-crashed,
and if I had taken it apart I'd have found grooves on the platter.

But... disks were a lot more expensive back then, and I didn't have
another disk with backups. What I *DID* have were backup partitions on
the same disk, and because they weren't mounted at the time, the head
didn't try seeking to them, and they weren't damaged (at least not beyond
what could be repaired). When I went to assess things after everything
cooled down, the damage was (almost) all
[PATCH v2] btrfs: preserve i_mode if __btrfs_set_acl() fails
When changing a file's acl mask, btrfs_set_acl() will first set the
group bits of i_mode to the value of the mask, and only then set the
actual extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the
file had no acl attribute to begin with, the system will from now on
assume that the mask permission bits are actual group permission bits,
potentially granting access to the wrong users.

Prevent this by starting the journal transaction before calling
__btrfs_set_acl(), and only changing the inode mode after the acl is
set successfully.

Signed-off-by: Ernesto A. Fernández
---
Changes in v2:
  - Take the code that checks if we are setting a default acl on
    something that is not a dir, remove it from the __btrfs_set_acl
    function, and place it in btrfs_set_acl instead. This should fix
    the issue pointed out by Josef Bacik, that I was sometimes updating
    the inode even when there was no change.
  - Don't call BUG_ON when the inode failed to update. Also requested
    by Josef Bacik. It should be noted that __btrfs_setxattr was
    already calling BUG_ON before my patch; that has not been changed.

 fs/btrfs/acl.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index 8d8370d..f62e8ac 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -27,6 +27,7 @@
 #include "ctree.h"
 #include "btrfs_inode.h"
 #include "xattr.h"
+#include "transaction.h"
 
 struct posix_acl *btrfs_get_acl(struct inode *inode, int type)
 {
@@ -80,8 +81,6 @@ static int __btrfs_set_acl(struct btrfs_trans_handle *trans,
 		name = XATTR_NAME_POSIX_ACL_ACCESS;
 		break;
 	case ACL_TYPE_DEFAULT:
-		if (!S_ISDIR(inode->i_mode))
-			return acl ? -EINVAL : 0;
 		name = XATTR_NAME_POSIX_ACL_DEFAULT;
 		break;
 	default:
@@ -113,14 +112,38 @@ static int __btrfs_set_acl(struct btrfs_trans_handle *trans,
 
 int btrfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 {
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_trans_handle *trans;
 	int ret;
+	umode_t mode = inode->i_mode;
 
+	if (type == ACL_TYPE_DEFAULT && !S_ISDIR(inode->i_mode))
+		return acl ? -EINVAL : 0;
+
 	if (type == ACL_TYPE_ACCESS && acl) {
-		ret = posix_acl_update_mode(inode, &inode->i_mode, &acl);
+		ret = posix_acl_update_mode(inode, &mode, &acl);
 		if (ret)
 			return ret;
 	}
-	return __btrfs_set_acl(NULL, inode, acl, type);
+
+	if (btrfs_root_readonly(root))
+		return -EROFS;
+
+	trans = btrfs_start_transaction(root, 2);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
+	ret = __btrfs_set_acl(trans, inode, acl, type);
+	if (ret)
+		goto out;
+
+	inode->i_mode = mode;
+	inode_inc_iversion(inode);
+	inode->i_ctime = current_time(inode);
+	set_bit(BTRFS_INODE_COPY_EVERYTHING, &BTRFS_I(inode)->runtime_flags);
+	ret = btrfs_update_inode(trans, root, inode);
+out:
+	btrfs_end_transaction(trans);
+	return ret;
}

/*
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: write corruption due to bio cloning on raid5/6
Reply to the TL;DR part, so TL;DR marker again...

Well, I live on the other extreme now. I want as few filesystems as
possible and viable. (It's obviously impossible to have a real backup
within the same fs and/or device, and with the current
size/performance/price differences between HDD and SSD, it makes sense to
separate the "small and fast" from the "big and slow" storage, but other
than that...)

I always believed (even before I got a real grasp on these things and
could explain my view or argue about this) that "subvolumes" (in a
general sense, but let's use this word here) should reside below
filesystems (and be totally optional), and filesystems should spread over
a whole disk or (md- or hardware) RAID volume (forget the MSDOS
partitions), and even these ZFS/Btrfs style subvolumes should be used
sparingly (only when you really have a good enough reason to create a
subvolume, although it doesn't matter nearly as much with subvolumes as
it does with partitions).

I remember the days when I thought it was important to create separate
partitions for different kinds of data (10+ years ago, when I was aware I
didn't have the experience to deviate from common general teachings). I
remember all the pain of randomly running out of space on any and all
filesystems and eventually mixing the various kinds of data on every
theoretically-segregated filesystem (wherever I found free space),
causing the nightmare of a broken sorting system (like a library after a
tornado), and then all the horror of my first Russian-roulette-like
experiences of resizing partitions and filesystems to make the
segregation decent again. And I saw much worse on other people's
machines. At one point, I decided to create as few partitions as possible
(and I really like the idea of zero partitions; I don't miss MSDOS).
I still get shivers if I need to resize a filesystem, due to the memories
of those early tragic experiences when I never won the lottery on the
"trial and error" runs but lost filesystems with both hands and learned
what wide-spread silent corruption is and how you can refresh your
backups with corrupted copies... Let's not take me back to those early
days, please. I don't want to live in a cave anymore. Thank you modern
filesystems (and their authors). :)

And on that note... Assuming I had interference problems, it was caused
by my human mistake/negligence. I can always make similar or bigger human
mistakes, independent of disk-level segregation. (For example, no amount
of partitions will save any data if I accidentally wipe the entire drive
with dd, or if I have it security-locked by the controller and lose the
passwords, etc...)
Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)
Imran Geriskovan posted on Sat, 29 Jul 2017 21:29:46 +0200 as excerpted:

> On 7/9/17, Duncan <1i5t5.dun...@cox.net> wrote:
>> I have however just upgraded to new ssds then wiped and setup the old
>> ones as another backup set, so everything is on brand new filesystems
>> on fast ssds, no possibility of old undetected corruption suddenly
>> triggering problems.
>>
>> Also, all my btrfs are raid1 or dup for checksummed redundancy
>
> Do you have any experience/advice/comment regarding
> dup data on ssds?

Very good question. =:^)

Limited. Most of my btrfs are raid1, with dup only used on the
device-respective /boot btrfs (of which there are four, one on each of
the two ssds that otherwise form the btrfs raid1 pairs, for each of the
working and backup copy pairs -- I can use BIOS to select any of the four
to boot), and those are all sub-GiB mixed-bg mode. So all my dup
experience is sub-GiB mixed-blockgroup mode.

Within that limitation, my only btrfs problem has been that at my
initially chosen size of 256 MiB, mkfs.btrfs at least used to create an
initial data/metadata chunk of 64 MiB. Remember, this is dup mode, so
there's two of them = 128 MiB. Because there's also a system chunk, that
means the initial chunk cannot be balanced even with an entirely empty
filesystem, because there's not enough space to write a second 64 MiB
chunk duped to 128 MiB.

Between that and the 256 MiB dup-mode size meaning under 128 MiB usable,
and the fact that I routinely run and sometimes need to bisect
pre-release kernels, I was routinely running out of space, then cleaning
up, but not being able to do a full cleanup without a blow-away and new
mkfs.btrfs, because I couldn't balance.

When I recently purchased the second pair of (now larger) ssds in order
to put everything, including the media and backups that were previously
still on spinning rust, on ssd, I redid the layout and made the /boots
512 MiB, still mixed-bg dup mode.
That seems to have solved the problem, and I can now rebalance the first
mkfs.btrfs-created mixed-bg chunk, as it's now small enough that it's
less than half the filesystem even when duped.

Because it's now 512 MiB, however, I can't say for sure whether the
previous problem is fixed or not. That problem was mkfs.btrfs creating an
initial mixed-bg chunk of a quarter the 256 MiB filesystem size, so in
dup mode it couldn't be balanced: the duped chunk was half the total
filesystem size, and with the system chunk as well, the other half was
partially used, leaving no space to write the balance destination chunks.
What I can say is that the problem doesn't affect the new 512 MiB size,
at least with btrfs-progs 4.11.x, which is what I used to mkfs.btrfs the
new layout.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: write corruption due to bio cloning on raid5/6
Janos Toth F. posted on Sat, 29 Jul 2017 05:02:48 +0200 as excerpted:

> The read-only scrub finished without errors/hangs (with kernel
> 4.12.3). So, I guess the hangs were caused by:
> 1: other bug in 4.13-RC1
> 2: crazy-random SATA/disk-controller issue
> 3: interference between various btrfs tools [*]
> 4: something in the background did DIO write with 4.13-RC1 (but all
> affected content was eventually overwritten/deleted between the scrub
> attempts)
>
> [*] I expected scrub to finish in ~5 rather than ~40 hours (and didn't
> expect interference issues), so I didn't disable the scheduled
> maintenance script which deletes old files, recursively defrags the
> whole fs and runs a balance with usage=33 filters. I guess either of
> those (especially balance) could potentially cause scrub to hang.

That #3, interference between btrfs tools, could be it.

It seems btrfs in general is getting stable enough now that we're
beginning to see bugs exposed from running two or more tools at once. The
devs have apparently caught and fixed enough of the single-usage race
bugs that individual tools are working reasonably well, and it's now the
concurrent multi-usage races that no one was thinking about when they
were writing the code that are being exposed. At least, there have been a
number of such bugs either definitely or probability-traced to concurrent
usage, reported and traced/fixed, lately -- more than I remember seeing
in the past.

(TL;DR folks can stop at that.)

Incidentally, that's one more advantage to my own strategy of multiple
independent small btrfs: keeping everything small enough that maintenance
jobs are at least tolerably short makes it actually practical to run
them. Tho my case is surely an extreme, with everything on ssd and my
largest btrfs, even after recently switching my media filesystems to ssd
and btrfs, being 80 GiB (usable and per device, btrfs raid1 on paired
partitions, each on a different physical ssd).
I use neither quotas, which don't scale well on btrfs and which I don't
need, nor snapshots, which have a bit of a scaling issue as well (tho
apparently not as bad as quotas), because weekly or monthly backups are
enough here, and the filesystems are small enough (and on ssd) to do
full-copy backups in minutes each. In fact, making backups easier was a
major reason I switched even the backups and media devices to all ssd,
this time.

So scrubs are trivially short enough that I can run them and wait for the
results while composing posts such as this (bscrub is my scrub script,
run here by my admin user with a stub setup so sudo isn't required):

$$ bscrub /mm
scrub device /dev/sda11 (id 1) done
        scrub started at Sat Jul 29 14:50:54 2017
        and finished after 00:01:08
        total bytes scrubbed: 33.98GiB with 0 errors
scrub device /dev/sdb11 (id 2) done
        scrub started at Sat Jul 29 14:50:54 2017
        and finished after 00:01:08
        total bytes scrubbed: 33.98GiB with 0 errors

Just over a minute for a scrub of both devices on my largest
80-gig-per-device btrfs. =:^) Tho projecting to full it might be two and
a half minutes... Tho of course parity-raid scrubs would be far slower,
at a WAG an hour or two, for similar size on spinning rust...

Balances are similar, but being on ssd and not needing one on any of the
still relatively freshly redone filesystems ATM, I don't feel inclined to
needlessly spend a write cycle just for demonstration.

With filesystem maintenance runtimes of a minute, definitely under five
minutes, per filesystem, and with full backups under 10, I don't /need/
to run more than one tool at once, and backups can trivially be kept
fresh enough that I don't really feel the need to schedule maintenance
and risk running more than one that way, either, particularly when I know
it'll be done in minutes if I run it manually.
=:^)

Like I said, I'm obviously an extreme case, but equally obviously, while
I see the runtime concurrency bug reports on the list, it's not something
likely to affect me personally. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Best Practice: Add new device to RAID1 pool (Summary)
On Monday, 2017-07-24 at 18:40 +0200, Cloud Admin wrote:
> On Monday, 2017-07-24 at 10:25 -0400, Austin S. Hemmelgarn wrote:
>> On 2017-07-24 10:12, Cloud Admin wrote:
>>> On Monday, 2017-07-24 at 09:46 -0400, Austin S. Hemmelgarn wrote:
>>>> On 2017-07-24 07:27, Cloud Admin wrote:
>>>>> Hi,
>>>>> I have a multi-device pool (three discs) as RAID1. Now I want to
>>>>> add a new disc to increase the pool. I followed the description on
>>>>> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
>>>>> and used 'btrfs add'. After that I called a balance for
>>>>> rebalancing the RAID1 using 'btrfs balance start <mount path>'.
>>>>> Is that anything or should I need to call a resize (for example)
>>>>> or anything else? Or do I need to specify filter/profile
>>>>> parameters for balancing?
>>>>> I am a little bit confused because the balance command has been
>>>>> running for 12 hours and only 3GB of data are touched. This would
>>>>> mean the whole balance process (new disc has 8TB) would run a
>>>>> long, long time... and is using one cpu at 100%.
>>>> Based on what you're saying, it sounds like you've either run into
>>>> a bug, or have a huge number of snapshots on this filesystem.
>>> It depends what you define as huge. The call of 'btrfs sub list
>>> <mount path>' returns a list of 255 subvolumes.
>> OK, this isn't horrible, especially if most of them aren't snapshots
>> (it's cross-subvolume reflinks that are most of the issue when it
>> comes to snapshots, not the fact that they're subvolumes).
>>> I think this is not too huge. Most of these subvolumes were created
>>> by docker itself. I cancelled the balance (this will take awhile)
>>> and will try to delete some of these subvolumes/snapshots.
>>> What can I do more?
>> As Roman mentioned in his reply, it may also be qgroup related. If
>> you run:
>> btrfs quota disable
> It seems quota was one part of it. Thanks for the tip. I disabled it
> and started balance anew.
> Now approx. every 5 min. one chunk is relocated. But if I take the
> reported 10860 chunks and calculate, the time it will take is ~37 days
> to finish... So, it seems I have to invest more time into figuring out
> the subvolume / snapshot structure created by docker.
> A first deeper look shows there is a subvolume with a snapshot, which
> has itself a snapshot, and so forth.

Now, the balance process finished after 127h and the new disc is in the
pool... Not as long as expected, but in my opinion long enough. Quota
seems one big driver in my case. What I could see over the time: at the
beginning, many extents were relocated while ignoring the new disc.
Probably it could be a good idea to rebalance using a filter (like
-dusage=30, for example) before adding the new disc, to decrease the
time. But only theory. I will try to keep it in mind for the next time.

Thanks all for your tips, ideas and time!
Frank
Re: [PATCH] btrfs-progs: eliminate bogus IOC_DEV_INFO call
On 28/07/2017 11:49, Henk Slager wrote:
> On Thu, Jul 27, 2017 at 9:24 PM, Hans van Kranenburg wrote:
>> Device ID numbers always start at 1, not at 0. The first IOC_DEV_INFO
>> call does not make sense, since it will always return ENODEV.
> When there is a btrfs-replace ongoing, there is a Device ID 0

Aha... thanks for teaching me something new today. :) Actually, I
remember having seen it a time earlier, yes.

So, this one goes to /dev/null!

Hans

>> ioctl(3, BTRFS_IOC_DEV_INFO, {devid=0}) = -1 ENODEV (No such device)
>>
>> Signed-off-by: Hans van Kranenburg
>> ---
>>  cmds-fi-usage.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
>> index 101a0c4..52c4c62 100644
>> --- a/cmds-fi-usage.c
>> +++ b/cmds-fi-usage.c
>> @@ -535,7 +535,7 @@ static int load_device_info(int fd, struct device_info **device_info_ptr,
>>                 return 1;
>>         }
>>
>> -       for (i = 0, ndevs = 0 ; i <= fi_args.max_id ; i++) {
>> +       for (i = 1, ndevs = 0 ; i <= fi_args.max_id ; i++) {
>>                 if (ndevs >= fi_args.num_devices) {
>>                         error("unexpected number of devices: %d >= %llu",
>>                               ndevs, (unsigned long long)fi_args.num_devices);
>> --
>> 2.11.0
Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)
On 7/9/17, Duncan <1i5t5.dun...@cox.net> wrote:
> I have however just upgraded to new ssds then wiped and setup the old
> ones as another backup set, so everything is on brand new filesystems on
> fast ssds, no possibility of old undetected corruption suddenly
> triggering problems.
>
> Also, all my btrfs are raid1 or dup for checksummed redundancy

Do you have any experience/advice/comment regarding
dup data on ssds?
[PATCH v3 3/3] Btrfs: heuristic add byte core set calculation
Calculate byte core set for data sample:
 - Sort bucket's numbers in decreasing order
 - Count how many numbers use 90% of sample
If the core set is low (<=25%), data are easily compressible.
If the core set is high (>=80%), data are not compressible.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/compression.c | 58 ++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/compression.h |  2 ++
 2 files changed, 60 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1429b11f2c5f..a469a7c21f5a 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include <linux/sort.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -1069,6 +1070,42 @@ static inline int byte_set_size(const struct heuristic_bucket_item *bucket)
 	return byte_set_size;
 }
 
+/* For bucket sorting */
+static inline int heuristic_bucket_compare(const void *lv, const void *rv)
+{
+	struct heuristic_bucket_item *l = (struct heuristic_bucket_item *)(lv);
+	struct heuristic_bucket_item *r = (struct heuristic_bucket_item *)(rv);
+
+	return r->count - l->count;
+}
+
+/*
+ * Byte Core set size
+ * How many bytes use 90% of sample
+ */
+static inline int byte_core_set_size(struct heuristic_bucket_item *bucket,
+				     u32 core_set_threshold)
+{
+	int a = 0;
+	u32 coreset_sum = 0;
+
+	for (; a < BTRFS_HEURISTIC_BYTE_CORE_SET_LOW; a++)
+		coreset_sum += bucket[a].count;
+
+	if (coreset_sum > core_set_threshold)
+		return a;
+
+	for (; a < BTRFS_HEURISTIC_BYTE_CORE_SET_HIGH; a++) {
+		if (bucket[a].count == 0)
+			break;
+		coreset_sum += bucket[a].count;
+		if (coreset_sum > core_set_threshold)
+			break;
+	}
+
+	return a;
+}
+
 /*
  * Compression heuristic.
  *
@@ -1092,6 +1129,8 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 	struct heuristic_bucket_item *bucket;
 	int a, b, ret;
 	u8 symbol, *input_data;
+	u32 core_set_threshold;
+	u32 input_size = end - start;
 
 	ret = 1;
 
@@ -1123,6 +1162,25 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 		goto out;
 	}
 
+	/* Sort in reverse order */
+	sort(bucket, BTRFS_HEURISTIC_BUCKET_SIZE,
+	     sizeof(struct heuristic_bucket_item), &heuristic_bucket_compare,
+	     NULL);
+
+	core_set_threshold = (input_size*90)/(BTRFS_HEURISTIC_ITER_OFFSET*100);
+	core_set_threshold *= BTRFS_HEURISTIC_READ_SIZE;
+
+	a = byte_core_set_size(bucket, core_set_threshold);
+	if (a <= BTRFS_HEURISTIC_BYTE_CORE_SET_LOW) {
+		ret = 2;
+		goto out;
+	}
+
+	if (a >= BTRFS_HEURISTIC_BYTE_CORE_SET_HIGH) {
+		ret = 0;
+		goto out;
+	}
+
 out:
 	kfree(bucket);
 	return ret;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 03857967815a..0fcd1a485adb 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -139,6 +139,8 @@ struct heuristic_bucket_item {
 #define BTRFS_HEURISTIC_ITER_OFFSET 256
 #define BTRFS_HEURISTIC_BUCKET_SIZE 256
 #define BTRFS_HEURISTIC_BYTE_SET_THRESHOLD 64
+#define BTRFS_HEURISTIC_BYTE_CORE_SET_LOW BTRFS_HEURISTIC_BYTE_SET_THRESHOLD
+#define BTRFS_HEURISTIC_BYTE_CORE_SET_HIGH 200 // 80%
 
 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
-- 
2.13.3
[PATCH v3 2/3] Btrfs: heuristic add byte set calculation
Calculate byte set size for data sample:
 - Calculate how many unique bytes have been seen in the sample
   by counting all bytes in the bucket with count > 0
If the byte set is low (~25%), data are easily compressible.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/compression.c | 27 +++++++++++++++++++++++++++
 fs/btrfs/compression.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ca7cfaad6e2f..1429b11f2c5f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1048,6 +1048,27 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 	return 1;
 }
 
+static inline int byte_set_size(const struct heuristic_bucket_item *bucket)
+{
+	int a = 0;
+	int byte_set_size = 0;
+
+	for (; a < BTRFS_HEURISTIC_BYTE_SET_THRESHOLD; a++) {
+		if (bucket[a].count > 0)
+			byte_set_size++;
+	}
+
+	for (; a < BTRFS_HEURISTIC_BUCKET_SIZE; a++) {
+		if (bucket[a].count > 0) {
+			byte_set_size++;
+			if (byte_set_size > BTRFS_HEURISTIC_BYTE_SET_THRESHOLD)
+				return byte_set_size;
+		}
+	}
+
+	return byte_set_size;
+}
+
 /*
  * Compression heuristic.
  *
@@ -1096,6 +1117,12 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 		index++;
 	}
 
+	a = byte_set_size(bucket);
+	if (a > BTRFS_HEURISTIC_BYTE_SET_THRESHOLD) {
+		ret = 1;
+		goto out;
+	}
+
 out:
 	kfree(bucket);
 	return ret;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index e30a9df1937e..03857967815a 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -138,6 +138,7 @@ struct heuristic_bucket_item {
 #define BTRFS_HEURISTIC_READ_SIZE 16
 #define BTRFS_HEURISTIC_ITER_OFFSET 256
 #define BTRFS_HEURISTIC_BUCKET_SIZE 256
+#define BTRFS_HEURISTIC_BYTE_SET_THRESHOLD 64
 
 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
-- 
2.13.3
[PATCH v3 0/3] Btrfs: populate heuristic with detection logic
Based on kdave for-next, as the heuristic skeleton is already merged.
Populate the heuristic with basic code.

First patch: add simple sampling code. It gets 16-byte samples with
256-byte shifts over the input data, and collects info about how many
different bytes (symbols) have been found in the sample data.

Second patch: add code to calculate how many unique bytes have been
found in the sample data. That can quickly detect easily compressible
data.

Third patch: add code to calculate the byte core set size, i.e. how many
unique bytes use 90% of the sample data. That code requires the numbers
in the bucket to be sorted. That can detect easily compressible data
with many repeated bytes, and can detect non-compressible data with
evenly distributed bytes.

Changes v1 -> v2:
  - Change input data iterator shift 512 -> 256
  - Replace magic macro numbers with direct values
  - Drop useless symbol population in bucket, as no one cares about
    where and what symbol is stored in the bucket at now

Changes v2 -> v3 (only update #3 patch):
  - Fix u64 division problem by using u32 for input_size
  - Fix input size calculation: start - end -> end - start
  - Add missing sort.h header

Timofey Titovets (3):
  Btrfs: heuristic add simple sampling logic
  Btrfs: heuristic add byte set calculation
  Btrfs: heuristic add byte core set calculation

 fs/btrfs/compression.c | 109 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/compression.h |  13 ++++++
 2 files changed, 120 insertions(+), 2 deletions(-)
-- 
2.13.3
[PATCH v3 1/3] Btrfs: heuristic add simple sampling logic
Get a small sample from the input data and calculate a byte type count
for that sample into a bucket. The bucket will store info about which
bytes, and how many of each, have been detected in the sample.

Signed-off-by: Timofey Titovets
---
 fs/btrfs/compression.c | 24 ++++++++++++++++++++++--
 fs/btrfs/compression.h | 10 ++++++++++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 63f54bd2d5bb..ca7cfaad6e2f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1068,15 +1068,35 @@ int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
 	u64 index = start >> PAGE_SHIFT;
 	u64 end_index = end >> PAGE_SHIFT;
 	struct page *page;
-	int ret = 1;
+	struct heuristic_bucket_item *bucket;
+	int a, b, ret;
+	u8 symbol, *input_data;
+
+	ret = 1;
+
+	bucket = kcalloc(BTRFS_HEURISTIC_BUCKET_SIZE,
+			 sizeof(struct heuristic_bucket_item), GFP_NOFS);
+
+	if (!bucket)
+		goto out;
 
 	while (index <= end_index) {
 		page = find_get_page(inode->i_mapping, index);
-		kmap(page);
+		input_data = kmap(page);
+		a = 0;
+		while (a < PAGE_SIZE) {
+			for (b = 0; b < BTRFS_HEURISTIC_READ_SIZE; b++) {
+				symbol = input_data[a+b];
+				bucket[symbol].count++;
+			}
+			a += BTRFS_HEURISTIC_ITER_OFFSET;
+		}
 		kunmap(page);
 		put_page(page);
 		index++;
 	}
+out:
+	kfree(bucket);
 	return ret;
 }
 
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index d1f4eee2d0af..e30a9df1937e 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -129,6 +129,16 @@ struct btrfs_compress_op {
 extern const struct btrfs_compress_op btrfs_zlib_compress;
 extern const struct btrfs_compress_op btrfs_lzo_compress;
 
+struct heuristic_bucket_item {
+	u8 padding;
+	u8 symbol;
+	u16 count;
+};
+
+#define BTRFS_HEURISTIC_READ_SIZE 16
+#define BTRFS_HEURISTIC_ITER_OFFSET 256
+#define BTRFS_HEURISTIC_BUCKET_SIZE 256
+
 int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
 
 #endif
-- 
2.13.3
Re: [PATCH v2 3/3] Btrfs: heuristic add byte core set calculation
2017-07-29 14:43 GMT+03:00 kbuild test robot <l...@intel.com>:
> Hi Timofey,
>
> [auto build test ERROR on next-20170724]
> [cannot apply to btrfs/next v4.13-rc2 v4.13-rc1 v4.12 v4.13-rc2]
> [if your patch is applied to the wrong git tree, please drop us a note to
> help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Timofey-Titovets/Btrfs-populate-heuristic-with-detection-logic/20170729-061208
> config: arm-arm5 (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=arm
>
> All errors (new ones prefixed by >>):
>
> >> ERROR: "__aeabi_uldivmod" [fs/btrfs/btrfs.ko] undefined!
>
> ---
> 0-DAY kernel test infrastructure    Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all    Intel Corporation

I will fix the 64-bit division and resend the patch set.

Thanks.

-- 
Have a nice day,
Timofey.
Re: [PATCH v2 3/3] Btrfs: heuristic add byte core set calculation
Hi Timofey,

[auto build test ERROR on next-20170724]
[cannot apply to btrfs/next v4.13-rc2 v4.13-rc1 v4.12 v4.13-rc2]
[if your patch is applied to the wrong git tree, please drop us a note to
help improve the system]

url: https://github.com/0day-ci/linux/commits/Timofey-Titovets/Btrfs-populate-heuristic-with-detection-logic/20170729-061208
config: arm-arm5 (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm

All errors (new ones prefixed by >>):

>> ERROR: "__aeabi_uldivmod" [fs/btrfs/btrfs.ko] undefined!

---
0-DAY kernel test infrastructure    Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all    Intel Corporation

.config.gz
Description: application/gzip
[PATCH v2] btrfs: use appropriate define for the fsid
Though BTRFS_FSID_SIZE and BTRFS_UUID_SIZE are of the same size, for the
sake of correctness use BTRFS_FSID_SIZE wherever an fsid is meant.

Signed-off-by: Anand Jain
---
v2: Fix this for all remaining files.

 fs/btrfs/check-integrity.c |  2 +-
 fs/btrfs/disk-io.c         |  6 +++---
 fs/btrfs/scrub.c           |  2 +-
 fs/btrfs/volumes.c         | 16 ++++++++--------
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 11d37c94ce05..0ab7f7fa1b5f 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1732,7 +1732,7 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
 	num_pages = state->metablock_size >> PAGE_SHIFT;
 	h = (struct btrfs_header *)datav[0];
 
-	if (memcmp(h->fsid, fs_info->fsid, BTRFS_UUID_SIZE))
+	if (memcmp(h->fsid, fs_info->fsid, BTRFS_FSID_SIZE))
 		return 1;
 
 	for (i = 0; i < num_pages; i++) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 086dcbadce09..ed840e0cabc5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -529,7 +529,7 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info,
 				 struct extent_buffer *eb)
 {
 	struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
-	u8 fsid[BTRFS_UUID_SIZE];
+	u8 fsid[BTRFS_FSID_SIZE];
 	int ret = 1;
 
 	read_extent_buffer(eb, fsid, btrfs_header_fsid(), BTRFS_FSID_SIZE);
@@ -3731,7 +3731,7 @@ int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors)
 		btrfs_set_stack_device_io_width(dev_item, dev->io_width);
 		btrfs_set_stack_device_sector_size(dev_item, dev->sector_size);
 		memcpy(dev_item->uuid, dev->uuid, BTRFS_UUID_SIZE);
-		memcpy(dev_item->fsid, dev->fs_devices->fsid, BTRFS_UUID_SIZE);
+		memcpy(dev_item->fsid, dev->fs_devices->fsid, BTRFS_FSID_SIZE);
 
 		flags = btrfs_super_flags(sb);
 		btrfs_set_super_flags(sb, flags | BTRFS_HEADER_FLAG_WRITTEN);
@@ -4172,7 +4172,7 @@ static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info)
 		ret = -EINVAL;
 	}
 
-	if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_UUID_SIZE) != 0) {
+	if (memcmp(fs_info->fsid, sb->dev_item.fsid, BTRFS_FSID_SIZE) != 0) {
 		btrfs_err(fs_info,
 			  "dev_item UUID does not match fsid: %pU != %pU",
 			  fs_info->fsid, sb->dev_item.fsid);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 6f1e4c984b94..51a5a14f4c0b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1769,7 +1769,7 @@ static inline int scrub_check_fsid(u8 fsid[],
 	struct btrfs_fs_devices *fs_devices = spage->dev->fs_devices;
 	int ret;
 
-	ret = memcmp(fsid, fs_devices->fsid, BTRFS_UUID_SIZE);
+	ret = memcmp(fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
 	return !ret;
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5eb7217738ed..c705ea563c60 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1726,7 +1726,7 @@ static int btrfs_add_device(struct btrfs_trans_handle *trans,
 	ptr = btrfs_device_uuid(dev_item);
 	write_extent_buffer(leaf, device->uuid, ptr, BTRFS_UUID_SIZE);
 	ptr = btrfs_device_fsid(dev_item);
-	write_extent_buffer(leaf, fs_info->fsid, ptr, BTRFS_UUID_SIZE);
+	write_extent_buffer(leaf, fs_info->fsid, ptr, BTRFS_FSID_SIZE);
 	btrfs_mark_buffer_dirty(leaf);
 
 	ret = 0;
@@ -2261,7 +2261,7 @@ static int btrfs_finish_sprout(struct btrfs_trans_handle *trans,
 	struct btrfs_dev_item *dev_item;
 	struct btrfs_device *device;
 	struct btrfs_key key;
-	u8 fs_uuid[BTRFS_UUID_SIZE];
+	u8 fs_uuid[BTRFS_FSID_SIZE];
 	u8 dev_uuid[BTRFS_UUID_SIZE];
 	u64 devid;
 	int ret;
@@ -2304,7 +2304,7 @@ static int btrfs_finish_sprout(struct btrfs_trans_handle *trans,
 		read_extent_buffer(leaf, dev_uuid, btrfs_device_uuid(dev_item),
 				   BTRFS_UUID_SIZE);
 		read_extent_buffer(leaf, fs_uuid, btrfs_device_fsid(dev_item),
-				   BTRFS_UUID_SIZE);
+				   BTRFS_FSID_SIZE);
 		device = btrfs_find_device(fs_info, devid, dev_uuid, fs_uuid);
 		BUG_ON(!device); /* Logic error */
@@ -6295,7 +6295,7 @@ struct btrfs_device *btrfs_find_device(struct btrfs_fs_info *fs_info, u64 devid,
 	cur_devices = fs_info->fs_devices;
 	while (cur_devices) {
 		if (!fsid ||
-		    !memcmp(cur_devices->fsid, fsid, BTRFS_UUID_SIZE)) {
+		    !memcmp(cur_devices->fsid, fsid, BTRFS_FSID_SIZE)) {
 			device = __find_device(&cur_devices->devices, devid, uuid);