Re: Feature Req: mkfs.btrfs -d dup option on single device
On 12/12/13, Chris Mason c...@fb.com wrote:

For me anyway, data=dup in mixed mode is definitely an accident ;) I personally think data dup is a false sense of security, but drives have gotten so huge that it may actually make sense in a few configurations.

Sure, it's not about any security regarding the device. It's about the capability of recovering from any bit-rot which can creep into your backups and which may only be detected when you need the file, after 20-30 generations of backups, which is too late. (Who keeps that much incremental archive and reads backup logs of millions of files regularly?)

Someone asks for it roughly once a year, so it probably isn't a horrible idea. -chris

Today, I brought up an old 2 GB Seagate from the basement. Literally, it had rusted, so it deserves the title of Spinning Rust for real. I had no hope that it would work, but out of curiosity I plugged it into a USB-IDE box. It spun up and, wow, it showed up among the devices. It had two swap partitions and an ext2 partition. I remembered that it was one of the disks used for Linux installations more than 10 years ago. I mounted it. Most of the files date back to 2001-07. They are more than 12 years old, and they seem to be intact with just one inode size mismatch. (See the fsck output below.) If BTRFS (and -d dup :) ) had existed at the time, I would now perform a scrub and report the outcome here. Hence, 'Digital Archeology' can surely benefit from Btrfs. :)

PS: Regarding the SSD data retention debate, this can be an interesting benchmark for a device which was kept in an unfavorable environment.

Regards, Imran

FSCK output:

fsck from util-linux 2.20.1
e2fsck 1.42.8 (20-Jun-2013)
/dev/sdb3 has gone 4209 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Special (device/socket/fifo) inode 82669 has non-zero size. Fix?
yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb3: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb3: 41930/226688 files (1.0% non-contiguous), 200558/453096 blocks
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Feature Req: mkfs.btrfs -d dup option on single device
David Sterba posted on Thu, 12 Dec 2013 18:58:16 +0100 as excerpted:

I've been testing --mixed mode with various other raid profile types as far as I remember. Some bugs popped up, were reported, and Josef fixed them.

FWIW, I'm running a mixed-mode btrfs raid1 here on my log partition and haven't seen any issues, altho that's sub-GiB (640 MiB), so anything that would only be seen at larger sizes will not have triggered, here.

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: Feature Req: mkfs.btrfs -d dup option on single device
Quoting Duncan (2013-12-11 13:27:53):

Imran Geriskovan posted on Wed, 11 Dec 2013 15:19:29 +0200 as excerpted:

Now, there is one open issue: In its current form -d dup interferes with -M. Is it a constraint of the design, or an arbitrary/temporary constraint? What will be the situation if there are tunable duplicates?

I believe I answered that, albeit somewhat indirectly, when I explained that AFAIK, the fact that -M (mixed mode) has the effect of allowing -d dup mode is an accident. Mixed mode was introduced to fix the very real problem of small btrfs filesystems tending to run out of either data or metadata space very quickly, while having all sorts of the other resource still available, due to inappropriate separate-mode allocations. And it fixed that problem rather well, in my opinion and experience! =:^)

Thus mixed-mode wasn't designed to enable duped data at all, but rather to solve a very different problem (which it did very well), and I'm not sure the devs even realized that the dup data it enabled as a side effect of forcing data and metadata to the same dup mode was a feature people might actually want on its own, until after the fact. So I doubt very much it was a constraint of the design. If it was deliberate, I expect they'd have enabled data=dup mode directly. Rather, it was purely an accident: they fixed the unbalanced small-filesystem allocation issue by enabling a mixed mode that, as a side effect of combining data and metadata into the same blocks, also happened to allow data=dup.

For me anyway, data=dup in mixed mode is definitely an accident ;) I personally think data dup is a false sense of security, but drives have gotten so huge that it may actually make sense in a few configurations. Someone asks for it roughly once a year, so it probably isn't a horrible idea.

The biggest problem with mixed mode is just that it isn't very common. You'll end up finding corners that others do not.
Also mixed mode forces your metadata block size down to 4K, which does increase fragmentation over time. The new default of 16K is overall much faster.

-chris
Re: Feature Req: mkfs.btrfs -d dup option on single device
On Thu, Dec 12, 2013 at 10:57:33AM -0500, Chris Mason wrote:

Quoting Duncan (2013-12-11 13:27:53):

Imran Geriskovan posted on Wed, 11 Dec 2013 15:19:29 +0200 as excerpted:

Now, there is one open issue: In its current form -d dup interferes with -M. Is it a constraint of the design, or an arbitrary/temporary constraint? What will be the situation if there are tunable duplicates?

I believe I answered that, albeit somewhat indirectly, when I explained that AFAIK, the fact that -M (mixed mode) has the effect of allowing -d dup mode is an accident. Mixed mode was introduced to fix the very real problem of small btrfs filesystems tending to run out of either data or metadata space very quickly, while having all sorts of the other resource still available, due to inappropriate separate-mode allocations. And it fixed that problem rather well, in my opinion and experience! =:^)

Thus mixed-mode wasn't designed to enable duped data at all, but rather to solve a very different problem (which it did very well), and I'm not sure the devs even realized that the dup data it enabled as a side effect of forcing data and metadata to the same dup mode was a feature people might actually want on its own, until after the fact. So I doubt very much it was a constraint of the design. If it was deliberate, I expect they'd have enabled data=dup mode directly. Rather, it was purely an accident: they fixed the unbalanced small-filesystem allocation issue by enabling a mixed mode that, as a side effect of combining data and metadata into the same blocks, also happened to allow data=dup.

For me anyway, data=dup in mixed mode is definitely an accident ;)

I asked to allow data=dup in mixed mode when Ilya implemented the validations of balance filters. That's a convenient way to get mirrored data on a single device.

I personally think data dup is a false sense of security, but drives have gotten so huge that it may actually make sense in a few configurations.

It's not perfect, yeah.
Someone asks for it roughly once a year, so it probably isn't a horrible idea. The biggest problem with mixed mode is just that it isn't very common. You'll end up finding corners that others do not. Also mixed mode forces your metadata block size down to 4K, which does increase fragmentation over time. The new default of 16K is overall much faster.

I've been testing --mixed mode with various other raid profile types as far as I remember. Some bugs popped up, were reported, and Josef fixed them.
Re: Feature Req: mkfs.btrfs -d dup option on single device
On Tue, Dec 10, 2013 at 09:07:21PM -0700, Chris Murphy wrote:

On Dec 10, 2013, at 8:19 PM, Imran Geriskovan imran.gerisko...@gmail.com wrote:

Now the question is, is it a good practice to use -M for large filesystems? Pros, cons? What is the performance impact? Or any other possible impact?

Uncertain. man mkfs.btrfs says: Mix data and metadata chunks together for more efficient space utilization. This feature incurs a performance penalty in larger filesystems. It is recommended for use with filesystems of 1 GiB or smaller.

That documentation needs tweaking. You need --mixed/-M for larger filesystems than that. It's hard to say exactly where the optimal boundary is, but somewhere around 16 GiB seems to be the dividing point (8 GiB is in the "mostly going to cause you problems without it" area). 16 GiB is what we have on the wiki, I think.

I haven't benchmarked to quantify the penalty.

Nor have I.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- vi: The core of evil. ---
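The rule of thumb above can be sketched as a tiny helper. This is purely illustrative: the function name is hypothetical, and the 16 GiB threshold comes from this discussion (and Hugo's recollection of the wiki), not from any official mkfs.btrfs default:

```shell
# Hypothetical helper: suggest mkfs.btrfs profile flags from the target
# filesystem size, using the thresholds discussed in this thread
# (assumed, not authoritative): mixed mode (-M) for ~16 GiB and below.
suggest_mkfs_flags() {
    size_gib=$1
    if [ "$size_gib" -le 16 ]; then
        # small filesystem: mix data+metadata chunks, duplicate both
        echo "-M -d dup -m dup"
    else
        # larger filesystem: separate chunks, duplicate metadata only
        echo "-d single -m dup"
    fi
}

suggest_mkfs_flags 8    # e.g. an 8GB USB stick -> mixed mode
suggest_mkfs_flags 500  # e.g. a big spinning-rust disk -> separate chunks
```

One would then pass the suggested flags to mkfs.btrfs by hand; the point is only that the -M decision in this thread is driven by size, not by any other property of the device.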
Re: Feature Req: mkfs.btrfs -d dup option on single device
Chris Murphy posted on Tue, 10 Dec 2013 17:33:59 -0700 as excerpted:

On Dec 10, 2013, at 5:14 PM, Imran Geriskovan imran.gerisko...@gmail.com wrote:

Current btrfs-progs is v3.12. 0.19 is a bit old. But yes, looks like the wiki also needs updating. Anyway I just tried it on an 8GB stick and it works, but -M (mixed data+metadata) is required, which documentation also says incurs a performance hit, although I'm uncertain of the significance.

btrfs-tools 0.19+20130705 is the most recent one on Debian's leading-edge Sid/Unstable.

[I was debating where to reply, and chose here.]

To be fair, that's a snapshot date tag, 0.19 plus 2013-07-05 (which would be the date the git snapshot was taken), which isn't /that/ old, particularly for something like Debian. There was a 0.20-rc1 about this time last year (Nov/Dec-ish 2012), but I guess Debian's date tags don't take rcs into account.

That said, as the wiki states, btrfs is still under /heavy/ development; anyone using it at this point is by definition a development-filesystem tester, and said testers are strongly recommended to keep current with both the kernel and the btrfs-progs userspace, both because not doing so unnecessarily exposes whatever they're testing to already-known and fixed bugs, and because if things /do/ go wrong, in addition to being little more than distracting noise if the bug is already fixed, reports from outdated versions simply aren't as useful if the bug remains unfixed.

Since the btrfs-progs git repo policy is that the master branch is always kept release-ready (development must be done on other branches and merged to master only when considered release-ready), ideally all testers would run a current git build, either built themselves or, for distros that choose to package a development/testing product like btrfs, built and updated by the distro on a weekly or monthly basis.
Of course that flies in the face of normal distro stabilization policies, but the point is, btrfs is NOT a normal distro-stable package, and distros that choose to package it are by definition choosing to package a development package for their users to test /as/ a development package, and should update it accordingly. And Debian or not Debian, a development-status package last updated in July, when it's now December and there have been significant changes since July... might not be /that/ old in Debian terms, but it certainly isn't current, either!

Given the state of the docs, probably very few or no people ever used '-d dup'. As the lead developer, is it possible for you to provide some insights into the reliability of this option?

I'm not a developer, I'm just an ape who wears pants. Chris Mason is the lead developer. All I can say about it is that it's been working for me OK so far.

Lest anyone finding this thread on Google or the like think otherwise, it's probably worthwhile to emphasize that with a separate post... which I just did.

Can the '-M' requirement be an indication of code which has not been ironed out, or is it simply a constraint of the internal machinery?

I think it's just how chunks are allocated: it becomes space-inefficient to have two separate metadata and data chunks, hence the requirement to mix them if -d dup is used. But I'm not really sure.

AFAIK, duplicated data without RAID simply wasn't considered a reasonable use-case. I'd certainly consider it so here, in particular because I *DO* use the data integrity and scrub features, but I'm actually using dual physical devices (SSDs in my case) in raid1 mode, instead. The fact that mixed-data/metadata mode allows it is thus somewhat of an accident, more than a planned feature.

FWIW I had tried btrfs some time ago, then left as I decided it wasn't mature enough for my use-case at the time, and just came back in time to see mixed-mode going in.
Until mixed-mode, btrfs had quite some issues on 1-gig or smaller partitions, as the pre-allocated separate data and metadata blocks simply didn't tend to balance out that well and one or the other would tend to be used up very fast, leaving the filesystem more or less useless in terms of further writes. Mixed-data/metadata mode was added as an effective way of countering that problem, and in fact I've been quite pleased with how it has worked here on my smaller partitions.

My /boot is 256 MiB. I have one of those in dup-mode, meaning both data/metadata dup since it's mixed-mode, on each of my otherwise btrfs raid1 mode SSDs, thus allowing for an effective backup of what would otherwise be not easily and effectively backup-able, since bootloaders tend to allow pointing at only one such location (tho with grub2 on GPT-partitioned devices with a BIOS reserved partition for grub2, that's not the issue it tended to be on MBR, since grub2 should still come up with its rescue-mode shell even if it can't find the /boot it'd normally load the normal shell from, and the rescue-mode shell can be
Re: Feature Req: mkfs.btrfs -d dup option on single device
That's actually the reason btrfs defaults to SINGLE metadata mode on single-device SSD-backed filesystems, as well. But as Imran points out, SSDs aren't all there is. There's still spinning rust around. And defaults aside, even on SSDs it should be /possible/ to specify data-dup mode, because there are enough different SSD variants and enough different use-cases that it's surely going to be useful some of the time to someone. =:^)

We didn't start with SSDs, but the thread heads there. Well, ok then. Hard drives with more complex firmwares, hybrids, and so on are becoming available; eventually they will share common problems with SSDs. To make the story short, let's say eventually we all will have block-addressed devices without any sensible physically bound addresses.

Without physically bound addresses, any duplicate written to the device MAY end up in the same unreliable portion of the device. Note it MAY. However, the devices are so large that this probability is very low. The paranoid who want to make it lower may simply increase the number of duplicates. On the other hand, people who work with multiple physical devices may want to decrease the number of duplicates (probably to a single copy). Hence, there is definitely a use case for tunable duplicates, for both data and metadata.

Now, there is one open issue: In its current form -d dup interferes with -M. Is it a constraint of the design, or an arbitrary/temporary constraint? What will be the situation if there are tunable duplicates?

And more: Is -M good for everyday usage on a large fs for efficient packing? What's the penalty? Can it be cured? If so, why not make it the default?

Imran
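Imran's case for tunable duplicates can be made concrete with back-of-the-envelope arithmetic. This sketch assumes each copy is corrupted independently, which, as the thread itself notes, is exactly what an FTL may violate by placing all copies in the same erase block, so treat the numbers as an upper bound on the benefit:

```shell
# Illustrative only: if a single copy has a 1-in-N chance of being
# corrupted, and copies fail independently, then ALL C copies are bad
# with a 1-in-N^C chance. Independence is a strong assumption here.
odds_all_copies_bad() {
    n=$1   # one-in-n corruption chance for a single copy
    c=$2   # number of duplicates kept
    result=1
    i=0
    while [ "$i" -lt "$c" ]; do
        result=$(( result * n ))
        i=$(( i + 1 ))
    done
    echo "$result"   # all copies bad with a one-in-this chance
}

odds_all_copies_bad 1000 1   # single copy
odds_all_copies_bad 1000 2   # -d dup
odds_all_copies_bad 1000 3   # the proposed "-d dup 3"
```

With an (arbitrary) 1-in-1000 per-copy risk, two copies push the all-copies-lost odds to 1 in a million and three copies to 1 in a billion, which is why each extra duplicate buys so much on large, mostly-empty devices.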
Re: Feature Req: mkfs.btrfs -d dup option on single device
On 11/12/13 03:19, Imran Geriskovan wrote:

SSDs: What's more (in relation to our long-term data integrity aim), the order of magnitude of their unpowered data retention period is 1 YEAR. (Read it as 6 months to 2-3 years. While powered, they refresh/shuffle the blocks.) This makes SSDs unsuitable for mid-to-long term consumer storage. Hence they are out of this discussion. (By the way, the only way for reliable duplication on SSDs is using physically separate devices.)

Interesting... Have you any links/quotes/studies/specs for that, please?

Does btrfs need to date-stamp each block/chunk to ensure that data is rewritten before suffering flash memory bitrot? Or is the firmware in SSDs aware enough to rewrite any too-long-unchanged data?

Regards, Martin
Re: Feature Req: mkfs.btrfs -d dup option on single device
What's more (in relation to our long-term data integrity aim), the order of magnitude of their unpowered data retention period is 1 YEAR. (Read it as 6 months to 2-3 years.)

Does btrfs need to date-stamp each block/chunk to ensure that data is rewritten before suffering flash memory bitrot? Or is the firmware in SSDs aware enough to rewrite any too-long-unchanged data?

No. It is supposed to be handled by the firmware. That's why they should be powered. It is not visible to the file system.

You can do a google search with the terms "ssd data retention". There is no concrete info about it, but figures range from 10 years retention for new devices down to 3-6 months for devices at their 'rated' usage. There seems to be consensus around 1 year. And it seems SSD vendors are close to the datacenters. It's today's tech; in time we'll see if it will get better or worse. In the long run, we may have no choice but to put all our data in the hands of beloved cloud lords. Hence the NSA. :) Note that Sony has shut down its optical disc unit.

Regards, Imran
Re: Feature Req: mkfs.btrfs -d dup option on single device
On Dec 11, 2013, at 1:09 AM, Hugo Mills h...@carfax.org.uk wrote:

That documentation needs tweaking. You need --mixed/-M for larger filesystems than that. It's hard to say exactly where the optimal boundary is, but somewhere around 16 GiB seems to be the dividing point (8 GiB is in the "mostly going to cause you problems without it" area). 16 GiB is what we have on the wiki, I think.

Yes, man mkfs.btrfs also doesn't list dup as a possible option for -d.

Chris Murphy
Re: Feature Req: mkfs.btrfs -d dup option on single device
Hugo Mills posted on Wed, 11 Dec 2013 08:09:02 + as excerpted: On Tue, Dec 10, 2013 at 09:07:21PM -0700, Chris Murphy wrote: On Dec 10, 2013, at 8:19 PM, Imran Geriskovan imran.gerisko...@gmail.com wrote: Now the question is, is it a good practice to use -M for large filesystems? Uncertain. man mkfs.btrfs says Mix data and metadata chunks together for more efficient space utilization. This feature incurs a performance penalty in larger filesystems. It is recommended for use with filesystems of 1 GiB or smaller. That documentation needs tweaking. You need --mixed/-M for larger filesystems than that. It's hard to say exactly where the optimal boundary is, but somewhere around 16 GiB seems to be the dividing point (8 GiB is in the mostly going to cause you problems without it area). 16 GiB is what we have on the wiki, I think. I believe it also depends on the expected filesystem fill percentage and how that interacts with chunk sizes. I posted some thoughts on this in another thread a couple weeks(?) ago. Here's a rehash. On large enough filesystems with enough unallocated space, data chunks are 1 GiB, while metadata chunks are 256 MiB, but I /think/ dup-mode means that'll double as they'll allocate in pairs. For balance to do its thing and to avoid unexpected out-of-space errors, you need at least enough unallocated space to easily allocate one of each as the need arises (assuming file sizes significantly under a gig, so the chances of having to allocate two or more data chunks at once is reasonably low), which with normal separate data/metadata chunks, means 1.5 GiB unallocated, absolute minimum. (2.5 gig if dup data also, 1.25 gig if single data and single metadata, or on each of two devices in raid1 data and metadata mode.) Based on the above, it shouldn't be unobvious (hmm... double negative, /should/ be /obvious/, but that's not /quite/ the nuance I want... 
the double negative stays) that with separate data/metadata, once the unallocated free space goes below the level required to allocate one of each, things get WAAYYY more complex and any latent corner-case bugs are far more likely to trigger. And it's equally if not even more obvious (no negatives this time) that this 1.5 GiB minimum safe reserve space is going to be a MUCH larger share of, say, a 4 or 8 GiB filesystem than it will be of, say, a 32 GiB or larger filesystem.

However, I've had no issues with my root filesystems, 8 GiB each on two separate devices in btrfs raid1 (both data and metadata) mode, but I believe that's in large part because actual data usage according to btrfs fi df is 1.64 GiB (4 gig allocated), metadata 274 MiB (512 meg allocated). There's plenty of space left unallocated, well more than the minimum-safe 1.25 gigs on each of two devices (1.25 gigs each, not 1.5 gigs each, since there's only one metadata copy on each, not the default two of single-device dup mode). And I'm on ssd with small filesystems, so a full balance takes about 2 minutes on that filesystem, not the hours to days often reported for multi-terabyte filesystems on spinning rust. So it's easy to full-balance any time allocated usage (as reported by btrfs filesystem show) starts to climb too far beyond actual used bytes within that allocation (as reported by btrfs filesystem df). That means the filesystem stays healthy, with lots of unallocated freespace in reserve, should it be needed. And even in the event something goes hog wild and uses all that space (logs, the usual culprits, are on a separate filesystem, as is /home, so it'd have to be a core system something going hog-wild!), at 8 gigs I can easily do a temporary btrfs device add if I have to, to get the space necessary for a proper balance to do its thing.
I'm actually much more worried about my 24-gig, 21.5-gigs-used packages-cache filesystem, tho it's only my cached gentoo packages tree, cached sources, etc, so it's easily restored direct from the net if it comes to that. Before the rebalance I just did while writing this post, above, btrfs fi show reported it using 22.53 of 24.00 gigs (on each of the two devices in btrfs raid1), /waaayyy/ too close to that magic 1.25 GiB to be comfortable! And after the balance it's still 21.5 gig used out of 24, so as it is, it's a DEFINITE candidate for an out-of-space error at some point.

I guess I need to clean up old sources and binpkgs before I actually hit that out-of-space error and can't balance to fix it due to too much stale binpkg/sources cache. I did recently update to the kde 4.12 branch live-git from the 4.11 branch, and I guess cleaning up the old 4.11 binpkgs should release a few gigs. That and a few other cleanups should bring it safely into line... for now... but the point is, that 24-gig filesystem both tends to run much closer to full and has a much more dramatic full/empty/full cycle than either my root or home filesystems, at 8 gig and 20 gig respectively.
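Duncan's reserve-space arithmetic above can be written out explicitly. The chunk sizes are the ones he states for large-enough filesystems (1 GiB data chunks, 256 MiB metadata chunks, with dup allocating in pairs); real allocations shrink on small filesystems, so this is a sketch of his reasoning, not a btrfs guarantee:

```shell
# Minimum unallocated space (in MiB) needed to allocate one new data
# chunk plus one new metadata chunk, per the figures in this post:
# data chunks are 1 GiB, metadata chunks are 256 MiB, and a dup
# profile allocates two chunks at a time.
min_reserve_mib() {
    data_copies=$1   # 1 for single data, 2 for dup data
    meta_copies=$2   # 1 for single metadata, 2 for dup metadata
    echo $(( data_copies * 1024 + meta_copies * 256 ))
}

min_reserve_mib 1 2   # single data, dup metadata -> 1536 (the 1.5 GiB figure)
min_reserve_mib 2 2   # dup data as well         -> 2560 (the 2.5 GiB figure)
min_reserve_mib 1 1   # single/single, or raid1 per device -> 1280 (1.25 GiB)
```

The three results reproduce the 1.5 GiB, 2.5 GiB, and 1.25 GiB minimums quoted earlier, which is why a 24-gig filesystem sitting at 22.53 gigs allocated is uncomfortably close to the edge.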
Re: Feature Req: mkfs.btrfs -d dup option on single device
Imran Geriskovan posted on Wed, 11 Dec 2013 15:19:29 +0200 as excerpted:

Now, there is one open issue: In its current form -d dup interferes with -M. Is it a constraint of the design, or an arbitrary/temporary constraint? What will be the situation if there are tunable duplicates?

I believe I answered that, albeit somewhat indirectly, when I explained that AFAIK, the fact that -M (mixed mode) has the effect of allowing -d dup mode is an accident. Mixed mode was introduced to fix the very real problem of small btrfs filesystems tending to run out of either data or metadata space very quickly, while having all sorts of the other resource still available, due to inappropriate separate-mode allocations. And it fixed that problem rather well, in my opinion and experience! =:^)

Thus mixed-mode wasn't designed to enable duped data at all, but rather to solve a very different problem (which it did very well), and I'm not sure the devs even realized that the dup data it enabled as a side effect of forcing data and metadata to the same dup mode was a feature people might actually want on its own, until after the fact. So I doubt very much it was a constraint of the design. If it was deliberate, I expect they'd have enabled data=dup mode directly. Rather, it was purely an accident: they fixed the unbalanced small-filesystem allocation issue by enabling a mixed mode that, as a side effect of combining data and metadata into the same blocks, also happened to allow data=dup.

Actually, it may be that they're only with this thread seeing people actually wanting the data=dup option on its own, and why they might want it. Tho it's equally possible they realized that some time ago, shortly after accidentally enabling it via mixed-mode, and have had it on their list since then, but have simply been too busy fixing bugs and working on features such as the still-unfinished raid5/6 code to get to this.
We'll only know if they post, but regardless of whether they saw it before or not, it'd be pretty hard to avoid seeing it with what this thread has blossomed into, so I'm sure they see it now! =:^)

And more: Is -M good for everyday usage on a large fs for efficient packing? What's the penalty? Can it be cured? If so, why not make it the default?

I believe I addressed that in the post I just sent, which took me some time to compose as I kept ending up way into the weeds on other topics, and I ended up deleting multiple whole paragraphs in order to rewrite them, hopefully better, several times.

In brief, I believe the biggest penalties won't apply in your case, since they're related to the dup-data effect, and that's actually what you're interested in, so they'd apply or not apply regardless of mixed-mode. But I do expect there are two penalties in general, the first being the raw effect of mass-duplicating large quantities of data (as opposed to the generally order-of-magnitude-smaller metadata only) by default, the second having to do with what that does to IO performance, particularly uncached directory/metadata reads and the resulting seeks necessary to find a file before reading it in the first place.

That's going to absolutely murder cold-boot times on spinning rust, to give one highly performance-critical example that has been the focus of numerous articles and can-I-make-it-boot-faster-than-N-seconds projects over the years. Absolutely murder them, as mixed mode very well might on spinning rust, and your pet development filesystem will very likely go over like a lead balloon! So it's little wonder they discourage people from using it for anything but the smallest filesystems, where it is portrayed as a workaround to an otherwise very difficult problem!

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master.
Richard Stallman
Feature Req: mkfs.btrfs -d dup option on single device
Currently, if you want to protect your data against bit-rot on a single device, you must have 2 btrfs partitions and mount them as Raid1. The requested option will save the user from partitioning and will provide flexibility.

Yes, I know: This will not provide any safety against hardware failure. But that is not the purpose anyway. The main purpose is to Ensure Data Integrity on:

a- Computers (ie. laptops) where hardware raid is not practical.
b- Backup sets (ie. usb drives) where hardware raid is overkill.

Even if you have regular backups, without having Guaranteed Data Integrity on all data sets, you will lose some data some day, somewhere. See the discussion at: http://hardware.slashdot.org/story/13/12/10/178234/ask-slashdot-practical-bitrot-detection-for-backups

Now, the futuristic and OPTIONAL part for the sufficiently paranoid: The number of duplicates may be parametric:

mkfs.btrfs -m dup 4 -d dup 3 ...
(4 duplicates for metadata, 3 duplicates for data)

I kindly request your comments. (At least for -d dup.)

Regards, Imran Geriskovan
Re: Feature Req: mkfs.btrfs -d dup option on single device
On Dec 10, 2013, at 1:31 PM, Imran Geriskovan imran.gerisko...@gmail.com wrote:

Currently, if you want to protect your data against bit-rot on a single device you must have 2 btrfs partitions and mount them as Raid1.

No, this also works:

mkfs.btrfs -d dup -m dup -M device

Chris Murphy
Re: Feature Req: mkfs.btrfs -d dup option on single device
Currently, if you want to protect your data against bit-rot on a single device you must have 2 btrfs partitions and mount them as Raid1.

No, this also works: mkfs.btrfs -d dup -m dup -M device

Thanks a lot. I guess the docs need an update:

https://btrfs.wiki.kernel.org/index.php/Mkfs.btrfs:
-d: Data profile, values like metadata. EXCEPT DUP CANNOT BE USED

man mkfs.btrfs (btrfs-tools 0.19+20130705):
-d, --data type
Specify how the data must be spanned across the devices specified. Valid values are raid0, raid1, raid5, raid6, raid10 or single.

Imran
Re: Feature Req: mkfs.btrfs -d dup option on single device
On Dec 10, 2013, at 4:33 PM, Imran Geriskovan imran.gerisko...@gmail.com wrote:

Currently, if you want to protect your data against bit-rot on a single device you must have 2 btrfs partitions and mount them as Raid1.

No, this also works: mkfs.btrfs -d dup -m dup -M device

Thanks a lot. I guess the docs need an update: https://btrfs.wiki.kernel.org/index.php/Mkfs.btrfs: -d: Data profile, values like metadata. EXCEPT DUP CANNOT BE USED. man mkfs.btrfs (btrfs-tools 0.19+20130705): -d, --data type: Specify how the data must be spanned across the devices specified. Valid values are raid0, raid1, raid5, raid6, raid10 or single.

Current btrfs-progs is v3.12. 0.19 is a bit old. But yes, looks like the wiki also needs updating. Anyway I just tried it on an 8GB stick and it works, but -M (mixed data+metadata) is required, which documentation also says incurs a performance hit, although I'm uncertain of the significance.

Chris Murphy
Fwd: Feature Req: mkfs.btrfs -d dup option on single device
-- Forwarded message --
From: Imran Geriskovan imran.gerisko...@gmail.com
Date: Wed, 11 Dec 2013 02:14:25 +0200
Subject: Re: Feature Req: mkfs.btrfs -d dup option on single device
To: Chris Murphy li...@colorremedies.com

> Current btrfs-progs is v3.12; 0.19 is a bit old. But yes, it looks like
> the wiki also needs updating. Anyway, I just tried it on an 8 GB stick
> and it works, but -M (mixed data+metadata) is required, which the
> documentation also says incurs a performance hit, although I'm uncertain
> of the significance.

btrfs-tools 0.19+20130705 is the most recent one on Debian's leading-edge
Sid/Unstable. Given the state of the docs, probably very few people (if
any) have ever used '-d dup'.

As the lead developer, is it possible for you to provide some insight into
the reliability of this option? Could the '-M' requirement be an
indication of code which has not been ironed out, or is it simply a
constraint of the internal machinery? How well does the main idea of
guaranteed data integrity for extra reliability match the option -d dup in
its current state?

Regards,
Imran
Re: Feature Req: mkfs.btrfs -d dup option on single device
On Dec 10, 2013, at 5:14 PM, Imran Geriskovan imran.gerisko...@gmail.com wrote:

> btrfs-tools 0.19+20130705 is the most recent one on Debian's leading-edge
> Sid/Unstable. Given the state of the docs, probably very few people (if
> any) have ever used '-d dup'. As the lead developer, is it possible for
> you to provide some insight into the reliability of this option?

I'm not a developer, I'm just an ape who wears pants. Chris Mason is the
lead developer. All I can say about it is that it's been working OK for me
so far.

> Could the '-M' requirement be an indication of code which has not been
> ironed out, or is it simply a constraint of the internal machinery?

I think it's just how chunks are allocated: it becomes space-inefficient
to have separate metadata and data chunks, hence the requirement to mix
them if -d dup is used. But I'm not really sure.

> How well does the main idea of guaranteed data integrity for extra
> reliability match the option -d dup in its current state?

Well, given that Btrfs is still flagged as experimental, most notably when
creating any Btrfs file system, I'd say that doesn't apply here. If the
case you're trying to mitigate is some kind of corruption that can only be
repaired if you have at least one other copy of the data, then -d dup is
useful. But obviously this ignores the statistically greater chance of a
more significant hardware failure, as this is still a single device. Not
only could the entire device fail, but it's possible that erase blocks
individually fail. And since the FTL decides where pages are stored, the
duplicate data/metadata copies could end up stored in the same erase
block.
So there is a failure vector other than full device failure where some
data can still be lost on a single device, even with duplicate or
triplicate copies.

Chris Murphy
Re: Feature Req: mkfs.btrfs -d dup option on single device
> I'm not a developer, I'm just an ape who wears pants. Chris Mason is the
> lead developer. All I can say about it is that it's been working OK for
> me so far.

Great :) Now I understand that you have been using -d dup, which is quite
valuable to me. And since Gmail only shows first names in the inbox list,
I thought you were Chris Mason. Sorry. Now I see your full name in the
header.

>> Could the '-M' requirement be an indication of code which has not been
>> ironed out, or is it simply a constraint of the internal machinery?
>
> I think it's just how chunks are allocated: it becomes space-inefficient
> to have separate metadata and data chunks, hence the requirement to mix
> them if -d dup is used. But I'm not really sure.

Sounds like it is implemented parallel/similar to -m dup; that's why -M is
implied. Of course, we are speculating here. Now the question is: is it
good practice to use -M for large filesystems? Pros, cons? What is the
performance impact? Or any other possible impact?

> Well, given that Btrfs is still flagged as experimental, most notably
> when creating any Btrfs file system, I'd say that doesn't apply here. If
> the case you're trying to mitigate is some kind of corruption that can
> only be repaired if you have at least one other copy of the data, then
> -d dup is useful. But obviously this ignores the statistically greater
> chance of a more significant hardware failure, as this is still a single
> device.

From the beginning we've put the possibility of full hardware failure
aside; the user is expected to handle that risk elsewhere. Our scope is
localized failures which may cost you some files. Since btrfs has
checksums, you may be made aware of them, and using -d dup we increase our
chances of recovering from them. But the probability of corruption of all
duplicates is non-zero. Hence, checking the output of 'btrfs scrub start
path' is beneficial before making or updating any backups. And then check
the output of the scrub on the backup too.
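[Editorial sketch of the scrub-before-backup check described above.
/mnt/archive is a placeholder mount point, and the sample line below
mimics the format of a "btrfs scrub" status report rather than coming
from a real run.]

```shell
# Succeeds only when a scrub status report shows zero errors.
scrub_clean() {
    grep -q 'with 0 errors'
}

# Real usage, against a mounted btrfs filesystem, would be roughly:
#   btrfs scrub start -B /mnt/archive        # -B: wait for completion
#   btrfs scrub status /mnt/archive | scrub_clean && update-the-backup
# Here a sample status line stands in for the real command output:
SAMPLE='total bytes scrubbed: 1.95GB with 0 errors'
if printf '%s\n' "$SAMPLE" | scrub_clean; then
    echo "scrub clean: safe to update the backup"
fi
```

The same check can then be repeated on the backup filesystem after the
copy, as suggested above.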
> Not only could the entire device fail, but it's possible that erase
> blocks individually fail. And since the FTL decides where pages are
> stored, the duplicate data/metadata copies could end up stored in the
> same erase block. So there is a failure vector other than full device
> failure where some data can still be lost on a single device, even with
> duplicate or triplicate copies.

I guess you are talking about SSDs. Even if you write duplicates to
distinct erase blocks, they may end up in the same block after the
firmware's relocation, defragmentation, migration, remapping, and
god-knows-what-other ...ation operations. So practically, a block address
does not point to any fixed physical location on an SSD.

What's more (in relation to our long-term data integrity aim), the order
of magnitude of their unpowered data retention period is 1 YEAR. (Read it
as 6 months to 2-3 years; while powered, they refresh/shuffle the blocks.)
This makes SSDs unsuitable for mid-to-long-term consumer storage, hence
they are out of this discussion. (By the way, the only way to get reliable
duplication on SSDs is to use physically separate devices.)

Luckily, we have hard drives with still-sensible block addressing, even
with bad-block relocation. So duplication or triplication still makes
sense. Or DOES IT? Comments?

i.e. The new Advanced Format drives may employ 4K blocks but present 512B
logical blocks, which may be another reincarnation of the SSD problem
above. However, I guess the linux kernel does not access such drives using
logical addressing.

Imran
Re: Feature Req: mkfs.btrfs -d dup option on single device
On Dec 10, 2013, at 8:19 PM, Imran Geriskovan imran.gerisko...@gmail.com wrote:

> Now the question is: is it good practice to use -M for large
> filesystems? Pros, cons? What is the performance impact? Or any other
> possible impact?

Uncertain. man mkfs.btrfs says: "Mix data and metadata chunks together for
more efficient space utilization. This feature incurs a performance
penalty in larger filesystems. It is recommended for use with filesystems
of 1 GiB or smaller." I haven't benchmarked to quantify the penalty.

> I guess you are talking about SSDs. Even if you write duplicates to
> distinct erase blocks, they may end up in the same block after the
> firmware's relocation, defragmentation, migration, remapping, and
> god-knows-what-other ...ation operations. So practically, a block
> address does not point to any fixed physical location on an SSD.

Yes, SSDs, although it seems that any flash media could behave this way,
as it's up to the manufacturer's firmware how it ends up behaving.

> Luckily, we have hard drives with still-sensible block addressing, even
> with bad-block relocation.

Seagate has said they've already shipped 1 million shingled magnetic
recording (SMR) hard drives. I don't know what sort of FTL-like behavior
they implement, but since the file system doesn't know which LBAs
translate into physical sectors that are part of a shingled band and which
LBAs are suited for random IO, it stands to reason that the drive might be
capable of figuring this out: random-IO LBAs go to physical sectors suited
for that, and sequential writes go to bands.

> i.e. The new Advanced Format drives may employ 4K blocks but present
> 512B logical blocks, which may be another reincarnation of the SSD
> problem above. However, I guess the linux kernel does not access such
> drives using logical addressing.

It does, absolutely. All drives are accessed by LBA these days. And Linux
does fine with both varieties of AF disks: 512e and 4Kn.
Offhand, I think the only issue is that pretty much no BIOS firmware will
boot from a drive with 4K logical/physical sectors, the so-called 4Kn
drives that do not present 512-byte sectors. And since UEFI bugs are all
over the place, I'd kinda expect booting to work with some firmware and
not others. I haven't tested it, but I'm pretty sure I've read that GRUB2
and the kernel are able to boot from 4Kn drives so long as the firmware
can handle it.

Chris Murphy
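[Editorial sketch of how the 512e vs. 4Kn distinction above can be
checked from sysfs. The classification helper is an illustration, not a
standard tool, and "sda" in the comment is a placeholder device name.]

```shell
# Classify a drive by its logical/physical sector sizes, as exposed in
# /sys/block/<dev>/queue/logical_block_size and physical_block_size.
classify_sectors() {
    case "$1/$2" in
        512/512)   echo "legacy 512n" ;;
        512/4096)  echo "512e (Advanced Format, 512B logical emulation)" ;;
        4096/4096) echo "4Kn (native 4K sectors)" ;;
        *)         echo "unknown" ;;
    esac
}

# On a real system, e.g.:
#   classify_sectors "$(cat /sys/block/sda/queue/logical_block_size)" \
#                    "$(cat /sys/block/sda/queue/physical_block_size)"
classify_sectors 512 4096
```

`blockdev --getss` and `blockdev --getpbsz` report the same two values
from the command line.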
Re: Feature Req: mkfs.btrfs -d dup option on single device
Chris Murphy posted on Tue, 10 Dec 2013 17:33:59 -0700 as excerpted:

> On Dec 10, 2013, at 5:14 PM, Imran Geriskovan
> imran.gerisko...@gmail.com wrote:
>
>> As the lead developer, is it possible for you to provide some insight
>> into the reliability of this option?
>
> I'm not a developer, I'm just an ape who wears pants. Chris Mason is the
> lead developer.

Lest anyone stumbling across this in a google or the like think otherwise,
it's probably worthwhile to pull this bit out into its own post.

Chris Mason: btrfs lead dev.

Chris Murphy: not a dev, just a btrfs tester/user and btrfs list regular.

It's worth noting that this isn't a problem unique to this list or the
Chris name. There are, I think, three Linuses on LKML now, with at least
Linus W noting in good humor at one point that his posts do tend to get
noticed a bit more due to his first name, even if he's not /the/ Linus,
Torvalds, that is.

I've name-collided with other John Duncans too, which is one reason I've
been a mononym Duncan for over a decade now. Strangely enough, I've had
far fewer issues with Duncan as a mononym than I did with John Duncan. I
guess the Duncan mononym is at least rarer than the John Duncan name.

@ C. Murphy: Given the situation, you might consider a "not a dev, just a
user" disclaimer in your sig... Or perhaps you'll find a Murphy mononym as
useful as I have the Duncan mononym. Although I guess that has the
Murphy's Law connotation, it might have the effect of keeping the
namespace even clearer (or not, I don't know), as well as making the
mononym rather more memorable!

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program,
he is your master." Richard Stallman