Re: Announcing btrfs-dedupe
On Wed, 09 Nov 2016 12:24:51 +0100, Niccolò Belli <darkba...@linuxsystems.it> wrote:

> On Tuesday, 8 November 2016 23:36:25 CET, Saint Germain wrote:
> > Please be aware of these other similar software:
> > - jdupes: https://github.com/jbruchon/jdupes
> > - rmlint: https://github.com/sahib/rmlint
> > And of course fdupes.
> >
> > Some interesting points I have seen in them:
> > - use of xxhash to identify potential duplicates (huge speedup)
> > - ability to deduplicate read-only snapshots
> > - identification of potentially reflinked files (see also my email here:
> >   https://www.spinics.net/lists/linux-btrfs/msg60081.html)
> > - ability to filter out hardlinks
> > - the triangle problem: see the jdupes readme
> > - jdupes has started the process to be included in Debian
> >
> > I hope that will help and that you can share some code with them!
>
> Hi,
> What do you think about jdupes? I'm searching for an alternative to
> duperemove, and rmlint doesn't seem to support btrfs deduplication, so
> I would like to try jdupes. My main problem with duperemove is a
> memory leak; it also seems to lead to greater disk usage:
> https://github.com/markfasheh/duperemove/issues/163

rmlint does support btrfs deduplication:

rmlint --algorithm=xxhash --types="duplicates" --hidden --config=sh:handler=clone --no-hardlinked

I've used jdupes and rmlint to deduplicate 2 TB with 4 GB of RAM and it took
a few hours, so it is acceptable from a performance point of view. The
problems I found have been corrected by both. The jdupes author is really
kind and responsive!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
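For context, the `sh:handler=clone` handler mentioned above ultimately goes through the kernel's FIDEDUPERANGE ioctl (the same interface the jdupes and duperemove dedupe paths use). Below is a rough Python sketch of issuing that ioctl; the function names are mine, not from any of these tools, and the call only succeeds on reflink-capable filesystems such as btrfs, with suitable permissions. Note the kernel re-compares the bytes itself before sharing extents, which is why a clone handler cannot corrupt data on a hash collision.

```python
import fcntl
import os
import struct

FIDEDUPERANGE = 0xC0189436  # _IOWR(0x94, 54, struct file_dedupe_range)
FILE_DEDUPE_RANGE_SAME = 0
FILE_DEDUPE_RANGE_DIFFERS = 1

def pack_dedupe_request(src_offset, length, dest_fds):
    """Build struct file_dedupe_range (24 bytes) followed by one
    struct file_dedupe_range_info (32 bytes) per destination fd."""
    hdr = struct.pack("=QQHHI", src_offset, length, len(dest_fds), 0, 0)
    infos = b"".join(struct.pack("=qQQiI", fd, src_offset, 0, 0, 0)
                     for fd in dest_fds)
    return hdr + infos

def dedupe_whole_file(src, dests):
    """Ask the kernel to share src's extents with each path in dests.
    Returns one status per destination: 0 (shared) or 1 (contents differ,
    as re-verified by the kernel itself)."""
    src_fd = os.open(src, os.O_RDONLY)
    dest_fds = [os.open(d, os.O_RDWR) for d in dests]
    try:
        length = os.fstat(src_fd).st_size
        req = bytearray(pack_dedupe_request(0, length, dest_fds))
        fcntl.ioctl(src_fd, FIDEDUPERANGE, req)
        # The status field sits at offset 24 (header) + 32*i + 24 in each info.
        return [struct.unpack_from("=qQQiI", bytes(req), 24 + 32 * i)[3]
                for i in range(len(dest_fds))]
    finally:
        for fd in [src_fd] + dest_fds:
            os.close(fd)
```

The struct layouts are fixed by the kernel ABI, so the packing helper can be checked without a btrfs filesystem; `dedupe_whole_file` itself needs one.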
Re: Announcing btrfs-dedupe
On Sun, 6 Nov 2016 14:30:52 +0100, James Pharaoh wrote:

> Hi all,
>
> I'm pleased to announce my btrfs deduplication utility, written in
> Rust. It operates on whole files, is fast, and I believe complements
> the existing utilities (duperemove, bedup).
>
> Please visit the homepage for more information:
>
> http://btrfs-dedupe.com

Thanks for sharing your work. Please be aware of these other similar
software:
- jdupes: https://github.com/jbruchon/jdupes
- rmlint: https://github.com/sahib/rmlint
And of course fdupes.

Some interesting points I have seen in them:
- use of xxhash to identify potential duplicates (huge speedup)
- ability to deduplicate read-only snapshots
- identification of potentially reflinked files (see also my email here:
  https://www.spinics.net/lists/linux-btrfs/msg60081.html)
- ability to filter out hardlinks
- the triangle problem: see the jdupes readme
- jdupes has started the process to be included in Debian

I hope that will help and that you can share some code with them!
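To make the xxhash point concrete, here is a small stand-alone sketch of the size-then-hash candidate pipeline these tools share. It is illustrative only: `find_duplicate_sets` is a made-up name, and `hashlib.blake2b` stands in for xxhash, which is not in the Python standard library. A real tool would still confirm candidates byte-by-byte (or let the kernel's dedupe ioctl do it) before acting on them.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicate_sets(paths):
    """Return lists of paths whose contents are (very probably) identical.
    Stage 1 groups by size (free), stage 2 by a fast content hash."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    def digest(p):
        h = hashlib.blake2b()  # stand-in for xxhash
        with open(p, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.digest()

    dupes = []
    for group in by_size.values():
        if len(group) < 2:
            continue  # a unique size can never be a duplicate
        by_hash = defaultdict(list)
        for p in group:
            by_hash[digest(p)].append(p)
        dupes.extend(g for g in by_hash.values() if len(g) > 1)
    return dupes
```

The speedup the list above refers to comes from stage 1: files with unique sizes are never even read, and a fast non-cryptographic hash keeps stage 2 I/O-bound rather than CPU-bound.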
Re: Identifying reflink / CoW files
On Thu, 3 Nov 2016 01:17:07 -0400, Zygo Blaxell <ce3g8...@umail.furryterror.org> wrote:

> On Thu, Oct 27, 2016 at 01:30:11PM +0200, Saint Germain wrote:
> > Hello,
> >
> > Following the previous discussion:
> > https://www.spinics.net/lists/linux-btrfs/msg19075.html
> >
> > I would be interested in finding a way to reliably identify
> > reflink / CoW files in order to use deduplication programs (like
> > fdupes, jdupes, rmlint) efficiently.
> >
> > Using FIEMAP doesn't seem to be reliable according to this
> > discussion on rmlint:
> > https://github.com/sahib/rmlint/issues/132#issuecomment-157665154
>
> Inline extents have no physical address (FIEMAP returns 0 in that
> field). You can't dedup them, and each file can have only one, so if
> you see the FIEMAP_EXTENT_INLINE bit set, you can just skip
> processing the entire file immediately.
>
> You can create a separate non-inline extent in a temporary file, then
> use dedup to replace _both_ copies of the original inline extent.
> Or don't bother, as the savings are negligible.
>
> > Is there another way that deduplication programs can easily use?
>
> The problem is that it's not files that are reflinked--individual
> extents are. "Reflink file copy" really just means "a file whose
> extents are 100% shared with another file." It's possible for files
> on btrfs to have any percentage of shared extents from 0 to 100% in
> increments of the host page size. It's also possible for the blocks
> to be shared with different extent boundaries.
>
> The quality of the result therefore depends on the amount of effort
> put into measuring it. If you look for the first non-hole extent in
> each file and use its physical address as a physical file identifier,
> then you get a fast reflink detector function that has a high risk of
> false positives. If you map out two files and compare physical
> addresses block by block, you get a slow function with a low risk of
> false positives (but maybe a small risk of false negatives too).
>
> If your dedup program only does full-file reflink copies, then the
> first-extent physical address method is sufficient. If your program
> does block- or extent-level dedup, then it shouldn't be using files in
> its data model at all, except where necessary to provide a mechanism
> to access the physical blocks through the POSIX filesystem API.
>
> FIEMAP will tell you about all the extents (the physical address for
> extents that have one, zero for other extent types). It's also slow
> and has assorted accuracy problems, especially with compressed files.
> Any user can run FIEMAP, and it uses only standard structure arrays.
>
> SEARCH_V2 is root-only and requires parsing variable-length binary
> btrfs data encoding, but it's faster than FIEMAP and gives more
> accurate results on compressed files.

As the dedup program only does full-file reflinks, the first-extent
physical address method can be used as a fast first check to identify
potential files. But how should the second check be implemented in order to
have a 0% risk of false positives? You said that mapping out two files and
comparing the physical addresses block by block still has a low risk of
false positives.

Thank you very much for the detailed explanation!
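For reference, the first-extent heuristic discussed above can be sketched from the fixed `struct fiemap` / `struct fiemap_extent` layout in `linux/fiemap.h`. This is an assumption-laden illustration: the ioctl call itself (FS_IOC_FIEMAP, 0xC020660B) is omitted, and the function names are mine; only the buffer layout and the FIEMAP_EXTENT_DATA_INLINE flag value come from the kernel header.

```python
import struct

# struct fiemap header: fm_start, fm_length, fm_flags, fm_mapped_extents,
# fm_extent_count, fm_reserved (32 bytes total)
FIEMAP_HDR = "=QQIIII"
# struct fiemap_extent: fe_logical, fe_physical, fe_length, 2x reserved64,
# fe_flags, 3x reserved (56 bytes total)
FIEMAP_EXTENT = "=QQQQQIIII"
FIEMAP_EXTENT_DATA_INLINE = 0x200

def parse_extents(buf):
    """Yield (logical, physical, length, flags) for each mapped extent in a
    fiemap reply buffer, as filled in by the FS_IOC_FIEMAP ioctl."""
    mapped = struct.unpack_from(FIEMAP_HDR, buf, 0)[3]
    off = struct.calcsize(FIEMAP_HDR)
    step = struct.calcsize(FIEMAP_EXTENT)
    for i in range(mapped):
        e = struct.unpack_from(FIEMAP_EXTENT, buf, off + i * step)
        yield e[0], e[1], e[2], e[5]

def first_physical(extents):
    """Fast reflink heuristic from the discussion above: the physical
    address of the first extent that has one. Inline extents report
    physical == 0 and are skipped (and, per the thread, mean the whole
    file can be skipped by a dedup tool anyway)."""
    for _logical, physical, _length, flags in extents:
        if flags & FIEMAP_EXTENT_DATA_INLINE or physical == 0:
            continue
        return physical
    return None
```

Comparing the full `parse_extents` output of two files, extent by extent, is the slower second check the reply describes; equal-length extent lists with equal physical addresses make a false positive very unlikely, though as noted above not provably impossible at this layer.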
Identifying reflink / CoW files
Hello,

Following the previous discussion:
https://www.spinics.net/lists/linux-btrfs/msg19075.html

I would be interested in finding a way to reliably identify reflink / CoW
files in order to use deduplication programs (like fdupes, jdupes, rmlint)
efficiently.

Using FIEMAP doesn't seem to be reliable according to this discussion on
rmlint:
https://github.com/sahib/rmlint/issues/132#issuecomment-157665154

Is there another way that deduplication programs can easily use?

Thanks
Re: Kernel bug during RAID1 replace
On Wed, 29 Jun 2016 18:24:07 -0600, Chris Murphy <li...@colorremedies.com> wrote:

> On Wed, Jun 29, 2016 at 5:51 PM, Saint Germain <saint...@gmail.com> wrote:
> > On Wed, 29 Jun 2016 19:23:57 +, Hugo Mills <h...@carfax.org.uk> wrote:
> >> On Wed, Jun 29, 2016 at 09:16:13PM +0200, Saint Germain wrote:
> >> > On Wed, 29 Jun 2016 13:08:30 -0600, Chris Murphy
> >> > <li...@colorremedies.com> wrote:
> >> >
> >> > > >> > Ok I will follow your advice and start over with a fresh
> >> > > >> > BTRFS volume. As explained in another email, rsync doesn't
> >> > > >> > support reflinks, so do you think it is worth trying with
> >> > > >> > btrfs send instead? Is it safe to copy this way, or is
> >> > > >> > rsync more reliable in case of a faulty BTRFS volume?
> >> > > >>
> >> > > >> If you have the space, btrfs restore would probably be the
> >> > > >> best option. It's not likely, but using send has a risk of
> >> > > >> contaminating the new filesystem as well.
> >> > > >
> >> > > > I have to copy through the network (I am running out of
> >> > > > disks...), so btrfs restore is unfortunately not an option.
> >> > > > I didn't know that btrfs send could contaminate the target
> >> > > > disk as well?
> >> > > > Ok, rsync it is then.
> >> > >
> >> > > restore will let you extract files despite csum errors. I don't
> >> > > think send will, and using cp or rsync Btrfs definitely won't
> >> > > hand over the file.
> >> >
> >> > That's OK, I'd prefer to avoid copying files with csum errors
> >> > anyway (I can restore them from backups).
> >> > However, will btrfs send abort the whole operation as soon as it
> >> > finds a csum error?
> >> > And will I risk "contaminating" the target BTRFS volume by using
> >> > btrfs send?
> >>
> >> A send stream is effectively just a sequence of filesystem
> >> commands (mv, cp, cp --reflink, rm, dd). So any damage that it can
> >> do when replayed by receive is limited to what you can do with the
> >> basic shell commands (plus cloning extents). If you have metadata
> >> breakage in your source filesystem, this won't cause the same
> >> metadata breakage to show up in the target filesystem.
> >
> > Well, after 300 GB copied through "btrfs send", the process aborted
> > with the following error:
> > ERROR: send ioctl failed with -5: Input/output error
> > ERROR: unexpected EOF in stream.
> >
> > The relevant /var/log/syslog lines are appended at the end of this
> > email.
> >
> > So it seems that I will have to go with rsync then.
>
> You'll likely hit the same bad file and get EIO, is my guess. What you
> can do is mount it ro from the get-go, and do btrfs send/receive
> again; maybe then it won't hit this sequence where it's finding some
> need to clean up a transaction and free an extent. Maybe you still get
> some failure to send whatever file is using that extent, but I think
> receive will tolerate it.

Well, I tried "btrfs send" and the process stalled at 300 GB (out of a
total of 2 TB) with a never-ending stream of:
"ERROR: unexpected EOF in stream."

I gave up and launched an rsync, which is about to finish. Now I have some
work to do to make sure that all the rsynced files are consistent (I have
to compare them to the backed-up ones).

Thanks for your help, I learned a bit more about BTRFS this way.
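That final consistency pass can be automated along these lines. This is a hypothetical sketch (`tree_digests` and `compare_trees` are my own names, not existing tools): hash every file under the restored tree and the backup tree, then report paths that differ or exist on only one side.

```python
import hashlib
import os

def tree_digests(root):
    """Map each file's path (relative to root) to a SHA-256 of its contents."""
    out = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            out[os.path.relpath(full, root)] = h.hexdigest()
    return out

def compare_trees(restored, backup):
    """Compare a restored tree against its backup.
    'missing' = in backup only, 'extra' = in restored only,
    'differs' = present in both with different contents."""
    a, b = tree_digests(restored), tree_digests(backup)
    return {
        "missing": sorted(set(b) - set(a)),
        "extra": sorted(set(a) - set(b)),
        "differs": sorted(r for r in set(a) & set(b) if a[r] != b[r]),
    }
```

For a one-off check, `rsync -n -c` (dry run with whole-file checksums) against the backup gives a similar differing-files report without custom code.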
Re: Kernel bug during RAID1 replace
On Wed, 29 Jun 2016 19:23:57 +, Hugo Mills <h...@carfax.org.uk> wrote:

> On Wed, Jun 29, 2016 at 09:16:13PM +0200, Saint Germain wrote:
> > On Wed, 29 Jun 2016 13:08:30 -0600, Chris Murphy
> > <li...@colorremedies.com> wrote:
> >
> > > >> > Ok I will follow your advice and start over with a fresh
> > > >> > BTRFS volume. As explained in another email, rsync doesn't
> > > >> > support reflinks, so do you think it is worth trying with
> > > >> > btrfs send instead? Is it safe to copy this way, or is rsync
> > > >> > more reliable in case of a faulty BTRFS volume?
> > > >>
> > > >> If you have the space, btrfs restore would probably be the best
> > > >> option. It's not likely, but using send has a risk of
> > > >> contaminating the new filesystem as well.
> > > >
> > > > I have to copy through the network (I am running out of
> > > > disks...), so btrfs restore is unfortunately not an option.
> > > > I didn't know that btrfs send could contaminate the target disk
> > > > as well?
> > > > Ok, rsync it is then.
> > >
> > > restore will let you extract files despite csum errors. I don't
> > > think send will, and using cp or rsync Btrfs definitely won't
> > > hand over the file.
> >
> > That's OK, I'd prefer to avoid copying files with csum errors anyway
> > (I can restore them from backups).
> > However, will btrfs send abort the whole operation as soon as it
> > finds a csum error?
> > And will I risk "contaminating" the target BTRFS volume by using
> > btrfs send?
>
> A send stream is effectively just a sequence of filesystem commands
> (mv, cp, cp --reflink, rm, dd). So any damage that it can do when
> replayed by receive is limited to what you can do with the basic shell
> commands (plus cloning extents). If you have metadata breakage in your
> source filesystem, this won't cause the same metadata breakage to show
> up in the target filesystem.

Well, after 300 GB copied through "btrfs send", the process aborted with
the following error:

ERROR: send ioctl failed with -5: Input/output error
ERROR: unexpected EOF in stream.

The relevant /var/log/syslog lines are appended at the end of this email.
So it seems that I will have to go with rsync then.

WARNING: CPU: 3 PID: 1779 at /build/linux-9LouV5/linux-4.6.1/fs/btrfs/extent-tree.c:6608 __btrfs_free_extent.isra.67+0x152/0xdc0 [btrfs]
BTRFS: Transaction aborted (error -5)
Modules linked in: bnep(E) snd_hda_codec_hdmi(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) fscache(E) sunrpc(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) nls_utf8(E) hmac(E) nls_cp437(E) drbg(E) ansi_cprng(E) vfat(E) fat(E) wl(POE) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) cfg80211(E) pcspkr(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) evdev(E) btusb(E) efi_pstore(E) joydev(E) btrtl(E) serio_raw(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) shpchp(E) efivars(E) i2c_i801(E) hci_uart(E) btbcm(E) btqca(E) btintel(E) bluetooth(E) rfkill(E) i915(E) battery(E) crc16(E) video(E) intel_lpss_acpi(E) drm_kms_helper(E) intel_lpss(E) mfd_core(E) tpm_tis(E) acpi_pad(E) tpm(E) drm(E) acpi_als(E) kfifo_buf(E) i2c_algo_bit(E) mei_me(E) button(E) processor(E) industrialio(E) mei(E) fuse(E) autofs4(E) btrfs(E) xor(E) raid6_pq(E) sg(E) sd_mod(E) hid_logitech_hidpp(E) hid_logitech_dj(E) usbhid(E) ahci(E) libahci(E) crc32c_intel(E) e1000e(E) xhci_pci(E) xhci_hcd(E) ptp(E) psmouse(E) libata(E) pps_core(E) scsi_mod(E) usbcore(E) usb_common(E) i2c_hid(E) hid(E) fjes(E)
CPU: 3 PID: 1779 Comm: btrfs-transacti Tainted: P OE 4.6.0-0.bpo.1-amd64 #1 Debian 4.6.1-1~bpo8+1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z170 Gaming-ITX/ac, BIOS P2.10 04/13/2016
0286 3e3b5862 813123c5 880067783b68 8107af94 01b8f6065000 880067783bc0 88006b328000 880165fc6000 880105a21150
Call Trace:
[] ? dump_stack+0x5c/0x77
[] ? __warn+0xc4/0xe0
[] ? warn_slowpath_fmt+0x5f/0x80
[] ? __btrfs_free_extent.isra.67+0x152/0xdc0 [btrfs]
[] ? btrfs_merge_delayed_refs+0x6c/0x610 [btrfs]
[] ? __btrfs_run_delayed_refs+0x9ad/0x1210 [btrfs]
[] ? btrfs_run_delayed_refs+0x8e/0x2b0 [btrfs]
[] ? btrfs_commit_transaction+0x4a3/0xa30 [btrfs]
[] ? start_transaction+0x96/0x4d0 [btrfs]
[] ? transaction_kthread+0x1ce/0x1f0
Re: Kernel bug during RAID1 replace
On Wed, 29 Jun 2016 13:08:30 -0600, Chris Murphy wrote:

> >> > Ok I will follow your advice and start over with a fresh BTRFS
> >> > volume. As explained in another email, rsync doesn't support
> >> > reflinks, so do you think it is worth trying with btrfs send
> >> > instead? Is it safe to copy this way, or is rsync more reliable
> >> > in case of a faulty BTRFS volume?
> >>
> >> If you have the space, btrfs restore would probably be the best
> >> option. It's not likely, but using send has a risk of contaminating
> >> the new filesystem as well.
> >
> > I have to copy through the network (I am running out of disks...),
> > so btrfs restore is unfortunately not an option.
> > I didn't know that btrfs send could contaminate the target disk as
> > well?
> > Ok, rsync it is then.
>
> restore will let you extract files despite csum errors. I don't think
> send will, and using cp or rsync Btrfs definitely won't hand over the
> file.

That's OK, I'd prefer to avoid copying files with csum errors anyway (I can
restore them from backups).
However, will btrfs send abort the whole operation as soon as it finds a
csum error?
And will I risk "contaminating" the target BTRFS volume by using btrfs
send?

Thanks!
Re: Kernel bug during RAID1 replace
On Wed, 29 Jun 2016 14:19:23 -0400, "Austin S. Hemmelgarn" wrote:

> >>> Already got a backup. I just really want to try to repair it (in
> >>> order to test BTRFS).
> >>
> >> I don't know that this is a good test, because I think the file
> >> system has already been sufficiently corrupted that it can't be
> >> fixed. Part of the problem is that Btrfs isn't aware of faulty
> >> drives like mdadm or lvm yet, so it looks like it'll try to write
> >> to all devices, and it's possible for significant confusion to
> >> happen if they're each getting different generation writes.
> >> Significant as in, currently beyond repair.
> >
> > On the other hand it seems interesting to repair instead of just
> > giving up. It gives a good look at BTRFS resiliency/reliability.
>
> On the one hand Btrfs shouldn't become inconsistent in the first
> place; that's the design goal. On the other hand, I'm finding
> from the problems reported on the list that Btrfs increasingly
> mounts at least read-only and allows getting data off, even when
> the file system isn't fully functional or repairable.
>
> In your case, once there are metadata problems even with raid 1,
> it's difficult at best. But once you have the backup you could
> try some other things, once it's certain the hardware isn't
> adding to the problems, which I'm still not yet certain of.
>
> >>> I'm ready to try anything. Let's experiment.
> >>
> >> I kinda think it's a waste of time. Someone else maybe has a better
> >> idea?
> >>
> >> I think your time is better spent finding out when and why the
> >> device with all of these write errors happened. It must have gone
> >> missing for a while, and you need to find out why that happened
> >> and prevent it; OR you have to be really vigilant at every mount
> >> time to make sure both devices have the same transid (generation).
> >> In my case when I tried to sabotage this, being off by a generation
> >> of 1 wasn't a problem for Btrfs to automatically fix up, but I
> >> suspect it was only a generation mismatch in the superblock.
> >
> > Ok I will follow your advice and start over with a fresh BTRFS
> > volume. As explained in another email, rsync doesn't support
> > reflinks, so do you think it is worth trying with btrfs send
> > instead? Is it safe to copy this way, or is rsync more reliable in
> > case of a faulty BTRFS volume?
>
> If you have the space, btrfs restore would probably be the best
> option. It's not likely, but using send has a risk of contaminating
> the new filesystem as well.

I have to copy through the network (I am running out of disks...), so btrfs
restore is unfortunately not an option.
I didn't know that btrfs send could contaminate the target disk as well?
Ok, rsync it is then.

Thanks
Re: Kernel bug during RAID1 replace
On Wed, 29 Jun 2016 11:28:24 -0600, Chris Murphy wrote:

> > Already got a backup. I just really want to try to repair it (in
> > order to test BTRFS).
>
> I don't know that this is a good test, because I think the file system
> has already been sufficiently corrupted that it can't be fixed. Part
> of the problem is that Btrfs isn't aware of faulty drives like mdadm
> or lvm yet, so it looks like it'll try to write to all devices, and
> it's possible for significant confusion to happen if they're each
> getting different generation writes. Significant as in, currently
> beyond repair.
>
> >> > On the other hand it seems interesting to repair instead of just
> >> > giving up. It gives a good look at BTRFS resiliency/reliability.
> >>
> >> On the one hand Btrfs shouldn't become inconsistent in the first
> >> place; that's the design goal. On the other hand, I'm finding from
> >> the problems reported on the list that Btrfs increasingly mounts
> >> at least read-only and allows getting data off, even when the file
> >> system isn't fully functional or repairable.
> >>
> >> In your case, once there are metadata problems even with raid 1,
> >> it's difficult at best. But once you have the backup you could try
> >> some other things, once it's certain the hardware isn't adding to
> >> the problems, which I'm still not yet certain of.
> >
> > I'm ready to try anything. Let's experiment.
>
> I kinda think it's a waste of time. Someone else maybe has a better
> idea?
>
> I think your time is better spent finding out when and why the device
> with all of these write errors happened. It must have gone missing for
> a while, and you need to find out why that happened and prevent it; OR
> you have to be really vigilant at every mount time to make sure both
> devices have the same transid (generation). In my case when I tried to
> sabotage this, being off by a generation of 1 wasn't a problem for
> Btrfs to automatically fix up, but I suspect it was only a generation
> mismatch in the superblock.

Ok, I will follow your advice and start over with a fresh BTRFS volume. As
explained in another email, rsync doesn't support reflinks, so do you think
it is worth trying with btrfs send instead? Is it safe to copy this way, or
is rsync more reliable in case of a faulty BTRFS volume?

Many thanks!
Re: Kernel bug during RAID1 replace
On Wed, 29 Jun 2016 11:50:55 +0200, Saint Germain <saint...@gmail.com> wrote:

> So if I understand correctly, you advise to use check --repair
> --init-csum-tree and delete the files which were reported as having
> checksum errors?
> After that I can compare the important files to a backup, but there
> are always the non-important files which are not backed up.
>
> Is there any way I can be sure afterwards that the volume is indeed
> completely correct and reliable?
> If there is no way to be sure, I think it is better that I cp/rsync
> all data to a new BTRFS volume.

Oh, and I forgot to add that rsync doesn't support reflinks yet, so I am a
bit reluctant to rsync all data to a new volume instead of repairing the
existing BTRFS volume.
Re: Kernel bug during RAID1 replace
On Tue, 28 Jun 2016 22:25:32 -0600, Chris Murphy wrote:

> > Well I made a ddrescue image of both drives (only one error on sdb
> > during the ddrescue copy) and started the computer again (after
> > disconnecting the old drives).
>
> What was the error? Any kernel message at the time of this error?

ddrescue reported an error during operation ("error: 1" displayed). A dump
of /var/log/syslog during the ddrescue operation is appended at the end of
this email.

> > I don't know if I should continue trying to repair this RAID1 or if
> > I should just cp/rsync to a new BTRFS volume and be done with it.
>
> Well, for sure already you should prepare to lose this volume, so
> whatever backup you need, do that yesterday.

Already got a backup. I just really want to try to repair it (in order to
test BTRFS).

> > On the other hand it seems interesting to repair instead of just
> > giving up. It gives a good look at BTRFS resiliency/reliability.
>
> On the one hand Btrfs shouldn't become inconsistent in the first
> place; that's the design goal. On the other hand, I'm finding from the
> problems reported on the list that Btrfs increasingly mounts at least
> read-only and allows getting data off, even when the file system isn't
> fully functional or repairable.
>
> In your case, once there are metadata problems even with raid 1, it's
> difficult at best. But once you have the backup you could try some
> other things, once it's certain the hardware isn't adding to the
> problems, which I'm still not yet certain of.

I'm ready to try anything. Let's experiment.

> > Here is the log from the mount to the scrub aborting, and the result
> > from smartctl.
> >
> > Thanks for your precious help so far.
> >
> > BTRFS error (device sdb1): cleaner transaction attach returned -30
>
> Not sure what this is. The Btrfs cleaner is used to remove snapshots,
> decrement extent reference counts, and if a count reaches 0, free up
> that space. So, why is it running? I don't know what -30 means.
>
> > BTRFS info (device sdb1): disk space caching is enabled
> > BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14,
> > flush 7928, corrupt 1714507, gen 1335
> > BTRFS info (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0,
> > corrupt 21622, gen 24
>
> I missed something the first time around in these messages: the
> generation error. Both drives have generation errors. A generation
> error on a single drive means that drive was not successfully being
> written to, or was missing. For it to happen on both drives is bad. If
> it happens to just one drive, once it reappears it will be passively
> caught up to the other one as reads happen, but best practice for now
> requires the user to run scrub or balance. If that doesn't happen and
> a 2nd drive vanishes or has write errors that cause generation
> mismatches, now both drives are simultaneously behind and ahead of
> each other. Some commits went to one drive, some went to the other.
> And right now Btrfs totally flips out and will get irreparably
> corrupted.
>
> So I have to ask: was this volume ever mounted degraded? If not, you
> really need to look at logs and find out why the drives weren't being
> written to. sdb shows lots of write, flush, corruption and generation
> errors, so it seems like it was having a hardware issue. But then sda
> has only corruptions and generation problems, as if it wasn't even
> connected or powered on.
>
> OR another possibility is that one of the drives was previously cloned
> (block copied), or snapshotted via LVM, and you ran into the
> block-level copies gotcha:
> https://btrfs.wiki.kernel.org/index.php/Gotchas

I got some errors on sdb 2 months ago (I noticed because it was suddenly
mounted read-only). I ran a scrub and a check --repair, and a lot of errors
were corrected. I deleted the files which were not repairable and
everything was running smoothly since then. I ran a scrub a few weeks ago
and everything was fine. I never mounted in degraded mode or made a
snapshot via LVM (I only upgraded both drives through "replace" 6 months
ago).

> > BTRFS warning (device sdb1): checksum error at logical 93445255168
> > on dev /dev/sdb1, sector 54528696, root 5, inode 3434831, offset
> > 479232, length 4096, links 1 (path:
> > user/.local/share/zeitgeist/activity.sqlite-wal)
>
> Some extent data and its checksum don't match, on sdb. So this file is
> considered corrupt. Maybe the data is OK and the checksum is wrong?
>
> > btrfs_dev_stat_print_on_error: 164 callbacks suppressed
> > BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14,
> > flush 7928, corrupt 1714508, gen 1335
> > scrub_handle_errored_block: 164 callbacks suppressed
> > BTRFS error (device sdb1): unable to fixup (regular) error at
> > logical 93445255168 on dev /dev/sdb1
>
> And it can't be fixed, because...
>
> > BTRFS warning (device sdb1): checksum error at logical 93445255168
> > on dev /dev/sda1, sector
Re: Kernel bug during RAID1 replace
On Mon, 27 Jun 2016 20:14:58 -0600, Chris Murphy <li...@colorremedies.com> wrote:

> On Mon, Jun 27, 2016 at 6:49 PM, Saint Germain <saint...@gmail.com> wrote:
> >
> > I've tried both options and launched a replace, but I got the same
> > error (replace is cancelled, kernel bug).
> > I will leave these options on and attempt a ddrescue of /dev/sda
> > to /dev/sdd.
> > Then I will disconnect /dev/sda, reboot, and see if it works better.
>
> Sounds reasonable. Just make sure the file system is already unmounted
> when you use ddrescue, because otherwise you're block-copying it while
> it could be modified while rw mounted (the generation number tends to
> get incremented while rw mounted).

Well, I made a ddrescue image of both drives (only one error on sdb during
the ddrescue copy) and started the computer again (after disconnecting the
old drives). However the errors remain there, and I still cannot scrub
(scrub is aborted), nor delete the files which have errors (the drive is
remounted read-only if I try to delete the files).

I don't know if I should continue trying to repair this RAID1 or if I
should just cp/rsync to a new BTRFS volume and be done with it. On the
other hand it seems interesting to repair instead of just giving up. It
gives a good look at BTRFS resiliency/reliability.

Here is the log from the mount to the scrub aborting, and the result from
smartctl.

Thanks for your precious help so far.

BTRFS error (device sdb1): cleaner transaction attach returned -30
BTRFS info (device sdb1): disk space caching is enabled
BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14, flush 7928, corrupt 1714507, gen 1335
BTRFS info (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 21622, gen 24
scrub_handle_errored_block: 164 callbacks suppressed
BTRFS warning (device sdb1): checksum error at logical 93445255168 on dev /dev/sdb1, sector 54528696, root 5, inode 3434831, offset 479232, length 4096, links 1 (path: user/.local/share/zeitgeist/activity.sqlite-wal)
btrfs_dev_stat_print_on_error: 164 callbacks suppressed
BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14, flush 7928, corrupt 1714508, gen 1335
scrub_handle_errored_block: 164 callbacks suppressed
BTRFS error (device sdb1): unable to fixup (regular) error at logical 93445255168 on dev /dev/sdb1
BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 14, flush 7928, corrupt 1714509, gen 1335
BTRFS error (device sdb1): unable to fixup (regular) error at logical 93445259264 on dev /dev/sdb1
BTRFS warning (device sdb1): checksum error at logical 93445255168 on dev /dev/sda1, sector 77669048, root 5, inode 3434831, offset 479232, length 4096, links 1 (path: user/.local/share/zeitgeist/activity.sqlite-wal)
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 21623, gen 24
BTRFS error (device sdb1): unable to fixup (regular) error at logical 93445255168 on dev /dev/sda1
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 21624, gen 24
BTRFS error (device sdb1): unable to fixup (regular) error at logical 93445259264 on dev /dev/sda1
BTRFS warning (device sdb1): checksum error at logical 136349810688 on dev /dev/sda1, sector 140429952, root 5, inode 4265283, offset 0, length 4096, links 1 (path: user/Pictures/Picture-42-2.jpg)
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 21625, gen 24
BTRFS warning (device sdb1): checksum error at logical 136349929472 on dev /dev/sda1, sector 140430184, root 5, inode 4265283, offset 118784, length 4096, links 1 (path: user/Pictures/Picture-42-2.jpg)
BTRFS warning (device sdb1): checksum error at logical 136350060544 on dev /dev/sda1, sector 140430440, root 5, inode 4265283, offset 249856, length 4096, links 1 (path: user/Pictures/Picture-42-2.jpg)
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 21626, gen 24
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 21627, gen 24
BTRFS error (device sdb1): unable to fixup (regular) error at logical 136349810688 on dev /dev/sda1
BTRFS error (device sdb1): unable to fixup (regular) error at logical 136350060544 on dev /dev/sda1
BTRFS error (device sdb1): unable to fixup (regular) error at logical 136349929472 on dev /dev/sda1
BTRFS warning (device sdb1): checksum error at logical 136349814784 on dev /dev/sda1, sector 140429960, root 5, inode 4265283, offset 4096, length 4096, links 1 (path: user/Pictures/Picture-42-2.jpg)
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 21628, gen 24
BTRFS warning (device sdb1): checksum error at logical 136350064640 on dev /dev/sda1, sector 140430448, root 5, inode 4265283, offset 253952, length 4096, links 1 (path: user/Pictures/Picture-42-2.jpg)
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush
Re: Kernel bug during RAID1 replace
On Mon, 27 Jun 2016 18:00:34 -0600, Chris Murphy <li...@colorremedies.com> wrote : > On Mon, Jun 27, 2016 at 5:06 PM, Saint Germain <saint...@gmail.com> > wrote: > > On Mon, 27 Jun 2016 16:58:37 -0600, Chris Murphy > > <li...@colorremedies.com> wrote : > > > >> On Mon, Jun 27, 2016 at 4:55 PM, Chris Murphy > >> <li...@colorremedies.com> wrote: > >> > >> >> BTRFS info (device sdb1): dev_replace from /dev/sda1 (devid 1) > >> >> to /dev/sdd1 started scrub_handle_errored_block: 166 callbacks > >> >> suppressed BTRFS warning (device sdb1): checksum error at > >> >> logical 93445255168 on dev /dev/sda1, sector 77669048, root 5, > >> >> inode 3434831, offset 479232, length 4096, links 1 (path: > >> >> user/.local/share/zeitgeist/activity.sqlite-wal) > >> >> btrfs_dev_stat_print_on_error: 166 callbacks suppressed BTRFS > >> >> error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, > >> >> corrupt 14221, gen 24 scrub_handle_errored_block: 166 callbacks > >> >> suppressed BTRFS error (device sdb1): unable to fixup (regular) > >> >> error at logical 93445255168 on dev /dev/sda1 > >> > > >> > Shoot. You have a lot of these. It looks suspiciously like you're > >> > hitting a case list regulars are only just starting to understand > >> > >> Forget this part completely. It doesn't affect raid1. I just > >> re-read that your setup is not raid1, I don't know why I thought > >> it was raid5. > >> > >> The likely issue here is that you've got legit corruptions on sda > >> (mix of slow and flat out bad sectors), as well as a failing drive. 
> >> > >> This is also safe to issue: > >> > >> smartctl -l scterc /dev/sda > >> smartctl -l scterc /dev/sdb > >> cat /sys/block/sda/device/timeout > >> cat /sys/block/sdb/device/timeout > >> > > > > My setup is indeed RAID1 (and not RAID5) > > > > root@system:/# smartctl -l scterc /dev/sda > > smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.6.0-0.bpo.1-amd64] > > (local build) Copyright (C) 2002-14, Bruce Allen, Christian Franke, > > www.smartmontools.org > > > > SCT Error Recovery Control: > > Read: Disabled > > Write: Disabled > > > > root@system:/# smartctl -l scterc /dev/sdb > > smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.6.0-0.bpo.1-amd64] > > (local build) Copyright (C) 2002-14, Bruce Allen, Christian Franke, > > www.smartmontools.org > > > > SCT Error Recovery Control: > > Read: Disabled > > Write: Disabled > > > > root@system:/# cat /sys/block/sda/device/timeout > > 30 > > root@system:/# cat /sys/block/sdb/device/timeout > > 30 > > Good news and bad news. The bad news is this is a significant > misconfiguration, it's very common, and it means that any bad sectors > that don't result in read errors before 30 seconds will mean they > don't get fixed by Btrfs (or even mdadm or LVM raid). So they can > accumulate. > > There are two options since your drives support SCT ERC. > > 1. > smartctl -l scterc,70,70 /dev/sdX ## done for both drives > > That will make sure the drive reports a read error in 7 seconds, well > under the kernel's command timer of 30 seconds. This is how your drives > should normally be configured for RAID usage. > > 2. > echo 180 > /sys/block/sda/device/timeout > echo 180 > /sys/block/sdb/device/timeout > > This *might* actually work better in your case. If you permit the > drives to have really long error recovery, it might actually allow the > data to be returned to Btrfs and then it can start fixing problems. > Maybe. It's a long shot. And there will be upwards of 3 minute hangs. > > I would give this a shot first. 
You can issue these commands safely at > any time, no umount is needed or anything like that. I would do this > even before using cp/rsync or ddrescue because it increases the chance > the drive can recover data from these bad sectors and fix the other > drive. > > These settings are not persistent across a reboot unless you set a > udev rule or equivalent. > > On one of my drives that supports SCT ERC it only accepts the smartctl > -l command to set the timeout once. I can't change it without power > cycling the drive or it just crashes (yay firmware bugs). Just FYI > it's possible to run into other weirdness. > I've tried both options and launched a replace, but I got the same error (replace is cancelled, kernel bug)
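For reference, since Chris mentions persisting these settings with "a udev rule or equivalent": a sketch of such a rule might look like the following. This exact rule is not from the thread; the 180-second value and the sd[ab] device match are assumptions matching this setup, and the SCT ERC setting itself would still need smartctl run at boot (e.g. from rc.local), since it lives in drive firmware rather than sysfs.

```
# /etc/udev/rules.d/60-disk-timeout.rules -- hypothetical example, adjust the device match
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[ab]", ATTR{device/timeout}="180"
```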
Re: Kernel bug during RAID1 replace
On Mon, 27 Jun 2016 16:58:37 -0600, Chris Murphy wrote : > On Mon, Jun 27, 2016 at 4:55 PM, Chris Murphy > wrote: > > >> BTRFS info (device sdb1): dev_replace from /dev/sda1 (devid 1) > >> to /dev/sdd1 started scrub_handle_errored_block: 166 callbacks > >> suppressed BTRFS warning (device sdb1): checksum error at logical > >> 93445255168 on dev /dev/sda1, sector 77669048, root 5, inode > >> 3434831, offset 479232, length 4096, links 1 (path: > >> user/.local/share/zeitgeist/activity.sqlite-wal) > >> btrfs_dev_stat_print_on_error: 166 callbacks suppressed BTRFS > >> error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, > >> corrupt 14221, gen 24 scrub_handle_errored_block: 166 callbacks > >> suppressed BTRFS error (device sdb1): unable to fixup (regular) > >> error at logical 93445255168 on dev /dev/sda1 > > > > Shoot. You have a lot of these. It looks suspiciously like you're > > hitting a case list regulars are only just starting to understand > > Forget this part completely. It doesn't affect raid1. I just re-read > that your setup is not raid1, I don't know why I thought it was raid5. > > The likely issue here is that you've got legit corruptions on sda (mix > of slow and flat out bad sectors), as well as a failing drive. 
> > This is also safe to issue:
> > smartctl -l scterc /dev/sda
> > smartctl -l scterc /dev/sdb
> > cat /sys/block/sda/device/timeout
> > cat /sys/block/sdb/device/timeout

My setup is indeed RAID1 (and not RAID5)

root@system:/# smartctl -l scterc /dev/sda
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.6.0-0.bpo.1-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
  Read: Disabled
 Write: Disabled

root@system:/# smartctl -l scterc /dev/sdb
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-4.6.0-0.bpo.1-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

SCT Error Recovery Control:
  Read: Disabled
 Write: Disabled

root@system:/# cat /sys/block/sda/device/timeout
30
root@system:/# cat /sys/block/sdb/device/timeout
30
Re: Kernel bug during RAID1 replace
On Mon, 27 Jun 2016 16:55:07 -0600, Chris Murphy <li...@colorremedies.com> wrote : > On Mon, Jun 27, 2016 at 4:26 PM, Saint Germain <saint...@gmail.com> > wrote: > > >> > > > > Thanks for your help. > > > > Ok here is the log from the mounting, and including btrfs replace > > (btrfs replace start -f /dev/sda1 /dev/sdd1 /home): > > > > BTRFS info (device sdb1): disk space caching is enabled > > BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 12, > > flush 7928, corrupt 1705631, gen 1335 BTRFS info (device sdb1): > > bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14220, gen 24 > > Eek. So sdb has 11+ million write errors, flush errors, read errors, > and over 1 million corruptions. It's dying or dead. > > And sda has a dozen thousand+ corruptions. This isn't a good > combination, as you have two devices with problems and raid5 only > protects you from one device with problems. > > You were in the process of replacing sda, which is good, but it may > not be enough... > > > > BTRFS info (device sdb1): dev_replace from /dev/sda1 (devid 1) > > to /dev/sdd1 started scrub_handle_errored_block: 166 callbacks > > suppressed BTRFS warning (device sdb1): checksum error at logical > > 93445255168 on dev /dev/sda1, sector 77669048, root 5, inode > > 3434831, offset 479232, length 4096, links 1 (path: > > user/.local/share/zeitgeist/activity.sqlite-wal) > > btrfs_dev_stat_print_on_error: 166 callbacks suppressed BTRFS error > > (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt > > 14221, gen 24 scrub_handle_errored_block: 166 callbacks suppressed > > BTRFS error (device sdb1): unable to fixup (regular) error at > > logical 93445255168 on dev /dev/sda1 > > Shoot. You have a lot of these. 
It looks suspiciously like you're > hitting a case list regulars are only just starting to understand > (somewhat) where it's possible to have a legit corrupt sector that > Btrfs detects during scrub as wrong, fixes it from parity, but then > occasionally wrongly overwrites the parity with bad parity. This > doesn't cause an immediately recognizable problem. But if the volume > becomes degraded later, Btrfs must use parity to reconstruct > on-the-fly and if it hits one of these bad parities, the > reconstruction is bad, and ends up causing lots of these checksum > errors. We can tell it's not metadata corruption because a.) there's a > file listed as being affected and b.) the file system doesn't fail and > go read only. But still it means those files are likely toast... > > > [...snip many instances of checksum errors...] > > > BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush > > 0, corrupt 16217, gen 24 ata2.00: exception Emask 0x0 SAct 0x4000 > > SErr 0x0 action 0x0 ata2.00: irq_stat 0x4008 > > ata2.00: failed command: READ FPDMA QUEUED > > ata2.00: cmd 60/08:70:08:d8:70/00:00:0f:00:00/40 tag 14 ncq 4096 in > > res 41/40:00:08:d8:70/00:00:0f:00:00/40 Emask 0x409 (media > > error) ata2.00: status: { DRDY ERR } > > ata2.00: error: { UNC } > > ata2.00: configured for UDMA/133 > > sd 1:0:0:0: [sdb] tag#14 FAILED Result: hostbyte=DID_OK > > driverbyte=DRIVER_SENSE sd 1:0:0:0: [sdb] tag#14 Sense Key : Medium > > Error [current] [descriptor] sd 1:0:0:0: [sdb] tag#14 Add. Sense: > > Unrecovered read error - auto reallocate failed sd 1:0:0:0: [sdb] > > tag#14 CDB: Read(10) 28 00 0f 70 d8 08 00 00 08 00 > > blk_update_request: I/O error, dev sdb, sector 259053576 > > OK yeah so bad sector on sdb. So you have two failures because sda is > already giving you trouble while being replaced and on top of it you > now get a 2nd (partial) failure via bad sectors. 
> > So rather urgently I think you need to copy things off this volume if > you don't already have a backup so you can save as much as possible. > Don't write to the drives. You might even consider 'mount -o > remount,ro' to avoid anything writing to the volume. Copy the most > important data first, triage time. > > While that happens you can safely collect some more information: > > btrfs fi us > smartctl -x ## for both drives

Ok thanks I will begin to make an image with dd. Do you recommend to use sda or sdb ?

In the meantime here is the info requested:

btrfs fi us /home
Overall:
    Device size:          3.63TiB
    Device allocated:     2.76TiB
    Device unallocated: 888.51GiB
    Device missing:         0.00B
    Used:                 2.62TiB
    Free (estimated):   517.56GiB  (min: 517.56GiB)
    Data ratio:              2.00
    Metadata ratio:
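As an aside, Chris's "copy the most important data first, triage time" advice can be scripted as a priority-ordered copy loop. Below is a minimal, self-contained sketch; the directory names and demo data are invented for illustration, and in real use SRC would be the (ideally read-only remounted) btrfs mount and DEST the rescue disk:

```shell
#!/bin/sh
# Triage-copy sketch: walk a priority-ordered list of directories,
# copying whatever exists, most important first.
# Demo data is created locally so the script runs as-is; the
# directory names (Documents, Pictures, Mail) are examples only.
set -eu
SRC=demo-src
DEST=demo-dest
mkdir -p "$SRC/Documents" "$DEST"
echo "important" > "$SRC/Documents/notes.txt"
for d in Documents Pictures Mail; do    # priority order
    if [ -d "$SRC/$d" ]; then
        cp -a "$SRC/$d" "$DEST/"
    fi
done
ls "$DEST"    # prints "Documents"
```

The point of the loop shape is that a read error on a low-priority directory cannot prevent the high-priority ones from having been copied first.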
Re: Kernel bug during RAID1 replace
On Mon, 27 Jun 2016 15:42:42 -0600, Chris Murphy <li...@colorremedies.com> wrote : > On Mon, Jun 27, 2016 at 3:36 PM, Saint Germain <saint...@gmail.com> > wrote: > > Hello, > > > > I am on Debian Jessie with a kernel from backports: > > 4.6.0-0.bpo.1-amd64 > > > > I am also using btrfs-tools 4.4.1-1.1~bpo8+1 > > > > When trying to replace a RAID1 drive (with btrfs replace start > > -f /dev/sda1 /dev/sdd1), the operation is cancelled after completing > > only 5%. > > > > I got this error in the /var/log/syslog: > > [ cut here ] > > WARNING: CPU: 2 PID: 2617 > > at /build/linux-9LouV5/linux-4.6.1/fs/btrfs/dev-replace.c:430 > > btrfs_dev_replace_start+0x2be/0x400 [btrfs] Modules linked in: > > uas(E) usb_storage(E) bnep(E) ftdi_sio(E) usbserial(E) > > snd_hda_codec_hdmi(E) nls_utf8(E) nls_cp437(E) vfat(E) fat(E) > > intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) > > coretemp(E) kvm_intel(E) kvm(E) iTCO_wdt(E) irqbypass(E) > > iTCO_vendor_support(E) crct10dif_pclmul(E) crc32_pclmul(E) > > ghash_clmulni_intel(E) hmac(E) drbg(E) ansi_cprng(E) aesni_intel(E) > > aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) > > cryptd(E) wl(POE) btusb(E) btrtl(E) btbcm(E) btintel(E) cfg80211(E) > > bluetooth(E) efi_pstore(E) snd_hda_codec_realtek(E) evdev(E) > > crc16(E) serio_raw(E) pcspkr(E) efivars(E) joydev(E) > > snd_hda_codec_generic(E) rfkill(E) snd_hda_intel(E) nuvoton_cir(E) > > rc_core(E) snd_hda_codec(E) i915(E) battery(E) snd_hda_core(E) > > snd_hwdep(E) soc_button_array(E) tpm_tis(E) drm_kms_helper(E) > > intel_smartconnect(E) snd_pcm(E) tpm(E) video(E) i2c_i801(E) > > snd_timer(E) drm(E) snd(E) lpc_ich(E) i2c_algo_bit(E) soundcore(E) > > mfd_core(E) mei_me(E) processor(E) button(E) mei(E) shpchp(E) > > fuse(E) autofs4(E) hid_logitech_hidpp(E) btrfs(E) > > hid_logitech_dj(E) usbhid(E) hid(E) xor(E) raid6_pq(E) sg(E) > > sr_mod(E) cdrom(E) sd_mod(E) crc32c_intel(E) ahci(E) libahci(E) > > libata(E) psmouse(E) scsi_mod(E) xhci_pci(E) ehci_pci(E) > > 
xhci_hcd(E) ehci_hcd(E) e1000e(E) usbcore(E) ptp(E) pps_core(E) > > usb_common(E) fjes(E) CPU: 2 PID: 2617 Comm: btrfs Tainted: > > P OE 4.6.0-0.bpo.1-amd64 #1 Debian 4.6.1-1~bpo8+1 > > Hardware name: To Be Filled By O.E.M. To Be Filled By > > O.E.M./Z87E-ITX, BIOS P2.10 10/04/2013 0286 > > f0ba7fe7 813123c5 > > 8107af94 880186caf000 fffb 8800c76b0800 > > 8800cae7 8800cae70ee0 7ffdd5397d98 Call Trace: > > [] ? dump_stack+0x5c/0x77 [] ? > > __warn+0xc4/0xe0 [] ? > > btrfs_dev_replace_start+0x2be/0x400 [btrfs] [] ? > > btrfs_ioctl+0x1d42/0x2190 [btrfs] [] ? > > handle_mm_fault+0x154d/0x1cb0 [] ? > > do_vfs_ioctl+0x99/0x5d0 [] ? SyS_ioctl+0x76/0x90 > > [] ? system_call_fast_compare_end+0xc/0x96 > > ---[ end trace 9fbfaa137cc5a72a ]--- > > > > > > > > What should I do to replace correctly my drive ? > > I don't often see handle_mm_fault with btrfs problems, maybe the > entire dmesg from mounting the fs and including btrfs replace would > reveal a related problem that instigates the failure? > > If the device being replaced is acting unreliably, then you'd want to > use -r with replace to ignore that device unless it's absolutely > necessary to read from it. > Thanks for your help. 
Ok here is the log from the mounting, and including btrfs replace (btrfs replace start -f /dev/sda1 /dev/sdd1 /home):

BTRFS info (device sdb1): disk space caching is enabled
BTRFS info (device sdb1): bdev /dev/sdb1 errs: wr 11881695, rd 12, flush 7928, corrupt 1705631, gen 1335
BTRFS info (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14220, gen 24
BTRFS info (device sdb1): dev_replace from /dev/sda1 (devid 1) to /dev/sdd1 started
scrub_handle_errored_block: 166 callbacks suppressed
BTRFS warning (device sdb1): checksum error at logical 93445255168 on dev /dev/sda1, sector 77669048, root 5, inode 3434831, offset 479232, length 4096, links 1 (path: user/.local/share/zeitgeist/activity.sqlite-wal)
btrfs_dev_stat_print_on_error: 166 callbacks suppressed
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14221, gen 24
scrub_handle_errored_block: 166 callbacks suppressed
BTRFS error (device sdb1): unable to fixup (regular) error at logical 93445255168 on dev /dev/sda1
BTRFS error (device sdb1): bdev /dev/sda1 errs: wr 0, rd 0, flush 0, corrupt 14222, gen 24
BTRFS error (device sdb1): unable to fixup (regular) error at logical 93445259264 on dev /dev/sda1
BTRFS warning (device sdb1): checksum error at l
Kernel bug during RAID1 replace
Hello,

I am on Debian Jessie with a kernel from backports: 4.6.0-0.bpo.1-amd64
I am also using btrfs-tools 4.4.1-1.1~bpo8+1

When trying to replace a RAID1 drive (with btrfs replace start -f /dev/sda1 /dev/sdd1), the operation is cancelled after completing only 5%.

I got this error in the /var/log/syslog:

[ cut here ]
WARNING: CPU: 2 PID: 2617 at /build/linux-9LouV5/linux-4.6.1/fs/btrfs/dev-replace.c:430 btrfs_dev_replace_start+0x2be/0x400 [btrfs]
Modules linked in: uas(E) usb_storage(E) bnep(E) ftdi_sio(E) usbserial(E) snd_hda_codec_hdmi(E) nls_utf8(E) nls_cp437(E) vfat(E) fat(E) intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) iTCO_wdt(E) irqbypass(E) iTCO_vendor_support(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) hmac(E) drbg(E) ansi_cprng(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) wl(POE) btusb(E) btrtl(E) btbcm(E) btintel(E) cfg80211(E) bluetooth(E) efi_pstore(E) snd_hda_codec_realtek(E) evdev(E) crc16(E) serio_raw(E) pcspkr(E) efivars(E) joydev(E) snd_hda_codec_generic(E) rfkill(E) snd_hda_intel(E) nuvoton_cir(E) rc_core(E) snd_hda_codec(E) i915(E) battery(E) snd_hda_core(E) snd_hwdep(E) soc_button_array(E) tpm_tis(E) drm_kms_helper(E) intel_smartconnect(E) snd_pcm(E) tpm(E) video(E) i2c_i801(E) snd_timer(E) drm(E) snd(E) lpc_ich(E) i2c_algo_bit(E) soundcore(E) mfd_core(E) mei_me(E) processor(E) button(E) mei(E) shpchp(E) fuse(E) autofs4(E) hid_logitech_hidpp(E) btrfs(E) hid_logitech_dj(E) usbhid(E) hid(E) xor(E) raid6_pq(E) sg(E) sr_mod(E) cdrom(E) sd_mod(E) crc32c_intel(E) ahci(E) libahci(E) libata(E) psmouse(E) scsi_mod(E) xhci_pci(E) ehci_pci(E) xhci_hcd(E) ehci_hcd(E) e1000e(E) usbcore(E) ptp(E) pps_core(E) usb_common(E) fjes(E)
CPU: 2 PID: 2617 Comm: btrfs Tainted: P OE 4.6.0-0.bpo.1-amd64 #1 Debian 4.6.1-1~bpo8+1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87E-ITX, BIOS P2.10 10/04/2013
0286 f0ba7fe7 813123c5 8107af94 880186caf000 fffb 8800c76b0800 8800cae7 8800cae70ee0 7ffdd5397d98
Call Trace:
[] ? dump_stack+0x5c/0x77
[] ? __warn+0xc4/0xe0
[] ? btrfs_dev_replace_start+0x2be/0x400 [btrfs]
[] ? btrfs_ioctl+0x1d42/0x2190 [btrfs]
[] ? handle_mm_fault+0x154d/0x1cb0
[] ? do_vfs_ioctl+0x99/0x5d0
[] ? SyS_ioctl+0x76/0x90
[] ? system_call_fast_compare_end+0xc/0x96
---[ end trace 9fbfaa137cc5a72a ]---

What should I do to replace my drive correctly ?

Thanks in advance,
Re: BTRFS with RAID1 cannot boot when removing drive
On Fri, 14 Feb 2014 15:33:10 +0100, Saint Germain saint...@gmail.com wrote : On 11 February 2014 03:30, Saint Germain saint...@gmail.com wrote: I am experimenting with BTRFS and RAID1 on my Debian Wheezy (with backported kernel 3.12-0.bpo.1-amd64) using a motherboard with UEFI. I have installed Debian with the following partition on the first hard drive (no BTRFS subsystem): /dev/sda1: for / (BTRFS) /dev/sda2: for /home (BTRFS) /dev/sda3: for swap Then I added another drive for a RAID1 configuration (with btrfs balance) and I installed grub on the second hard drive with grub-install /dev/sdb. You should be able to mount a two-device btrfs raid1 filesystem with only a single device with the degraded mount option, tho I believe current kernels refuse a read-write mount in that case, so you'll have read-only access until you btrfs device add a second device, so it can do normal raid1 mode once again. Meanwhile, I don't believe it's on the wiki, but it's worth noting my experience with btrfs raid1 mode in my pre-deployment tests. Actually, with the (I believe) mandatory read-only mount if raid1 is degraded below two devices, this problem's going to be harder to run into than it was in my testing several kernels ago, but here's what I found: But as I said, if btrfs only allows read-only mounts of filesystems without enough devices to properly complete the raidlevel, that shouldn't be as big an issue these days, since it should be more difficult or impossible to get the two devices separately mounted writable in the first place, with the consequence that the differing copies issue will be difficult or impossible to trigger in the first place. =:^) Hello, With your advice and Chris's, I have now a (clean ?) 
partition to start experimenting with RAID1 (and which boots correctly in UEFI mode):

sda1 = BIOS Boot partition
sda2 = EFI System Partition
sda3 = BTRFS partition
sda4 = swap partition

For the moment I haven't created subvolumes (for / and for /home for instance) to keep things simple. The idea is then to create a RAID1 with a sdb drive (duplicate sda partitioning, add/balance/convert sdb3 + grub-install on sdb, add sdb swap UUID in /etc/fstab), shutdown and remove sda to check the procedure to replace it. I read the last thread on the subject ('lost with degraded RAID1'), but would like to really confirm what the current approved procedure is and whether it will remain valid for future BTRFS versions (especially regarding the read-only mount). So what should I do from there ? Here are a few questions:

1) Boot in degraded mode: currently with my kernel (3.12-0.bpo.1-amd64, from Debian wheezy-backports) it seems that I can mount in read-write mode. However with future kernels, it seems that I will only be able to mount read-only ? See here:
http://www.spinics.net/lists/linux-btrfs/msg20164.html
https://bugzilla.kernel.org/show_bug.cgi?id=60594

2) If I am able to mount read-write, is this the correct procedure:
a) place a new drive in another physical location sdc (I don't think I can use the same sda physical location ?)
b) boot in degraded mode on sdb
c) use the 'replace' command to replace sda by sdc
d) perhaps a 'balance' is necessary ?

3) Can I also use the above procedure if I am only allowed to mount read-only ?

4) If I want to use my system without RAID1 support (dangerous I know), after booting in degraded mode with read-write, can I convert sdb back from RAID1 to RAID0 in a safe way ? (btrfs balance start -dconvert=raid0 -mconvert=raid0 /)

To continue with this RAID1 recovery procedure (Debian stable with kernel 3.12-0.bpo.1-amd64), I tried to reproduce Duncan's setup and the result is not good. 
Starting with a clean setup of 2 hard drives in RAID1 (sda and sdb) and a clean snapshot of the rootfs:

1) poweroff, disconnect sda and boot on sdb with rootflags=ro,degraded
2) sdb is mounted ro but automatically remounted read-write by the initramfs
3) create a file witness1 and modify a file test.txt with 'alpha' inside
4) poweroff, connect sda, disconnect sdb and boot on sda
5) create a file witness2 and modify a file test.txt with 'beta' inside
6) poweroff, connect sdb and boot on sda
7) the modifications from step 3 are there (but not from step 5)
8) launch scrub: a lot of errors are detected but no unrepairable errors
9) poweroff, disconnect sdb, boot on sda
10) the modifications from step 3 are there (but not from step 5)
11) poweroff, boot on sda: kernel panic on startup
12) reboot, boot is possible
13) launch scrub: a lot of errors and kernel error
14) reboot, error on boot, and same error as step 13 with scrub
15) boot on previous snapshot of step 1, same error on boot and same error as step 13 with scrub

I hope that it will be useful for someone. It seems that mounting read-write is really not a good idea (have to find how to force ro with Debian). The RAID1
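On the "force ro with Debian" point: one possible (untested, hedged) approach is to pin the root filesystem read-only from both the kernel command line and fstab, so the initramfs and init scripts have no reason to remount it read-write. Something along these lines, with the UUID deliberately elided:

```
# /etc/default/grub (sketch -- run update-grub afterwards):
GRUB_CMDLINE_LINUX_DEFAULT="rootflags=degraded ro"

# /etc/fstab (sketch -- the ro option here is what stops the rw remount):
# UUID=...  /  btrfs  ro,degraded  0  0
```

Whether the Debian initramfs honours this in all cases is an assumption worth verifying before relying on it in a degraded-boot test.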
Re: BTRFS with RAID1 cannot boot when removing drive
On 11 February 2014 03:30, Saint Germain saint...@gmail.com wrote: I am experimenting with BTRFS and RAID1 on my Debian Wheezy (with backported kernel 3.12-0.bpo.1-amd64) using a motherboard with UEFI. I have installed Debian with the following partition on the first hard drive (no BTRFS subsystem): /dev/sda1: for / (BTRFS) /dev/sda2: for /home (BTRFS) /dev/sda3: for swap Then I added another drive for a RAID1 configuration (with btrfs balance) and I installed grub on the second hard drive with grub-install /dev/sdb.

You should be able to mount a two-device btrfs raid1 filesystem with only a single device with the degraded mount option, tho I believe current kernels refuse a read-write mount in that case, so you'll have read-only access until you btrfs device add a second device, so it can do normal raid1 mode once again. Meanwhile, I don't believe it's on the wiki, but it's worth noting my experience with btrfs raid1 mode in my pre-deployment tests. Actually, with the (I believe) mandatory read-only mount if raid1 is degraded below two devices, this problem's going to be harder to run into than it was in my testing several kernels ago, but here's what I found: But as I said, if btrfs only allows read-only mounts of filesystems without enough devices to properly complete the raidlevel, that shouldn't be as big an issue these days, since it should be more difficult or impossible to get the two devices separately mounted writable in the first place, with the consequence that the differing copies issue will be difficult or impossible to trigger in the first place. =:^)

Hello,

With your advice and Chris's, I have now a (clean ?) partition to start experimenting with RAID1 (and which boots correctly in UEFI mode):

sda1 = BIOS Boot partition
sda2 = EFI System Partition
sda3 = BTRFS partition
sda4 = swap partition

For the moment I haven't created subvolumes (for / and for /home for instance) to keep things simple. 
The idea is then to create a RAID1 with a sdb drive (duplicate sda partitioning, add/balance/convert sdb3 + grub-install on sdb, add sdb swap UUID in /etc/fstab), shutdown and remove sda to check the procedure to replace it. I read the last thread on the subject ('lost with degraded RAID1'), but would like to really confirm what the current approved procedure is and whether it will remain valid for future BTRFS versions (especially regarding the read-only mount). So what should I do from there ? Here are a few questions:

1) Boot in degraded mode: currently with my kernel (3.12-0.bpo.1-amd64, from Debian wheezy-backports) it seems that I can mount in read-write mode. However with future kernels, it seems that I will only be able to mount read-only ? See here:
http://www.spinics.net/lists/linux-btrfs/msg20164.html
https://bugzilla.kernel.org/show_bug.cgi?id=60594

2) If I am able to mount read-write, is this the correct procedure:
a) place a new drive in another physical location sdc (I don't think I can use the same sda physical location ?)
b) boot in degraded mode on sdb
c) use the 'replace' command to replace sda by sdc
d) perhaps a 'balance' is necessary ?

3) Can I also use the above procedure if I am only allowed to mount read-only ?

4) If I want to use my system without RAID1 support (dangerous I know), after booting in degraded mode with read-write, can I convert sdb back from RAID1 to RAID0 in a safe way ? (btrfs balance start -dconvert=raid0 -mconvert=raid0 /)

5) Perhaps a recovery procedure which includes booting on a different rescue disk would be more appropriate ?

Thanks again,
Re: BTRFS partitioning scheme (was BTRFS with RAID1 cannot boot when removing drive)
On Thu, 13 Feb 2014 10:43:08 -0700, Chris Murphy li...@colorremedies.com wrote : sda3 = 1 TiB root partition (BTRFS), mounted on / sda4 = 6 GiB swap partition (that way I should be able to be compatible with both CSM or UEFI) B) normal Debian installation on sdas, activate the CSM on the motherboard and reboot. C) apt-get install grub-efi-amd64 and grub-install /dev/sda And the problems begin: 1) grub-install doesn't give any error but using the --debug I can see that it is not using EFI. 2) Ok I force with grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=grub --recheck --debug /dev/sda 3) This time something is generated in /boot/efi: /boot/efi/EFI/grub/grubx64.efi 4) Copy the file /boot/efi/EFI/grub/grubx64.efi to /boot/efi/EFI/boot/bootx64.efi is EFI/boot/ correct here? If you're lucky then your BIOS will tell what path it will try to read for the boot code. For me that is /EFI/debian/grubx64.efi. I followed the advices here (first result on Google with grub uefi debian): http://tanguy.ortolo.eu/blog/article51/debian-efi 5) Reboot and disable the CSM on the motherboard 6) No boot possible, I always go directly to the UEFI-BIOS I am currently stuck there. I read a lot of conflicting advises which doesn't work: - use modprobe efivars and efibootmgr: not possible because I have not booted in EFI (chicken-egg problem) Not exactly. Boot in EFI mode into your favourite installer rescue mode, then chroot into the target filesystem and run efibootmgr there. 
In the end I managed to do it like this:

1) Make a USB stick with a FAT32 partition
2) Install grub on it with: grub-install --target=x86_64-efi --efi-directory=/media/usb0 --removable
3) Note on paper the grub commands to start the kernel in /boot/grub/grub.cfg
4) Reboot, disable CSM in the motherboard boot utility (BIOS?), reboot with the USB stick connected
5) Normally it should have started on the USB stick grub command-line
6) Enter the necessary commands to start the kernel (if you have some problem with video mode, use insmod efi_gop)
7) Normally your operating system should start normally
8) Check that efibootmgr is installed and working (normally efivars should be loaded in the modules already)
9) grub-install --efi-directory=/boot/efi --recheck --debug (with the debug info you should see that it is using grub-efi and not grub-pc)
10) efibootmgr -c -d /dev/sda -p 2 -w -L Debian (GRUB) -l '\EFI\Debian\grubx64.efi' (replace -p 2 with your correct ESP partition number)
11) Reboot and enjoy !

OK at least with GRUB 2.00 I never have to use any options with grub-install when installing to a chrooted system. It also even writes the proper entry into NVRAM for me, I don't have to use efibootmgr.

Yes you are right, this is probably unnecessary (see below).

Also I've never had single \ work with efibootmgr from shell. I have to use \\. Try typing efibootmgr -v to see the actual entry you created and whether it has the \ in the path.

Here is the output:

BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,
Boot* debian HD(2,7d8,106430,5d012c09-b70d-4225-ae18-9831f997c493)File(\EFI\debian\grubx64.efi)
Boot0001* Debian (GRUB) HD(2,7d8,106430,5d012c09-b70d-4225-ae18-9831f997c493)File(\EFI\Debian\grubx64.efi)

Ah the joy of FAT32 and the case sensitivity ! So it seems that grub-install automatically installs the correct entry and using efibootmgr was unnecessary. However it seems that single '\' works. 
But one thing that explains why the UEFI bootloading stuff is confusing for you is that every distro keeps their own grub patches. So there is very possibly a lot of difference between the downstream grub behaviors, and upstream.

Understood. That is why I took the step to describe what I did. Perhaps it will be useful for others (most info on the topic was not for Debian...).

Thanks again for your insights !
Re: BTRFS partitioning scheme (was BTRFS with RAID1 cannot boot when removing drive)
On 13 February 2014 09:50, Frank Kingswood fr...@kingswood-consulting.co.uk wrote: On 12/02/14 17:13, Saint Germain wrote: Ok based on your advice, here is what I have done so far to use UEFI (remember that the objective is to have a clean and simple BTRFS RAID1 install). A) I start first with only one drive, I have gone with the following partition scheme (Debian wheezy, kernel 3.12, grub 2.00, GPT partition with parted): sda1 = 1MiB BIOS Boot partition (no FS, set 1 bios_grub on with parted to set the type) sda2 = 550 MiB EFI System Partition (FAT32, toggle 2 boot with parted to set the type), mounted on /boot/efi

I'm curious, why so big? There's only one file of about 100kb there, and I was considering shrinking mine to the minimum possible (which seems to be about 33 MB).

It is quite difficult to find reliable information on this whole UEFI boot with linux (info you can find for sure, but which ones to follow ? there is so much different info out there). So I don't know if this 550 MiB is an urban legend or not, but you can find several people recommending it and the reason why:
http://askubuntu.com/questions/336439/any-problems-with-this-partition-scheme
http://askubuntu.com/questions/287441/different-uses-of-term-efi-partition
https://bbs.archlinux.org/viewtopic.php?pid=1306753
http://forums.gentoo.org/viewtopic-p-7352214.html
Other people recommend around 200-300 MiB, so I basically took the upper limit to see what happens. If you have more reliable info on the topic I would be interested !

sda3 = 1 TiB root partition (BTRFS), mounted on / sda4 = 6 GiB swap partition (that way I should be able to be compatible with both CSM or UEFI) B) normal Debian installation on sdas, activate the CSM on the motherboard and reboot. C) apt-get install grub-efi-amd64 and grub-install /dev/sda And the problems begin: 1) grub-install doesn't give any error but using the --debug I can see that it is not using EFI. 
2) Ok, I force it with grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=grub --recheck --debug /dev/sda. 3) This time something is generated in /boot/efi: /boot/efi/EFI/grub/grubx64.efi. 4) Copy the file /boot/efi/EFI/grub/grubx64.efi to /boot/efi/EFI/boot/bootx64.efi.

Is EFI/boot/ correct here? If you're lucky then your BIOS will tell you what path it will try to read for the boot code. For me that is /EFI/debian/grubx64.efi.

I followed the advice here (first result on Google for "grub uefi debian"): http://tanguy.ortolo.eu/blog/article51/debian-efi

5) Reboot and disable the CSM on the motherboard. 6) No boot possible; I always go directly to the UEFI BIOS. I am currently stuck there. I read a lot of conflicting advice, none of which worked: - use modprobe efivars and efibootmgr: not possible because I have not booted in EFI (chicken-and-egg problem).

Not exactly. Boot in EFI mode into your favourite installer's rescue mode, then chroot into the target filesystem and run efibootmgr there.
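The layout and the forced EFI install described above can be condensed into a command transcript. This is a sketch, not a verbatim reproduction of the poster's session: /dev/sda and the partition boundaries are example values taken from the thread, and everything here is destructive, so it should only be run against a disk you intend to wipe.

```shell
# Sketch of the GPT layout from the thread (parted, GRUB 2, Debian wheezy).
# /dev/sda is an example device name -- double-check before running.
parted -s /dev/sda mklabel gpt
parted -s /dev/sda mkpart primary 1MiB 2MiB           # sda1: BIOS Boot partition
parted -s /dev/sda set 1 bios_grub on
parted -s /dev/sda mkpart primary fat32 2MiB 552MiB   # sda2: 550 MiB ESP
parted -s /dev/sda set 2 boot on
mkfs.vfat -F 32 /dev/sda2                             # ESP must be FAT
# sda3 (btrfs root) and sda4 (swap) follow the same mkpart pattern.

# Step 2 of the thread: force the EFI target explicitly, since a plain
# grub-install may silently fall back to the i386-pc (BIOS) target.
grub-install --target=x86_64-efi --efi-directory=/boot/efi \
             --bootloader-id=grub --recheck /dev/sda
```

Forcing --target=x86_64-efi is the key point: it makes grub-install fail loudly if the EFI prerequisites (mounted ESP, grub-efi-amd64 modules) are missing, instead of quietly installing a BIOS boot sector.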
In the end I managed to do it like this: 1) Make a USB stick with a FAT32 partition. 2) Install grub on it with: grub-install --target=x86_64-efi --efi-directory=/media/usb0 --removable 3) Note down on paper the grub commands needed to start the kernel, from /boot/grub/grub.cfg. 4) Reboot, disable the CSM in the motherboard setup utility, and reboot with the USB stick connected. 5) Normally it should have started on the USB stick's grub command line. 6) Enter the necessary commands to start the kernel (if you have some problem with the video mode, use insmod efi_gop). 7) Normally your operating system should start. 8) Check that efibootmgr is installed and working (normally efivars should already be loaded). 9) grub-install --efi-directory=/boot/efi --recheck --debug (with the debug info you should see that it is using grub-efi and not grub-pc). 10) efibootmgr -c -d /dev/sda -p 2 -w -L "Debian (GRUB)" -l '\EFI\Debian\grubx64.efi' (replace "-p 2" with your correct ESP partition number). 11) Reboot and enjoy!

I made a lot of mistakes during these steps. The good thing is that the errors are quite verbose, so you can easily see what is going wrong. I hope that it will be easier for the next Debian user. So now I can continue on this BTRFS RAID1 adventure... Let's see if my setup is resilient to a hard drive failure. Thanks for the help. Most comments here are quite on the spot and reliable, which is very helpful! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BTRFS partitioning scheme (was BTRFS with RAID1 cannot boot when removing drive)
On 11 February 2014 19:15, Chris Murphy li...@colorremedies.com wrote:

To summarize, I think I have 3 options for partitioning (I am not considering UEFI secure boot or swap): 1) grub, BTRFS partition (i.e. full disk in BTRFS), /boot inside a BTRFS subvolume.

This doesn't seem like a good idea; a boot drive shouldn't be without partitions.

2) grub, GPT partitioning, with (A) on sda1 and a BTRFS partition on sda2, /boot inside a BTRFS subvolume. 3) grub, GPT partitioning, with (A) on sda1, /boot (ext4) on sda2, and BTRFS on sda3. (A) = BIOS Boot partition (1 MiB) or EFI System Partition (FAT32, 550 MiB). I don't really see the point of having UEFI/ESP if I don't use any other proprietary operating system, so I think I will go with (A) = BIOS Boot partition, unless there is something I have missed.

You need to boot your system in UEFI and CSM-BIOS modes, and compare the dmesg for each. I'm finding it common that the CSM limits power management, and relegates drives to IDE speeds rather than full SATA link speeds. Sometimes it's unavoidable to use the CSM if it has better overall behavior for your use case. I've found it to be lacking and have abandoned it. It's basically intended for booting Windows XP, right?

Ok, based on your advice, here is what I have done so far to use UEFI (remember that the objective is to have a clean and simple BTRFS RAID1 install). A) I start first with only one drive, with the following partition scheme (Debian wheezy, kernel 3.12, grub 2.00, GPT partition table made with parted): sda1 = 1 MiB BIOS Boot partition (no FS, "set 1 bios_grub on" in parted to set the type); sda2 = 550 MiB EFI System Partition (FAT32, "toggle 2 boot" in parted to set the type), mounted on /boot/efi; sda3 = 1 TiB root partition (BTRFS), mounted on /; sda4 = 6 GiB swap partition (that way I should be compatible with both CSM and UEFI). B) Normal Debian installation on sda, activate the CSM on the motherboard and reboot.
C) apt-get install grub-efi-amd64 and grub-install /dev/sda. And the problems begin: 1) grub-install doesn't give any error, but using --debug I can see that it is not using EFI. 2) Ok, I force it with grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=grub --recheck --debug /dev/sda. 3) This time something is generated in /boot/efi: /boot/efi/EFI/grub/grubx64.efi. 4) Copy the file /boot/efi/EFI/grub/grubx64.efi to /boot/efi/EFI/boot/bootx64.efi. 5) Reboot and disable the CSM on the motherboard. 6) No boot possible; I always go directly to the UEFI BIOS.

I am currently stuck there. I read a lot of conflicting advice, none of which worked: - use modprobe efivars and efibootmgr: not possible because I have not booted in EFI (chicken-and-egg problem); - use update-grub or grub-mkconfig (to generate /boot/efi/grub/grub.cfg): no results; - other exotic commands... So I will try to upgrade to grub 2.02beta (as recommended by Chris Murphy), but I am not sure that it will help. If someone has some Debian experience with this UEFI install, please don't hesitate to propose solutions! I will continue to document this experience (hoping that it will be useful for others), and hope to get to the point where I can have a good system in BTRFS RAID1 mode. You have to be very motivated to get into this; it is really a challenge! ;-)
Re: BTRFS with RAID1 cannot boot when removing drive
On 11 February 2014 21:35, Duncan 1i5t5.dun...@cox.net wrote: Saint Germain posted on Tue, 11 Feb 2014 11:04:57 +0100 as excerpted:

The big problem I currently have is that based on your input, I hesitate a lot on my partitioning scheme: should I use a dedicated /boot partition or should I have one global BTRFS partition? It is not very clear in the doc (a lot of people used a dedicated /boot because at that time grub couldn't natively boot BTRFS, it seems, but that has changed). Could you recommend a partitioning scheme for a simple RAID1 with 2 identical hard drives (just for home computing, not business)?

FWIW... I'm planning to, and have your previous message covering that still marked unread to reply to later. But real life has temporarily been monopolizing my time, so the last day or two I've only done relatively short and quick replies. That one will require a bit more time to answer to my satisfaction. So I'm punting for the moment. But FWIW I tend to be a reasonably heavy partitioner (tho nowhere near what I used to be), so a lot of folks will consider my setup somewhat extreme. That's OK. It's my computer, setup for my purposes, not their computer for theirs, and it works very well for me, so it's all good. =:^) But hopefully I'll get back to that with a longer reply by the end of the week. If I don't, you can probably consider that monopoly lasting longer than I thought, and it could be that I'll never get back to properly reply. But it's an interesting enough topic to me that I'll /probably/ get back, just not right ATM.

No problem, I have started another thread where we discuss partitioning. It may be slightly off-topic, but the intention is really to have a BTRFS-friendly partitioning scheme. For instance it seems that a dedicated /boot partition instead of a subvolume is not necessary (better to have /boot in the RAID1). However I'm no expert. Thanks for your help.
Re: BTRFS with RAID1 cannot boot when removing drive
On 11 February 2014 07:59, Duncan 1i5t5.dun...@cox.net wrote: Saint Germain posted on Tue, 11 Feb 2014 04:15:27 +0100 as excerpted:

Ok, I need to really understand how my motherboard works (new Z87E-ITX). It says "64Mb AMI UEFI Legal BIOS", so I thought it was really UEFI.

I expect it's truly UEFI. But from what I've read, most UEFI-based firmware (possibly all in theory, with the caveat that there's bugs and some might not actually work as intended due to bugs) on x86/amd64 (as opposed to arm) has a legacy-BIOS mode fallback. Provided it's not in secure-boot mode, if the storage devices it is presented don't have a valid UEFI config, it'll fall back to legacy-BIOS mode and try to detect and boot that. Which may or may not be what your system is actually doing. As I said, since I've not actually experimented with UEFI here, my practical knowledge on it is virtually nil, and I don't claim to have studied the theory well enough to deduce in that level of detail what your system is doing. But I know that's how it's /supposed/ to be able to work. =:^)

Hello Duncan, yes, I also suspect something like that. Unfortunately it is not really clear on their website how it works. You can find a lot of marketing stuff, but not what is really needed to boot properly!

(FWIW, what I /have/ done, deliberately, is read enough about UEFI to have a general feel for it, and to have been previously exposed to the ideas for some time, so that once I /do/ have it available and decide it's time, I'll be able to come up to speed relatively quickly, as I've had the general ideas turning over in my head for quite some time already; in effect I'll simply be reviewing the theory and doing the lab work, while concurrently making logical connections about how it all fits together that only happen once one actually does that lab work.
I've discovered over the years that this is perhaps my most effective way to learn: read about the general principles while not really understanding it the first time thru, then come back to it some months or years later, and I pick it up real fast, because my subconscious has been working on the problem the whole time! Come to think of it, that's actually how I handled btrfs, too, trying it at one point and deciding it didn't fit my needs at the time, leaving it for awhile, then coming back to it later when my needs had changed, but I already had an idea what I was doing from the previous try, with the result being I really took to it fast, the second time! =:^)

I'll try to keep that in mind!

I understand. Normally the swap will only be used for hibernating. I don't expect to use it except perhaps in some extreme case.

If hibernate is your main swap usage, you might consider the noauto fstab option as well, then specifically swapon the appropriate one in your hibernate script, since you may well need logic in there to figure out which one to use in any case. I was doing that for awhile. (I've run my own suspend/hibernate scripts based on the documentation in $KERNDIR/Documentation/power/*, for years. The kernel's docs dir really is a great resource for a lot of sysadmin-level stuff as well as the expected kernel-developer stuff. I think few are aware of just how much real useful admin-level information it actually contains. =:^)

I am not so used to working without swap. I've worked for years with a computer with low RAM and a swap, and I didn't have any problem (even when doing some RAM-intensive tasks). So I haven't tried to remove it ;-) If there is sufficient RAM, I suppose that the swap doesn't get used, so it is not a problem to leave it? I hesitated a long time between ZFS and BTRFS, and one of the cases for ZFS was that it can natively handle swap on a volume (and so in theory swap can be part of the RAID1 as well).
However, the folks at ZFS also seem to consider swap a relic of the past. I guess I will keep it just in case. ;-)

The big problem I currently have is that based on your input, I hesitate a lot on my partitioning scheme: should I use a dedicated /boot partition or should I have one global BTRFS partition? It is not very clear in the doc (a lot of people used a dedicated /boot because at that time grub couldn't natively boot BTRFS, it seems, but that has changed). Could you recommend a partitioning scheme for a simple RAID1 with 2 identical hard drives (just for home computing, not business)? Many thanks!
BTRFS partitioning scheme (was BTRFS with RAID1 cannot boot when removing drive)
Hello and thanks for your feedback! Cc back to the mailing list, as it may be of interest here as well. On 11 February 2014 16:11, Kyle Gates kylega...@hotmail.com wrote:

The big problem I currently have is that based on your input, I hesitate a lot on my partitioning scheme: should I use a dedicated /boot partition or should I have one global BTRFS partition? It is not very clear in the doc (a lot of people used a dedicated /boot because at that time grub couldn't natively boot BTRFS, it seems, but that has changed). Could you recommend a partitioning scheme for a simple RAID1 with 2 identical hard drives (just for home computing, not business)?

I run a 1 GiB RAID1 btrfs /boot in mixed mode with grub2 and GPT partitions. IIRC grub2 doesn't understand lzo compression or subvolumes.

Well, I did try to read about this and ended up being confused, because development is so fast that documentation quickly becomes outdated. It seems that grub can boot from a BTRFS /boot subvolume: https://bbs.archlinux.org/viewtopic.php?pid=1222358 However, Chris Murphy had some problems a few months ago: http://comments.gmane.org/gmane.comp.file-systems.btrfs/29140 So I still don't know if it is a good idea or not to have a BTRFS /boot? Of course the idea is that I would like to snapshot /boot and have it on RAID1.

To summarize, I think I have 3 options for partitioning (I am not considering UEFI secure boot or swap): 1) grub, BTRFS partition (i.e. full disk in BTRFS), /boot inside a BTRFS subvolume. 2) grub, GPT partitioning, with (A) on sda1 and a BTRFS partition on sda2, /boot inside a BTRFS subvolume. 3) grub, GPT partitioning, with (A) on sda1, /boot (ext4) on sda2, and BTRFS on sda3. (A) = BIOS Boot partition (1 MiB) or EFI System Partition (FAT32, 550 MiB). I don't really see the point of having UEFI/ESP if I don't use any other proprietary operating system, so I think I will go with (A) = BIOS Boot partition, unless there is something I have missed.
Can someone recommend which one would be the most stable and easiest to manage? Thanks in advance,
Re: BTRFS with RAID1 cannot boot when removing drive
On 11 February 2014 18:21, Chris Murphy li...@colorremedies.com wrote: On Feb 10, 2014, at 8:15 PM, Saint Germain saint...@gmail.com wrote:

Ok, I need to really understand how my motherboard works (new Z87E-ITX). It says "64Mb AMI UEFI Legal BIOS", so I thought it was really UEFI.

Manufacturers have done us a disservice by equating UEFI and BIOS. Some UEFI also have a compatibility support module (CSM) which presents a BIOS to the operating system. It's intended for legacy operating systems that won't ever directly support UEFI. A way to tell in Linux whether you're booting with or without the CSM is to issue the efibootmgr command. If it returns something that looks like an error message, the CSM is enabled and the OS thinks it's running on a BIOS computer. If it returns a numbered list, then the CSM isn't enabled and the OS thinks it's running on a UEFI computer.

Nice trick! Thanks.

/dev/sdb has the same partitioning as /dev/sda.

grub-install <device> shouldn't work on UEFI, because the only place grub-install installs to is the volume mounted at /boot/efi. And grub-install /dev/sdb implies installing grub to a disk boot sector, which also isn't applicable to UEFI.

I am still not up to date on UEFI partitioning and so on. But I have read these pages: http://tanguy.ortolo.eu/blog/article51/debian-efi http://forums.debian.net/viewtopic.php?f=16t=81120 And apparently grub-install <device> is the correct command (but you have to copy files in addition). It is maybe because they use a special package, grub-efi-amd64, which replaces grub-install. It is quite difficult to find reliable info on the topic...

I understand. Normally the swap will only be used for hibernating. I don't expect to use it except perhaps in some extreme case.

I consider hibernate fundamentally broken right now, because whether it'll work depends on too many things. It works for some people and not others, and for those for whom it does work, it largely didn't work out of the box.
It never worked for me and did induce Btrfs corruptions, so I've just given up on hibernate entirely. There's a long old Fedora thread that discusses myriad issues about it: https://bugzilla.redhat.com/show_bug.cgi?id=781749 and shows that even if it's working, it can stop working without warning after X number of hibernate-resume cycles.

I am among the fortunate who have a working hibernate out of the box (Debian stable) and it works reliably (even if ultimately it WILL fail after 20-30 iterations). So I use the feature to save on electricity cost ;-) But yes, maybe I will get rid of the swap... Thanks!
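Chris's efibootmgr trick above can also be done without efibootmgr at all: the kernel only creates /sys/firmware/efi when it was booted via EFI. A minimal sketch (the probed path is a parameter so the function can be exercised against any directory; the function name is mine, not from the thread):

```shell
# boot_mode: print "UEFI" if the efivars directory exists, else "BIOS/CSM".
# With no argument it probes the real /sys/firmware/efi of the running
# system; passing a path lets you test the logic against any directory.
boot_mode() {
    if [ -d "${1:-/sys/firmware/efi}" ]; then
        echo "UEFI"
    else
        echo "BIOS/CSM"
    fi
}

boot_mode   # prints the mode of the running system
```

This answers the same question as the efibootmgr test (is the OS running under EFI, or behind the CSM's emulated BIOS?) but needs no extra package installed, which is handy inside a rescue shell.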
Re: BTRFS with RAID1 cannot boot when removing drive
Hello Duncan, What an amazing, extensive answer you gave me! Thank you so much for it. See my comments below. On Mon, 10 Feb 2014 03:34:49 +0000 (UTC), Duncan 1i5t5.dun...@cox.net wrote:

I am experimenting with BTRFS and RAID1 on my Debian Wheezy (with backported kernel 3.12-0.bpo.1-amd64) using a motherboard with UEFI.

My systems don't do UEFI, but I do run GPT partitions and use grub2 for booting, with grub2-core installed to a BIOS/reserved type partition (instead of as an EFI service as it would be with UEFI). And I have a root filesystem in btrfs two-device raid1 mode working fine here, tested bootable with only one device of the two available. So while I can't help you directly with UEFI, I know the rest of it can/does work. One more thing: I do have a (small) separate btrfs /boot, actually two of them, as I setup a separate /boot on each of the two devices in order to have a backup /boot, since grub can only point to one /boot by default, and while pointing to another in grub's rescue mode is possible, I didn't want to have to deal with that if the first /boot was corrupted, as it's easier to simply point the BIOS at a different drive entirely and load its (independently installed and configured) grub and /boot.

Can you explain why you chose to have a dedicated /boot partition? I also read on this thread that it may be better to have a dedicated /boot partition: https://bbs.archlinux.org/viewtopic.php?pid=1342893#p1342893

However, I haven't managed to make the system boot when removing the first hard drive. I have installed Debian with the following partitions on the first hard drive (no BTRFS subvolumes): /dev/sda1: for / (BTRFS); /dev/sda2: for /home (BTRFS); /dev/sda3: for swap. Then I added another drive for a RAID1 configuration (with btrfs balance) and I installed grub on the second hard drive with grub-install /dev/sdb.
Just for clarification, as you don't mention it specifically, altho your btrfs filesystem show information suggests you did it this way: are your partition layouts identical on both drives? That's what I've done here, and I definitely find that easiest to manage and even just to think about, tho it's definitely not a requirement. But using different partition layouts does significantly increase management complexity, so it's useful to avoid if possible. =:^)

Yes, the partition layout is exactly the same on both drives (copied with sfdisk). I also try to keep things simple ;-)

If I boot on sdb, it takes sda1 as the root filesystem. If I switch the cables, it always takes the first hard drive as the root filesystem (now sdb).

That's normal /appearance/, but that /appearance/ doesn't fully reflect reality. The problem is that mount output (and /proc/self/mounts), fstab, etc, were designed with single-device filesystems in mind, and multi-device btrfs has to be made to fit the existing rules as best it can. So what's actually happening is that for a btrfs composed of multiple devices, since there's only one device slot for the kernel to list devices, it only displays the first one it happens to come across, even tho the filesystem will normally (unless degraded) require that all component devices be available and logically assembled into the filesystem before it can be mounted. When you boot on sdb, naturally the sdb component of the multi-device filesystem is the first one the kernel finds, so it's the one listed, even tho the filesystem is actually composed of more devices, not just that one.

I am not following you: it seems to be the opposite of what you describe. If I boot on sdb, I expect sdb1 and sdb2 to be the first components that the kernel finds. However, I can see that sda1 and sda2 are used (using the 'mount' command).
When you switch the cables, the first one is, at least on your system, always the first device component of the filesystem detected, so it's always the one occupying the single device slot available for display, even tho the filesystem has actually assembled all devices into the complete filesystem before mounting.

Normally the two hard drives should be exactly the same (or I didn't understand something) except for the UUID_SUB. That's why I don't understand: if I switch the cables, I should get exactly the same results with 'mount'. But that is not the case; the 'mount' command always points to the same partitions: - without cable switch: sda1 and sda2 - with cable switch: sdb1 and sdb2 Everything happens as if the system is using the UUID_SUB to pick its 'favorite' partition.

If I disconnect /dev/sda, the system doesn't boot, with a message saying that it hasn't found the UUID: Scanning for BTRFS filesystems... mount: mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c on /root failed: Invalid argument

Can you tell me what I have done incorrectly? Is it because of UEFI? If yes, I haven't understood how I can correct it in a simple way.
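For reference, the "added another drive for a RAID1 configuration (with btrfs balance)" step described in this thread corresponds to something like the following (device names and the mountpoint are the thread's example values; this needs root and a mounted btrfs filesystem, so treat it as a sketch rather than a drop-in script):

```shell
# Add the second device to the existing single-device btrfs mounted at /,
# then rebalance, converting both data and metadata to the raid1 profile
# so every extent ends up mirrored on both devices:
btrfs device add /dev/sdb1 /
btrfs balance start -dconvert=raid1 -mconvert=raid1 /

# Afterwards, both devices should appear under the same filesystem UUID:
btrfs filesystem show /
```

The -dconvert/-mconvert filters are what actually change the replication profile; without them, a plain balance just rewrites extents under the existing (single) profile and nothing is mirrored.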
Re: BTRFS with RAID1 cannot boot when removing drive
Hello! On Mon, 10 Feb 2014 19:18:22 -0700, Chris Murphy li...@colorremedies.com wrote: On Feb 9, 2014, at 2:40 PM, Saint Germain saint...@gmail.com wrote:

Then I added another drive for a RAID1 configuration (with btrfs balance) and I installed grub on the second hard drive with grub-install /dev/sdb.

That can't work on UEFI. UEFI firmware effectively requires a GPT partition map and something to serve as an EFI System partition on all bootable drives. Second, there's a difference between UEFI with and without secure boot. With secure boot, you need to copy the files your distro installer puts on the target drive's EFI System partition to each additional drive's ESP if you want multibooting to work in case of disk failure. The grub on each ESP likely looks only on its own ESP for a grub.cfg. So that then means having to sync grub.cfg's among each disk used for booting. A way around this is to create a single grub.cfg that merely forwards to the true grub.cfg, and you can copy this forward-only grub.cfg to each ESP. That way the ESPs never need updating or syncing again. Without secure boot, you must umount /boot/efi and mount the ESP for each bootable disk in turn, and then merely run: grub-install. That will cause a core.img to be created for that particular ESP, and it will point to the usual grub.cfg location at /boot/grub.

Ok, I need to really understand how my motherboard works (new Z87E-ITX). It says "64Mb AMI UEFI Legal BIOS", so I thought it was really UEFI.

If I boot on sdb, it takes sda1 as the root filesystem. If I switch the cables, it always takes the first hard drive as the root filesystem (now sdb). If I disconnect /dev/sda, the system doesn't boot, with a message saying that it hasn't found the UUID: Scanning for BTRFS filesystems...
mount: mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c on /root failed: Invalid argument

Well, if /dev/sda is missing and you have an unpartitioned /dev/sdb, I don't even know how you're getting this far, and it seems like the UEFI computer might actually be booting in CSM-BIOS mode, which presents a conventional BIOS to the operating system. Distinguishing such things gets messy quickly.

/dev/sdb has the same partitioning as /dev/sda. Duncan gave me the hint with degraded mode and I managed to boot (however, I had some problem with mounting sda2).

Can you tell me what I have done incorrectly? Is it because of UEFI? If yes, I haven't understood how I can correct it in a simple way. As an extra question, I also don't see how I can configure the system to get the correct swap in case of disk failure. Should I force both swap partitions to have the same UUID?

If you're really expecting to create a system that can accept a disk failure and continue to work, I don't see how it can depend on swap partitions. It's fine to create them, but just realize that if they're actually being used and the underlying physical device dies, the kernel isn't going to like it. A possible workaround is using an md raid1 partition as swap.

I understand. Normally the swap will only be used for hibernating. I don't expect to use it except perhaps in some extreme case. Thanks for your help!
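The "Invalid argument" failure quoted above is what a multi-device btrfs typically produces when mounted with a member device missing. The "degraded mode" hint mentioned in this message boils down to the following sketch (device names and the /root mountpoint are the thread's example values; run from an initramfs or rescue shell):

```shell
# Mount the surviving half of the btrfs raid1 explicitly in degraded mode;
# without this option the kernel refuses to assemble an incomplete fs:
mount -o degraded /dev/sdb1 /root

# To boot straight into the degraded filesystem instead, edit the grub
# menu entry's "linux" line and pass the option via rootflags:
#   linux /boot/vmlinuz-... root=UUID=c64fca2a-... ro rootflags=degraded
```

Note that degraded mode is meant as a temporary state for replacing the failed device (btrfs device add / btrfs device delete missing), not as a way to keep running on one disk indefinitely.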
BTRFS with RAID1 cannot boot when removing drive
Hello, I am experimenting with BTRFS and RAID1 on my Debian Wheezy (with backported kernel 3.12-0.bpo.1-amd64) using a motherboard with UEFI. However, I haven't managed to make the system boot when removing the first hard drive.

I have installed Debian with the following partitions on the first hard drive (no BTRFS subvolumes): /dev/sda1: for / (BTRFS); /dev/sda2: for /home (BTRFS); /dev/sda3: for swap. Then I added another drive for a RAID1 configuration (with btrfs balance) and I installed grub on the second hard drive with grub-install /dev/sdb.

If I boot on sdb, it takes sda1 as the root filesystem. If I switch the cables, it always takes the first hard drive as the root filesystem (now sdb). If I disconnect /dev/sda, the system doesn't boot, with a message saying that it hasn't found the UUID: Scanning for BTRFS filesystems... mount: mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c on /root failed: Invalid argument

Can you tell me what I have done incorrectly? Is it because of UEFI? If yes, I haven't understood how I can correct it in a simple way. As an extra question, I also don't see how I can configure the system to get the correct swap in case of disk failure. Should I force both swap partitions to have the same UUID? Many thanks in advance!
Here are some outputs for info:

btrfs filesystem show
Label: none  uuid: 743d6b3b-71a7-4869-a0af-83549555284b
        Total devices 2 FS bytes used 27.96MB
        devid 1 size 897.98GB used 3.03GB path /dev/sda2
        devid 2 size 897.98GB used 3.03GB path /dev/sdb2
Label: none  uuid: c64fca2a-5700-4cca-abac-3a61f2f7486c
        Total devices 2 FS bytes used 3.85GB
        devid 1 size 27.94GB used 7.03GB path /dev/sda1
        devid 2 size 27.94GB used 7.03GB path /dev/sdb1

blkid
/dev/sda1: UUID=c64fca2a-5700-4cca-abac-3a61f2f7486c UUID_SUB=77ffad34-681c-4c43-9143-9b73da7d1ae3 TYPE=btrfs
/dev/sda3: UUID=469715b2-2fa3-4462-b6f5-62c04a60a4a2 TYPE=swap
/dev/sda2: UUID=743d6b3b-71a7-4869-a0af-83549555284b UUID_SUB=744510f5-5bd5-4df4-b8c4-0fc1a853199a TYPE=btrfs
/dev/sdb1: UUID=c64fca2a-5700-4cca-abac-3a61f2f7486c UUID_SUB=2615fd98-f2ad-4e7b-84bc-0ee7f9770ca0 TYPE=btrfs
/dev/sdb2: UUID=743d6b3b-71a7-4869-a0af-83549555284b UUID_SUB=8783a7b1-57ef-4bcc-ae7f-be20761e9a19 TYPE=btrfs
/dev/sdb3: UUID=56fbbe2f-7048-488f-b263-ab2eb000d1e1 TYPE=swap

cat /etc/fstab
# file system                              mount point  type   options   dump  pass
UUID=c64fca2a-5700-4cca-abac-3a61f2f7486c  /            btrfs  defaults  0     1
UUID=743d6b3b-71a7-4869-a0af-83549555284b  /home        btrfs  defaults  0     2
UUID=469715b2-2fa3-4462-b6f5-62c04a60a4a2  none         swap   sw        0     0

cat /boot/grub/grub.cfg
#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

### BEGIN /etc/grub.d/00_header ###
if [ -s $prefix/grubenv ]; then
  load_env
fi
set default=0
if [ ${prev_saved_entry} ]; then
  set saved_entry=${prev_saved_entry}
  save_env saved_entry
  set prev_saved_entry=
  save_env prev_saved_entry
  set boot_once=true
fi
function savedefault {
  if [ -z ${boot_once} ]; then
    saved_entry=${chosen}
    save_env saved_entry
  fi
}
function load_video {
  insmod vbe
  insmod vga
  insmod video_bochs
  insmod video_cirrus
}
insmod part_msdos
insmod btrfs
set root='(hd1,msdos1)'
search --no-floppy --fs-uuid --set=root c64fca2a-5700-4cca-abac-3a61f2f7486c
if
loadfont /usr/share/grub/unicode.pf2 ; then
  set gfxmode=640x480
  load_video
  insmod gfxterm
  insmod part_msdos
  insmod btrfs
  set root='(hd1,msdos1)'
  search --no-floppy --fs-uuid --set=root c64fca2a-5700-4cca-abac-3a61f2f7486c
  set locale_dir=($root)/boot/grub/locale
  set lang=fr_FR
  insmod gettext
fi
terminal_output gfxterm
set timeout=5
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/05_debian_theme ###
insmod part_msdos
insmod btrfs
set root='(hd1,msdos1)'
search --no-floppy --fs-uuid --set=root c64fca2a-5700-4cca-abac-3a61f2f7486c
insmod png
if background_image /usr/share/images/desktop-base/joy-grub.png; then
  set color_normal=white/black
  set color_highlight=black/white
else
  set menu_color_normal=cyan/blue
  set menu_color_highlight=white/blue
fi
### END /etc/grub.d/05_debian_theme ###

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Debian GNU/Linux, with Linux 3.12-0.bpo.1-amd64' --class debian --class gnu-linux --class gnu --class os {
  load_video
  insmod gzio
  insmod part_msdos
  insmod btrfs
  set root='(hd1,msdos1)'
  search --no-floppy --fs-uuid --set=root c64fca2a-5700-4cca-abac-3a61f2f7486c
  echo 'Chargement de Linux 3.12-0.bpo.1-amd64 ...'
  linux /boot/vmlinuz-3.12-0.bpo.1-amd64 root=UUID=c64fca2a-5700-4cca-abac-3a61f2f7486c ro quiet
  echo 'Chargement du disque mémoire initial ...'
  initrd