Re: reproducible builds with btrfs seed feature
On 10/17/2018 03:49 AM, Chris Murphy wrote:
> On Tue, Oct 16, 2018 at 2:13 AM, Anand Jain wrote:
>> On 10/14/2018 06:28 AM, Chris Murphy wrote:
>>> Is it practical and desirable to make Btrfs based OS installation images reproducible? Or is Btrfs simply too complex and non-deterministic? [1]
>>>
>>> The main three problems with Btrfs right now for reproducibility are:
>>> a. many objects have uuids other than the volume uuid; and mkfs only lets us set the volume uuid
>>> b. atime, ctime, mtime, otime; and no way to make them all the same
>>> c. non-deterministic allocation of file extents, compression, inode assignment, logical and physical address allocation
>>>
>>> I'm imagining reproducible image creation would be a mkfs feature that builds on Btrfs seed and --rootdir concepts to constrain Btrfs features to maybe make reproducible Btrfs volumes possible:
>>>
>>> - No raid
>>> - Either all objects needing uuids can have those uuids specified by switch, or possibly a defined set of uuids expressly for this use case, or possibly all of them can just be zeros (eek? not sure)
>>> - A flag to set all times the same
>>> - Possibly require that target block device is zero filled before creation of the Btrfs
>>> - Possibly disallow subvolumes and snapshots
>>> - Require the resulting image is seed/ro and maybe also a new compat_ro flag to enforce that such Btrfs file systems cannot be modified after the fact.
>>> - Enforce a consistent means of allocation and compression
>>>
>>> The end result is creating two Btrfs volumes would yield image files with matching hashes.
>>>
>>> If I had to guess, the biggest challenge would be allocation. But it's also possible that such an image may have problems with "sprouts". A non-removable sprout seems fairly straightforward and safe; but if a "reproducible build" type of seed is removed, it seems like removal needs to be smart enough to refresh *all* uuids found in the sprout: a hard break from the seed.
>>
>> Right. The seed fsid will be gone in a detached sprout.
>
> I think we already get a new devid, volume uuid, and device uuid.

Yes, on the sprout.

> An open question is whether any other uuids need to be refreshed, such as the chunk uuid, since that appears in every node and leaf.

There are quite a number of uuids.

>>> Any thoughts? Useful? Difficult to implement?
>>
>> Recently Nikolay sent a patch to change the fsid on a mounted btrfs. However, reproducible builds also need neutralized uuids, times and bytenr(s); furthermore, although the on-disk layout won't change without notice, block bytenrs might.
>
> Seems like the mkfs population method of such a seed could be made very deterministic as to what the start logical address and physical address are.

It can be. But that can change in future fixes, as those aren't EXPORTED().

> The vast majority of non-deterministic behavior comes from the nature of kernel code having to handle so many complex inputs and outputs, and negotiate them.
>
>> One question: why don't reproducible builds get the file data extents from the image and stitch their hashes together to verify the hash? And there could be a VFS ioctl to import and export filesystem images, for better supportability of use cases like reproducible builds.
>
> Perhaps. I don't know the reproducible build requirements very well: whether all they really care about is the hash of the data extents, and how important fs metadata really is. That matters when it comes to fuzzing file systems that have no metadata checksumming, like squashfs; there, of course, you'd have to checksum the whole file system image.
>
> Another feature the mkfs variety of seed image would need is deduplication. As far as I know, deduplication is kernel code only. You'd want to be able to deduplicate, as well as compress, to have the smallest distributed seed possible.

btrfs-image(8) already compresses.

I don't think mkfs is the right place to sanitize the uuid/fsid/time; that should happen when we generate the btrfs-image.
So a possible solution for reproducible builds:

- mkfs.btrfs dev as usual
- write the data
- unmount
- create a btrfs-image with the uuid/fsid/time sanitized
- mark it as a seed (RO)
- check/verify the hash of the image

If the hashes match, use this btrfs-image in one of these ways:

- reset the seed (RO) flag, then mount and use it; OR
- mount the seed device, add a RW sprout, and detach the seed; OR
- don't set the RO flag at all (above) and just mount and use it.

Thanks, Anand

> And mksquashfs does deduplication by default.
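The "check/verify the hash of the image" step can be sketched as follows, assuming two images built independently from the same input and compared by SHA-256. The function names and the 1 MiB chunk size are illustrative choices, not part of btrfs-progs.

```python
import hashlib
from typing import Iterable


def digest_chunks(chunks: Iterable[bytes]) -> str:
    """SHA-256 over a stream of byte chunks; equals hashing their concatenation."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def image_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash an image file in chunks so multi-GiB images need little memory."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps reading until read() returns b"".
        return digest_chunks(iter(lambda: f.read(chunk_size), b""))


def images_match(path_a: str, path_b: str) -> bool:
    """True if two independently built images are byte-for-byte identical."""
    return image_digest(path_a) == image_digest(path_b)
```

Usage would be something like `images_match("build1.img", "build2.img")` (hypothetical file names). Hashing the whole image covers metadata as well as data, which is exactly why the uuid/fsid/time sanitizing step has to happen first.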
Re: btrfs check: Superblock bytenr is larger than device size
On 2018/10/16 11:25 PM, Anton Shepelev wrote:
> Qu Wenruo to Anton Shepelev:
>
>>> On all our servers with BTRFS, which are otherwise working
>>> normally, `btrfs check /' complains that
>>>
>>> Superblock bytenr is larger than device size
>>> Couldn't open file system
>>>
>> Please try latest btrfs-progs and see if btrfs check
>> reports any error.
>
> It is SUSE Linux Enterprise with its native repository, so I
> cannot update btrfs easily. I will, though.

Then I recommend using the latest openSUSE Tumbleweed rescue ISO to do the mount and check.

Thanks,
Qu
Re: btrfs warning at ctree.h:1564 btrfs_update_device+0x220/0x230
On 2018/10/17 5:27 AM, Dmitry Katsubo wrote:
> Dear btrfs team,
>
> I often observe kernel traces on linux-4.14.0 (most likely due to background
> "btrfs scrub") which contain the following "characterizing" line (for the rest
> see attachments):
>
> btrfs_remove_chunk+0x26a/0x7e0 [btrfs]
>
> I wonder if somebody from the developer team knows anything about this problem.
> It seems like after such a dump the btrfs volume continues to function OK.

It's a known minor problem. "btrfs rescue fix-device-size " can fix it offline (unmounted).

Or, if you're using the fs as the root fs, shrinking the fs by 4K also solves the problem:

# btrfs filesystem resize :-4K

The cause is that old mkfs/kernel versions didn't align the device size correctly, while later kernels are pretty picky about that alignment.

It's mostly a developer-oriented warning: no harm except a lot of scary kernel warnings, which may slow down the logging system.

Thanks,
Qu

> Thanks for any information!
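The alignment Qu refers to is simple arithmetic: a size is aligned when it is an exact multiple of the sector size. A minimal sketch, assuming a 4 KiB sectorsize (common for btrfs on x86, but confirm it for your own filesystem); `is_aligned` and `align_down` are hypothetical helper names.

```python
SECTORSIZE = 4096  # assumed btrfs sectorsize (4 KiB)


def is_aligned(size: int, align: int = SECTORSIZE) -> bool:
    """True if size is an exact multiple of align."""
    return size % align == 0


def align_down(size: int, align: int = SECTORSIZE) -> int:
    """Largest multiple of align that is <= size."""
    return size - (size % align)
```

For the 4 TB drives in the USB thread below (7814037168 512-byte blocks, i.e. 4000787030016 bytes), `is_aligned` returns True. One plausible reading of the `:-4K` workaround is that the resize path rewrites the recorded device size as an aligned value; treat that as an assumption, not a description of the kernel code.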
Failover for unattached USB device
Dear btrfs team / community,

Sometimes the kernel resets the USB subsystem (it looks like a hardware problem). As a result, all USB devices are detached and attached back. After a few hours of struggle, btrfs finally reaches the point where a read-only filesystem mount is necessary. During this time, when I try to access the mounted filesystem (/mnt/backups), it reports success for some directories and errors for others:

root@debian:~# ll /mnt/backups/
total 14334
drwxr-xr-x 1 adm users 116 Sep 12 00:35 .
drwxrwxr-x 1 adm users 164 Sep 19 22:44 ..
-rw-r--r-- 1 adm users 79927 Feb 7 2018 contacts.zip
drwxr-xr-x 1 adm users 254 Feb 4 2018 attic
drwxr-xr-x 1 adm users 16 Feb 23 2018 recent
...
root@debian:~# ll /mnt/backups/attic/
ls: reading directory '/mnt/backups/attic/': Input/output error
total 0
drwxr-xr-x 1 adm users 254 Feb 4 2018 .
drwxr-xr-x 1 adm users 116 Sep 12 00:35 ..

It looks like this depends on whether the content is in the disk cache...

What is surprising: when I try to create a file, I succeed:

root@debian:~# touch /mnt/backups/.mounted
root@debian:~# ll /mnt/backups/.mounted
-rw-r--r-- 1 root root 0 Sep 20 16:52 /mnt/backups/.mounted
root@debian:~# rm /mnt/backups/.mounted

My btrfs volume consists of two identical drives combined into a RAID1 volume:

# btrfs filesystem df /mnt/backups
Data, RAID1: total=880.00GiB, used=878.96GiB
System, RAID1: total=8.00MiB, used=144.00KiB
Metadata, RAID1: total=2.00GiB, used=1.13GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs filesystem show /mnt/backups
Label: none uuid: a657364b-36d2-4c1f-8e5d-dc3d28166190
Total devices 2 FS bytes used 880.09GiB
devid 1 size 3.64TiB used 882.01GiB path /dev/sdf
devid 2 size 3.64TiB used 882.01GiB path /dev/sde

As a workaround I can monitor the dmesg output, but:

1. It would be nice if I could tell btrfs that I would like to mount read-only after a certain error rate per minute is reached.
2. It would be nice if btrfs could detect that both drives are not available and unmount the filesystem (as mounting read-only won't help much).

Kernel log for Linux v4.14.2 is attached.

--
With best regards,
Dmitry

Jun 29 18:54:56 debian kernel: [1197865.440396] usb 4-2: USB disconnect, device number 3
Jun 29 18:54:56 debian kernel: [1197865.440403] usb 4-2.2: USB disconnect, device number 5
Jun 29 18:54:56 debian kernel: [1197865.476118] usb 4-2.3: USB disconnect, device number 8
Jun 29 18:54:56 debian kernel: [1197865.549379] usb 4-2.4: USB disconnect, device number 7
...
Jun 29 18:54:58 debian kernel: [1197867.517728] usb-storage 4-2.3:1.0: USB Mass Storage device detected
Jun 29 18:54:58 debian kernel: [1197867.524021] usb-storage 4-2.3:1.0: Quirks match for vid 152d pid 0567: 500
Jun 29 18:54:58 debian kernel: [1197867.603859] usb 4-2.4: new full-speed USB device number 13 using ehci-pci
Jun 29 18:54:58 debian kernel: [1197867.725595] usb-storage 4-2.4:1.2: USB Mass Storage device detected
Jun 29 18:54:58 debian kernel: [1197867.728602] scsi host9: usb-storage 4-2.4:1.2
Jun 29 18:54:59 debian kernel: [1197868.528737] scsi 7:0:0:0: Direct-Access ST4000DM 004-2CV104 0125 PQ: 0 ANSI: 6
Jun 29 18:54:59 debian kernel: [1197868.529310] scsi 7:0:0:1: Direct-Access ST4000DM 004-2CV104 0125 PQ: 0 ANSI: 6
Jun 29 18:54:59 debian kernel: [1197868.530093] sd 7:0:0:0: Attached scsi generic sg5 type 0
Jun 29 18:54:59 debian kernel: [1197868.530588] sd 7:0:0:1: Attached scsi generic sg6 type 0
Jun 29 18:54:59 debian kernel: [1197868.533064] sd 7:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.533619] sd 7:0:0:1: [sdh] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Jun 29 18:54:59 debian kernel: [1197868.533626] sd 7:0:0:1: [sdh] 4096-byte physical blocks
Jun 29 18:54:59 debian kernel: [1197868.534063] sd 7:0:0:1: [sdh] Write Protect is off
Jun 29 18:54:59 debian kernel: [1197868.534069] sd 7:0:0:1: [sdh] Mode Sense: 67 00 10 08
Jun 29 18:54:59 debian kernel: [1197868.534422] sd 7:0:0:1: [sdh] No Caching mode page found
Jun 29 18:54:59 debian kernel: [1197868.534542] sd 7:0:0:1: [sdh] Assuming drive cache: write through
Jun 29 18:54:59 debian kernel: [1197868.535563] sd 7:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.536702] sd 7:0:0:0: [sdg] Very big device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.537454] sd 7:0:0:0: [sdg] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
Jun 29 18:54:59 debian kernel: [1197868.537459] sd 7:0:0:0: [sdg] 4096-byte physical blocks
Jun 29 18:54:59 debian kernel: [1197868.538327] sd 7:0:0:0: [sdg] Write Protect is off
Jun 29 18:54:59 debian kernel: [1197868.538331] sd 7:0:0:0: [sdg] Mode Sense: 67 00 10 08
...
Jun 29 20:22:35 debian kernel: [1203125.061068] BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
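The dmesg-monitoring workaround Dmitry mentions could look roughly like this: parse the cumulative `errs:` counters from lines such as the last one in the log above, and flag a device when its total grows by more than a threshold within a sliding window. This is a userspace sketch under stated assumptions (one log line per call, timestamps in seconds); the class and function names are hypothetical, and what to do when it fires, for example `mount -o remount,ro /mnt/backups`, is left to the caller.

```python
import re
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Matches the cumulative per-device counters btrfs logs, e.g.
# "bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0"
ERRS_RE = re.compile(
    r"bdev (?P<bdev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_errs(line: str) -> Optional[Tuple[str, int]]:
    """Return (device, sum of all counters) for an errs line, else None."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    return m.group("bdev"), sum(int(m.group(k)) for k in COUNTERS)


class ErrorRateMonitor:
    """Flags a device whose error total grows by more than a threshold.

    The logged counters are cumulative, so the rate is the increase since
    a baseline sample taken roughly window_s seconds earlier.
    """

    def __init__(self, max_new_errors: int, window_s: float = 60.0) -> None:
        self.max_new_errors = max_new_errors
        self.window_s = window_s
        self.samples: Dict[str, Deque[Tuple[float, int]]] = {}

    def feed(self, timestamp: float, line: str) -> Optional[str]:
        """Record one log line; return the device name if over threshold."""
        parsed = parse_errs(line)
        if parsed is None:
            return None
        bdev, total = parsed
        q = self.samples.setdefault(bdev, deque())
        q.append((timestamp, total))
        # Drop old samples until at most one lies at or before the window
        # start; q[0] (that sample, or the oldest one) is the baseline.
        while len(q) >= 2 and timestamp - q[1][0] >= self.window_s:
            q.popleft()
        new_errors = total - q[0][1]
        return bdev if new_errors > self.max_new_errors else None
```

Measuring the increase against a baseline, rather than summing raw counter values, matters because each log line repeats the running totals.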
btrfs warning at ctree.h:1564 btrfs_update_device+0x220/0x230
Dear btrfs team,

I often observe kernel traces on linux-4.14.0 (most likely due to background "btrfs scrub") which contain the following "characterizing" line (for the rest see attachments):

btrfs_remove_chunk+0x26a/0x7e0 [btrfs]

I wonder if somebody from the developer team knows anything about this problem. It seems like after such a dump the btrfs volume continues to function OK.

Thanks for any information!

--
With best regards,
Dmitry

Jun 7 16:26:31 debian kernel: [1176060.298759] [ cut here ]
Jun 7 16:26:31 debian kernel: [1176060.298820] WARNING: CPU: 0 PID: 566 at /build/linux-SCFPgu/linux-4.14.2/fs/btrfs/ctree.h:1564 btrfs_update_device+0x220/0x230 [btrfs]
Jun 7 16:26:31 debian kernel: [1176060.298823] Modules linked in: option usb_wwan usbserial ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter xt_REDIRECT nf_nat_redirect xt_physdev br_netfilter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c xt_tcpudp iptable_mangle arc4 bridge stp llc iTCO_wdt iTCO_vendor_support ppdev coretemp ath5k pcspkr serio_raw ath mac80211 sr9700 dm9601 cfg80211 usbnet mii i915 rfkill snd_hda_codec_realtek lpc_ich snd_hda_codec_generic mfd_core evdev snd_hda_intel snd_hda_codec sg snd_hda_core snd_hwdep snd_pcm_oss rng_core snd_mixer_oss video snd_pcm drm_kms_helper snd_timer drm snd parport_pc soundcore i2c_algo_bit parport shpchp button acpi_cpufreq binfmt_misc w83627hf hwmon_vid ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd
Jun 7 16:26:31 debian kernel: [1176060.298930] aes_i586 btrfs crc32c_generic xor zstd_decompress zstd_compress xxhash raid6_pq hid_generic usbhid hid uas usb_storage sr_mod cdrom sd_mod ata_generic i2c_i801 ata_piix libata firewire_ohci scsi_mod firewire_core crc_itu_t e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common
Jun 7 16:26:31 debian kernel: [1176060.298981] CPU: 0 PID: 566 Comm: btrfs-cleaner Tainted: GW 4.14.0-1-686-pae #1 Debian 4.14.2-1
Jun 7 16:26:31 debian kernel: [1176060.299162] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
Jun 7 16:26:31 debian kernel: [1176060.299327] task: f287e200 task.stack: f24e2000
Jun 7 16:26:31 debian kernel: [1176060.299448] EIP: btrfs_update_device+0x220/0x230 [btrfs]
Jun 7 16:26:31 debian kernel: [1176060.299450] EFLAGS: 00010206 CPU: 0
Jun 7 16:26:31 debian kernel: [1176060.299454] EAX: EBX: f68bee00 ECX: 000c EDX: 0200
Jun 7 16:26:31 debian kernel: [1176060.299457] ESI: ef0d9320 EDI: EBP: f24e3e9c ESP: f24e3e5c
Jun 7 16:26:31 debian kernel: [1176060.299460] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jun 7 16:26:31 debian kernel: [1176060.299463] CR0: 80050033 CR2: 02aa3000 CR3: 32b6ece0 CR4: 06f0
Jun 7 16:26:31 debian kernel: [1176060.299467] Call Trace:
Jun 7 16:26:31 debian kernel: [1176060.299561] btrfs_remove_chunk+0x26a/0x7e0 [btrfs]
Jun 7 16:26:31 debian kernel: [1176060.299686] btrfs_delete_unused_bgs+0x321/0x3f0 [btrfs]
Jun 7 16:26:31 debian kernel: [1176060.299819] cleaner_kthread+0x13c/0x150 [btrfs]
Jun 7 16:26:31 debian kernel: [1176060.299907] kthread+0xf3/0x110
Jun 7 16:26:31 debian kernel: [1176060.33] ? __btree_submit_bio_start+0x20/0x20 [btrfs]
Jun 7 16:26:31 debian kernel: [1176060.300099] ? kthread_create_on_node+0x20/0x20
Jun 7 16:26:31 debian kernel: [1176060.300182] ret_from_fork+0x19/0x24
Jun 7 16:26:31 debian kernel: [1176060.300249] Code: e9 81 fe ff ff 8d b6 00 00 00 00 bf f4 ff ff ff e9 78 fe ff ff 8d b6 00 00 00 00 f3 90 eb a8 8d 74 26 00 f3 90 e9 2b ff ff ff 90 <0f> ff e9 7a ff ff ff e8 14 4d 4c dc 8d 74 26 00 3e 8d 74 26 00
Jun 7 16:26:31 debian kernel: [1176060.300626] ---[ end trace 32773559e9ec5e68 ]---
Jul 1 07:07:31 debian kernel: [1328228.484772] [ cut here ]
Jul 1 07:07:31 debian kernel: [1328228.484822] WARNING: CPU: 0 PID: 26193 at /build/linux-SCFPgu/linux-4.14.2/fs/btrfs/ctree.h:1564 btrfs_update_device+0x220/0x230 [btrfs]
Jul 1 07:07:31 debian kernel: [1328228.484824] Modules linked in: cpuid nfs lockd grace sunrpc fscache ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter xt_REDIRECT nf_nat_redirect xt_physdev br_netfilter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c xt_tcpudp iptable_mangle option usb_wwan usbserial arc4 bridge stp llc iTCO_wdt iTCO_vendor_support ppdev evdev ath5k ath mac80211 coretemp cfg80211 sr9700 rfkill serio_raw dm9601 i915 usbnet pcspkr snd_hda_codec_realtek mii lpc_ich snd_hda_codec_generic mfd_core snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep rng_core video snd_pcm_oss sg drm_kms_helper snd_mixer_oss drm snd_pcm snd_timer i2c_algo_bit snd soundcore parport_pc parport button shpchp acpi_cpufreq binfmt_misc w83627hf hwmon_vid ip_tables x_tables autofs4 ext4 crc16 mbcache
Jul 1 07:07:31 debian kernel:
Re: CRC mismatch
On Tue, Oct 16, 2018 at 9:42 AM, Austin S. Hemmelgarn wrote:
> On 2018-10-16 11:30, Anton Shepelev wrote:
>>
>> Hello, all
>>
>> What may be the reason for a CRC mismatch on a BTRFS file in
>> a virtual machine:
>>
>> csum failed ino 175524 off 1876295680 csum 451760558
>> expected csum 1446289185
>>
>> Shall I seek the culprit in the host machine or in the guest
>> one? Supposing the host machine is healthy, what operations on
>> the guest might have caused a CRC mismatch?
>>
> Possible causes include:
>
> * On the guest side:
> - Unclean shutdown of the guest system (not likely even if this did
> happen).
> - A kernel bug in the guest.
> - Something directly modifying the block device (also not very likely).
>
> * On the host side:
> - Unclean shutdown of the host system without properly flushing data from
> the guest. Not likely unless you're using an actively unsafe caching mode
> for the guest's storage back-end.
> - At-rest data corruption in the storage back-end.
> - A bug in the host-side storage stack.
> - A transient error in the host-side storage stack.
> - A bug in the hypervisor.
> - Something directly modifying the back-end storage.
>
> Of these, the statistically most likely location for the issue is probably
> the storage stack on the host.

Is there still that O_DIRECT related "bug" (or more of a limitation) if the guest is using cache=none on the block device?

Anton, what virtual machine tech are you using? qemu/kvm managed with virt-manager? The configuration affects host behavior, but the negative effect manifests inside the guest as corruption, if I remember correctly.

--
Chris Murphy
Re: reproducible builds with btrfs seed feature
On Tue, Oct 16, 2018 at 2:13 AM, Anand Jain wrote:
>
>
> On 10/14/2018 06:28 AM, Chris Murphy wrote:
>>
>> Is it practical and desirable to make Btrfs based OS installation
>> images reproducible? Or is Btrfs simply too complex and
>> non-deterministic? [1]
>>
>> The main three problems with Btrfs right now for reproducibility are:
>> a. many objects have uuids other than the volume uuid; and mkfs only
>> lets us set the volume uuid
>> b. atime, ctime, mtime, otime; and no way to make them all the same
>> c. non-deterministic allocation of file extents, compression, inode
>> assignment, logical and physical address allocation
>>
>> I'm imagining reproducible image creation would be a mkfs feature that
>> builds on Btrfs seed and --rootdir concepts to constrain Btrfs
>> features to maybe make reproducible Btrfs volumes possible:
>>
>> - No raid
>> - Either all objects needing uuids can have those uuids specified by
>> switch, or possibly a defined set of uuids expressly for this use
>> case, or possibly all of them can just be zeros (eek? not sure)
>> - A flag to set all times the same
>> - Possibly require that target block device is zero filled before
>> creation of the Btrfs
>> - Possibly disallow subvolumes and snapshots
>> - Require the resulting image is seed/ro and maybe also a new
>> compat_ro flag to enforce that such Btrfs file systems cannot be
>> modified after the fact.
>> - Enforce a consistent means of allocation and compression
>>
>> The end result is creating two Btrfs volumes would yield image files
>> with matching hashes.
>
>
>> If I had to guess, the biggest challenge would be allocation. But it's
>> also possible that such an image may have problems with "sprouts". A
>> non-removable sprout seems fairly straightforward and safe; but if a
>> "reproducible build" type of seed is removed, it seems like removal
>> needs to be smart enough to refresh *all* uuids found in the sprout: a
>> hard break from the seed.
>
>
> Right. The seed fsid will be gone in a detached sprout.

I think we already get a new devid, volume uuid, and device uuid. An open question is whether any other uuids need to be refreshed, such as the chunk uuid, since that appears in every node and leaf.

>> Any thoughts? Useful? Difficult to implement?
>
> Recently Nikolay sent a patch to change the fsid on a mounted btrfs. However,
> reproducible builds also need neutralized uuids, times and bytenr(s);
> furthermore, although the on-disk layout won't change without notice,
> block bytenrs might.

Seems like the mkfs population method of such a seed could be made very deterministic as to what the start logical address and physical address are. The vast majority of non-deterministic behavior comes from the nature of kernel code having to handle so many complex inputs and outputs, and negotiate them.

> One question: why don't reproducible builds get the file data extents from the
> image and stitch their hashes together to verify the hash? And there could be
> a VFS ioctl to import and export filesystem images, for better
> supportability of use cases like reproducible builds.

Perhaps. I don't know the reproducible build requirements very well: whether all they really care about is the hash of the data extents, and how important fs metadata really is. That matters when it comes to fuzzing file systems that have no metadata checksumming, like squashfs; there, of course, you'd have to checksum the whole file system image.

Another feature the mkfs variety of seed image would need is deduplication. As far as I know, deduplication is kernel code only. You'd want to be able to deduplicate, as well as compress, to have the smallest distributed seed possible. And mksquashfs does deduplication by default.

--
Chris Murphy
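Anand's "stitch the hashes together" idea could be sketched like this: hash each data extent, then hash the per-extent digests in (file path, file offset) order, so that where the allocator physically placed an extent does not affect the result. The `Extent` record and the way its bytes would be read out of the image are assumptions; nothing here is an existing btrfs interface.

```python
import hashlib
from typing import Iterable, NamedTuple


class Extent(NamedTuple):
    """Hypothetical record for one data extent read out of an image."""
    path: str         # file the extent belongs to
    file_offset: int  # logical offset of the extent within that file
    data: bytes       # the extent's contents


def stitched_digest(extents: Iterable[Extent]) -> str:
    """Combine per-extent SHA-256 digests in (path, file_offset) order.

    Sorting by file position instead of physical address means two images
    whose allocator placed the same extents differently hash identically.
    """
    outer = hashlib.sha256()
    for ext in sorted(extents, key=lambda e: (e.path, e.file_offset)):
        inner = hashlib.sha256(ext.data).digest()
        # Mix in the position so moving data to another file/offset changes the result.
        outer.update(ext.path.encode() + b"\0" + ext.file_offset.to_bytes(8, "little") + inner)
    return outer.hexdigest()
```

One caveat of this design: if two builds split the same file into extents differently, the digests differ even though the file contents match, so either extent boundaries must be deterministic or the hashing should be done per file instead. As Chris notes, this also says nothing about metadata.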
Re: CRC mismatch
On 2018-10-16 11:30, Anton Shepelev wrote:
> Hello, all
>
> What may be the reason for a CRC mismatch on a BTRFS file in a virtual machine:
>
> csum failed ino 175524 off 1876295680 csum 451760558 expected csum 1446289185
>
> Shall I seek the culprit in the host machine or in the guest one? Supposing the host machine is healthy, what operations on the guest might have caused a CRC mismatch?

Possible causes include:

* On the guest side:
- Unclean shutdown of the guest system (not likely even if this did happen).
- A kernel bug in the guest.
- Something directly modifying the block device (also not very likely).

* On the host side:
- Unclean shutdown of the host system without properly flushing data from the guest. Not likely unless you're using an actively unsafe caching mode for the guest's storage back-end.
- At-rest data corruption in the storage back-end.
- A bug in the host-side storage stack.
- A transient error in the host-side storage stack.
- A bug in the hypervisor.
- Something directly modifying the back-end storage.

Of these, the statistically most likely location for the issue is probably the storage stack on the host.
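For context on the numbers in that `csum failed` line: btrfs's default data checksum is CRC-32C (Castagnoli), and the computed and expected values are printed in decimal. The bitwise sketch below computes standard CRC-32C for a block of bytes. It assumes the filesystem uses the default crc32c checksum and that you already have the exact block contents, and it is far slower than the table-driven or SSE4.2 code real tools use.

```python
# Reflected (bit-reversed) form of the Castagnoli polynomial 0x1EDC6F41.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Standard CRC-32C: init 0xFFFFFFFF, reflected, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out the low bit; if it was 1, fold in the polynomial.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def block_matches(block: bytes, expected_csum: int) -> bool:
    """Compare a block's CRC-32C with a decimal value taken from the log."""
    return crc32c(block) == expected_csum
```

For example, `crc32c(b"123456789")` gives 0xE3069283, the published CRC-32C check value. A mismatch only tells you the bytes changed after the checksum was written, not which layer of the host or guest stack changed them.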
CRC mismatch
Hello, all

What may be the reason for a CRC mismatch on a BTRFS file in a virtual machine:

csum failed ino 175524 off 1876295680 csum 451760558 expected csum 1446289185

Shall I seek the culprit in the host machine or in the guest one? Supposing the host machine is healthy, what operations on the guest might have caused a CRC mismatch?

--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]
Re: btrfs check: Superblock bytenr is larger than device size
Qu Wenruo to Anton Shepelev:

>> On all our servers with BTRFS, which are otherwise working
>> normally, `btrfs check /' complains that
>>
>> Superblock bytenr is larger than device size
>> Couldn't open file system
>>
> Please try latest btrfs-progs and see if btrfs check
> reports any error.

It is SUSE Linux Enterprise with its native repository, so I cannot update btrfs easily. I will, though.
Re: btrfs check: Superblock bytenr is larger than device size
On 2018/10/16 10:05 PM, Anton Shepelev wrote:
> Hello, all
>
> On all our servers with BTRFS, which are otherwise working
> normally, `btrfs check /' complains that

btrfs check shouldn't proceed on a mount point. The latest version would report an error like:

Opening filesystem to check...
ERROR: not a regular file or block device: /mnt/btrfs
ERROR: cannot open file system

>
> Superblock bytenr is larger than device size

This shouldn't be a big problem; it's normally related to unaligned numbers and an older kernel/btrfs-progs.

Please try latest btrfs-progs and see if btrfs check reports any error.

> Couldn't open file system
>
> Since I am not using any "dangerous" options and merely want
> to analyse the system for errors without any modifications,
> I don't think I must unmount my FS, which in my case would
> mean booting from a live CD. What may be causing this
> error?

Normally an old mkfs or an old kernel.

The latest btrfs check result would definitely help to solve the problem.

The following output would also help (it can be dumped even with the fs mounted; this also needs the latest btrfs-progs):

# btrfs ins dump-super -FfA
# btrfs ins dump-tree -t chunk

Thanks,
Qu
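Qu's first point, that `btrfs check` wants a block device or an image file rather than a mount point, comes down to a file-type test. A sketch of that condition using `os.stat` and the standard `stat` module; the function name is hypothetical, and this mirrors only the condition named in the error message, not btrfs-progs' actual code.

```python
import os
import stat


def is_checkable_target(path: str) -> bool:
    """True if path is a block device or a regular file (e.g. an image).

    A mount point such as "/" is a directory, so it fails this test, which
    corresponds to "ERROR: not a regular file or block device".
    """
    mode = os.stat(path).st_mode
    return stat.S_ISBLK(mode) or stat.S_ISREG(mode)
```

Character devices such as `/dev/null` also fail the test, since only block devices and regular files qualify.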
btrfs check: Superblock bytenr is larger than device size
Hello, all

On all our servers with BTRFS, which are otherwise working normally, `btrfs check /' complains that

Superblock bytenr is larger than device size
Couldn't open file system

Since I am not using any "dangerous" options and merely want to analyse the system for errors without any modifications, I don't think I must unmount my FS, which in my case would mean booting from a live CD. What may be causing this error?
Re: [PATCH v7 1/6] mm: split SWP_FILE into SWP_ACTIVATED and SWP_FS
On Fri, Oct 12, 2018 at 01:59:34PM -0700, Andrew Morton wrote:
> On Tue, 11 Sep 2018 15:34:44 -0700 Omar Sandoval wrote:
>
> > From: Omar Sandoval
> >
> > The SWP_FILE flag serves two purposes: to make swap_{read,write}page()
> > go through the filesystem, and to make swapoff() call
> > ->swap_deactivate(). For Btrfs, we want the latter but not the former,
> > so split this flag into two. This makes us always call
> > ->swap_deactivate() if ->swap_activate() succeeded, not just if it
> > didn't add any swap extents itself.
> >
> > This also resolves the issue of the very misleading name of SWP_FILE,
> > which is only used for swap files over NFS.
> >
>
> Acked-by: Andrew Morton

Andrew, can you please take the two patches through the mm tree? I'm not going to send the btrfs swap patches in the upcoming merge window, so it would not make sense to add plain MM changes to the btrfs tree. The whole series has been in linux-next for some time, so it's just moving between trees. Thanks.
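The flag split described in the patch can be modelled in a few lines. This is an illustrative Python model, not kernel code: the bit values are made up, and the mapping of `->swap_activate()` return values (negative on error, 0 when the filesystem does its own swap I/O as NFS does, positive when it added swap extents itself as Btrfs does) is an interpretation of the patch description.

```python
from enum import Flag


class SwapFlags(Flag):
    """Illustrative bits only; the kernel's real SWP_* values differ."""
    NONE = 0
    ACTIVATED = 1  # ->swap_activate() succeeded: call ->swap_deactivate() at swapoff
    FS = 2         # swap_{read,write}page() must go through the filesystem


def flags_after_activate(ret: int) -> SwapFlags:
    """Model of the post-split logic, keyed on ->swap_activate()'s result.

    ret < 0: activation failed; ret == 0: the filesystem handles swap I/O
    itself (the NFS case); ret > 0: it added swap extents (the Btrfs case).
    """
    flags = SwapFlags.NONE
    if ret >= 0:
        flags |= SwapFlags.ACTIVATED
    if ret == 0:
        flags |= SwapFlags.FS
    return flags


def needs_deactivate(flags: SwapFlags) -> bool:
    return bool(flags & SwapFlags.ACTIVATED)


def io_through_fs(flags: SwapFlags) -> bool:
    return bool(flags & SwapFlags.FS)
```

Under the old single SWP_FILE flag, the Btrfs case (positive return) would have set neither behavior, so `->swap_deactivate()` would never run; splitting the flag lets it get ACTIVATED without FS.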
Re: BTRFS bad block management. Does it exist?
On 10/14/2018 07:08 PM, waxhead wrote:
> In case BTRFS fails to WRITE to a disk. What happens?
> Does the bad area get mapped out somehow?

There was a proposed patch, but it's not convincing, because the disk does the bad block relocation transparently to the host, and if the disk runs out of its reserved list, then it's probably time to replace the disk. In my experience, a disk fails for some other, non-media error before it runs out of the reserved list, and in that case host-performed relocation won't help. Furthermore, at the file-system level you can't accurately determine whether a block write failed because of a bad media error rather than a fault in the target circuitry.

> Does it try again until it succeed or until it "times out" or reach a threshold counter?

Block IO timeout and retry are properties of the block layer, depending on the type of error. The SD module already retries 5 times (when failfast is not set); it should be tunable, and I think there was a patch for that on the ML. We had a few discussions on the retry part in the past. [1]

[1]
https://www.spinics.net/lists/linux-btrfs/msg70240.html
https://www.spinics.net/lists/linux-btrfs/msg71779.html

> Does it eventually try to write to a different disk (in case of using the raid1/10 profile?)

When there is a mirror copy, it does not go into RO mode, and it leaves write hole(s) scattered across transactions, as we don't fail the disk at the first failed transaction. That means if a disk is at the nth transaction per the superblock, it's not guaranteed that all previous transactions made it to the disk successfully in mirrored configs. I consider this a bug. There is also a danger that it may read junk data, which is hard but not impossible to hit due to our unreasonable (there is a patch on the ML to address that as well) hard-coded pid-based read-mirror policy.

I sent a patch to fail the disk when the first write fails, so that we know the last good state of the FS based on the transaction id. That was a long time back, but I still believe it's an important patch. I guess there weren't enough comments for it to go to the next step.

The current solution is to replace the offending disk _without_ reading from it, to get a good recovery from the failed disk. As data centers can't rely on admin-initiated manual recovery, there is also a patch to do this automatically using the auto-replace feature; the patches are on the ML. Again, I guess there weren't enough comments for it to go to the next step.

Thanks, Anand
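To make the "pid-based read-mirror policy" concrete: in a simplified model, the copy a reader is sent to is just its PID modulo the number of copies, with no regard to which copy is healthy. The sketch assumes exactly that rule (an approximation of the kernel's mirror selection at the time, not its actual code) and shows how readers split between mirrors; the function names are hypothetical.

```python
from typing import Dict, List


def pick_mirror(pid: int, num_copies: int) -> int:
    """Simplified PID-based read policy: copy index = pid mod copies.

    The choice ignores device health and load, so a given process keeps
    reading the same copy whether that copy is good or stale.
    """
    if num_copies < 1:
        raise ValueError("need at least one copy")
    return pid % num_copies


def readers_by_mirror(pids: List[int], num_copies: int) -> Dict[int, List[int]]:
    """Group reader PIDs by the mirror they would be sent to."""
    groups: Dict[int, List[int]] = {m: [] for m in range(num_copies)}
    for pid in pids:
        groups[pick_mirror(pid, num_copies)].append(pid)
    return groups
```

With two copies, roughly half of all readers land on each device, which is why a stale copy left behind by an unfailed write is "hard but not impossible to hit": only the readers routed to that copy see it.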
Re: [PATCH v2] Btrfs: fix null pointer dereference on compressed write path error
On Sat, Oct 13, 2018 at 12:37:25AM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> At inode.c:compress_file_range(), under the "free_pages_out" label, we can
> end up dereferencing the "pages" pointer when it has a NULL value. This
> case happens when "start" has a value of 0 and we fail to allocate memory
> for the "pages" pointer. When that happens we jump to the "cont" label and
> then enter the "if (start == 0)" branch where we immediately call the
> cow_file_range_inline() function. If that function returns 0 (success
> creating an inline extent) or an error (like -ENOMEM for example) we jump
> to the "free_pages_out" label and then access "pages[i]" leading to a NULL
> pointer dereference, since "nr_pages" has a value greater than zero at
> that point.
>
> Fix this by setting "nr_pages" to 0 when we fail to allocate memory for
> the "pages" pointer.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201119
> Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads")
> Signed-off-by: Filipe Manana

Added to misc-next, thanks.
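The failure sequence in Filipe's description can be replayed in a small Python model, where indexing `None` raises `TypeError` in place of the kernel's NULL pointer dereference. The functions are hypothetical stand-ins for the real `compress_file_range()` path; only the `nr_pages = 0` fix mirrors the patch.

```python
from typing import List, Optional


def free_pages_out(pages: Optional[List[str]], nr_pages: int) -> int:
    """Model of the cleanup label: release pages[0] .. pages[nr_pages - 1].

    If pages is None while nr_pages > 0, indexing raises TypeError, the
    Python analogue of the NULL pointer dereference in the bug report.
    """
    released = 0
    for i in range(nr_pages):
        _page = pages[i]  # stands in for releasing the page
        released += 1
    return released


def compress_range(nr_pages: int, alloc_ok: bool, fixed: bool) -> int:
    """Hypothetical replay of the path: allocation, inline attempt, cleanup."""
    pages = [f"page{i}" for i in range(nr_pages)] if alloc_ok else None
    if pages is None and fixed:
        nr_pages = 0  # the fix: nothing was allocated, so nothing to release
    # start == 0 -> cow_file_range_inline() succeeds or fails -> cleanup
    return free_pages_out(pages, nr_pages)
```

Resetting the count at the allocation failure, rather than adding a NULL check in the cleanup loop, keeps the invariant "nr_pages describes what pages actually holds" true on every path that reaches the label.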
Re: reproducible builds with btrfs seed feature
On 10/14/2018 06:28 AM, Chris Murphy wrote:
> Is it practical and desirable to make Btrfs based OS installation images reproducible? Or is Btrfs simply too complex and non-deterministic? [1]
>
> The main three problems with Btrfs right now for reproducibility are:
> a. many objects have uuids other than the volume uuid; and mkfs only lets us set the volume uuid
> b. atime, ctime, mtime, otime; and no way to make them all the same
> c. non-deterministic allocation of file extents, compression, inode assignment, logical and physical address allocation
>
> I'm imagining reproducible image creation would be a mkfs feature that builds on Btrfs seed and --rootdir concepts to constrain Btrfs features to maybe make reproducible Btrfs volumes possible:
>
> - No raid
> - Either all objects needing uuids can have those uuids specified by switch, or possibly a defined set of uuids expressly for this use case, or possibly all of them can just be zeros (eek? not sure)
> - A flag to set all times the same
> - Possibly require that target block device is zero filled before creation of the Btrfs
> - Possibly disallow subvolumes and snapshots
> - Require the resulting image is seed/ro and maybe also a new compat_ro flag to enforce that such Btrfs file systems cannot be modified after the fact.
> - Enforce a consistent means of allocation and compression
>
> The end result is creating two Btrfs volumes would yield image files with matching hashes.
>
> If I had to guess, the biggest challenge would be allocation. But it's also possible that such an image may have problems with "sprouts". A non-removable sprout seems fairly straightforward and safe; but if a "reproducible build" type of seed is removed, it seems like removal needs to be smart enough to refresh *all* uuids found in the sprout: a hard break from the seed.

Right. The seed fsid will be gone in a detached sprout.

> Competing file systems, ext4 with make_ext4 fork, and squashfs. At the moment I'm thinking it might be easier to teach squashfs integrity checking than to make Btrfs reproducible. But then I also think restricting Btrfs features, and applying some requirements to constrain Btrfs to make it reproducible, really enhances the Btrfs seed-sprout feature.
>
> Any thoughts? Useful? Difficult to implement?

Recently Nikolay sent a patch to change the fsid on a mounted btrfs. However, reproducible builds also need neutralized uuids, times and bytenr(s); furthermore, although the on-disk layout won't change without notice, block bytenrs might.

One question: why don't reproducible builds get the file data extents from the image and stitch their hashes together to verify the hash? And there could be a VFS ioctl to import and export filesystem images, for better supportability of use cases like reproducible builds.

For the seed sprout feature, one thing I have in mind is to make it image and subvolume granular rather than disk and fsid granular, and the ability to transpire golden image (seed) updates, but I haven't checked the feasibility yet.

Thanks, Anand

> Squashfs might be a better fit for this use case *if* it can be taught about integrity checking. It does per file checksums for the purpose of deduplication but those checksums aren't retained for later integrity checking.
>
> [1] problems of reproducible system images
> https://reproducible-builds.org/docs/system-images/
> [2] purpose and motivation for reproducible builds
> https://reproducible-builds.org/
> [3] who is involved?
> https://reproducible-builds.org/who/#Qubes%20OS