Bug#1006157: /lib/modules/5.16.0-1-sparc64-smp/kernel/fs/ext4/ext4.ko: [sparc64+ext4] reads see zeros w/ simultaneous write
Control: tags -1 + upstream
Control: forwarded -1 https://marc.info/?l=linux-sparc=164539269632667=2

Hi Noah,

On Tue, Feb 22, 2022 at 07:12:14PM -0800, Noah Misch wrote:
> On Sun, Feb 20, 2022 at 03:31:27PM +0100, Salvatore Bonaccorso wrote:
> > Unless I'm mistaken, this looks like an upstream issue; I think it
> > would be better to report it directly upstream. Can you do so and
> > keep us in the loop?
>
> https://marc.info/?t=16453926991 has my upstream report. Anatoly Pugachev
> confirmed the behavior on sparc64 5.17.0-rc5, so I'm assuming this is not
> Debian-specific. I will update this bug with any major news.

Many thanks.

Regards,
Salvatore
Bug#1006157: /lib/modules/5.16.0-1-sparc64-smp/kernel/fs/ext4/ext4.ko: [sparc64+ext4] reads see zeros w/ simultaneous write
On Sun, Feb 20, 2022 at 03:31:27PM +0100, Salvatore Bonaccorso wrote:
> Unless I'm mistaken, this looks like an upstream issue; I think it
> would be better to report it directly upstream. Can you do so and
> keep us in the loop?

https://marc.info/?t=16453926991 has my upstream report. Anatoly Pugachev
confirmed the behavior on sparc64 5.17.0-rc5, so I'm assuming this is not
Debian-specific. I will update this bug with any major news.
Bug#1006157: /lib/modules/5.16.0-1-sparc64-smp/kernel/fs/ext4/ext4.ko: [sparc64+ext4] reads see zeros w/ simultaneous write
Control: tags -1 + moreinfo

Hi Noah,

On Sat, Feb 19, 2022 at 05:53:52PM -0800, Noah Misch wrote:
> Package: src:linux
> Version: 5.16.7-2
> Severity: normal
> File: /lib/modules/5.16.0-1-sparc64-smp/kernel/fs/ext4/ext4.ko
>
> Dear Maintainer,
>
>    * What led up to the situation?
>
> The context is an ext4 filesystem on a sparc64 host. I've observed
> this with each of the three sparc64 kernels that I've tested. Those
> kernels were 5.16.0-1-sparc64-smp (this report), 5.15.0-2-sparc64-smp,
> and 4.9.0-13-sparc64-smp.
>
>    * What exactly did you do (or not do) that was effective (or
>      ineffective)?
>
> See the included file for a minimal test program. It creates two
> processes, each of which loops indefinitely. One opens a file, writes
> 0x1 to a 256-byte region, and closes the file. The other process
> opens the same file, reads the same region, and prints a message if
> any byte is not 0x1.
>
> This thread has more discussion and a more-configurable test program:
> https://postgr.es/m/flat/20220116071210.ga735...@rfd.leadboat.com
>
>    * What was the outcome of this action?
>
> The program prints messages, at least ten per second. The mismatch
> always appears at an offset divisible by eight. Some offsets are more
> common than others. Here's output from 300s of runtime, filtered
> through "sort -nk3 | uniq -c":
>
>    1729 mismatch at 8: got 0, want 1
>    1878 mismatch at 16: got 0, want 1
>    1030 mismatch at 24: got 0, want 1
>      41 mismatch at 40: got 0, want 1
>     373 mismatch at 48: got 0, want 1
>      24 mismatch at 56: got 0, want 1
>     349 mismatch at 64: got 0, want 1
>   13525 mismatch at 72: got 0, want 1
>     401 mismatch at 80: got 0, want 1
>     365 mismatch at 88: got 0, want 1
>       1 mismatch at 96: got 0, want 1
>      32 mismatch at 104: got 0, want 1
>      34 mismatch at 112: got 0, want 1
>      19 mismatch at 120: got 0, want 1
>      34 mismatch at 128: got 0, want 1
>     253 mismatch at 136: got 0, want 1
>     149 mismatch at 144: got 0, want 1
>     138 mismatch at 152: got 0, want 1
>       1 mismatch at 160: got 0, want 1
>       4 mismatch at 168: got 0, want 1
>       7 mismatch at 176: got 0, want 1
>       4 mismatch at 184: got 0, want 1
>       1 mismatch at 192: got 0, want 1
>      83 mismatch at 200: got 0, want 1
>      58 mismatch at 208: got 0, want 1
>    3301 mismatch at 216: got 0, want 1
>       2 mismatch at 232: got 0, want 1
>       1 mismatch at 248: got 0, want 1
>
> If I run the program atop an xfs filesystem (still with sparc64), it
> prints nothing. If I run it with x86_64 or powerpc64 (atop ext4), it
> prints nothing.
>
>    * What outcome did you expect instead?
>
> I expected the program to print nothing, indicating that the reader
> process observes only 0x1 bytes. That is how x86_64+ext4 behaves.
>
> POSIX is stricter, requiring read() and write() implementations such
> that "each call shall either see all of the specified effects of the
> other call, or none of them"
> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07).
> ext4 does not conform, which may be pragmatic. However, with x86_64
> and powerpc64, readers see each byte as either its before-write value
> or its after-write value. They don't see a zero in an offset that
> will have been nonzero both before and after the ongoing write().

Unless I'm mistaken, this looks like an upstream issue; I think it would
be better to report it directly upstream. Can you do so and keep us in
the loop?

Regards,
Salvatore
Bug#1006157: /lib/modules/5.16.0-1-sparc64-smp/kernel/fs/ext4/ext4.ko: [sparc64+ext4] reads see zeros w/ simultaneous write
Package: src:linux
Version: 5.16.7-2
Severity: normal
File: /lib/modules/5.16.0-1-sparc64-smp/kernel/fs/ext4/ext4.ko

Dear Maintainer,

   * What led up to the situation?

The context is an ext4 filesystem on a sparc64 host. I've observed
this with each of the three sparc64 kernels that I've tested. Those
kernels were 5.16.0-1-sparc64-smp (this report), 5.15.0-2-sparc64-smp,
and 4.9.0-13-sparc64-smp.

   * What exactly did you do (or not do) that was effective (or
     ineffective)?

See the included file for a minimal test program. It creates two
processes, each of which loops indefinitely. One opens a file, writes
0x1 to a 256-byte region, and closes the file. The other process
opens the same file, reads the same region, and prints a message if
any byte is not 0x1.

This thread has more discussion and a more-configurable test program:
https://postgr.es/m/flat/20220116071210.ga735...@rfd.leadboat.com

   * What was the outcome of this action?

The program prints messages, at least ten per second. The mismatch
always appears at an offset divisible by eight. Some offsets are more
common than others. Here's output from 300s of runtime, filtered
through "sort -nk3 | uniq -c":

   1729 mismatch at 8: got 0, want 1
   1878 mismatch at 16: got 0, want 1
   1030 mismatch at 24: got 0, want 1
     41 mismatch at 40: got 0, want 1
    373 mismatch at 48: got 0, want 1
     24 mismatch at 56: got 0, want 1
    349 mismatch at 64: got 0, want 1
  13525 mismatch at 72: got 0, want 1
    401 mismatch at 80: got 0, want 1
    365 mismatch at 88: got 0, want 1
      1 mismatch at 96: got 0, want 1
     32 mismatch at 104: got 0, want 1
     34 mismatch at 112: got 0, want 1
     19 mismatch at 120: got 0, want 1
     34 mismatch at 128: got 0, want 1
    253 mismatch at 136: got 0, want 1
    149 mismatch at 144: got 0, want 1
    138 mismatch at 152: got 0, want 1
      1 mismatch at 160: got 0, want 1
      4 mismatch at 168: got 0, want 1
      7 mismatch at 176: got 0, want 1
      4 mismatch at 184: got 0, want 1
      1 mismatch at 192: got 0, want 1
     83 mismatch at 200: got 0, want 1
     58 mismatch at 208: got 0, want 1
   3301 mismatch at 216: got 0, want 1
      2 mismatch at 232: got 0, want 1
      1 mismatch at 248: got 0, want 1

If I run the program atop an xfs filesystem (still with sparc64), it
prints nothing. If I run it with x86_64 or powerpc64 (atop ext4), it
prints nothing.

   * What outcome did you expect instead?

I expected the program to print nothing, indicating that the reader
process observes only 0x1 bytes. That is how x86_64+ext4 behaves.

POSIX is stricter, requiring read() and write() implementations such
that "each call shall either see all of the specified effects of the
other call, or none of them"
(https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07).
ext4 does not conform, which may be pragmatic. However, with x86_64
and powerpc64, readers see each byte as either its before-write value
or its after-write value. They don't see a zero in an offset that
will have been nonzero both before and after the ongoing write().
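
[The attached test program is not reproduced in this archive. What
follows is only a minimal sketch of a program along the lines described
above, not the reporter's actual code. The names first_mismatch,
write_region, read_region, run_race and RACE_SKETCH_MAIN, the file name,
the use of offset 0 and pread()/pwrite(), and the bounded reader loop are
all assumptions made for illustration. The 256-byte region, the 0x1 fill
byte and the message format come from the report.]

```c
/* Sketch only: NOT the program attached to the original report. */
#define _XOPEN_SOURCE 700   /* pread(), pwrite(), kill() under strict -std= */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION_SIZE 256     /* region length given in the report */
#define FILL_BYTE   0x1     /* byte value given in the report */

/* Offset of the first byte that differs from `want`, or -1 if none. */
long first_mismatch(const unsigned char *buf, size_t len, unsigned char want)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != want)
            return (long) i;
    return -1;
}

/* One writer step: open, write 0x1 over the region at offset 0, close.
 * No O_TRUNC, so the file never shrinks under the reader. */
int write_region(const char *path)
{
    unsigned char buf[REGION_SIZE];
    ssize_t n;
    int fd;

    memset(buf, FILL_BYTE, sizeof buf);
    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    n = pwrite(fd, buf, sizeof buf, 0);
    close(fd);
    return n == (ssize_t) sizeof buf ? 0 : -1;
}

/* One reader step: open, read the region into buf, close. */
int read_region(const char *path, unsigned char *buf)
{
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    n = pread(fd, buf, REGION_SIZE, 0);
    close(fd);
    return n == (ssize_t) REGION_SIZE ? 0 : -1;
}

/* Fork a writer that loops until killed; read `iterations` times in the
 * parent. Returns the number of reads with a mismatch, or -1 on error.
 * (The reported program loops indefinitely; the bound is an assumption.) */
long run_race(const char *path, long iterations)
{
    unsigned char buf[REGION_SIZE];
    long mismatches = 0;
    pid_t pid;

    assert(iterations >= 0);
    if (write_region(path) != 0)    /* populate the file before racing */
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        for (;;)                    /* child: writer */
            write_region(path);

    for (long i = 0; i < iterations; i++) {
        long off;

        if (read_region(path, buf) != 0) {
            mismatches = -1;
            break;
        }
        off = first_mismatch(buf, sizeof buf, FILL_BYTE);
        if (off >= 0) {
            printf("mismatch at %ld: got %d, want %d\n",
                   off, buf[off], FILL_BYTE);
            mismatches++;
        }
    }
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return mismatches;
}

#ifdef RACE_SKETCH_MAIN
int main(void)
{
    /* File name and iteration count are arbitrary choices. */
    return run_race("race-sketch.dat", 1000000) == 0 ? 0 : 1;
}
#endif
```

[Building with something like "cc -DRACE_SKETCH_MAIN" and piping the
output through "sort -nk3 | uniq -c" should give a tally in the same
shape as the one above. Whether any mismatch appears depends on the
architecture and filesystem, as the report describes.]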

-- Package-specific info:
** Version:
Linux version 5.16.0-1-sparc64-smp (debian-ker...@lists.debian.org) (gcc-11 (Debian 11.2.0-16) 11.2.0, GNU ld (GNU Binutils for Debian) 2.37.90.20220130) #1 SMP Debian 5.16.7-1 (2022-02-06)

** Command line:
BOOT_IMAGE=/vmlinux-5.16.0-1-sparc64-smp root=/dev/mapper/vg1-nroot ro

** Tainted: E (8192)
 * unsigned module was loaded

** Kernel log:
[344103.150402] null-4.exe[3045591]: segfault at 0 ip 01000990 (rpc 01000984) sp 07feff952831 error 1 in null-4.exe[100+2000]
[344103.533876] null-4.exe[3045722]: segfault at 8 ip 01000990 (rpc 01000984) sp 07feffa8c841 error 1 in null-4.exe[100+2000]
[344103.911758] null-4.exe[3045896]: segfault at 8 ip 010007e4 (rpc 010007dc) sp 07feffeec841 error 1 in null-4.exe[100+2000]
[344104.319288] null-4.exe[3046052]: segfault at 8 ip 010007e4 (rpc 010007dc) sp 07feffa2e841 error 1 in null-4.exe[100+2000]
[344104.703441] null-4.exe[3046206]: segfault at 8 ip 010007c8 (rpc 010007bc) sp 07feffeb8841 error 1 in null-4.exe[100+2000]
[344105.411714] null-4.exe[3046494]: segfault at 8 ip 010007e4 (rpc 010007dc) sp 07feff9ec841 error 1 in null-4.exe[100+2000]
[344105.921598] null-4.exe[3046699]: segfault at 8 ip 010007e4 (rpc 010007dc) sp 07feffd3a841 error 1 in null-4.exe[100+2000]
[344106.302875] null-5.exe[3046860]: segfault at 0 ip 010009b0 (rpc 010009a4) sp 07feffbc6831 error 1 in null-5.exe[100+2000]
[344107.467462] show_signal_msg: 2 callbacks suppressed
[344107.467472] null-5.exe[3047293]: segfault at 0 ip 010007f0 (rpc 010007dc) sp 07feff9a8841 error 1 in null-5.exe[100+2000]