Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-14 Thread Lukas Wunner
On Tue, Jan 12, 2021 at 09:28:32AM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2021 at 5:20 AM Lukas Wunner  wrote:
> > > Variable declarations in for-loops is the only one I can think of. I
> > > think that would clean up some code (and some macros), but might not
> > > be compelling on its own.
> >
> > Anonymous structs/unions.  I used to have a use case for that in
> > struct efi_dev_path in include/linux/efi.h, but Ard Biesheuvel
> > refactored it in a gnu89-compatible way for v5.7 with db8952e7094f.
> 
> We use anonymous structs/unions extensively and all over the place already.

Yes, my apologies, I mixed things up.

Back in 2016 when I authored 46cd4b75cd0e, what I wanted to do was
include an unnamed "struct efi_generic_dev_path;" in struct efi_dev_path:

struct efi_dev_path {
struct efi_generic_dev_path;
union {
struct {
u32 hid;
u32 uid;
} acpi;
struct {
u8 fn;
u8 dev;
} pci;
};
} __attribute ((packed));

The alternative is to copy-paste the elements of struct efi_dev_path
or to give it a name such as "header" (which is what db8952e7094f
subsequently did).  Both options seemed inelegant to me.

However it turns out this feature requires -fms-extensions.
It's not part of -std=gnu11.

So coming back to topic, yes there doesn't seem to be too much to
be gained from moving to -std=gnu11 aside from variable declarations
in for-loops.

(And it really has to be -std=gnu11 because -std=c11 fails to compile.)

Thanks,

Lukas


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-12 Thread Linus Torvalds
On Tue, Jan 12, 2021 at 5:20 AM Lukas Wunner  wrote:
>
> > Variable declarations in for-loops is the only one I can think of. I
> > think that would clean up some code (and some macros), but might not
> > be compelling on its own.
>
> Anonymous structs/unions.  I used to have a use case for that in
> struct efi_dev_path in include/linux/efi.h, but Ard Biesheuvel
> refactored it in a gnu89-compatible way for v5.7 with db8952e7094f.

We use anonymous structs/unions extensively and all over the place already.

We've had a couple of special cases where some versions of gcc didn't
do things right iirc (it was something like "nested anonymous struct"
or similar), but those weren't actually about the feature itself.

Was it perhaps the nested case you were thinking of? If so, I think
that's not really a --std=gun11 thing, it's more of a "certain
versions of gcc didn't do it at all".

   Linus


RE: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-12 Thread David Laight
From: Florian Weimer
> Sent: 12 January 2021 13:32
> 
> * Lukas Wunner:
> 
> > On Fri, Jan 08, 2021 at 12:02:53PM -0800, Linus Torvalds wrote:
> >> I appreciate Arnd pointing out "--std=gnu11", though. What are the
> >> actual relevant language improvements?
> >>
> >> Variable declarations in for-loops is the only one I can think of. I
> >> think that would clean up some code (and some macros), but might not
> >> be compelling on its own.
> >
> > Anonymous structs/unions.  I used to have a use case for that in
> > struct efi_dev_path in include/linux/efi.h, but Ard Biesheuvel
> > refactored it in a gnu89-compatible way for v5.7 with db8952e7094f.
> 
> Aren't those a GNU extension supported since GCC 3.0?

They are certainly pretty old.
The 15 year old gcc we use for release builds (so binaries work
on old distributions) supports them.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-12 Thread Florian Weimer
* Lukas Wunner:

> On Fri, Jan 08, 2021 at 12:02:53PM -0800, Linus Torvalds wrote:
>> I appreciate Arnd pointing out "--std=gnu11", though. What are the
>> actual relevant language improvements?
>> 
>> Variable declarations in for-loops is the only one I can think of. I
>> think that would clean up some code (and some macros), but might not
>> be compelling on its own.
>
> Anonymous structs/unions.  I used to have a use case for that in
> struct efi_dev_path in include/linux/efi.h, but Ard Biesheuvel
> refactored it in a gnu89-compatible way for v5.7 with db8952e7094f.

Aren't those a GNU extension supported since GCC 3.0?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-12 Thread Lukas Wunner
On Fri, Jan 08, 2021 at 12:02:53PM -0800, Linus Torvalds wrote:
> I appreciate Arnd pointing out "--std=gnu11", though. What are the
> actual relevant language improvements?
> 
> Variable declarations in for-loops is the only one I can think of. I
> think that would clean up some code (and some macros), but might not
> be compelling on its own.

Anonymous structs/unions.  I used to have a use case for that in
struct efi_dev_path in include/linux/efi.h, but Ard Biesheuvel
refactored it in a gnu89-compatible way for v5.7 with db8952e7094f.

[The above was copy-pasted from last time this discussion came up
in July 2020.  Back then, Kirill Shutemov likewise mentioned the
local variables in loops feature:
https://lore.kernel.org/lkml/20200710111724.m4jaci73pykal...@wunner.de/
]

Thanks,

Lukas


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-08 Thread Nick Desaulniers
On Fri, Jan 8, 2021 at 12:34 PM Arnd Bergmann  wrote:
>
> On Fri, Jan 8, 2021 at 9:02 PM Linus Torvalds
>  wrote:
> > On Fri, Jan 8, 2021 at 1:27 AM Will Deacon  wrote:
> > >
> > > On Fri, Jan 08, 2021 at 10:21:54AM +0100, Peter Zijlstra wrote:
> > > > On Thu, Jan 07, 2021 at 10:20:38PM +0100, Arnd Bergmann wrote:
> > > > > On Thu, Jan 7, 2021 at 2:37 PM Russell King - ARM Linux admin
> >
> > I appreciate Arnd pointing out "--std=gnu11", though. What are the
> > actual relevant language improvements?

It's hard to say, since a lot of new language features were already
GNU C extensions.

The only semantic difference I'm aware of is the semantics of `extern
inline` changed 100% from c89 to c99 (so jumping from gnu89 to gnu11
would change that).  We already #define inline to
__attribute__((__gnu_inline)) (there's also a -fgnu-inline flag), but
I worry for places that don't include that header or drop
KBUILD_CFLAGS (like every vdso), though `extern inline` is awful (and
I should be put in jail for introducing it to the kernel; now we have
__attribute__((no_stack_protector)) in both toolchains, and should be
using that instead, but we don't have it yet for all supported
compiler versions).

A quick grep through clang's sources shows mostly parser changes for
_Noreturn, _Alignof and friends etc..  New to me are unicode literal
strings (u or U suffix or prefix?) and something about loops expected
to make forward progress???

Another thing I've been worried about is Makefiles that reset
KBUILD_CFLAGS, since that's a constant source of pain/breakage for
cross compiling from Clang.  That tends to drop -std=gnu89.  For
instance:

$ make LLVM=1 -j71 defconfig
$ make LLVM=1 -j71 V=1 &>log.txt
$ grep -v std=gnu89 log.txt | grep clang | rev | cut -d ' ' -f 1 | rev
| grep -v \\.S
arch/x86/realmode/rm/wakemain.c
arch/x86/realmode/rm/video-mode.c
arch/x86/realmode/rm/regs.c
arch/x86/realmode/rm/video-vga.c
arch/x86/realmode/rm/video-vesa.c
arch/x86/realmode/rm/video-bios.c
drivers/firmware/efi/libstub/efi-stub-helper.c
drivers/firmware/efi/libstub/gop.c
drivers/firmware/efi/libstub/secureboot.c
drivers/firmware/efi/libstub/tpm.c
drivers/firmware/efi/libstub/file.c
drivers/firmware/efi/libstub/mem.c
drivers/firmware/efi/libstub/random.c
drivers/firmware/efi/libstub/randomalloc.c
drivers/firmware/efi/libstub/pci.c
drivers/firmware/efi/libstub/skip_spaces.c
lib/cmdline.c
lib/ctype.c
drivers/firmware/efi/libstub/alignedmem.c
drivers/firmware/efi/libstub/relocate.c
drivers/firmware/efi/libstub/vsprintf.c
drivers/firmware/efi/libstub/x86-stub.c
arch/x86/boot/a20.c
arch/x86/boot/cmdline.c
arch/x86/boot/cpuflags.c
arch/x86/boot/cpucheck.c
arch/x86/boot/early_serial_console.c
arch/x86/boot/edd.c
arch/x86/boot/main.c
arch/x86/boot/memory.c
arch/x86/boot/pm.c
arch/x86/boot/printf.c
arch/x86/boot/regs.c
arch/x86/boot/string.c
arch/x86/boot/tty.c
arch/x86/boot/video.c
arch/x86/boot/video-mode.c
arch/x86/boot/version.c
arch/x86/boot/video-vga.c
arch/x86/boot/video-vesa.c
arch/x86/boot/video-bios.c
arch/x86/boot/cpu.c
arch/x86/boot/compressed/string.c
arch/x86/boot/compressed/cmdline.c
arch/x86/boot/compressed/error.c
arch/x86/boot/compressed/cpuflags.c
arch/x86/boot/compressed/early_serial_console.c
arch/x86/boot/compressed/kaslr.c
arch/x86/boot/compressed/ident_map_64.c
arch/x86/boot/compressed/idt_64.c
arch/x86/boot/compressed/pgtable_64.c
arch/x86/boot/compressed/acpi.c
arch/x86/boot/compressed/misc.c

So it looks like parts of the tree are already built with -std=gnu11
or -std=gnu17, as they rely on the implicit default C language mode
when unspecified.  Oops?
-- 
Thanks,
~Nick Desaulniers


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-08 Thread Russell King - ARM Linux admin
On Fri, Jan 08, 2021 at 12:02:53PM -0800, Linus Torvalds wrote:
> Well, honestly, I'm always in favor of having people not use ancient
> compilers, but both of the issues at hand do seem to be specific to
> arm64.
> 
> The "gcc before 5.1 generates incorrect stack pointer writes on arm64"
> sounds pretty much deadly, and I think means that yes, for arm64 we
> simply need to require 5.1 or newer.
> 
> I also suspect there is much less reason to use old gcc's on arm64. I
> can't imagine that people really run very old setups, Is some old RHEL
> version even relevant for arm64?

For me, six years old for a compiler is really not "very old" - and,
when I first encountered this problem, it was over 12 months ago.
Apart from the kernel, I am not in the habbit of upgrading stuff for
the sake of upgrading - I tend to stick with what I have and what
works. Not everyone on this planet has a desire to have the latest
and greatest all the time.

Since then, I've _not_ wanted to change the compiler in case the
problem vanishes without explanation - it had the feeling of being
way more serious than a compiler bug, potentially a memory ordering
bug.

It took about a year just to start being able to work out what was
going on - it would take up to about three months to show for me,
and when it did, it spat out an ext4 inode checksum error and made
the rootfs read-only.

To "hide" that by upgrading the compiler, and then to be in the
situation where you do not trust any aarch64 machine with your data
is no real solution. That's exactly where I was until this had been
found. The aarch64 architecture had completely lost my trust as a
viable computing platform - and I was at the point of considering
disposing of all my aarch64 hardware and replacing it with x86.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-08 Thread Arnd Bergmann
On Fri, Jan 8, 2021 at 9:02 PM Linus Torvalds
 wrote:
> On Fri, Jan 8, 2021 at 1:27 AM Will Deacon  wrote:
> >
> > On Fri, Jan 08, 2021 at 10:21:54AM +0100, Peter Zijlstra wrote:
> > > On Thu, Jan 07, 2021 at 10:20:38PM +0100, Arnd Bergmann wrote:
> > > > On Thu, Jan 7, 2021 at 2:37 PM Russell King - ARM Linux admin
>
> I appreciate Arnd pointing out "--std=gnu11", though. What are the
> actual relevant language improvements?
>
> Variable declarations in for-loops is the only one I can think of. I
> think that would clean up some code (and some macros), but might not
> be compelling on its own.

I think that was the main one, as most of --std=c11 is already part
of --std=gnu89 as a gnu extension. There were a few things that
came up with clang porting, as clang is somewhat closer to gnu11
than to gnu89, but I don't remember exactly what that was.

I would still like to improve READ_ONCE()/get_user()/cmpxchg()
further using __auto_type and _Generic where possible, but I think
that was already supported in gcc-4.9, and does not require gcc-5.

   Arnd


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-08 Thread Linus Torvalds
On Fri, Jan 8, 2021 at 1:27 AM Will Deacon  wrote:
>
> On Fri, Jan 08, 2021 at 10:21:54AM +0100, Peter Zijlstra wrote:
> > On Thu, Jan 07, 2021 at 10:20:38PM +0100, Arnd Bergmann wrote:
> > > On Thu, Jan 7, 2021 at 2:37 PM Russell King - ARM Linux admin
> >
> > > > So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> > > > or just for aarch64?
> > >
> > > I'd personally love to see gcc-5 as the global minimum version, as that
> > > would let us finally use --std=gnu11 features instead of gnu89. [There are
> > > a couple of useful features that are incompatible with gnu89, and
> > > gnu99/gnu11 support in gcc didn't like the kernel sources]
> >
> > +1 for raising the tree-wide minimum (again!). We actually have a bunch
> > of work-arounds for 4.9 bugs we can get rid of as well.
>
> We even just added another one for arm64 KVM! [1]
>
> So yes, I'm in favour of leaving gcc 4.9 to rot as well, especially after
> this ext4 debugging experience.

Well, honestly, I'm always in favor of having people not use ancient
compilers, but both of the issues at hand do seem to be specific to
arm64.

The "gcc before 5.1 generates incorrect stack pointer writes on arm64"
sounds pretty much deadly, and I think means that yes, for arm64 we
simply need to require 5.1 or newer.

I also suspect there is much less reason to use old gcc's on arm64. I
can't imagine that people really run very old setups, Is some old RHEL
version even relevant for arm64?

So while I'd love to just say "everybody needs to make sure they have
an up-to-date compiler", my git feel is that at least with the current
crop of issues, there is little to really push us globally.

I appreciate Arnd pointing out "--std=gnu11", though. What are the
actual relevant language improvements?

Variable declarations in for-loops is the only one I can think of. I
think that would clean up some code (and some macros), but might not
be compelling on its own.

   Linus


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-08 Thread Pavel Machek
Hi!

> > > The gcc bugzilla mentions backports into gcc-linaro, but I do not see
> > > them in my git history.
> > 
> > So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> > or just for aarch64?
> 
> Russell, Arnd, thanks so much for tracking down the root cause of the
> bug!
> 
> I will note that RHEL 7 uses gcc 4.8.  I personally don't have an
> objections to requiring developers using RHEL 7 to have to install a
> more modern gcc (since I use Debian Testing and gcc 10.2.1, myself,
> and gcc 5.1 is so five years ago :-), but I could imagine that being
> considered inconvenient for some.

I'm on gcc 4.9.2 on a machine that is hard to upgrade :-(.

Best regards,
Pavel
-- 
http://www.livejournal.com/~pavelmachek


signature.asc
Description: PGP signature


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-08 Thread Will Deacon
On Fri, Jan 08, 2021 at 10:21:54AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 07, 2021 at 10:20:38PM +0100, Arnd Bergmann wrote:
> > On Thu, Jan 7, 2021 at 2:37 PM Russell King - ARM Linux admin
> 
> > > So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> > > or just for aarch64?
> > 
> > I'd personally love to see gcc-5 as the global minimum version, as that
> > would let us finally use --std=gnu11 features instead of gnu89. [There are
> > a couple of useful features that are incompatible with gnu89, and
> > gnu99/gnu11 support in gcc didn't like the kernel sources]
> 
> +1 for raising the tree-wide minimum (again!). We actually have a bunch
> of work-arounds for 4.9 bugs we can get rid of as well.

We even just added another one for arm64 KVM! [1]

So yes, I'm in favour of leaving gcc 4.9 to rot as well, especially after
this ext4 debugging experience.

Will

[1] https://git.kernel.org/linus/9fd339a45be5


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-08 Thread Peter Zijlstra
On Thu, Jan 07, 2021 at 10:20:38PM +0100, Arnd Bergmann wrote:
> On Thu, Jan 7, 2021 at 2:37 PM Russell King - ARM Linux admin

> > So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> > or just for aarch64?
> 
> I'd personally love to see gcc-5 as the global minimum version, as that
> would let us finally use --std=gnu11 features instead of gnu89. [There are
> a couple of useful features that are incompatible with gnu89, and
> gnu99/gnu11 support in gcc didn't like the kernel sources]

+1 for raising the tree-wide minimum (again!). We actually have a bunch
of work-arounds for 4.9 bugs we can get rid of as well.


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-08 Thread Peter Zijlstra
On Thu, Jan 07, 2021 at 11:27:22AM -0500, Theodore Ts'o wrote:
> I will note that RHEL 7 uses gcc 4.8.  I personally don't have an
> objections to requiring developers using RHEL 7 to have to install a
> more modern gcc (since I use Debian Testing and gcc 10.2.1, myself,
> and gcc 5.1 is so five years ago :-), but I could imagine that being
> considered inconvenient for some.

The kernel minimum (4.9) is already past that, so RHEL7 people are
already out in the cold.


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-08 Thread Ard Biesheuvel
On Thu, 7 Jan 2021 at 23:42, Eric Biggers  wrote:
>
> On Thu, Jan 07, 2021 at 10:14:46PM +, Russell King - ARM Linux admin 
> wrote:
> > On Thu, Jan 07, 2021 at 10:48:05PM +0100, Arnd Bergmann wrote:
> > > On Thu, Jan 7, 2021 at 5:27 PM Theodore Ts'o  wrote:
> > > >
> > > > On Thu, Jan 07, 2021 at 01:37:47PM +, Russell King - ARM Linux 
> > > > admin wrote:
> > > > > > The gcc bugzilla mentions backports into gcc-linaro, but I do not 
> > > > > > see
> > > > > > them in my git history.
> > > > >
> > > > > So, do we raise the minimum gcc version for the kernel as a whole to 
> > > > > 5.1
> > > > > or just for aarch64?
> > > >
> > > > Russell, Arnd, thanks so much for tracking down the root cause of the
> > > > bug!
> > >
> > > There is one more thing that I wondered about when looking through
> > > the ext4 code: Should it just call the crc32c_le() function directly
> > > instead of going through the crypto layer? It seems that with Ard's
> > > rework from 2018, that can just call the underlying architecture specific
> > > implementation anyway.
> >
> > Yes, I've been wondering about that too. To me, it looks like the
> > ext4 code performs a layering violation by going "under the covers"
> > - there are accessor functions to set the CRC and retrieve it. ext4
> > instead just makes the assumption that the CRC value is stored after
> > struct shash_desc. Especially as the crypto/crc32c code references
> > the value using:
> >
> >   struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
> >
> > Not even crypto drivers are allowed to assume that desc+1 is where
> > the CRC is stored.
>
> It violates how the shash API is meant to be used in general, but there is a
> test that enforces that the shash_desc_ctx for crc32c must be just the single
> u32 crc value.  See alg_test_crc32c() in crypto/testmgr.c.  So it's apparently
> intended to work.
>
> >
> > However, struct shash_desc is already 128 bytes in size on aarch64,
>
> Ard Biesheuvel recently sent a patch to reduce the alignment of struct
> shash_desc to ARCH_SLAB_MINALIGN
> (https://lkml.kernel.org/linux-crypto/20210107124128.19791-1-a...@kernel.org/),
> since apparently most of the bloat is from alignment for DMA, which isn't
> necessary.  I think that reduces the size by a lot on arm64.
>
> > and the proper way of doing it via SHASH_DESC_ON_STACK() is overkill,
> > being strangely 2 * sizeof(struct shash_desc) + 360 (which looks like
> > another bug to me!)
>
> Are you referring to the '2 * sizeof(struct shash_desc)' rather than just
> 'sizeof(struct shash_desc)'?  As mentioned in the comment above
> HASH_MAX_DESCSIZE, there can be a nested shash_desc due to HMAC.
> So I believe the value is correct.
>
> > So, I agree with you wrt crc32c_le(), especially as it would be more
> > efficient, and as the use of crc32c is already hard coded in the ext4
> > code - not only with crypto_alloc_shash("crc32c", 0, 0) but also with
> > the fixed-size structure in ext4_chksum().
> >
> > However, it's ultimately up to the ext4 maintainers to decide.
>
> As I mentioned in my other response, crc32c_le() isn't a proper library API
> (like some of the newer lib/crypto/ stuff) but rather just a wrapper for the
> shash API, and it doesn't handle modules being dynamically loaded/unloaded.
> So switching to it may cause a performance regression.
>
> What I'd recommend is making crc32c_le() able to call architecture-speccific
> implementations directly, similar to blake2s() and chacha20() in lib/crypto/.
> Then there would be no concern about when modules get loaded, etc...
>

I have looked into this in the past, both for crc32(c) and crc-t10dif,
both of which use a horrid method of wrapping a shash into a library
API. This was before we had static calls, though, and this work kind
of stalled on that. It should be straight-forward to redefine the
crc32c() library function as a static call, and have an optimized
module (or builtin) perform the [conditional] static call update at
module_init() time. The only missing piece is autoloading such
modules, which is tricky with softdeps if the dependency is from the
core kernel.

Currently, we have many users of crc32(c) in the kernel that go via
the shash (or synchronous ahash) layer to perform crc32c, all of which
would be better served by a library API, given that the hash type is a
compile time constant, and only synchronous calls are made.




drivers/infiniband/hw/i40iw/i40iw_utils.c: tfm =
crypto_alloc_shash("crc32c", 0, 0);
drivers/infiniband/sw/rxe/rxe_verbs.c: tfm = crypto_alloc_shash("crc32", 0, 0);
drivers/infiniband/sw/siw/siw_main.c: siw_crypto_shash =
crypto_alloc_shash("crc32c", 0, 0);
drivers/md/dm-crypt.c: tcw->crc32_tfm = crypto_alloc_shash("crc32", 0,
drivers/nvme/host/tcp.c: tfm = crypto_alloc_ahash("crc32c", 0,
CRYPTO_ALG_ASYNC);
drivers/nvme/target/tcp.c: tfm = crypto_alloc_ahash("crc32c", 0,
CRYPTO_ALG_ASYNC);
drivers/scsi/iscsi_tcp.c: tfm = crypto_alloc_ahash("crc32c", 0,
CRYPTO_ALG_ASYNC);

Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-08 Thread Arnd Bergmann
On Fri, Jan 8, 2021 at 12:53 AM Darrick J. Wong  wrote:
>
> On Thu, Jan 07, 2021 at 02:27:51PM -0800, Eric Biggers wrote:
> > On Thu, Jan 07, 2021 at 10:48:05PM +0100, Arnd Bergmann wrote:
> > > On Thu, Jan 7, 2021 at 5:27 PM Theodore Ts'o  wrote:
> > > >
> > > > On Thu, Jan 07, 2021 at 01:37:47PM +, Russell King - ARM Linux 
> > > > admin wrote:
> > > > > > The gcc bugzilla mentions backports into gcc-linaro, but I do not 
> > > > > > see
> > > > > > them in my git history.
> > > > >
> > > > > So, do we raise the minimum gcc version for the kernel as a whole to 
> > > > > 5.1
> > > > > or just for aarch64?
> > > >
> > > > Russell, Arnd, thanks so much for tracking down the root cause of the
> > > > bug!
> > >
> > > There is one more thing that I wondered about when looking through
> > > the ext4 code: Should it just call the crc32c_le() function directly
> > > instead of going through the crypto layer? It seems that with Ard's
> > > rework from 2018, that can just call the underlying architecture specific
> > > implementation anyway.
> > >
> >
> > It looks like that would work, although note that crc32c_le() uses the 
> > shash API
> > too, so it isn't any more "direct" than what ext4 does now.
>
> Yes.

Ah, I see. I had only noticed the architecture specific overrides for
__crc32c_le(),
and the global __weak crc32_le() function in lib/crc32.c, but failed to notice
the crc32c_le() macro that redirects to crc32c().

> > Also, a potential issue is that the implementation of crc32c that 
> > crc32c_le()
> > uses might be chosen too early if the architecture-specific implementation 
> > of
> > crc32c is compiled as a module (e.g. crc32c-intel.ko).
>
> This was the primary reason I chose to do it this way for ext4.
>
> The other is that ext4 didn't use crc32c before metadata_csum, so
> there's no point in pulling in the crypto layer if you're only going to
> use older ext2 or ext3 filesystems.  That was 2010, maybe people have
> stopped doing that?

The per-architecture overrides for __crc32c_le() are from 2018. With that
it should be possible to just always have the fastest implementation
(forcing them to be built-in normally), but not all architectures do this.

> > There are two ways this
> > could be fixed -- either by making it a proper library API like blake2s() 
> > that
> > can call the architecture-specific code directly, or by reconfiguring things
> > when a new crypto module is loaded (like what lib/crc-t10dif.c does).
>
> Though I would like to see the library functions gain the ability to use
> whatever is the fastest mechanism available once we can be reasonably
> certain that all the platform-specific drivers have been loaded.
>
> That said, IIRC most distros compile all of them into their
> (increasingly large) vmlinuz files so maybe this isn't much of practical
> concern?

I recently made checked the missing dependencies of drivers that
fail to 'select CRC32' but do call it directly. With those added, there
are now around 200 drivers that include it, and in practice you would
hardly find any kernel that doesn't have it built-in already. Most notably,
jbd2 already calls crc32_be(), so it is impossible to build an EXT4
without it. For memory-constrained embedded devices, it would probably
be more valuable to build without the crypto layer than without crc32.

   Arnd


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Darrick J. Wong
On Thu, Jan 07, 2021 at 02:27:51PM -0800, Eric Biggers wrote:
> On Thu, Jan 07, 2021 at 10:48:05PM +0100, Arnd Bergmann wrote:
> > On Thu, Jan 7, 2021 at 5:27 PM Theodore Ts'o  wrote:
> > >
> > > On Thu, Jan 07, 2021 at 01:37:47PM +, Russell King - ARM Linux admin 
> > > wrote:
> > > > > The gcc bugzilla mentions backports into gcc-linaro, but I do not see
> > > > > them in my git history.
> > > >
> > > > So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> > > > or just for aarch64?
> > >
> > > Russell, Arnd, thanks so much for tracking down the root cause of the
> > > bug!
> > 
> > There is one more thing that I wondered about when looking through
> > the ext4 code: Should it just call the crc32c_le() function directly
> > instead of going through the crypto layer? It seems that with Ard's
> > rework from 2018, that can just call the underlying architecture specific
> > implementation anyway.
> > 
> 
> It looks like that would work, although note that crc32c_le() uses the shash 
> API
> too, so it isn't any more "direct" than what ext4 does now.

Yes.

> Also, a potential issue is that the implementation of crc32c that crc32c_le()
> uses might be chosen too early if the architecture-specific implementation of
> crc32c is compiled as a module (e.g. crc32c-intel.ko).

This was the primary reason I chose to do it this way for ext4.

The other is that ext4 didn't use crc32c before metadata_csum, so
there's no point in pulling in the crypto layer if you're only going to
use older ext2 or ext3 filesystems.  That was 2010, maybe people have
stopped doing that?

> There are two ways this
> could be fixed -- either by making it a proper library API like blake2s() that
> can call the architecture-specific code directly, or by reconfiguring things
> when a new crypto module is loaded (like what lib/crc-t10dif.c does).

Though I would like to see the library functions gain the ability to use
whatever is the fastest mechanism available once we can be reasonably
certain that all the platform-specific drivers have been loaded.

That said, IIRC most distros compile all of them into their
(increasingly large) vmlinuz files so maybe this isn't much of practical
concern?

--D
> 
> Until one of those is done, switching to crc32c_le() might cause performance
> regressions.
> 
> - Eric


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Eric Biggers
On Thu, Jan 07, 2021 at 10:14:46PM +, Russell King - ARM Linux admin wrote:
> On Thu, Jan 07, 2021 at 10:48:05PM +0100, Arnd Bergmann wrote:
> > On Thu, Jan 7, 2021 at 5:27 PM Theodore Ts'o  wrote:
> > >
> > > On Thu, Jan 07, 2021 at 01:37:47PM +, Russell King - ARM Linux admin 
> > > wrote:
> > > > > The gcc bugzilla mentions backports into gcc-linaro, but I do not see
> > > > > them in my git history.
> > > >
> > > > So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> > > > or just for aarch64?
> > >
> > > Russell, Arnd, thanks so much for tracking down the root cause of the
> > > bug!
> > 
> > There is one more thing that I wondered about when looking through
> > the ext4 code: Should it just call the crc32c_le() function directly
> > instead of going through the crypto layer? It seems that with Ard's
> > rework from 2018, that can just call the underlying architecture specific
> > implementation anyway.
> 
> Yes, I've been wondering about that too. To me, it looks like the
> ext4 code performs a layering violation by going "under the covers"
> - there are accessor functions to set the CRC and retrieve it. ext4
> instead just makes the assumption that the CRC value is stored after
> struct shash_desc. Especially as the crypto/crc32c code references
> the value using:
> 
>   struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
> 
> Not even crypto drivers are allowed to assume that desc+1 is where
> the CRC is stored.

It violates how the shash API is meant to be used in general, but there is a
test that enforces that the shash_desc_ctx for crc32c must be just the single
u32 crc value.  See alg_test_crc32c() in crypto/testmgr.c.  So it's apparently
intended to work.

> 
> However, struct shash_desc is already 128 bytes in size on aarch64,

Ard Biesheuvel recently sent a patch to reduce the alignment of struct
shash_desc to ARCH_SLAB_MINALIGN
(https://lkml.kernel.org/linux-crypto/20210107124128.19791-1-a...@kernel.org/),
since apparently most of the bloat is from alignment for DMA, which isn't
necessary.  I think that reduces the size by a lot on arm64.

> and the proper way of doing it via SHASH_DESC_ON_STACK() is overkill,
> being strangely 2 * sizeof(struct shash_desc) + 360 (which looks like
> another bug to me!)

Are you referring to the '2 * sizeof(struct shash_desc)' rather than just
'sizeof(struct shash_desc)'?  As mentioned in the comment above
HASH_MAX_DESCSIZE, there can be a nested shash_desc due to HMAC.
So I believe the value is correct.

> So, I agree with you wrt crc32c_le(), especially as it would be more
> efficient, and as the use of crc32c is already hard coded in the ext4
> code - not only with crypto_alloc_shash("crc32c", 0, 0) but also with
> the fixed-size structure in ext4_chksum().
> 
> However, it's ultimately up to the ext4 maintainers to decide.

As I mentioned in my other response, crc32c_le() isn't a proper library API
(like some of the newer lib/crypto/ stuff) but rather just a wrapper for the
shash API, and it doesn't handle modules being dynamically loaded/unloaded.
So switching to it may cause a performance regression.

What I'd recommend is making crc32c_le() able to call architecture-speccific
implementations directly, similar to blake2s() and chacha20() in lib/crypto/.
Then there would be no concern about when modules get loaded, etc...

- Eric


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Eric Biggers
On Thu, Jan 07, 2021 at 10:48:05PM +0100, Arnd Bergmann wrote:
> On Thu, Jan 7, 2021 at 5:27 PM Theodore Ts'o  wrote:
> >
> > On Thu, Jan 07, 2021 at 01:37:47PM +, Russell King - ARM Linux admin 
> > wrote:
> > > > The gcc bugzilla mentions backports into gcc-linaro, but I do not see
> > > > them in my git history.
> > >
> > > So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> > > or just for aarch64?
> >
> > Russell, Arnd, thanks so much for tracking down the root cause of the
> > bug!
> 
> There is one more thing that I wondered about when looking through
> the ext4 code: Should it just call the crc32c_le() function directly
> instead of going through the crypto layer? It seems that with Ard's
> rework from 2018, that can just call the underlying architecture specific
> implementation anyway.
> 

It looks like that would work, although note that crc32c_le() uses the shash API
too, so it isn't any more "direct" than what ext4 does now.

Also, a potential issue is that the implementation of crc32c that crc32c_le()
uses might be chosen too early if the architecture-specific implementation of
crc32c is compiled as a module (e.g. crc32c-intel.ko).  There are two ways this
could be fixed -- either by making it a proper library API like blake2s() that
can call the architecture-specific code directly, or by reconfiguring things
when a new crypto module is loaded (like what lib/crc-t10dif.c does).

Until one of those is done, switching to crc32c_le() might cause performance
regressions.

- Eric


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Russell King - ARM Linux admin
On Thu, Jan 07, 2021 at 10:48:05PM +0100, Arnd Bergmann wrote:
> On Thu, Jan 7, 2021 at 5:27 PM Theodore Ts'o  wrote:
> >
> > On Thu, Jan 07, 2021 at 01:37:47PM +, Russell King - ARM Linux admin 
> > wrote:
> > > > The gcc bugzilla mentions backports into gcc-linaro, but I do not see
> > > > them in my git history.
> > >
> > > So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> > > or just for aarch64?
> >
> > Russell, Arnd, thanks so much for tracking down the root cause of the
> > bug!
> 
> There is one more thing that I wondered about when looking through
> the ext4 code: Should it just call the crc32c_le() function directly
> instead of going through the crypto layer? It seems that with Ard's
> rework from 2018, that can just call the underlying architecture specific
> implementation anyway.

Yes, I've been wondering about that too. To me, it looks like the
ext4 code performs a layering violation by going "under the covers"
- there are accessor functions to set the CRC and retrieve it. ext4
instead just makes the assumption that the CRC value is stored after
struct shash_desc. Especially as the crypto/crc32c code references
the value using:

struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);

Not even crypto drivers are allowed to assume that desc+1 is where
the CRC is stored.

However, struct shash_desc is already 128 bytes in size on aarch64,
and the proper way of doing it via SHASH_DESC_ON_STACK() is overkill,
being strangely 2 * sizeof(struct shash_desc) + 360 (which looks like
another bug to me!)

#define HASH_MAX_DESCSIZE   (sizeof(struct shash_desc) + 360)
 ^
#define SHASH_DESC_ON_STACK(shash, ctx)   \
char __##shash##_desc[sizeof(struct shash_desc) + \
  ^
HASH_MAX_DESCSIZE] CRYPTO_MINALIGN_ATTR; \
struct shash_desc *shash = (struct shash_desc *)__##shash##_desc

So, I agree with you wrt crc32c_le(), especially as it would be more
efficient, and as the use of crc32c is already hard coded in the ext4
code - not only with crypto_alloc_shash("crc32c", 0, 0) but also with
the fixed-size structure in ext4_chksum().

However, it's ultimately up to the ext4 maintainers to decide.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Arnd Bergmann
On Thu, Jan 7, 2021 at 5:27 PM Theodore Ts'o  wrote:
>
> On Thu, Jan 07, 2021 at 01:37:47PM +, Russell King - ARM Linux admin 
> wrote:
> > > The gcc bugzilla mentions backports into gcc-linaro, but I do not see
> > > them in my git history.
> >
> > So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> > or just for aarch64?
>
> Russell, Arnd, thanks so much for tracking down the root cause of the
> bug!

There is one more thing that I wondered about when looking through
the ext4 code: Should it just call the crc32c_le() function directly
instead of going through the crypto layer? It seems that with Ard's
rework from 2018, that can just call the underlying architecture specific
implementation anyway.

> I will note that RHEL 7 uses gcc 4.8.  I personally don't have an
> objections to requiring developers using RHEL 7 to have to install a
> more modern gcc (since I use Debian Testing and gcc 10.2.1, myself,
> and gcc 5.1 is so five years ago :-), but I could imagine that being
> considered inconvenient for some.

The main users of gcc-4.9 that I recall from previous discussions
were Android and Debian 8, but both of them are done now: Debian 8
has reached its end of life last summer, and Android uses clang
for building new kernels.

   Arnd


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Arnd Bergmann
On Thu, Jan 7, 2021 at 2:37 PM Russell King - ARM Linux admin
 wrote:
> On Thu, Jan 07, 2021 at 02:16:25PM +0100, Arnd Bergmann wrote:
> > On Thu, Jan 7, 2021 at 1:47 PM Russell King - ARM Linux admin
> >  wrote:
>
> > The gcc bugzilla mentions backports into gcc-linaro, but I do not see
> > them in my git history.

Correction: I looked in the wrong branch, gcc-linaro does have it, as
does the Android gcc, which was recently still at 4.9 before they dropped it
in favor of clang.

> So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> or just for aarch64?

I'd personally love to see gcc-5 as the global minimum version, as that
would let us finally use --std=gnu11 features instead of gnu89. [There are
a couple of useful features that are incompatible with gnu89, and
gnu99/gnu11 support in gcc didn't like the kernel sources]

If we make it arm64 specific, I'd propose only making it a build-time
warning instead of an error, as there are no other benefits to increasing
the minimum version if gcc-4.9 is still an option for other architectures,
and most gcc-4.9 users (Android, Red Hat and everyone using gcc-linaro)
have backported this bugfix already.

 Arnd


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Florian Weimer
* Theodore Ts'o:

> On Thu, Jan 07, 2021 at 01:37:47PM +, Russell King - ARM Linux admin 
> wrote:
>> > The gcc bugzilla mentions backports into gcc-linaro, but I do not see
>> > them in my git history.
>> 
>> So, do we raise the minimum gcc version for the kernel as a whole to 5.1
>> or just for aarch64?
>
> Russell, Arnd, thanks so much for tracking down the root cause of the
> bug!
>
> I will note that RHEL 7 uses gcc 4.8.  I personally don't have an
> objections to requiring developers using RHEL 7 to have to install a
> more modern gcc (since I use Debian Testing and gcc 10.2.1, myself,
> and gcc 5.1 is so five years ago :-), but I could imagine that being
> considered inconvenient for some.

Actually, RHEL 7 should have the fix (internal bug #1362635, curiously
we encountered it in the *XFS* CRC calculation code back then).

My understanding is that RHEL 7 aarch64 support ceased completely about
a month ago, so that shouldn't be an argument against bumping the
minimum version requirement to 5.1.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Theodore Ts'o
On Thu, Jan 07, 2021 at 01:37:47PM +, Russell King - ARM Linux admin wrote:
> > The gcc bugzilla mentions backports into gcc-linaro, but I do not see
> > them in my git history.
> 
> So, do we raise the minimum gcc version for the kernel as a whole to 5.1
> or just for aarch64?

Russell, Arnd, thanks so much for tracking down the root cause of the
bug!

I will note that RHEL 7 uses gcc 4.8.  I personally don't have an
objections to requiring developers using RHEL 7 to have to install a
more modern gcc (since I use Debian Testing and gcc 10.2.1, myself,
and gcc 5.1 is so five years ago :-), but I could imagine that being
considered inconvenient for some.

- Ted


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Russell King - ARM Linux admin
On Thu, Jan 07, 2021 at 02:16:25PM +0100, Arnd Bergmann wrote:
> On Thu, Jan 7, 2021 at 1:47 PM Russell King - ARM Linux admin
>  wrote:
> 
> > Arnd has found via bisecting gcc:
> >
> > 7e8c2bd54af ("[AArch64] fix unsafe access to deallocated stack")
> >
> > which seems to be https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63293
> >
> > That seems to suggest that gcc-5.0.0 is also affected.
> >
> > Looking at the changelog in Debian's gcc-8.3 packages, this doesn't
> > feature, so it's not easy just to look at the changelogs to work out
> > which versions are affected.
> 
> I checked the history to confirm that all gcc-5 releases (5.0.x is 
> pre-release)
> and later have the fix.
> 
> The gcc bugzilla mentions backports into gcc-linaro, but I do not see
> them in my git history.

So, do we raise the minimum gcc version for the kernel as a whole to 5.1
or just for aarch64?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Arnd Bergmann
On Thu, Jan 7, 2021 at 1:47 PM Russell King - ARM Linux admin
 wrote:

> Arnd has found via bisecting gcc:
>
> 7e8c2bd54af ("[AArch64] fix unsafe access to deallocated stack")
>
> which seems to be https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63293
>
> That seems to suggest that gcc-5.0.0 is also affected.
>
> Looking at the changelog in Debian's gcc-8.3 packages, this doesn't
> feature, so it's not easy just to look at the changelogs to work out
> which versions are affected.

I checked the history to confirm that all gcc-5 releases (5.0.x is pre-release)
and later have the fix.

The gcc bugzilla mentions backports into gcc-linaro, but I do not see
them in my git history.

   Arnd


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Russell King - ARM Linux admin
On Thu, Jan 07, 2021 at 11:18:41AM +, Russell King - ARM Linux admin wrote:
> On Wed, Jan 06, 2021 at 10:32:23PM +, Russell King - ARM Linux admin 
> wrote:
> > On Wed, Jan 06, 2021 at 05:20:34PM +, Will Deacon wrote:
> > > With that, I see the following after ten seconds or so:
> > > 
> > >   EXT4-fs error (device sda2): ext4_lookup:1707: inode #674497: comm 
> > > md5sum: iget: checksum invalid
> > > 
> > > Russell, Mark -- does this recipe explode reliably for you too?
> > 
> > I've been working this evening on tracking down what change in the
> > Kconfig file between your working 5.10 kernel binary you supplied me,
> > and my failing 5.9 kernel.
> > 
> > I've found that _enabling_ CONFIG_STACKPROTECTOR appears to mask the
> > inode checksum failure problem, at least from a short test.) I'm going
> > to re-enable CONFIG_STACKPROTECTOR and leave it running for longer.
> > 
> > That is:
> > 
> > CONFIG_STACKPROTECTOR=y
> > CONFIG_STACKPROTECTOR_STRONG=y
> > 
> > appears to mask the problem
> > 
> > # CONFIG_STACKPROTECTOR is not set
> > 
> > appears to unmask the problem.
> 
> We have finally got to the bottom of this - the "bug" is in the ext4
> code:
> 
> static inline u32 ext4_chksum(struct ext4_sb_info *sbi, u32 crc,
>   const void *address, unsigned int length)
> {
> struct {
> struct shash_desc shash;
> char ctx[4];
> } desc;
> 
> BUG_ON(crypto_shash_descsize(sbi->s_chksum_driver)!=sizeof(desc.ctx));
> 
> desc.shash.tfm = sbi->s_chksum_driver;
> *(u32 *)desc.ctx = crc;
> 
> BUG_ON(crypto_shash_update(, address, length));
> 
> return *(u32 *)desc.ctx;
> }
> 
> This isn't always inlined, despite the "inline" keyword. With GCC
> 4.9.4, this is compiled to the following code when the stack protector
> is disabled:
> 
> 0004 :
>4:   a9be7bfdstp x29, x30, [sp, #-32]! <--
>8:   2a0103e3mov w3, w1
>c:   aa0203e1mov x1, x2
>   10:   910003fdmov x29, sp   <--
>   14:   f9000bf3str x19, [sp, #16]
>   18:   d10603ffsub sp, sp, #0x180<--
>   1c:   9101fff3add x19, sp, #0x7f
>   20:   b942ldr w2, [x0]
>   24:   9279e273and x19, x19, #0xff80 <--
>   28:   7100105fcmp w2, #0x4
>   2c:   540001a1b.ne60   
> // b.any
>   30:   2a0303e4mov w4, w3
>   34:   aa0003e3mov x3, x0
>   38:   b9008264str w4, [x19, #128]
>   3c:   aa1303e0mov x0, x19
>   40:   f9000263str x3, [x19] <--
>   44:   9400bl  0 
> 44: R_AARCH64_CALL26crypto_shash_update
>   48:   35e0cbnzw0, 64 
>   4c:   910003bfmov sp, x29   <==
>   50:   b9408260ldr w0, [x19, #128]   <==
>   54:   f9400bf3ldr x19, [sp, #16]
>   58:   a8c27bfdldp x29, x30, [sp], #32
>   5c:   d65f03c0ret
>   60:   d421brk #0x800
>   64:   97e7bl  0 
> 
> Of the instructions that are highlighted with "<--" and "<==",
> x29 is located at the bottom of the function's stack frame, excluding
> local variables.  x19 is "desc", which is calculated to be safely below
> x29 and aligned to a 128 byte boundary.
> 
> The bug is pointed to by the two "<==" markers - the instruction
> at 4c restores the stack pointer _above_ "desc" before then loading
> desc.ctx.
> 
> If an interrupt occurs right between these two instructions, then
> desc.ctx will be corrupted, leading to the checksum failing.
> 
> Comments on irc are long the lines of this being "an impressive
> compiler bug".
> 
> We now need to find which gcc versions are affected, so we know what
> minimum version to require for aarch64.
> 
> Arnd has been unable to find anything in gcc bugzilla to explain this;
> he's tested gcc-5.5.0, which appears to produce correct code, and is
> trying to bisect between 4.9.4 and 5.1.0 to locate where this was
> fixed.
> 
> Peter Zijlstra suggested adding linux-toolchains@ and asking compiler
> folks for feedback on this bug. I guess a pointer to whether this is
> a known bug, and which bug may be useful.
> 
> I am very relieved to have found a positive reason for this bug, rather
> than just moving forward on the compiler and have the bug vanish
> without explanation, never knowing if it would rear its head in future
> and corrupt my filesystems, e.g. never knowing if it became a
> temporarily masked memory ordering bug.

Arnd has found via bisecting gcc:

7e8c2bd54af ("[AArch64] fix unsafe access to deallocated stack")

which seems to be https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63293

That seems to suggest that gcc-5.0.0 is also 

Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-07 Thread Russell King - ARM Linux admin
On Wed, Jan 06, 2021 at 10:32:23PM +, Russell King - ARM Linux admin wrote:
> On Wed, Jan 06, 2021 at 05:20:34PM +, Will Deacon wrote:
> > With that, I see the following after ten seconds or so:
> > 
> >   EXT4-fs error (device sda2): ext4_lookup:1707: inode #674497: comm 
> > md5sum: iget: checksum invalid
> > 
> > Russell, Mark -- does this recipe explode reliably for you too?
> 
> I've been working this evening on tracking down what change in the
> Kconfig file between your working 5.10 kernel binary you supplied me,
> and my failing 5.9 kernel.
> 
> I've found that _enabling_ CONFIG_STACKPROTECTOR appears to mask the
> inode checksum failure problem, at least from a short test.) I'm going
> to re-enable CONFIG_STACKPROTECTOR and leave it running for longer.
> 
> That is:
> 
> CONFIG_STACKPROTECTOR=y
> CONFIG_STACKPROTECTOR_STRONG=y
> 
> appears to mask the problem
> 
> # CONFIG_STACKPROTECTOR is not set
> 
> appears to unmask the problem.

We have finally got to the bottom of this - the "bug" is in the ext4
code:

static inline u32 ext4_chksum(struct ext4_sb_info *sbi, u32 crc,
  const void *address, unsigned int length)
{
struct {
struct shash_desc shash;
char ctx[4];
} desc;

BUG_ON(crypto_shash_descsize(sbi->s_chksum_driver)!=sizeof(desc.ctx));

desc.shash.tfm = sbi->s_chksum_driver;
*(u32 *)desc.ctx = crc;

BUG_ON(crypto_shash_update(, address, length));

return *(u32 *)desc.ctx;
}

This isn't always inlined, despite the "inline" keyword. With GCC
4.9.4, this is compiled to the following code when the stack protector
is disabled:

0004 :
   4:   a9be7bfdstp x29, x30, [sp, #-32]!   <--
   8:   2a0103e3mov w3, w1
   c:   aa0203e1mov x1, x2
  10:   910003fdmov x29, sp <--
  14:   f9000bf3str x19, [sp, #16]
  18:   d10603ffsub sp, sp, #0x180  <--
  1c:   9101fff3add x19, sp, #0x7f
  20:   b942ldr w2, [x0]
  24:   9279e273and x19, x19, #0xff80   <--
  28:   7100105fcmp w2, #0x4
  2c:   540001a1b.ne60   // 
b.any
  30:   2a0303e4mov w4, w3
  34:   aa0003e3mov x3, x0
  38:   b9008264str w4, [x19, #128]
  3c:   aa1303e0mov x0, x19
  40:   f9000263str x3, [x19]   <--
  44:   9400bl  0 
44: R_AARCH64_CALL26crypto_shash_update
  48:   35e0cbnzw0, 64 
  4c:   910003bfmov sp, x29 <==
  50:   b9408260ldr w0, [x19, #128] <==
  54:   f9400bf3ldr x19, [sp, #16]
  58:   a8c27bfdldp x29, x30, [sp], #32
  5c:   d65f03c0ret
  60:   d421brk #0x800
  64:   97e7bl  0 

Of the instructions that are highlighted with "<--" and "<==",
x29 is located at the bottom of the function's stack frame, excluding
local variables.  x19 is "desc", which is calculated to be safely below
x29 and aligned to a 128 byte boundary.

The bug is pointed to by the two "<==" markers - the instruction
at 4c restores the stack pointer _above_ "desc" before then loading
desc.ctx.

If an interrupt occurs right between these two instructions, then
desc.ctx will be corrupted, leading to the checksum failing.

Comments on irc are long the lines of this being "an impressive
compiler bug".

We now need to find which gcc versions are affected, so we know what
minimum version to require for aarch64.

Arnd has been unable to find anything in gcc bugzilla to explain this;
he's tested gcc-5.5.0, which appears to produce correct code, and is
trying to bisect between 4.9.4 and 5.1.0 to locate where this was
fixed.

Peter Zijlstra suggested adding linux-toolchains@ and asking compiler
folks for feedback on this bug. I guess a pointer to whether this is
a known bug, and which bug may be useful.

I am very relieved to have found a positive reason for this bug, rather
than just moving forward on the compiler and have the bug vanish
without explanation, never knowing if it would rear its head in future
and corrupt my filesystems, e.g. never knowing if it became a
temporarily masked memory ordering bug.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-06 Thread Russell King - ARM Linux admin
On Wed, Jan 06, 2021 at 05:20:34PM +, Will Deacon wrote:
> With that, I see the following after ten seconds or so:
> 
>   EXT4-fs error (device sda2): ext4_lookup:1707: inode #674497: comm md5sum: 
> iget: checksum invalid
> 
> Russell, Mark -- does this recipe explode reliably for you too?

I've been working this evening on tracking down what change in the
Kconfig file between your working 5.10 kernel binary you supplied me,
and my failing 5.9 kernel.

I've found that _enabling_ CONFIG_STACKPROTECTOR appears to mask the
inode checksum failure problem, at least from a short test.) I'm going
to re-enable CONFIG_STACKPROTECTOR and leave it running for longer.

That is:

CONFIG_STACKPROTECTOR=y
CONFIG_STACKPROTECTOR_STRONG=y

appears to mask the problem

# CONFIG_STACKPROTECTOR is not set

appears to unmask the problem.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-06 Thread Arnd Bergmann
On Wed, Jan 6, 2021 at 10:04 PM Arnd Bergmann  wrote:
> On Wed, Jan 6, 2021 at 6:22 PM Will Deacon  wrote:
> > On Wed, Jan 06, 2021 at 01:52:53PM +, Russell King - ARM Linux admin 
> > wrote:
>
> I tried the Image-5.9.0 on a virtual machine with seven CPUs (two clusters)
> running in an M1 mac mini and ran these commands inside.
>
> > With that, I see the following after ten seconds or so:
> >
> >   EXT4-fs error (device sda2): ext4_lookup:1707: inode #674497: comm 
> > md5sum: iget: checksum invalid
> >
> > Russell, Mark -- does this recipe explode reliably for you too?
>
> Negative unfortunately -- no checksum mismatch so far, with 10 minutes
> elapsed. I'll keep it running a bit longer.

I managed to trigger the checksum mismatch once now, after around 40
minutes, with a second run going for 20 minutes without mismatch.

So it's not easily reproducible for me, but it does help to rule out
at least some of the hardware specific theories -- it's not just the
Cortex-A72, nor the CCI doing something weird, as neither of
them are in use here.

This is the output I got:

EXT4-fs error (device vda2): ext4_lookup:1707: inode #1185501: comm
md5sum: iget: checksum invalid
Aborting journal on device vda2-8.
EXT4-fs error (device vda2): ext4_journal_check_start:83: Detected
aborted journal
EXT4-fs (vda2): Remounting filesystem read-only
EXT4-fs (vda2): Remounting filesystem read-only
EXT4-fs error (device vda2): ext4_journal_check_start:83: Detected
aborted journal

  Arnd


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-06 Thread Arnd Bergmann
On Wed, Jan 6, 2021 at 6:22 PM Will Deacon  wrote:
> On Wed, Jan 06, 2021 at 01:52:53PM +, Russell King - ARM Linux admin 
> wrote:

>
>and the resulting Image is here:
>
>
> https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/bugs/rmk/Image-5.9.0
>
> 3. Using that kernel, I boot into a 64-bit Debian 10 filesystem and open a
>couple of terminals over SSH.
>
> 4. In one terminal, I run:
>
>$ while (true); do find /var /usr /bin /sbin -type f -print0 | xargs -0
>  md5sum > /dev/null; echo 2 | sudo tee /proc/sys/vm/drop_caches; done
>
>(note that sudo will prompt you for a password on the first iteration)
>
> 5. In the other terminal, I run:
>
>$ while (true); do ./hackbench ; sleep 1; done
>
>where hackbench is built from:
>
>https://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c
>
>and compiled according to comment in the source code.

I tried the Image-5.9.0 on a virtual machine with seven CPUs (two clusters)
running in an M1 mac mini and ran these commands inside.

> With that, I see the following after ten seconds or so:
>
>   EXT4-fs error (device sda2): ext4_lookup:1707: inode #674497: comm md5sum: 
> iget: checksum invalid
>
> Russell, Mark -- does this recipe explode reliably for you too?

Negative unfortunately -- no checksum mismatch so far, with 10 minutes
elapsed. I'll keep it running a bit longer.

arnd


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-06 Thread Russell King - ARM Linux admin
On Wed, Jan 06, 2021 at 05:20:34PM +, Will Deacon wrote:
> I've managed to reproduce the corruption on my AMD Seattle board (8x A57).
> I haven't had a chance to dig deeper yet, but here's the recipe which works
> for me:
> 
> 1. I'm using GCC 4.9.4 simply to try to get as close as I can to rmk's
>setup. I don't know if this is necessary or not, but the toolchain is
>here:
> 
>
> https://kernel.org/pub/tools/crosstool/files/bin/arm64/4.9.4/arm64-gcc-4.9.4-nolibc-aarch64-linux-gnu.tar.xz
> 
>and I needed to pull down an old libmpfr to get cc1 to work:
> 
>http://ports.ubuntu.com/pool/main/m/mpfr4/libmpfr4_3.1.2-1_arm64.deb
> 
> 2. I build a 5.9 kernel with the config here:
> 
>
> https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/bugs/rmk/config-5.9.0
> 
>and the resulting Image is here:
> 
>
> https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/bugs/rmk/Image-5.9.0
> 
> 3. Using that kernel, I boot into a 64-bit Debian 10 filesystem and open a
>couple of terminals over SSH.
> 
> 4. In one terminal, I run:
> 
>$ while (true); do find /var /usr /bin /sbin -type f -print0 | xargs -0
>  md5sum > /dev/null; echo 2 | sudo tee /proc/sys/vm/drop_caches; done
> 
>(note that sudo will prompt you for a password on the first iteration)
> 
> 5. In the other terminal, I run:
> 
>$ while (true); do ./hackbench ; sleep 1; done
> 
>where hackbench is built from:
> 
>https://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c
> 
>and compiled according to comment in the source code.
> 
> With that, I see the following after ten seconds or so:
> 
>   EXT4-fs error (device sda2): ext4_lookup:1707: inode #674497: comm md5sum: 
> iget: checksum invalid
> 
> Russell, Mark -- does this recipe explode reliably for you too?

It took a couple of iterations of the find loop (4) here on a kernel
where I'd dropped BLK_WBT=y from my .config... whereas I wasn't able
to provoke it before. So running hackbench in parallel seems to
increase the probability.

I rebooted, set it going again, and on the first iteration it exploded
with ext4 inode checksum failure. And again on the following reboot.
So yes, it looks like you've found a way to more reliably reproduce
it.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-06 Thread Will Deacon
On Wed, Jan 06, 2021 at 01:52:53PM +, Russell King - ARM Linux admin wrote:
> On Wed, Jan 06, 2021 at 11:53:59AM +, Mark Rutland wrote:
> > ... and are you using defconfig or something else?
> 
> Not sure I replied to this. I'm not using the defconfig, I've my own
> .config
> 
> As I mentioned, Will has built a 5.10 kernel using Arnd's gcc 4.9.4
> and hasn't been able to reproduce it. He's sent me his kernel, which
> I've booted here, and haven't yet been able to provoke it.
> 
> Meanwhile, my 5.9 kernel continues to exhibit this problem, so I've
> sent Will my .config (which I'll include here.) There are differences
> in some of the block layer configuration. There's differences in the
> errata configuration, but we don't think that's a cause (they're not
> relevant for Cortex A72).
> 
> Our plan is:
> - Will is switching to 5.9, and using my config as a base for his
>   platform.
> - Will is going to send me his modified version of my config.
> - We are both going to build using the same kernel sources and same
>   config.
> - We are going to test our own kernels, and also swap kernel images
>   and test each others.
> 
> Watch this space for more news...

I've managed to reproduce the corruption on my AMD Seattle board (8x A57).
I haven't had a chance to dig deeper yet, but here's the recipe which works
for me:

1. I'm using GCC 4.9.4 simply to try to get as close as I can to rmk's
   setup. I don't know if this is necessary or not, but the toolchain is
   here:

   
https://kernel.org/pub/tools/crosstool/files/bin/arm64/4.9.4/arm64-gcc-4.9.4-nolibc-aarch64-linux-gnu.tar.xz

   and I needed to pull down an old libmpfr to get cc1 to work:

   http://ports.ubuntu.com/pool/main/m/mpfr4/libmpfr4_3.1.2-1_arm64.deb

2. I build a 5.9 kernel with the config here:

   
https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/bugs/rmk/config-5.9.0

   and the resulting Image is here:

   
https://mirrors.edge.kernel.org/pub/linux/kernel/people/will/bugs/rmk/Image-5.9.0

3. Using that kernel, I boot into a 64-bit Debian 10 filesystem and open a
   couple of terminals over SSH.

4. In one terminal, I run:

   $ while (true); do find /var /usr /bin /sbin -type f -print0 | xargs -0
 md5sum > /dev/null; echo 2 | sudo tee /proc/sys/vm/drop_caches; done

   (note that sudo will prompt you for a password on the first iteration)

5. In the other terminal, I run:

   $ while (true); do ./hackbench ; sleep 1; done

   where hackbench is built from:

   https://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c

   and compiled according to comment in the source code.

With that, I see the following after ten seconds or so:

  EXT4-fs error (device sda2): ext4_lookup:1707: inode #674497: comm md5sum: 
iget: checksum invalid

Russell, Mark -- does this recipe explode reliably for you too?

Will


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-06 Thread Russell King - ARM Linux admin
On Wed, Jan 06, 2021 at 11:53:59AM +, Mark Rutland wrote:
> ... and are you using defconfig or something else?

Not sure I replied to this. I'm not using the defconfig, I've my own
.config

As I mentioned, Will has built a 5.10 kernel using Arnd's gcc 4.9.4
and hasn't been able to reproduce it. He's sent me his kernel, which
I've booted here, and haven't yet been able to provoke it.

Meanwhile, my 5.9 kernel continues to exhibit this problem, so I've
sent Will my .config (which I'll include here.) There are differences
in some of the block layer configuration. There's differences in the
errata configuration, but we don't think that's a cause (they're not
relevant for Cortex A72).

Our plan is:
- Will is switching to 5.9, and using my config as a base for his
  platform.
- Will is going to send me his modified version of my config.
- We are both going to build using the same kernel sources and same
  config.
- We are going to test our own kernels, and also swap kernel images
  and test each others.

Watch this space for more news...

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
#
# Automatically generated file; DO NOT EDIT.
# Linux/arm64 5.9.0 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="aarch64-linux-gnu-gcc (GCC) 4.9.4"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=40904
CONFIG_LD_VERSION=23101
CONFIG_CLANG_VERSION=0
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_HANDLE_DOMAIN_IRQ=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_GENERIC_IRQ_MULTI_HANDLER=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_ARCH_HAS_TICK_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_SCHED_AVG_IRQ=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

# CONFIG_IKCONFIG is not set
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=19
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_GENERIC_SCHED_CLOCK=y

#
# Scheduler features
#
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_CC_HAS_INT128=y
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
# CONFIG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
CONFIG_CGROUP_HUGETLB=y
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CGROUP_CPUACCT=y
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_DEBUG is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
CONFIG_TIME_NS=y
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_RD_LZ4=y
# CONFIG_RD_ZSTD is not set
# CONFIG_BOOT_CONFIG is not set

Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-06 Thread Russell King - ARM Linux admin
On Wed, Jan 06, 2021 at 11:53:59AM +, Mark Rutland wrote:
> On Tue, Jan 05, 2021 at 03:47:26PM +, Russell King - ARM Linux admin 
> wrote:
> > Hi,
> 
> Hi Russell,
> 
> > This is an update on where I am with this long standing issue at the
> > current time.
> > 
> > Since 5.4, I have been struggling with several of my ARM64 systems, of
> > different SoC vendors and differing filesystem media, were sporadically
> > reporting inode checksum failures on their root filesystems.  The time
> > taken to report this has been anything between a few hours and three
> > months of uptime, making the problem unrealistic to bisect.
> 
> > However, over the last couple of days, a way to reproduce it has been
> > found, at least for the LX2160A based system.  Power down, leave the
> > machine powered off for some time. Power up, log in and run:
> > 
> > while :; do sleep 5; find /var /usr /bin /sbin -type f -print0 | \
> > xargs -0 md5sum >/dev/null; done
> 
> I've just set this off on an Raspberry Pi 4 running a locally-built
> arm64 v5.10 defconfig. I'm using a SATA SSD mounted via a USB-SATA
> adapter. I'll try to give that a few reboots see if I can reproduce the
> issue.

Will, Arnd and myself have been working on this since yesterday - we
_think_ it might have something to do with the clustering of the CPUs.
The LX2160A has 8 clusters of two CPUs. The Armada 8040 has two clusters
of two CPUs. RPi doesn't have clustering.

I've been able to reproduce it on the LX2160A with CPUs 0 and 15, or
CPUs 0 and 4. I haven't been able to reproduce it with CPUs 0 and 1.
However, contary to that, I haven't yet been able to reproduce it with
CPUs 0 and 2.

> Just to check -- how is your ext4 fs mounted? Mine is mounted as:
> 
>  /dev/sda2 on / type ext4 (rw,relatime,errors=remount-ro)

I've tried just plain "rw,errors=remount-ro" and the above.

Another bit of information I have just discovered - it's not Will's
commit. I reverted Will's commit, booted it on the gt8k system
yesterday. This morning, while checking the mount options there, I
noticed the rootfs had gone read only... checking the kernel log...

[45625.530075] provided = 41746cdc calculated = fb8b1ca1
[45625.533858] inode(ff82889d5600)
[45625.536090] : ff a1 00 00 0c 00 00 00 02 ac f3 5f e6 0b 55 5e
[45625.541269] 0010: 52 cb d1 5c 00 00 00 00 00 00 01 00 00 00 00 00
[45625.546443] 0020: 00 00 00 00 01 00 00 00 6d 71 5f 73 65 6e 64 2e
[45625.551618] 0030: 33 2e 67 7a 00 00 00 00 00 00 00 00 00 00 00 00
[45625.556791] 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[45625.561965] 0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[45625.567137] 0060: 00 00 00 00 60 70 be da 00 00 00 00 00 00 00 00
[45625.572316] 0070: 00 00 00 00 00 00 00 00 00 00 00 00 dc 6c 00 00
[45625.577503] 0080: 20 00 74 41 34 8e f5 52 00 00 00 00 00 38 9c 1c
[45625.582717] 0090: e6 0b 55 5e 34 8e f5 52 00 00 00 00 00 00 00 00
[45625.587905] 00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[45625.593083] 00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[45625.598255] 00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[45625.603430] 00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[45625.608618] 00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[45625.613795] 00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[45625.618968] recalculated = fb8b1ca1
[45625.621179] EXT4-fs error (device mmcblk0p1): ext4_lookup:1707: inode 
#149495: comm mandb: iget: checksum invalid

debugfs:  stat <149495>
stat: Inode checksum does not match inode while reading inode 149495

Okay, so the data in memory is incorrect...

# dd if=/dev/mmcblk0p1 bs=4096 count=1 skip=525567 | hexdump -C |less
0600  ff a1 00 00 0c 00 00 00  02 ac f3 5f e6 0b 55 5e  |..._..U^|
0610  52 cb d1 5c 00 00 00 00  00 00 01 00 00 00 00 00  |R..\|
0620  00 00 00 00 01 00 00 00  6d 71 5f 73 65 6e 64 2e  |mq_send.|
0630  33 2e 67 7a 00 00 00 00  00 00 00 00 00 00 00 00  |3.gz|
0640  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
0660  00 00 00 00 60 70 be da  00 00 00 00 00 00 00 00  |`p..|
0670  00 00 00 00 00 00 00 00  00 00 00 00 dc 6c 00 00  |.l..|
0680  20 00 74 41 34 8e f5 52  00 00 00 00 00 38 9c 1c  | .tA4..R.8..|
0690  e6 0b 55 5e 34 8e f5 52  00 00 00 00 00 00 00 00  |..U^4..R|
06a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
# echo 1 > /proc/sys/vm/drop_caches
# debugfs /dev/mmcblk0p1
debugfs 1.44.5 (15-Dec-2018)
/dev/mmcblk0p1: Inode bitmap checksum does not match bitmap while reading 
allocation bitmaps
# e2fsck -n /dev/mmcblk0p1
e2fsck 1.44.5 (15-Dec-2018)
Warning: skipping journal recovery because doing a read-only filesystem
check.
/dev/mmcblk0p1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 149495 

Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-06 Thread Mark Rutland
On Tue, Jan 05, 2021 at 03:47:26PM +, Russell King - ARM Linux admin wrote:
> Hi,

Hi Russell,

> This is an update on where I am with this long standing issue at the
> current time.
> 
> Since 5.4, I have been struggling with several of my ARM64 systems, of
> different SoC vendors and differing filesystem media, were sporadically
> reporting inode checksum failures on their root filesystems.  The time
> taken to report this has been anything between a few hours and three
> months of uptime, making the problem unrealistic to bisect.

> However, over the last couple of days, a way to reproduce it has been
> found, at least for the LX2160A based system.  Power down, leave the
> machine powered off for some time. Power up, log in and run:
> 
> while :; do sleep 5; find /var /usr /bin /sbin -type f -print0 | \
>   xargs -0 md5sum >/dev/null; done

I've just set this off on an Raspberry Pi 4 running a locally-built
arm64 v5.10 defconfig. I'm using a SATA SSD mounted via a USB-SATA
adapter. I'll try to give that a few reboots see if I can reproduce the
issue.

Just to check -- how is your ext4 fs mounted? Mine is mounted as:

 /dev/sda2 on / type ext4 (rw,relatime,errors=remount-ro)

... and are you using defconfig or something else?

[...]

> It is possible that it could be compiler related, but I don't see that;
> if the "dmb oshld" were strong enough, then it should mean that the
> subsequent reads to checksum the inode data after the inode data has
> been DMA'd into memory should be reading the correct values from memory
> already - but they aren't. And if changing "dmb oshld" to "dsb ld" means
> that the code can then read the right values, that to me points fairly
> definitively to a hardware problem.

Just in case the compiler has some impact, can you way which compilers
you've been using? On the rpi4 I'm using GCC 8.3.0 as packaged by Debian
Buster (10.7).

Thanks,
Mark.


Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-05 Thread Russell King - ARM Linux admin
On Tue, Jan 05, 2021 at 10:27:28AM -0800, Darrick J. Wong wrote:
> On Tue, Jan 05, 2021 at 03:47:26PM +, Russell King - ARM Linux admin 
> wrote:
> > Hi,
> > 
> > This is an update on where I am with this long standing issue at the
> > current time.
> > 
> > Since 5.4, I have been struggling with several of my ARM64 systems, of
> > different SoC vendors and differing filesystem media, were sporadically
> > reporting inode checksum failures on their root filesystems.  The time
> > taken to report this has been anything between a few hours and three
> > months of uptime, making the problem unrealistic to bisect.
> 
> Aha, I was wondering what happened to this bug report. :)

Yes... it's continued causing me grief!

> > The issue was first seen on my SolidRun Clearfog CX LX2160A based
> > system, but was also subsequently noticed on my Armada 8040 based
> > systems running kernels 5.4 and later. Kernel 5.2 has proven stable
> > with 566 days of uptime with no issue.
> > 
> > It has taken a long time to get debugging in place to see what is going
> > on - and this is currently detailed on the front page of
> > www.armlinux.org.uk right now, which has formed a blog of this problem
> > - since almost no one has taken any interest in it.
> > 
> > However, over the last couple of days, a way to reproduce it has been
> > found, at least for the LX2160A based system.  Power down, leave the
> > machine powered off for some time. Power up, log in and run:
> > 
> > while :; do sleep 5; find /var /usr /bin /sbin -type f -print0 | \
> > xargs -0 md5sum >/dev/null; done
> 
> Does that fill up the page cache enough to push memory reclaim?

No - I have around 30GB of memory available on the LX2160A system, and
around 3GB that the above is crunching through.

> > Within a few minutes it seems to have spat out an inode checksum
> > failure if the problem exists. However, testing for the problem _not_
> > existing is quite difficult - just because it doesn't appear in the
> > first few minutes does not mean it has been solved - see above where it
> > can take three months.
> > 
> > However, evidence is currently pointing towards commit 22ec71615d82
> > ("arm64: io: Relax implicit barriers in default I/O accessors") having
> > revealed this problem. Will is very certain that this change is
> > correct, and we feel that it may have exposed some other issue in the
> > Aarch64 code.
> > 
> > Further attempts seem to suggest that the problem is specifically the
> > barrier in __iormb(). Leaving __iowmb() untouched, and changing the
> > barrier in __iormb() from dma_rmb() to rmb() _appears_ to result in the
> > problem disappearing. "Appears" is stressed because further testing is
> > needed - and that is probably going to take many months before we know
> > for certain.
> > 
> > However, this suggests that there is a memory ordering bug with aarch64
> > somewhere. Will can follow up with his own thoughts to this email.
> > 
> > We don't know if it is:
> > - the kernel.
> > - the Cortex A72.
> > - the Cache coherent interconnect.
> > 
> > I don't think it's the CCI, as I believe the Armada 8040 uses Marvell's
> > own IP for that based around Aurora 2 (the functional spec doesn't make
> > it clear.) Remember, I'm seeing this problem on both Armada 8040 and
> > LX2160A. We don't know of any known errata for the A72 in this area.
> > So, we're down to something in the kernel.
> > 
> > It is possible that it could be compiler related, but I don't see that;
> > if the "dmb oshld" were strong enough, then it should mean that the
> > subsequent reads to checksum the inode data after the inode data has
> > been DMA'd into memory should be reading the correct values from memory
> > already - but they aren't. And if changing "dmb oshld" to "dsb ld" means
> > that the code can then read the right values, that to me points fairly
> > definitively to a hardware problem.
> > 
> > Now, ext4fs is pretty good at checksumming the metadata in the
> > filesystem - each inode is individually checksummed with CRC32C and two
> > 16-bit halves are stored in each inode. Directories are also
> > checksummed. ext4fs validates the inode checksum on every ext4_iget()
> > call. Do other filesystems do similar?
> 
> XFS and ext4 both validate the ondisk csum when constructing their
> incore inodes, and set them when flushing the incore inode back to disk.
> 
> I vaguely wonder if there's something else going on here, like ... a
> background memory reclaim thread running on one cpu writes out an inode
> core with new checksum (because reading the file bumped the atime), and
> then another cpu comes along and has to reconstitute the (just
> reclaimed) incore inode, but for whatever reason doesn't get the version
> that the other cpu just wrote?
> 
> That's like 130% speculative though, and note that I have no idea what
> the "outer shareable" domain[1] is.
> 
> [1] 
> 

Re: Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-05 Thread Darrick J. Wong
On Tue, Jan 05, 2021 at 03:47:26PM +, Russell King - ARM Linux admin wrote:
> Hi,
> 
> This is an update on where I am with this long standing issue at the
> current time.
> 
> Since 5.4, I have been struggling with several of my ARM64 systems, of
> different SoC vendors and differing filesystem media, were sporadically
> reporting inode checksum failures on their root filesystems.  The time
> taken to report this has been anything between a few hours and three
> months of uptime, making the problem unrealistic to bisect.

Aha, I was wondering what happened to this bug report. :)

> The issue was first seen on my SolidRun Clearfog CX LX2160A based
> system, but was also subsequently noticed on my Armada 8040 based
> systems running kernels 5.4 and later. Kernel 5.2 has proven stable
> with 566 days of uptime with no issue.
> 
> It has taken a long time to get debugging in place to see what is going
> on - and this is currently detailed on the front page of
> www.armlinux.org.uk right now, which has formed a blog of this problem
> - since almost no one has taken any interest in it.
> 
> However, over the last couple of days, a way to reproduce it has been
> found, at least for the LX2160A based system.  Power down, leave the
> machine powered off for some time. Power up, log in and run:
> 
> while :; do sleep 5; find /var /usr /bin /sbin -type f -print0 | \
>   xargs -0 md5sum >/dev/null; done

Does that fill up the page cache enough to push memory reclaim?

> Within a few minutes it seems to have spat out an inode checksum
> failure if the problem exists. However, testing for the problem _not_
> existing is quite difficult - just because it doesn't appear in the
> first few minutes does not mean it has been solved - see above where it
> can take three months.
> 
> However, evidence is currently pointing towards commit 22ec71615d82
> ("arm64: io: Relax implicit barriers in default I/O accessors") having
> revealed this problem. Will is very certain that this change is
> correct, and we feel that it may have exposed some other issue in the
> Aarch64 code.
> 
> Further attempts seem to suggest that the problem is specifically the
> barrier in __iormb(). Leaving __iowmb() untouched, and changing the
> barrier in __iormb() from dma_rmb() to rmb() _appears_ to result in the
> problem disappearing. "Appears" is stressed because further testing is
> needed - and that is probably going to take many months before we know
> for certain.
> 
> However, this suggests that there is a memory ordering bug with aarch64
> somewhere. Will can follow up with his own thoughts to this email.
> 
> We don't know if it is:
> - the kernel.
> - the Cortex A72.
> - the Cache coherent interconnect.
> 
> I don't think it's the CCI, as I believe the Armada 8040 uses Marvell's
> own IP for that based around Aurora 2 (the functional spec doesn't make
> it clear.) Remember, I'm seeing this problem on both Armada 8040 and
> LX2160A. We don't know of any known errata for the A72 in this area.
> So, we're down to something in the kernel.
> 
> It is possible that it could be compiler related, but I don't see that;
> if the "dmb oshld" were strong enough, then it should mean that the
> subsequent reads to checksum the inode data after the inode data has
> been DMA'd into memory should be reading the correct values from memory
> already - but they aren't. And if changing "dmb oshld" to "dsb ld" means
> that the code can then read the right values, that to me points fairly
> definitively to a hardware problem.
> 
> Now, ext4fs is pretty good at checksumming the metadata in the
> filesystem - each inode is individually checksummed with CRC32C and two
> 16-bit halves are stored in each inode. Directories are also
> checksummed. ext4fs validates the inode checksum on every ext4_iget()
> call. Do other filesystems do similar?

XFS and ext4 both validate the ondisk csum when constructing their
incore inodes, and set them when flushing the incore inode back to disk.

I vaguely wonder if there's something else going on here, like ... a
background memory reclaim thread running on one cpu writes out an inode
core with new checksum (because reading the file bumped the atime), and
then another cpu comes along and has to reconstitute the (just
reclaimed) incore inode, but for whatever reason doesn't get the version
that the other cpu just wrote?

That's like 130% speculative though, and note that I have no idea what
the "outer shareable" domain[1] is.

[1] 
https://developer.arm.com/docs/ddi0597/h/base-instructions-alphabetic-order/dmb-data-memory-barrier

> Anyway, here is the patch I'm currently running, which _seems_ so far
> to be the minimal fix for my problems. Will thinks that this is hiding
> the real problem by adding barriers, but I don't see there's much
> choice but to apply this - I don't see what other debugging could be
> done without the use of expensive hardware simulation, or detailed
> hardware level tracing - the kind of which a 

Aarch64 EXT4FS inode checksum failures - seems to be weak memory ordering issues

2021-01-05 Thread Russell King - ARM Linux admin
Hi,

This is an update on where I am with this long standing issue at the
current time.

Since 5.4, I have been struggling with several of my ARM64 systems, of
different SoC vendors and differing filesystem media, were sporadically
reporting inode checksum failures on their root filesystems.  The time
taken to report this has been anything between a few hours and three
months of uptime, making the problem unrealistic to bisect.

The issue was first seen on my SolidRun Clearfog CX LX2160A based
system, but was also subsequently noticed on my Armada 8040 based
systems running kernels 5.4 and later. Kernel 5.2 has proven stable
with 566 days of uptime with no issue.

It has taken a long time to get debugging in place to see what is going
on - and this is currently detailed on the front page of
www.armlinux.org.uk right now, which has formed a blog of this problem
- since almost no one has taken any interest in it.

However, over the last couple of days, a way to reproduce it has been
found, at least for the LX2160A based system.  Power down, leave the
machine powered off for some time. Power up, log in and run:

while :; do sleep 5; find /var /usr /bin /sbin -type f -print0 | \
xargs -0 md5sum >/dev/null; done

Within a few minutes it seems to have spat out an inode checksum
failure if the problem exists. However, testing for the problem _not_
existing is quite difficult - just because it doesn't appear in the
first few minutes does not mean it has been solved - see above where it
can take three months.

However, evidence is currently pointing towards commit 22ec71615d82
("arm64: io: Relax implicit barriers in default I/O accessors") having
revealed this problem. Will is very certain that this change is
correct, and we feel that it may have exposed some other issue in the
Aarch64 code.

Further attempts seem to suggest that the problem is specifically the
barrier in __iormb(). Leaving __iowmb() untouched, and changing the
barrier in __iormb() from dma_rmb() to rmb() _appears_ to result in the
problem disappearing. "Appears" is stressed because further testing is
needed - and that is probably going to take many months before we know
for certain.

However, this suggests that there is a memory ordering bug with aarch64
somewhere. Will can follow up with his own thoughts to this email.

We don't know if it is:
- the kernel.
- the Cortex A72.
- the Cache coherent interconnect.

I don't think it's the CCI, as I believe the Armada 8040 uses Marvell's
own IP for that based around Aurora 2 (the functional spec doesn't make
it clear.) Remember, I'm seeing this problem on both Armada 8040 and
LX2160A. We don't know of any known errata for the A72 in this area.
So, we're down to something in the kernel.

It is possible that it could be compiler related, but I don't see that;
if the "dmb oshld" were strong enough, then it should mean that the
subsequent reads to checksum the inode data after the inode data has
been DMA'd into memory should be reading the correct values from memory
already - but they aren't. And if changing "dmb oshld" to "dsb ld" means
that the code can then read the right values, that to me points fairly
definitively to a hardware problem.

Now, ext4fs is pretty good at checksumming the metadata in the
filesystem - each inode is individually checksummed with CRC32C and two
16-bit halves are stored in each inode. Directories are also
checksummed. ext4fs validates the inode checksum on every ext4_iget()
call. Do other filesystems do similar?

Anyway, here is the patch I'm currently running, which _seems_ so far
to be the minimal fix for my problems. Will thinks that this is hiding
the real problem by adding barriers, but I don't see there's much
choice but to apply this - I don't see what other debugging could be
done without the use of expensive hardware simulation, or detailed
hardware level tracing - the kind of which a silicon vendor or ARM Ltd
would have.

I'm at the end of what I can do with this; I'm going to keep this patch
in my kernel, since it fixes it for me.

Will would like a reliable reproducer - yes, that would be ideal, but
I'm afraid that's a mamoth task in itself. It's taken a year to find
this method of reproducing it.

There's also the matter that in one case I've seen, the ext4 checksum
has been wrong. The subsequent hexdump has been correct, and the post-
hexdump checksum recalculation has remained incorrect - and the same
value as the first incorrect checksum. However, the inode with the
_same_ checksum has subsequently validated correctly by the kernel and
by e2fsck. I can not explain this.

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index ff50dd731852..be63c47aecc4 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -95,7 +95,7 @@ static inline u64 __raw_readq(const volatile void __iomem 
*addr)
 ({ \
unsigned long tmp;