On 25.03.20 21:01, Stephen Warren wrote:
On 3/25/20 1:11 PM, Jan Kiszka wrote:
On 25.03.20 16:00, Tom Rini wrote:
On Wed, Mar 25, 2020 at 07:32:30AM +0100, Jan Kiszka wrote:
On 20.03.20 19:21, Tom Rini wrote:
On Mon, Mar 16, 2020 at 08:09:53PM +0100, Jan Kiszka wrote:
Hi all,

=> ls mmc 0:1 /usr/lib/linux-image-4.9.11-1.3.0-dirty
CACHE: Misaligned operation at range [bdfff998, bdfffd98]
CACHE: Misaligned operation at range [bdfff998, bdfffd98]
CACHE: Misaligned operation at range [bdfff998, bdfffd98]
CACHE: Misaligned operation at range [bdfff998, bdfffd98]
invalid extent block

I'm using master (50be9f0e1ccc) on the MCIMX7SABRE, defconfig.

What could this be? The filesystem is fine from Linux POV.

My first guess would be to run tune2fs -l and see whether any newish
features are enabled that we need some sort of check-and-reject for.

Here are the reported feature flags:

has_journal ext_attr resize_inode dir_index filetype extent 64bit
sparse_super large_file huge_file dir_nlink extra_isize metadata_csum

Of those, only metadata_csum means that you can't write to that image,
but you're just trying to read and that should be fine. Can you go back
in time a little and see whether this problem persists or whether it
was introduced recently? Or can you recreate it on other
platforms/SoCs? Thanks!

Bisected, regression of d5aee659f217 ("fs: ext4: cache extent data").
Reverting this commit over master resolves the issue.

Any idea what could be wrong? What I noticed is that the extent has a
zeroed magic when things go wrong, so maybe it is falsely considered to
be cached?

This is puzzling. I took another look at that patch and I don't see
anything wrong. My guess would be:

- Some unrelated memory corruption bug was exposed simply because this
patch uses dynamic memory or stack slightly differently than before.

- Something writes to the cached block, whereas the cache code assumes
the buffer is read-only.

The cache metadata exists on the stack and so only lasts for the
duration of read_allocated_block() or ext4fs_read_file(), so there's no
issue with re-using the cache across different devices, or persisting
across an ext4 write operation or anything like that. Is this easy to
reproduce; is there a small disk image that shows the problem?
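
For readers following along, the lifetime argument above can be sketched
roughly as below. This is only an illustration, not the code from
d5aee659f217: struct block_cache, cache_init(), cache_read(), cache_fini(),
fake_dev_read() and BLOCK_SIZE are all hypothetical names, and the block
device is simulated. The point is that the cache metadata is a local
variable of the caller, so nothing survives past the single lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 1024u        /* assumed filesystem block size */

/* Hypothetical one-entry cache; the metadata is meant to live on the stack. */
struct block_cache {
	uint8_t *buf;           /* NULL while nothing is cached */
	uint64_t block;         /* block number currently held in buf */
};

static unsigned int device_reads;   /* counts simulated device accesses */

/* Stand-in for a block device read: fills dst with the low byte of block. */
static int fake_dev_read(uint64_t block, uint8_t *dst)
{
	device_reads++;
	memset(dst, (int)(block & 0xff), BLOCK_SIZE);
	return 0;
}

static void cache_init(struct block_cache *c)
{
	c->buf = NULL;
	c->block = 0;
}

static void cache_fini(struct block_cache *c)
{
	free(c->buf);
	cache_init(c);
}

/* Return a read-only view of the block; touch the device only on a miss. */
static const uint8_t *cache_read(struct block_cache *c, uint64_t block)
{
	assert(c != NULL);
	if (c->buf && c->block == block)
		return c->buf;                  /* hit: reuse the buffer */

	if (!c->buf) {
		c->buf = malloc(BLOCK_SIZE);    /* plain malloc: no alignment promise */
		if (!c->buf)
			return NULL;
	}
	if (fake_dev_read(block, c->buf) != 0) {
		cache_fini(c);
		return NULL;
	}
	c->block = block;
	return c->buf;
}

/* Caller owns the cache on its stack, so it cannot outlive this call. */
static int lookup_twice(uint64_t block)
{
	struct block_cache cache;
	const uint8_t *a, *b;
	int same;

	cache_init(&cache);
	a = cache_read(&cache, block);
	b = cache_read(&cache, block);          /* served from the cache */
	same = (a != NULL && a == b);
	cache_fini(&cache);
	return same;
}
```

Calling lookup_twice() for one block hits the simulated device once. The
const return type reflects the read-only assumption mentioned above; the
sketch does not enforce it beyond that.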

Found it: an alignment issue, apparently surfaced by your change when it
switched from zalloc (which does cacheline? alignment) to malloc. Is this
sensitivity maybe SoC-specific?
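
To make the alignment point concrete, here is a rough sketch of the kind
of check behind a "Misaligned operation" warning, plus a buffer allocator
that would satisfy it. It is not U-Boot code: the 64-byte line size is an
assumption, range_is_cacheline_aligned() and alloc_dma_buffer() are
hypothetical helpers, and C11 aligned_alloc() stands in for whatever
aligned allocator the real fix would use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHELINE_SIZE 64u      /* assumption; the real value is SoC-specific */

/* Return 1 if both ends of [start, stop) sit on a cache-line boundary. */
static int range_is_cacheline_aligned(uintptr_t start, uintptr_t stop)
{
	/* The mask trick below only works for power-of-two line sizes. */
	assert((CACHELINE_SIZE & (CACHELINE_SIZE - 1)) == 0);
	return ((start | stop) & (CACHELINE_SIZE - 1)) == 0;
}

/*
 * Hypothetical helper: zeroed buffer that starts on a cache line and is
 * padded to a whole number of lines, so cache maintenance on it never
 * touches a neighbouring allocation.
 */
static void *alloc_dma_buffer(size_t size)
{
	size_t padded = (size + CACHELINE_SIZE - 1) &
			~(size_t)(CACHELINE_SIZE - 1);
	void *buf = aligned_alloc(CACHELINE_SIZE, padded);

	if (buf)
		memset(buf, 0, padded);
	return buf;
}
```

With a 64-byte line, the range from the log, [bdfff998, bdfffd98], fails
this check at both ends (0x998 and 0xd98 each leave a remainder of 0x18),
whereas a 1024-byte buffer from alloc_dma_buffer() passes.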


Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
