[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-28 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #38 from Tang, Feng  ---
Created attachment 54368
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54368&action=edit
objdump of  prep_compound_page() with patch in comment 35

[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-28 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #37 from Tang, Feng  ---
Created attachment 54367
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54367&action=edit
page_alloc.i with patch in comment 35

[Bug tree-optimization/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-28 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #36 from Tang, Feng  ---
(In reply to Vladimir Makarov from comment #35)
> (In reply to Jakub Jelinek from comment #34)
> > Seems right now DECL_NONALIASED is only used on these coverage vars and on
> > Fortran caf tokens, so perhaps a quick workaround would be on the LRA side
> > never reread stuff from MEMs with VAR_P && DECL_NONALIASED MEM_EXPRs.  CCing
> > Vlad on that.
> 
> The following patch can do this:
> 
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc

Thanks for the patch!

Since the bug is against 11.3, I cloned the gcc git repository, checked out the
origin/releases/gcc-11 branch, and compiled gcc (TBH, it's my first time).

* built gcc-11, compiled the i386 kernel, and ran my local reproducer (QEMU
booting that kernel in a loop); the error was reproduced right away, at a rate
of about once every 20 boots.

* manually applied Vladimir's patch (the original patch seems to be against the
'master' branch)

* rebuilt gcc, ran make clean and recompiled the i386 kernel; the error was NOT
seen in 350 runs so far
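
As a rough sanity check on the 350 clean runs, here is a minimal sketch in C of
the arithmetic. It assumes each boot fails independently with a fixed
probability, which is an assumption and not something the report establishes.
The helper name prob_all_clean() is made up for illustration.

```c
#include <assert.h>

/*
 * Probability of seeing `runs` consecutive clean boots, assuming every boot
 * fails independently with probability fail_num / fail_den.
 * (Hypothetical helper; the independence assumption is ours.)
 */
static double prob_all_clean(unsigned fail_num, unsigned fail_den,
                             unsigned runs)
{
    double p_clean;
    double acc = 1.0;
    unsigned i;

    assert(fail_den != 0 && fail_num <= fail_den);

    p_clean = 1.0 - (double)fail_num / (double)fail_den;
    for (i = 0; i < runs; i++)
        acc *= p_clean;     /* multiply in one more clean boot */

    return acc;
}
```

With the roughly 1-in-20 failure rate seen on the unpatched gcc-11,
prob_all_clean(1, 20, 350) is on the order of 1e-8, so a streak of 350 clean
boots is very unlikely to be luck under that assumption.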

Also, I will attach the page_alloc.i and the objdump of prep_compound_page()
built with the patched gcc-11.

[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-26 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #10 from Tang, Feng  ---
Created attachment 54352
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54352&action=edit
page_alloc.i.xz

[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-26 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #9 from Tang, Feng  ---

The original report,
https://lore.kernel.org/lkml/202301170941.49728982-oliver.s...@intel.com/t/,
came from Oliver Sang of the 0Day team, but I failed to add him to cc
(probably because he is not registered in this bugzilla system?), so I will try
to gather the info myself (some from Oliver's report, and some from my local
system when it can't be found in Oliver's report).

gcc version: gcc-11 (Debian 11.3.0-8) 11.3.0
 gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

Platform: QEMU

Preprocessed file: page_alloc.i (attached)

gcc options: from page_alloc.s (generated with 'make ARCH=i386 mm/page_alloc.s')

# GNU C89 (Ubuntu 11.3.0-1ubuntu1~22.04) version 11.3.0 (x86_64-linux-gnu)
#   compiled by GNU C version 11.3.0, GMP version 6.2.1, MPFR version
4.1.0, MPC version 1.2.1, isl version isl-0.24-GMP

# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed: -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m32
-msoft-float -mregparm=3 -mpreferred-stack-boundary=2 -march=i686
-mstack-protector-guard-reg=fs
-mstack-protector-guard-symbol=__stack_chk_guard -mindirect-branch=thunk-extern
-mindirect-branch-register -O2 -std=gnu90 -fno-strict-aliasing -fno-common
-fshort-wchar -fcf-protection=none -freg-struct-return -fno-pic -ffreestanding
-fno-asynchronous-unwind-tables -fno-jump-tables
-fno-delete-null-pointer-checks -fno-allow-store-data-races
-fno-reorder-blocks -fno-ipa-cp-clone -fno-partial-inlining
-fstack-protector-strong -fno-omit-frame-pointer -fno-optimize-sibling-calls
-fno-stack-clash-protection -fno-inline-functions-called-once
-fno-strict-overflow -fstack-check=no -fconserve-stack -fprofile-arcs
-ftest-coverage -fno-tree-loop-im -fsanitize=bounds -fsanitize=shift
-fsanitize=unreachable

[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-26 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #8 from Tang, Feng  ---
Created attachment 54350
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54350&action=edit
i386 kernel config

In https://lore.kernel.org/lkml/202301170941.49728982-oliver.s...@intel.com/t/
Oliver Sang provided a reproducer:

To reproduce:

# build kernel
cd linux
cp config-5.13.0-00219-g7118fc2906e2 .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 INSTALL_MOD_PATH= modules_install
cd 
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k  -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.

[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-26 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #7 from Tang, Feng  ---
Created attachment 54349
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54349&action=edit
original job-script from Oliver (0Day)

[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-26 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

Tang, Feng  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #54345|0                           |1
        is obsolete|                            |

--- Comment #6 from Tang, Feng  ---
Created attachment 54348
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54348&action=edit
objdump of  prep_compound_page()

[Bug target/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-26 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #4 from Tang, Feng  ---
(In reply to Andrew Pinski from comment #3)
> Do you have the preprocessed source that is used generate the bad object
> file?
> How about the exact command line?

Thanks for the prompt response!

The error was originally reported by 0Day (a kernel test automation robot),
and I can reproduce it locally, with slight differences.

Sorry for my poor knowledge of gcc. Do you want me to provide the output of
"make ARCH=i386 mm/page_alloc.s"? Or could you give me the command to generate
it? Thanks

[Bug c/108552] Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-26 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

--- Comment #1 from Tang, Feng  ---
Created attachment 54346
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54346&action=edit
kernel log with error message

[Bug c/108552] New: Linux i386 kernel 5.14 memory corruption for pre_compound_page() when gcov is enabled

2023-01-26 Thread feng.tang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552

Bug ID: 108552
   Summary: Linux i386 kernel 5.14 memory corruption for
pre_compound_page() when gcov is enabled
   Product: gcc
   Version: 11.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: feng.tang at intel dot com
  Target Milestone: ---

Created attachment 54345
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54345&action=edit
objdump of  prep_compound_page()

0Day found an i386 Linux kernel boot issue, and bisection shows the first bad
commit is 7118fc2906e29 ("hugetlb: address ref count racing in
prep_compound_gigantic_page"). It happens 94 times out of 999 runs. Details and
some debug analysis from Linus/Vlastimil and us can be found at the following
link:
https://lore.kernel.org/lkml/202301170941.49728982-oliver.s...@intel.com/t/


Debugging shows it is related to one function, prep_compound_page(), in
mm/page_alloc.c:

* If we use '#pragma GCC optimize ("O1")' for that function (the kernel normally
uses O2), the issue goes away
* If we disable GCOV for page_alloc.c, it can't be reproduced
* If we disable UBSAN for page_alloc.c, it can't be reproduced
* Not reproducible for an x86_64 build
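
For reference, a minimal sketch of how the O1 experiment from the first bullet
can be scoped to a single function with GCC's option pragmas, so the rest of
the file keeps its normal optimization level. The function sum_tail() is a
hypothetical stand-in, not kernel code; only the pragmas are the point.

```c
#include <assert.h>

/* Save the current optimization options, then lower them for one function. */
#pragma GCC push_options
#pragma GCC optimize ("O1")

/* Hypothetical example function: sums v[1] .. v[n - 1], skipping v[0]. */
static int sum_tail(const int *v, int n)
{
    int i;
    int sum = 0;

    assert(n >= 1);

    for (i = 1; i < n; i++)
        sum += v[i];

    return sum;
}

/* Restore the options that were in effect before push_options. */
#pragma GCC pop_options
```

Functions defined after the pop_options line are compiled with the options
given on the command line again.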

It seems to be a loop corruption. The pseudocode is:

for (i = 1; i < nr_pages; i++)
   set_meta_data(page[i]);

It should run for page[1]...page[nr_pages - 1], but the memory dump suggests
that set_meta_data() is also called for one more page, page[nr_pages].
https://lore.kernel.org/all/202212312021.bc1efe86-oliver.s...@intel.com/t/
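
To make the expected behavior of the pseudocode loop concrete, here is a small
self-contained C sketch. It is not the kernel code: struct fake_page,
META_TAIL, MAX_PAGES and check_loop_bounds() are made up for illustration, and
set_meta_data() only borrows its name from the pseudocode. The check places
one extra guard element at page[nr_pages] and verifies that a correctly
compiled loop leaves it untouched.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 16
#define META_TAIL 0x1u  /* value written by set_meta_data() in this sketch */

/* Hypothetical stand-in for struct page; only what the sketch needs. */
struct fake_page {
    unsigned int meta;
};

/* Name taken from the pseudocode; the body is an assumption. */
static void set_meta_data(struct fake_page *p)
{
    p->meta = META_TAIL;
}

/* The intended loop: touches page[1] .. page[nr_pages - 1] only. */
static void prep_tail_pages(struct fake_page *page, size_t nr_pages)
{
    size_t i;

    for (i = 1; i < nr_pages; i++)
        set_meta_data(&page[i]);
}

/*
 * Returns 1 if exactly page[1] .. page[nr_pages - 1] were written and both
 * page[0] and the guard element page[nr_pages] were left untouched.
 */
static int check_loop_bounds(size_t nr_pages)
{
    struct fake_page page[MAX_PAGES + 1] = { { 0 } };
    size_t i;

    assert(nr_pages >= 1 && nr_pages <= MAX_PAGES);

    prep_tail_pages(page, nr_pages);

    if (page[0].meta != 0)
        return 0;               /* head page must not be modified */
    for (i = 1; i < nr_pages; i++)
        if (page[i].meta != META_TAIL)
            return 0;           /* every tail page must be written */

    return page[nr_pages].meta == 0;    /* guard must stay clean */
}
```

The symptom described in the report corresponds to the last check failing,
that is, the element just past the loop range being written as well.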

The kernel log, i386 config and the objdump of prep_compound_page() of the first
bad commit are attached. Please let me know if you need more info, thanks!