On Thu, Jan 13, 2022 at 9:09 PM Zhihong Yu <z...@yugabyte.com> wrote:
>
> On Thu, Jan 13, 2022 at 8:09 PM Noah Misch <n...@leadboat.com> wrote:
>
>> On Sun, Jan 09, 2022 at 06:45:09PM -0800, Zhihong Yu wrote:
>> > On Sun, Jan 9, 2022 at 1:27 PM Zhihong Yu <z...@yugabyte.com> wrote:
>> > > After installing gcc-11, ./configure passed (with 0003-memcpy-null.patch).
>> > > In the output of `make check-world`, I don't see `runtime error`.
>>
>> That's expected.  With -fsanitize-undefined-trap-on-error, the program will
>> generate SIGILL when UBSan detects undefined behavior.  To get "runtime error"
>> messages in the postmaster log, drop -fsanitize-undefined-trap-on-error.  Both
>> ways of running the tests have uses.  -fsanitize-undefined-trap-on-error is
>> better when you think the code is clean, because a zero "make check-world"
>> exit status confirms the code is clean.  Once you know the code is unclean in
>> some way, omitting -fsanitize-undefined-trap-on-error is better for getting
>> details.
>>
>> > > Though there was a crash (maybe specific to my machine):
>> > >
>> > > Core was generated by
>> > > `/nfusr/dev-server/zyu/postgres/tmp_install/usr/local/pgsql/bin/postgres
>> > > --singl'.
>> > > Program terminated with signal SIGILL, Illegal instruction.
>> > > #0 0x000000000050642d in write_item.cold ()
>> > > Missing separate debuginfos, use: debuginfo-install
>> > > glibc-2.17-325.el7_9.x86_64 nss-pam-ldapd-0.8.13-25.el7.x86_64
>> > > sssd-client-1.16.5-10.el7_9.10.x86_64
>> > > (gdb) bt
>> > > #0 0x000000000050642d in write_item.cold ()
>> > > #1 0x0000000000ba9d1b in write_relcache_init_file ()
>> > > #2 0x0000000000bb58f7 in RelationCacheInitializePhase3 ()
>> > > #3 0x0000000000bd5cb5 in InitPostgres ()
>> > > #4 0x0000000000a0a9ea in PostgresMain ()
>>
>> That is UBSan detecting undefined behavior.  A successful patch version will
>> fix write_item(), among many other places that are currently making
>> check-world fail.  I get the same when testing your v5 under "gcc (Debian
>> 11.2.0-13) 11.2.0".
>> I used the same host as buildfarm member thorntail, and I
>> configured like this:
>>
>> ./configure -C --with-lz4 --prefix=$HOME/sw/nopath/pghead
>> --enable-tap-tests --enable-debug --enable-depend --enable-cassert
>> CC='ccache gcc-11 -fsanitize=undefined -fsanitize-undefined-trap-on-error'
>> CFLAGS='-O2 -funwind-tables'
>>
>> > Earlier I was using devtoolset-11 which had an `Illegal instruction` error.
>> >
>> > I compiled / installed gcc-11 from source (which took whole afternoon).
>> > `make check-world` passed with patch v3.
>> > In tmp_install/log/install.log, I saw:
>> >
>> > gcc -Wall -Wmissing-prototypes -Wpointer-arith
>> > -Wdeclaration-after-statement -Werror=vla -Wendif-labels
>> > -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
>> > -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
>> > -Wno-format-truncation -Wno-stringop-truncation -fsanitize=undefined
>> > -fsanitize-undefined-trap-on-error -I../../src/port -DFRONTEND
>> > -I../../src/include -D_GNU_SOURCE -c -o path.o path.c
>> > rm -f libpgport.a
>>
>> Perhaps this self-compiled gcc-11 is defective, being unable to detect the
>> instances of undefined behavior that other builds detect.  If so, use the
>> "devtoolset-11" gcc instead.  You're also building without optimization; that
>> might be the problem.
>>
>
> I tried both locally built gcc-11 and devtoolset-11 with configure command
> copied from above.
> `make world` failed in both cases with:
>
> performing post-bootstrap initialization ...
> sh: line 1: 24714 Illegal instruction (core dumped)
> ".../postgres/tmp_install/.../postgres/bin/postgres" --single -F -O -j -c
> search_path=pg_catalog -c exit_on_error=true -c log_checkpoints=false
> template1 > /dev/null
> child process exited with exit code 132
>
> #0 0x000000000050a8d6 in write_item (data=<optimized out>, len=<optimized out>, fp=<optimized out>) at relcache.c:6471
> #1 0x0000000000c33273 in write_relcache_init_file (shared=true) at relcache.c:6368
> #2 0x0000000000c33c50 in RelationCacheInitializePhase3 () at relcache.c:4220
> #3 0x0000000000c55825 in InitPostgres (in_dbname=<optimized out>, dboid=3105442800, username=<optimized out>, useroid=<optimized out>, out_dbname=0x0, override_allow_connections=<optimized out>) at postinit.c:1014
>
> FYI
>

Hi,
I forgot to mention that patch v5 was included during the experiment.

Cheers
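
For anyone reading the "exit code 132" above: a minimal sketch of how that number relates to the SIGILL in the backtrace, assuming the usual POSIX shell convention of reporting 128 plus the signal number for a child killed by a signal. The helper name signal_from_shell_status is hypothetical and is not part of PostgreSQL.

```c
#include <signal.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Given an exit status as reported by a POSIX shell, return the number of
 * the signal that terminated the child, or 0 if the status does not look
 * like "killed by a signal".  Assumes the shell reports 128 + signo.
 */
static int
signal_from_shell_status(int status)
{
    if (status > 128 && status <= 255)
        return status - 128;
    return 0;
}

/*
 * Return 1 if the shell-reported status corresponds to SIGILL, which is
 * what a -fsanitize-undefined-trap-on-error build raises when UBSan
 * detects undefined behavior.
 */
static int
status_is_sigill(int status)
{
    return signal_from_shell_status(status) == SIGILL;
}
```

Calling status_is_sigill(132) checks whether the reported status maps to SIGILL, which matches the "Illegal instruction (core dumped)" line in the output quoted above.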