Re: program compiled with clang from base runs 4 times slower than compiled with gcc-11.2.0p6 from ports
Stuart Henderson [stu.li...@spacehopper.org] wrote:
> On 2023-06-05, Kastus Shchuka wrote:
> > Next I tried -fno-fixup-gadgets, and that made a radical difference:
>
> Not entirely a surprise, we have seen this a few times now.
> Usually it is fine, but has quite bad effects on some programs,
> however it is quite a nice mitigation (big reduction in the
> number of available ROP gadgets in compiled code).
>

There are potentially more fixups that can be improved. A while back, the fixup was adding more work than necessary. Todd Mortimer fixed an obvious case where the DstReg form of the MOV instruction was being used, instead of the SrcReg instruction, so a swap was required to move the data between registers.

There may be others, from Todd Mortimer:

"If you are interested, try objdump -d /usr/lib/libc.so and categorize the instructions that have the xchg dance around them. Sort by most common instruction, and then check the Intel SDM to see if the most common instructions that get this treatment have SrcReg / DestReg forms that we can swap around instead of doing the xchg dance. :-)"

Chris
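A rough way to start the categorization Todd Mortimer describes might look like the sketch below. It rests on two assumptions that are not confirmed anywhere in this thread: that the "xchg dance" shows up in the disassembly as a single instruction immediately preceded and followed by an xchg, and that the input uses the usual objdump -d layout (address, opcode bytes, then mnemonic). The function name xchg_sandwich_counts is made up for this example.

```shell
# Tally the mnemonics of instructions that sit directly between two xchg
# instructions in "objdump -d" output read from stdin.
# ASSUMPTION: the fixup appears as xchg / <insn> / xchg on consecutive lines.
xchg_sandwich_counts() {
    awk '
    {
        line = $0 " "
        # Instruction lines start with "<hex address>:"; skip everything else.
        if (line !~ /^[ \t]*[0-9a-f]+:[ \t]/) next
        sub(/^[ \t]*[0-9a-f]+:[ \t]*/, "", line)
        # Strip the raw opcode bytes, e.g. "48 87 d8 ".
        while (line ~ /^[0-9a-f][0-9a-f][ \t]/)
            sub(/^[0-9a-f][0-9a-f][ \t]+/, "", line)
        split(line, f, /[ \t]+/)
        m = f[1]
        if (m == "") next        # byte-only continuation line
        # p2 = instruction two lines back, p1 = previous instruction.
        if (m ~ /^xchg/ && p1 != "" && p1 !~ /^xchg/ && p2 ~ /^xchg/)
            count[p1]++
        p2 = p1
        p1 = m
    }
    END { for (m in count) print count[m], m }
    ' | sort -rn
}
```

It would be used as `objdump -d /usr/lib/libc.so | xchg_sandwich_counts | head`, which prints the most frequently wrapped mnemonics first. Those are the ones to look up in the Intel SDM; the counts alone say nothing about whether an alternative encoding exists.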
Re: program compiled with clang from base runs 4 times slower than compiled with gcc-11.2.0p6 from ports
On 2023-06-05, Kastus Shchuka wrote:
> Next I tried -fno-fixup-gadgets, and that made a radical difference:

Not entirely a surprise, we have seen this a few times now. Usually it is fine, but has quite bad effects on some programs, however it is quite a nice mitigation (big reduction in the number of available ROP gadgets in compiled code).
Re: program compiled with clang from base runs 4 times slower than compiled with gcc-11.2.0p6 from ports
On Sun, Jun 04, 2023 at 05:31:34PM -0600, Todd C. Miller wrote:
> Take a look at the clang-local man page, it documents the difference
> between the OpenBSD base clang and stock llvm. You can try disabling
> some of the options to find which one (or combination of options)
> is causing the slowdown.

Thanks for the pointer, that man page is really what I missed.

>
> I would try building with -fno-stack-protector and -mno-retpoline
> first to see if either of those are the cause.

Neither of them made any difference:

henryk$ make clean
rm -f enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o enchive-cli.c
henryk$ make CFLAGS='-ansi -pedantic -Wall -Wextra -O3 -g3 -fno-stack-protector -mno-retpoline'
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-stack-protector -mno-retpoline -o src/enchive.o src/enchive.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-stack-protector -mno-retpoline -o src/chacha.o src/chacha.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-stack-protector -mno-retpoline -o src/curve25519-donna.o src/curve25519-donna.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-stack-protector -mno-retpoline -o src/sha256.o src/sha256.c
cc -o enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o
enchive.c:209 (src/enchive.c:209)(src/enchive.o:(load_seckey)): warning: sprintf() is often misused, please use snprintf()
henryk$ time enchive a /dev/null
0m55.07s real 0m49.69s user 0m05.47s system

Next I tried -fno-pie:

henryk$ make clean
rm -f enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o enchive-cli.c
henryk$ make CFLAGS='-ansi -pedantic -Wall -Wextra -O3 -g3 -fno-pie' LDFLAGS='-nopie'
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-pie -o src/enchive.o src/enchive.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-pie -o src/chacha.o src/chacha.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-pie -o src/curve25519-donna.o src/curve25519-donna.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-pie -o src/sha256.o src/sha256.c
cc -nopie -o enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o
enchive.c:209 (src/enchive.c:209)(src/enchive.o:(load_seckey)): warning: sprintf() is often misused, please use snprintf()
henryk$ time enchive a /dev/null
0m54.65s real 0m49.21s user 0m04.54s system

Still no cigar...

Next I tried -fno-fixup-gadgets, and that made a radical difference:

henryk$ make clean
rm -f enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o enchive-cli.c
henryk$ make CFLAGS='-ansi -pedantic -Wall -Wextra -O3 -g3 -fno-fixup-gadgets'
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-fixup-gadgets -o src/enchive.o src/enchive.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-fixup-gadgets -o src/chacha.o src/chacha.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-fixup-gadgets -o src/curve25519-donna.o src/curve25519-donna.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-fixup-gadgets -o src/sha256.o src/sha256.c
cc -o enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o
enchive.c:209 (src/enchive.c:209)(src/enchive.o:(load_seckey)): warning: sprintf() is often misused, please use snprintf()
henryk$ time enchive a /dev/null
0m16.63s real 0m14.31s user 0m02.36s system

14.31s is on par with 12.85s of gcc-compiled binary:

henryk$ make clean
rm -f enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o enchive-cli.c
henryk$ make CC=egcc
egcc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -o src/enchive.o src/enchive.c
egcc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -o src/chacha.o src/chacha.c
egcc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -o src/curve25519-donna.o src/curve25519-donna.c
egcc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -o src/sha256.o src/sha256.c
egcc -o enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o
enchive.c:209 (src/enchive.c:209)(src/enchive.o:(agent_addr)): warning: sprintf() is often misused, please use snprintf()
henryk$ time enchive a /dev/null
0m14.36s real 0m12.85s user 0m00.39s system

So to me it seems that fixup-gadgets is the culprit.

Thanks,
Kastus
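The one-flag-at-a-time rebuilds above could be wrapped in a small helper like the sketch below. The names run, try_flag, BASE_CFLAGS and DRY_RUN are made up for this example. BASE_CFLAGS copies the CFLAGS from the enchive Makefile quoted in this thread, and only flags that were actually tried in this thread are meant to be passed in.

```shell
# Base flags as defined in the enchive Makefile (quoted in this thread).
BASE_CFLAGS='-ansi -pedantic -Wall -Wextra -O3 -g3'

# Run a command, or only print it when DRY_RUN is set (hypothetical switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Rebuild from scratch with one extra flag appended to the base CFLAGS.
try_flag() {
    run make clean &&
    run make CFLAGS="$BASE_CFLAGS $1"
}
```

A loop such as `for f in -fno-stack-protector -mno-retpoline -fno-fixup-gadgets; do try_flag "$f" && time enchive a /dev/null; done` then repeats the measurement for each candidate. -fno-pie is left out of that loop because, as shown above, it also needed LDFLAGS='-nopie' at link time.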
Re: program compiled with clang from base runs 4 times slower than compiled with gcc-11.2.0p6 from ports
Take a look at the clang-local man page, it documents the difference between the OpenBSD base clang and stock llvm. You can try disabling some of the options to find which one (or combination of options) is causing the slowdown.

I would try building with -fno-stack-protector and -mno-retpoline first to see if either of those are the cause.

 - todd
program compiled with clang from base runs 4 times slower than compiled with gcc-11.2.0p6 from ports
I am puzzled by the performance of a C program compiled with clang from base. The program in question is enchive [1].

Most of the time I use it on macos or linux, but recently I had to install it on openbsd. I compiled it with default clang from base, and the first thing that struck me was the long time it took to extract an archive. The same operation takes less than 2 seconds on macos and 12 seconds on openbsd.

I asked the author on github [2], and he mostly pointed at the compiler. Following his advice, I did my own testing.

I compiled enchive with default cc (which is clang 13):

make clean
make

I created a test zero file:

$ dd if=/dev/zero of=zero bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.393 secs (385381071 bytes/sec)

Then I archived the zero file:

$ time enchive a /dev/null
0m55.08s real 0m49.55s user 0m03.38s system

Next, I installed gcc-11.2.0 from ports and recompiled enchive:

make clean
make CC=egcc

Then I ran the same test:

$ time enchive a /dev/null
0m14.37s real 0m12.91s user 0m00.35s system

The program uses only libc:

$ ldd enchive
enchive:
Start        End          Type  Open Ref GrpRef Name
0f5c9cdbe000 0f5c9ce0e000 exe   1    0   0      enchive
0f5ed3724000 0f5ed381a000 rlib  0    1   0      /usr/lib/libc.so.97.0
0f5f8523a000 0f5f8523a000 ld.so 0    1   0      /usr/libexec/ld.so

Why does gcc produce a binary that runs 4 times faster than the binary compiled with clang? Am I missing any compiler flags for clang?

Makefile defines

CFLAGS = -ansi -pedantic -Wall -Wextra -O3 -g3

This is all on a 7.3-release system.

Thanks for any pointers to the gaps in my knowledge of compilers.

-Kastus

1. https://github.com/skeeto/enchive
2. https://github.com/skeeto/enchive/issues/31
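For reference, the "4 times" figure can be checked directly from the timings quoted in this message. The helper below is only an illustrative sketch and its name, slowdown, is made up. It divides one time(1) figure by another using awk.

```shell
# Print how many times slower the first timing is than the second,
# rounded to two decimals. Both arguments are seconds.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

slowdown 49.55 12.91    # user time, clang vs gcc: prints 3.84
slowdown 55.08 14.37    # real time, clang vs gcc: prints 3.83
```

Both ratios come out a little under 4, so "4 times slower" is a fair rounding of the measured difference.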