Re: program compiled with clang from base runs 4 times slower than compiled with gcc-11.2.0p6 from ports

2023-06-06 Thread Chris Cappuccio
Stuart Henderson [stu.li...@spacehopper.org] wrote:
> On 2023-06-05, Kastus Shchuka  wrote:
> > Next I tried -fno-fixup-gadgets, and that made a radical difference:
> 
> Not entirely a surprise, we have seen this a few times now.
> Usually it is fine, but it has quite bad effects on some programs.
> However, it is quite a nice mitigation (a big reduction in the
> number of available ROP gadgets in compiled code).
> 

There are potentially more fixups that can be improved. A while back,
the fixup was adding more work than necessary.

Todd Mortimer fixed an obvious case where the DstReg form of the MOV
instruction was being used instead of the SrcReg form, so a swap was
required to move the data between registers.

There may be others. From Todd Mortimer:

"If you are interested, try objdump -d /usr/lib/libc.so and categorize
the instructions that have the xchg dance around them. Sort by most
common instruction, and then check the Intel SDM to see if the most
common instructions that get this treatment have SrcReg / DestReg forms
that we can swap around instead of doing the xchg dance. :-)"

Chris



2023-06-05 Thread Stuart Henderson
On 2023-06-05, Kastus Shchuka  wrote:
> Next I tried -fno-fixup-gadgets, and that made a radical difference:

Not entirely a surprise, we have seen this a few times now.
Usually it is fine, but it has quite bad effects on some programs.
However, it is quite a nice mitigation (a big reduction in the
number of available ROP gadgets in compiled code).



2023-06-04 Thread Kastus Shchuka
On Sun, Jun 04, 2023 at 05:31:34PM -0600, Todd C. Miller wrote:
> Take a look at the clang-local man page, it documents the difference
> between the OpenBSD base clang and stock llvm.  You can try disabling
> some of the options to find which one (or combination of options)
> is causing the slowdown.

Thanks for the pointer; that man page is exactly what I was missing.

> 
> I would try building with -fno-stack-protector and -mno-retpoline
> first to see if either of those are the cause.

Neither of them made any difference:

henryk$ make clean
rm -f enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o enchive-cli.c
henryk$ make CFLAGS='-ansi -pedantic -Wall -Wextra -O3 -g3 -fno-stack-protector -mno-retpoline'
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-stack-protector -mno-retpoline -o src/enchive.o src/enchive.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-stack-protector -mno-retpoline -o src/chacha.o src/chacha.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-stack-protector -mno-retpoline -o src/curve25519-donna.o src/curve25519-donna.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-stack-protector -mno-retpoline -o src/sha256.o src/sha256.c
cc  -o enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o
enchive.c:209 (src/enchive.c:209)(src/enchive.o:(load_seckey)): warning: sprintf() is often misused, please use snprintf()
henryk$ time enchive  a /dev/null
0m55.07s real 0m49.69s user 0m05.47s system

Next I tried -fno-pie:

henryk$ make clean
rm -f enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o enchive-cli.c
henryk$ make CFLAGS='-ansi -pedantic -Wall -Wextra -O3 -g3 -fno-pie' LDFLAGS='-nopie'
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-pie -o src/enchive.o src/enchive.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-pie -o src/chacha.o src/chacha.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-pie -o src/curve25519-donna.o src/curve25519-donna.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-pie -o src/sha256.o src/sha256.c
cc -nopie -o enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o
enchive.c:209 (src/enchive.c:209)(src/enchive.o:(load_seckey)): warning: sprintf() is often misused, please use snprintf()
henryk$ time enchive  a /dev/null
0m54.65s real 0m49.21s user 0m04.54s system

Still no cigar...

Next I tried -fno-fixup-gadgets, and that made a radical difference:

henryk$ make clean
rm -f enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o enchive-cli.c
henryk$ make CFLAGS='-ansi -pedantic -Wall -Wextra -O3 -g3 -fno-fixup-gadgets'
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-fixup-gadgets -o src/enchive.o src/enchive.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-fixup-gadgets -o src/chacha.o src/chacha.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-fixup-gadgets -o src/curve25519-donna.o src/curve25519-donna.c
cc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -fno-fixup-gadgets -o src/sha256.o src/sha256.c
cc  -o enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o
enchive.c:209 (src/enchive.c:209)(src/enchive.o:(load_seckey)): warning: sprintf() is often misused, please use snprintf()
henryk$ time enchive  a /dev/null
0m16.63s real 0m14.31s user 0m02.36s system

The 14.31s of user time is on par with the 12.85s of the gcc-compiled binary:

henryk$ make clean
rm -f enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o enchive-cli.c
henryk$ make CC=egcc
egcc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -o src/enchive.o src/enchive.c
egcc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -o src/chacha.o src/chacha.c
egcc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -o src/curve25519-donna.o src/curve25519-donna.c
egcc -c -ansi -pedantic -Wall -Wextra -O3 -g3 -o src/sha256.o src/sha256.c
egcc  -o enchive src/enchive.o src/chacha.o src/curve25519-donna.o src/sha256.o
enchive.c:209 (src/enchive.c:209)(src/enchive.o:(agent_addr)): warning: sprintf() is often misused, please use snprintf()
henryk$ time enchive  a /dev/null
0m14.36s real 0m12.85s user 0m00.39s system

So to me it seems that fixup-gadgets is the culprit. 

Thanks,

Kastus



2023-06-04 Thread Todd C . Miller
Take a look at the clang-local man page, it documents the difference
between the OpenBSD base clang and stock llvm.  You can try disabling
some of the options to find which one (or combination of options)
is causing the slowdown.

I would try building with -fno-stack-protector and -mno-retpoline
first to see if either of those are the cause.

 - todd



program compiled with clang from base runs 4 times slower than compiled with gcc-11.2.0p6 from ports

2023-06-04 Thread Kastus Shchuka
I am puzzled by the performance of a C program compiled with clang from base.

The program in question is enchive [1].
Most of the time I use it on macOS or Linux, but recently I had to install it
on OpenBSD. I compiled it with the default clang from base, and the first thing
that struck me was the long time it took to extract an archive. The same
operation takes less than 2 seconds on macOS and 12 seconds on OpenBSD.

I asked the author on GitHub [2], and he mostly pointed at the compiler.

Following his advice, I did my own testing.

I compiled enchive with the default cc (which is clang 13):
make clean
make

I created a test zero file:

$ dd if=/dev/zero of=zero bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 1.393 secs (385381071 bytes/sec)

Then I archived the zero file:

$ time enchive  a /dev/null   
0m55.08s real 0m49.55s user 0m03.38s system

Next, I installed gcc-11.2.0 from ports and recompiled enchive:

make clean
make CC=egcc

Then I ran the same test:

$ time enchive  a /dev/null   
0m14.37s real 0m12.91s user 0m00.35s system

The program uses only libc:

$ ldd enchive
enchive:
Start        End          Type  Open Ref GrpRef Name
0f5c9cdbe000 0f5c9ce0e000 exe   1    0   0      enchive
0f5ed3724000 0f5ed381a000 rlib  0    1   0      /usr/lib/libc.so.97.0
0f5f8523a000 0f5f8523a000 ld.so 0    1   0      /usr/libexec/ld.so

Why does gcc produce a binary that runs 4 times faster than the binary
compiled with clang?

Am I missing any compiler flags for clang? The Makefile defines
CFLAGS = -ansi -pedantic -Wall -Wextra -O3 -g3

This is all on an OpenBSD 7.3-release system.

Thanks for any pointers to the gaps in my knowledge of compilers.

-Kastus

1. https://github.com/skeeto/enchive
2. https://github.com/skeeto/enchive/issues/31