Re: Compiling Linux with "bdver2" gcc optimization option

2019-08-21 Thread Étienne Mollier
Franco Martelli, on 2019-08-20:
> mm/memory.o: warning: objtool: remap_pfn_range()+0xd5: unsupported intra-function call
>
> that it's part of linux-kbuild-4.19 package maybe I should submit a bug
> report to this package or is another one a better choice?

Hi Franco,

If you do submit a bug report, that package might be a good
target.  The end result would probably be something like a bug
against the kernel, although the issue has more to do with the
tooling around its build procedure.  If you proceed, please make
sure to include the context of your build, namely the
optimization with -march=bdver2.

But before doing that, may I suggest having a look at
"Compile-time stack metadata validation", available in
tools/objtool/Documentation/stack-validation.txt?  It is very
interesting; I only stumbled upon it recently, and it describes
the purpose of objtool.  You can read it in the Linux source
code, or online here:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/objtool/Documentation/stack-validation.txt

Furthermore, it accurately answers your original question
from the 13th of August:
> compiling the kernel up to Debian 9.x stretch all worked fine but with
> Debian 10 buster I get a lot of warning messages:
>
> mm/memory.o: warning: objtool: remap_pfn_range()+0xd5: unsupported intra-function call
[...]
> arch/x86/kernel/tsc.o: warning: objtool: tsc_refine_calibration_work()+0xd8: stack state mismatch: cfa1=7+48 cfa2=7+40
>
> what does it means?

Short answer: it means that the -march=bdver2 optimization flag
is interfering with the static stack frame analyser at kernel
build time, probably by adding CPU instructions to the object
code that objtool does not recognise.
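
One way to check this on your own builds is to save the output of
each build and compare the objtool warnings between them.  What
follows is only a minimal sketch: the function names and the log
file names are made up for illustration, and it assumes each
warning sits on a single line, as in the raw make output.

```shell
# Print the objtool warnings found in a saved build log,
# sorted and de-duplicated.  $1 is the path to the log.
objtool_warnings() {
    awk '/warning: objtool:/' "$1" | sort -u
}

# Print the objtool warnings present in log $2 but absent from log $1.
new_objtool_warnings() {
    before=$(mktemp)
    after=$(mktemp)
    objtool_warnings "$1" > "$before"
    objtool_warnings "$2" > "$after"
    # comm -13 keeps only the lines unique to the second file.
    comm -13 "$before" "$after"
    rm -f "$before" "$after"
}

# Hypothetical usage, after saving the output of two builds:
#   make -s -j9 2>&1 | tee build-k8.log        # built with -march=k8
#   make -s -j9 2>&1 | tee build-bdver2.log    # built with -march=bdver2
#   new_objtool_warnings build-k8.log build-bdver2.log
```

If the second log only adds objtool lines and nothing from the
compiler itself, that would be consistent with the explanation
above.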

Kind regards,
-- 
Étienne Mollier 
  5ab1 4edf 63bb ccff 8b54  2fa9 59da 56fe fff3 882d
Note to myself: RTWM, Reread The Warning Message




signature.asc
Description: OpenPGP digital signature


Re: Compiling Linux with "bdver2" gcc optimization option

2019-08-20 Thread Franco Martelli
On 19/08/19 at 21:18, Étienne Mollier wrote:
> Franco Martelli, on 2019-08-19:
>> I was thinking to submit a bug report against gcc-8 package. Now that I
>> have a work around, "bdver1" compiles without warnings, I can say
>> enough, what do you think about?
> 
> I don't know, to me it sounds more like little bugs on kernel
> side,
[ ... ]
> Gcc-8 on its side is just trying its best to help one to develop
> better code.  Its heuristics may not apply very well on kernel
> object code however.  If you can reproduce this issue and
> identify it as a false positive with a sample code, that is
> another story of course.

You're right: I compiled tar and a hello program with the -march=bdver2
option without problems, so gcc-8 is fine.  I saw that all the warnings
that appear during the kernel compilation process concern "objtool":

mm/memory.o: warning: objtool: remap_pfn_range()+0xd5: unsupported intra-function call

which is part of the linux-kbuild-4.19 package.  Should I submit a bug
report against this package, or is another one a better choice?

Best regards

-- 
Franco Martelli



Re: Compiling Linux with "bdver2" gcc optimization option

2019-08-19 Thread Étienne Mollier
Franco Martelli, on 2019-08-19:
> I was thinking to submit a bug report against gcc-8 package. Now that I
> have a work around, "bdver1" compiles without warnings, I can say
> enough, what do you think about?

I don't know, to me it sounds more like little bugs on kernel
side, patches silencing warnings from Gcc, one way or another,
happen quite often on that side.  See Linux 5.2.9 changelog,
there are a few ones there:

https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.2.9

Since there is no official support for x86 architecture specific
build targets (other than the ones listed in Kconfig), chances
are the bug report would end up in "wontfix" state.  But you can
always give it a try; perhaps an actual break is lurking there,
waiting to happen in production.

Gcc-8 on its side is just trying its best to help one to develop
better code.  Its heuristics may not apply very well on kernel
object code however.  If you can reproduce this issue and
identify it as a false positive with a sample code, that is
another story of course.

Cheers,
-- 
Étienne Mollier 
  5ab1 4edf 63bb ccff 8b54  2fa9 59da 56fe fff3 882d






Re: Compiling Linux with "bdver2" gcc optimization option

2019-08-19 Thread Franco Martelli
I was thinking of submitting a bug report against the gcc-8 package.  Now
that I have a workaround ("bdver1" compiles without warnings), I could
leave it at that.  What do you think?
Best regards

-- 
Franco Martelli



Re: Compiling Linux with "bdver2" gcc optimization option

2019-08-16 Thread Étienne Mollier
Franco Martelli, on 2019-08-16:
> On 16/08/19 at 17:22, Étienne Mollier wrote:
[...]
> > Compilers may have good optimization routines to boost the speed
> > of the code in several situations, but in other ones there are
> > trade-offs to take between size and performance of the code.  I
> > personally prefer smaller sized executables (-Os): they fit in
> > less pages, so uses less CPU cache, and leave more room for my
> > programs to get more of their own data in cache (or I might
> > simply have spent too much time on suckless.org.  ;)
>
> Do you remember which kernel CONFIG switch lets to do this optimization?

You can set these values as follows if you want to optimize
for size, or the other way around for performance:

# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y

Then run "make oldconfig" to validate your changes.  And same as
before, do your own measurements.  ;)
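
The same change can be scripted.  This is a minimal sketch, not a
definitive tool: the function name set_optimize_for_size is made
up for illustration, and it simply rewrites the two symbols shown
above in whichever .config file it is given.  The kernel tree also
ships a scripts/config helper meant for this kind of edit.

```shell
# Rewrite the two optimization symbols in a kernel .config so that
# the compiler optimizes for size.  $1 is the path to the .config.
set_optimize_for_size() {
    cfg=$1
    tmp=$(mktemp)
    # Drop any existing line for either symbol, set or "is not set".
    awk '!/^(# )?CONFIG_CC_OPTIMIZE_FOR_(PERFORMANCE|SIZE)[= ]/' "$cfg" > "$tmp"
    {
        echo '# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set'
        echo 'CONFIG_CC_OPTIMIZE_FOR_SIZE=y'
    } >> "$tmp"
    # Copy back in place so the original file keeps its permissions.
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Hypothetical usage from the top of the kernel source tree:
#   set_optimize_for_size .config && make oldconfig
```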

> > Activating CPU specific options is interesting on some
> > particular use cases, but newer instruction often require
> > setting up various bits in the CPU before use, which tends to
> > inflate the resulting executable.  This may be interesting for
> > scientific applications, or programs dealing with big data
> > arrays in general.  In kernel mode however, the only case I can
> > think of where CPU specific accelerators would be beneficial are
> > disk ciphering and RAID arrays, for which I believe there is
> > already some runtime detection of available instructions, even
> > with the generic compiler options.
>
> I have four disks in a RAID 5 software array configuration on my system,
> they are managed by mdadm this is my /proc/mdstat file:
>
> $ cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid5 sda1[0] sdb1[1] sdd1[3](S) sdc1[2]
>   1953258496 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3]
> [UUU]
>
> unused devices: <none>

If you have a look at the "sudo dmesg" output, near the beginning,
the kernel prints the results of a series of performance tests
on several implementations of its RAID routines, and keeps the
fastest for runtime.  I'm speaking from memory, as I have no RAID
array at hand to check this.
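
To pick those lines out of the kernel log, a small filter is
enough.  This is a sketch under an assumption: that the benchmark
lines carry a "raid6: " or "xor: " prefix, which is what kernels
of this era typically print.  The function name is made up for
illustration.

```shell
# Keep only the kernel log lines emitted by the RAID6 and XOR
# benchmark code.  Reads kernel log text on standard input.
raid_benchmark_lines() {
    awk '/(raid6|xor): /'
}

# Hypothetical usage:
#   sudo dmesg | raid_benchmark_lines
```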

[...]
> > Or just see how perform your usual programs, if there are
> > visible improvements.
> >
> > Have fun,  :)
>
> Yes I agree the optimization won't impact on performance in a way that
> is perceptively by an human there are tweak more important in the kernel
> such as CONFIG_HZ_1000=y
> I always take measurement of the time employee by kernel compilation out
> of curiosity.
> Thanks again for the tips, best regards

You're welcome,  :)
-- 
Étienne Mollier 
  5ab1 4edf 63bb ccff 8b54  2fa9 59da 56fe fff3 882d






Re: Compiling Linux with "bdver2" gcc optimization option

2019-08-16 Thread Franco Martelli
On 16/08/19 at 17:22, Étienne Mollier wrote:
> Bonjour,
> 
> Woops, this sounds a bit like I might not have used a very clear
> wording.  If I were at your place, I would proceed so; but I
> don't have a Piledriver CPU to do actual testing on my side.
> I'm still stuck with an old K10, not to mention my laptop, which
> comes with an old regular Atom.  :)
> 
> I did try to replace the k8 option by amdfam10 though.  In the
> half hundred thousand lines of logs issued by the build, I get
> something like a dozen differences between k8 and k10.  There
> were a tremendous amount of warnings too, but some of the ones
> you encountered did not appear: the thing with the missing jump
> target for instance, nor the ANNOTATE_NOSPEC_ALTERNATIVE on the
> retpoline thing.  I am running Debian Sid, currently shipping
> with Gcc 9, so this is a difference to take in account though.
> Finally, building an upstream Linux 5.2 kernel instead of
> Buster's 4.19 does not show most of the warnings I encountered,
> as these are being fixed as they come, but probably not as well
> in LTS kernels.
> 
> Doing a third run with addition of the tuning options (-mtune)
> made almost no difference at all, except on the build number and
> the CRC hash.  It seems to me that the architecture specific
> (-march) option already applies the proper tuning, at least for
> my architecture.
> 
> My last manipulation consisted in building Linux upstream 5.2.9,
> released lately, with -march=amdfam10, and this one is running
> quite well so far:
> 
>   $ uname -rv
>   5.2.9-k10 #1 SMP PREEMPT Fri Aug 16 16:13:08 CEST 2019
> 
> But again, no messages worth mentioning during the compilation.
> 
> Do your warnings appear when your build targets k8?
> Or when building a generic x86_64 kernel?

Actually, I am running a kernel built with the "k8" option; it works fine,
and I got no warnings during the compilation.

Following up on your tip about "amdfam10", I checked the gcc
options web page:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
The amdfam10 optimization is for Family 10h CPUs, but I have a Family 15h
CPU.  I noticed that a "bdver1" option also exists for my CPU family, so I
gave it a try: I compiled the kernel source with "bdver1" and, surprise,
I got no warnings, all worked fine. :-) The command line I use
to compile is:

~/linux-source-4.19$ time make -s -j9 ; make -s -j9 modules
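
Note that in the command above, "time" only covers the first
make; the modules step after the semicolon is not timed.  A small
wrapper can time each step separately.  This is a minimal sketch,
and the function name timed_run and its labels are made up for
illustration.

```shell
# Run a command, then report how long it took and its exit status.
# $1 is a free-form label; the remaining arguments are the command.
timed_run() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit $status)"
    return "$status"
}

# Hypothetical usage, timing both build steps for one -march value:
#   timed_run 'bdver1 kernel'  make -s -j9
#   timed_run 'bdver1 modules' make -s -j9 modules
```

The resolution is one second, which should be plenty for a kernel
build.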

> Compilers may have good optimization routines to boost the speed
> of the code in several situations, but in other ones there are
> trade-offs to take between size and performance of the code.  I
> personally prefer smaller sized executables (-Os): they fit in
> less pages, so uses less CPU cache, and leave more room for my
> programs to get more of their own data in cache (or I might
> simply have spent too much time on suckless.org.  ;)

Do you remember which kernel CONFIG switch enables this optimization?

> 
> Activating CPU specific options is interesting on some
> particular use cases, but newer instruction often require
> setting up various bits in the CPU before use, which tends to
> inflate the resulting executable.  This may be interesting for
> scientific applications, or programs dealing with big data
> arrays in general.  In kernel mode however, the only case I can
> think of where CPU specific accelerators would be beneficial are
> disk ciphering and RAID arrays, for which I believe there is
> already some runtime detection of available instructions, even
> with the generic compiler options.

I have four disks in a software RAID 5 array on my system,
managed by mdadm; this is my /proc/mdstat file:

$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sdb1[1] sdd1[3](S) sdc1[2]
  1953258496 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3]
[UUU]

unused devices: <none>

> 
> To be honest, I don't believe the performance gain to get from
> the compiler is tremendous here.  Figures from the author of the
> patch are there to tell us there is a gain indeed; but when you
> investigate in detail the percentage of performance brought by
> the tuning, it is only about 0.03% for the selected benchmark on
> median values.  See the "Data" section at the very end of the
> README, and do your own calculations:
> 
>   https://github.com/graysky2/kernel_gcc_patch/blob/master/README.md
> 
> The best you can do here is to do your own measures with your
> own pattern of usage.  If you are a developer, you can run timed
> builds of Linux, and see the time it takes.  If you are inclined
> toward image rendering speeds, there are a few demo-scenes out
> there where you might get a few figures such as the frame rate
> (careful, glxgears may get capped to 60Hz when some accelerators
> are in use, prefer fancier demos.  ;)
> 
> There is also this other thread dealing with kernel latency
> measures; you may find a few useful tools listed in this
> discussion:
> 
>   

Re: Compiling Linux with "bdver2" gcc optimization option

2019-08-16 Thread Étienne Mollier
Bonjour,

Franco Martelli, on 2019-08-14:
> On 13/08/19 at 19:35, Étienne Mollier wrote:
[...]
> > I would do a few tests with a virtual
> > machine supporting bdver2 instructions before going live anyway,
> > and backups stored far away from the machine once testing, and
> > possibly without contact with that kernel.
>
> I didn't boot that kernel, I don't rely on it. Thanks if you can
> investigate on what happens during compilation process.

Whoops, it sounds like my wording was not very clear.  That is
how I would proceed if I were in your place; but I
don't have a Piledriver CPU to do actual testing on my side.
I'm still stuck with an old K10, not to mention my laptop, which
comes with an old regular Atom.  :)

I did try to replace the k8 option with amdfam10 though.  In the
fifty thousand or so lines of logs issued by the build, I get
something like a dozen differences between k8 and k10.  There
was a tremendous number of warnings too, but some of the ones
you encountered did not appear: neither the one about the missing
jump target, for instance, nor the ANNOTATE_NOSPEC_ALTERNATIVE
one about retpolines.  I am running Debian Sid, currently shipping
with Gcc 9, so this is a difference to take into account though.
Finally, building an upstream Linux 5.2 kernel instead of
Buster's 4.19 does not show most of the warnings I encountered,
as these are being fixed as they come, but probably not as
promptly in LTS kernels.

Doing a third run with addition of the tuning options (-mtune)
made almost no difference at all, except on the build number and
the CRC hash.  It seems to me that the architecture specific
(-march) option already applies the proper tuning, at least for
my architecture.
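
One way to check this observation is to ask GCC which -march and
-mtune values are in effect for a given -march.  This is a
minimal sketch, assuming gcc is on the PATH; the function name
show_arch_tuning is made up for illustration, and how the -mtune
line is reported may vary between GCC versions.

```shell
# Show the -march and -mtune lines GCC reports for a given -march.
# $1 is the architecture name, e.g. amdfam10 or bdver2.
show_arch_tuning() {
    gcc -march="$1" -Q --help=target 2>/dev/null \
        | awk '/^ *-m(arch|tune)=/'
}

# Hypothetical usage:
#   show_arch_tuning amdfam10
```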

My last experiment consisted of building upstream Linux 5.2.9,
released recently, with -march=amdfam10, and this one is running
quite well so far:

$ uname -rv
5.2.9-k10 #1 SMP PREEMPT Fri Aug 16 16:13:08 CEST 2019

But again, no messages worth mentioning during the compilation.

Do your warnings appear when your build targets k8?
Or when building a generic x86_64 kernel?


> > Note that someone from the Gentoo community has developed a set
> > of patches to expand the possibilities of optimization for the
> > kernel, depending on Linux and GCC versions.  You may be
> > interested in the following one for Buster:
> >
> > 
> > https://github.com/graysky2/kernel_gcc_patch/blob/master/enable_additional_cpu_optimizations_for_gcc_v8.1%2B_kernel_v4.13%2B.patch
> >
> > These mainly apply changes in various code sections to put the
> > flags in place, and provide options through the .config file of
> > the source code.  I haven't tested it, but I don't believe this
> > will solve your warnings, reading through the patch.  Yet it
> > does a bit more than just replacing the compiler flag: there is
> > notably a component related to L1 cache shift which is modified
> > too.  That should bring an appreciable performance boost if it
> > corrects cache line mismatch.
>
> Thanks, but I don't want to patch the kernel, that change to the
> Makefile was enough simple in order to get the optimization that I
> looking for.

Fair enough, I reread the whole patch, and your modification
seems sufficient, I believe.

> > Please be aware that CPU optimizations in kernel, targeting Zen
> > and Skylake in this case, seemed to be hardly detectable, or
> > even counter productive, with various computer usage patterns,
> > according to measures done by Phoronix earlier this year:
> >
> > https://www.phoronix.com/scan.php?page=article&item=linux-50-march&num=1
> >
> > Of course this may not be the case for your own typical load,
> > but I would recommend to do a few measures, to assess the actual
> > performance gain on your machine with, and without, CPU specific
> > compiler optimizations.
>
> I never experimented benchmark with and without bdver2 option, I assumed
> that if it exists an option for k8 in the kernel then changing it to
> bdver2 it would be good (I hope).

Compilers may have good optimization routines to boost the speed
of the code in several situations, but in others there are
trade-offs to make between size and performance of the code.  I
personally prefer smaller executables (-Os): they fit in
fewer pages, so they use less CPU cache, and leave more room for my
programs to get more of their own data in cache (or I might
simply have spent too much time on suckless.org.  ;)

Activating CPU specific options is interesting in some
particular use cases, but newer instructions often require
setting up various bits in the CPU before use, which tends to
inflate the resulting executable.  This may be interesting for
scientific applications, or programs dealing with big data
arrays in general.  In kernel mode however, the only cases I can
think of where CPU specific accelerators would be beneficial are
disk encryption and RAID arrays, for which I believe there is
already some runtime detection of available instructions, even
with the generic compiler options.

Re: Compiling Linux with "bdver2" gcc optimization option

2019-08-14 Thread Franco Martelli
On 13/08/19 at 19:35, Étienne Mollier wrote:
> Hi Franco,
> 
> I'm not fluent enough in GCC 8 for x86_64 to answer to all the
> various warnings you indicated.  Some may be harmless, and some
> may eat your data.  I would do a few tests with a virtual
> machine supporting bdver2 instructions before going live anyway,
> and backups stored far away from the machine once testing, and
> possibly without contact with that kernel.

I didn't boot that kernel; I don't rely on it.  I would be grateful if
you could investigate what happens during the compilation process.
> 
> I also recall having had to move from ORC to DWARF unwinder to
> get the build working, but that was on old OS levels, not on
> newer ones, due to the libelf being too old.
> 
> Some of these seem related to CPU vulnerabilities mitigations,
> and might be worth a bug report against the kernel, either
> Debian or upstream, assuming it also appears /without/ your
> -march=bdver2 flag:
> 
>> mm/memory.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.

I asked on the debian-kernel mailing list but nobody answered.  Maybe
it is something related to gcc 8, since all previous Debian kernel
versions worked with the bdver2 optimization.
> 
> Note that someone from the Gentoo community has developed a set
> of patches to expand the possibilities of optimization for the
> kernel, depending on Linux and GCC versions.  You may be
> interested in the following one for Buster:
> 
>   
> https://github.com/graysky2/kernel_gcc_patch/blob/master/enable_additional_cpu_optimizations_for_gcc_v8.1%2B_kernel_v4.13%2B.patch
> 
> These mainly apply changes in various code sections to put the
> flags in place, and provide options through the .config file of
> the source code.  I haven't tested it, but I don't believe this
> will solve your warnings, reading through the patch.  Yet it
> does a bit more than just replacing the compiler flag: there is
> notably a component related to L1 cache shift which is modified
> too.  That should bring an appreciable performance boost if it
> corrects cache line mismatch.

Thanks, but I don't want to patch the kernel; that change to the
Makefile was simple enough to get the optimization that I was
looking for.
> 
> Please be aware that CPU optimizations in kernel, targeting Zen
> and Skylake in this case, seemed to be hardly detectable, or
> even counter productive, with various computer usage patterns,
> according to measures done by Phoronix earlier this year:
> 
>   https://www.phoronix.com/scan.php?page=article&item=linux-50-march&num=1
> 
> Of course this may not be the case for your own typical load,
> but I would recommend to do a few measures, to assess the actual
> performance gain on your machine with, and without, CPU specific
> compiler optimizations.

I never ran benchmarks with and without the bdver2 option; I assumed
that if an option for k8 exists in the kernel, then changing it to
bdver2 would be good (I hope).

-- 
Franco Martelli



Re: Compiling Linux with "bdver2" gcc optimization option

2019-08-13 Thread Étienne Mollier
Franco Martelli, on 2019-08-13:
> Hi, everybody
>
> in order to achieve Linux kernel optimized for my CPU AMD FX-8350
> Bulldozer2 I changed the line 121 of linux-source-4.19/arch/x86/Makefile
> from:
>
> cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
>
> to:
>
> cflags-$(CONFIG_MK8) += $(call cc-option,-march=bdver2) \
>         $(call cc-option,-mtune=bdver2,$(call cc-option,-mtune=generic))
>
> compiling the kernel up to Debian 9.x stretch all worked fine but with
> Debian 10 buster I get a lot of warning messages:
[...snipped warnings...]
> what does it means? Is there a way to get the kernel optimized for my
> CPU as it happened in the previous Debian versions?
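
For readers unfamiliar with the cc-option calls in the Makefile
change quoted above: the helper expands to the flag only if the
compiler accepts it, and to the optional fallback otherwise.  The
same idea can be sketched in shell.  This is an illustration
only, not the kernel's actual implementation; the function names
cc_accepts and pick_flag are made up, and the compiler defaults
to gcc unless CC is set.

```shell
# Succeed if the compiler accepts the given flags on an empty C input.
cc_accepts() {
    out=$(mktemp)
    "${CC:-gcc}" "$@" -x c -c /dev/null -o "$out" 2>/dev/null
    status=$?
    rm -f "$out"
    return "$status"
}

# Print $1 if the compiler accepts it, otherwise print the fallback
# $2, loosely mirroring $(call cc-option,FLAG,FALLBACK).
pick_flag() {
    if cc_accepts "$1"; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Hypothetical usage:
#   pick_flag -mtune=bdver2 -mtune=generic
```

Running cc_accepts -march=bdver2 on its own is a quick way to
confirm that the installed compiler knows the target at all.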

Hi Franco,

I'm not fluent enough in GCC 8 for x86_64 to comment on all the
various warnings you indicated.  Some may be harmless, and some
may eat your data.  I would do a few tests with a virtual
machine supporting bdver2 instructions before going live anyway,
with backups stored far away from the machine while testing, and
possibly without contact with that kernel.  That is, if it
happens to boot; this sort of thing does not look very good,
for instance:

> arch/x86/kernel/sys_x86_64.o: warning: objtool: get_align_mask()+0x1d: can't find jump dest instruction at .text+0x2f

I also recall having had to move from ORC to DWARF unwinder to
get the build working, but that was on old OS levels, not on
newer ones, due to the libelf being too old.

Some of these seem related to CPU vulnerability mitigations,
and might be worth a bug report against the kernel, either
Debian or upstream, assuming they also appear /without/ your
-march=bdver2 flag:

> mm/memory.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.


Note that someone from the Gentoo community has developed a set
of patches to expand the possibilities of optimization for the
kernel, depending on Linux and GCC versions.  You may be
interested in the following one for Buster:


https://github.com/graysky2/kernel_gcc_patch/blob/master/enable_additional_cpu_optimizations_for_gcc_v8.1%2B_kernel_v4.13%2B.patch

These mainly apply changes in various code sections to put the
flags in place, and provide options through the .config file of
the source code.  I haven't tested it, but I don't believe this
will solve your warnings, reading through the patch.  Yet it
does a bit more than just replacing the compiler flag: there is
notably a component related to L1 cache shift which is modified
too.  That should bring an appreciable performance boost if it
corrects cache line mismatch.

Please be aware that CPU optimizations in kernel, targeting Zen
and Skylake in this case, seemed to be hardly detectable, or
even counter productive, with various computer usage patterns,
according to measures done by Phoronix earlier this year:

https://www.phoronix.com/scan.php?page=article&item=linux-50-march&num=1

Of course this may not be the case for your own typical load,
but I would recommend taking a few measurements, to assess the
actual performance gain on your machine with, and without, CPU
specific compiler optimizations.

Kind regards,
-- 
Étienne Mollier 
  5ab1 4edf 63bb ccff 8b54  2fa9 59da 56fe fff3 882d


