Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-09 Thread Andi Kleen
On Wed, Aug 10, 2016 at 12:29:29AM +0200, Arnd Bergmann wrote:
> On Monday, August 8, 2016 8:16:05 PM CEST Andi Kleen wrote:
> > > I don't understand what led Andi Kleen to also move .text.hot and
> > > .text.unlikely together with .text [2], but this may have
> > > been a related issue.
> > >
> > > [2] https://lkml.org/lkml/2015/7/19/377
> >
> > The goal was just to move .hot and .unlikely all together, so that
> > they are clustered and use the minimum amount of cache. On x86 doesn't
> > matter where they are exactly, as long as each is together.
> > If they are not explicitely listed then the linker interleaves
> > them with the normal text, which defeats the purpose.
> 
> I still don't see it, my reading of your patch is that you did
> the opposite, by changing the description that puts all .text.hot
> in front of .text, and all .text.unlikely after exit.text into
> one that mixes them with .text. What am I missing here?

No it doesn't mix .unlikely with .text, .unlikely is all in one place.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-09 Thread Andi Kleen
On Wed, Aug 10, 2016 at 12:29:29AM +0200, Arnd Bergmann wrote:
> On Monday, August 8, 2016 8:16:05 PM CEST Andi Kleen wrote:
> > > I don't understand what led Andi Kleen to also move .text.hot and
> > > .text.unlikely together with .text [2], but this may have
> > > been a related issue.
> > >
> > > [2] https://lkml.org/lkml/2015/7/19/377
> >
> > The goal was just to move .hot and .unlikely all together, so that
> > they are clustered and use the minimum amount of cache. On x86 doesn't
> > matter where they are exactly, as long as each is together.
> > If they are not explicitely listed then the linker interleaves
> > them with the normal text, which defeats the purpose.
> 
> I still don't see it, my reading of your patch is that you did
> the opposite, by changing the description that puts all .text.hot
> in front of .text, and all .text.unlikely after exit.text into
> one that mixes them with .text. What am I missing here?

.text.hot is actually not used, the critical part is .text.unlikely
which was not listed and was interleaved before the patch.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-09 Thread Arnd Bergmann
On Monday, August 8, 2016 8:16:05 PM CEST Andi Kleen wrote:
> > I don't understand what led Andi Kleen to also move .text.hot and
> > .text.unlikely together with .text [2], but this may have
> > been a related issue.
> >
> > [2] https://lkml.org/lkml/2015/7/19/377
>
> The goal was just to move .hot and .unlikely all together, so that
> they are clustered and use the minimum amount of cache. On x86 doesn't
> matter where they are exactly, as long as each is together.
> If they are not explicitely listed then the linker interleaves
> them with the normal text, which defeats the purpose.

I still don't see it, my reading of your patch is that you did
the opposite, by changing the description that puts all .text.hot
in front of .text, and all .text.unlikely after exit.text into
one that mixes them with .text. What am I missing here?

Arnd


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-09 Thread Arnd Bergmann
On Tuesday, August 9, 2016 9:20:16 AM CEST Alan Modra wrote:
> On Mon, Aug 08, 2016 at 05:14:27PM +0200, Arnd Bergmann wrote:
> > I have reverted that patch now, so ARM uses ".fixup" again like every
> > other architecture does, and now "*(.fixup) *(.text .text.*)" works
> > correctly, while ""*(.fixup) *(.text .fixup .text.*)" also fails
> > the same way that I saw before:
> 
> That is really odd.  The linker isn't supposed to treat those script
> snippets differently.  First match for .fixup wins.

Sorry for my mistake. I checked again and cannot reproduce what I
thought I saw earlier. "*(.fixup) *(.text .text.*)" fails
as would be expected.

Arnd


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-08 Thread Andi Kleen
> I don't understand what led Andi Kleen to also move .text.hot and
> .text.unlikely together with .text [2], but this may have
> been a related issue.

The goal was just to move .hot and .unlikely all together, so that
they are clustered and use the minimum amount of cache. On x86 doesn't
matter where they are exactly, as long as each is together.
If they are not explicitely listed then the linker interleaves
them with the normal text, which defeats the purpose.

-Andi


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-08 Thread Alan Modra
On Mon, Aug 08, 2016 at 05:14:27PM +0200, Arnd Bergmann wrote:
> I have reverted that patch now, so ARM uses ".fixup" again like every
> other architecture does, and now "*(.fixup) *(.text .text.*)" works
> correctly, while ""*(.fixup) *(.text .fixup .text.*)" also fails
> the same way that I saw before:

That is really odd.  The linker isn't supposed to treat those script
snippets differently.  First match for .fixup wins.

$ cat > fixup1.s <<\EOF
 .global _start
 .text
_start:
 .dc.a .L2
.L1:
 .section ".fixup","ax",%progbits
.L2:
 .dc.a .L1
EOF
$ cat > fixup2.s <<\EOF
 .section ".text.xyz","ax",%progbits
 .dc.a .L2
.L1:

 .section ".fixup","ax",%progbits
.L2:
 .dc.a .L1
EOF
$ cat > fixup.lnk <<\EOF
SECTIONS {
  .text : { *(.fixup) *(.text .fixup .text.*) }
}
EOF
$ as -o fixup1.o fixup1.s 
$ as -o fixup2.o fixup2.s 
$ ld -o fixup -T fixup.lnk -Map fixup.map fixup1.o fixup2.o
$ cat fixup.map

Memory Configuration

Name Origin Length Attributes
*default*0x 0x

Linker script and memory map


.text   0x   0x10
 *(.fixup)
 .fixup 0x0x4 fixup1.o
 .fixup 0x00040x4 fixup2.o
 *(.text .fixup .text.*)
 .text  0x00080x4 fixup1.o
0x0008_start
 .text  0x000c0x0 fixup2.o
 .text.xyz  0x000c0x4 fixup2.o
[snip]

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-08 Thread Arnd Bergmann
On Monday, August 8, 2016 9:19:47 AM CEST Alan Modra wrote:
> On Sun, Aug 07, 2016 at 10:26:19PM +0200, Arnd Bergmann wrote:
> > On Sunday, August 7, 2016 7:27:39 PM CEST Alan Modra wrote:
> > > 
> > > If it can, then Nicholas' patch should be:
> > > 
> > > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) 
> > > *(.text .text.*)
> > > 
> > > If you can't put .text.fixup too far away then you may as well just use
> > > 
> > > *(.text .text.*)
> > 
> > I tried this version:
> > 
> > diff --git a/include/asm-generic/vmlinux.lds.h 
> > b/include/asm-generic/vmlinux.lds.h
> > index b1f8828e9eac..fc210dacac9a 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -438,7 +438,9 @@
> >   * during second ld run in second ld pass when generating System.map */
> >  #define TEXT_TEXT\
> >   ALIGN_FUNCTION();   \
> > - *(.text.hot .text .text.fixup .text.unlikely .text.*)   \
> > + *(.text.hot .text.hot.*)\
> > + *(.text.unlikely .text.fixup .text.unlikely.*)  \
> > + *(.text .text.*)\
> >   *(.ref.text)\
> >   MEM_KEEP(init.text) \
> >   MEM_KEEP(exit.text) \
> > 
> > but that failed to link an allyesconfig kernel because of references
> > from .fixup to .text.*. Trying your version now:
> 
> Well then, that proves you can't put .text.fixup too far aways from
> the associated input section.
> 
> > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*)
> 
> Which means this is guaranteed to fail when you test it properly using
> gcc's profiling options, in order to generate .text.hot* and/or
> .text.unlikely* sections.

I've investigated further and it seems that "*(.text.fixup) *(.text .text.*)"
fails just because we list .text.fixup twice. The .text.fixup section
was originally[1] introduced to work around the same link error that
it is causing now: if we use recursive linking, merging .text and .text.fixup
helps avoid the problems of sections that are >32MB before the final
link.

I have reverted that patch now, so ARM uses ".fixup" again like every
other architecture does, and now "*(.fixup) *(.text .text.*)" works
correctly, while ""*(.fixup) *(.text .fixup .text.*)" also fails
the same way that I saw before:

drivers/scsi/sg.o:(.fixup+0x4): relocation truncated to fit: R_ARM_THM_JUMP24 
against `.text.sg_ioctl'
drivers/scsi/sg.o:(.fixup+0xc): relocation truncated to fit: R_ARM_THM_JUMP24 
against `.text.sg_ioctl'
drivers/scsi/sg.o:(.fixup+0x14): relocation truncated to fit: R_ARM_THM_JUMP24 
against `.text.sg_ioctl'
drivers/scsi/sg.o:(.fixup+0x1c): relocation truncated to fit: R_ARM_THM_JUMP24 
against `.text.sg_ioctl'
drivers/scsi/sg.o:(.fixup+0x24): relocation truncated to fit: R_ARM_THM_JUMP24 
against `.text.sg_ioctl'
drivers/scsi/sg.o:(.fixup+0x2c): relocation truncated to fit: R_ARM_THM_JUMP24 
against `.text.sg_ioctl'
drivers/scsi/sg.o:(.fixup+0x34): relocation truncated to fit: R_ARM_THM_JUMP24 
against `.text.sg_ioctl'
drivers/scsi/sg.o:(.fixup+0x3c): relocation truncated to fit: R_ARM_THM_JUMP24 
against `.text.sg_ioctl'
drivers/scsi/sg.o:(.fixup+0x44): relocation truncated to fit: R_ARM_THM_JUMP24 
against `.text.sg_ioctl'

I don't understand what led Andi Kleen to also move .text.hot and
.text.unlikely together with .text [2], but this may have
been a related issue.

> It seems to me the right thing to do would be to change kernel asm to
> generate .text.foo.fixup for any .text.foo section.  A gas feature
> available with binutils-2.26 enabled by --sectname-subst might help
> with implementing that.

I think this what Nico wanted to use anyway to eliminate more functions
from the output.

Arnd


[1] http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8321
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8322

[2] https://lkml.org/lkml/2015/7/19/377


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-07 Thread Sam Ravnborg
On Mon, Aug 08, 2016 at 01:29:03PM +1000, Nicholas Piggin wrote:
> On Sat, 6 Aug 2016 22:14:23 +0200
> Sam Ravnborg  wrote:
> 
> > On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote:
> > > Introduce LINKER_DCE option for architectures to select if they want
> > > to build with -ffunction-sections, -fdata-sections, and link with
> > > --gc-sections.  
> > 
> > Can you please try to come up with a less cryptic name.
> > "DCE" may make sense for you today.
> > Bot the naive reader will benefit from the longer and
> > more explcit form.
> 
> Yes that's a good idea. The name sucks.
> 
> We don't seem to consistently have a prefix for build configuration
> options, but we could start? KBUILD_LD_DEAD_CODE_DATA_ELIMINATION?

For cc related options we sort of have: KCONFIG_CC IIRC.
So for LD related options we could use the shorter CONFIG_LD_ prefix.

Sam


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-07 Thread Nicholas Piggin
On Mon, 8 Aug 2016 00:12:37 -0400 (EDT)
Nicolas Pitre  wrote:

> On Mon, 8 Aug 2016, Nicholas Piggin wrote:
> 
> > On Sun, 7 Aug 2016 01:33:45 -0400 (EDT)
> > Nicolas Pitre  wrote:
> >   
> > > On Fri, 5 Aug 2016, Nicholas Piggin wrote:
> > >   
> > > > Introduce LINKER_DCE option for architectures to select if they want
> > > > to build with -ffunction-sections, -fdata-sections, and link with
> > > > --gc-sections. It requires some work (documented) to ensure all
> > > > unreferenced entrypoints are live, and requires toolchain and
> > > > build verification, so it is made a per-arch option for now.
> > > > 
> > > > On a random powerpc64le build, this yelds a significant size saving,
> > > > it boots and runs fine, but there is a lot I haven't tested as yet,
> > > > so these savings may be reduced if there are bugs in the link.
> > > > 
> > > > text  databssdec   filename
> > > > 11169741   11807441923176   14273661   vmlinux
> > > > 10445269   10041271919707   13369103   vmlinux.dce
> > > > 
> > > > ~700K text, ~170K data, 6% removed from kernel image size.
> > > > 
> > > > Signed-off-by: Nicholas Piggin   
> > > 
> > > I played with that too. However this needs distinct sections for 
> > > exception tables and the like otherwise the backward references from the 
> > > final exception table to those functions responsible for those exception 
> > > entries has the effect of pulling in all those functions even if their 
> > > entry point is never referenced, making --gc-sections less effective.  
> > > I managed to fix this only with a change to gas (accepted upstream).
> > > 
> > > But once that is solved, you then have the missing forward reference 
> > > problem i.e. nothing actually references those individual exception 
> > > entry sections and ld happily drops them all. Having a KEEP() on each of 
> > > them is unworkable and defeats the purpose anyway.  That requires a 
> > > dummy reloc to trick ld into pulling in those sections when the parent 
> > > section is also pulled in.  
> > 
> > Right, although we don't *need* those things just for enabling
> > --gc-sections, do we? It may not be 100% optimal, but it's enough
> > to avoid the regression when switching to --whole-archive build
> > option.  
> 
> Oh absolutely.
> 
> > Your results are impressive, and I don't want to stand in the way of
> > either LTO or improving accuracy of --gc-sections. But both are things
> > that can be built on top of this patch, I think.  
> 
> Indeed.  Those patches are certainly welcome. They represent half of the 
> job already.  I just wanted to provide some insight about the whole 
> picture in case someone else notices those flaws I have identified.

Okay thanks, I appreciate you taking a look. I wanted to be sure I
wasn't missing some bug here.

Smaller kernel is nice for large systems because it means smaller
icache/dcache footprint and fewer branch trampolines, so I'm always
happy to see that effort. I will certainly help test LTO or some of
these other gc-sections improvements on powerpc.

Thanks,
Nick


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-07 Thread Nicolas Pitre
On Mon, 8 Aug 2016, Nicholas Piggin wrote:

> On Sun, 7 Aug 2016 01:33:45 -0400 (EDT)
> Nicolas Pitre  wrote:
> 
> > On Fri, 5 Aug 2016, Nicholas Piggin wrote:
> > 
> > > Introduce LINKER_DCE option for architectures to select if they want
> > > to build with -ffunction-sections, -fdata-sections, and link with
> > > --gc-sections. It requires some work (documented) to ensure all
> > > unreferenced entrypoints are live, and requires toolchain and
> > > build verification, so it is made a per-arch option for now.
> > > 
> > > On a random powerpc64le build, this yelds a significant size saving,
> > > it boots and runs fine, but there is a lot I haven't tested as yet,
> > > so these savings may be reduced if there are bugs in the link.
> > > 
> > > text  databssdec   filename
> > > 11169741   11807441923176 14273661   vmlinux
> > > 10445269   10041271919707 13369103   vmlinux.dce
> > > 
> > > ~700K text, ~170K data, 6% removed from kernel image size.
> > > 
> > > Signed-off-by: Nicholas Piggin 
> > 
> > I played with that too. However this needs distinct sections for 
> > exception tables and the like otherwise the backward references from the 
> > final exception table to those functions responsible for those exception 
> > entries has the effect of pulling in all those functions even if their 
> > entry point is never referenced, making --gc-sections less effective.  
> > I managed to fix this only with a change to gas (accepted upstream).
> > 
> > But once that is solved, you then have the missing forward reference 
> > problem i.e. nothing actually references those individual exception 
> > entry sections and ld happily drops them all. Having a KEEP() on each of 
> > them is unworkable and defeats the purpose anyway.  That requires a 
> > dummy reloc to trick ld into pulling in those sections when the parent 
> > section is also pulled in.
> 
> Right, although we don't *need* those things just for enabling
> --gc-sections, do we? It may not be 100% optimal, but it's enough
> to avoid the regression when switching to --whole-archive build
> option.

Oh absolutely.

> Your results are impressive, and I don't want to stand in the way of
> either LTO or improving accuracy of --gc-sections. But both are things
> that can be built on top of this patch, I think.

Indeed.  Those patches are certainly welcome. They represent half of the 
job already.  I just wanted to provide some insight about the whole 
picture in case someone else notices those flaws I have identified.


Nicolas


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-07 Thread Nicholas Piggin
On Sun, 7 Aug 2016 01:33:45 -0400 (EDT)
Nicolas Pitre  wrote:

> On Fri, 5 Aug 2016, Nicholas Piggin wrote:
> 
> > Introduce LINKER_DCE option for architectures to select if they want
> > to build with -ffunction-sections, -fdata-sections, and link with
> > --gc-sections. It requires some work (documented) to ensure all
> > unreferenced entrypoints are live, and requires toolchain and
> > build verification, so it is made a per-arch option for now.
> > 
> > On a random powerpc64le build, this yelds a significant size saving,
> > it boots and runs fine, but there is a lot I haven't tested as yet,
> > so these savings may be reduced if there are bugs in the link.
> > 
> > text  databssdec   filename
> > 11169741   11807441923176   14273661   vmlinux
> > 10445269   10041271919707   13369103   vmlinux.dce
> > 
> > ~700K text, ~170K data, 6% removed from kernel image size.
> > 
> > Signed-off-by: Nicholas Piggin 
> 
> I played with that too. However this needs distinct sections for 
> exception tables and the like otherwise the backward references from the 
> final exception table to those functions responsible for those exception 
> entries has the effect of pulling in all those functions even if their 
> entry point is never referenced, making --gc-sections less effective.  
> I managed to fix this only with a change to gas (accepted upstream).
> 
> But once that is solved, you then have the missing forward reference 
> problem i.e. nothing actually references those individual exception 
> entry sections and ld happily drops them all. Having a KEEP() on each of 
> them is unworkable and defeats the purpose anyway.  That requires a 
> dummy reloc to trick ld into pulling in those sections when the parent 
> section is also pulled in.

Right, although we don't *need* those things just for enabling
--gc-sections, do we? It may not be 100% optimal, but it's enough
to avoid the regression when switching to --whole-archive build
option.


> Please see attached a subset of the slides I presented at ELC and Linaro 
> Connect last year to illustrate those issues.
> 
> Also attached a sample patch partially implementing those changes.
> 
> In short I'm very glad to see that this might steer interest across 
> multiple architectures.  I felt like this was becoming much more 
> intrusive than I expected and that maybe LTO was a better bet after all. 
> But LTO has its evils too and I'm willing to look at gc-sections again 
> if there is interest from others as well.

Your results are impressive, and I don't want to stand in the way of
either LTO or improving accuracy of --gc-sections. But both are things
that can be built on top of this patch, I think. We don't need to do
the entire intrusive changes all at once.

Thanks,
Nick


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-07 Thread Nicholas Piggin
On Sat, 6 Aug 2016 22:14:23 +0200
Sam Ravnborg  wrote:

> On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote:
> > Introduce LINKER_DCE option for architectures to select if they want
> > to build with -ffunction-sections, -fdata-sections, and link with
> > --gc-sections.  
> 
> Can you please try to come up with a less cryptic name.
> "DCE" may make sense for you today.
> Bot the naive reader will benefit from the longer and
> more explcit form.

Yes that's a good idea. The name sucks.

We don't seem to consistently have a prefix for build configuration
options, but we could start? KBUILD_LD_DEAD_CODE_DATA_ELIMINATION?

Thanks,
Nick


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-07 Thread Alan Modra
On Sun, Aug 07, 2016 at 10:26:19PM +0200, Arnd Bergmann wrote:
> On Sunday, August 7, 2016 7:27:39 PM CEST Alan Modra wrote:
> > 
> > If it can, then Nicholas' patch should be:
> > 
> > *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text 
> > .text.*)
> > 
> > If you can't put .text.fixup too far away then you may as well just use
> > 
> > *(.text .text.*)
> 
> I tried this version:
> 
> diff --git a/include/asm-generic/vmlinux.lds.h 
> b/include/asm-generic/vmlinux.lds.h
> index b1f8828e9eac..fc210dacac9a 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -438,7 +438,9 @@
>   * during second ld run in second ld pass when generating System.map */
>  #define TEXT_TEXT\
>   ALIGN_FUNCTION();   \
> - *(.text.hot .text .text.fixup .text.unlikely .text.*)   \
> + *(.text.hot .text.hot.*)\
> + *(.text.unlikely .text.fixup .text.unlikely.*)  \
> + *(.text .text.*)\
>   *(.ref.text)\
>   MEM_KEEP(init.text) \
>   MEM_KEEP(exit.text) \
> 
> but that failed to link an allyesconfig kernel because of references
> from .fixup to .text.*. Trying your version now:

Well then, that proves you can't put .text.fixup too far aways from
the associated input section.

> *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*)

Which means this is guaranteed to fail when you test it properly using
gcc's profiling options, in order to generate .text.hot* and/or
.text.unlikely* sections.

It seems to me the right thing to do would be to change kernel asm to
generate .text.foo.fixup for any .text.foo section.  A gas feature
available with binutils-2.26 enabled by --sectname-subst might help
with implementing that.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-07 Thread Arnd Bergmann
On Sunday, August 7, 2016 7:27:39 PM CEST Alan Modra wrote:
> 
> If it can, then Nicholas' patch should be:
> 
> *(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text 
> .text.*)
> 
> If you can't put .text.fixup too far away then you may as well just use
> 
> *(.text .text.*)

I tried this version:

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index b1f8828e9eac..fc210dacac9a 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -438,7 +438,9 @@
  * during second ld run in second ld pass when generating System.map */
 #define TEXT_TEXT  \
ALIGN_FUNCTION();   \
-   *(.text.hot .text .text.fixup .text.unlikely .text.*)   \
+   *(.text.hot .text.hot.*)\
+   *(.text.unlikely .text.fixup .text.unlikely.*)  \
+   *(.text .text.*)\
*(.ref.text)\
MEM_KEEP(init.text) \
MEM_KEEP(exit.text) \

but that failed to link an allyesconfig kernel because of references
from .fixup to .text.*. Trying your version now:

*(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text .text.*)

Arnd


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-07 Thread Andreas Schwab
On So, Aug 07 2016, Alan Modra  wrote:

> At the risk of being told you (kernel people) have already considerd
> this I thought I should mention that the above isn't ideal.  (Nor is
> gcc's choice of .text.hot for hot sections, which clashes with
> --function-sections for a function called "hot" but that's another
> story.)

Or --function-sections should use a unique prefix, like
.text.functions.*

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-07 Thread Alan Modra
On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote:
>  #define TEXT_TEXT\
>   ALIGN_FUNCTION();   \
> - *(.text.hot .text .text.fixup .text.unlikely)   \
> + *(.text.hot .text .text.fixup .text.unlikely .text.*)   \
>   *(.ref.text)\
>   MEM_KEEP(init.text) \
>   MEM_KEEP(exit.text) \

At the risk of being told you (kernel people) have already considerd
this I thought I should mention that the above isn't ideal.  (Nor is
gcc's choice of .text.hot for hot sections, which clashes with
--function-sections for a function called "hot" but that's another
story.)

You'd really like all the hot sections and cold sections to be
together, for better cache locality.  So the line ought to have been
*(.text.hot) *(.text) *(.text.fixup) *(.text.unlikely)

That would put all .text.hot sections together.  Similarly for
.text.unlikely.  The trap of course is that this only works if
.text.fixup from one object file can be placed relatively far away
from .text in the same object file.

If it can, then Nicholas' patch should be:

*(.text.hot .text.hot.*) *(.text.unlikely .text.unlikely.*) *(.text 
.text.*)

If you can't put .text.fixup too far away then you may as well just use

*(.text .text.*)

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-06 Thread Nicolas Pitre
On Fri, 5 Aug 2016, Nicholas Piggin wrote:

> Introduce LINKER_DCE option for architectures to select if they want
> to build with -ffunction-sections, -fdata-sections, and link with
> --gc-sections. It requires some work (documented) to ensure all
> unreferenced entrypoints are live, and requires toolchain and
> build verification, so it is made a per-arch option for now.
> 
> On a random powerpc64le build, this yelds a significant size saving,
> it boots and runs fine, but there is a lot I haven't tested as yet,
> so these savings may be reduced if there are bugs in the link.
> 
> text  databssdec   filename
> 11169741   11807441923176 14273661   vmlinux
> 10445269   10041271919707 13369103   vmlinux.dce
> 
> ~700K text, ~170K data, 6% removed from kernel image size.
> 
> Signed-off-by: Nicholas Piggin 

I played with that too. However this needs distinct sections for 
exception tables and the like otherwise the backward references from the 
final exception table to those functions responsible for those exception 
entries has the effect of pulling in all those functions even if their 
entry point is never referenced, making --gc-sections less effective.  
I managed to fix this only with a change to gas (accepted upstream).

But once that is solved, you then have the missing forward reference 
problem i.e. nothing actually references those individual exception 
entry sections and ld happily drops them all. Having a KEEP() on each of 
them is unworkable and defeats the purpose anyway.  That requires a 
dummy reloc to trick ld into pulling in those sections when the parent 
section is also pulled in.

Please see attached a subset of the slides I presented at ELC and Linaro 
Connect last year to illustrate those issues.

Also attached a sample patch partially implementing those changes.

In short I'm very glad to see that this might steer interest across 
multiple architectures.  I felt like this was becoming much more 
intrusive than I expected and that maybe LTO was a better bet after all. 
But LTO has its evils too and I'm willing to look at gc-sections again 
if there is interest from others as well.


Nicolas

gc_slides.html.gz
Description: application/gzip
commit 1d7ec46257dc546bc7b87439788514fc4650a2b1
Author: Nicolas Pitre 
Date:   Mon Oct 26 10:16:14 2015 -0400

ARM: pushlinkedsection introduction

Signed-off-by: Nicolas Pitre 

diff --git a/Makefile b/Makefile
index d5b3739119..75541414cb 100644
--- a/Makefile
+++ b/Makefile
@@ -775,6 +775,10 @@ ifeq ($(shell $(CONFIG_SHELL) 
$(srctree)/scripts/gcc-goto.sh $(CC)), y)
KBUILD_AFLAGS += -DCC_HAVE_ASM_GOTO
 endif
 
+# Named subsections
+KBUILD_AFLAGS  += -Wa,--sectname-subst
+KBUILD_CFLAGS  += -Wa,--sectname-subst
+
 include scripts/Makefile.kasan
 include scripts/Makefile.extrawarn
 
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index b2bc8e1147..70161c9bfa 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -88,6 +88,17 @@
 #endif
 
 /*
+ * Special .pushsection wrapper with explicit dependency to prevent
+ * garbage collection of the specified section.  This is needed when no
+ * explicit symbol references are made to this section.
+ */
+   .macro  .pushlinkedsection name:vararg
+   .reloc  . - 1, R_ARM_NONE, 9909f
+   .pushsection \name
+9909:
+   .endm
+
+/*
  * Enable and disable interrupts
  */
 #if __LINUX_ARM_ARCH__ >= 6
@@ -239,7 +250,7 @@
 
 #define USER(x...) \
 :  x;  \
-   .pushsection __ex_table,"a";\
+   .pushlinkedsection __ex_table.%S,"a";   \
.align  3;  \
.long   b,9001f;\
.popsection
@@ -253,7 +264,7 @@
  * ALT_SMP( W(instr) ... )
  */
 #define ALT_UP(instr...)   \
-   .pushsection ".alt.smp.init", "a"   ;\
+   .pushlinkedsection ".alt.smp.init.%S", "a"  ;\
.long   9998b   ;\
 9997:  instr   ;\
.if . - 9997b == 2  ;\
@@ -265,7 +276,7 @@
.popsection
 #define ALT_UP_B(label)\
.equup_b_offset, label - 9998b  ;\
-   .pushsection ".alt.smp.init", "a"   ;\
+   .pushlinkedsection ".alt.smp.init.%S", "a"  ;\
.long   9998b   ;\
W(b). + up_b_offset ;\
.popsection
@@ -375,7 +386,7 @@ THUMB(  orr \reg , \reg , #PSR_T_BIT)
.error  "Unsupported inc macro argument"
.endif
 
-   .pushsection __ex_table,"a"
+   

Re: [PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-06 Thread Sam Ravnborg
On Fri, Aug 05, 2016 at 10:12:00PM +1000, Nicholas Piggin wrote:
> Introduce LINKER_DCE option for architectures to select if they want
> to build with -ffunction-sections, -fdata-sections, and link with
> --gc-sections.

Can you please try to come up with a less cryptic name.
"DCE" may make sense for you today.
Bot the naive reader will benefit from the longer and
more explcit form.


 It requires some work (documented) to ensure all
> unreferenced entrypoints are live, and requires toolchain and
> build verification, so it is made a per-arch option for now.
> 
> On a random powerpc64le build, this yelds a significant size saving,
> it boots and runs fine, but there is a lot I haven't tested as yet,
> so these savings may be reduced if there are bugs in the link.
> 
> text  databssdec   filename
> 11169741   11807441923176 14273661   vmlinux
> 10445269   10041271919707 13369103   vmlinux.dce
> 
> ~700K text, ~170K data, 6% removed from kernel image size.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  Makefile  | 10 
>  arch/Kconfig  | 13 ++
>  include/asm-generic/vmlinux.lds.h | 52 
> ++-
>  include/linux/compiler.h  | 18 ++
>  include/linux/export.h| 30 +++---
>  include/linux/init.h  | 38 ++--
>  init/Makefile |  2 ++
>  7 files changed, 100 insertions(+), 63 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index b409076..d5ef31a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -618,6 +618,11 @@ include arch/$(SRCARCH)/Makefile
>  
>  KBUILD_CFLAGS+= $(call cc-option,-fno-delete-null-pointer-checks,)
>  
> +ifdef CONFIG_LINKER_DCE
> +KBUILD_CFLAGS+= $(call cc-option,-ffunction-sections,)
> +KBUILD_CFLAGS+= $(call cc-option,-fdata-sections,)
> +endif
> +
>  ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
>  KBUILD_CFLAGS+= -Os $(call cc-disable-warning,maybe-uninitialized,)
>  else
> @@ -819,6 +824,11 @@ LDFLAGS_BUILD_ID = $(patsubst -Wl$(comma)%,%,\
>  KBUILD_LDFLAGS_MODULE += $(LDFLAGS_BUILD_ID)
>  LDFLAGS_vmlinux += $(LDFLAGS_BUILD_ID)
>  
> +ifdef CONFIG_LINKER_DCE
> +# LDFLAGS_MODULE += $(call ld-option, --gc-sections,)
> +LDFLAGS_vmlinux  += $(call ld-option, --gc-sections,)
> +endif
Something you missed to clean up

Sam


[PATCH 2/5] kbuild: allow archs to select build for link dead code/data elimination

2016-08-05 Thread Nicholas Piggin
Introduce LINKER_DCE option for architectures to select if they want
to build with -ffunction-sections, -fdata-sections, and link with
--gc-sections. It requires some work (documented) to ensure all
unreferenced entrypoints are live, and requires toolchain and
build verification, so it is made a per-arch option for now.

On a random powerpc64le build, this yelds a significant size saving,
it boots and runs fine, but there is a lot I haven't tested as yet,
so these savings may be reduced if there are bugs in the link.

text  databssdec   filename
11169741   11807441923176   14273661   vmlinux
10445269   10041271919707   13369103   vmlinux.dce

~700K text, ~170K data, 6% removed from kernel image size.

Signed-off-by: Nicholas Piggin 
---
 Makefile  | 10 
 arch/Kconfig  | 13 ++
 include/asm-generic/vmlinux.lds.h | 52 ++-
 include/linux/compiler.h  | 18 ++
 include/linux/export.h| 30 +++---
 include/linux/init.h  | 38 ++--
 init/Makefile |  2 ++
 7 files changed, 100 insertions(+), 63 deletions(-)

diff --git a/Makefile b/Makefile
index b409076..d5ef31a 100644
--- a/Makefile
+++ b/Makefile
@@ -618,6 +618,11 @@ include arch/$(SRCARCH)/Makefile
 
 KBUILD_CFLAGS  += $(call cc-option,-fno-delete-null-pointer-checks,)
 
+ifdef CONFIG_LINKER_DCE
+KBUILD_CFLAGS  += $(call cc-option,-ffunction-sections,)
+KBUILD_CFLAGS  += $(call cc-option,-fdata-sections,)
+endif
+
 ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
 KBUILD_CFLAGS  += -Os $(call cc-disable-warning,maybe-uninitialized,)
 else
@@ -819,6 +824,11 @@ LDFLAGS_BUILD_ID = $(patsubst -Wl$(comma)%,%,\
 KBUILD_LDFLAGS_MODULE += $(LDFLAGS_BUILD_ID)
 LDFLAGS_vmlinux += $(LDFLAGS_BUILD_ID)
 
+ifdef CONFIG_LINKER_DCE
+# LDFLAGS_MODULE   += $(call ld-option, --gc-sections,)
+LDFLAGS_vmlinux+= $(call ld-option, --gc-sections,)
+endif
+
 ifeq ($(CONFIG_STRIP_ASM_SYMS),y)
 LDFLAGS_vmlinux+= $(call ld-option, -X,)
 endif
diff --git a/arch/Kconfig b/arch/Kconfig
index 1330bf4..a49092b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -430,6 +430,19 @@ config THIN_ARCHIVES
  Select this if the architecture wants to use thin archives
  instead of ld -r to create the built-in.o files.
 
+config LINKER_DCE
+   bool
+   help
+ Select this if the architecture wants to do dead code and
+ data elimination with the linker by compiling with
+ -ffunction-sections -fdata-sections and linking with
+ --gc-sections.
+
+ This requires that the arch annotates or otherwise protects
+ its external entry points from being discarded. Linker scripts
+ must also merge .text.*, .data.*, and .bss.* correctly into
+ output sections.
+
 config HAVE_CONTEXT_TRACKING
bool
help
diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 6a67ab9..a66ffe9 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -196,9 +196,14 @@
*(.dtb.init.rodata) \
VMLINUX_SYMBOL(__dtb_end) = .;
 
-/* .data section */
+/*
+ * .data section
+ * -fdata-sections generates .data.identifier which needs to be pulled in
+ * with .data, but don't want to pull in .data..stuff which has its own
+ * requirements. Same for bss.
+ */
 #define DATA_DATA  \
-   *(.data)\
+   *(.data .data.[0-9a-zA-Z_]*)\
*(.ref.data)\
*(.data..shared_aligned) /* percpu related */   \
MEM_KEEP(init.data) \
@@ -312,76 +317,76 @@
/* Kernel symbol table: Normal symbols */   \
__ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab) = .;  \
-   *(SORT(___ksymtab+*))   \
+   KEEP(*(SORT(___ksymtab+*))) \
VMLINUX_SYMBOL(__stop___ksymtab) = .;   \
}   \
\
/* Kernel symbol table: GPL-only symbols */ \
__ksymtab_gpl : AT(ADDR(__ksymtab_gpl) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab_gpl) = .;  \
-   *(SORT(___ksymtab_gpl+*))   \
+   KEEP(*(SORT(___ksymtab_gpl+*))) \