Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-24 Thread Florian Weimer

On 05/23/2018 02:55 PM, Michael Matz wrote:
> On Fri, 18 May 2018, Richard Biener wrote:
>
> > Interesting.  Do they allow merging across such sections?  Consider an
> > 8-byte entity 0x12345678 and 4-byte entities 0x1234 0x5678, will the
> > 4-byte entities share the rodata with the 8-byte one?
>
> There's no language to forbid this (as long as the alignments
> are respected), but at least GNU ld currently only merges same-sized
> entities.

I'm not entirely sure if this is valid for C, particularly if the 
objects have the same address, but not the same type.
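
To make the concern concrete: C requires distinct named objects to have
distinct addresses, so overlapping storage would be observable for them.
A minimal sketch, with object and function names made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Two distinct objects of different type and size (hypothetical names). */
static const uint64_t eight = UINT64_C(0x1234567800005678);
static const uint32_t four  = UINT32_C(0x5678);

static_assert(sizeof four < sizeof eight, "four must be the smaller object");

/* Returns 1 if `four` does not live inside the storage of `eight`. */
int objects_do_not_overlap(void)
{
    uintptr_t lo = (uintptr_t)&eight;
    uintptr_t p  = (uintptr_t)&four;

    return p < lo || p >= lo + sizeof eight;
}
```

A conforming implementation should always return 1 here. Whether the same
reasoning applies to every auto-generated .rodata entry, whose address the
program may never observe, is the open question in this part of the thread.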


There is a discussion on the generic-abi list about providing 
information to the linker about when it is safe to do such merging:


https://groups.google.com/forum/#!topic/generic-abi/MPr8TVtnVn4

Thanks,
Florian


Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-23 Thread Michael Matz
Hi,

On Fri, 18 May 2018, Richard Biener wrote:

> Interesting.  Do they allow merging across such sections?  Consider an
> 8-byte entity 0x12345678 and 4-byte entities 0x1234 0x5678, will the
> 4-byte entities share the rodata with the 8-byte one?

There's no language to forbid this (as long as the alignments 
are respected), but at least GNU ld currently only merges same-sized 
entities.
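
As an illustration of what cross-size sharing would involve (this is not
what GNU ld does, since per the above it merges same-sized entities only),
here is a minimal sketch that looks for a small entity inside a larger one
at suitably aligned offsets. The function name and interface are
hypothetical, and the sketch assumes the larger entity itself starts at an
address aligned to at least `align`:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the first offset in `big` (big_size bytes) at which the bytes of
 * `small` (small_size bytes) occur, considering only offsets that are
 * multiples of `align`.  Returns -1 if there is no such offset.
 */
long find_shared_offset(const unsigned char *big, size_t big_size,
                        const unsigned char *small, size_t small_size,
                        size_t align)
{
    assert(align != 0);

    if (small_size > big_size)
        return -1;
    for (size_t off = 0; off + small_size <= big_size; off += align) {
        if (memcmp(big + off, small, small_size) == 0)
            return (long)off;
    }
    return -1;
}
```

With an 8-byte entity and a 4-byte entity whose bytes equal its upper half,
the call with align 4 returns offset 4. A byte pattern that only matches at
offset 1 is rejected with align 4, which is the "as long as the alignments
are respected" condition.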

> I believe GCC pulls off such tricks in its internal constant pool 
> merging code.
> 
> It might be worth gathering statistics on the size of constant pool
> entries for this.
> 
> Now the question is of course if BFD contains support for optimizing
> those sections.

You mean to ask if GNU ld is actually uniquifying the contents of such
mergeable sections across object files?  If so the answer is yes.


Ciao,
Michael.


Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-18 Thread Richard Biener
On Thu, May 17, 2018 at 11:10 PM Segher Boessenkool <
seg...@kernel.crashing.org> wrote:

> On Thu, May 17, 2018 at 06:10:13PM +0200, Michael Matz wrote:
> > On Wed, 16 May 2018, Richard Biener wrote:
> > > > Are constant pool entries merged at compile time or at link time? I
> > > > would presume it should be done at link time because otherwise you're
> > > > only merging entries within a single compilation unit (which doesn't
> > > > sound that useful in a big project with hundreds of source files),
> > > > right?
> > >
> > > constant pool entries are merged at compile time.  There's no such thing
> > > as mergeable constant pool sections
> >
> > Actually there is in ELF.  Mergeable sections can not only hold strings,
> > but also fixed-size entities (e.g. 4 or 8 byte constants).  Those are
> > merged content-wise at link time and references properly rewritten.  Of
> > course, those still aren't per-function.

Interesting.  Do they allow merging across such sections?  Consider
an 8-byte entity 0x12345678 and 4-byte entities 0x1234 0x5678,
will the 4-byte entities share the rodata with the 8-byte one?  I believe
GCC pulls off such tricks in its internal constant pool merging code.

It might be worth gathering statistics on the size of constant pool
entries for this.

Now the question is of course if BFD contains support for optimizing
those sections.

> It also works correctly in combination with -ffunction-sections,
> -fdata-sections, -Wl,--gc-sections.  And not with per-function constant
> pools like on arm-linux; I'm not sure how that could ever work.


> Segher


Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-17 Thread Segher Boessenkool
On Thu, May 17, 2018 at 06:10:13PM +0200, Michael Matz wrote:
> On Wed, 16 May 2018, Richard Biener wrote:
> > > Are constant pool entries merged at compile time or at link time? I 
> > > would presume it should be done at link time because otherwise you're 
> > > only merging entries within a single compilation unit (which doesn't 
> > > sound that useful in a big project with hundreds of source files), 
> > > right?
> > 
> > constant pool entries are merged at compile time.  There's no such thing
> > as mergeable constant pool sections 
> 
> Actually there is in ELF.  Mergeable sections can not only hold strings,
> but also fixed-size entities (e.g. 4 or 8 byte constants).  Those are 
> merged content-wise at link time and references properly rewritten.  Of 
> course, those still aren't per-function.

It also works correctly in combination with -ffunction-sections,
-fdata-sections, -Wl,--gc-sections.  And not with per-function constant
pools like on arm-linux; I'm not sure how that could ever work.


Segher


Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-17 Thread Michael Matz
Hi,

On Wed, 16 May 2018, Richard Biener wrote:

> > Are constant pool entries merged at compile time or at link time? I 
> > would presume it should be done at link time because otherwise you're 
> > only merging entries within a single compilation unit (which doesn't 
> > sound that useful in a big project with hundreds of source files), 
> > right?
> 
> constant pool entries are merged at compile time.  There's no such thing
> as mergeable constant pool sections 

Actually there is in ELF.  Mergeable sections can not only hold strings,
but also fixed-size entities (e.g. 4 or 8 byte constants).  Those are 
merged content-wise at link time and references properly rewritten.  Of 
course, those still aren't per-function.
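
For reference, ELF marks such sections with the SHF_MERGE flag and records
the entry size in sh_entsize.  The content-wise merging and reference
rewriting can be modelled roughly as below.  This is a simplified sketch,
not GNU ld's implementation; the struct, the function name and the linear
search are all assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one mergeable input section with 4-byte entries. */
struct input_section {
    const uint32_t *entries;
    size_t count;
};

/*
 * Append every entry of `in` to the merged pool `out` unless an equal entry
 * is already present.  remap[i] receives the pool index of input entry i,
 * standing in for the reference rewriting a linker performs.
 * Returns the new number of entries in the pool.
 */
size_t merge_section(const struct input_section *in,
                     uint32_t *out, size_t out_count, size_t out_cap,
                     size_t *remap)
{
    for (size_t i = 0; i < in->count; i++) {
        size_t j = 0;

        while (j < out_count && out[j] != in->entries[i])
            j++;
        if (j == out_count) {
            assert(out_count < out_cap);
            out[out_count++] = in->entries[i];
        }
        remap[i] = j;
    }
    return out_count;
}
```

Feeding it two input sections {0x1234, 0x5678, 0x1234} and {0x5678, 0x9abc}
in turn leaves three entries in the pool, and the remap tables tell each
"object file" where its constants ended up.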


Ciao,
Michael.


Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-16 Thread Richard Biener
On Tue, May 15, 2018 at 9:56 PM Julius Werner wrote:

> > I think you are asking for per-function constant pool sections.  Because
> > we generally cannot avoid the need of a constant pool and dependent
> > on the target that is always global.  Note with per-function constant
> > pools you will not benefit from constant pool entry merging across
> > functions.  I'm also not aware of any non-target-specific (and thus not
> > implemented on some targets) option to get these.

> Thanks, yeah, that sounds like what I need. Is there any way to get that
> behavior today, even for a specific target? (I'm mostly interested in
> x86_64, armv7 and aarch64.) And are you saying that there are some targets
> for which it would be impossible to provide this behavior? Or just that
> it's not implemented for all targets today?

It's not implemented for all targets and there may be no way to force it
for all constants.

> Are constant pool entries merged at compile time or at link time? I would
> presume it should be done at link time because otherwise you're only
> merging entries within a single compilation unit (which doesn't sound that
> useful in a big project with hundreds of source files), right?

constant pool entries are merged at compile time.  There's no such thing
as mergeable constant pool sections, so the closest thing would be to
emit each entry into its own comdat or linkonce section and have the
general linker script merge them pattern-based into .rodata from
.rodata.HASHVALUEOFACTUALCONSTANT.  And then of course you'll
run into hash collisions, so I guess it's not too bright an idea.  Instead,
some way of making a mergeable rodata section (which requires
metadata for its entries) would be a solution.
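
A rough sketch of the hash-named-section idea, purely for illustration.
The choice of 32-bit FNV-1a and both function names are assumptions, not
anything GCC does.  It also shows where the collision problem comes from:
two different constants can map to the same 32-bit value and hence the same
section name, so the name alone cannot prove the contents are equal.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a over the bytes of a constant (hash choice is an assumption). */
uint32_t fnv1a32(const unsigned char *data, size_t len)
{
    uint32_t h = UINT32_C(2166136261);

    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= UINT32_C(16777619);
    }
    return h;
}

/*
 * Write ".rodata.<8 hex digits>" into buf.  Returns the snprintf result,
 * i.e. the length of the full name excluding the terminating NUL.
 */
int constant_section_name(char *buf, size_t buf_size,
                          const unsigned char *data, size_t len)
{
    assert(buf != NULL && buf_size > 0);
    return snprintf(buf, buf_size, ".rodata.%08" PRIx32, fnv1a32(data, len));
}
```

Identical constants in different translation units would get identical
section names and could be folded as comdat groups, but any scheme like this
would still need a content comparison to be safe.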

> So if
> they're merged at link time, shouldn't it be possible to do that merging
> after a linker script condensed all the per-function input sections that
> are left after --gc-sections back into a single big .rodata output section?
> (In my case, the linker script would instead condense all the constant pool
> sections for marked functions into .special_area.rodata and all the others
> into .rodata, and then it should be merging internally within those two
> output sections.)

> > Does it work better if you make this "static const"?

> Yes, then I can declare an __attribute__((section)) for that specific
> variable. However, that doesn't really seem like a safe and scalable
> approach, especially since it's hard to notice when you missed a variable.
> I'd like to have a way that I can annotate a function and that whole
> function (with everything it needs, except for globals) gets put into a
> special section (or a set of special sections with a common prefix or
> suffix), without having to rewrite the source to accommodate for this every
> time.


Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-15 Thread Segher Boessenkool
On Tue, May 15, 2018 at 12:56:22PM -0700, Julius Werner wrote:
> > I think you are asking for per-function constant pool sections.  Because
> > we generally cannot avoid the need of a constant pool and dependent
> > on the target that is always global.  Note with per-function constant
> > pools you will not benefit from constant pool entry merging across
> > functions.  I'm also not aware of any non-target-specific (and thus not
> > implemented on some targets) option to get these.
> 
> Thanks, yeah, that sounds like what I need. Is there any way to get that
> behavior today, even for a specific target? (I'm mostly interested in
> x86_64, armv7 and aarch64.) And are you saying that there are some targets
> for which it would be impossible to provide this behavior? Or just that
> it's not implemented for all targets today?

For aarch64 there is -mpc-relative-literal-loads, I think that will
do what you want.  This option is implied by -mcmodel=tiny which you
may want anyway, if your code is small enough.


Segher


Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-15 Thread Joseph Myers
This has been listed as a desirable feature for a long time: 
https://gcc.gnu.org/projects/optimize.html#putting_constants_in_special_sections

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-15 Thread Julius Werner
> I think you are asking for per-function constant pool sections.  Because
> we generally cannot avoid the need of a constant pool and dependent
> on the target that is always global.  Note with per-function constant
> pools you will not benefit from constant pool entry merging across
> functions.  I'm also not aware of any non-target-specific (and thus not
> implemented on some targets) option to get these.

Thanks, yeah, that sounds like what I need. Is there any way to get that
behavior today, even for a specific target? (I'm mostly interested in
x86_64, armv7 and aarch64.) And are you saying that there are some targets
for which it would be impossible to provide this behavior? Or just that
it's not implemented for all targets today?

Are constant pool entries merged at compile time or at link time? I would
presume it should be done at link time because otherwise you're only
merging entries within a single compilation unit (which doesn't sound that
useful in a big project with hundreds of source files), right? So if
they're merged at link time, shouldn't it be possible to do that merging
after a linker script condensed all the per-function input sections that
are left after --gc-sections back into a single big .rodata output section?
(In my case, the linker script would instead condense all the constant pool
sections for marked functions into .special_area.rodata and all the others
into .rodata, and then it should be merging internally within those two
output sections.)

> Does it work better if you make this "static const"?

Yes, then I can declare an __attribute__((section)) for that specific
variable. However, that doesn't really seem like a safe and scalable
approach, especially since it's hard to notice when you missed a variable.
I'd like to have a way that I can annotate a function and that whole
function (with everything it needs, except for globals) gets put into a
special section (or a set of special sections with a common prefix or
suffix), without having to rewrite the source to accommodate for this every
time.
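
For reference, the "static const" workaround looks roughly like the sketch
below.  The section names and the macros are placeholders chosen for this
example; a linker script (not shown) would still have to map them to the
special area.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder section names; not defined by GCC or any standard script. */
#define SPECIAL_TEXT   __attribute__((section(".special_area.text")))
#define SPECIAL_RODATA __attribute__((section(".special_area.rodata")))

SPECIAL_TEXT int lookup(size_t i)
{
    /* Static storage duration makes the section attribute legal here. */
    static const int some_array[] SPECIAL_RODATA = { 1, 2, 3, 4, 5, 6 };

    static_assert(sizeof some_array / sizeof some_array[0] == 6,
                  "index wrap below assumes six entries");
    return some_array[i % 6];
}
```

The drawback is the one described above: every such array has to be found
and annotated by hand, and nothing warns about a missed one.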


Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-15 Thread Segher Boessenkool
On Mon, May 14, 2018 at 04:38:09PM -0700, Julius Werner wrote:
> However, I just found an issue with this when the functions include local
> variables like this:
> 
>   const int some_array[] = { 1, 2, 3, 4, 5, 6 };

Does it work better if you make this "static const"?


Segher


Re: Auto-generated .rodata contents and __attribute__((section))

2018-05-15 Thread Richard Biener
On Tue, May 15, 2018 at 1:38 AM Julius Werner wrote:

> Hi,

> I'm a firmware/embedded engineer and frequently run into cases where
> certain parts of the code need to be placed in a special memory area (for
> example, because the area that contains the other code is not yet
> initialized or currently inaccessible). My go-to method to solve this is to
> mark all functions and globals used by this code with
> __attribute__((section)), and using a linker script to map those special
> sections to the desired area. This mostly works pretty well.

> However, I just found an issue with this when the functions include local
> variables like this:

>   const int some_array[] = { 1, 2, 3, 4, 5, 6 };

> In this case (and with -Os optimization), GCC seems to automatically
> reserve some space in the .rodata section to place the array, and the
> generated code accesses it there. Of course this breaks my use case if the
> generic .rodata section is inaccessible while that function executes. I
> have not found any way to work around this without either rewriting the
> code to completely avoid those constructs, or manipulating sections
> manually at the linker level (in particular, you can't just mark the array
> itself with __attribute__((section)), since that attribute is not legal for
> locals).

> Is this intentional, and if so, does it make sense that it is? I can
> understand that it may technically be compliant with the description of
> __attribute__((section)) in the GCC manual -- but I think the use case I'm
> trying to solve is one of the most common uses of that attribute, and it
> seems to become completely impossible due to this. Wouldn't it make more
> sense and be more useful if __attribute__((section)) meant "place
> *everything* generated as part of this function source code into that
> section"? Or at least offer some sort of other extension to be able to
> control section placement for those special constants? (Note that GCC
> usually seems to place constants for individual variables in the text
> section, simply behind the epilogue of the function... so it's also quite
> unclear to me why arrays get treated differently at all.)

> Apart from this issue, this behavior also seems to "break"
> -ffunction-sections/-fdata-sections. Even with both of those set, these
> sorts of constants seem to get placed into the same big, common .rodata
> section (as opposed to either .text.functionname or .rodata.functionname as
> you'd expect). That means that they won't get collected when linking the
> binary with --gc-sections and will bloat the code size for projects that
> link a lot of code opportunistically and rely on --gc-sections to drop
> everything that's not needed for the current configuration.

> Is there some clever trick that I missed to work around this, or is this
> really not possible with the current GCC? And if so, would you agree that
> this is a valid problem that GCC should provide a solution for (in some
> form or another)?

I think you are asking for per-function constant pool sections.  Because
we generally cannot avoid the need of a constant pool and dependent
on the target that is always global.  Note with per-function constant
pools you will not benefit from constant pool entry merging across
functions.  I'm also not aware of any non-target-specific (and thus not
implemented on some targets) option to get these.

Richard.

> Thanks,
> Julius


Auto-generated .rodata contents and __attribute__((section))

2018-05-14 Thread Julius Werner
Hi,

I'm a firmware/embedded engineer and frequently run into cases where
certain parts of the code need to be placed in a special memory area (for
example, because the area that contains the other code is not yet
initialized or currently inaccessible). My go-to method to solve this is to
mark all functions and globals used by this code with
__attribute__((section)), and using a linker script to map those special
sections to the desired area. This mostly works pretty well.
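
A minimal sketch of that approach, assuming GCC (or a compatible compiler)
on an ELF target.  The section names, macros and identifiers are made up
for this example, and the linker script that maps the sections is not
shown:

```c
#include <assert.h>

/* Placeholder section names that a linker script would map explicitly. */
#define SPECIAL_TEXT   __attribute__((section(".special_area.text")))
#define SPECIAL_RODATA __attribute__((section(".special_area.rodata")))

enum { TABLE_LEN = 6 };

/* Global placed explicitly in the special read-only data section. */
const int early_table[TABLE_LEN] SPECIAL_RODATA = { 1, 2, 3, 4, 5, 6 };
static_assert(sizeof early_table == TABLE_LEN * sizeof(int), "table size");

/* Function placed in the special code section; it only touches the table. */
SPECIAL_TEXT int early_sum(void)
{
    int sum = 0;

    for (int i = 0; i < TABLE_LEN; i++)
        sum += early_table[i];
    return sum;
}
```

Both the function and the data it uses carry an explicit section, so neither
depends on the generic .text or .rodata being accessible.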

However, I just found an issue with this when the functions include local
variables like this:

  const int some_array[] = { 1, 2, 3, 4, 5, 6 };

In this case (and with -Os optimization), GCC seems to automatically
reserve some space in the .rodata section to place the array, and the
generated code accesses it there. Of course this breaks my use case if the
generic .rodata section is inaccessible while that function executes. I
have not found any way to work around this without either rewriting the
code to completely avoid those constructs, or manipulating sections
manually at the linker level (in particular, you can't just mark the array
itself with __attribute__((section)), since that attribute is not legal for
locals).

Is this intentional, and if so, does it make sense that it is? I can
understand that it may technically be compliant with the description of
__attribute__((section)) in the GCC manual -- but I think the use case I'm
trying to solve is one of the most common uses of that attribute, and it
seems to become completely impossible due to this. Wouldn't it make more
sense and be more useful if __attribute__((section)) meant "place
*everything* generated as part of this function source code into that
section"? Or at least offer some sort of other extension to be able to
control section placement for those special constants? (Note that GCC
usually seems to place constants for individual variables in the text
section, simply behind the epilogue of the function... so it's also quite
unclear to me why arrays get treated differently at all.)

Apart from this issue, this behavior also seems to "break"
-ffunction-sections/-fdata-sections. Even with both of those set, these
sorts of constants seem to get placed into the same big, common .rodata
section (as opposed to either .text.functionname or .rodata.functionname as
you'd expect). That means that they won't get collected when linking the
binary with --gc-sections and will bloat the code size for projects that
link a lot of code opportunistically and rely on --gc-sections to drop
everything that's not needed for the current configuration.

Is there some clever trick that I missed to work around this, or is this
really not possible with the current GCC? And if so, would you agree that
this is a valid problem that GCC should provide a solution for (in some
form or another)?

Thanks,
Julius