[Bug tree-optimization/84011] Optimize switch table with run-time relocation

2021-09-04 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

Andrew Pinski  changed:

   What|Removed |Added

 CC||jengelh at inai dot de

--- Comment #14 from Andrew Pinski  ---
*** Bug 99383 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/84011] Optimize switch table with run-time relocation

2018-05-01 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

--- Comment #13 from Peter Cordes  ---
(In reply to Jakub Jelinek from comment #10)
> ??  That is the task for the linker SHF_MERGE|SHF_STRINGS handling.
> Why should gcc duplicate that?

Because gcc would benefit from knowing if merging makes the total block of
strings for a switch() table short enough to use a uint8_t offset[] instead of
uint16_t.

If we don't know at compile time, we'd have to be conservative and potentially
use a wider offset table.  (Although as Joseph points out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85585#c2, without more linker
support for this we could end up missing out on literal merging across
compilation units.  So perhaps a first step in applying this idea would be to
use 32-bit offsets from the start of the .rodata.str1.1 section, so we can
still let the linker merge strings and end up with them non-contiguous without
having to force the one that gets kept to be the one that's part of our block
of strings.)

[Bug tree-optimization/84011] Optimize switch table with run-time relocation

2018-05-01 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

--- Comment #12 from Peter Cordes  ---
(In reply to Jakub Jelinek from comment #10)
> (In reply to Peter Cordes from comment #9)
> > gcc already totally misses optimizations here where one string is a suffix
> > of another.  "mii" could just be a pointer to the 3rd byte of "sgmii", but
> > we instead duplicate all the characters.  That's where major savings are
> > possible for this function.
> 
> ??  That is the task for the linker SHF_MERGE|SHF_STRINGS handling.
> Why should gcc duplicate that?

Oops, right I was only looking at gcc's asm output, didn't check an actual
linked binary.

Will the linker currently catch a case like this?

.LC_base:
.LC2: .string "mii"
.LC3: .string "gmii"

table:
.byte  .LC2 - .LC_base,  .LC3 - .LC_base

and drop .string "mii" entirely + rewrite the table to
.byte  .LC3+1 - .LC_base,  .LC3 - .LC_base

(This discussion should probably be happening on bug 85585.)

Sorry I don't know the actual mechanism by which gcc signals to the linker that
it can / can't merge.  I guess only in some sections?  Because gcc couldn't
allow it if was emitting an array like this, where dropping a string would
change the offsets for later data and break offset calculations:

const struct { char str[11]; } table[] = { {"mii"}, {"gmii"} };

[Bug tree-optimization/84011] Optimize switch table with run-time relocation

2018-05-01 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

--- Comment #11 from Jan Hubicka  ---
> 
> ??  That is the task for the linker SHF_MERGE|SHF_STRINGS handling.
> Why should gcc duplicate that?
I suppose there would be small room for improvements where GCC could use the
fact that one string's address is actually address of another string + offset.
That may save some relocations.

Honza

[Bug tree-optimization/84011] Optimize switch table with run-time relocation

2018-05-01 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

--- Comment #10 from Jakub Jelinek  ---
(In reply to Peter Cordes from comment #9)
> gcc already totally misses optimizations here where one string is a suffix
> of another.  "mii" could just be a pointer to the 3rd byte of "sgmii", but
> we instead duplicate all the characters.  That's where major savings are
> possible for this function.

??  That is the task for the linker SHF_MERGE|SHF_STRINGS handling.
Why should gcc duplicate that?

[Bug tree-optimization/84011] Optimize switch table with run-time relocation

2018-05-01 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

Peter Cordes  changed:

   What|Removed |Added

 CC||peter at cordes dot ca

--- Comment #9 from Peter Cordes  ---
(In reply to rguent...@suse.de from comment #4)
> An optimization would be to
> add an indirection by, say, only recording the constant offset
> into an "array of strings" in the table, thus effectively
> 
>   "case1\0case2\0..."[CSWITCH[i]]
> 
> which would require only a relocation to access the single string
> constant.  But it would prohibit cases of string merging within
> those strings unless we implement that as well for this optimization.

gcc already totally misses optimizations here where one string is a suffix of
another.  "mii" could just be a pointer to the 3rd byte of "sgmii", but we
instead duplicate all the characters.  That's where major savings are possible
for this function.

> Note this might be profitable unconditionally, not just with -fpie/pic
> as the CSWITCH table would be smaller (dependent on the total
> size of the merged string).

Indeed, I wrote up bug 85585 with ideas for optimizing this.  A table of byte
or uint16_t offsets into a static buffer of packed strings looks good for PIC
and for position-dependent.

To avoid any runtime relocations, all you need is the ability to get a static
address into a register (e.g. RIP-relative LEA) and do an indexed load relative
to it, just like using a normal static char[].  Then add the load result to
that address.  Runtime relocation is nice to avoid even if you don't *need* to
avoid it.

Also possible is padding each string out to a constant length and calculating
an index into that, removing a level of indirection.  (Good when strings are
similar length and/or all short, and there aren't many strings that are
duplicates or suffixes of others.)  Again you just need to get a static address
into a register, and add it to 11*enum_value.  This is all ADD + LEA (with one
of them being RIP-relative).

[Bug tree-optimization/84011] Optimize switch table with run-time relocation

2018-01-24 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

--- Comment #8 from rguenther at suse dot de  ---
On Wed, 24 Jan 2018, hjl.tools at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011
> 
> H.J. Lu  changed:
> 
>What|Removed |Added
> 
>  CC||jakub at redhat dot com
>  Depends on||36881
> 
> --- Comment #6 from H.J. Lu  ---
> The code in question is done on purpose:
> 
> commit 54af7f7e3f0f38a4cd5667fda9fd9a68ad554df0
> Author: jakub 
> Date:   Wed Oct 15 06:43:19 2008 +
> 
> PR tree-optimization/36881
> * tree-switch-conversion.c (check_final_bb): For flag_pic, check
> that each value doesn't need runtime relocations, for !flag_pic
> check that each value is just a valid initializer constant.
> 
> * gcc.dg/tree-ssa/pr36881.c: New test.
> 
> 
> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@141129
> 138bc75d-0d04-0410-961f-82ee72b054a4
> 
> Should we revisit this decision?

We could add a hook to allow the target to override.  Like
targetm.avoid_runtime_relocations_p () or runtime_relocations_expensive_p 
() or just (as it seems for SPU) runtime_relocations_supported_p ()?

As said ideally we'd simply avoid creating relocations for string
constants or other constant initializers.  Maybe that's even possible
in some weird ways via using section labels/addresses?  Thus make
the table contain

 .Lstring1 - .section_start_of_where_Lstring1_is,
 ...

and thus a constant initializer using some extension to query that
section start or even the offset -- __builtin_offset_from_section_start 
("foo")?

[Bug tree-optimization/84011] Optimize switch table with run-time relocation

2018-01-24 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
Well, we'd need a target hook for these weirdo targets that don't support them
at all (spu, anything else?), and for the rest, perhaps we should think of
something better.  E.g. if not just some elements of the array return
null_pointer_node, but all of them, perhaps we could emit in the array content
value - array instead of value and then add array to whatever we load from the
array.