[Bug tree-optimization/84011] Optimize switch table with run-time relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 Andrew Pinski changed: What|Removed |Added CC||jengelh at inai dot de --- Comment #14 from Andrew Pinski --- *** Bug 99383 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/84011] Optimize switch table with run-time relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 --- Comment #13 from Peter Cordes --- (In reply to Jakub Jelinek from comment #10) > ?? That is the task for the linker SHF_MERGE|SHF_STRINGS handling. > Why should gcc duplicate that? Because gcc would benefit from knowing if merging makes the total block of strings for a switch() table short enough to use a uint8_t offset[] instead of uint16_t. If we don't know at compile time, we'd have to be conservative and potentially use a wider offset table. (Although as Joseph points out https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85585#c2, without more linker support for this we could end up missing out on literal merging across compilation units. So perhaps a first step in applying this idea would be to use 32-bit offsets from the start of the .rodata.str1.1 section, so we can still let the linker merge strings and end up with them non-contiguous without having to force the one that gets kept to be the one that's part of our block of strings.)
[Bug tree-optimization/84011] Optimize switch table with run-time relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 --- Comment #12 from Peter Cordes --- (In reply to Jakub Jelinek from comment #10) > (In reply to Peter Cordes from comment #9) > > gcc already totally misses optimizations here where one string is a suffix > > of another. "mii" could just be a pointer to the 3rd byte of "sgmii", but > > we instead duplicate all the characters. That's where major savings are > > possible for this function. > > ?? That is the task for the linker SHF_MERGE|SHF_STRINGS handling. > Why should gcc duplicate that? Oops, right I was only looking at gcc's asm output, didn't check an actual linked binary. Will the linker currently catch a case like this? .LC_base: .LC2: .string "mii" .LC3: .string "gmii" table: .byte .LC2 - .LC_base, .LC3 - .LC_base and drop .string "mii" entirely + rewrite the table to .byte .LC3+1 - .LC_base, .LC3 - .LC_base (This discussion should probably be happening on bug 85585.) Sorry I don't know the actual mechanism by which gcc signals to the linker that it can / can't merge. I guess only in some sections? Because gcc couldn't allow it if was emitting an array like this, where dropping a string would change the offsets for later data and break offset calculations: const struct { char str[11]; } table[] = { {"mii"}, {"gmii"} };
[Bug tree-optimization/84011] Optimize switch table with run-time relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 --- Comment #11 from Jan Hubicka --- > > ?? That is the task for the linker SHF_MERGE|SHF_STRINGS handling. > Why should gcc duplicate that? I suppose there would be small room for improvements where GCC could use the fact that one string's address is actually address of another string + offset. That may save some relocations. Honza
[Bug tree-optimization/84011] Optimize switch table with run-time relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 --- Comment #10 from Jakub Jelinek --- (In reply to Peter Cordes from comment #9) > gcc already totally misses optimizations here where one string is a suffix > of another. "mii" could just be a pointer to the 3rd byte of "sgmii", but > we instead duplicate all the characters. That's where major savings are > possible for this function. ?? That is the task for the linker SHF_MERGE|SHF_STRINGS handling. Why should gcc duplicate that?
[Bug tree-optimization/84011] Optimize switch table with run-time relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 Peter Cordes changed: What|Removed |Added CC||peter at cordes dot ca --- Comment #9 from Peter Cordes --- (In reply to rguent...@suse.de from comment #4) > An optimization would be to > add an indirection by, say, only recording the constant offset > into an "array of strings" in the table, thus effectively > > "case1\0case2\0..."[CSWITCH[i]] > > which would require only a relocation to access the single string > constant. But it would prohibit cases of string merging within > those strings unless we implement that as well for this optimization. gcc already totally misses optimizations here where one string is a suffix of another. "mii" could just be a pointer to the 3rd byte of "sgmii", but we instead duplicate all the characters. That's where major savings are possible for this function. > Note this might be profitable unconditionally, not just with -fpie/pic > as the CSWITCH table would be smaller (dependent on the total > size of the merged string). Indeed, I wrote up bug 85585 with ideas for optimizing this. A table of byte or uint16_t offsets into a static buffer of packed strings looks good for PIC and for position-dependent. To avoid any runtime relocations, all you need is the ability to get a static address into a register (e.g. RIP-relative LEA) and do an indexed load relative to it, just like using a normal static char[]. Then add the load result to that address. Runtime relocation is nice to avoid even if you don't *need* to avoid it. Also possible is padding each string out to a constant length and calculating an index into that, removing a level of indirection. (Good when strings are similar length and/or all short, and there aren't many strings that are duplicates or suffixes of others.) Again you just need to get a static address into a register, and add it to 11*enum_value. This is all ADD + LEA (with one of them being RIP-relative).
[Bug tree-optimization/84011] Optimize switch table with run-time relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 --- Comment #8 from rguenther at suse dot de --- On Wed, 24 Jan 2018, hjl.tools at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 > > H.J. Lu changed: > >What|Removed |Added > > CC||jakub at redhat dot com > Depends on||36881 > > --- Comment #6 from H.J. Lu --- > The code in question is done on purpose: > > commit 54af7f7e3f0f38a4cd5667fda9fd9a68ad554df0 > Author: jakub> Date: Wed Oct 15 06:43:19 2008 + > > PR tree-optimization/36881 > * tree-switch-conversion.c (check_final_bb): For flag_pic, check > that each value doesn't need runtime relocations, for !flag_pic > check that each value is just a valid initializer constant. > > * gcc.dg/tree-ssa/pr36881.c: New test. > > > git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@141129 > 138bc75d-0d04-0410-961f-82ee72b054a4 > > Should we revisit this decision? We could add a hook to allow the target to override. Like targetm.avoid_runtime_relocations_p () or runtime_relocations_expensive_p () or just (as it seems for SPU) runtime_relocations_supported_p ()? As said ideally we'd simply avoid creating relocations for string constants or other constant initializers. Maybe that's even possible in some weird ways via using section labels/addresses? Thus make the table contain .Lstring1 - .section_start_of_where_Lstring1_is, ... and thus a constant initializer using some extension to query that section start or even the offset -- __builtin_offset_from_section_start ("foo")?
[Bug tree-optimization/84011] Optimize switch table with run-time relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84011 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #7 from Jakub Jelinek --- Well, we'd need a target hook for these weirdo targets that don't support them at all (spu, anything else?), and for the rest, perhaps we should think of something better. E.g. if not just some elements of the array return null_pointer_node, but all of them, perhaps we could emit in the array content value - array instead of value and then add array to whatever we load from the array.