[Bug target/83507] [8 Regression] ICE in internal_dfa_insn_code_* for powerpc targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83507

Roman Zhuykov changed:

           What    |Removed |Added
                 CC|        |zhroma at ispras dot ru

--- Comment #12 from Roman Zhuykov ---
Not sure whether I should reopen it, but the fix wasn't correct. In this
example -fmodulo-sched-allow-regmoves is not enabled, and we should not create
any register moves at all. More discussion and a proper fix:
https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00632.html
[Bug testsuite/90113] New: Useless torture mode for gfortran.dg tests
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90113

           Bug ID: 90113
          Summary: Useless torture mode for gfortran.dg tests
          Product: gcc
          Version: unknown
           Status: UNCONFIRMED
         Severity: normal
         Priority: P3
        Component: testsuite
         Assignee: unassigned at gcc dot gnu.org
         Reporter: zhroma at ispras dot ru
 Target Milestone: ---

I’ve recently found that tests placed in the gcc/testsuite/gfortran.dg folder
are run in “torture” mode with different optimization options. While working
on PR90001 I looked into the sms-2.f90 and forall_10.f90 tests from
gfortran.dg. I analyzed only compilation time in that PR, but I was also
surprised that these tests were compiled several times with option lines like:

sms-2.f90:
“-O0 -O2 -fmodulo-sched”
“-O1 -O2 -fmodulo-sched”
...
“-O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions -O2 -fmodulo-sched”

forall_10.f90:
“-O0 -O”
“-O1 -O”
...
“-O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions -O”

I found the discussion from 2004 when gfortran.dg/dg.exp was added, where
Joseph described the “torture” ideas:
https://gcc.gnu.org/ml/gcc-patches/2004-07/msg01131.html
“gcc.c-torture: multiple optimization options, built with -w to disable all
warnings.
gcc.dg: no looping over multiple options, defaults to -ansi -pedantic if no
options given.
gcc.dg/torture: like gcc.dg (so no -w) but loops over multiple options. Much
of gcc.dg that uses some optimization options belongs in there.”

I started working with GCC a bit later, but I always thought the same logic
applied to the other languages. Now I know that for Fortran tests it has been
broken since that old time. Looking into recent commits which add new Fortran
tests shows that people also wrongly assume that the difference between
gfortran.fortran-torture and gfortran.dg is the same as between the C test
folders (gcc.c-torture and gcc.dg).
So, a lot of tests in gfortran.dg contain an optimization level option, and
this leads to many useless torture runs. In most torture option lines only the
optimization level is set, and an option inside the test overrides that level.
Maybe this broken logic should be fixed: disable torture runs in gfortran.dg
and run its tests the same way as gcc.dg tests?
[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

--- Comment #5 from Roman Zhuykov ---
Retested the patch separately; everything works. I have found two more slow
Fortran examples on the (obsolete) spu platform, with additional options like
-O1/-O2 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions:

gfortran.dg/sms-2.f90
gfortran.dg/forall_10.f90

Compilation time, sec (unpatched vs patched) | sms options
forall_10 | 35.02 vs 0.66 | -fmodulo-sched
forall_10 | 87.44 vs 0.52 | -fmodulo-sched -fmodulo-sched-allow-regmoves
sms-2     | 34.58 vs 0.44 | -fmodulo-sched
sms-2     | 86.77 vs 0.27 | -fmodulo-sched -fmodulo-sched-allow-regmoves
[Bug rtl-optimization/90040] New: [meta-bug] modulo-scheduler and partitioning issues
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90040

           Bug ID: 90040
          Summary: [meta-bug] modulo-scheduler and partitioning issues
          Product: gcc
          Version: unknown
           Status: UNCONFIRMED
         Severity: normal
         Priority: P3
        Component: rtl-optimization
         Assignee: unassigned at gcc dot gnu.org
         Reporter: zhroma at ispras dot ru
 Target Milestone: ---

Here I want to discuss the situation with the modulo scheduler pass when
-freorder-blocks-and-partition is also enabled.

TL;DR: I kindly ask RTL folks to fix the ICE happening in
cfg_layout_redirect_edge_and_branch_force while trying to redirect a crossing
jump.

The problem here is not in the modulo scheduler algorithm itself, but in the
pass initialization (and finalization) procedures. The pass enters (and later
leaves) cfg_layout mode, and it also calls loop_optimizer_init. All this leads
to many branch redirections, which are not guaranteed to succeed once
partitioning has been done in the bbro pass.

This issue is not new. At least here
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01811.html I found that entering
cfg_layout broke things on x86 (where the required doloop pattern is absent),
and I introduced the idea of moving the sms pass before bbro. But later I got
this reply from Richard:
https://gcc.gnu.org/ml/gcc-patches/2011-10/msg01526.html
and I agree that we have to fix it in another way.

Now, in 2019, we have at least five bugs about the same situation, and I want
to connect them all in this discussion.

First of all, PR85408 and PR87329 are open now and PR87475 is fixed, but they
are all about the same assertion: «internal compiler error: in
patch_jump_insn, at cfgrtl.c:1271». PR87475 was fixed by Jakub back in
November by r266219, and the two others were later reported unreproducible,
so IMHO they are fixed now too. Jakub's ChangeLog:

	PR rtl-optimization/87475
	* cfgrtl.c (patch_jump_insn): Allow redirection failure for
	CROSSING_JUMP_P insns.
	(cfg_layout_redirect_edge_and_branch): Don't ICE if ret is NULL.
Second, I want to discuss PR85426, where the first buggy assertion was the
same as in those three PRs, but after Jakub’s fix Arseny reported another
failing assertion: «internal compiler error: in
cfg_layout_redirect_edge_and_branch_force, at cfgrtl.c:4482».

This last assertion is the real issue: it happens when we call
redirect_edge_and_branch_force for a crossing jump. I’m not familiar with
partitioning, so I have no idea how to fix this. Hopefully, if we find a fix
for this, PR89116 would also be fixed, although the situation there is not
exactly the same: there we ICE on the same assertion only in
split_edge_and_insert while running generate_prolog_epilog, so it happens
only after SMS has succeeded in creating a schedule.

[Appendix 1] I also want to mention PR83886, which was fixed by Honza with the
following ChangeLog:

	PR rtl/84058
	* cfgcleanup.c (try_forward_edges): Do not give up on crossing jumps;
	choose last target that matches the criteria (i.e. no partition
	changes for non-crossing jumps).
	* cfgrtl.c (cfg_layout_redirect_edge_and_branch): Add basic support
	for redirecting crossing jumps to non-crossing.

So here we also have some change in crossing jump redirection.

[Appendix 2] There are also some issues with entering/exiting cfg_layout.
Technically, all of them are fixed right now: PR83771, and PR83475 (fixed
together with PR81791). But I wonder whether this is all just latent, because
in other passes I see a special check that prevents entering cfg_layout after
partitioning. For example, in hw-doloop.c we have:

  /* We can't enter cfglayout mode anymore if basic block partitioning
     already happened.  */
  if (do_reorder && !crtl->has_bb_partition)

This condition was added back in 2014
https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00282.html
and at that time it looked like:

  if (do_reorder && !flag_reorder_blocks_and_partition)

It was later adjusted by Honza to its current state:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00515.html

But we don't have any check like this in the modulo scheduler. There we
certainly can't proceed without entering cfg_layout, so I'm not sure it would
be a good idea to add such a check. But maybe we now have some more latent
bugs caused by entering cfg_layout after partitioning?

[Appendix 3] Last month I spent a lot of time updating my patches described
here https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html and have locally
added several other patches to fix most of the modulo-sched PRs in bugzilla,
but the annoying issue described here prevents me from testing my branch in
all possible scenarios. I'll try to add comments to all the other modulo
scheduler bugzilla PRs this week or next while we are in stage 4.
[Bug target/84369] test case gcc.dg/sms-10.c fails on power9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369

Roman Zhuykov changed:

           What    |Removed |Added
                 CC|        |zhroma at ispras dot ru

--- Comment #3 from Roman Zhuykov ---
I compared the modulo-sched DDGs in “power8 vs power9” modes, and the main
difference is not the combined instruction mentioned by Peter, but the cost of
movsi_internal1 dependencies. For these two instructions:

(insn 23 22 25 4 (set (mem:SI (plus:DI (reg/f:DI 126 [ regstat_n_sets_and_refs.1_9 ])
                (reg:DI 141 [ ivtmp.26 ]))
            [2 MEM[base: regstat_n_sets_and_refs.1_9, index: ivtmp.26_35, offset: 0B]+0 S4 A32])
        (reg:SI 148 [ _7->n_refs ])) "sms-10.c":50:40 502 {*movsi_internal1}
     (expr_list:REG_DEAD (reg:SI 148 [ _7->n_refs ])
        (nil)))

(insn 32 31 33 4 (set (mem:SI (plus:DI (reg/f:DI 159)
                (reg:DI 141 [ ivtmp.26 ]))
            [2 MEM[base: _44, index: ivtmp.26_35, offset: 0B]+0 S4 A32])
        (reg:SI 154)) "sms-10.c":51:40 502 {*movsi_internal1}
     (expr_list:REG_DEAD (reg:SI 154)
        (nil)))

the insn_default_latency (and then insn_sched_cost) function returns
significantly different values: 5 for power8, 0 for power9. There are other
movsi_internal1 instructions in this loop whose costs also differ, but that is
only a 1-cycle “3 vs 4” change, which is hopefully correct.

The same “5 vs 0” cost difference also breaks this test on 32-bit powerpc.
There is no combining difference there, only the cost issue, but it also
prevents modulo-sched from succeeding. I’m not familiar with .md files, so I
ask somebody to look at the “5 vs 0” issue. If such a cost difference is
correct, then it seems a good solution to simply skip this test on power9
CPUs.
[Bug target/87979] ICE in compute_split_row at modulo-sched.c:2393
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87979

--- Comment #2 from Roman Zhuykov ---
The situation is the same in the following tests on the ia64 platform with
-fmodulo-sched enabled (with any of -O1, -O2, -Os):

gcc.dg/torture/pr82762.c
gcc.c-torture/execute/20170419-1.c

We divide by zero when we try to schedule the loop body in zero cycles: both
the res_mii and rec_mii estimates equal zero. We have to start with one cycle
in this situation.

The patch was successfully bootstrapped and regtested together with a few
other patches on x86_64. The patch was also regtested with cross-compilers to
s390, spu, aarch64, arm, ia64, ppc and ppc64, additionally with -fmodulo-sched
enabled by default. All the same testing was also done on the 8 branch. The
mentioned ia64 tests were the only difference.

diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -1597,6 +1597,7 @@ sms_schedule (void)
 	  mii = 1; /* Need to pass some estimate of mii.  */
 	  rec_mii = sms_order_nodes (g, mii, node_order, &max_asap);
 	  mii = MAX (res_MII (g), rec_mii);
+	  mii = MAX (mii, 1);
 	  maxii = MAX (max_asap, MAXII_FACTOR * mii);
 
 	  if (dump_file)
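The effect of the one-line clamp can be illustrated outside GCC. In this
sketch the names res_mii and rec_mii mirror modulo-sched.c, but the two
functions themselves are hypothetical simplifications, not the real
implementation: partial-schedule rows are indexed by cycle modulo the
initiation interval, so an interval of zero faults exactly as described above.

```python
def choose_mii(res_mii, rec_mii):
    # Mirrors mii = MAX (res_MII (g), rec_mii) plus the proposed
    # mii = MAX (mii, 1): the initiation interval must be at least
    # one cycle even when both estimates are zero.
    return max(res_mii, rec_mii, 1)

def row_of(cycle, ii):
    # Row index of an insn scheduled at `cycle`; ii == 0 would raise
    # ZeroDivisionError here, which models the reported ICE scenario.
    return cycle % ii

# Both estimates zero (trivial loop body): without the clamp we would
# divide by zero when placing the first instruction.
ii = choose_mii(0, 0)
print(ii, row_of(5, ii))  # -> 1 0
```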
[Bug target/87979] ICE in compute_split_row at modulo-sched.c:2393
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87979

Roman Zhuykov changed:

           What    |Removed |Added
                 CC|        |zhroma at ispras dot ru

--- Comment #1 from Roman Zhuykov ---
Created attachment 46137
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46137&action=edit
Proposed patch
[Bug rtl-optimization/84032] ICE in optimize_sc, at modulo-sched.c:1064
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84032

--- Comment #4 from Roman Zhuykov ---
There is the following mistake in the logic behind the code. We want to
schedule a branch instruction only as the last instruction in a row. But after
the branch has been scheduled, when we add other instructions into the partial
schedule, we sometimes allow them to be placed in the same row after the
branch. The issue shows up later, when we try to reschedule the branch into
another row. The algorithm there works like this:

(1) Remove the branch from the row where it is (say, the “previous row”).
(2) Try to insert it into the needed row.
(3) On success, continue scheduling other instructions.
(4) If the insertion in (2) failed, insert the branch back into the “previous
    row”; this insertion must certainly succeed, which is checked by an
    assertion.

But when at step (1) the branch is not last in its row, there is no guarantee
that at step (4) we can insert it back, because there we only try the
last-in-a-row position for it. The proposed patch solves this by totally
preventing other instructions from being scheduled after a branch in the same
row.

The patch was successfully bootstrapped and regtested together with a few
other patches on x86_64. The patch was also regtested with cross-compilers to
s390, spu, aarch64, arm, ia64, ppc and ppc64, additionally with -fmodulo-sched
enabled by default. All the same testing was also done on the 8 branch. No new
failures introduced.

diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -2996,9 +2996,7 @@ ps_insn_find_column (partial_schedule_ptr ps, ps_insn_ptr ps_i,
 	    last_must_precede = next_ps_i;
 	}
       /* The closing branch must be the last in the row.  */
-      if (must_precede
-	  && bitmap_bit_p (must_precede, next_ps_i->id)
-	  && JUMP_P (ps_rtl_insn (ps, next_ps_i->id)))
+      if (JUMP_P (ps_rtl_insn (ps, next_ps_i->id)))
 	return false;
 
       last_in_row = next_ps_i;
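The strengthened check can be modeled with a toy row structure. Everything
here (the row as a plain list, the find_column helper, the insn names) is a
hypothetical simplification; only the rejection rule mirrors the patched
ps_insn_find_column, where an insertion fails as soon as an already-placed
jump is encountered, so nothing can ever land after the branch:

```python
def find_column(row, is_jump, new_id):
    """Try to append new_id to a row (a list of insn ids).
    Patched rule: the closing branch must stay last in the row, so a
    row already containing a jump rejects any further instruction,
    unconditionally (no must_precede test as in the old code)."""
    for insn in row:
        if is_jump(insn):
            return False
    row.append(new_id)
    return True

is_jump = lambda i: i == "br"

row = ["add", "mul"]
print(find_column(row, is_jump, "br"))  # -> True, branch is now last
print(find_column(row, is_jump, "ld"))  # -> False, nothing may follow it
print(row)  # -> ['add', 'mul', 'br']
```

With this invariant, step (1) of the rescheduling algorithm always removes the
branch from the last position, so the fallback insertion in step (4) at the
last-in-a-row position cannot fail.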
[Bug rtl-optimization/84032] ICE in optimize_sc, at modulo-sched.c:1064
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84032

Roman Zhuykov changed:

           What    |Removed |Added
                 CC|        |zhroma at ispras dot ru

--- Comment #3 from Roman Zhuykov ---
Created attachment 46136
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46136&action=edit
Proposed patch
[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

--- Comment #4 from Roman Zhuykov ---
Thanks for the testcase. I already caught and fixed this on my local branch
2-3 weeks ago; see some info at the bottom.

The current algorithm, which finds recurrence_length for all DDG strongly
connected components, works in something like O(N^6) time, where N is the
number of nodes in the DDG. The time is this bad mostly for graphs with lots
of edges, close to N^2 of them. Richard's suggestion is right: for such graphs
it would still be something like O(N^5) in the worst case even without the
bitmap overhead. My proposed algorithm works in O(N^3). The algorithm that
finds the SCCs themselves is also not optimal (maybe up to O(N^4)), but it is
left untouched here. In some situations, when the number of edges is smaller
(say, close to N), the new algorithm can unfortunately be slower than the old
one. But I think it's better to add some bail-out here when we get more than,
for example, 1000 nodes.

Before creating this patch, I tested a special version of it in which both
approaches were in action and asserts were inserted to check that the
algorithms' results (the longest_simple_path values) are absolutely the same.
I can publish this special version if needed. I wonder how a regression test
can be created for such a situation?

[Testing] The proposed patch, together with a bunch of ~25 other patches, was
tested a lot:
*(1) Bootstrapped and regtested on x86_64
*(2) Cross-compiler regtest on aarch64, arm, powerpc, powerpc64, ia64 and s390
*(3) Also done (1) and (2) with -fmodulo-sched enabled by default
*(4) Also done (1) and (2) with -fmodulo-sched and
-fmodulo-sched-allow-regmoves enabled by default
*(5) Moreover, all of (1-4) was also done on the 4.9, 5, 6, 7, and 8 branches;
on active branches and trunk the date was 20190327.

More than 250 compiler instances were built and tested in total (counting both
"unpatched" and "patched"). No new failures that could relate to this
algorithm were found.
The "special version" was also tested in practically the same scenarios (and
one more week earlier, around 20190320), but not in all of them. I still have
to retest it separately, without all my other stuff :)

[PS] Last month I spent a lot of time updating my patches described here
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html and have locally
added several other patches, including this fix. My updated branches are not
published yet because there are still some unsolved issues, and there are some
bugzilla PRs I can't fix either. I'll try to add comments to the other
modulo-sched-related PRs soon.
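As an illustration of the complexity claim above: one standard way to get an
O(N^3) bound for an all-pairs longest-path computation, independent of the
edge count, is a Floyd-Warshall-style max-plus closure. The comment does not
show the actual patch, so treat the sketch below purely as a model of the
asymptotics; it is exact only for acyclic edge sets, and real DDG recurrences
need separate handling of back-edges.

```python
NEG_INF = float("-inf")

def longest_paths(n, edges):
    """All-pairs longest path lengths over n nodes via a
    Floyd-Warshall-style max-plus closure: three nested loops,
    O(n^3) regardless of how many edges there are. Correct for
    acyclic edge sets (paths are then automatically simple)."""
    d = [[NEG_INF] * n for _ in range(n)]
    for u, v, w in edges:
        d[u][v] = max(d[u][v], w)  # keep the heaviest parallel edge
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] != NEG_INF and d[k][j] != NEG_INF:
                    d[i][j] = max(d[i][j], d[i][k] + d[k][j])
    return d

# Small DAG: 0->1 (2), 1->2 (3), 0->2 (4); the longest 0->2 path is 2+3.
d = longest_paths(3, [(0, 1, 2), (1, 2, 3), (0, 2, 4)])
print(d[0][2])  # -> 5
```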
[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

Roman Zhuykov changed:

           What    |Removed |Added
                 CC|        |zhroma at ispras dot ru

--- Comment #3 from Roman Zhuykov ---
Created attachment 46099
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46099&action=edit
Proposed patch

Not yet tested on trunk.
[Bug rtl-optimization/80112] [5/6 Regression] ICE in doloop_condition_get at loop-doloop.c:158
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80112

Roman Zhuykov changed:

           What    |Removed |Added
                 CC|        |zhroma at ispras dot ru

--- Comment #5 from Roman Zhuykov ---
Created attachment 41049
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41049&action=edit
maybe more proper fix

6 years ago I was solving an issue in the same code lines, and with Richard
Sandiford's help found a somewhat better solution. It was even approved, but
unfortunately we forgot to commit it. Discussion links:
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01803.html
https://gcc.gnu.org/ml/gcc-patches/2011-09/msg02049.html
https://gcc.gnu.org/ml/gcc-patches/2012-02/msg00479.html

Maybe it's better to apply that old patch?

PS. All my modulo-sched improvements are described together here:
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html
[Bug target/69252] [4.9/5/6 Regression] gcc.dg/vect/vect-iv-9.c FAILs with -Os -fmodulo-sched -fmodulo-sched-allow-regmoves -fsched-pressure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69252

--- Comment #13 from Roman Zhuykov ---
(In reply to Jakub Jelinek from comment #12)
> Thus, Roman, can you please post your patch to gcc-patches?

Ok, in addition to the comment 3 link, I have just reposted it:
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01336.html
I ask Martin to help with creating a new regression test, since I'm not ready
to set up a powerpc qemu VM at the moment.
[Bug target/69252] [4.9/5/6 Regression] gcc.dg/vect/vect-iv-9.c FAILs with -Os -fmodulo-sched -fmodulo-sched-allow-regmoves -fsched-pressure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69252

--- Comment #7 from Roman Zhuykov ---
(In reply to Jakub Jelinek from comment #5)
> insufficient SMS testsuite coverage.

Not sure it's helpful, but 3 weeks ago I successfully reg-strapped a bunch of
my SMS patches, including this fix, on x86-64 and aarch64 using trunk
20151222. The -fmodulo-sched and -fmodulo-sched-allow-regmoves options were
enabled by default; -fsched-pressure was left untouched. No new regressions,
excluding some scan-assembler-times failures caused by loop versioning. Maybe
later in stage1 I'll send all my stuff to gcc-patches.
[Bug target/69252] [4.9/5/6 Regression] gcc.dg/vect/vect-iv-9.c FAILs with -Os -fmodulo-sched -fmodulo-sched-allow-regmoves -fsched-pressure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69252

--- Comment #3 from Roman Zhuykov ---
I'll try to help. While working on expanding SMS functionality 4-5 years ago
(https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html), I created several
fixes not connected to my main non-doloop-support patch. Unfortunately, only
two of those fixes were approved and only one was committed.

Could you try this patch first of all:
https://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html
It applies to the current trunk.

Here are links to the other fixes; I can provide newer versions for trunk if
needed:
https://gcc.gnu.org/ml/gcc-patches/2011-09/msg02049.html
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html
https://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
https://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html
[Bug rtl-optimization/57372] New: [4.9 Regression] Miscompiled tailcall on ARM
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57372

           Bug ID: 57372
          Summary: [4.9 Regression] Miscompiled tailcall on ARM
          Product: gcc
          Version: 4.9.0
           Status: UNCONFIRMED
         Severity: normal
         Priority: P3
        Component: rtl-optimization
         Assignee: unassigned at gcc dot gnu.org
         Reporter: zhroma at ispras dot ru

Created attachment 30164
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30164&action=edit
Preprocessed minimized testcase

On ARM, gcc 4.9 revision 198928 and later sometimes creates wrong code for a
tailcall. For this C++ code:

class A {
public:
  virtual void foo1();
};

A foo2() {}

void foo3() {
  foo2().foo1();
}

the last instruction in function foo3 is bx r3, but register r3 doesn't
contain a proper address at that moment. The attachment contains practically
the same code, adjusted to create an executable; the executable produced by
g++ -O2 segfaults. Earlier versions of GCC work OK.
[Bug c++/55081] New: [4.8 regression?] Non-optimized static array elements initialization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55081

           Bug #: 55081
          Summary: [4.8 regression?] Non-optimized static array elements
                   initialization
   Classification: Unclassified
          Product: gcc
          Version: 4.8.0
           Status: UNCONFIRMED
         Severity: normal
         Priority: P3
        Component: c++
       AssignedTo: unassig...@gcc.gnu.org
       ReportedBy: zhr...@ispras.ru

Created attachment 28536
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28536
Preprocessed minimized testcase.

In some cases g++ 4.8 revision 192141 and later generates an initialization
block for a local static array with constant elements, while earlier g++
versions insert the constants into the assembly data section (gcc-base is
rev192140, gcc-peak is rev192141, tested on 64-bit Linux):

$ cat test.cpp
struct R {
  int field;
};

long* foo() {
  R r;
  static long array[] = {
    sizeof(char),
    (reinterpret_cast<long>(&(r.field)) - reinterpret_cast<long>(&r)) + 1,
  };
  return array;
}

$ gcc-base/bin/g++ -O2 test.cpp -S -o 1.s
$ gcc-peak/bin/g++ -O2 test.cpp -S -o 2.s
$ diff -u 1.s 2.s
--- 1.s
+++ 2.s
@@ -6,8 +6,27 @@
 _Z3foov:
 .LFB0:
 	.cfi_startproc
+	cmpb	$0, _ZGVZ3foovE5array(%rip)
+	je	.L11
 	movl	$_ZZ3foovE5array, %eax
 	ret
+	.p2align 4,,10
+	.p2align 3
+.L11:
+	subq	$8, %rsp
+	.cfi_def_cfa_offset 16
+	movl	$_ZGVZ3foovE5array, %edi
+	call	__cxa_guard_acquire
+	testl	%eax, %eax
+	je	.L3
+	movl	$_ZGVZ3foovE5array, %edi
+	movq	$1, _ZZ3foovE5array(%rip)
+	call	__cxa_guard_release
+.L3:
+	movl	$_ZZ3foovE5array, %eax
+	addq	$8, %rsp
+	.cfi_def_cfa_offset 8
+	ret
 	.cfi_endproc
 .LFE0:
 	.size	_Z3foov, .-_Z3foov
@@ -16,7 +35,9 @@
 	.type	_ZZ3foovE5array, @object
 	.size	_ZZ3foovE5array, 16
 _ZZ3foovE5array:
+	.zero	8
 	.quad	1
-	.quad	1
+	.local	_ZGVZ3foovE5array
+	.comm	_ZGVZ3foovE5array,8,8
 	.ident	"GCC: (GNU) 4.8.0 20121005 (experimental)"
 	.section	.note.GNU-stack,,@progbits

So, the value of array[0] (sizeof(char) equals 1) is generated on the first
function call instead of being emitted directly to the assembly data section.
If I remove the second constant element

static long array[] = {
  sizeof(char),
};
or reimplement it in the following way

static long array[] = {
  sizeof(char),
  __builtin_offsetof(R, field) + 1,
};

the problem disappears. As I understand it, the goal of the rev192141 patch is
a new warning. Maybe it should not affect codegen so much?

Additional information. The problem described above leads to a Webkit build
failure. There is the following step while generating assembly for the Webkit
JavaScriptCore low-level interpreter: it generates a dummy executable
containing a function with a static array:

static const unsigned extractorTable[308992] = {
  unsigned(-1639711386),
  (reinterpret_cast<ptrdiff_t>((reinterpret_cast<ArrayProfile*>(0x4000)-unsigned(267773781),
  sizeof(ValueProfile),
  // and so on...
};

Later this dummy executable file (its data section) is parsed to find all
these sizeof-and-offset values. This certainly seems strange, but when Webkit
is cross-compiled it helps to find offsets without running anything on the
target. After gcc revision 192141 that executable-parsing script fails to get
all the sizeof(...) values: they are zeros in the gcc-generated assembly data
section.