[Bug target/83507] [8 Regression] ICE in internal_dfa_insn_code_* for powerpc targets

2019-04-16 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83507

Roman Zhuykov  changed:

   What|Removed |Added

 CC||zhroma at ispras dot ru

--- Comment #12 from Roman Zhuykov  ---
Not sure whether I have to reopen it, but the fix wasn’t correct.  In this
example -fmodulo-sched-allow-regmoves is not enabled, so we should not create
any register moves at all.

More discussion and proper fix:
https://gcc.gnu.org/ml/gcc-patches/2019-04/msg00632.html

[Bug testsuite/90113] New: Useless torture mode for gfortran.dg tests

2019-04-16 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90113

Bug ID: 90113
   Summary: Useless torture mode for gfortran.dg tests
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhroma at ispras dot ru
  Target Milestone: ---

I’ve recently found that the tests placed in the gcc/testsuite/gfortran.dg
folder are run in “torture” mode with different optimization options.

While working on PR90001 I looked into the sms-2.f90 and forall_10.f90 tests
from gfortran.dg.  I analyzed only compilation time in that PR, but I was also
surprised that these tests are compiled several times with option lines like:
sms-2.f90:
“-O0 -O2 -fmodulo-sched”
“-O1 -O2 -fmodulo-sched”
...
“-O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions -O2 -fmodulo-sched”

forall_10.f90:
“-O0 -O”
“-O1 -O”
...
“-O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions -O”

I’ve found the discussion from when gfortran.dg/dg.exp was added in 2004,
where Joseph mentioned the “torture” ideas:
https://gcc.gnu.org/ml/gcc-patches/2004-07/msg01131.html
“gcc.c-torture: multiple optimization options, built with -w to disable all
warnings.
gcc.dg: no looping over multiple options, defaults to -ansi -pedantic if
no options given.
gcc.dg/torture: like gcc.dg (so no -w) but loops over multiple options.  
Much of gcc.dg that uses some optimization options belongs in there.”

I started working with GCC a bit later, but I always thought that the same
logic applied to the other languages.  And now I know that for Fortran tests
it has been broken since that old time.

Looking into recent commits (which add new Fortran tests) shows that people
also wrongly assume that the difference between gfortran.fortran-torture and
gfortran.dg is the same as between the C test folders (gcc.c-torture and
gcc.dg).

So a lot of tests in gfortran.dg contain an optimization-level option, and
this leads to many useless torture runs.  In most torture option lines only
the optimization level is set, and the option inside the test overrides that
level.

Maybe this broken logic should be fixed by disabling torture runs in
gfortran.dg and running those tests like gcc.dg tests?

[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler

2019-04-16 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

--- Comment #5 from Roman Zhuykov  ---
Retested the patch separately; everything works.
I have found two more slow Fortran examples on the (obsolete) spu platform
with additional options like -O1/-O2 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions:

gfortran.dg/sms-2.f90 
gfortran.dg/forall_10.f90

Test       | Compilation time, sec  | SMS options
           | unpatched    patched   |
forall_10  |   35.02       0.66     | -fmodulo-sched
forall_10  |   87.44       0.52     | -fmodulo-sched -fmodulo-sched-allow-regmoves
sms-2      |   34.58       0.44     | -fmodulo-sched
sms-2      |   86.77       0.27     | -fmodulo-sched -fmodulo-sched-allow-regmoves

[Bug rtl-optimization/90040] New: [meta-bug] modulo-scheduler and partitioning issues

2019-04-10 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90040

Bug ID: 90040
   Summary: [meta-bug] modulo-scheduler and partitioning issues
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhroma at ispras dot ru
  Target Milestone: ---

Here I want to discuss the situation with the modulo scheduler pass when
-freorder-blocks-and-partition is also enabled.

TL;DR: I kindly ask the RTL folks to fix the ICE happening in
cfg_layout_redirect_edge_and_branch_force while trying to redirect a crossing
jump.

--
The problem here is not in the modulo scheduler algorithm itself, but in the
pass initialization (and finalization) procedures.  The pass enters (and later
leaves) cfg_layout mode, and it also calls loop_optimizer_init.  All of this
leads to many branch redirections, which then have to succeed after
partitioning in the bbro pass.

This issue is not new; at least here
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01811.html
I found that entering cfg_layout broke things on x86 (where the required
doloop pattern is absent) and proposed moving the sms pass before bbro.  But
later I got this reply from Richard
https://gcc.gnu.org/ml/gcc-patches/2011-10/msg01526.html
and I agree that we have to fix it in another way.

Now, in 2019, we have at least five bugs about the same situation.  I want to
connect them all in this discussion.

First of all, PR85408 and PR87329 are open now and PR87475 is fixed, but they
are all about the same assertion:
«internal compiler error: in patch_jump_insn, at cfgrtl.c:1271».  PR87475 was
fixed by Jakub back in November by r266219, and the two others were later
reported unreproducible, so, IMHO, they are fixed now too.

Jakub's ChangeLog:
PR rtl-optimization/87475
* cfgrtl.c (patch_jump_insn): Allow redirection failure for
CROSSING_JUMP_P insns.
(cfg_layout_redirect_edge_and_branch): Don't ICE if ret is NULL.

Second, I want to discuss PR85426, where the first failing assertion was the
same as in those three PRs, but after Jakub’s fix Arseny reported another
failed assertion:
«internal compiler error: in cfg_layout_redirect_edge_and_branch_force, at
cfgrtl.c:4482»

And this last assertion is the real issue -- it happens when we call
redirect_edge_and_branch_force for a crossing jump.  I’m not familiar with
partitioning, so I have no idea how to fix this.

Hopefully, if we find a fix for this, PR89116 would also be fixed, although
the situation there is not exactly the same — there we ICE on the same
assertion only in split_edge_and_insert while running generate_prolog_epilog,
so it happens only after SMS has succeeded in creating a schedule.

[Appendix 1] I also want to mention PR83886, which was fixed by Honza with the
following ChangeLog:
PR rtl/84058
* cfgcleanup.c (try_forward_edges): Do not give up on crossing
jumps; choose last target that matches the criteria (i.e.
no partition changes for non-crossing jumps).
* cfgrtl.c (cfg_layout_redirect_edge_and_branch): Add basic
support for redirecting crossing jumps to non-crossing.

So here we also have some change in crossing-jump redirection.

[Appendix 2] There are also some issues with entering/exiting cfg_layout.
Technically, all of them are fixed right now: PR83771 and PR83475 (fixed
together with PR81791).
But I wonder whether this is all just latent, because in other passes I see a
special check that prevents entering cfg_layout after partitioning.

For example in hw-doloop.c we have:
  /* We can't enter cfglayout mode anymore if basic block partitioning
 already happened.  */
  if (do_reorder && !crtl->has_bb_partition)

This condition was added back in 2014
https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00282.html and at that time it
looked like:
  if (do_reorder && !flag_reorder_blocks_and_partition)

It was later adjusted by Honza to the current state:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00515.html

But we don’t have any check like this in the modulo scheduler.  There we
certainly can’t proceed without entering cfg_layout, so I’m not sure it would
be a good idea to add such a check.  But maybe we now have some more latent
bugs with entering cfg_layout after partitioning?
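
If such a check were ever added, a minimal sketch (my assumption about its
placement near the start of sms_schedule; this is not part of any posted
patch) could mirror the hw-doloop.c test:

  /* Hypothetical guard: give up on SMS once basic block partitioning has
     already happened, since we cannot safely enter cfg_layout mode.  */
  if (crtl->has_bb_partition)
    return;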

[Appendix 3] Last month I spent a lot of time updating my patches described
here
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html
and I have locally added several other patches to fix most of the modulo-sched
PRs from bugzilla, but the annoying issue described here prevents me from
testing my branch in all possible scenarios.
I'll try to add some comments to all the other modulo scheduler bugzilla PRs
this week or next, while we are in stage 4.

[Bug target/84369] test case gcc.dg/sms-10.c fails on power9

2019-04-10 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369

Roman Zhuykov  changed:

   What|Removed |Added

 CC||zhroma at ispras dot ru

--- Comment #3 from Roman Zhuykov  ---
I compared the modulo-sched DDGs in “power8 vs power9” modes, and the main
difference is not the combined instruction mentioned by Peter, but the
movsi_internal1 dependency cost.  For these two instructions:

(insn 23 22 25 4 (set (mem:SI (plus:DI (reg/f:DI 126 [
regstat_n_sets_and_refs.1_9 ])
(reg:DI 141 [ ivtmp.26 ])) [2 MEM[base:
regstat_n_sets_and_refs.1_9, index: ivtmp.26_35, offset: 0B]+0 S4 A32])
(reg:SI 148 [ _7->n_refs ])) "sms-10.c":50:40 502 {*movsi_internal1}
 (expr_list:REG_DEAD (reg:SI 148 [ _7->n_refs ])
(nil)))

(insn 32 31 33 4 (set (mem:SI (plus:DI (reg/f:DI 159)
(reg:DI 141 [ ivtmp.26 ])) [2 MEM[base: _44, index:
ivtmp.26_35, offset: 0B]+0 S4 A32])
(reg:SI 154)) "sms-10.c":51:40 502 {*movsi_internal1}
 (expr_list:REG_DEAD (reg:SI 154)
(nil)))

the insn_default_latency function (and then insn_sched_cost) returns
significantly different values: 5 for power8, 0 for power9.

There are other movsi_internal1 instructions in this loop whose cost also
differs, but it is only a one-cycle “3 vs 4” change, which is hopefully
correct.

The same “5 vs 0” cost also breaks this test on 32-bit powerpc.  There is no
combining difference there, only the cost issue, but it also prevents
modulo-sched from succeeding.

I’m not familiar with .md files, so I ask somebody to look at the “5 vs 0”
issue.  If such a cost difference is correct, then it seems a good solution to
simply skip this test on power9 CPUs.

[Bug target/87979] ICE in compute_split_row at modulo-sched.c:2393

2019-04-10 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87979

--- Comment #2 from Roman Zhuykov  ---
The situation is the same in the following tests on the ia64 platform with
-fmodulo-sched enabled (with any of -O1, -O2, -Os):
gcc.dg/torture/pr82762.c
gcc.c-torture/execute/20170419-1.c

We divide by zero when we try to schedule loop body in zero cycles. Both
res_mii and rec_mii estimations equals zero. We have to start with one cycle in
this situation.
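
To illustrate (a sketch, not the actual modulo-sched.c code): the partial
schedule maps a cycle to a row by taking it modulo the initiation interval,
so ii == 0 means an integer division by zero.

  /* Sketch of the failing computation: row = cycle modulo ii
     (SMODULO-style).  With ii == 0 the '%' divides by zero.  */
  static int
  row_of_cycle (int cycle, int ii)
  {
    int r = cycle % ii;
    return r < 0 ? r + ii : r;
  }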

The patch was successfully bootstrapped and regtested together with a few
other patches on x86_64.  It was also regtested with cross-compilers to s390,
spu, aarch64, arm, ia64, ppc and ppc64, additionally with -fmodulo-sched
enabled by default.
The same testing was also done on the 8 branch.  The mentioned ia64 tests were
the only difference.

diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -1597,6 +1597,7 @@ sms_schedule (void)
       mii = 1; /* Need to pass some estimate of mii.  */
       rec_mii = sms_order_nodes (g, mii, node_order, &max_asap);
       mii = MAX (res_MII (g), rec_mii);
+      mii = MAX (mii, 1);
       maxii = MAX (max_asap, MAXII_FACTOR * mii);
 
       if (dump_file)

[Bug target/87979] ICE in compute_split_row at modulo-sched.c:2393

2019-04-10 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87979

Roman Zhuykov  changed:

   What|Removed |Added

 CC||zhroma at ispras dot ru

--- Comment #1 from Roman Zhuykov  ---
Created attachment 46137
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46137&action=edit
Proposed patch

[Bug rtl-optimization/84032] ICE in optimize_sc, at modulo-sched.c:1064

2019-04-10 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84032

--- Comment #4 from Roman Zhuykov  ---
There is the following mistake in the logic behind the code.

We want to schedule the branch instruction only as the last instruction in a
row.  But when the branch has been scheduled and we add other instructions
into the partial schedule, we sometimes allow them to be placed in the same
row after the branch.

The issue happens later, when we try to reschedule the branch into another
row; the algorithm there works like this:
(1) Remove the branch from the row where it is (say, the “previous row”).
(2) Try to insert it into the needed row.
(3) If that succeeds, OK, continue scheduling other instructions.
(4) If the insertion in (2) fails, insert the branch back into the “previous
row”; this insertion must certainly succeed, which is checked by an assertion.

But when at step (1) the branch is not the last in its row, there is no
guarantee that at step (4) we can insert it back, because there we only try
the last-in-a-row position for it, as sketched below.
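
Schematically, with hypothetical helper names (this only illustrates the
description above; these are not the real modulo-sched.c routines):

  remove_insn_from_row (ps, branch);                  /* step (1) */
  if (!insert_insn_in_row (ps, branch, wanted_row))   /* step (2) */
    {
      /* Step (4): put the branch back.  The re-insertion only tries the
         last-in-row position, so it can fail when the branch was not last
         in its previous row, and the assertion fires.  */
      bool ok = insert_insn_in_row (ps, branch, prev_row);
      gcc_assert (ok);
    }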

The proposed patch solves this by completely preventing other instructions
from being scheduled after the branch in the same row.

The patch was successfully bootstrapped and regtested together with a few
other patches on x86_64.  It was also regtested with cross-compilers to s390,
spu, aarch64, arm, ia64, ppc and ppc64, additionally with -fmodulo-sched
enabled by default.
The same testing was also done on the 8 branch.  No new failures were
introduced.

diff --git a/gcc/modulo-sched.c b/gcc/modulo-sched.c
--- a/gcc/modulo-sched.c
+++ b/gcc/modulo-sched.c
@@ -2996,9 +2996,7 @@ ps_insn_find_column (partial_schedule_ptr ps, ps_insn_ptr ps_i,
             last_must_precede = next_ps_i;
         }
       /* The closing branch must be the last in the row.  */
-      if (must_precede
-          && bitmap_bit_p (must_precede, next_ps_i->id)
-          && JUMP_P (ps_rtl_insn (ps, next_ps_i->id)))
+      if (JUMP_P (ps_rtl_insn (ps, next_ps_i->id)))
         return false;
 
       last_in_row = next_ps_i;

[Bug rtl-optimization/84032] ICE in optimize_sc, at modulo-sched.c:1064

2019-04-10 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84032

Roman Zhuykov  changed:

   What|Removed |Added

 CC||zhroma at ispras dot ru

--- Comment #3 from Roman Zhuykov  ---
Created attachment 46136
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46136&action=edit
Proposed patch

[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler

2019-04-08 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

--- Comment #4 from Roman Zhuykov  ---
Thanks for the testcase.
2-3 weeks ago I had already caught and fixed this on my local branch; see some
info at the bottom.

The current algorithm, which finds recurrence_length for all DDG strongly
connected components, works in roughly O(N^6) time, where N is the number of
nodes in the DDG.  The time is that bad mostly for graphs with lots of edges,
e.g. almost N^2 edges.  Richard's suggestion is right: it would still be
something like O(N^5) in the worst case even without the bitmap overhead for
such graphs.  My proposed algorithm works in O(N^3).  The algorithm for
finding the SCCs themselves is also not optimal (maybe up to O(N^4)), but it
is left untouched here.

For some situations, when the number of edges is smaller (e.g. close to N),
the new algorithm can unfortunately be slower than the old one.  But I think
it would be better to add a bail-out here when we get more than 1000 nodes,
for example.
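
Such a bail-out might look roughly like this (a sketch only; the exact
placement inside sms_schedule's loop over loops and the 1000-node threshold
are my assumptions, not part of the posted patch):

  /* Hypothetical bail-out: skip modulo scheduling of loops whose DDG is
     so large that computing recurrence lengths could become expensive.  */
  if (g->num_nodes > 1000)
    {
      if (dump_file)
        fprintf (dump_file, "SMS: DDG too large, skipping loop.\n");
      free_ddg (g);
      continue;
    }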

Before creating this patch, I tested a special version of it in which both
approaches were active and asserts checked that the algorithms' results
(longest_simple_path values) are exactly the same.  I can publish this special
version if needed.

I wonder how a regression test can be created for such a situation?

[Testing]
The proposed patch, together with a bunch of ~25 other patches, was tested a lot:
*(1) Bootstrapped and regtested on x86_64
*(2) Cross-compiler regtest on aarch64, arm, powerpc, powerpc64, ia64 and s390
*(3) Also done (1) and (2) with -fmodulo-sched enabled by default
*(4) Also done (1) and (2) with -fmodulo-sched and
-fmodulo-sched-allow-regmoves enabled by default
*(5) Moreover, all of (1-4) was also done with the 4.9, 5, 6, 7 and 8 branches;
on the active branches and trunk the date was 20190327.

More than 250 compiler instances were built and tested in total (counting
both "unpatched" and "patched").
No new failures related to this algorithm were found.
The "special version" was also tested in practically the same scenarios (and
once more a week earlier, around 20190320), but not exactly all of them.

But I still have to retest it separately, without all my other stuff :)

[PS] Last month I spent a lot of time updating my patches described here
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html and have locally
added several other patches, including this fix.  My updated branches are not
published yet because there are still some unsolved issues, and I also cannot
fix some bugzilla PRs yet.  I'll try to add comments to the other
modulo-sched-related PRs soon.

[Bug rtl-optimization/90001] Compile-time hog in swing modulo scheduler

2019-04-08 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90001

Roman Zhuykov  changed:

   What|Removed |Added

 CC||zhroma at ispras dot ru

--- Comment #3 from Roman Zhuykov  ---
Created attachment 46099
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46099&action=edit
Proposed patch

Untested on trunk yet

[Bug rtl-optimization/80112] [5/6 Regression] ICE in doloop_condition_get at loop-doloop.c:158

2017-03-24 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80112

Roman Zhuykov  changed:

   What|Removed |Added

 CC||zhroma at ispras dot ru

--- Comment #5 from Roman Zhuykov  ---
Created attachment 41049
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41049&action=edit
maybe more proper fix

Six years ago I was solving an issue with the same code lines and, with
Richard Sandiford's help, found a somewhat better solution; it was even
approved, but unfortunately we forgot to commit it.  Discussion links:
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01803.html
https://gcc.gnu.org/ml/gcc-patches/2011-09/msg02049.html
https://gcc.gnu.org/ml/gcc-patches/2012-02/msg00479.html

Maybe it's better to apply that old patch?

PS. All my modulo-sched improvements are described together here:
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html

[Bug target/69252] [4.9/5/6 Regression] gcc.dg/vect/vect-iv-9.c FAILs with -Os -fmodulo-sched -fmodulo-sched-allow-regmoves -fsched-pressure

2016-01-18 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69252

--- Comment #13 from Roman Zhuykov  ---
(In reply to Jakub Jelinek from comment #12)
> Thus, Roman, can you please post your patch to gcc-patches?
OK, in addition to the link in comment 3, I have just reposted it:
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01336.html, and I ask Martin to
help with creating a new regression test, since I'm not ready to set up a
powerpc qemu VM at the moment.

[Bug target/69252] [4.9/5/6 Regression] gcc.dg/vect/vect-iv-9.c FAILs with -Os -fmodulo-sched -fmodulo-sched-allow-regmoves -fsched-pressure

2016-01-14 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69252

--- Comment #7 from Roman Zhuykov  ---
(In reply to Jakub Jelinek from comment #5)
> insufficient SMS testsuite coverage.
Not sure whether it's helpful, but 3 weeks ago I successfully reg-strapped a
bunch of my SMS patches, including this fix, on x86-64 and aarch64 using trunk
20151222.  The -fmodulo-sched and -fmodulo-sched-allow-regmoves options were
enabled by default, and -fsched-pressure was left untouched.  No new
regressions, apart from some scan-assembler-times failures caused by loop
versioning.  Maybe later, in stage 1, I'll send all my stuff to gcc-patches.

[Bug target/69252] [4.9/5/6 Regression] gcc.dg/vect/vect-iv-9.c FAILs with -Os -fmodulo-sched -fmodulo-sched-allow-regmoves -fsched-pressure

2016-01-13 Thread zhroma at ispras dot ru
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69252

--- Comment #3 from Roman Zhuykov  ---
I'll try to help.  While working on expanding SMS functionality 4-5 years ago
(https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01807.html), I created several
fixes not connected to my main non-doloop-support patch.  Unfortunately, only
two of those fixes were approved and only one was committed.

Could you try this patch first of all:
https://gcc.gnu.org/ml/gcc-patches/2011-12/msg01800.html
This patch applies to current trunk.

Here are links to the other fixes; I can provide newer versions for trunk if
needed.
https://gcc.gnu.org/ml/gcc-patches/2011-09/msg02049.html
https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html
https://gcc.gnu.org/ml/gcc-patches/2011-12/msg00505.html
https://gcc.gnu.org/ml/gcc-patches/2011-12/msg00506.html

[Bug rtl-optimization/57372] New: [4.9 Regression] Miscompiled tailcall on ARM

2013-05-22 Thread zhroma at ispras dot ru
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57372

Bug ID: 57372
   Summary: [4.9 Regression] Miscompiled tailcall on ARM
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zhroma at ispras dot ru

Created attachment 30164
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30164&action=edit
Preprocessed minimized testcase

On ARM, GCC 4.9 revision 198928 and later sometimes generates wrong code for a
tail call.

For the following C++ code:

class A {
  public:
  virtual void foo1();
};

A foo2() {}

void foo3() {
  foo2().foo1();
}

In function foo3 the last instruction is bx r3, but register r3 doesn't
contain a proper address at that moment.  The attachment contains practically
the same code, adjusted to create an executable; the executable produced by
g++ -O2 segfaults.

Earlier versions of GCC work OK.


[Bug c++/55081] New: [4.8 regression?] Non-optimized static array elements initialization

2012-10-26 Thread zhroma at ispras dot ru
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55081

 Bug #: 55081
   Summary: [4.8 regression?] Non-optimized static array elements
            initialization
Classification: Unclassified
   Product: gcc
   Version: 4.8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
AssignedTo: unassigned at gcc dot gnu.org
ReportedBy: zhroma at ispras dot ru

Created attachment 28536
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28536
Preprocessed minimized testcase.

In some cases g++ 4.8 revision 192141 and later generates an initialization
block for a local static array with constant elements, while earlier g++
versions insert the constants into the assembly data section (gcc-base is
rev192140, gcc-peak is rev192141, tested on 64-bit Linux):

$ cat test.cpp
struct R {
    int field;
};
long* foo() {
    R r;
    static long array[] = {
        sizeof(char),
        (reinterpret_cast<long>(&(r.field)) -
         reinterpret_cast<long>(&r))+1,
    };
    return array;
}

$ gcc-base/bin/g++ -O2 test.cpp -S -o 1.s
$ gcc-peak/bin/g++ -O2 test.cpp -S -o 2.s
$ diff -u 1.s 2.s
--- 1.s
+++ 2.s
@@ -6,8 +6,27 @@
 _Z3foov:
 .LFB0:
 	.cfi_startproc
+	cmpb	$0, _ZGVZ3foovE5array(%rip)
+	je	.L11
 	movl	$_ZZ3foovE5array, %eax
 	ret
+	.p2align 4,,10
+	.p2align 3
+.L11:
+	subq	$8, %rsp
+	.cfi_def_cfa_offset 16
+	movl	$_ZGVZ3foovE5array, %edi
+	call	__cxa_guard_acquire
+	testl	%eax, %eax
+	je	.L3
+	movl	$_ZGVZ3foovE5array, %edi
+	movq	$1, _ZZ3foovE5array(%rip)
+	call	__cxa_guard_release
+.L3:
+	movl	$_ZZ3foovE5array, %eax
+	addq	$8, %rsp
+	.cfi_def_cfa_offset 8
+	ret
 	.cfi_endproc
 .LFE0:
 	.size	_Z3foov, .-_Z3foov
@@ -16,7 +35,9 @@
 	.type	_ZZ3foovE5array, @object
 	.size	_ZZ3foovE5array, 16
 _ZZ3foovE5array:
+	.zero	8
 	.quad	1
-	.quad	1
+	.local	_ZGVZ3foovE5array
+	.comm	_ZGVZ3foovE5array,8,8
 	.ident	"GCC: (GNU) 4.8.0 20121005 (experimental)"
 	.section	.note.GNU-stack,"",@progbits

So, the value of array[0] (sizeof(char) equals 1) is generated on the first
function call instead of being emitted to the assembly data section directly.
If I remove the second constant element

static long array[] = {
    sizeof(char),
};

.. or reimplement it in the following way

static long array[] = {
    sizeof(char),
    __builtin_offsetof(R, field)+1,
};

the problem disappears.

As I understand it, the goal of the rev192141 patch is a new warning.  Maybe
it should not affect codegen so much?

Additional information.
The problem described above leads to a Webkit build failure.
There is the following step while generating assembly for the Webkit
JavaScriptCore low-level interpreter: it generates a dummy executable
containing a function with a static array:

static const unsigned extractorTable[308992] = {
    unsigned(-1639711386),
    (reinterpret_cast<ptrdiff_t>((reinterpret_cast<ArrayProfile*>
        (0x4000)-unsigned(267773781),
    sizeof(ValueProfile),
    // and so on...
};

And later this dummy executable file (its data section) is parsed to find all
these sizeof-and-offset values.  This certainly seems strange, but when Webkit
is cross-compiled it helps to find the offsets without running anything on the
target.  After gcc revision 192141 that executable-parsing script fails to get
all the sizeof(...) values - they are zeros in the gcc-generated assembly data
section.