[Bug middle-end/87162] [6.2.0] Internal compiler error: Error reporting routines re-entered.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87162 ma.jiang at zte dot com.cn changed: What|Removed|Added CC||ma.jiang at zte dot com.cn --- Comment #8 from ma.jiang at zte dot com.cn --- (In reply to Wen Yang from comment #7)
> This patch will make it more robust:
>
> # git diff
> diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
> index 1d4eb80..326e1c4 100644
> --- a/gcc/trans-mem.c
> +++ b/gcc/trans-mem.c
> @@ -2099,6 +2099,9 @@ gate_tm_init (void)
>    if (!flag_tm)
>      return false;
>
> +  if (!cfun->cfg)
> +    return false;
> +
>    calculate_dominance_info (CDI_DOMINATORS);
>    bitmap_obstack_initialize (&tm_obstack);

I think gate functions should not do complex computation. gate_tm_init should be split into two parts.
[Bug driver/78772] -fstrict-aliasing should turn on Wstrict-aliasing automatically
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78772 --- Comment #12 from ma.jiang at zte dot com.cn --- (In reply to Jim Wilson from comment #11)
Hi Jim, Thank you for your patient explanations.
> If you have code with a lot of pointer casts, you should probably be
> compiling with -fno-strict-aliasing and/or -Wstrict-aliasing. The handling
> of these options was decided a while ago, and we won't be making any changes
> here.
OK, now I understand GCC's basic strategy for the strict-aliasing stuff (and other similar optimizations).
[Bug driver/78772] -fstrict-aliasing should turn on Wstrict-aliasing automatically
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78772 --- Comment #9 from ma.jiang at zte dot com.cn --- (In reply to Jim Wilson from comment #8)
> Looking at the 2011 ISO C standard, section 6.5 Expressions, paragraph 7,
> says "An object shall have its stored value accessed only by an lvalue
> expression that has one of the following types:". It then lists compatible
> types, a qualified compatible type, a signed/unsigned compatible type, a
> signed/unsigned qualified compatible type, an aggregate/union that contains
> a compatible type, or a char type. If your code conforms to this rule, then
> -fstrict-aliasing will not break it.
>
> GCC supports type punning via unions. Or you can use -fno-strict-aliasing
> if you want to use pointer casts for type punning.
Hi Jim, Sorry for the delay; it really took me a lot of time to read the standard (which is not in my mother language, and hard to understand ...). Thanks a lot, I had not noticed these passages in the C11 standard before. It looks like the C standard has required strict aliasing since C99. I think this fact gives us more reason to enable "-Wstrict-aliasing" by default. First, violating it breaks the C standard; moreover, it may generate a broken binary. Obviously, this kind of warning is not similar to "informational warnings" such as "-Wunused-variable". And by the way, the strict-aliasing rules are really a mess... The GCC manual should say more about this (more right/wrong examples?). I am still not sure whether the following code (which is widely used...) is OK under strict aliasing:

extern char buf[32];
int *a = (int *)&(buf[16]);
int result = *a; /* is this ok? or do we have to use memcpy(&result, &buf[16], sizeof(int))? */
[Bug driver/78772] -fstrict-aliasing should turn on Wstrict-aliasing automatically
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78772 --- Comment #7 from ma.jiang at zte dot com.cn --- (In reply to Markus Trippelsdorf from comment #5)
Hi Markus,
> These optimizations are not dangerous if you use standard conforming code.
I think these optimizations are dangerous because they create wrong binaries silently, even if we use standard-conforming code. Let us just look at "-fstrict-aliasing". This optimization assumes "In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same". But this restriction *IS NOT* required by the C standard. So code that conforms to the C standard could also be translated into wrong binaries by GCC with "-fstrict-aliasing", in silence...
> We could have a discussion if it might be better to enable some -Wall
> warnings by default like Clang. But I'm not sure if this gives huge
> benefits to users.
I do not know whether it is a good idea to enable warnings such as -Wstrict-aliasing by default. But I believe we should enable these warnings when the corresponding optimizations are turned on; this does give huge benefits to users.
[Bug driver/78772] -fstrict-aliasing should turn on Wstrict-aliasing automatically
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78772 --- Comment #4 from ma.jiang at zte dot com.cn --- (In reply to Markus Trippelsdorf from comment #3)
> Please no.
> There are many other cases where optimizations could introduce issues that
> you will not notice when you compile without warnings.
> Just use -Wall.
Hi Markus, Thanks for your quick reply. I know of both "-Wall" and "-Wextra", and I did use them when compiling my code. But there are many people who do not know them; moreover, as far as I can see, many open source projects do not use them either. Is it OK to assume all GCC users know "-Wall"? On the other side, creating a new mechanism to turn on warnings for dangerous optimizations is not hard work, I think. We just need to create a table which connects each optimization with its corresponding warning.
[Bug driver/78772] -fstrict-aliasing should turn on Wstrict-aliasing automatically
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78772 --- Comment #1 from ma.jiang at zte dot com.cn --- Created attachment 40308 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40308&action=edit proposed patch
[Bug driver/78772] New: -fstrict-aliasing should turn on Wstrict-aliasing automatically
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78772 Bug ID: 78772 Summary: -fstrict-aliasing should turn on Wstrict-aliasing automatically Product: gcc Version: 6.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: ma.jiang at zte dot com.cn Target Milestone: --- Hi all, "-fstrict-aliasing", which is turned on at the -O2 level, may create wrong code. But GCC does not give a warning for it by default. IMO, we should turn on -Wstrict-aliasing when strict aliasing is enabled.
[Bug middle-end/65449] -fstrict-volatile-bitfields affects volatile pointer dereference and produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65449 --- Comment #2 from ma.jiang at zte dot com.cn --- (In reply to Bernd Edlinger from comment #1)
> Hi Richard, the invalid/different code for -O2 -fstrict-volatile-bitfields
> will be fixed with my proposed patch, because the misalignedness of mm
> should be visible at -O2 and prevent the strict_volatile bitfields path to
> be entered. Could you give your OK to the latest version? see
> https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00817.html Thanks Bernd.
Hi Bernd, Your patch does fix the unaligned stw problem. But there are still 4 lbz instructions for *((volatile int *)mm)=4; after your fix. I thought they were caused by -fstrict-volatile-bitfields originally. After a more careful check, I found they are caused by

  temp = force_reg (mode, op0);

in store_fixed_bit_field_1. The *((int *)mm)=4; produces no lbz instructions, but still produces a useless load when doing RTL expand:

(insn 7 6 8 2 (set (reg:QI 157)
        (mem/c:QI (plus:SI (reg/f:SI 155)
                (const_int 1 [0x1])) [1 MEM[(int *)&mt + 1B]+0 S1 A8])) nt.c:5 489 {*movqi_internal}
     (nil))

These loads will be eliminated in fwprop1 as they are meaningless. However, after adding volatile to the memory mm, the fwprop1 pass cannot delete these loads, as volatile loads should not be removed. So I think we should first get rid of the volatile flag on op0 before calling force_reg (mode, op0). I have tried adding

  rtx op1 = copy_rtx (op0);
  MEM_VOLATILE_P (op1) = 0;

just before force_reg, and it does remove those lbz instructions.
[Bug middle-end/65449] -fstrict-volatile-bitfields affects volatile pointer dereference and produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65449 --- Comment #4 from ma.jiang at zte dot com.cn --- Hi Bernd, Thanks for your reply. (In reply to Bernd Edlinger from comment #3)
> Yes, but that is not the fault of the strict volatile code path any more.
// Sure, I agree with you. The redundant read is another problem. In fact, it should be named "volatile pointer dereference produces a redundant read".
> For bit-fields this redundant read is exactly what AAPCS demands:
// Yes, I know that volatile bit-fields need the redundant read. That is why I thought there was a connection between the redundant read and -fstrict-volatile-bitfields originally.
> IMO, the problem is this: In store_fixed_bit_field_1 we do not know if the
> access is on a packed structure member where the extra read is not
> necessary, or if we have a bit-field where the extra read would be
> mandatory, even if the complete container is overwritten.
// I agree that the problem is caused by store_fixed_bit_field_1. But I disagree with you about three things. First, the redundant read should not appear if -fstrict-volatile-bitfields is off. Second, in store_fixed_bit_field_1 we can distinguish a pointer dereference from a structure member access (e.g., MEM_EXPR (op0) will tell us whether op0 is a COMPONENT_REF or not, if MEM_P (op0) is true). Third, as the GCC manual says of -fstrict-volatile-bitfields: "This option should be used if accesses to volatile bit-fields (or other structure fields, although the compiler usually honors those types anyway) should use a single access of the width of the field's type, aligned to a natural alignment if possible." So store_fixed_bit_field_1 does not need to distinguish bit-fields from normal structure members.
> BTW: What happens in your example if you use -O0? Isn't there still an
> unaligned stw access? That's because your example is in a way invalid C.
> You can't set an int* to an unaligned address, it must be at least aligned
> to the GET_MODE_ALIGNMENT(SImode).
// Yes, I know what you mean.
Split access is an auxiliary function provided by the compiler with the help of data analysis. As -O0 turns off all analysis, an unaligned stw is acceptable. And, BTW, the C standard says nothing about unaligned access as far as I know. Could you show me something about it?
[Bug target/65459] New: munaligned-access still produces split access code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65459 Bug ID: 65459 Summary: munaligned-access still produces split access code Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: ma.jiang at zte dot com.cn Hi all, The -munaligned-access option on ARM cannot stop GCC from producing split access code. A very simple test:

char mt[20];
void main()
{
  void *mm = &(mt[1]);
  *((int *)mm) = 4;
}

compiled with -O2 -munaligned-access can reproduce this problem. As we can see, *((int *)mm)=4; causes GCC to emit 4 strb instructions, not an unaligned stw. In my opinion, this bug is caused by the SLOW_UNALIGNED_ACCESS macro, which is defined to be 1 in arm.h. It is very strange that the -munaligned-access option affects neither SLOW_UNALIGNED_ACCESS nor STRICT_ALIGNMENT. On PPC, -mstrict-align affects STRICT_ALIGNMENT, and SLOW_UNALIGNED_ACCESS is determined by STRICT_ALIGNMENT, ALIGN, and MODE (the args of the SLOW_UNALIGNED_ACCESS macro). This is a good example for ARM, I think.
[Bug middle-end/65449] New: -fstrict-volatile-bitfields affects volatile pointer dereference and produces wrong code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65449 Bug ID: 65449 Summary: -fstrict-volatile-bitfields affects volatile pointer dereference and produces wrong code Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: ma.jiang at zte dot com.cn Hi all. It seems that -fstrict-volatile-bitfields can affect volatile pointer dereferences. However, the GCC manual says this option should only affect accesses to bit-fields or structure fields. Compiling the test case:

char mt[20];
void main()
{
  void *mm = &(mt[1]);
  *((volatile int *)mm) = 4;
}

with -O2 -mstrict-align -fstrict-volatile-bitfields on PPC, we can see that *((volatile int *)mm)=4 is done by a single stw. Beware that -mstrict-align means a non-aligned memory access is disallowed, and &(mt[1]) is obviously not an address aligned to a 4-byte boundary. The compiler has no reason to produce an unaligned stw when -mstrict-align is on. Furthermore, compiling with -O2 -mstrict-align -fno-strict-volatile-bitfields, the compiler produces four lbz/stb pairs for *((volatile int *)mm)=4;. This is ridiculous, as the C standard does not require the read, and surely no performance benefit could be gained from these lbz.
[Bug rtl-optimization/61241] built-in memset makes the caller function slower
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241 --- Comment #4 from ma.jiang at zte dot com.cn --- (In reply to ktkachov from comment #3)
> Can you please send the patch to gcc-patc...@gcc.gnu.org including a ChangeLog
Done! Thanks.
[Bug rtl-optimization/61241] New: built-in memset makes the caller function slower than normal memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241 Bug ID: 61241 Summary: built-in memset makes the caller function slower than normal memset Product: gcc Version: 4.10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ma.jiang at zte dot com.cn Compiled with -O2,

#include <string.h>
extern int off;
void *test(char *a1, char *a2)
{
  memset(a2, 123, 123);
  return a2 + off;
}

gives a result as follows:

        mov     ip, r1
        mov     r1, #123
        stmfd   sp!, {r3, lr}
        mov     r0, ip
        mov     r2, r1
        bl      memset
        movw    r3, #:lower16:off
        movt    r3, #:upper16:off
        mov     ip, r0
        ldr     r0, [r3]
        add     r0, ip, r0
        ldmfd   sp!, {r3, pc}

After adding -fno-builtin, the assembly becomes shorter:

        stmfd   sp!, {r4, lr}
        mov     r4, r1
        mov     r1, #123
        mov     r0, r4
        mov     r2, r1
        bl      memset
        movw    r3, #:lower16:off
        movt    r3, #:upper16:off
        ldr     r0, [r3]
        add     r0, r4, r0
        ldmfd   sp!, {r4, pc}

One reason is that arm_eabi must align the stack to 8 bytes, so it pushes a meaningless r3. But that is not the most important reason. When using built-in memset, the IRA can know that memset does not change the value of r0. Then choosing r0 instead of ip is clearly more profitable, because this choice gets rid of the redundant "mov ip, r0; mov r0, ip" pair.
For this RTL sequence:

(insn 7 8 9 2 (set (reg:SI 0 r0)
        (reg/v/f:SI 115 [ a2 ])) open_test.c:5 186 {*arm_movsi_insn}
     (nil))
(insn 9 7 10 2 (set (reg:SI 2 r2)
        (reg:SI 1 r1)) open_test.c:5 186 {*arm_movsi_insn}
     (expr_list:REG_EQUAL (const_int 123 [0x7b])
        (nil)))
(call_insn 10 9 24 2 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:SI (symbol_ref:SI ("memset") [flags 0x41] <function_decl 0xb7d72500 memset>) [0 __builtin_memset S4 A32])
                    (const_int 0 [0])))
            (use (const_int 0 [0]))
            (clobber (reg:SI 14 lr))
        ]) open_test.c:5 251 {*call_value_symbol}
     (expr_list:REG_RETURNED (reg/v/f:SI 115 [ a2 ])
        (expr_list:REG_DEAD (reg:SI 2 r2)
            (expr_list:REG_DEAD (reg:SI 1 r1)
                (expr_list:REG_UNUSED (reg:SI 0 r0)
                    (expr_list:REG_EH_REGION (const_int 0 [0])
                        (nil))))))
    (expr_list:REG_CFA_WINDOW_SAVE (set (reg:SI 0 r0) (reg:SI 0 r0))
        (expr_list:REG_CFA_WINDOW_SAVE (use (reg:SI 2 r2))
            (expr_list:REG_CFA_WINDOW_SAVE (use (reg:SI 1 r1))
                (expr_list:REG_CFA_WINDOW_SAVE (use (reg:SI 0 r0))
                    (nil))))))

assigning r0 to r115 was blocked by two pieces of code in process_bb_node_lives (in ira-lives.c).

Piece 1:

  call_p = CALL_P (insn);
  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
    if (!call_p || !DF_REF_FLAGS_IS_SET (*def_rec, DF_REF_MAY_CLOBBER))
      mark_ref_live (*def_rec);

Piece 2:

  /* Mark each used value as live.  */
  for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
    mark_ref_live (*use_rec);

In piece 1, (set (reg:SI 0) (reg/v/f:SI 115)) makes r0 conflict with r115 while r115 is live. This is not necessary, as (set (reg:SI 0) (reg:SI 0)) would not hurt any other instruction. Making r0 conflict with all live pseudo registers loses the chance to optimize a set instruction. I think at least for a simple single set, we should not make the source register conflict with the destination register when one of them is a hard register and the other is not. In piece 2, after the call to memset, r0 becomes live and then conflicts with the live r115.
This code neglects that r115 is the result of find_call_crossed_cheap_reg, and in fact r115 is the same as r0. As discussed above, the two pieces of code prevent the IRA from making the more profitable choice. I have built a patch to fix this problem. After the patch, the assembly with built-in memset becomes shorter than with normal memset:

        mov     r0, r1
        mov     r1, #123
        stmfd   sp!, {r3, lr}
        mov     r2, r1
        bl      memset
        movw    r3, #:lower16:off
        movt    r3, #:upper16:off
        ldr     r3, [r3]
        add     r0, r0, r3
        ldmfd   sp!, {r3, pc}

I have done a bootstrap and make check on x86; nothing changed after the patch. Is the patch OK for trunk?
[Bug rtl-optimization/61241] built-in memset makes the caller function slower than normal memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241 --- Comment #1 from ma.jiang at zte dot com.cn --- Created attachment 32822 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32822&action=edit proposed patch
[Bug rtl-optimization/61241] built-in memset makes the caller function slower than normal memset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241 --- Comment #2 from ma.jiang at zte dot com.cn --- Created attachment 32823 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32823&action=edit testcase should be put into gcc/testsuite/gcc.target/arm
[Bug c/59785] New: atomic_store should load source operand atomically?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59785 Bug ID: 59785 Summary: atomic_store should load source operand atomically? Product: gcc Version: 4.8.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: ma.jiang at zte dot com.cn GCC provides an atomic store interface as below:

void __atomic_store (type *ptr, type *val, int memmodel)

In the manual, there is only a simple description: "This is the generic version of an atomic store. It stores the value of *val into *ptr." But it seems that this interface cannot do what the description says when *val is global memory which might be messed up by other threads.

unsigned long long int ta = 123;
unsigned long long int tb = 321;
int test()
{
  __atomic_store(&tb, &ta, __ATOMIC_RELAXED);
  return 0;
}

On ARM, the code above gives the following assembly sequence:

        movw    r3, #:lower16:ta
        movt    r3, #:upper16:ta
        ldrd    r0, [r3]
        movw    r3, #:lower16:tb
        movt    r3, #:upper16:tb
.L3:
        ldrexd  r4, r5, [r3]
        strexd  r2, r0, r1, [r3]
        cmp     r2, #0
        bne     .L3

The source operand ta is not fetched atomically, so we might get a totally wrong value in tb if ta is modified by another thread. I guess the atomic_store interface only provides this: fetch val non-atomically, then store it into ptr atomically. This is exactly what the GIMPLE tree shows:

test ()
{
  long long unsigned int ta.0;
  int D.4127;

  ta.0 = ta;
  __atomic_store_8 (&tb, ta.0, 0);
  D.4127 = 0;
  return D.4127;
}

I think it would be better to give a more detailed description in the manual, or make an atomic load in atomic_store.
[Bug c/59785] atomic_store should load source operand atomically?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59785 --- Comment #2 from ma.jiang at zte dot com.cn --- OK, thanks for the reply. I know what you mean: an atomic_store is not responsible for loading the source operand atomically, as it is just a store.
[Bug lto/59000] New: lto can't merge user-defined weak builtin functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59000 Bug ID: 59000 Summary: lto can't merge user-defined weak builtin functions Product: gcc Version: 4.8.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: ma.jiang at zte dot com.cn Created attachment 31156 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31156&action=edit first source
A user-defined built-in implementation causes an lto error. Using the source files in the attachments, the following commands can reproduce this problem:

xxx-gcc *.c -O2 -flto -c
xxx-gcc *.o -o tt.mid -r -O2 -flto -nostdlib

lto1: error: two or more sections for .gnu.lto_scalbnf.a2a70fd51cfc1298
(null):0: confused by earlier errors, bailing out
lto-wrapper: ../powerpc-unknown-linux-gnu-gcc returned 1 exit status
collect2: error: lto-wrapper returned 1 exit status

It seems that there is a bug in lto_symtab_merge_cgraph_nodes_1 in lto-symtab.c:

  cgraph_node *ce = dyn_cast <cgraph_node> (e);
  if (ce && !DECL_BUILT_IN (e->symbol.decl))
    lto_cgraph_replace_node (ce, cgraph (prevailing));
  if (varpool_node *ve = dyn_cast <varpool_node> (e))
    lto_varpool_replace_node (ve, varpool (prevailing));

The function does the real replacement only when e is NOT referring to a built-in function. This is not right, because users can provide their own built-in implementations, which should also be merged. We changed DECL_BUILT_IN (e->symbol.decl) to DECL_IS_BUILTIN (e->symbol.decl) to fix this problem; is the fix OK for mainline?
[Bug lto/59000] lto can't merge user-defined weak builtin functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59000 --- Comment #1 from ma.jiang at zte dot com.cn --- Created attachment 31157 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31157&action=edit second source
[Bug lto/59000] lto can't merge user-defined weak builtin functions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59000 --- Comment #3 from ma.jiang at zte dot com.cn --- (In reply to Richard Biener from comment #2)
> This works on trunk, but of course symtab merging is pre-empted here by tree
> merging I think. The error can be reproduced on trunk with
>
> t1.i
> __attribute__((weak)) float scalbnf(float z) { return z; }
>
> t2.i
> __attribute__((weak)) float scalbnf(float x, int y) { return x*y; }
>
> which would be kind of invalid, of course. Or with
>
> t1.i
> __attribute__((weak,used)) float scalbnf(float x, int y) { return x*y; }
>
> t2.i
> __attribute__((weak)) float scalbnf(float x, int y) { return x*y; }
>
> or with any other unrelated and harmless extra attribute on one decl.
> This never worked btw.
Thanks for the reply. Yes, the problem can be reproduced with any built-in function when there are two global definitions (of course, with the same name) in the lto linking process. The tree merging decides to merge the two definitions, as there can only be one global definition in an object file. This is right, but lto_symtab_merge_cgraph_nodes_1 refuses to do the real merge. Built-in functions which are NOT defined by the user have no real bodies in the object file, so they do not need a real merge. I think that is what lto_symtab_merge_cgraph_nodes_1 really wants to do. Unfortunately, it makes a mistake by using DECL_BUILT_IN rather than DECL_IS_BUILTIN.