Re: [PATCH] avoid calling alloca(0)
On 11/19/16 00:52, Martin Sebor wrote: > If you or others are concerned about the rate of false positives > of this warning please point me at a code base that might be a good > test bed to to try it on. Besides GCC I've built Binutils and the > Linux kernel with just the 5 instances in GCC. > you should also include glibc and busybox, to be sure. Bernd.
Re: [PATCH] avoid calling alloca(0)
On 11/19/16 00:52, Martin Sebor wrote: > On 11/18/2016 03:51 PM, Bernd Edlinger wrote: >> > of the builtin (the function is not declared without attribute >> > alloc_size, at least in Glibc, but GCC still expands it inline). >> > This is as simple as enclosing alloca in parentheses: >> > >> > void *p = (alloca)(n); >> > >> > Ugly? Perhaps. One might say that code that does tricky or >> >> No. I doubt that will work, unless you use -ffreestanding on the >> command line. > > It works because "__builtin_alloca" is distinct from "alloca" and > the warning code can simply compare the strings, just as you say > you did below. It's not ideal and I would prefer to avoid it but > I offer it as another solution if the performance overhead of > (n + 1) is thought to be too high. > So far I thought the warning is trying to make no differences between malloc, realloc and alloca. I would say that using realloc(x,0) is for sure always wrong. Nobody will object against a warning for that. And yes, malloc(0) is unportable, but if the result is not used and directly fed into free that should be no problem, but a warning would be also helpful because it is unportable code. But I hope that it is not the same as with the false positive array bounds warnings where the compiler first transforms the source code into something completely different and then starts to warn about it. Regarding alloca, there are probably three different forms. First a function that is not a builtin but has the name "alloca" like special_function_p understands it. It has no attributes unless the header mentions them. Calling this function with 0 should not be warned about, right? Then a function that has the name "alloca" and matches the signature of the builtin "alloca" function. And last a function that has the name "__builtin_alloca". You can distinguish between these, and possibly only warn for __builtin_alloca? Note, that will depend on the way the glibc header works. Everything would be more easy if the glibc header would not do that, and just use the alloca and no macro. Then it would be more natural to warn about alloca and not about __builtin_alloca, because that is always implemented in a sensible way. But even if the __builtin_alloca is called with 0, what do we do with the result? I mean any use of the value would be wrong. So why is there a warning, when the value is not used? Bernd.
Re: [PATCH] [AArch64] Fix PR78382
Hi Naveen, On 18/11/16 16:30, Hurugalawadi, Naveen wrote: @@ -1374,10 +1374,17 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, case SYMBOL_SMALL_TLSGD: { rtx_insn *insns; - rtx result = gen_rtx_REG (Pmode, R0_REGNUM); + rtx result; + if (TARGET_ILP32) + result = gen_rtx_REG (SImode, R0_REGNUM); + else + result = gen_rtx_REG (DImode, R0_REGNUM); Why don't you use the mode of dest as done in other similar places. Like: + machine_mode mode = GET_MODE (dest); + rtx result = gen_rtx_REG (mode, R0_REGNUM); Thanks, Kugan
[PATCH, committed] TILEPro: link against libgcc.a when creating shared libraries
This patch forces gcc to link against libgcc.a when creating shared libraries, needed for 64-bit multiplies. Bootstrapped and tested on tilepro hardware, also backported to GCC 6. 2016-11-18 Walter Lee* config.host (tilepro*-*-linux*): Add t-slibgcc-libgcc. diff --git a/libgcc/config.host b/libgcc/config.host index 64beb21..e7e5413 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -1267,7 +1267,7 @@ tilegx*-*-linux*) md_unwind_header=tilepro/linux-unwind.h ;; tilepro*-*-linux*) - tmake_file="${tmake_file} tilepro/t-crtstuff t-softfp-sfdf t-softfp tilepro/t-tilepro" + tmake_file="${tmake_file} tilepro/t-crtstuff t-softfp-sfdf t-softfp tilepro/t-tilepro t-slibgcc-libgcc" md_unwind_header=tilepro/linux-unwind.h ;; v850*-*-*)
[PATCH, committed] TILE-Gx: fix barrier bundling
This patch fixes a bundling bug. When there are consecutive barriers, the end-of-bundle marker of the last barrier is getting dropped. Bootstrapped and tested on tilegx hardware, also backported to GCC 6. 2016-11-18 Walter Lee* config/tilegx/tilegx.c (tilegx_gen_bundles): Preserve end-of-bundle marker for consecutive barriers. diff --git a/gcc/config/tilegx/tilegx.c b/gcc/config/tilegx/tilegx.c index 76a7455..0403e8e 100644 --- a/gcc/config/tilegx/tilegx.c +++ b/gcc/config/tilegx/tilegx.c @@ -4469,8 +4469,7 @@ tilegx_gen_bundles (void) rtx_insn *end = NEXT_INSN (BB_END (bb)); prev = NULL; - for (insn = next_insn_to_bundle (BB_HEAD (bb), end); insn; - prev = insn, insn = next) + for (insn = next_insn_to_bundle (BB_HEAD (bb), end); insn; insn = next) { next = next_insn_to_bundle (NEXT_INSN (insn), end); @@ -4506,7 +4505,11 @@ tilegx_gen_bundles (void) PUT_MODE (prev, QImode); } delete_insn (insn); + +// Note: prev remains the same for next iteration. } + else +prev = insn; } } }
[PATCH, committed] TILE-Gx: fix clzsi2 for big-endian
This patch fixes the clzsi2 pattern, which was broken for big-endian. Bootstrapped and tested on tilegx hardware, also backported to GCC 6. 2016-11-18 Walter Lee* config/tilegx/tilegx.md (clzsi2): Fix for big-endian. --- a/gcc/config/tilegx/tilegx.md +++ b/gcc/config/tilegx/tilegx.md @@ -1798,19 +1798,20 @@ [(set_attr "type" "Y0")]) (define_expand "clzsi2" - [(set (match_dup 2) - (zero_extend:DI (match_operand:SI 1 "reg_or_0_operand" ""))) - (set (match_dup 2) - (ashift:DI (match_dup 2) - (const_int 32))) - (set (match_dup 2) - (clz:DI (match_dup 2))) - (set (match_operand:SI 0 "register_operand" "") - (subreg:SI (match_dup 2) 0))] - "" - { - operands[2] = gen_reg_rtx (DImode); - }) + [(set (match_operand:SI 0 "register_operand" "=r") + (clz:SI (match_operand:SI 1 "reg_or_0_operand" "rO")))] + "" + { +rtx tmp1 = gen_reg_rtx (DImode); +rtx tmp2 = gen_reg_rtx (DImode); +rtx tmp3 = gen_reg_rtx (DImode); + +emit_insn (gen_zero_extendsidi2 (tmp1, operands[1])); +emit_insn (gen_ashldi3 (tmp2, tmp1, (GEN_INT (32; +emit_insn (gen_clzdi2 (tmp3, tmp2)); +emit_move_insn (operands[0], gen_lowpart (SImode, tmp3)); +DONE; + }) (define_insn "ctz2" [(set (match_operand:I48MODE 0 "register_operand" "=r")
Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)
On 11/18/16 23:15, Bernd Edlinger wrote: >>> - TREE_NOTHROW (olddecl) = 0; >>> + TREE_NOTHROW (olddecl) = TREE_NOTHROW (newdecl); >> >> I still think a better fix would be to add a copy of TREE_NOTHROW to the >> else block of the if (types_match), to go with the existing copies of >> TREE_READONLY and TREE_THIS_VOLATILE. >> Hmm, looking at it again, I start to think you are right. It looks like it is *not* possible that something other than a builtin function can reach the bottom of that function with !types_match. The upper half of duplicate_decls is structured like this: /* Check for redeclaration and other discrepancies. */ if (TREE_CODE (olddecl) == FUNCTION_DECL && DECL_ARTIFICIAL (olddecl)) { if (TREE_CODE (newdecl) != FUNCTION_DECL) { return NULL_TREE; } TREE_NOTHROW (olddecl) = 0; } else if (TREE_CODE (olddecl) != TREE_CODE (newdecl)) { return error_mark_node; } else if (!types_match) { ... if (TREE_CODE (newdecl) == TEMPLATE_DECL) { return NULL_TREE; } if (TREE_CODE (newdecl) == FUNCTION_DECL) { if (DECL_EXTERN_C_P (newdecl) && DECL_EXTERN_C_P (olddecl)) { ... return NULL_TREE; } else if (...) { return error_mark_node; } else return NULL_TREE; } else { ... return error_mark_node; } } else if () So in the case types_match=true, it can neither be that olddecl is something other than a builtin function decl, nor can newdecl be something other than a function decl. Everything else makes the function return already here. Bernd.
[gomp4] remove use of CUDA unified memory in libgomp
This patch eliminates the use of CUDA unified shared memory via cuMemcpy inside nvptx_exec. The major problem with unified memory is that the CUDA driver needs to copy all of the host addresses to a special data region prior to transferring the data to and from the accelerator. I'm not sure why the use CUDA unified shared memory is necessary here because libgomp already has the necessary machinery to manage the data mappings itself directly. The only usage of CUDA unified memory occurs inside nvptx_exec. Specifically, that function uses 'dp', which is managed by the map_* functions, as a staging area to upload the omp struct arguments to the PTX kernel. This particular use of unified memory is particularly unfortunately as it imposes a synchronization barrier each time a PTX kernel is launched. At first I tried to eliminate those map* functions altogether, but that caused failures in asyncwait-1.c. This happens because the device memory used by async PTX kernel X may be overridden by async PTX kernel Y. Apparently, the map functions try to rectify this situation by implementing a pseudo circular queue in a paged sized block of memory. However, that queue implementation is fragile because map_pop 'frees' all of the allocations from the queue, not just the memory for the most recently used kernel. This patch does two things. First, it replaces the call to cuMemcpy inside nvptx_exec with cuMemcpyHtoD. Second, it teaches the map routines how to use a linked-list data structure to manage asynchronous PTX stream arguments. As an optimization map_init reserves a large block of memory for the first cuda_map so that non-async functions do not have to waste time allocating device memory. Additional cuda_maps are created for each async PTX stream with the requested SIZE as necessary. Because of the asynchronous nature, the map routines need to be guarded by a pthread lock. map_init and map_fini are already guarded by ptx_dev_lock, but map_push and map_pop need to be locked by ptx_event_lock. I don't really like the use of malloc for the cuda_maps, but that can be optimized with a different data structure later. I've applied this patch to gomp-4_0-branch. Cesar 2016-11-18 Cesar Philippidislibgomp/ * plugin/plugin-nvptx.c (struct cuda_map): New. (struct ptx_stream): Replace d, h, h_begin, h_end, h_next, h_prev, h_tail with (cuda_map *) map. (cuda_map_create): New function. (cuda_map_destroy): New function. (map_init): Update to use a linked list of cuda_map objects. (map_fini): Likewise. (map_pop): Likewise. (map_push): Likewise. Return CUdeviceptr instead of void. (init_streams_for_device): Remove stales references to ptx_stream members. (select_stream_for_async): Likewise. (nvptx_exec): Update call to map_init. diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index e4fcc0e..c435012 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -95,20 +95,20 @@ cuda_error (CUresult r) static unsigned int instantiated_devices = 0; static pthread_mutex_t ptx_dev_lock = PTHREAD_MUTEX_INITIALIZER; +struct cuda_map +{ + CUdeviceptr d; + size_t size; + bool active; + struct cuda_map *next; +}; + struct ptx_stream { CUstream stream; pthread_t host_thread; bool multithreaded; - - CUdeviceptr d; - void *h; - void *h_begin; - void *h_end; - void *h_next; - void *h_prev; - void *h_tail; - + struct cuda_map *map; struct ptx_stream *next; }; @@ -120,101 +120,114 @@ struct nvptx_thread struct ptx_device *ptx_dev; }; +static struct cuda_map * +cuda_map_create (size_t size) +{ + struct cuda_map *map = GOMP_PLUGIN_malloc (sizeof (struct cuda_map)); + + assert (map); + + map->next = NULL; + map->size = size; + map->active = false; + + CUDA_CALL_ERET (NULL, cuMemAlloc, >d, size); + assert (map->d); + + return map; +} + +static void +cuda_map_destroy (struct cuda_map *map) +{ + CUDA_CALL_ASSERT (cuMemFree, map->d); + free (map); +} + +/* The following map_* routines manage the CUDA device memory that + contains the data mapping arguments for cuLaunchKernel. Each + asynchronous PTX stream may have multiple pending kernel + invocations, which are launched in a FIFO order. As such, the map + routines maintains a queue of cuLaunchKernel arguments. + + Calls to map_push and map_pop must be guarded by ptx_event_lock. + Likewise, calls to map_init and map_fini are guarded by + ptx_dev_lock inside GOMP_OFFLOAD_init_device and + GOMP_OFFLOAD_fini_device, respectively. */ + static bool map_init (struct ptx_stream *s) { int size = getpagesize (); assert (s); - assert (!s->d); - assert (!s->h); - - CUDA_CALL (cuMemAllocHost, >h, size); - CUDA_CALL (cuMemHostGetDevicePointer, >d, s->h, 0); - assert (s->h); + s->map = cuda_map_create (size); - s->h_begin = s->h; - s->h_end = s->h_begin + size; - s->h_next = s->h_prev = s->h_tail = s->h_begin; - - assert
[committed] Fix g++.dg/cpp1y/pr68180.C on i686-linux
Hi! I've noticed in i686-linux bootstrap: FAIL: g++.dg/cpp1y/pr68180.C -std=c++14 (test for excess errors) This is due to using of vector argument in a function, without corresponding vector ISA enabled (SSE). Fixed thusly, committed to trunk as obvious. 2016-11-18 Jakub JelinekPR c++/68180 * g++.dg/cpp1y/pr68180.C: Add -Wno-psabi as dg-additional-options. --- gcc/testsuite/g++.dg/cpp1y/pr68180.C.jj 2016-11-17 18:08:10.554741910 +0100 +++ gcc/testsuite/g++.dg/cpp1y/pr68180.C2016-11-18 22:31:18.242429801 +0100 @@ -1,5 +1,6 @@ // PR c++/68180 // { dg-do compile { target c++14 } } +// { dg-additional-options "-Wno-psabi" } typedef float __attribute__( ( vector_size( 16 ) ) ) float32x4_t; constexpr float32x4_t fill(float x) { Jakub
Re: [PATCH] avoid calling alloca(0)
On 11/18/2016 03:51 PM, Bernd Edlinger wrote: > of the builtin (the function is not declared without attribute > alloc_size, at least in Glibc, but GCC still expands it inline). > This is as simple as enclosing alloca in parentheses: > > void *p = (alloca)(n); > > Ugly? Perhaps. One might say that code that does tricky or No. I doubt that will work, unless you use -ffreestanding on the command line. It works because "__builtin_alloca" is distinct from "alloca" and the warning code can simply compare the strings, just as you say you did below. It's not ideal and I would prefer to avoid it but I offer it as another solution if the performance overhead of (n + 1) is thought to be too high. Although the macro is not used in this case, even a declaraion like extern void *alloca (size_t __size); goes thru the decl-anticipated path, that means it inherits all the attributes from the builtin, unless the parameter don't match, in that case you get a warning in C but no warning in C++ (I am going to change the latter). There is currently an attribute malloc but no explicit attribute alloca, that's one of the reasons why special_function_p still has to do a string compare of the function name aginst "alloca". I think Jakub is right that we need a better way to fix the warning for instance with a comment like the fall thru thing. I don't believe this rare, corner case justifies special treatment. There are many warnings in both -Wall and -Wextra that are orders of magnitude noisier and that people find useful nonetheless. They don't need a special mechanism to suppress or fine tune beyond what GCC already provides: an option and a pragma, and neither should this one. I certainly don't think that extending the complicated (and controversial) machinery that was put in place to deal with the exceedingly noisy fallthrough warning would be worth the effort or even appropriate. If you or others are concerned about the rate of false positives of this warning please point me at a code base that might be a good test bed to to try it on. Besides GCC I've built Binutils and the Linux kernel with just the 5 instances in GCC. Martin
Re: [PATCH] libiberty: Fix -Wimplicit-fallthrough warnings.
On Wednesday 02 November 2016, Mark Wielaard wrote: > -case 11: c+=((hashval_t)k[10]<<24); > -case 10: c+=((hashval_t)k[9]<<16); > -case 9 : c+=((hashval_t)k[8]<<8); > +case 11: c+=((hashval_t)k[10]<<24); /* fall through */ > +case 10: c+=((hashval_t)k[9]<<16); /* fall through */ > +case 9 : c+=((hashval_t)k[8]<<8);/* fall through */ >/* the first byte of c is reserved for the length */ This really highlights another exception -Wimplicit-fallthough should tolerate at least on level 1. Single line of code. case X: my_statement(); case y: my_statement(); and case X: my_statement(); case y: my_statement(); In both cases, the lack of break is obvious at a glance and thus a comment highlighting it has never been needed and shouldn't be enforced. Well, at least in my opinion, and it would take away all the rest of false positives I run into. Best regards `Allan
Re: [PATCH], Tweak PowerPC movdi constraints
On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote: > On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote: > > On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote: > > > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$" > > > constraints instead of "?*". This allows the register allocator to more > > > often > > > allocate DImode to a floating point/vector register when it is desirable > > > to do > > > so. > > > > It also changes some plain "?" to "^" or "$" or even "*" (which cannot > > work for multi-character constraints, it skips one character, not one > > constraint). Wrong version of the patch? > > Note, if '*' does not work with multi-character prefixes, that is a bug. All > of rs6000.md assumes that ?*wa means that the register allocator will not > consider VSX vector registers for when calculating register preferences. The documentation is out of date. >From ira-costs.c: /* Scan all the constraint letters. See if the operand matches any of the constraints. Collect the valid register classes and see if this operand accepts memory. */ while ((c = *p)) { switch (c) { case '*': /* Ignore the next letter for this pass. */ c = *++p; break; and then 83 lines later: p += CONSTRAINT_LEN (c, p); if (c == ',') break; } so it does in fact work. Neither the patch description nor the changelog says you are doing these changes though. > > > I built bootstrap compilers and did make check with no regressions on: > > > 1)Little endian power8, --with-cpu=power8 > > > 2)Big endian power8, --with-cpu=power8 (no 32-bit support) > > > 3)Big endian power7, --with-cpu=power7 (both 32/64-bit support) > > > > Could you also test with reload please? Just LE is enough I guess. > > We'd like to keep reload working for GCC 7 at least, and these cost > > prefixes tend to break mov patterns :-/ > > Argh, I guess you are right, but then if reload doesn't work, I will likely > submit the patch where there are three different movdi's (one for 32-bit > without the change, one for 64-bit with reload, and one for 64-bit with lra). > I would prefer not to do that. Let's hope it just works :-) Segher
Re: [PATCH], Tweak PowerPC movdi constraints
On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote: > On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote: > > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$" > > constraints instead of "?*". This allows the register allocator to more > > often > > allocate DImode to a floating point/vector register when it is desirable to > > do > > so. > > It also changes some plain "?" to "^" or "$" or even "*" (which cannot > work for multi-character constraints, it skips one character, not one > constraint). Wrong version of the patch? Note, if '*' does not work with multi-character prefixes, that is a bug. All of rs6000.md assumes that ?*wa means that the register allocator will not consider VSX vector registers for when calculating register preferences. > > I built bootstrap compilers and did make check with no regressions on: > > 1) Little endian power8, --with-cpu=power8 > > 2) Big endian power8, --with-cpu=power8 (no 32-bit support) > > 3) Big endian power7, --with-cpu=power7 (both 32/64-bit support) > > Could you also test with reload please? Just LE is enough I guess. > We'd like to keep reload working for GCC 7 at least, and these cost > prefixes tend to break mov patterns :-/ Argh, I guess you are right, but then if reload doesn't work, I will likely submit the patch where there are three different movdi's (one for 32-bit without the change, one for 64-bit with reload, and one for 64-bit with lra). I would prefer not to do that. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] avoid calling alloca(0)
> of the builtin (the function is not declared without attribute > alloc_size, at least in Glibc, but GCC still expands it inline). > This is as simple as enclosing alloca in parentheses: > > void *p = (alloca)(n); > > Ugly? Perhaps. One might say that code that does tricky or No. I doubt that will work, unless you use -ffreestanding on the command line. Although the macro is not used in this case, even a declaraion like extern void *alloca (size_t __size); goes thru the decl-anticipated path, that means it inherits all the attributes from the builtin, unless the parameter don't match, in that case you get a warning in C but no warning in C++ (I am going to change the latter). There is currently an attribute malloc but no explicit attribute alloca, that's one of the reasons why special_function_p still has to do a string compare of the function name aginst "alloca". I think Jakub is right that we need a better way to fix the warning for instance with a comment like the fall thru thing. Complicated code changes can also introduce new errors. Bernd.
Re: [PATCH] substring_loc info needs default track-macro-expansion (PR preprocessor/78324)
On 11/18/2016 03:57 PM, David Malcolm wrote: On Fri, 2016-11-18 at 09:51 -0700, Martin Sebor wrote: Martin: are the changes to your test cases OK by you, or is there a better way to rewrite them? Thanks for looking into it! Since the purpose of the test_sprintf_note function in the test is to verify the location of the caret within the warnings I think we should keep it if it's possible. Would either removing the P macro or moving the function to a different file that doesn't use the -ftrack-macro-expansion=0 option work? To get substring locations with the proposed patch, that code will need to be in a file without -ftrack-macro-expansion=0. builtin-sprintf-warn-4.c seems to fit the bill, and has caret-printing enabled too, so here's another attempt at the patch, just covering the affected test cases, which moves test_sprintf_note to that file, and drops "P". The carets/underlines from the warnings look sane, and the test case verifies that via dg-begin/end-multiline-output directives. The test case also verifies the carets/underlins from the *notes*. [FWIW, I'm less convinced by the carets/underlines from the notes: they all underline the whole of the __builtin_sprintf expression, though looking at gimple-ssa-sprintf.c, I see: location_t callloc = gimple_location (info.callstmt); which is used for the "inform" calls in question. Hence I think it's faithfully printing what that code is asking it to. I'd prefer to not touch that location in this patch, since it seems orthogonal to fixing the PR preprocessor/78324; perhaps something to address as part of PR middle-end/77696 ?]. Martin: how does this look to you? Any objections to this change as part of my fix for PR preprocessor/78324? Not at all. It looks great -- using the multiline output is even better than the original. You also noticed the comment about the caret limitation being out of data and removed it. Thanks for that too! I agree that the underlining in the notes could stand to be improved at some point. I'll see if I can get to it sometime after I'm done with all my pending patches. Thanks again! Martin PS If there's something I can help with while you're working on the rest of the bug let me know.
[PATCH] Handle EOF in c_parser_parse_rtl_body
On Fri, 2016-11-18 at 22:13 +, Joseph Myers wrote: > On Fri, 18 Nov 2016, David Malcolm wrote: > > > + /* Consume all tokens, up to the closing brace, handling > > + matching pairs of braces in the rtl dump. */ > > + int num_open_braces = 1; > > + while (1) > > +{ > > + switch (c_parser_peek_token (parser)->type) > > + { > > + case CPP_OPEN_BRACE: > > + num_open_braces++; > > + break; > > + case CPP_CLOSE_BRACE: > > + if (--num_open_braces == 0) > > + goto found_closing_brace; > > + break; > > + default: > > + break; > > + } > > + c_parser_consume_token (parser); > > +} > > What if you have an EOF without the close brace being found? I'd > expect > you to hit the > > gcc_assert (parser->tokens[0].type != CPP_EOF); > > in c_parser_consume_token. Oops; thanks. Here's a patch on top of v5 that (I hope) addresses that. gcc/c/ChangeLog: * c-parser.c (c_parser_parse_rtl_body): Handle CPP_EOF. gcc/testsuite/ChangeLog: * gcc.dg/rtl/truncated-rtl-file.c: New test case. --- gcc/c/c-parser.c | 4 gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c | 2 ++ 2 files changed, 6 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index d645d29..fef882a 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -18326,6 +18326,10 @@ c_parser_parse_rtl_body (c_parser *parser, char *start_with_pass) if (--num_open_braces == 0) goto found_closing_brace; break; + case CPP_EOF: + error_at (start_loc, "no closing brace"); + free (start_with_pass); + return; default: break; } diff --git a/gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c b/gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c new file mode 100644 index 000..4dd8214 --- /dev/null +++ b/gcc/testsuite/gcc.dg/rtl/truncated-rtl-file.c @@ -0,0 +1,2 @@ +void __RTL test (void) +{ /* { dg-error "no closing brace" } */ -- 1.8.5.3
Re: [PATCH], Tweak PowerPC movdi constraints
On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote: > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$" > constraints instead of "?*". This allows the register allocator to more often > allocate DImode to a floating point/vector register when it is desirable to do > so. It also changes some plain "?" to "^" or "$" or even "*" (which cannot work for multi-character constraints, it skips one character, not one constraint). Wrong version of the patch? > I built bootstrap compilers and did make check with no regressions on: > 1)Little endian power8, --with-cpu=power8 > 2)Big endian power8, --with-cpu=power8 (no 32-bit support) > 3)Big endian power7, --with-cpu=power7 (both 32/64-bit support) Could you also test with reload please? Just LE is enough I guess. We'd like to keep reload working for GCC 7 at least, and these cost prefixes tend to break mov patterns :-/ Segher
Re: [PATCH] substring_loc info needs default track-macro-expansion (PR preprocessor/78324)
On Fri, 2016-11-18 at 09:51 -0700, Martin Sebor wrote: > > Martin: are the changes to your test cases OK by you, or is there > > a better way to rewrite them? > > Thanks for looking into it! > > Since the purpose of the test_sprintf_note function in the test is > to verify the location of the caret within the warnings I think we > should keep it if it's possible. Would either removing the P macro > or moving the function to a different file that doesn't use the > -ftrack-macro-expansion=0 option work? To get substring locations with the proposed patch, that code will need to be in a file without -ftrack-macro-expansion=0. builtin-sprintf-warn-4.c seems to fit the bill, and has caret-printing enabled too, so here's another attempt at the patch, just covering the affected test cases, which moves test_sprintf_note to that file, and drops "P". The carets/underlines from the warnings look sane, and the test case verifies that via dg-begin/end-multiline-output directives. The test case also verifies the carets/underlins from the *notes*. [FWIW, I'm less convinced by the carets/underlines from the notes: they all underline the whole of the __builtin_sprintf expression, though looking at gimple-ssa-sprintf.c, I see: location_t callloc = gimple_location (info.callstmt); which is used for the "inform" calls in question. Hence I think it's faithfully printing what that code is asking it to. I'd prefer to not touch that location in this patch, since it seems orthogonal to fixing the PR preprocessor/78324; perhaps something to address as part of PR middle-end/77696 ?]. Martin: how does this look to you? Any objections to this change as part of my fix for PR preprocessor/78324? gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/builtin-sprintf-warn-1.c (test_sprintf_note): Move to... * gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: ...here. Drop -ftrack-macro-expansion=0. (test_sprintf_note): Remove "P" macro. Add dg-begin/end-multiline-output directives. (LINE, buffer, ptr): Copy from builtin-sprintf-warn-1.c. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c index 5779a95..a24889b 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c @@ -170,35 +170,6 @@ void test_sprintf_zero_length_array (void *p, int i) __builtin_sprintf (buffer (1), "%s", s [i].a); } -/* Verify that the note printed along with the diagnostic mentions - the correct sizes and refers to the location corresponding to - the affected directive. */ - -void test_sprintf_note (void) -{ -#define P __builtin_sprintf - - /* Diagnostic column numbers are 1-based. */ - - P (buffer (0),/* { dg-message "format output 4 bytes into a destination of size 0" } */ - "%c%s%i", '1', "2", 3);/* { dg-warning "7:.%c. directive writing 1 byte into a region of size 0" } */ - - P (buffer (1),/* { dg-message "format output 6 bytes into a destination of size 1" } */ - "%c%s%i", '1', "23", 45); /* { dg-warning "9:.%s. directive writing 2 bytes into a region of size 0" } */ - - P (buffer (2),/* { dg-message "format output 6 bytes into a destination of size 2" } */ - "%c%s%i", '1', "2", 345); /* { dg-warning "11:.%i. directive writing 3 bytes into a region of size 0" } */ - - /* It would be nice if the caret in the location range for the format - string below could be made to point at the closing quote of the format - string, like so: - sprintf (d, "%c%s%i", '1', "2", 3456); - ~~^ - Unfortunately, that doesn't work with the current setup. */ - P (buffer (6),/* { dg-message "format output 7 bytes into a destination of size 6" } */ - "%c%s%i", '1', "2", 3456); /* { dg-warning "writing a terminating nul past the end of the destination" } */ -} - #undef T #define T(size, fmt, ...)\ __builtin___sprintf_chk (buffer (size), 0, objsize (size), fmt, \ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c index faa5806..3b3fb68 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret -ftrack-macro-expansion=0" } */ +/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret" } */ extern int sprintf (char*, const char*, ...); @@ -91,3 +91,81 @@ void test (void) } /* { dg-prune-output "too many arguments for format" } */ + +/* When debugging, define LINE to the line number of the test case to exercise + and avoid exercising any of the others. The buffer macro + below makes use of LINE to avoid warnings for
Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)
On 11/18/16 22:19, Jason Merrill wrote: > On 11/05/2016 12:44 PM, Bernd Edlinger wrote: >> + warning_at (DECL_SOURCE_LOCATION (newdecl), 0, >> + "declaration of %q+#D conflicts with built-in " >> + "declaration %q#D", newdecl, olddecl); > > There needs to be a way to disable this warning, even if it's enabled by > default. > It is mostly disabled by -Wno-system-headers. So it will only warn for declarations in .cc files. Then I will probably need a better name for it. I'd like -Wbuiltin-declaration-mismatch for instance? I wonder if that should also control the C warning, and the case where the internal __builtin_ is declared differently. It is somehow hard to explain why it has to be a C++ only warning, except for historical reasons. >> - TREE_NOTHROW (olddecl) = 0; >> + TREE_NOTHROW (olddecl) = TREE_NOTHROW (newdecl); > > I still think a better fix would be to add a copy of TREE_NOTHROW to the > else block of the if (types_match), to go with the existing copies of > TREE_READONLY and TREE_THIS_VOLATILE. > I can give it a try, but I don't understand why that does apparently not give us problems with other types of declaration mismatches. It would be good to have a test case where a redeclaration with a non-builtin causes similar slightly-wrong-code issues. Possible that some hidden issues exist already, but also possible that I create new issues, if I don't understand the code completely. Is the else block ever used for non-builtin functions? Can variable declarations end up there? What other types of objects are possible there? Given that stage1 is already over, should I split the warning part from the wrong code issue? What do you think? Thanks Bernd.
Re: [PATCH] Add "__RTL" to cc1 (v5)
On Fri, 18 Nov 2016, David Malcolm wrote: > + /* Consume all tokens, up to the closing brace, handling > + matching pairs of braces in the rtl dump. */ > + int num_open_braces = 1; > + while (1) > +{ > + switch (c_parser_peek_token (parser)->type) > + { > + case CPP_OPEN_BRACE: > + num_open_braces++; > + break; > + case CPP_CLOSE_BRACE: > + if (--num_open_braces == 0) > + goto found_closing_brace; > + break; > + default: > + break; > + } > + c_parser_consume_token (parser); > +} What if you have an EOF without the close brace being found? I'd expect you to hit the gcc_assert (parser->tokens[0].type != CPP_EOF); in c_parser_consume_token. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Fix divmod expansion (PR middle-end/78416)
On November 18, 2016 10:45:10 PM GMT+01:00, Jakub Jelinekwrote: >Hi! > >As the testcase shows, expand_divmod doesn't handle properly >division/remainder in modes larger than HOST_BITS_PER_WIDE_INT like >TImode - if op1 is "negative" CONST_INT, it means it has 65 most >significant >bits set, but EXACT_POWER_OF_2_OR_ZERO_P will happily return true on >that >and optimize the division into a right shift. >Fixed by taking into account that EXACT_POWER_OF_2_OR_ZERO_P works only >if either the mode bitsize/precision are smaller or equal than HWI, or >if the value doesn't have MSB bit set. > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? I wonder if transforming the const-int to wide int makes this all easier to read? >2016-11-18 Jakub Jelinek > > PR middle-end/78416 > * expmed.c (expand_divmod): For modes wider than HWI, take into > account implicit 1 bits above msb for EXACT_POWER_OF_2_OR_ZERO_P. > Formatting fixes. Use size <= HOST_BITS_PER_WIDE_INT instead of > HOST_BITS_PER_WIDE_INT >= size. > > * gcc.dg/torture/pr78416.c: New test. > >--- gcc/expmed.c.jj2016-10-31 13:28:12.0 +0100 >+++ gcc/expmed.c 2016-11-18 16:57:57.608703605 +0100 >@@ -3998,7 +3998,15 @@ expand_divmod (int rem_flag, enum tree_c > if (unsignedp) > ext_op1 &= GET_MODE_MASK (mode); > op1_is_pow2 = ((EXACT_POWER_OF_2_OR_ZERO_P (ext_op1) >- || (! unsignedp && EXACT_POWER_OF_2_OR_ZERO_P >(-ext_op1; >+/* If mode is wider than HWI and op1 has msb set, >+ then it has there extra implicit 1 bits above it. */ >+&& (GET_MODE_PRECISION (mode) <= HOST_BITS_PER_WIDE_INT >+|| INTVAL (op1) >= 0)) >+ || (! unsignedp >+ && EXACT_POWER_OF_2_OR_ZERO_P (-ext_op1) >+ && (GET_MODE_PRECISION (mode) >+ <= HOST_BITS_PER_WIDE_INT >+ || INTVAL (op1) < 0))); > } > > /* >@@ -4141,8 +4149,17 @@ expand_divmod (int rem_flag, enum tree_c > op1_is_constant = CONST_INT_P (op1); > op1_is_pow2 = (op1_is_constant >&& ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1)) >-|| (! unsignedp >-&& EXACT_POWER_OF_2_OR_ZERO_P (-UINTVAL >(op1)); >+/* If mode is wider than HWI and op1 has msb set, >+ then it has there extra implicit 1 bits above >+ it. */ >+&& (GET_MODE_PRECISION (compute_mode) >+<= HOST_BITS_PER_WIDE_INT >+|| INTVAL (op1) >= 0)) >+ || (! unsignedp >+ && EXACT_POWER_OF_2_OR_ZERO_P (-UINTVAL (op1)) >+ && (GET_MODE_PRECISION (compute_mode) >+ <= HOST_BITS_PER_WIDE_INT >+ || INTVAL (op1) < 0; > } > >/* If one of the operands is a volatile MEM, copy it into a register. >*/ >@@ -4185,7 +4202,8 @@ expand_divmod (int rem_flag, enum tree_c > unsigned HOST_WIDE_INT d = (INTVAL (op1) > & GET_MODE_MASK (compute_mode)); > >- if (EXACT_POWER_OF_2_OR_ZERO_P (d)) >+ if (EXACT_POWER_OF_2_OR_ZERO_P (d) >+ && (INTVAL (op1) >= 0 || size <= HOST_BITS_PER_WIDE_INT)) > { > pre_shift = floor_log2 (d); > if (rem_flag) >@@ -4325,7 +4343,7 @@ expand_divmod (int rem_flag, enum tree_c > else if (d == -1) > quotient = expand_unop (compute_mode, neg_optab, op0, > tquotient, 0); >- else if (HOST_BITS_PER_WIDE_INT >= size >+ else if (size <= HOST_BITS_PER_WIDE_INT >&& abs_d == HOST_WIDE_INT_1U << (size - 1)) > { > /* This case is not handled correctly below. */ >@@ -4335,6 +4353,7 @@ expand_divmod (int rem_flag, enum tree_c > goto fail1; > } > else if (EXACT_POWER_OF_2_OR_ZERO_P (d) >+ && (size <= HOST_BITS_PER_WIDE_INT || d >= 0) >&& (rem_flag >? smod_pow2_cheap (speed, compute_mode) >: sdiv_pow2_cheap (speed, compute_mode)) >@@ -4348,7 +4367,9 @@ expand_divmod (int rem_flag, enum tree_c > compute_mode) >!= CODE_FOR_nothing))) > ; >- else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d)) >+ else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d) >+ && (size <= HOST_BITS_PER_WIDE_INT >+
Re: [PATCH] Fix target_clones attribute handling (PR middle-end/78419)
On November 18, 2016 10:41:06 PM GMT+01:00, Jakub Jelinekwrote: >Hi! > >The following testcase ICEs because of buffer overflow in the >expand_target_clones function. The main bug is that get_attr_str >doesn't count the commas in the strings, so while it handles >__attribute__((target_clones ("avx", "foo", "avx2", "avx512f", >"default"))) >it doesn't handle >__attribute__((target_clones ("avx,foo,avx2,avx512f,default"))) >which is also meant to be valid. >This is fixed by the get_attr_str hunk. >The rest of the changes are to avoid leaking memory, avoid double >diagnostics (if targetm.target_option.valid_attribute_p fails, >it has reported errors already, so there is no point to report >further error or warning), fixing location_t of the diagnostics >(targetm.target_option.valid_attribute_p uses input_location), >and various formatting fixes and cleanups. > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. Richard. >2016-11-18 Jakub Jelinek > > PR middle-end/78419 > * multiple_target.c (get_attr_len): Start with argnum and increment > argnum on every arg. Use strchr in a loop instead of counting commas > manually. > (get_attr_str): Increment argnum for every comma in the string. > (separate_attrs): Use for instead of while loop, simplify. > (expand_target_clones): Rename defenition argument to definition. > Free attrs and attr_str even when diagnosing errors. Temporarily > change input_location around targetm.target_option.valid_attribute_p > calls. Don't emit warning or errors if that function fails. > > * gcc.target/i386/pr78419.c: New test. > >--- gcc/multiple_target.c.jj 2016-08-29 12:17:37.0 +0200 >+++ gcc/multiple_target.c 2016-11-18 14:14:11.684489030 +0100 >@@ -101,22 +101,18 @@ get_attr_len (tree arglist) > { > tree arg; > int str_len_sum = 0; >- int argnum = 1; >+ int argnum = 0; > > for (arg = arglist; arg; arg = TREE_CHAIN (arg)) > { >- unsigned int i; > const char *str = TREE_STRING_POINTER (TREE_VALUE (arg)); >- int len = strlen (str); >- >+ size_t len = strlen (str); > str_len_sum += len + 1; >- if (arg != arglist) >+ for (const char *p = strchr (str, ','); p; p = strchr (p + 1, >',')) > argnum++; >- for (i = 0; i < strlen (str); i++) >- if (str[i] == ',') >-argnum++; >+ argnum++; > } >- if (argnum == 1) >+ if (argnum <= 1) > return -1; > return str_len_sum; > } >@@ -134,8 +130,9 @@ get_attr_str (tree arglist, char *attr_s > for (arg = arglist; arg; arg = TREE_CHAIN (arg)) > { > const char *str = TREE_STRING_POINTER (TREE_VALUE (arg)); >- > size_t len = strlen (str); >+ for (const char *p = strchr (str, ','); p; p = strchr (p + 1, >',')) >+ argnum++; > memcpy (attr_str + str_len_sum, str, len); > attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0'; > str_len_sum += len + 1; >@@ -152,19 +149,16 @@ separate_attrs (char *attr_str, char **a > { > int i = 0; > bool has_default = false; >- char *attr = strtok (attr_str, ","); > >- while (attr != NULL) >+ for (char *attr = strtok (attr_str, ","); >+ attr != NULL; attr = strtok (NULL, ",")) > { > if (strcmp (attr, "default") == 0) > { > has_default = true; >-attr = strtok (NULL, ","); > continue; > } >- attrs[i] = attr; >- attr = strtok (NULL, ","); >- i++; >+ attrs[i++] = attr; > } > if (!has_default) > return -1; >@@ -235,7 +229,7 @@ create_target_clone (cgraph_node *node, >create the appropriate clone for each valid target attribute. */ > > static bool >-expand_target_clones (struct cgraph_node *node, bool defenition) >+expand_target_clones (struct cgraph_node *node, bool definition) > { > int i; > /* Parsing target attributes separated by comma. */ >@@ -266,6 +260,8 @@ expand_target_clones (struct cgraph_node > { > error_at (DECL_SOURCE_LOCATION (node->decl), > "default target was not set"); >+ XDELETEVEC (attrs); >+ XDELETEVEC (attr_str); > return false; > } > >@@ -286,22 +282,24 @@ expand_target_clones (struct cgraph_node > > create_new_asm_name (attr, suffix); > /* Create new target clone. */ >- cgraph_node *new_node = create_target_clone (node, defenition, >suffix); >+ cgraph_node *new_node = create_target_clone (node, definition, >suffix); > XDELETEVEC (suffix); > > /* Set new attribute for the clone. */ > tree attributes = make_attribute ("target", attr, > DECL_ATTRIBUTES (new_node->decl)); > DECL_ATTRIBUTES (new_node->decl) = attributes; >+ location_t saved_loc = input_location; >+ input_location = DECL_SOURCE_LOCATION (node->decl); >if (!targetm.target_option.valid_attribute_p (new_node->decl, NULL,
Re: [PATCH] avoid calling alloca(0)
On 11/18/2016 10:25 AM, Jakub Jelinek wrote: On Fri, Nov 18, 2016 at 10:14:09AM -0700, Martin Sebor wrote: Because people make mistakes and warnings help us avoid them (isn't that obvious?) Just because we get it right most of the time doesn't mean we get right all of the time. The papers and exploits I pointed to should provide ample evidence that zero allocations are a problem that can have serious (and costly) consequences. We (i.e., Red Hat) recognize this risk and have made it our goal to help stem some of these problems. Are you talking about cases where user writes malloc (0) explicitly, or where user writes malloc (n); and the optimizers figure out n is 0, either always, or e.g. because the compiler decided to duplicate the code and and in one of the branches it proves it is 0, or you want to add a runtime warning when malloc is called with 0? Both. The existing -Walloca-larger-than warning and the proposed -Walloc-zero warning diagnose both. The non-constant case (i.e., one where the zero is the result of constant propagation) is the more interesting (and dangerous) one but I don't think there is value in trying to distinguish the two. E.g. I don't see how writing malloc (0) in the code should have anything to do with any overflows. The cases where jump threading creates cases where we call functions with constant arguments and we then warn on them is also problematic, often such code is even dead except the compiler is not able to prove it. It IMHO doesn't belong to either of these. You've made that quite clear. I struggle to reconcile your position in this case with the one in debate about the -Wimplicit-fallthrough option where you insisted on the exact opposite despite the overwhelming number of false positives caused by it. Why is it appropriate for that option to be in -Wextra and not this one? It also matters what one can do to tell the compiler the code is fine. For e.g. -Wimplicit-fallthrough and many other warnings one can add a comment, or attribute, etc. to get the warning out of the way. But the patch you've posted for the alloca (0) stuff contained changes of n to n + !n, so the warning basically asks people to slow down their code for no reason, just to get the warning out of the way. No, it does not. There are ways to avoid the warning without changing the value of the argument. The obvious one is to set -Wno-alloc-zero, either on the command line or via pragma GCC diagnostic. Another one is to call the alloca function instead of the builtin (the function is not declared without attribute alloc_size, at least in Glibc, but GCC still expands it inline). This is as simple as enclosing alloca in parentheses: void *p = (alloca)(n); Ugly? Perhaps. One might say that code that does tricky or unportable things is ugly for that reason alone. IMO, it's a good thing when it also looks unusual (or even ugly). It draws attention to itself and is less likely to be copied or reused, or broken by those who don't realize that it's fragile. The specific patch we're discussing touches just 5 functions in all of GCC, and it changes an alloca(n) to alloca(n + !n). Your original objection was that the fix was too ugly. I'd be happy to change it to (n + 1) if that makes it less offensive to you (or to anything else that lets us move forward, including the (alloca)(n) above). Either way, none of these calls is in hot loops so the runtime impact of any of this on GCC hardly seems worth talking about. Having said that, after another review of the functions changed by the patch it looks like avoiding the zero case (e.g., by returning early or branching) might actually improve their runtime performance (though I doubt that would be worth the effort either). The same could well be true for other such instances of the warning. Martin PS I resisted changing the XALLOCAVEC macro itself but I'm not opposed to it and it's also a possibility.
[PR target/25512] Optimize certain equality tests on m68k
The yearly m68k day or two of bugfixing for the retro-computing folks is under way. It's earlier than last year, so I can do more than just fix regressions. This BZ is a simple request to optimize sequences like: moveq #1,%d1 cmp.l %d0, %d1 beq/bne into subq.l #1, %d0 beq/bne When d0/d1 are both dead at the cmp.l. Similarly for small negative numbers which get turned into an addq.l. I've chosen to implement this as a 3 insn -> 3 insn peephole2. That may not seem useful at first. But when done right, the 2nd insn (the compare) becomes trivial for final to eliminate resulting in the desired code. Each time this applies we save 2 bytes and some number of cycles depending on the particular m68k implementation we're on. This hits just over 4000 times building newlib. Of course we have so many multilib configurations that any one target is going to see considerably less impact. Tested with m68k.exp in the GCC testsuite and by building GCC's target libraries as well as newlib/libgloss for the m68k-elf toolchain (libgfortran doesn't build due to an unrelated issue). It'll get further testing once I start my yearly m68k bootstrap. But obviously that'll be running for a good long while. Installing on the trunk. Jeff commit fe059a93ff2768570cab30c2bcf58e71cfb59a66 Author: lawDate: Fri Nov 18 21:52:32 2016 + PR target/25112 * config/m68k/m68k.c (moveq feeding equality comparison): New peepholes. * config/m68k/predicates.md (addq_subq_operand): New predicate. (equality_comparison_operator): Likewise. PR target/25112 * gcc.target/m68k/pr25112: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242605 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index ff3da21..534ef4b 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2016-11-18 Jeff Law + + PR target/25112 + * config/m68k/m68k.c (moveq feeding equality comparison): New + peepholes. + * config/m68k/predicates.md (addq_subq_operand): New predicate. + (equality_comparison_operator): Likewise. + 2016-11-18 Richard Sandiford * rtlanal.c (load_extend_op): Move to... diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md index 3d7895d..7b7f373 100644 --- a/gcc/config/m68k/m68k.md +++ b/gcc/config/m68k/m68k.md @@ -7641,6 +7641,46 @@ (const_int 0))] "operands[5] = (operands[0] == operands[3]) ? operands[4] : operands[3];") +;; We want to turn +;; moveq const,dX +;; cmp.l dX,dY +;; je/jne +;; +;; into +;; addq/subq -const,dY +;; cmp.l dY, 0 +;; je/jne +;; +;; dX and dY must both be dead at the end of the sequence and the constant +;; must be valid for addq/subq. +;; +;; Essentially we're making it trivial for final to realize the comparison +;; is not needed +;; +;; Testing has shown a variant where the operands are reversed in the +;; comparison never hits, so I have not included that variant. +;; + +(define_peephole2 + [(set (match_operand:SI 0 "register_operand" "") + (match_operand:SI 1 "addq_subq_operand" "")) + (set (cc0) (compare (match_operand:SI 2 "register_operand" "") + (match_dup 0))) + (set (pc) (if_then_else (match_operator 5 "equality_comparison_operator" + [(cc0) (const_int 0)]) + (match_operand 3 "pc_or_label_operand") + (match_operand 4 "pc_or_label_operand")))] + "peep2_reg_dead_p (2, operands[0]) + && peep2_reg_dead_p (2, operands[2]) + && (operands[3] == pc_rtx || operands[4] == pc_rtx) + && DATA_REG_P (operands[2])" + [(set (match_dup 2) (plus:SI (match_dup 2) (match_dup 6))) + (set (cc0) (compare (match_dup 2) (const_int 0))) + (set (pc) (if_then_else (match_op_dup 5 [(cc0) (const_int 0)]) + (match_dup 3) + (match_dup 4)))] + "operands[6] = GEN_INT (-INTVAL (operands[1]));") + (define_peephole2 [(set (match_operand:SI 0 "register_operand" "") (match_operand:SI 1 "pow2_m1_operand" "")) diff --git a/gcc/config/m68k/predicates.md b/gcc/config/m68k/predicates.md index 186436c..bfb548a 100644 --- a/gcc/config/m68k/predicates.md +++ b/gcc/config/m68k/predicates.md @@ -245,6 +245,18 @@ || reload_completed)); }) +;; Used to detect constants that are valid for addq/subq instructions +(define_predicate "addq_subq_operand" + (match_code "const_int") +{ + return ((INTVAL (op) <= 8 && INTVAL (op) > 0) + || (INTVAL (op) >= -8 && INTVAL (op) < 0)); +}) + +;; Used to detect equality and non-equality operators +(define_predicate "equality_comparison_operator" + (match_code "eq,ne")) + ;; Used to detect when an operand is either a register ;; or a constant that is all ones in its lower bits. ;; Used by insv pattern to help
[PATCH] Fix divmod expansion (PR middle-end/78416)
Hi! As the testcase shows, expand_divmod doesn't handle properly division/remainder in modes larger than HOST_BITS_PER_WIDE_INT like TImode - if op1 is "negative" CONST_INT, it means it has 65 most significant bits set, but EXACT_POWER_OF_2_OR_ZERO_P will happily return true on that and optimize the division into a right shift. Fixed by taking into account that EXACT_POWER_OF_2_OR_ZERO_P works only if either the mode bitsize/precision are smaller or equal than HWI, or if the value doesn't have MSB bit set. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-11-18 Jakub JelinekPR middle-end/78416 * expmed.c (expand_divmod): For modes wider than HWI, take into account implicit 1 bits above msb for EXACT_POWER_OF_2_OR_ZERO_P. Formatting fixes. Use size <= HOST_BITS_PER_WIDE_INT instead of HOST_BITS_PER_WIDE_INT >= size. * gcc.dg/torture/pr78416.c: New test. --- gcc/expmed.c.jj 2016-10-31 13:28:12.0 +0100 +++ gcc/expmed.c2016-11-18 16:57:57.608703605 +0100 @@ -3998,7 +3998,15 @@ expand_divmod (int rem_flag, enum tree_c if (unsignedp) ext_op1 &= GET_MODE_MASK (mode); op1_is_pow2 = ((EXACT_POWER_OF_2_OR_ZERO_P (ext_op1) -|| (! unsignedp && EXACT_POWER_OF_2_OR_ZERO_P (-ext_op1; + /* If mode is wider than HWI and op1 has msb set, +then it has there extra implicit 1 bits above it. */ + && (GET_MODE_PRECISION (mode) <= HOST_BITS_PER_WIDE_INT + || INTVAL (op1) >= 0)) +|| (! unsignedp +&& EXACT_POWER_OF_2_OR_ZERO_P (-ext_op1) +&& (GET_MODE_PRECISION (mode) +<= HOST_BITS_PER_WIDE_INT +|| INTVAL (op1) < 0))); } /* @@ -4141,8 +4149,17 @@ expand_divmod (int rem_flag, enum tree_c op1_is_constant = CONST_INT_P (op1); op1_is_pow2 = (op1_is_constant && ((EXACT_POWER_OF_2_OR_ZERO_P (INTVAL (op1)) - || (! unsignedp - && EXACT_POWER_OF_2_OR_ZERO_P (-UINTVAL (op1)); + /* If mode is wider than HWI and op1 has msb set, +then it has there extra implicit 1 bits above +it. */ + && (GET_MODE_PRECISION (compute_mode) + <= HOST_BITS_PER_WIDE_INT + || INTVAL (op1) >= 0)) +|| (! unsignedp +&& EXACT_POWER_OF_2_OR_ZERO_P (-UINTVAL (op1)) +&& (GET_MODE_PRECISION (compute_mode) +<= HOST_BITS_PER_WIDE_INT +|| INTVAL (op1) < 0; } /* If one of the operands is a volatile MEM, copy it into a register. */ @@ -4185,7 +4202,8 @@ expand_divmod (int rem_flag, enum tree_c unsigned HOST_WIDE_INT d = (INTVAL (op1) & GET_MODE_MASK (compute_mode)); - if (EXACT_POWER_OF_2_OR_ZERO_P (d)) + if (EXACT_POWER_OF_2_OR_ZERO_P (d) + && (INTVAL (op1) >= 0 || size <= HOST_BITS_PER_WIDE_INT)) { pre_shift = floor_log2 (d); if (rem_flag) @@ -4325,7 +4343,7 @@ expand_divmod (int rem_flag, enum tree_c else if (d == -1) quotient = expand_unop (compute_mode, neg_optab, op0, tquotient, 0); - else if (HOST_BITS_PER_WIDE_INT >= size + else if (size <= HOST_BITS_PER_WIDE_INT && abs_d == HOST_WIDE_INT_1U << (size - 1)) { /* This case is not handled correctly below. */ @@ -4335,6 +4353,7 @@ expand_divmod (int rem_flag, enum tree_c goto fail1; } else if (EXACT_POWER_OF_2_OR_ZERO_P (d) +&& (size <= HOST_BITS_PER_WIDE_INT || d >= 0) && (rem_flag ? smod_pow2_cheap (speed, compute_mode) : sdiv_pow2_cheap (speed, compute_mode)) @@ -4348,7 +4367,9 @@ expand_divmod (int rem_flag, enum tree_c compute_mode) != CODE_FOR_nothing))) ; - else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d)) + else if (EXACT_POWER_OF_2_OR_ZERO_P (abs_d) +&& (size <= HOST_BITS_PER_WIDE_INT +|| abs_d != (unsigned HOST_WIDE_INT) d)) { if (rem_flag) { @@ -4483,7 +4504,7 @@ expand_divmod (int rem_flag, enum tree_c
[PATCH] Fix target_clones attribute handling (PR middle-end/78419)
Hi! The following testcase ICEs because of buffer overflow in the expand_target_clones function. The main bug is that get_attr_str doesn't count the commas in the strings, so while it handles __attribute__((target_clones ("avx", "foo", "avx2", "avx512f", "default"))) it doesn't handle __attribute__((target_clones ("avx,foo,avx2,avx512f,default"))) which is also meant to be valid. This is fixed by the get_attr_str hunk. The rest of the changes are to avoid leaking memory, avoid double diagnostics (if targetm.target_option.valid_attribute_p fails, it has reported errors already, so there is no point to report further error or warning), fixing location_t of the diagnostics (targetm.target_option.valid_attribute_p uses input_location), and various formatting fixes and cleanups. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2016-11-18 Jakub JelinekPR middle-end/78419 * multiple_target.c (get_attr_len): Start with argnum and increment argnum on every arg. Use strchr in a loop instead of counting commas manually. (get_attr_str): Increment argnum for every comma in the string. (separate_attrs): Use for instead of while loop, simplify. (expand_target_clones): Rename defenition argument to definition. Free attrs and attr_str even when diagnosing errors. Temporarily change input_location around targetm.target_option.valid_attribute_p calls. Don't emit warning or errors if that function fails. * gcc.target/i386/pr78419.c: New test. --- gcc/multiple_target.c.jj2016-08-29 12:17:37.0 +0200 +++ gcc/multiple_target.c 2016-11-18 14:14:11.684489030 +0100 @@ -101,22 +101,18 @@ get_attr_len (tree arglist) { tree arg; int str_len_sum = 0; - int argnum = 1; + int argnum = 0; for (arg = arglist; arg; arg = TREE_CHAIN (arg)) { - unsigned int i; const char *str = TREE_STRING_POINTER (TREE_VALUE (arg)); - int len = strlen (str); - + size_t len = strlen (str); str_len_sum += len + 1; - if (arg != arglist) + for (const char *p = strchr (str, ','); p; p = strchr (p + 1, ',')) argnum++; - for (i = 0; i < strlen (str); i++) - if (str[i] == ',') - argnum++; + argnum++; } - if (argnum == 1) + if (argnum <= 1) return -1; return str_len_sum; } @@ -134,8 +130,9 @@ get_attr_str (tree arglist, char *attr_s for (arg = arglist; arg; arg = TREE_CHAIN (arg)) { const char *str = TREE_STRING_POINTER (TREE_VALUE (arg)); - size_t len = strlen (str); + for (const char *p = strchr (str, ','); p; p = strchr (p + 1, ',')) + argnum++; memcpy (attr_str + str_len_sum, str, len); attr_str[str_len_sum + len] = TREE_CHAIN (arg) ? ',' : '\0'; str_len_sum += len + 1; @@ -152,19 +149,16 @@ separate_attrs (char *attr_str, char **a { int i = 0; bool has_default = false; - char *attr = strtok (attr_str, ","); - while (attr != NULL) + for (char *attr = strtok (attr_str, ","); + attr != NULL; attr = strtok (NULL, ",")) { if (strcmp (attr, "default") == 0) { has_default = true; - attr = strtok (NULL, ","); continue; } - attrs[i] = attr; - attr = strtok (NULL, ","); - i++; + attrs[i++] = attr; } if (!has_default) return -1; @@ -235,7 +229,7 @@ create_target_clone (cgraph_node *node, create the appropriate clone for each valid target attribute. */ static bool -expand_target_clones (struct cgraph_node *node, bool defenition) +expand_target_clones (struct cgraph_node *node, bool definition) { int i; /* Parsing target attributes separated by comma. */ @@ -266,6 +260,8 @@ expand_target_clones (struct cgraph_node { error_at (DECL_SOURCE_LOCATION (node->decl), "default target was not set"); + XDELETEVEC (attrs); + XDELETEVEC (attr_str); return false; } @@ -286,22 +282,24 @@ expand_target_clones (struct cgraph_node create_new_asm_name (attr, suffix); /* Create new target clone. */ - cgraph_node *new_node = create_target_clone (node, defenition, suffix); + cgraph_node *new_node = create_target_clone (node, definition, suffix); XDELETEVEC (suffix); /* Set new attribute for the clone. */ tree attributes = make_attribute ("target", attr, DECL_ATTRIBUTES (new_node->decl)); DECL_ATTRIBUTES (new_node->decl) = attributes; + location_t saved_loc = input_location; + input_location = DECL_SOURCE_LOCATION (node->decl); if (!targetm.target_option.valid_attribute_p (new_node->decl, NULL, - TREE_VALUE (attributes), 0)) + TREE_VALUE (attributes), +
Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)
On 11/05/2016 12:44 PM, Bernd Edlinger wrote: + warning_at (DECL_SOURCE_LOCATION (newdecl), 0, + "declaration of %q+#D conflicts with built-in " + "declaration %q#D", newdecl, olddecl); There needs to be a way to disable this warning, even if it's enabled by default. - TREE_NOTHROW (olddecl) = 0; + TREE_NOTHROW (olddecl) = TREE_NOTHROW (newdecl); I still think a better fix would be to add a copy of TREE_NOTHROW to the else block of the if (types_match), to go with the existing copies of TREE_READONLY and TREE_THIS_VOLATILE. Jason
Re: [PATCH] Fix dwarf2out related bootstrap failure on Solaris (PR debug/78191)
OK. On Thu, Nov 3, 2016 at 12:42 PM, Jakub Jelinekwrote: > Hi! > > My recent optimize_abbrev_table optimization apparently broke Solaris > bootstrap. > The bug is specific I think just to -gdwarf-{2,3} -gno-strict-dwarf, where > DW_FORM_exprloc can't be used for location expressions and some location > expression contains DW_OP_GNU_{{const,regval,deref}_type,convert,reinterpret} > and there are at least 128 die abbreviations used by more than one DIE. > Because of the lack of DW_FORM_exprloc we need to decide on > DW_FORM_block{1,2,4} and size the location expression, for that we need to > know > the DIE offsets of base offset DIEs that are referenced (earlier code > ensures such types appear very early in the CU, right after the > DW_TAG_compile_unit DIE) and for that their size and their abbreviation values > need to be constant, so the optimize_abbrev_table opts that reorder > abbreviation > numbers by decreasing usage count or for -gdwarf-5 attempt to optimize > implicit > constants can't be done for the CU and base types. > For -gdwarf-4 and above, we can emit DW_FORM_exprloc, so value_format doesn't > need to compute any sizes and thus calc_base_type_die_sizes shouldn't be > called. > > Bootstrapped/regtested on x86_64-linux and i686-linux and Rainer has kindly > tested it on Solaris, ok for trunk? > > 2016-11-03 Jakub Jelinek > > * dwarf2out.c (size_of_discr_list): Fix typo in function comment. > > PR debug/78191 > * dwarf2out.c (abbrev_opt_base_type_end): New variable. > (die_abbrev_cmp): Sort dies with die_abbrev smaller than > abbrev_opt_base_type_end only by increasing die_abbrev, before > any other dies. > (optimize_abbrev_table): Don't change abbrev numbers of > base types and CU or optimize implicit consts in them if > calc_base_type_die_sizes has been called during build_abbrev_table. > (calc_base_type_die_sizes): If abbrev_opt_start, set > abbrev_opt_base_type_end to one plus largest base type's > die_abbrev. > > --- gcc/dwarf2out.c.jj 2016-11-03 08:47:59.0 +0100 > +++ gcc/dwarf2out.c 2016-11-03 12:26:03.192459170 +0100 > @@ -1909,7 +1909,7 @@ size_of_discr_value (dw_discr_value *dis > return size_of_sleb128 (discr_value->v.sval); > } > > -/* Return the size of the value in a DW_discr_list attribute. */ > +/* Return the size of the value in a DW_AT_discr_list attribute. */ > > static int > size_of_discr_list (dw_discr_list_ref discr_list) > @@ -8548,6 +8548,11 @@ optimize_external_refs (dw_die_ref die) > /* First abbrev_id that can be optimized based on usage. */ > static unsigned int abbrev_opt_start; > > +/* Maximum abbrev_id of a base type plus one (we can't optimize DIEs with > + abbrev_id smaller than this, because they must be already sized > + during build_abbrev_table). */ > +static unsigned int abbrev_opt_base_type_end; > + > /* Vector of usage counts during build_abbrev_table. Indexed by > abbrev_id - abbrev_opt_start. */ > static vec abbrev_usage_count; > @@ -8646,12 +8651,16 @@ die_abbrev_cmp (const void *p1, const vo >gcc_checking_assert (die1->die_abbrev >= abbrev_opt_start); >gcc_checking_assert (die2->die_abbrev >= abbrev_opt_start); > > - if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start] > - > abbrev_usage_count[die2->die_abbrev - abbrev_opt_start]) > -return -1; > - if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start] > - < abbrev_usage_count[die2->die_abbrev - abbrev_opt_start]) > -return 1; > + if (die1->die_abbrev >= abbrev_opt_base_type_end > + && die2->die_abbrev >= abbrev_opt_base_type_end) > +{ > + if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start] > + > abbrev_usage_count[die2->die_abbrev - abbrev_opt_start]) > + return -1; > + if (abbrev_usage_count[die1->die_abbrev - abbrev_opt_start] > + < abbrev_usage_count[die2->die_abbrev - abbrev_opt_start]) > + return 1; > +} > >/* Stabilize the sort. */ >if (die1->die_abbrev < die2->die_abbrev) > @@ -8729,10 +8738,12 @@ optimize_abbrev_table (void) >sorted_abbrev_dies.qsort (die_abbrev_cmp); > >unsigned int abbrev_id = abbrev_opt_start - 1; > - unsigned int first_id = 0; > + unsigned int first_id = ~0U; >unsigned int last_abbrev_id = 0; >unsigned int i; >dw_die_ref die; > + if (abbrev_opt_base_type_end > abbrev_opt_start) > + abbrev_id = abbrev_opt_base_type_end - 1; >/* Reassign abbreviation ids from abbrev_opt_start above, so that > most commonly used abbreviations come first. */ >FOR_EACH_VEC_ELT (sorted_abbrev_dies, i, die) > @@ -8740,10 +8751,15 @@ optimize_abbrev_table (void) > dw_attr_node *a; > unsigned ix; > > + /* If calc_base_type_die_sizes has been called, the CU and > +base types after it can't
Re: [C++ PATCH, ABI] Fix mangling of TLS init and wrapper fns (PR c++/77285)
On Fri, Nov 11, 2016 at 7:23 AM, Jakub Jelinekwrote: > Not really sure if we want to call just check_abi_tags (based on the idea > that likely any uses of the TLS var from other TUs will be broken), I think that's fine. OK. Jason
Re: [Patch, Fortran] PR 78392: ICE in gfc_trans_auto_array_allocation, at fortran/trans-array.c:5979
Hi Dominique, >> the attached patch fixes an ice-on-valid problem, simply by removing an >> assert. ... > > I have several instances in my test suite showing that the proposed patch > removes the ICE but generates wrong code: > > pr42359, second test, => ICE on another place > pr54613, sixth and eighth tests, thanks for the comments, those cases are closely related. I previously assumed that the test case for this PR would be legal, but by now I think that's wrong. The test case should be rejected, and we already have checking mechanisms for this (see resolve_fl_variable), but apparently they are not working. My current suspicion is that 'gfc_is_constant_expr' has a bug, because it claims the call to the function 'get_i' to be a constant expression. This is not true, because get_i() can not be reduced to a compile-time constant. In any case the patch I proposed is wrong and the assert should stay. Cheers, Janus
[PATCH] Add "__RTL" to cc1 (v5)
On Wed, 2016-11-16 at 14:24 +0100, Richard Biener wrote: > On Tue, Nov 15, 2016 at 10:07 PM, David Malcolm> wrote: > > On Mon, 2016-11-14 at 16:14 +0100, Richard Biener wrote: > > > On Fri, Nov 11, 2016 at 10:15 PM, David Malcolm < > > > dmalc...@redhat.com> > > > wrote: > > > > Changed in this version: > > > > > > > > * Rather than running just one pass, run *all* passes, but > > > > start at > > > > the given pass; support for "dg-do run" tests that execute > > > > the > > > > resulting code. > > > > * Updated test cases to new "compact" dump format; more test > > > > cases; > > > > use "dg-do run" in various places. > > > > * Lots of bugfixing > > > > > > > > Links to previous versions: > > > > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00263.html > > > > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00500.html > > > > > Does running the RTL passes right from the parser work with > > > -fsyntax > > > -only? > > > > It depends what you mean by "work". If I run it with -fsyntax > > -only, > > then pass_rest_of_compilation::gate returns false, and none of the > > RTL > > passes are run. Is this behavior correct? > > Yes. Thanks. > > > Doing it like __GIMPLE has the advantage of not exposing > > > "rest_of_compilation", etc.. > > > > The gimple part of the compiler supports having multiple functions > > in > > memory at once, and then compiling them in arbitrary order based on > > decisions made by the callgraph. > > > > By contrast, the RTL part of the compiler is full of singleton > > state: > > things like crtl (aka x_rtl), the state of emit-rtl.c, > > "reload_completed", etc etc. > > Ah, of course - I forgot that... > > > To try to limit the scope of the project, for the RTL frontend work > > I'm > > merely attempting to restore the singleton RTL state from a dump, > > rather than to support having per function stashes of RTL state. > > > > Hence the rest of compilation gets invoked directly from the > > frontend > > for the __RTL case, since it will get overwritten if there's a > > second > > __RTL function in the source file (which sounds like an idea for a > > test > > case; I'll attempt such a test case). > > > > I hope this is a reasonable approach. If not, I suppose I can > > attempt > > to bundle it up into some kind of RTL function state, but that > > seems > > like significant scope creep. > > Yeah, I think it's a good approach for now. OK. > > > I'm now handling __GIMPLE from within declspecs (the GIMPLE FE > > > stuff > > > has been committed), it would be nice to match the __RTL piece > > > here. > > > > (Congratulations on getting the GIMPLE FE stuff in) > > > > I'm not sure I understand you here - do you want me to rewrite the > > __RTL parsing to match the __GIMPLE piece, or the other way around? > > If the former, presumably I should reuse (and rename) > > c_parser_gimple_pass_list? > > Handle __RTL like __GIMPLE, thus as declspec. Of course ultimatively > Joseph has the last word here. I've updated the patch to do so. > > > > > > gcc/ChangeLog: > > > > * Makefile.in (OBJS): Add run-rtl-passes.o. > > > > > > > > gcc/c-family/ChangeLog: > > > > * c-common.c (c_common_reswords): Add "__RTL". > > > > * c-common.h (enum rid): Add RID_RTL. > > > > > > > > gcc/c/ChangeLog: > > > > * c-parser.c: Include "read-rtl-function.h" and > > > > "run-rtl-passes.h". > > > > (c_parser_declaration_or_fndef): In the "GNU > > > > extensions" > > > > part of > > > > the leading comment, add an alternate production for > > > > "function-definition", along with new "rtl-body > > > > -specifier" > > > > and > > > > "rtl-body-pass-specifier" productions. Handle "__RTL" > > > > by > > > > calling > > > > c_parser_parse_rtl_body. Convert a timevar_push/pop > > > > pair > > > > to an auto_timevar, to cope with early exit. > > > > (c_parser_parse_rtl_body): New function. > > > > > > > > gcc/ChangeLog: > > > > * cfg.c (free_original_copy_tables): Remove assertion > > > > on original_copy_bb_pool. > > > > > > How can that trigger? > > > > It happens when running pass_outof_cfg_layout_mode::execute; seen > > with > > gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c. > > > > The input file is a dump taken in cfg_layout mode (although in this > > case it's a trivial 3-basic-block CFG, but ideally there would be > > cases > > with non-trivial control flow); it has "fwprop1" as its starting > > pass. > > > > Running without -quiet shows: > > > > skipping pass: *rest_of_compilation > > skipping pass: vregs > > skipping pass: into_cfglayout > > skipping pass: jump > > skipping pass: subreg1 > > skipping pass: cse1 > > found starting pass: fwprop1 > > > > i.e. it skips the into_cfglayout (amongst others), to start with > > fwprop1. > > > > In theory skipping a pass ought to be a no-op, assuming that we're > > faithfully reconstructing
[PATCH] Introduce emit_status::ensure_regno_capacity (v5)
On Wed, 2016-10-05 at 17:55 +0200, Bernd Schmidt wrote: > On 10/05/2016 06:15 PM, David Malcolm wrote: > > - /* Make sure regno_pointer_align, and regno_reg_rtx are large > > - enough to have an element for this pseudo reg number. */ > > + int cur_size = crtl->emit.regno_pointer_align_length; > > + if (reg_rtx_no == cur_size) > > +crtl->emit.ensure_regno_capacity (cur_size * 2); > > Patch looks ok in principle, but maybe this manipulation of the size > should be part of the new function as well - i.e. don't pass a > new_size > to it, make it check reg_rtx_no itself. Changed in this version - dropped "new_size" argument - added a while loop to cover the case where reg_rtx_no needs to grow by more the double (potentially needed when loading RTL dumps) - added a gcc_assert to ensure that the buffer is large enough Successfully bootstrapped on x86_64-pc-linux-gnu as part of the patch kit. OK for trunk? (assuming it tests OK individually) gcc/ChangeLog: * emit-rtl.c (gen_reg_rtx): Move regno_pointer_align and regno_reg_rtx resizing logic to... (emit_status::ensure_regno_capacity): ...this new method, and ensure that the buffers are large enough. (init_emit): Allocate regno_reg_rtx using ggc_cleared_vec_alloc rather than ggc_vec_alloc. * function.h (emit_status::ensure_regno_capacity): New method. --- gcc/emit-rtl.c | 48 +--- gcc/function.h | 2 ++ 2 files changed, 31 insertions(+), 19 deletions(-) diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index e995899..b2b5fde 100644 --- a/gcc/emit-rtl.c +++ b/gcc/emit-rtl.c @@ -1057,29 +1057,38 @@ gen_reg_rtx (machine_mode mode) /* Do not call gen_reg_rtx with uninitialized crtl. */ gcc_assert (crtl->emit.regno_pointer_align_length); - /* Make sure regno_pointer_align, and regno_reg_rtx are large - enough to have an element for this pseudo reg number. */ + crtl->emit.ensure_regno_capacity (); + gcc_assert (reg_rtx_no < crtl->emit.regno_pointer_align_length); - if (reg_rtx_no == crtl->emit.regno_pointer_align_length) -{ - int old_size = crtl->emit.regno_pointer_align_length; - char *tmp; - rtx *new1; + val = gen_raw_REG (mode, reg_rtx_no); + regno_reg_rtx[reg_rtx_no++] = val; + return val; +} - tmp = XRESIZEVEC (char, crtl->emit.regno_pointer_align, old_size * 2); - memset (tmp + old_size, 0, old_size); - crtl->emit.regno_pointer_align = (unsigned char *) tmp; +/* Make sure m_regno_pointer_align, and regno_reg_rtx are large + enough to have elements in the range 0 <= idx <= reg_rtx_no. */ - new1 = GGC_RESIZEVEC (rtx, regno_reg_rtx, old_size * 2); - memset (new1 + old_size, 0, old_size * sizeof (rtx)); - regno_reg_rtx = new1; +void +emit_status::ensure_regno_capacity () +{ + int old_size = regno_pointer_align_length; - crtl->emit.regno_pointer_align_length = old_size * 2; -} + if (reg_rtx_no < old_size) +return; - val = gen_raw_REG (mode, reg_rtx_no); - regno_reg_rtx[reg_rtx_no++] = val; - return val; + int new_size = old_size * 2; + while (reg_rtx_no >= new_size) +new_size *= 2; + + char *tmp = XRESIZEVEC (char, regno_pointer_align, new_size); + memset (tmp + old_size, 0, new_size - old_size); + regno_pointer_align = (unsigned char *) tmp; + + rtx *new1 = GGC_RESIZEVEC (rtx, regno_reg_rtx, new_size); + memset (new1 + old_size, 0, (new_size - old_size) * sizeof (rtx)); + regno_reg_rtx = new1; + + crtl->emit.regno_pointer_align_length = new_size; } /* Return TRUE if REG is a PARM_DECL, FALSE otherwise. */ @@ -5667,7 +5676,8 @@ init_emit (void) crtl->emit.regno_pointer_align = XCNEWVEC (unsigned char, crtl->emit.regno_pointer_align_length); - regno_reg_rtx = ggc_vec_alloc (crtl->emit.regno_pointer_align_length); + regno_reg_rtx = +ggc_cleared_vec_alloc (crtl->emit.regno_pointer_align_length); /* Put copies of all the hard registers into regno_reg_rtx. */ memcpy (regno_reg_rtx, diff --git a/gcc/function.h b/gcc/function.h index b564f45..cabffb9 100644 --- a/gcc/function.h +++ b/gcc/function.h @@ -34,6 +34,8 @@ struct GTY(()) sequence_stack { }; struct GTY(()) emit_status { + void ensure_regno_capacity (); + /* This is reset to LAST_VIRTUAL_REGISTER + 1 at the start of each function. After rtl generation, it is 1 plus the largest register number used. */ int x_reg_rtx_no; -- 1.8.5.3
[PATCH], Tweak PowerPC movdi constraints
This patch tweaks the movdi constraints for the PowerPC to use "^" or "$" constraints instead of "?*". This allows the register allocator to more often allocate DImode to a floating point/vector register when it is desirable to do so. I did a full Spec 2006 run with this patch installed, and most of the benchmarks were neutral in terms of performance. The 482.sphinx3 benchmark had a 2.5% performance boost with these patches. There were no benchmarks that regressed with this patch. I built bootstrap compilers and did make check with no regressions on: 1) Little endian power8, --with-cpu=power8 2) Big endian power8, --with-cpu=power8 (no 32-bit support) 3) Big endian power7, --with-cpu=power7 (both 32/64-bit support) Can I check this patch into the trunk? [gcc] 2016-11-18 Michael Meissner* config/rs6000/rs6000.md (movdi_internal32): Change FPR/VSX "?*" load/store constraints to "^" if the instruction allows d-form addressing or "$" if it only allows x-form addressing. Change FPR/VSX move constraints to "^". (movdi_internal64): Likewise. [gcc/testsuite] 2016-11-18 Michael Meissner * gcc.target/powerpc/ppc-round2.c: Allow XSCVDPSXWS and XSCVDPUXWS to be generated instead of FCTIWUZ or FCTIWZ. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 242556) +++ gcc/config/rs6000/rs6000.md (.../gcc/config/rs6000) (working copy) @@ -8118,10 +8118,10 @@ (define_insn "p8_mfvsrd_4_disf" (define_insn "*movdi_internal32" [(set (match_operand:DI 0 "rs6000_nonimmediate_operand" - "=Y,r, r, ?m,?*d,?*d, - r, ?wY, ?Z,?*wb, ?*wv, ?wi, - ?wo, ?wo, ?wv, ?wi, ?wi,?wv, - ?wv") + "=Y,r, r, ^m,^d, ^d, + r, ^wY, $Z,^wb, $wv,^wi, + *wo, *wo, *wv, *wi, *wi,*wv, + *wv") (match_operand:DI 1 "input_operand" "r,Y, r, d, m, d, @@ -8195,9 +8195,9 @@ (define_split (define_insn "*movdi_internal64" [(set (match_operand:DI 0 "nonimmediate_operand" "=Y,r, r, r, r, r, -?m,?*d, ?*d, ?wY, ?Z, ?*wb, -?*wv, ?wi, ?wo, ?wo, ?wv,?wi, -?wi, ?wv, ?wv, r, *h, *h, +^m,^d,^d,^Y,$Z, $wb, +$wv, ^wi, *wo, *wo, *wv,*wi, +*wi, *wv, *wv, r, *h, *h, ?*r, ?*wg, ?*r, ?*wj") (match_operand:DI 1 "input_operand" Index: gcc/testsuite/gcc.target/powerpc/ppc-round2.c === --- gcc/testsuite/gcc.target/powerpc/ppc-round2.c (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc) (revision 242556) +++ gcc/testsuite/gcc.target/powerpc/ppc-round2.c (.../gcc/testsuite/gcc.target/powerpc) (working copy) @@ -5,8 +5,8 @@ /* { dg-options "-O2 -mcpu=power8" } */ /* { dg-final { scan-assembler-times "fcfid " 2 } } */ /* { dg-final { scan-assembler-times "fcfids " 2 } } */ -/* { dg-final { scan-assembler-times "fctiwuz "2 } } */ -/* { dg-final { scan-assembler-times "fctiwz " 2 } } */ +/* { dg-final { scan-assembler-times "fctiwuz \|xscvdpuxws " 2 } } */ +/* { dg-final { scan-assembler-times "fctiwz \|xscvdpsxws " 2 } } */ /* { dg-final { scan-assembler-times "mfvsrd " 4 } } */ /* { dg-final { scan-assembler-times "mtvsrwa "2 } } */ /* { dg-final { scan-assembler-times "mtvsrwz "2 } } */
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote: > I'm seeing lots of ICEs with this. Here is untested fix for that, will bootstrap/regtest it soon (after my current set of bootstraps finishes). 2016-11-18 Jakub Jelinek* config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable, don't initialize it, don't use it for the case where it isn't provable %{z} nor using the same argument, instead move merge argument into a new pseudo and use that as target. Formatting fixes. --- gcc/config/i386/i386.c.jj 2016-11-18 20:04:31.0 +0100 +++ gcc/config/i386/i386.c 2016-11-18 21:21:17.764190127 +0100 @@ -38220,14 +38220,12 @@ rdseed_step: rtx (*fcn) (rtx, rtx, rtx, rtx); rtx (*fcn_mask) (rtx, rtx, rtx, rtx, rtx); rtx (*fcn_maskz) (rtx, rtx, rtx, rtx, rtx, rtx); - rtx (*msk_mov) (rtx, rtx, rtx, rtx); int masked = 1; machine_mode mode, wide_mode, nar_mode; nar_mode = V4SFmode; mode = V16SFmode; wide_mode = V64SFmode; - msk_mov = gen_avx512f_loadv16sf_mask; fcn_mask = gen_avx5124fmaddps_4fmaddps_mask; fcn_maskz = gen_avx5124fmaddps_4fmaddps_maskz; @@ -38270,7 +38268,6 @@ rdseed_step: wide_mode = V64SImode; fcn_mask = gen_avx5124vnniw_vp4dpwssd_mask; fcn_maskz = gen_avx5124vnniw_vp4dpwssd_maskz; - msk_mov = gen_avx512f_loadv16si_mask; goto v4fma_expand; case IX86_BUILTIN_4DPWSSDS_MASK: @@ -38279,7 +38276,6 @@ rdseed_step: wide_mode = V64SImode; fcn_mask = gen_avx5124vnniw_vp4dpwssds_mask; fcn_maskz = gen_avx5124vnniw_vp4dpwssds_maskz; - msk_mov = gen_avx512f_loadv16si_mask; goto v4fma_expand; case IX86_BUILTIN_4FMAPS_MASK: @@ -38295,11 +38291,11 @@ v4fma_expand: wide_reg = gen_reg_rtx (wide_mode); for (i = 0; i < 4; i++) { - args[i] = CALL_EXPR_ARG (exp, i); + args[i] = CALL_EXPR_ARG (exp, i); ops[i] = expand_normal (args[i]); - emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, (i) * 64), - ops[i]); + emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, i * 64), + ops[i]); } accum = expand_normal (CALL_EXPR_ARG (exp, 4)); @@ -38318,7 +38314,7 @@ v4fma_expand: emit_insn (fcn (target, accum, wide_reg, mem)); else { - rtx merge, mask; + rtx merge, mask; merge = expand_normal (CALL_EXPR_ARG (exp, 6)); mask = expand_normal (CALL_EXPR_ARG (exp, 7)); @@ -38340,18 +38336,16 @@ v4fma_expand: merge = force_reg (mode, merge); emit_insn (fcn_mask (target, wide_reg, mem, merge, mask)); } - /* Merge with something unknown might happen if we z-mask w/ -O0. */ + /* Merge with something unknown might happen if we z-mask w/ -O0. */ else { - rtx tmp = target; - emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask)); - - target = force_reg (mode, merge); - emit_insn (msk_mov (target, tmp, target, mask)); + target = gen_reg_rtx (mode); + emit_move_insn (target, merge); + emit_insn (fcn_mask (target, wide_reg, mem, target, mask)); } } - return target; - } + return target; + } case IX86_BUILTIN_4FNMASS: fcn = gen_avx5124fmaddps_4fnmaddss; @@ -38366,7 +38360,6 @@ v4fma_expand: case IX86_BUILTIN_4FNMASS_MASK: fcn_mask = gen_avx5124fmaddps_4fnmaddss_mask; fcn_maskz = gen_avx5124fmaddps_4fnmaddss_maskz; - msk_mov = gen_avx512vl_loadv4sf_mask; goto s4fma_expand; case IX86_BUILTIN_4FMASS_MASK: @@ -38380,22 +38373,21 @@ v4fma_expand: fcn_mask = gen_avx5124fmaddps_4fmaddss_mask; fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz; - msk_mov = gen_avx512vl_loadv4sf_mask; s4fma_expand: mode = V4SFmode; wide_reg = gen_reg_rtx (V64SFmode); for (i = 0; i < 4; i++) { -rtx tmp; -args[i] = CALL_EXPR_ARG (exp, i); -ops[i] = expand_normal (args[i]); + rtx tmp; + args[i] = CALL_EXPR_ARG (exp, i); + ops[i] = expand_normal (args[i]); -tmp = gen_reg_rtx (SFmode); -emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0)); + tmp = gen_reg_rtx (SFmode); + emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0)); -emit_move_insn
Re: [PATCH] avoid calling alloca(0)
On 11/18/2016 11:46 AM, Jeff Law wrote: On 11/18/2016 12:58 AM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 05:32:31PM -0700, Martin Sebor wrote: The point is warning on an alloca (0) may not be as clear cut as it might seem. It's probably a reasonable thing to do on the host, but on a target, which might be embedded and explicitly overriding the builtin alloca, warning on alloca (0) is less of a slam dunk. I confess I haven't heard of such an implementation before but after some searching I have managed to find a few examples of it online, such as in QNX or in the BlackBerry SDK, or in the GCC shipped by Texas Instruments. For instance: In the libiberty/alloca.c, I don't see anything special about alloca (0). Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035). alloca (0) just returns NULL after it. The diffutils link is the same alloca.c as in libiberty. Returning NULL or returning a non-unique pointer for alloca (0) is just fine, it is the same thing as returning NULL or returning a non-unique pointer from malloc (0). We definitely don't want to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), because it behaves the same. Maybe that's the key. We don't want to warn for alloca (0); But we should warn for whatever = alloca (0); The former is a clear request for GC of the alloca space. The latter it much more likely a request for space where something went wrong computing the amount of space. That makes sense to me. In fact, it already works this way, though not thanks to anything I did. GCC optimizes calls to alloca away when their return value isn't used so they don't trigger the warning. With -fno-builtin-alloca (and with the Glibc alloca macro suppressed), a call to alloca(0) does not emit the warning because Glibc alloca isn't declared with attribute alloc_size (or any other attribute except C++ nothrow). Only calls to the built-in whose return value is used do trigger it, whether the argument is a literal zero or the result of constant propagation. Martin
documenting command-line options (was Re: PING [PATCH] enable -fprintf-return-value by default)
On 11/18/2016 11:52 AM, Martin Sebor wrote: [snip] I think it would be be ideal if all the options were sorted the same way in all sections. Is there some command to have texinfo sort them for us? If not, can we write a script to sort them, either each time just before generating the docs or manually? (I'm happy to help.) Otherwise, consistency will continue to be an elusive goal. There are at least two reasons why I don't think following the style used by each section is likely to yield good results (and clearly hasn't to date). First, the big sections already have examples of both approaches (or simply options out of order). In some of them it can also be hard to tell if the radix of the options you're looking to for guidance starts with an 'n'. Second, when adding more than one option to different sections (such as the -Wformat-length and -fprintf-format-length options) having to remember to apply a different sort order for each (warnings are sorted by radix but optimization options, for the most parts, strictly alphabetically), seems also likely to trip people up. This is wandering off into a general documentation maintenance discussion that has little to do with the original patch review request. GCC has way too many command-line options and most of them are interesting to only some tiny fraction of users. The documentation needs to be structured to focus on the options users are most likely to need or find useful, and not bury the information about the most important options in the middle of a huge pile of barely-useful documentation about barely-useful options. For this reason, I think a strict alphabetical sorting across the board is not helpful. E.g., for the list of optimization options, the -O options are clearly more important than the -f options controlling individual optimizations, so -O should be the first thing users see when they look at that section. Similarly for debug options, the vast majority of users won't care about anything other than -g. For target-specific options, it may make sense to list -march/-mcpu/-mtune together regardless of their alphabetization with respect to other -m options for that target. My last round of cleanup on invoke.texi was focused on refining the categorization of options and moving some option documentation that was clearly mis-categorized to more appropriate places or newly-created categories. Having fewer items listed in each category makes the ordering of things within the group less critical. I think people looking for documentation for a specific option should be looking in the index first. We should have @cindex entries for both the -ffoo and -fno-foo variants so the alphabetization works either way. Anyway long story short there's probably no perfect organization for this chapter, and the sheer number of options being documented makes it a pile of work even to implement small changes in convention across the board. That shouldn't stop us from making incremental improvements in organization, one section at a time, or taking care not to make the current situation worse when adding documentation for new options. -Sandra
C++ PATCH for c++/67631, list-init and explicit
Just need to pass the flags down. Tested x86_64-pc-linux-gnu, applying to trunk. commit 6aee56eb70ff04c9c1fbcbef388144d4a0338154 Author: Jason MerrillDate: Fri Nov 18 10:04:49 2016 -0500 PR c++/67631 - list-init and explicit conversions * semantics.c (finish_compound_literal): Call digest_init_flags. * typeck2.c (digest_init_flags): Add complain parm. (store_init_value): Pass it. diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index e5f9113..5674886 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -6839,7 +6839,7 @@ extern tree store_init_value (tree, tree, vec **, int); extern tree split_nonconstant_init (tree, tree); extern bool check_narrowing(tree, tree, tsubst_flags_t); extern tree digest_init(tree, tree, tsubst_flags_t); -extern tree digest_init_flags (tree, tree, int); +extern tree digest_init_flags (tree, tree, int, tsubst_flags_t); extern tree digest_nsdmi_init (tree, tree); extern tree build_scoped_ref (tree, tree, tree *); extern tree build_x_arrow (location_t, tree, diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 96c67a5..389e7f1 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -2713,7 +2713,8 @@ finish_compound_literal (tree type, tree compound_literal, if (type == error_mark_node) return error_mark_node; } - compound_literal = digest_init (type, compound_literal, complain); + compound_literal = digest_init_flags (type, compound_literal, LOOKUP_NORMAL, + complain); if (TREE_CODE (compound_literal) == CONSTRUCTOR) TREE_HAS_CONSTRUCTOR (compound_literal) = true; diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c index 2ca4bf2..b214c99 100644 --- a/gcc/cp/typeck2.c +++ b/gcc/cp/typeck2.c @@ -794,7 +794,7 @@ store_init_value (tree decl, tree init, vec ** cleanups, int flags) value = init; else /* Digest the specified initializer into an expression. */ -value = digest_init_flags (type, init, flags); +value = digest_init_flags (type, init, flags, tf_warning_or_error); value = extend_ref_init_temps (decl, value, cleanups); @@ -1165,9 +1165,9 @@ digest_init (tree type, tree init, tsubst_flags_t complain) } tree -digest_init_flags (tree type, tree init, int flags) +digest_init_flags (tree type, tree init, int flags, tsubst_flags_t complain) { - return digest_init_r (type, init, false, flags, tf_warning_or_error); + return digest_init_r (type, init, false, flags, complain); } /* Process the initializer INIT for an NSDMI DECL (a FIELD_DECL). */ @@ -1183,7 +1183,7 @@ digest_nsdmi_init (tree decl, tree init) if (BRACE_ENCLOSED_INITIALIZER_P (init) && CP_AGGREGATE_TYPE_P (type)) init = reshape_init (type, init, tf_warning_or_error); - init = digest_init_flags (type, init, flags); + init = digest_init_flags (type, init, flags, tf_warning_or_error); if (TREE_CODE (init) == TARGET_EXPR) /* This represents the whole initialization. */ TARGET_EXPR_DIRECT_INIT_P (init) = true; diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-explicit1.C b/gcc/testsuite/g++.dg/cpp0x/initlist-explicit1.C new file mode 100644 index 000..5e00b2d --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/initlist-explicit1.C @@ -0,0 +1,11 @@ +// PR c++/67631 +// { dg-do compile { target c++11 } } + +struct X +{ + explicit operator unsigned (); +}; +unsigned foo () +{ + return unsigned{ X () }; +}
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
Hi! On Thu, Nov 17, 2016 at 02:18:57PM -0800, H.J. Lu wrote: > > Hi HJ, could you please commit it? > > Done. I'm seeing lots of ICEs with this. E.g. reduced: typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__)); typedef unsigned char __mmask8; typedef float __v4sf __attribute__ ((__vector_size__ (16))); static inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_setzero_ps (void) { return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f }; } __m128 foo (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C, __m128 __D, __m128 __E, __m128 *__F) { return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B, (__v4sf) __C, (__v4sf) __D, (__v4sf) __E, (__v4sf) __A, (const __v4sf *) __F, (__v4sf) _mm_setzero_ps (), (__mmask8) __U); } ICEs with -mavx5124fmaps -O0, but succeeds with -mavx512vl -mavx5124fmaps -O0 or -mavx5124fmaps -O2. fcn_mask = gen_avx5124fmaddps_4fmaddss_mask; fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz; msk_mov = gen_avx512vl_loadv4sf_mask; looks wrong, while -mavx5124fmaps implies -mavx512f, it doesn't imply -mavx512vl, so using -mavx512vl insns unconditionally is just wrong. You need some fallback if avx512vl isn't available, perhaps use avx512f 512-bit masked insns with bits in the mask forced to pick only the ones you want? Also, seems there are various formatting issues in the change, e.g. shortly after s4fma_expand: there is indentation by 3 chars relative to above { instead of 2, gen_rtx_SUBREG (V16SFmode, tmp, 0)); has extra 1 char indentation, some lines too long. Jakub
Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)
On Fri, Nov 18, 2016 at 9:42 AM, Mike Stumpwrote: > On Nov 18, 2016, at 2:45 AM, Rainer Orth > wrote: >> So the current suggestion is to combine my fixincludes patch and Jack's >> patch to disable use if !__BLOCKS__. > >> I guess this is ok for mainline now to restore bootstrap? > > I think we are down to everyone likes this, Ok. The big question is, does > this need a back port? My thinking on that subject is that since include fixes do not directly affect the compiler and, instead, facilitate its use on various platforms, it is therefore reasonably safe to back port. Especially if adequate guards (selection tests) are included in the fix to prevent it from taking action on the wrong files. But I also think it is really a "steering committee" decision. For me, I think it is safe enough.
Re: PING [PATCH] enable -fprintf-return-value by default
On 11/18/2016 11:52 AM, Martin Sebor wrote: I think it would be be ideal if all the options were sorted the same way in all sections. Is there some command to have texinfo sort them for us? If not, can we write a script to sort them, either each time just before generating the docs or manually? (I'm happy to help.) Otherwise, consistency will continue to be an elusive goal. I'm not aware of texinfo way to do this automatically. There are at least two reasons why I don't think following the style used by each section is likely to yield good results (and clearly hasn't to date). First, the big sections already have examples of both approaches (or simply options out of order). In some of them it can also be hard to tell if the radix of the options you're looking to for guidance starts with an 'n'. Second, when adding more than one option to different sections (such as the -Wformat-length and -fprintf-format-length options) having to remember to apply a different sort order for each (warnings are sorted by radix but optimization options, for the most parts, strictly alphabetically), seems also likely to trip people up. Let's split this issue off by moving the option into the location Sandra has asked so that we're at least kindof, sorta, locally consistent. That allows your patch to go forward. Then separately we can see if we can bring more sense to the larger issue. Sandra has tried to work towards bring sanity to our documentation (which has grown like field bindweed over time) and we can include a discussion about this issue in that larger effort. PS I don't mind moving the -fno-printf-return-value option up or down and I will do it before committing the patch. I would just prefer to be able not to have to remember and worry about all these subtle rules. There are too many of them and they tend to take time and energy away from things that matter more (like fixing bugs). Understood. But that's also part of the reason why we delegate things -- there's a million little things to remember and nobody can remember them all. So it's a balance between saying "we should clean this up and bring consistency here now" and "the maintainer has asked for a change, let's do that and address the consistency issues separately". There's obviously pros and cons to each decision which I don't enumerate ;-) Jeff
Re: PING [PATCH] enable -fprintf-return-value by default
On 11/18/2016 10:38 AM, Sandra Loosemore wrote: On 11/18/2016 09:01 AM, Martin Sebor wrote: On 11/17/2016 10:34 PM, Sandra Loosemore wrote: On 11/16/2016 09:49 AM, Martin Sebor wrote: I'm looking for an approval of the attached patch. I've adjusted the documentation based on Sandra's input (i.e., documented the negative of the option rather than the positive; thank you for the review, btw.) On 11/08/2016 08:13 PM, Martin Sebor wrote: The -fprintf-return-value optimization has been disabled since the last time it caused a bootstrap failure on powerpc64le. With the underlying problems fixed GCC has bootstrapped fine on all of powerpc64, powerpc64le and x86_64 and tested with no regressions. I'd like to re-enable the option. The attached patch does that. [snip] Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi(revision 242500) +++ gcc/doc/invoke.texi(working copy) @@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}. -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol -fomit-frame-pointer -foptimize-sibling-calls @gol -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol --fprefetch-loop-arrays -fprintf-return-value @gol +-fprefetch-loop-arrays -fno-printf-return-value @gol -fprofile-correction @gol -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol -fprofile-reorder-functions @gol Please keep this list alphabetized -- the other "-fno-*" options are sorted as such. The documentation parts of the patch are OK with that fixed, but I can't approve changing the default for the option. I kept the option in the same place on the assumption that it was the "printf" radix of the name, not the "no-" prefix, that these were alphabetized by. But now that you point it out and I've looked more carefully at some of the other options, I see that in some sections they are listed strictly alphabetically (-fno-foo after -fmoo) while in others it's the way you suggest. AFAICS, the former style looks prevalent in the C++ Language Options and the in Warning Options, for example. The latter seems to be more popular in the Optimization Options section (though there are counterexamples). I'm happy to follow either of these conventions as long as its consistent. Otherwise, without both kinds of examples to choose from, I don't think I can trust my brain to remember yet another rule. Well, how about the rule is that you look at the convention of the specific list you are adding something to? At least that way we retain local consistency and don't mess up those parts of the documentation that we have already tried to organize in some way. There's so much material in the command-line options chapter that it would be hard to figure out how to present it even if the content were completely static, much less when people are adding/renaming/modifying options all the time. I think it would be be ideal if all the options were sorted the same way in all sections. Is there some command to have texinfo sort them for us? If not, can we write a script to sort them, either each time just before generating the docs or manually? (I'm happy to help.) Otherwise, consistency will continue to be an elusive goal. There are at least two reasons why I don't think following the style used by each section is likely to yield good results (and clearly hasn't to date). First, the big sections already have examples of both approaches (or simply options out of order). In some of them it can also be hard to tell if the radix of the options you're looking to for guidance starts with an 'n'. Second, when adding more than one option to different sections (such as the -Wformat-length and -fprintf-format-length options) having to remember to apply a different sort order for each (warnings are sorted by radix but optimization options, for the most parts, strictly alphabetically), seems also likely to trip people up. Martin PS I don't mind moving the -fno-printf-return-value option up or down and I will do it before committing the patch. I would just prefer to be able not to have to remember and worry about all these subtle rules. There are too many of them and they tend to take time and energy away from things that matter more (like fixing bugs).
Re: [PATCH] avoid calling alloca(0)
On 11/18/2016 12:58 AM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 05:32:31PM -0700, Martin Sebor wrote: The point is warning on an alloca (0) may not be as clear cut as it might seem. It's probably a reasonable thing to do on the host, but on a target, which might be embedded and explicitly overriding the builtin alloca, warning on alloca (0) is less of a slam dunk. I confess I haven't heard of such an implementation before but after some searching I have managed to find a few examples of it online, such as in QNX or in the BlackBerry SDK, or in the GCC shipped by Texas Instruments. For instance: In the libiberty/alloca.c, I don't see anything special about alloca (0). Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035). alloca (0) just returns NULL after it. The diffutils link is the same alloca.c as in libiberty. Returning NULL or returning a non-unique pointer for alloca (0) is just fine, it is the same thing as returning NULL or returning a non-unique pointer from malloc (0). We definitely don't want to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), because it behaves the same. Maybe that's the key. We don't want to warn for alloca (0); But we should warn for whatever = alloca (0); The former is a clear request for GC of the alloca space. The latter it much more likely a request for space where something went wrong computing the amount of space. jeff
Re: [PATCH] avoid calling alloca(0)
On 11/17/2016 05:32 PM, Martin Sebor wrote: I confess I haven't heard of such an implementation before but after some searching I have managed to find a few examples of it online, such as in QNX or in the BlackBerry SDK, or in the GCC shipped by Texas Instruments. For instance: I have the "advantage" of being a GCC gray beard and had the misfortune of maintaining a system that had one of those allocas (hpux) *and* having to debug heap exhaustions in GCC that would occur due to this kind of construct for (some large number of iterations) frobit (...) Where frobit would allocate a metric ton of stuff with alloca. Note the calls to alloca down in frobit would all appear to be at the same stack depth (alloca literally recorded the SP value and the PA was a fixed frame target). So the calls to alloca within frobit wouldn't deallocate anything. http://www.qnx.com/developers/docs/6.5.0SP1.update/com.qnx.doc.neutrino_lib_ref/a/alloca.html Their GCC-derived compilers apparently enable it with -fno-builtins. Right. That said, I don't view this as reason to exclude the warning from -Wextra. The vendor-provided compilers are obviously customized versions of GCC with their own features, including their own set of options and warnings. It shouldn't stop us from enabling those we expect to be useful to the typical GCC user on typical targets, and especially those that can help expose sources of common bugs. Recognizing I'm preaching to choir here: Calling alloca with any argument is considered dangerous enough to be discouraged in most man pages. Alloca(0) is a potentially dangerous corner case of an already dangerous API. It seems at least as problematic as any of the warnings already in -Wextra, and I'd say as many in -Wall. Right. We're well in the "living dangerously" space. But I could claim that on one of these targets, warning about an alloca (0) would likely result in a developer removing it or forcing it to allocate some small amount of space to shut up the warning. That in turn runs the risk of making the target code more prone to heap exhaustion and possibly less secure, particularly if the developer isn't aware of the special nature of alloca (0). Even on systems with this unusual alloca implementation, since alloca(0) is special (and expensive), warning on such calls would likely be useful. In fact, since calling alloca(0) in these environments is actually important, warning on them coukd help detect their unintended absence. The few such calls that are intended can be easily tweaked to suppress the warning. It certainly is special and often expensive. But I'd come back to the likely behavior of the developer. I suspect unless there's a comment indicating the alloca (0) is intentional, they're likely to just remove it. So I see both sides here and I'm not sure what the best path forward should be. jeff
Re: [PATCH] enable -fprintf-return-value by default
On 11/08/2016 08:13 PM, Martin Sebor wrote: The -fprintf-return-value optimization has been disabled since the last time it caused a bootstrap failure on powerpc64le. With the underlying problems fixed GCC has bootstrapped fine on all of powerpc64, powerpc64le and x86_64 and tested with no regressions. I'd like to re-enable the option. The attached patch does that. Thanks Martin gcc-fprintf-return-value.diff gcc/c-family/ChangeLog: * c.opt (-fprintf-return-value): Enable by default. gcc/ChangeLog: * doc/invoke.texi (-fprintf-return-value): Document that option is enabled by default. OK once you and Sandra are in-sync on the doc changes. jeff
Re: [Patch v4 0/17] Add support for _Float16 to AArch64 and ARM
On Fri, Nov 11, 2016 at 03:37:17PM +, James Greenhalgh wrote: > > Hi, > > This patch set enables the _Float16 type specified in ISO/IEC TS 18661-3 > for AArch64 and ARM. The patch set has been posted over the past two months, > with many of the target-independent changes approved. I'm reposting it in > entirity in the form I hope to commit it to trunk. > > The patch set can be roughly split in three; first, hookization of > TARGET_FLT_EVAL_METHOD, and changes to the excess precision logic in the > compiler to handle the new values for FLT_EVAL_METHOD defined in > ISO/IEC TS-18661-3. Second, the AArch64 changes required to enable _Float16, > and finally the ARM changes required to enable _Float16. > > The broad goals and an outline of each patch in the patch set were > described in https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02383.html . > As compared to the original submission, the patch set has grown an ARM > port, and has had several rounds of technical review on the target > independent aspects. > > This has resulted in many of the patches already being approved, a full > summary of the status of each ticket is immediately below. > > Clearly the focus for review of this patch set now needs to be the AArch64 > and ARM ports, I hope the appropriate maintainers will be able to do so in > time for the patch set to be accepted for GCC 7. > > I've built and tested the full patch set on ARM (cross and native), > AArch64 (cross and native) and x86_64 (native) with no identified issues. All the target independent changes are now approved, and all the ARM patches have been approved (two are conditional on minor changes). I'd like to apply the target independent and ARM changes to trunk, while I wait for approval of the AArch64 changes. The changes for the two ports should be independent. Would that be acceptable, or would you prefer me to wait for review of the AArch64 changes? I will then send a follow-up patch for doc/extend.texi detailing the availability of _Float16 on ARM (I'm holding off on doing this until I know which order the ARM and AArch64 parts will go in). Thanks, James > -- > Target independent changes > > 10 patches, 9 previously approved, 1 New implementing testsuite > changes to enable _Float16 tests in more circumstances on ARM. > -- > > [Patch 1/17] Add a new target hook for describing excess precision intentions > > Approved: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00781.html > > [Patch 2/17] Implement TARGET_C_EXCESS_PRECISION for i386 > > Blanket approved by Jeff in: > https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html > > [Patch 3/17] Implement TARGET_C_EXCESS_PRECISION for s390 > > Approved: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01554.html > > [Patch 4/17] Implement TARGET_C_EXCESS_PRECISION for m68k > > Blanket approved by Jeff in: > https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02402.html > And by Andreas: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02414.html > > There was a typo in the original patch, fixed in: > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01173.html > which I would apply as an "obvious" fix to the original patch. > > [Patch 5/17] Add -fpermitted-flt-eval-methods=[c11|ts-18661-3] > > Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02405.html > > Joseph had a comment in > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00335.html that the tests > should check FLT_EVAL_METHOD from rather than > __FLT_EVAL_METHOD__. Rather than implement that suggestion, I added tests > to patch 6 which tested the macro, and left the tests in this > patch testing the internal macro. > > [Patch 6/17] Migrate excess precision logic to use TARGET_EXCESS_PRECISION > > Approved (after removing a rebase bug): > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00231.html > > [Patch 7/17] Delete TARGET_FLT_EVAL_METHOD and poison it. > > Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02401.html > > [Patch 8/17] Make _Float16 available if HFmode is available > > Approved: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02403.html > > [Patch libgcc 9/17] Update soft-fp from glibc > > Self approved under policy that we can update libraries which GCC mirrors > without further approval. > > [Patch testsuite patch 10/17] Add options for floatN when checking effective > target for support > > NEW! > > > AArch64 changes > > 3 patches, none reviewed > > > [Patch AArch64 11/17] Add floatdihf2 and floatunsdihf2 patterns > > Not reviewed, last pinged (^6): > https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00584.html > > [Patch libgcc AArch64 12/17] Enable hfmode soft-float conversions and > truncations > > Not reviewed: > https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02395.html > > [Patch AArch64 13/17] Enable _Float16 for AArch64 > > Not reviewed: >
libgo patch committed: Move sched variable from C to Go
This patch simply moves the schedt (aka Sched) type and the sched variable from C to Go. This is a step toward further changes. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 242594) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -fc4ca600b2fc6de81fd3c4014542d6a50593db1a +bf4762823c4543229867436399be3ae30b4d13bb The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/runtime/runtime2.go === --- libgo/go/runtime/runtime2.go(revision 242581) +++ libgo/go/runtime/runtime2.go(working copy) @@ -550,9 +550,6 @@ const ( _MaxGomaxprocs = 1 << 8 ) -/* -Commented out for gccgo for now. - type schedt struct { // accessed atomically. keep at top to ensure alignment on 32-bit systems. goidgen uint64 @@ -578,18 +575,17 @@ type schedt struct { runqsize int32 // Global cache of dead G's. - gflock mutex - gfreeStack *g - gfreeNoStack *g - ngfree int32 + gflock mutex + gfree *g + ngfree int32 // Central cache of sudog structs. sudoglock mutex sudogcache *sudog - // Central pool of available defer structs of different sizes. + // Central pool of available defer structs. deferlock mutex - deferpool [5]*_defer + deferpool *_defer gcwaiting uint32 // gc is waiting to run stopwait int32 @@ -608,7 +604,6 @@ type schedt struct { procresizetime int64 // nanotime() of last change to gomaxprocs totaltime int64 // ∫gomaxprocs dt up to procresizetime } -*/ // The m.locked word holds two pieces of state counting active calls to LockOSThread/lockOSThread. // The low bit (LockExternal) is a boolean reporting whether any LockOSThread call is active. @@ -772,8 +767,10 @@ var ( ncpu int32 -// forcegc forcegcstate -// sched schedt + // forcegc forcegcstate + + sched schedt + // newprocsint32 // Information about what cpu features are available. Index: libgo/go/runtime/stubs.go === --- libgo/go/runtime/stubs.go (revision 242581) +++ libgo/go/runtime/stubs.go (working copy) @@ -520,3 +520,9 @@ func dumpregs(*_siginfo_t, unsafe.Pointe // Temporary for gccgo until we port panic.go. func startpanic() + +// Temporary for gccgo until we port proc.go. +//go:linkname getsched runtime.getsched +func getsched() *schedt { + return +} Index: libgo/runtime/proc.c === --- libgo/runtime/proc.c(revision 242581) +++ libgo/runtime/proc.c(working copy) @@ -351,48 +351,18 @@ runtime_mcall(void (*pfn)(G*)) // // Design doc at http://golang.org/s/go11sched. -typedef struct Sched Sched; -struct Sched { - Lock; - - uint64 goidgen; - M* midle; // idle m's waiting for work - int32 nmidle; // number of idle m's waiting for work - int32 nmidlelocked; // number of locked m's waiting for work - int32 mcount; // number of m's that have been created - int32 maxmcount; // maximum number of m's allowed (or die) - - P* pidle; // idle P's - uint32 npidle; - uint32 nmspinning; - - // Global runnable queue. - G* runqhead; - G* runqtail; - int32 runqsize; - - // Global cache of dead G's. - Lockgflock; - G* gfree; - - uint32 gcwaiting; // gc is waiting to run - int32 stopwait; - Notestopnote; - uint32 sysmonwait; - Notesysmonnote; - uint64 lastpoll; - - int32 profilehz; // cpu profiling rate -}; +typedef struct schedt Sched; enum { - // Number of goroutine ids to grab from runtime_sched.goidgen to local per-P cache at once. + // Number of goroutine ids to grab from runtime_sched->goidgen to local per-P cache at once. // 16 seems to provide enough amortization, but other than that it's mostly arbitrary number. GoidCacheBatch = 16, }; -Sched runtime_sched; +extern Sched* runtime_getsched() __asm__ (GOSYM_PREFIX "runtime.getsched"); + +static Sched* runtime_sched; int32 runtime_gomaxprocs; uint32 runtime_needextram = 1; M runtime_m0; @@ -471,6 +441,8 @@ runtime_schedinit(void) const byte *p; Eface i; + runtime_sched = runtime_getsched(); + m = _m0; g = _g0; m->g0 = g; @@ -479,7 +451,7 @@ runtime_schedinit(void) initcontext(); - runtime_sched.maxmcount = 1; +
Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)
On Nov 18, 2016, at 2:45 AM, Rainer Orthwrote: > So the current suggestion is to combine my fixincludes patch and Jack's > patch to disable use if !__BLOCKS__. > I guess this is ok for mainline now to restore bootstrap? I think we are down to everyone likes this, Ok. The big question is, does this need a back port? If you fix includes virtual members or data members of C/C++ classes, just be careful disappearing content as that can change the layout of the structure or class.
Re: PING [PATCH] enable -fprintf-return-value by default
On 11/18/2016 09:01 AM, Martin Sebor wrote: On 11/17/2016 10:34 PM, Sandra Loosemore wrote: On 11/16/2016 09:49 AM, Martin Sebor wrote: I'm looking for an approval of the attached patch. I've adjusted the documentation based on Sandra's input (i.e., documented the negative of the option rather than the positive; thank you for the review, btw.) On 11/08/2016 08:13 PM, Martin Sebor wrote: The -fprintf-return-value optimization has been disabled since the last time it caused a bootstrap failure on powerpc64le. With the underlying problems fixed GCC has bootstrapped fine on all of powerpc64, powerpc64le and x86_64 and tested with no regressions. I'd like to re-enable the option. The attached patch does that. [snip] Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi(revision 242500) +++ gcc/doc/invoke.texi(working copy) @@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}. -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol -fomit-frame-pointer -foptimize-sibling-calls @gol -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol --fprefetch-loop-arrays -fprintf-return-value @gol +-fprefetch-loop-arrays -fno-printf-return-value @gol -fprofile-correction @gol -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol -fprofile-reorder-functions @gol Please keep this list alphabetized -- the other "-fno-*" options are sorted as such. The documentation parts of the patch are OK with that fixed, but I can't approve changing the default for the option. I kept the option in the same place on the assumption that it was the "printf" radix of the name, not the "no-" prefix, that these were alphabetized by. But now that you point it out and I've looked more carefully at some of the other options, I see that in some sections they are listed strictly alphabetically (-fno-foo after -fmoo) while in others it's the way you suggest. AFAICS, the former style looks prevalent in the C++ Language Options and the in Warning Options, for example. The latter seems to be more popular in the Optimization Options section (though there are counterexamples). I'm happy to follow either of these conventions as long as its consistent. Otherwise, without both kinds of examples to choose from, I don't think I can trust my brain to remember yet another rule. Well, how about the rule is that you look at the convention of the specific list you are adding something to? At least that way we retain local consistency and don't mess up those parts of the documentation that we have already tried to organize in some way. There's so much material in the command-line options chapter that it would be hard to figure out how to present it even if the content were completely static, much less when people are adding/renaming/modifying options all the time. -Sandra
Re: Import libcilkrts Build 4467 (PR target/68945)
2016-11-17 20:01 GMT+03:00 Jeff Law: > On 11/17/2016 09:56 AM, Ilya Verbin wrote: >> >> 2016-11-17 18:50 GMT+03:00 Rainer Orth : >>> >>> Rainer Orth writes: >>> I happened to notice that my libcilkrts SPARC port has been applied upstream. So to reach closure on this issue for the GCC 7 release, I'd like to import upstream into mainline which seems to be covered by the free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even though https://gcc.gnu.org/codingconventions.html#upstream lists nothing specific and we have no listed maintainer. >>> >>> >>> I initially used Ilya's intel.com address, which bounced. Now using the >>> current address listed in MAINTAINERS... >> >> >> Yeah, I don't work for Intel anymore. And I'm not a libcilkrts >> maintainer, so I can't approve it. > > Do you want to be? IIRC I was going to nominate you, but held off knowing > your situation was going to change. > > If you're interested in maintainer positions, I can certainly still nominate > you. I have little experience with this library, and no longer have a connection with Cilk developers an Intel, so I'm not interested. -- Ilya
Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.
On Fri, Nov 18, 2016 at 05:19:31PM +, Iain Sandoe wrote: > I’d like to do that; is there a way to force a jump table for a limited set > of cases? > (the example was about the smallest I could get where GCC elected to produce > a jump table instead of branches) --param case-values-threshold ? Segher
Re: [PATCH] avoid calling alloca(0)
On Fri, Nov 18, 2016 at 10:14:09AM -0700, Martin Sebor wrote: > Because people make mistakes and warnings help us avoid them (isn't > that obvious?) Just because we get it right most of the time doesn't > mean we get right all of the time. The papers and exploits I pointed > to should provide ample evidence that zero allocations are a problem > that can have serious (and costly) consequences. We (i.e., Red Hat) > recognize this risk and have made it our goal to help stem some of > these problems. Are you talking about cases where user writes malloc (0) explicitly, or where user writes malloc (n); and the optimizers figure out n is 0, either always, or e.g. because the compiler decided to duplicate the code and and in one of the branches it proves it is 0, or you want to add a runtime warning when malloc is called with 0? E.g. I don't see how writing malloc (0) in the code should have anything to do with any overflows. The cases where jump threading creates cases where we call functions with constant arguments and we then warn on them is also problematic, often such code is even dead except the compiler is not able to prove it. > >It IMHO doesn't belong to either of these. > > You've made that quite clear. I struggle to reconcile your > position in this case with the one in debate about the > -Wimplicit-fallthrough option where you insisted on the exact > opposite despite the overwhelming number of false positives > caused by it. Why is it appropriate for that option to be in > -Wextra and not this one? It also matters what one can do to tell the compiler the code is fine. For e.g. -Wimplicit-fallthrough and many other warnings one can add a comment, or attribute, etc. to get the warning out of the way. But the patch you've posted for the alloca (0) stuff contained changes of n to n + !n, so the warning basically asks people to slow down their code for no reason, just to get the warning out of the way. Jakub
Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes
Hi Christophe, On 11/18/2016 03:03 PM, Christophe Lyon wrote: Hi, Part of the new testcase added with this commit fails on arm* targets: g++.dg/pr78112.C scan-assembler-times DW_AT_object_pointer 37 Thank you for the report. As I said on the bugzilla (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78112#c17), I reproduce inconsistent results on this even on a very reduced testcase. So I wonder if I should just remove this testcase. -- Pierre-Marie de Rodat
Re: [fixincludes] Fix macOS 10.12 and (PR sanitizer/78267)
I think restoring bootstrap is likely a good thing. On Fri, Nov 18, 2016 at 2:45 AM, Rainer Orthwrote: > > I guess this is ok for mainline now to restore bootstrap?
Re: [PATCH] DWARF: make signedness explicit for enumerator const values
On 11/14/2016 01:05 PM, Mark Wielaard wrote: You can either choose a signed/unsigned form not giving the consumer a hint about the size of the underlying constant value or one of the sized data forms that don't give a hint about the value representation/signedness. So in both cases the consumer looses without an actual (base) type telling them how to interpret the constant. I see, thank you for the explanation! :-) I agree with you, so I’ll revise to instead add a DW_AT_type attribute to enumeration DIEs to point to a base type. -- Pierre-Marie de Rodat
Re: [PATCH v2][PR libgfortran/78314] Fix ieee_support_halting
On 16/11/16 16:53, Szabolcs Nagy wrote: > ieee_support_halting only checked the availability of status > flags, not trapping support. On some targets the later can > only be checked at runtime: feenableexcept reports if > enabling traps failed. > > So check trapping support by enabling/disabling it. > > Updated the test that enabled trapping to check if it is > supported. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. > > gcc/testsuite/ > 2016-11-16 Szabolcs Nagy> > PR libgfortran/78314 > * gfortran.dg/ieee/ieee_6.f90: Use ieee_support_halting. > > libgfortran/ > 2016-11-16 Szabolcs Nagy > > PR libgfortran/78314 > * config/fpu-glibc.h (support_fpu_trap): Use feenableexcept. > it seems this broke ieee_8.f90 which tests compile time vs runtime value of ieee_support_halting if fortran needs this, then support_halting should be always false on arm and aarch64. but i'm not familiar enough with fortran to tell if there is some better workaround.
Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.
Hi Mike, > On 18 Nov 2016, at 17:16, Mike Stumpwrote: > > On Nov 18, 2016, at 4:33 AM, Iain Sandoe wrote: >> >> As discussed on IRC, I was under the impression that it is desired to move >> away from #ifdef towards if() and I have been adding those where locally >> things have been touched - in this case it was only partially possible. >> >> However, FAOD - you pointed out that for the RS6000 back-end #ifdef are >> still preferred, > > Shudder. We can encourage anyone that likes #if, to like if () instead. > >> OK now for trunk? > > Ok; I'm pretty sure that change can only impact darwin. If you wanted to > reduce the test case to 3 cases, I think that would also show the problem > that show that your patch fixes it, ok with such a change, if you want. I’d like to do that; is there a way to force a jump table for a limited set of cases? (the example was about the smallest I could get where GCC elected to produce a jump table instead of branches) Iain > >> Open branches? > > I'm fine with back porting. >
Re: [PATCH, Darwin] Fix PR57438 by avoiding empty function bodies and trailing labels.
On Nov 18, 2016, at 4:33 AM, Iain Sandoewrote: > > As discussed on IRC, I was under the impression that it is desired to move > away from #ifdef towards if() and I have been adding those where locally > things have been touched - in this case it was only partially possible. > > However, FAOD - you pointed out that for the RS6000 back-end #ifdef are still > preferred, Shudder. We can encourage anyone that likes #if, to like if () instead. > OK now for trunk? Ok; I'm pretty sure that change can only impact darwin. If you wanted to reduce the test case to 3 cases, I think that would also show the problem that show that your patch fixes it, ok with such a change, if you want. > Open branches? I'm fine with back porting.
Re: [PATCH] avoid calling alloca(0)
On 11/18/2016 09:29 AM, Jakub Jelinek wrote: On Fri, Nov 18, 2016 at 09:21:38AM -0700, Martin Sebor wrote: In the libiberty/alloca.c, I don't see anything special about alloca (0). Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035). alloca (0) just returns NULL after it. The diffutils link is the same alloca.c as in libiberty. Returning NULL or returning a non-unique pointer for alloca (0) is just fine, it is the same thing as returning NULL or returning a non-unique pointer from malloc (0). We definitely don't want to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), because it behaves the same. I disagree. At a minimum, the return value of malloc(0) is implementation-defined and so relying on it being either null or non-null is non-portable and can be a source of subtle bugs. Most apps know what malloc (0) means and treat it correctly, they know they shouldn't dereference the pointer, because it is either NULL or holds an array with 0 elements. I fail to see why you would want to warn. Because people make mistakes and warnings help us avoid them (isn't that obvious?) Just because we get it right most of the time doesn't mean we get right all of the time. The papers and exploits I pointed to should provide ample evidence that zero allocations are a problem that can have serious (and costly) consequences. We (i.e., Red Hat) recognize this risk and have made it our goal to help stem some of these problems. But malloc(0) has also been known to result from unsigned overflow which has led to vulnerabilities and exploits (famously by the JPG COM vulnerability exploit, but there are plenty of others, including for instance CVE-2016-2851). Much academic research has been devoted to this problem and to solutions to detect and prevent it. How is it different from overflowing to 1 or 2 or 27? What is special on the value 0? It's a case that, unlike the others, can be readily detected. It would be nice to detect the others as well but GCC can't do that yet. That doesn't mean we shouldn't try to detect at least the small subset for now. It doesn't have to be perfect for it to be useful. In the absence of a policy or guidelines it's a matter of opinion whether or not this warning belongs in either -Wall or -Wextra. It IMHO doesn't belong to either of these. You've made that quite clear. I struggle to reconcile your position in this case with the one in debate about the -Wimplicit-fallthrough option where you insisted on the exact opposite despite the overwhelming number of false positives caused by it. Why is it appropriate for that option to be in -Wextra and not this one? Martin
Patch ping
Hi! I'd like to ping 2 patches: http://gcc.gnu.org/ml/gcc-patches/2016-11/msg01074.html - C++ ABI - mangling of TLS aux symbols; either the posted patch or one with if (abi_version_at_least (11)) http://gcc.gnu.org/ml/gcc-patches/2016-11/msg00351.html - DWARF Solaris bootstrap fix Jakub
Re: [PATCH] Fix combine's make_extraction (PR rtl-optimization/78378)
Hi, On Wed, 16 Nov 2016, Jakub Jelinek wrote: > If inner is a MEM, make_extraction requires that pos is a multiple of > bytes and deals with offsetting it. Or otherwise requires that pos is a > multiple of BITS_PER_WORD and for REG inner it handles that too. But if > inner is something different, it calls just force_to_mode to the target > mode, which only really works if pos is 0. This should also fix the aarch64 fail for gcc.c-torture/execute/cbrt.c . (At least I came up with the same patch in PR78390) Ciao, Michael.
Re: [PATCH, Fortran, pr78395, v1] [OOP] error on polymorphic assignment
Hi Andre, Btw, when using the in gcc-7 available polymorphic assign, then v6 is actually auto-allocated and the program runs fine. So what are your opinions on the auto-allocation issue? It is my experience that such questions can speedily and correctly be answered by the regulars on comp.lang.fortran. I have no opinion either way :-) Regards Thomas
[PATCH, Fortran, pr78395, v1] [OOP] error on polymorphic assignment
Hi all, attached patch fixes the issue which was given by nesting calls to typebound procedures. The expression of the inner typebound procedure call was resolved correctly, but in the case of it's having a class type the ref-list was discarded. Leaving the list of references untouched, resolves the wrong error-message and generates correct code. When checking the shortened example in comment #3 one gets a segfault, because v6 is not allocated explicitly. The initial example made sure, that v6 was allocated. Reading through the standard, I did not find, whether the auto-allocation is applicable here. I therefore have extended the testcase by an allocate(v6). Dominique pointed out, that there are already some prs for adding an on-demand -fcheck=something runtime check for not allocated objects. But that does not solve the question, whether v6 should be auto-allocated when assigned by a typebound-procedure (ifort and cray need v6 allocated do, i.e., they don't auto-allocate). Btw, when using the in gcc-7 available polymorphic assign, then v6 is actually auto-allocated and the program runs fine. So what are your opinions on the auto-allocation issue? This patch fixes the wrong error messages in both gcc-7 and gcc-6. Bootstraped and regtested on x86_64-linux/F23 for gcc-7 and -6. Ok for trunk and gcc-6? Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de gcc/testsuite/ChangeLog: 2016-11-18 Andre VehreschildPR fortran/78395 * gfortran.dg/typebound_operator_21.f03: New test. gcc/fortran/ChangeLog: 2016-11-18 Andre Vehreschild PR fortran/78395 * resolve.c (resolve_typebound_function): Prevent stripping of refs, when the base-expression is a class' typed one. diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index 825bb12..589a673 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -6140,7 +6140,7 @@ resolve_typebound_function (gfc_expr* e) gfc_free_ref_list (class_ref->next); class_ref->next = NULL; } - else if (e->ref && !class_ref) + else if (e->ref && !class_ref && expr->ts.type != BT_CLASS) { gfc_free_ref_list (e->ref); e->ref = NULL; diff --git a/gcc/testsuite/gfortran.dg/typebound_operator_21.f03 b/gcc/testsuite/gfortran.dg/typebound_operator_21.f03 new file mode 100644 index 000..ea374a1 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/typebound_operator_21.f03 @@ -0,0 +1,78 @@ +! { dg-do run } +! +! Test that pr78395 is fixed. +! Contributed by Chris and Janus Weil + +module types_mod + implicit none + + type, public :: t1 +integer :: a + contains +procedure :: get_t2 + end type + + type, public :: t2 +integer :: b + contains +procedure, pass(rhs) :: mul2 +procedure :: assign +generic :: operator(*) => mul2 +generic :: assignment(=) => assign + end type + +contains + + function get_t2(this) +class(t1), intent(in) :: this +class(t2), allocatable :: get_t2 +type(t2), allocatable :: local +allocate(local) +local%b = this%a +call move_alloc(local, get_t2) + end function + + function mul2(lhs, rhs) +class(t2), intent(in) :: rhs +integer, intent(in) :: lhs +class(t2), allocatable :: mul2 +type(t2), allocatable :: local +allocate(local) +local%b = rhs%b*lhs +call move_alloc(local, mul2) + end function + + subroutine assign(this, rhs) +class(t2), intent(out) :: this +class(t2), intent(in) :: rhs +select type(rhs) +type is(t2) + this%b = rhs%b +class default + error stop +end select + end subroutine + +end module + + +program minimal + use types_mod + implicit none + + class(t1), allocatable :: v4 + class(t2), allocatable :: v6 + + allocate(v4, source=t1(4)) + allocate(v6) + v6 = 3 * v4%get_t2() + + select type (v6) +type is (t2) + if (v6%b /= 12) error stop +class default + error stop + end select + deallocate(v4, v6) +end +
Re: [PATCH PR78114]Refine gfortran.dg/vect/fast-math-mgrid-resid.f
On Fri, Nov 18, 2016 at 4:52 PM, Michael Matzwrote: > Hi, > > On Thu, 17 Nov 2016, Bin.Cheng wrote: > >> B) Depending on ilp, I think below test strings fail for long time with >> haswell: >> ! { dg-final { scan-tree-dump-times "Executing predictive commoning >> without unrolling" 1 "pcom" { target lp64 } } } >> ! { dg-final { scan-tree-dump-times "Executing predictive commoning >> without unrolling" 2 "pcom" { target ia32 } } } >> Because vectorizer choose vf==4 in this case, and there is no >> predictive commoning opportunities at all. >> Also the newly added test string fails in this case too because the >> prolog peeled iterates more than 1 times. > > Btw, this probably means that on haswell (or other archs with vf==4) mgrid > is slower than necessary. On mgrid you really really want predictive > commoning to happen. Vectorization isn't that interesting there. Interesting, I will check if there is difference between 2/4 vf. we do have cases that smaller vf is better and should be chosen, though for different reasons. Thanks, bin > > > Ciao, > Michael.
Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker
On Nov 18, 2016, at 3:13 AM, Iain Sandoewrote: > > Thanks, at least I’m not going completely crazy ;-) I'll just note for completeness that Jeff also couldn't explain a failure of your latest patch. If you run into one, let me know. > OK now for trunk? Ok. > open branches? Ok.
[PATCH GCC]Simplify (cond (cmp (convert? x) c1) (op x c2) c3) -> (op (minmax x c1) c2)
Hi, This is a rework of https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02007.html. Though review comments suggested it could be merged with last kind simplification of fold_cond_expr_with_comparison, it's not really applicable. As a matter of fact, the suggestion stands for patch @https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02005.html. I had previous patch (https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01898.html) moving fold_cond_expr_with_comparison to match.pd pattern and incorporated that patch into it. For this one, the rework is trivial, just renames several variable tags as suggested. Bootstrap and test on x86_64 and AArch64, is it OK? Thanks, bin 2016-11-17 Bin Cheng* match.pd: Add new pattern: (cond (cmp (convert? x) c1) (op x c2) c3) -> (op (minmax x c1) c2). gcc/testsuite/ChangeLog 2016-11-17 Bin Cheng * gcc.dg/fold-bopcond-1.c: New test. * gcc.dg/fold-bopcond-2.c: New test.diff --git a/gcc/match.pd b/gcc/match.pd index a8d94de..e47f77a 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -2012,6 +2012,107 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (code == EQ_EXPR) (cond (cmp @1 (convert:from_type @3)) (convert:from_type @3) @2))) +/* (cond (cmp (convert? x) c1) (op x c2) c3) -> (op (minmax x c1) c2) if: + + 1) OP is PLUS or MINUS. + 2) CMP is LT, LE, GT or GE. + 3) C3 == (C1 op C2), and computation doesn't have undefined behavior. + + This pattern also handles special cases like: + + A) Operand x is a unsigned to signed type conversion and c1 is + integer zero. In this case, + (signed type)x < 0 <=> x > MAX_VAL(signed type) + (signed type)x >= 0 <=> x <= MAX_VAL(signed type) + B) Const c1 may not equal to (C3 op' C2). In this case we also + check equality for (c1+1) and (c1-1) by adjusting comparison + code. + + TODO: Though signed type is handled by this pattern, it cannot be + simplified at the moment because C standard requires additional + type promotion. In order to match it here, the IR needs + to be cleaned up by other optimizers, i.e, VRP. */ +(for op (plus minus) + (for cmp (lt le gt ge) + (simplify + (cond (cmp (convert? @X) INTEGER_CST@1) (op @X INTEGER_CST@2) INTEGER_CST@3) + (with { tree from_type = TREE_TYPE (@X), to_type = TREE_TYPE (@1); } +(if (types_match (from_type, to_type) +/* Check if it is special case A). */ +|| (TYPE_UNSIGNED (from_type) +&& !TYPE_UNSIGNED (to_type) +&& TYPE_PRECISION (from_type) == TYPE_PRECISION (to_type) +&& integer_zerop (@1) +&& (cmp == LT_EXPR || cmp == GE_EXPR))) + (with + { + bool overflow = false; + enum tree_code code, cmp_code = cmp; + tree real_c1, c1 = @1, c2 = @2, c3 = @3; + tree op_type = TREE_TYPE (@X); + signop sgn = TYPE_SIGN (op_type); + widest_int wmin = widest_int::from (wi::min_value (op_type), sgn); + widest_int wmax = widest_int::from (wi::max_value (op_type), sgn); + + /* Handle special case A), given x of unsigned type: + ((signed type)x < 0) <=> (x > MAX_VAL(signed type)) + ((signed type)x >= 0) <=> (x <= MAX_VAL(signed type)) */ + if (!types_match (from_type, to_type)) + { + if (cmp_code == LT_EXPR) + cmp_code = GT_EXPR; + if (cmp_code == GE_EXPR) + cmp_code = LE_EXPR; + c1 = wide_int_to_tree (op_type, wi::max_value (to_type)); + } + /* To simplify this pattern, we require c3 = (c1 op c2). Here we + compute (c3 op' c2) and check if it equals to c1 with op' being + the inverted operator of op. Make sure overflow doesn't happen + if it is undefined. */ + if (op == PLUS_EXPR) + real_c1 = wide_int_to_tree (op_type, + wi::sub (c3, c2, sgn, )); + else + real_c1 = wide_int_to_tree (op_type, + wi::add (c3, c2, sgn, )); + code = cmp_code; + if (!overflow || !TYPE_OVERFLOW_UNDEFINED (op_type)) + { + /* Check if c1 equals to real_c1. Boundary condition is handled + by adjusting comparison operation if necessary. */ + if (wi::to_widest (c1) == (wi::to_widest (real_c1) - 1)) + { + /* X <= Y - 1 equals to X < Y. */ + if (cmp_code == LE_EXPR) + code = LT_EXPR; + /* X > Y - 1 equals to X >= Y. */ + if (cmp_code == GT_EXPR) + code = GE_EXPR; + } + if (wi::to_widest (c1) == (wi::to_widest (real_c1) + 1)) + { + /* X < Y + 1 equals to X <= Y. */ + if (cmp_code == LT_EXPR) + code = LE_EXPR; + /* X >= Y + 1 equals to X > Y. */ + if (cmp_code ==
[PATCH GCC]Move simplification of (A == C1) ? A : C2 to match.pd
Hi, This is a follow up patch for https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01898.html It moves remaining simplification for (A == C1) ? A : C2 in fold_cond_expr_with_comparison to match.pd. Bootstrap and test on x86_64 and AArch64, is it OK? Thanks, bin 2016-11-17 Bin Cheng* fold-const.c (fold_cond_expr_with_comparison): Move simplification for A == C1 ? A : C2 to below. * match.pd: Move from above to here: (cond (eq (convert1? x) c1) (convert2? x) c2) -> (cond (eq x c1) c1 c2).diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 1e61ccf..1877dac 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -5210,19 +5210,6 @@ fold_cond_expr_with_comparison (location_t loc, tree type, } } - /* If this is A == C1 ? A : C2 with C1 and C2 constant integers, - we simplify it into A == C1 ? C1 : C2. */ - - if (comp_code == EQ_EXPR - && INTEGRAL_TYPE_P (type) - && TREE_CODE (arg01) == INTEGER_CST - && TREE_CODE (arg1) != INTEGER_CST - && TREE_CODE (arg2) == INTEGER_CST) -{ - arg1 = fold_convert_loc (loc, type, arg01); - return fold_build3_loc (loc, COND_EXPR, type, arg0, arg1, arg2); -} - return NULL_TREE; } diff --git a/gcc/match.pd b/gcc/match.pd index 4beac4e..a8d94de 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1939,15 +1939,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Simplification moved from fold_cond_expr_with_comparison. It may also be extended. */ -/* (cond (cmp (convert1? x) c1) (convert2? x) c2) -> (minmax (x c)) if: +/* This pattern implements two kinds simplification: + + Case 1) + (cond (cmp (convert1? x) c1) (convert2? x) c2) -> (minmax (x c)) if: 1) Conversions are type widening from smaller type. 2) Const c1 equals to c2 after canonicalizing comparison. 3) Comparison has tree code LT, LE, GT or GE. This specific pattern is needed when (cmp (convert x) c) may not be simplified by comparison patterns because of multiple uses of x. It also makes sense here because simplifying across multiple - referred var is always benefitial for complicated cases. */ -(for cmp (lt le gt ge) + referred var is always benefitial for complicated cases. + + Case 2) + (cond (eq (convert1? x) c1) (convert2? x) c2) -> (cond (eq x c1) c1 c2). */ +(for cmp (lt le gt ge eq) (simplify (cond (cmp@0 (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2) (with @@ -1966,37 +1972,45 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && (TYPE_UNSIGNED (from_type) || TYPE_SIGN (c2_type) == TYPE_SIGN (from_type) { -if (wi::to_widest (@3) == (wi::to_widest (@2) - 1)) - { -/* X <= Y - 1 equals to X < Y. */ -if (cmp_code == LE_EXPR) - code = LT_EXPR; -/* X > Y - 1 equals to X >= Y. */ -if (cmp_code == GT_EXPR) - code = GE_EXPR; - } -if (wi::to_widest (@3) == (wi::to_widest (@2) + 1)) - { -/* X < Y + 1 equals to X <= Y. */ -if (cmp_code == LT_EXPR) - code = LE_EXPR; -/* X >= Y + 1 equals to X > Y. */ -if (cmp_code == GE_EXPR) - code = GT_EXPR; - } -if (code != cmp_code || wi::to_widest (@2) == wi::to_widest (@3)) +if (code != EQ_EXPR) { -if (cmp_code == LT_EXPR || cmp_code == LE_EXPR) - code = MIN_EXPR; -if (cmp_code == GT_EXPR || cmp_code == GE_EXPR) - code = MAX_EXPR; +if (wi::to_widest (@3) == (wi::to_widest (@2) - 1)) + { +/* X <= Y - 1 equals to X < Y. */ +if (cmp_code == LE_EXPR) + code = LT_EXPR; +/* X > Y - 1 equals to X >= Y. */ +if (cmp_code == GT_EXPR) + code = GE_EXPR; + } +if (wi::to_widest (@3) == (wi::to_widest (@2) + 1)) + { +/* X < Y + 1 equals to X <= Y. */ +if (cmp_code == LT_EXPR) + code = LE_EXPR; +/* X >= Y + 1 equals to X > Y. */ +if (cmp_code == GE_EXPR) + code = GT_EXPR; + } +if (code != cmp_code || wi::to_widest (@2) == wi::to_widest (@3)) + { +if (cmp_code == LT_EXPR || cmp_code == LE_EXPR) + code = MIN_EXPR; +if (cmp_code == GT_EXPR || cmp_code == GE_EXPR) + code = MAX_EXPR; + } } +/* Can do A == C1 ? A : C2 -> A == C1 ? C1 : C2? */ +else if (!int_fits_type_p (@3, from_type)) + code = ERROR_MARK; } } (if (code == MAX_EXPR) (convert (max @1 (convert:from_type @2))) (if (code == MIN_EXPR) - (convert (min @1 (convert:from_type @2 +
Re: [PATCH PR78114]Refine gfortran.dg/vect/fast-math-mgrid-resid.f
Hi, On Thu, 17 Nov 2016, Bin.Cheng wrote: > B) Depending on ilp, I think below test strings fail for long time with > haswell: > ! { dg-final { scan-tree-dump-times "Executing predictive commoning > without unrolling" 1 "pcom" { target lp64 } } } > ! { dg-final { scan-tree-dump-times "Executing predictive commoning > without unrolling" 2 "pcom" { target ia32 } } } > Because vectorizer choose vf==4 in this case, and there is no > predictive commoning opportunities at all. > Also the newly added test string fails in this case too because the > prolog peeled iterates more than 1 times. Btw, this probably means that on haswell (or other archs with vf==4) mgrid is slower than necessary. On mgrid you really really want predictive commoning to happen. Vectorization isn't that interesting there. Ciao, Michael.
Re: [PATCH] substring_loc info needs default track-macro-expansion (PR preprocessor/78324)
Martin: are the changes to your test cases OK by you, or is there a better way to rewrite them? Thanks for looking into it! Since the purpose of the test_sprintf_note function in the test is to verify the location of the caret within the warnings I think we should keep it if it's possible. Would either removing the P macro or moving the function to a different file that doesn't use the -ftrack-macro-expansion=0 option work? Martin diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c index 5779a95..b6a6011 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c @@ -181,13 +181,13 @@ void test_sprintf_note (void) /* Diagnostic column numbers are 1-based. */ P (buffer (0),/* { dg-message "format output 4 bytes into a destination of size 0" } */ - "%c%s%i", '1', "2", 3);/* { dg-warning "7:.%c. directive writing 1 byte into a region of size 0" } */ + "%c%s%i", '1', "2", 3);/* { dg-warning ".%c. directive writing 1 byte into a region of size 0" } */ P (buffer (1),/* { dg-message "format output 6 bytes into a destination of size 1" } */ - "%c%s%i", '1', "23", 45); /* { dg-warning "9:.%s. directive writing 2 bytes into a region of size 0" } */ + "%c%s%i", '1', "23", 45); /* { dg-warning ".%s. directive writing 2 bytes into a region of size 0" } */ P (buffer (2),/* { dg-message "format output 6 bytes into a destination of size 2" } */ - "%c%s%i", '1', "2", 345); /* { dg-warning "11:.%i. directive writing 3 bytes into a region of size 0" } */ + "%c%s%i", '1', "2", 345); /* { dg-warning ".%i. directive writing 3 bytes into a region of size 0" } */ /* It would be nice if the caret in the location range for the format string below could be made to point at the closing quote of the format diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c index faa5806..b587d00 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret -ftrack-macro-expansion=0" } */ +/* { dg-options "-Wformat -Wformat-length=1 -fdiagnostics-show-caret" } */ extern int sprintf (char*, const char*, ...);
Re: [PATCH] avoid calling alloca(0)
On 11/18/16, Jakub Jelinekwrote: > On Fri, Nov 18, 2016 at 09:21:38AM -0700, Martin Sebor wrote: >> >In the libiberty/alloca.c, I don't see anything special about alloca >> > (0). >> >Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca >> > (4035). >> >alloca (0) just returns NULL after it. The diffutils link is the same >> >alloca.c as in libiberty. Returning NULL or returning a non-unique >> > pointer >> >for alloca (0) is just fine, it is the same thing as returning NULL or >> >returning a non-unique pointer from malloc (0). We definitely don't >> > want >> >to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), >> >because it behaves the same. >> >> I disagree. At a minimum, the return value of malloc(0) is >> implementation-defined and so relying on it being either null >> or non-null is non-portable and can be a source of subtle bugs. > > Most apps know what malloc (0) means and treat it correctly, they know they > shouldn't dereference the pointer, because it is either NULL or holds an > array with 0 elements. I fail to see why you would want to warn. > Just as a reference point, my old version of the clang static analyzer warns for malloc(0); but not alloca(0); though. For example: $ cat alloc_0.c #include #include void fn(void) { char *ptr0 = (char *)malloc(0); void *ptr1 = alloca(0); *ptr0 = *(char *)ptr1; } $ clang -Wall -Wextra -pedantic --analyze -c alloc_0.c alloc_0.c:6:23: warning: Call to 'malloc' has an allocation size of 0 bytes char *ptr0 = (char *)malloc(0); ^ ~ 1 warning generated. Doing some more Googling on the topic finds debates as to whether this warning is warranted or not, but it seems like people are pretty used to dealing with it from clang already, so they probably wouldn't mind too much if gcc started being consistent with it.
Re: [PATCH, GCC/ARM, ping] Optional -mthumb for Thumb only targets
On 11/11/16 14:35, Kyrill Tkachov wrote: On 08/11/16 13:36, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 25/10/16 18:07, Thomas Preudhomme wrote: Hi, Currently when a user compiles for a thumb-only target (such as Cortex-M processors) without specifying the -mthumb option GCC throws the error "target CPU does not support ARM mode". This is suboptimal from a usability point of view: the -mthumb could be deduced from the -march or -mcpu option when there is no ambiguity. This patch implements this behavior by extending the DRIVER_SELF_SPECS to automatically append -mthumb to the command line for thumb-only targets. It does so by checking the last -march option if any is given or the last -mcpu option otherwise. There is no ordering issue because conflicting -mcpu and -march is already handled. Note that the logic cannot be implemented in function arm_option_override because we need to provide the modified command line to the GCC driver for finding the right multilib path and the function arm_option_override is executed too late for that effect. ChangeLog entries are as follow: *** gcc/ChangeLog *** 2016-10-18 Terry GuoThomas Preud'homme PR target/64802 * common/config/arm/arm-common.c (arm_target_thumb_only): New function. * config/arm/arm-opts.h: Include arm-flags.h. (struct arm_arch_core_flag): Define. (arm_arch_core_flags): Define. * config/arm/arm-protos.h: Include arm-flags.h. (FL_NONE, FL_ANY, FL_CO_PROC, FL_ARCH3M, FL_MODE26, FL_MODE32, FL_ARCH4, FL_ARCH5, FL_THUMB, FL_LDSCHED, FL_STRONG, FL_ARCH5E, FL_XSCALE, FL_ARCH6, FL_VFPV2, FL_WBUF, FL_ARCH6K, FL_THUMB2, FL_NOTM, FL_THUMB_DIV, FL_VFPV3, FL_NEON, FL_ARCH7EM, FL_ARCH7, FL_ARM_DIV, FL_ARCH8, FL_CRC32, FL_SMALLMUL, FL_NO_VOLATILE_CE, FL_IWMMXT, FL_IWMMXT2, FL_ARCH6KZ, FL2_ARCH8_1, FL2_ARCH8_2, FL2_FP16INST, FL_TUNE, FL_FOR_ARCH2, FL_FOR_ARCH3, FL_FOR_ARCH3M, FL_FOR_ARCH4, FL_FOR_ARCH4T, FL_FOR_ARCH5, FL_FOR_ARCH5T, FL_FOR_ARCH5E, FL_FOR_ARCH5TE, FL_FOR_ARCH5TEJ, FL_FOR_ARCH6, FL_FOR_ARCH6J, FL_FOR_ARCH6K, FL_FOR_ARCH6Z, FL_FOR_ARCH6ZK, FL_FOR_ARCH6KZ, FL_FOR_ARCH6T2, FL_FOR_ARCH6M, FL_FOR_ARCH7, FL_FOR_ARCH7A, FL_FOR_ARCH7VE, FL_FOR_ARCH7R, FL_FOR_ARCH7M, FL_FOR_ARCH7EM, FL_FOR_ARCH8A, FL2_FOR_ARCH8_1A, FL2_FOR_ARCH8_2A, FL_FOR_ARCH8M_BASE, FL_FOR_ARCH8M_MAIN, arm_feature_set, ARM_FSET_MAKE, ARM_FSET_MAKE_CPU1, ARM_FSET_MAKE_CPU2, ARM_FSET_CPU1, ARM_FSET_CPU2, ARM_FSET_EMPTY, ARM_FSET_ANY, ARM_FSET_HAS_CPU1, ARM_FSET_HAS_CPU2, ARM_FSET_HAS_CPU, ARM_FSET_ADD_CPU1, ARM_FSET_ADD_CPU2, ARM_FSET_DEL_CPU1, ARM_FSET_DEL_CPU2, ARM_FSET_UNION, ARM_FSET_INTER, ARM_FSET_XOR, ARM_FSET_EXCLUDE, ARM_FSET_IS_EMPTY, ARM_FSET_CPU_SUBSET): Move to ... * config/arm/arm-flags.h: This new file. * config/arm/arm.h (TARGET_MODE_SPEC_FUNCTIONS): Define. (EXTRA_SPEC_FUNCTIONS): Add TARGET_MODE_SPEC_FUNCTIONS to its value. (TARGET_MODE_SPECS): Define. (DRIVER_SELF_SPECS): Add TARGET_MODE_SPECS to its value. *** gcc/testsuite/ChangeLog *** 2016-10-11 Thomas Preud'homme PR target/64802 * gcc.target/arm/optional_thumb-1.c: New test. * gcc.target/arm/optional_thumb-2.c: New test. * gcc.target/arm/optional_thumb-3.c: New test. No regression when running the testsuite for -mcpu=cortex-m0 -mthumb, -mcpu=cortex-m0 -marm and -mcpu=cortex-a8 -marm Is this ok for trunk? This looks like a useful usability improvement. This is ok after a bootstrap on an arm-none-linux-gnueabihf target. Sorry for the delay, Kyrill I've rebased the patch on top of the arm_feature_set type consistency fix [1] and committed it. The committed patch is in attachment for reference. [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01680.html Best regards, Thomas diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c index f3b674339a50460d55920ca8d26275a550bbbc1e..473417a2e5f04488197c27ead2b65680bddec274 100644 --- a/gcc/common/config/arm/arm-common.c +++ b/gcc/common/config/arm/arm-common.c @@ -98,6 +98,29 @@ arm_rewrite_mcpu (int argc, const char **argv) return arm_rewrite_selected_cpu (argv[argc - 1]); } +/* Called by the driver to check whether the target denoted by current + command line options is a Thumb-only target. ARGV is an array of + -march and -mcpu values (ie. it contains the rhs after the equal + sign) and we use the last one of them to make a decision. The + number of elements in ARGV is given in ARGC. */ +const char * +arm_target_thumb_only (int argc, const char **argv) +{ + unsigned int opt; + + if (argc) +{ + for (opt = 0; opt < (ARRAY_SIZE (arm_arch_core_flags)); opt++) + if ((strcmp (argv[argc - 1],
Re: [PATCH] Fix PR78413
> On Nov 18, 2016, at 10:33 AM, Markus Trippelsdorf> wrote: > > On 2016.11.18 at 10:27 -0600, Bill Schmidt wrote: >> === >> --- gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (revision 0) >> +++ gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (working copy) >> @@ -0,0 +1,35 @@ >> +/* PR78413. These previously failed in tree if-conversion due to a loop >> + latch with multiple predecessors that the code did not anticipate. */ >> +/* { dg-do compile } */ >> +/* { dg-options "-O3 -ffast-math" } */ > > Please add -fno-strict-aliasing, otherwise fn1() wouldn't ICE. Whoops. Yes indeed, sorry I missed the flag difference for the second failure. Bill
[COMMITTED] Add myself to MAINTAINERS (Write After Approval).
Committed as r242595. Thanks, Toma Tabacu Index: ChangeLog === --- ChangeLog (revision 242594) +++ ChangeLog (working copy) @@ -1,3 +1,7 @@ +2016-11-18 Toma Tabacu+ + * MAINTAINERS (Write After Approval): Add myself. + 2016-11-15 Matthias Klose * Makefile.def: Remove references to GCJ. Index: MAINTAINERS === --- MAINTAINERS (revision 242594) +++ MAINTAINERS (working copy) @@ -587,6 +587,7 @@ Robert Suchanek Andrew Sutton Gabriele Svelto +Toma Tabacu Sriraman Tallam Chung-Lin Tang Samuel Tardieu
Re: [PATCH] Fix PR78413
On 2016.11.18 at 10:27 -0600, Bill Schmidt wrote: > === > --- gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (working copy) > @@ -0,0 +1,35 @@ > +/* PR78413. These previously failed in tree if-conversion due to a loop > + latch with multiple predecessors that the code did not anticipate. */ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math" } */ Please add -fno-strict-aliasing, otherwise fn1() wouldn't ICE. > +extern long long int llrint(double x); > +int a; > +double b; > +__attribute__((cold)) void decode_init() { > + int c, d = 0; > + for (; d < 12; d++) { > +if (d) > + b = 0; > +c = 0; > +for (; c < 6; c++) > + a = b ? llrint(b) : 0; > + } > +} > + > +struct S { > + _Bool bo; > +}; > +int a, bb, c, d; > +void fn1() { > + do > +do > + do { > + struct S *e = (struct S *)1; > + do > + bb = a / (e->bo ? 2 : 1); > + while (bb); > + } while (0); > +while (d); > + while (c); > +} -- Markus
Re: [PATCH] avoid calling alloca(0)
On Fri, Nov 18, 2016 at 09:21:38AM -0700, Martin Sebor wrote: > >In the libiberty/alloca.c, I don't see anything special about alloca (0). > >Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035). > >alloca (0) just returns NULL after it. The diffutils link is the same > >alloca.c as in libiberty. Returning NULL or returning a non-unique pointer > >for alloca (0) is just fine, it is the same thing as returning NULL or > >returning a non-unique pointer from malloc (0). We definitely don't want > >to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), > >because it behaves the same. > > I disagree. At a minimum, the return value of malloc(0) is > implementation-defined and so relying on it being either null > or non-null is non-portable and can be a source of subtle bugs. Most apps know what malloc (0) means and treat it correctly, they know they shouldn't dereference the pointer, because it is either NULL or holds an array with 0 elements. I fail to see why you would want to warn. > But malloc(0) has also been known to result from unsigned overflow > which has led to vulnerabilities and exploits (famously by the JPG > COM vulnerability exploit, but there are plenty of others, including > for instance CVE-2016-2851). Much academic research has been devoted > to this problem and to solutions to detect and prevent it. How is it different from overflowing to 1 or 2 or 27? What is special on the value 0? > In the absence of a policy or guidelines it's a matter of opinion > whether or not this warning belongs in either -Wall or -Wextra. It IMHO doesn't belong to either of these. Jakub
[PATCH] Fix PR78413
Hi, The if-conversion patch for PR77848 missed a case where an outer loop should not be versioned for vectorization; this was caught by an assert in tests recorded in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78413. This patch fixes the problem by ensuring that both inner and outer loop latches have a single predecessor before versioning an outer loop. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk? Thanks, Bill [gcc] 2016-11-18 Bill SchmidtPR tree-optimization/78413 * tree-if-conv.c (versionable_outer_loop_p): Require that both inner and outer loop latches have single predecessors. [gcc/testsuite] 2016-11-18 Bill Schmidt PR tree-optimization/78413 * gcc.dg/tree-ssa/pr78413.c: New test. Index: gcc/testsuite/gcc.dg/tree-ssa/pr78413.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr78413.c (working copy) @@ -0,0 +1,35 @@ +/* PR78413. These previously failed in tree if-conversion due to a loop + latch with multiple predecessors that the code did not anticipate. */ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math" } */ + +extern long long int llrint(double x); +int a; +double b; +__attribute__((cold)) void decode_init() { + int c, d = 0; + for (; d < 12; d++) { +if (d) + b = 0; +c = 0; +for (; c < 6; c++) + a = b ? llrint(b) : 0; + } +} + +struct S { + _Bool bo; +}; +int a, bb, c, d; +void fn1() { + do +do + do { + struct S *e = (struct S *)1; + do + bb = a / (e->bo ? 2 : 1); + while (bb); + } while (0); +while (d); + while (c); +} Index: gcc/tree-if-conv.c === --- gcc/tree-if-conv.c (revision 242551) +++ gcc/tree-if-conv.c (working copy) @@ -2575,6 +2575,8 @@ version_loop_for_if_conversion (struct loop *loop) - The loop has a single exit. - The loop header has a single successor, which is the inner loop header. +- Each of the inner and outer loop latches have a single + predecessor. - The loop exit block has a single predecessor, which is the inner loop's exit block. */ @@ -2586,7 +2588,9 @@ versionable_outer_loop_p (struct loop *loop) || loop->inner->next || !single_exit (loop) || !single_succ_p (loop->header) - || single_succ (loop->header) != loop->inner->header) + || single_succ (loop->header) != loop->inner->header + || !single_pred_p (loop->latch) + || !single_pred_p (loop->inner->latch)) return false; basic_block outer_exit = single_pred (loop->latch);
Re: [PATCH] avoid calling alloca(0)
On 11/18/2016 12:58 AM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 05:32:31PM -0700, Martin Sebor wrote: The point is warning on an alloca (0) may not be as clear cut as it might seem. It's probably a reasonable thing to do on the host, but on a target, which might be embedded and explicitly overriding the builtin alloca, warning on alloca (0) is less of a slam dunk. I confess I haven't heard of such an implementation before but after some searching I have managed to find a few examples of it online, such as in QNX or in the BlackBerry SDK, or in the GCC shipped by Texas Instruments. For instance: In the libiberty/alloca.c, I don't see anything special about alloca (0). Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035). alloca (0) just returns NULL after it. The diffutils link is the same alloca.c as in libiberty. Returning NULL or returning a non-unique pointer for alloca (0) is just fine, it is the same thing as returning NULL or returning a non-unique pointer from malloc (0). We definitely don't want to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), because it behaves the same. I disagree. At a minimum, the return value of malloc(0) is implementation-defined and so relying on it being either null or non-null is non-portable and can be a source of subtle bugs. But malloc(0) has also been known to result from unsigned overflow which has led to vulnerabilities and exploits (famously by the JPG COM vulnerability exploit, but there are plenty of others, including for instance CVE-2016-2851). Much academic research has been devoted to this problem and to solutions to detect and prevent it. The following paper has some good background and references: https://www.usenix.org/legacy/event/woot10/tech/full_papers/Vanegue.pdf Coding standards rules have been developed requiring conforming software to avoid allocating zero bytes for these reasons. TS 1796, the C Secure Coding Rules technical specification, has such a requirement. It was derived from the CERT C Secure Coding rule MEM04-C. Beware of zero-length allocations: https://www.securecoding.cert.org/confluence/x/GQI The same argument applies to alloca(0) with the added caveat that, unlike with the other allocation functions, a non-null return value need not be distinct. In the absence of a policy or guidelines it's a matter of opinion whether or not this warning belongs in either -Wall or -Wextra. Based on the severity of the problems that allocating zero size has been linked to I think it could be successfully argued that it should be in -Wall (the problems are obviously more serious than those that have ever been associated with the -Wunused-type warnings, for example). I put it in -Wextra only because I was trying to be sensitive to the "no false positive" argument. All this said, before debating the fine details of under which conditions each of the new warninsg should be enabled, I was hoping to get comments on the meat of the patch that implements the warnings. Martin
libgo patch committed: Don't call __go_alloc/__go_free from environment routines
This patch to libgo changes the environment support routines to not call __go_alloc/__go_free, but to instead use malloc/free. This code just needs temporary buffers, it is natural to write it in C because it calls setenv/unsetenv, and C code can safely call malloc/free but with the future GC can not safely allocate memory on the Go heap. Patch bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 242592) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -2ab785788691ad289f838a0b3a6bc9013d0fc337 +fc4ca600b2fc6de81fd3c4014542d6a50593db1a The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/runtime/go-setenv.c === --- libgo/runtime/go-setenv.c (revision 242581) +++ libgo/runtime/go-setenv.c (working copy) @@ -9,10 +9,7 @@ #include #include -#include "go-alloc.h" #include "runtime.h" -#include "arch.h" -#include "malloc.h" /* Set the C environment from Go. This is called by syscall.Setenv. */ @@ -25,7 +22,6 @@ setenv_c (String k, String v) unsigned char *kn; const byte *vs; unsigned char *vn; - intgo len; ks = k.str; if (ks == NULL) @@ -39,25 +35,23 @@ setenv_c (String k, String v) #ifdef HAVE_SETENV - if (ks != NULL && ks[k.len] != 0) + if (ks[k.len] != 0) { - // Objects that are explicitly freed must be at least 16 bytes in size, - // so that they are not allocated using tiny alloc. - len = k.len + 1; - if (len < TinySize) - len = TinySize; - kn = __go_alloc (len); + kn = malloc (k.len + 1); + if (kn == NULL) + runtime_throw ("out of malloc memory"); __builtin_memcpy (kn, ks, k.len); + kn[k.len] = '\0'; ks = kn; } - if (vs != NULL && vs[v.len] != 0) + if (vs[v.len] != 0) { - len = v.len + 1; - if (len < TinySize) - len = TinySize; - vn = __go_alloc (len); + vn = malloc (v.len + 1); + if (vn == NULL) + runtime_throw ("out of malloc memory"); __builtin_memcpy (vn, vs, v.len); + vn[v.len] = '\0'; vs = vn; } @@ -66,19 +60,20 @@ setenv_c (String k, String v) #else /* !defined(HAVE_SETENV) */ len = k.len + v.len + 2; - if (len < TinySize) -len = TinySize; - kn = __go_alloc (len); + kn = malloc (len); + if (kn == NULL) +runtime_throw ("out of malloc memory"); __builtin_memcpy (kn, ks, k.len); kn[k.len] = '='; __builtin_memcpy (kn + k.len + 1, vs, v.len); kn[k.len + v.len + 1] = '\0'; putenv ((char *) kn); + kn = NULL; /* putenv takes ownership of the string. */ #endif /* !defined(HAVE_SETENV) */ if (kn != NULL) -__go_free (kn); +free (kn); if (vn != NULL) -__go_free (vn); +free (vn); } Index: libgo/runtime/go-unsetenv.c === --- libgo/runtime/go-unsetenv.c (revision 242581) +++ libgo/runtime/go-unsetenv.c (working copy) @@ -9,10 +9,7 @@ #include #include -#include "go-alloc.h" #include "runtime.h" -#include "arch.h" -#include "malloc.h" /* Unset an environment variable from Go. This is called by syscall.Unsetenv. */ @@ -24,7 +21,6 @@ unsetenv_c (String k) { const byte *ks; unsigned char *kn; - intgo len; ks = k.str; if (ks == NULL) @@ -33,14 +29,11 @@ unsetenv_c (String k) #ifdef HAVE_UNSETENV - if (ks != NULL && ks[k.len] != 0) + if (ks[k.len] != 0) { - // Objects that are explicitly freed must be at least 16 bytes in size, - // so that they are not allocated using tiny alloc. - len = k.len + 1; - if (len < TinySize) - len = TinySize; - kn = __go_alloc (len); + kn = malloc (k.len + 1); + if (kn == NULL) + runtime_throw ("out of malloc memory"); __builtin_memcpy (kn, ks, k.len); ks = kn; } @@ -50,5 +43,5 @@ unsetenv_c (String k) #endif /* !defined(HAVE_UNSETENV) */ if (kn != NULL) -__go_free (kn); +free (kn); }
Re: PING [PATCH] enable -fprintf-return-value by default
On 11/17/2016 10:34 PM, Sandra Loosemore wrote: On 11/16/2016 09:49 AM, Martin Sebor wrote: I'm looking for an approval of the attached patch. I've adjusted the documentation based on Sandra's input (i.e., documented the negative of the option rather than the positive; thank you for the review, btw.) On 11/08/2016 08:13 PM, Martin Sebor wrote: The -fprintf-return-value optimization has been disabled since the last time it caused a bootstrap failure on powerpc64le. With the underlying problems fixed GCC has bootstrapped fine on all of powerpc64, powerpc64le and x86_64 and tested with no regressions. I'd like to re-enable the option. The attached patch does that. [snip] Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi(revision 242500) +++ gcc/doc/invoke.texi(working copy) @@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}. -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol -fomit-frame-pointer -foptimize-sibling-calls @gol -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol --fprefetch-loop-arrays -fprintf-return-value @gol +-fprefetch-loop-arrays -fno-printf-return-value @gol -fprofile-correction @gol -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol -fprofile-reorder-functions @gol Please keep this list alphabetized -- the other "-fno-*" options are sorted as such. The documentation parts of the patch are OK with that fixed, but I can't approve changing the default for the option. I kept the option in the same place on the assumption that it was the "printf" radix of the name, not the "no-" prefix, that these were alphabetized by. But now that you point it out and I've looked more carefully at some of the other options, I see that in some sections they are listed strictly alphabetically (-fno-foo after -fmoo) while in others it's the way you suggest. AFAICS, the former style looks prevalent in the C++ Language Options and the in Warning Options, for example. The latter seems to be more popular in the Optimization Options section (though there are counterexamples). I'm happy to follow either of these conventions as long as its consistent. Otherwise, without both kinds of examples to choose from, I don't think I can trust my brain to remember yet another rule. Martin
Re: Fix PR77881: combine improvement
Hi, On Fri, 18 Nov 2016, Bin.Cheng wrote: > On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwabwrote: > > On Nov 14 2016, Michael Matz wrote: > > > >> PR missed-optimization/77881 > >> * combine.c (simplify_comparison): Remove useless subregs > >> also inside the loop, not just after it. > >> (make_compound_operation): Recognize some subregs as being > >> masking as well. > > > > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64. > Hi, > I can confirm that, also new PR opened for tracking. > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422 See PR78390 for a patch (comment #8) fixing the aarch64 problem. Ciao, Michael.
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On 18 November 2016 at 16:46, Yuri Rumyantsevwrote: > It is very strange that this test failed on arm, since it requires > target avx2 to check vectorizer dumps: > > /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { > target avx2_runtime } } } */ > /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED > \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ > > Could you please clarify what is the reason of the failure? It's not the scan-dumps that fail, but the execution. The test calls abort() for some reason. It will take me a while to rebuild the test manually in the right debug environment to provide you with more traces. > > Thanks. > > 2016-11-18 16:20 GMT+03:00 Christophe Lyon : >> On 15 November 2016 at 15:41, Yuri Rumyantsev wrote: >>> Hi All, >>> >>> Here is patch for non-masked epilogue vectoriziation. >>> >>> Bootstrap and regression testing did not show any new failures. >>> >>> Is it OK for trunk? >>> >>> Thanks. >>> Changelog: >>> >>> 2016-11-15 Yuri Rumyantsev >>> >>> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. >>> * tree-if-conv.c (tree_if_conversion): Make public. >>> * * tree-if-conv.h: New file. >>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid >>> dynamic alias checks for epilogues. >>> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. >>> * tree-vect-loop.c: include tree-if-conv.h. >>> (new_loop_vec_info): Add zeroing orig_loop_info field. >>> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. >>> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL >>> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo >>> using passed argument. >>> (vect_transform_loop): Check if created epilogue should be returned >>> for further vectorization with less vf. If-convert epilogue if >>> required. Print vectorization success for epilogue. >>> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization >>> if it is required, pass loop_vinfo produced during vectorization of >>> loop body to vect_analyze_loop. >>> * tree-vectorizer.h (struct _loop_vec_info): Add new field >>> orig_loop_info. >>> (LOOP_VINFO_ORIG_LOOP_INFO): New. >>> (LOOP_VINFO_EPILOGUE_P): New. >>> (LOOP_VINFO_ORIG_VECT_FACTOR): New. >>> (vect_do_peeling): Change prototype to return epilogue. >>> (vect_analyze_loop): Add argument of loop_vec_info type. >>> (vect_transform_loop): Return created loop. >>> >>> gcc/testsuite/ >>> >>> * lib/target-supports.exp (check_avx2_hw_available): New. >>> (check_effective_target_avx2_runtime): New. >>> * gcc.dg/vect/vect-tail-nomask-1.c: New test. >>> >> >> Hi, >> >> This new test fails on arm-none-eabi (using default cpu/fpu/mode): >> gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test >> gcc.dg/vect/vect-tail-nomask-1.c execution test >> >> It does pass on the same target if configured --with-cpu=cortex-a9. >> >> Christophe >> >> >> >>> >>> 2016-11-14 20:04 GMT+03:00 Richard Biener : On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev wrote: >Richard, > >I checked one of the tests designed for epilogue vectorization using >patches 1 - 3 and found out that build compiler performs vectorization >of epilogues with --param vect-epilogues-nomask=1 passed: > >$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >t1.new-nomask.s -fdump-tree-vect-details >$ grep VECTORIZED -c t1.c.156t.vect >4 > Without param only 2 loops are vectorized. > >Should I simply add a part of tests related to this feature or I must >delete all not necessary changes also? Please remove all not necessary changes. Richard. >Thanks. >Yuri. > >2016-11-14 16:40 GMT+03:00 Richard Biener : >> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: >> >>> Richard, >>> >>> In my previous patch I forgot to remove couple lines related to aux >field. >>> Here is the correct updated patch. >> >> Yeah, I noticed. This patch would be ok for trunk (together with >> necessary parts from 1 and 2) if all not required parts are removed >> (and you'd add the testcases covering non-masked tail vect). >> >> Thus, can you please produce a single complete patch containing only >> non-masked epilogue vectoriziation? >> >> Thanks, >> Richard. >> >>> Thanks. >>> Yuri. >>> >>> 2016-11-14 15:51 GMT+03:00 Richard Biener : >>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >>> > >>> >> Richard, >>> >> >>> >> I prepare updated 3 patch with passing additional argument to >>> >> vect_analyze_loop as you proposed (untested). >>> >> >>> >> You wrote: >>> >> tw, I wonder if you can produce a single patch containing
Re: [PATCH, ARM] Enable ldrd/strd peephole rules unconditionally
On 11/18/16 12:58, Christophe Lyon wrote: > On 17 November 2016 at 10:23, Kyrill Tkachov >wrote: >> >> On 09/11/16 12:58, Bernd Edlinger wrote: >>> >>> Hi! >>> >>> >>> This patch enables the ldrd/strd peephole rules unconditionally. >>> >>> It is meant to fix cases, where the patch to reduce the sha512 >>> stack usage splits ldrd/strd instructions into separate ldr/str insns, >>> but is technically independent from the other patch: >>> >>> See https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00523.html >>> >>> It was necessary to change check_effective_target_arm_prefer_ldrd_strd >>> to retain the true prefer_ldrd_strd tuning flag. >>> >>> >>> Bootstrapped and reg-tested on arm-linux-gnueabihf. >>> Is it OK for trunk? >> >> >> This is ok. >> Thanks, >> Kyrill >> > > Hi Bernd, > > Since you committed this patch (r242549), I'm seeing the new test > failing on some arm*-linux-gnueabihf configurations: > > FAIL: gcc.target/arm/pr53447-5.c scan-assembler-times ldrd 10 > FAIL: gcc.target/arm/pr53447-5.c scan-assembler-times strd 9 > > See > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/242549/report-build-info.html > for a map of failures. > > Am I missing something? Hi Christophe, as always many thanks for your testing... I have apparently only looked at the case -mfloat-abi=soft here, which is what my other patch is going to address. But all targets with -mfpu=neon -mfloat-abi=hard can also use vldr.64 instead of ldrd and vstr.64 instead of strd, which should be accepted as well. So the attached patch should fix at least most of the fallout. Is it OK for trunk? Thanks Bernd. 2016-11-18 Bernd Edlinger * gcc.target/arm/pr53447-5.c: Fix test expectations for neon-fpu. Index: gcc/testsuite/gcc.target/arm/pr53447-5.c === --- gcc/testsuite/gcc.target/arm/pr53447-5.c (revision 242588) +++ gcc/testsuite/gcc.target/arm/pr53447-5.c (working copy) @@ -15,5 +15,8 @@ void foo(long long* p) p[9] -= p[10]; } -/* { dg-final { scan-assembler-times "ldrd" 10 } } */ -/* { dg-final { scan-assembler-times "strd" 9 } } */ +/* We accept neon instructions vldr.64 and vstr.64 as well. + Note: DejaGnu counts patterns with alternatives twice, + so actually there are only 10 loads and 9 stores. */ +/* { dg-final { scan-assembler-times "(ldrd|vldr\\.64)" 20 } } */ +/* { dg-final { scan-assembler-times "(strd|vstr\\.64)" 18 } } */
Re: [PATCH, vec-tails] Support loop epilogue vectorization
It is very strange that this test failed on arm, since it requires target avx2 to check vectorizer dumps: /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { target avx2_runtime } } } */ /* { dg-final { scan-tree-dump-times "LOOP EPILOGUE VECTORIZED \\(VS=16\\)" 2 "vect" { target avx2_runtime } } } */ Could you please clarify what is the reason of the failure? Thanks. 2016-11-18 16:20 GMT+03:00 Christophe Lyon: > On 15 November 2016 at 15:41, Yuri Rumyantsev wrote: >> Hi All, >> >> Here is patch for non-masked epilogue vectoriziation. >> >> Bootstrap and regression testing did not show any new failures. >> >> Is it OK for trunk? >> >> Thanks. >> Changelog: >> >> 2016-11-15 Yuri Rumyantsev >> >> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. >> * tree-if-conv.c (tree_if_conversion): Make public. >> * * tree-if-conv.h: New file. >> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid >> dynamic alias checks for epilogues. >> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. >> * tree-vect-loop.c: include tree-if-conv.h. >> (new_loop_vec_info): Add zeroing orig_loop_info field. >> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. >> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL >> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo >> using passed argument. >> (vect_transform_loop): Check if created epilogue should be returned >> for further vectorization with less vf. If-convert epilogue if >> required. Print vectorization success for epilogue. >> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization >> if it is required, pass loop_vinfo produced during vectorization of >> loop body to vect_analyze_loop. >> * tree-vectorizer.h (struct _loop_vec_info): Add new field >> orig_loop_info. >> (LOOP_VINFO_ORIG_LOOP_INFO): New. >> (LOOP_VINFO_EPILOGUE_P): New. >> (LOOP_VINFO_ORIG_VECT_FACTOR): New. >> (vect_do_peeling): Change prototype to return epilogue. >> (vect_analyze_loop): Add argument of loop_vec_info type. >> (vect_transform_loop): Return created loop. >> >> gcc/testsuite/ >> >> * lib/target-supports.exp (check_avx2_hw_available): New. >> (check_effective_target_avx2_runtime): New. >> * gcc.dg/vect/vect-tail-nomask-1.c: New test. >> > > Hi, > > This new test fails on arm-none-eabi (using default cpu/fpu/mode): > gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test > gcc.dg/vect/vect-tail-nomask-1.c execution test > > It does pass on the same target if configured --with-cpu=cortex-a9. > > Christophe > > > >> >> 2016-11-14 20:04 GMT+03:00 Richard Biener : >>> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >>> wrote: Richard, I checked one of the tests designed for epilogue vectorization using patches 1 - 3 and found out that build compiler performs vectorization of epilogues with --param vect-epilogues-nomask=1 passed: $ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o t1.new-nomask.s -fdump-tree-vect-details $ grep VECTORIZED -c t1.c.156t.vect 4 Without param only 2 loops are vectorized. Should I simply add a part of tests related to this feature or I must delete all not necessary changes also? >>> >>> Please remove all not necessary changes. >>> >>> Richard. >>> Thanks. Yuri. 2016-11-14 16:40 GMT+03:00 Richard Biener : > On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: > >> Richard, >> >> In my previous patch I forgot to remove couple lines related to aux field. >> Here is the correct updated patch. > > Yeah, I noticed. This patch would be ok for trunk (together with > necessary parts from 1 and 2) if all not required parts are removed > (and you'd add the testcases covering non-masked tail vect). > > Thus, can you please produce a single complete patch containing only > non-masked epilogue vectoriziation? > > Thanks, > Richard. > >> Thanks. >> Yuri. >> >> 2016-11-14 15:51 GMT+03:00 Richard Biener : >> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: >> > >> >> Richard, >> >> >> >> I prepare updated 3 patch with passing additional argument to >> >> vect_analyze_loop as you proposed (untested). >> >> >> >> You wrote: >> >> tw, I wonder if you can produce a single patch containing just >> >> epilogue vectorization, that is combine patches 1-3 but rip out >> >> changes only needed by later patches? >> >> >> >> Did you mean that I exclude all support for vectorization epilogues, >> >> i.e. exclude from 2-nd patch all non-related changes >> >> like >> >> >> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c >> >> index 11863af..32011c1 100644 >> >> ---
Re: Fix PR77881: combine improvement
On Wed, Nov 16, 2016 at 3:05 PM, Andreas Schwabwrote: > On Nov 14 2016, Michael Matz wrote: > >> PR missed-optimization/77881 >> * combine.c (simplify_comparison): Remove useless subregs >> also inside the loop, not just after it. >> (make_compound_operation): Recognize some subregs as being >> masking as well. > > This breaks gcc.c-torture/execute/cbrt.c execution test on aarch64. Hi, I can confirm that, also new PR opened for tracking. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78422 Thanks, bin > > Andreas. > > -- > Andreas Schwab, SUSE Labs, sch...@suse.de > GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 > "And now for something completely different."
[PATCH] substring_loc info needs default track-macro-expansion (PR preprocessor/78324)
gcc.dg/tree-ssa/builtin-sprintf-2.c is showing intermittent failures, which valgrind shows to be a read past the end of a buffer. The root cause is that the on-demand substring loc code isn't set up to cope with -ftrack-macro-expansion != 2 (the default). In the failing case, it attempts to use this location as the location of the literal: ../../src/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-2.c:95:1: note: RNG (0, 0, 0, "%hhi", i) ^~~ rather than: RNG (0, 0, 0, "%hhi", i) ^~~~ On re-parsing to generate the on-demand substring locations, it thus erroneously attempts to parse the 'R' as a raw string, and so this code within cpp_interpret_string_1 fires: if (*p == 'R') { const uchar *prefix; /* Skip over 'R"'. */ p += 2; prefix = p; while (*p != '(') p++; and the issue happens in the "while" loop, as it erroneously walks through this memory: (gdb) p strs.m_vec.m_vecdata[0] $20 = {len = 3, text = 0xc9bcca0 "RNG"} trying to find the open parenthesis, and starts reading beyond the allocated buffer. The fix is to require that -ftrack-macro-expansion == 2 (the default) for on-demand string literal locations to be available, failing gracefully to simply using the whole location_t otherwise. Doing so requires some fixups to existing test cases: [gcc.dg/tree-ssa/builtin-sprintf-warn-{1,4}.c both have -ftrack-macro-expansion=0. In the latter case, it seems to be unnecessary, so this patch removes it. In the former case, it seems to be needed, but there are some expected locations in test_sprintf_note that are changed by the patch. It's not clear to me how to fix that, so for now the patch removes the expected column numbers from that function within the test case. Successfully bootstrapped on x86_64-pc-linux-gnu. I think I can self-approve the input.c change and new test cases, but I'm not sure about the changes to Martin's test cases. Martin: are the changes to your test cases OK by you, or is there a better way to rewrite them? gcc/ChangeLog: PR preprocessor/78324 * input.c (get_substring_ranges_for_loc): Fail gracefully if -ftrack-macro-expansion has a value other than 2. gcc/testsuite/ChangeLog: PR preprocessor/78324 * gcc.dg/plugin/diagnostic-test-string-literals-1.c (test_multitoken_macro): New function. * gcc.dg/plugin/diagnostic-test-string-literals-3.c: New test case. * gcc.dg/plugin/diagnostic-test-string-literals-4.c: New test case. * gcc.dg/plugin/plugin.exp (plugin_test_list): Add the new test cases. * gcc.dg/tree-ssa/builtin-sprintf-warn-1.c (test_sprintf_note): Drop expected column numbers from dg-warning directives. * gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: Drop -ftrack-macro-expansion=0. --- gcc/input.c| 9 + .../plugin/diagnostic-test-string-literals-1.c | 16 .../plugin/diagnostic-test-string-literals-3.c | 43 ++ .../plugin/diagnostic-test-string-literals-4.c | 43 ++ gcc/testsuite/gcc.dg/plugin/plugin.exp | 4 +- .../gcc.dg/tree-ssa/builtin-sprintf-warn-1.c | 6 +-- .../gcc.dg/tree-ssa/builtin-sprintf-warn-4.c | 2 +- 7 files changed, 118 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-3.c create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-4.c diff --git a/gcc/input.c b/gcc/input.c index 728f4dd..611e18b 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -1322,6 +1322,15 @@ get_substring_ranges_for_loc (cpp_reader *pfile, if (strloc == UNKNOWN_LOCATION) return "unknown location"; + /* Reparsing the strings requires accurate location information. + If -ftrack-macro-expansion has been overridden from its default + of 2, then we might have a location of a macro expansion point, + rather than the location of the literal itself. + Avoid this by requiring that we have full macro expansion tracking + for substring locations to be available. */ + if (cpp_get_options (pfile)->track_macro_expansion != 2) +return "track_macro_expansion != 2"; + /* If string concatenation has occurred at STRLOC, get the locations of all of the literal tokens making up the compound string. Otherwise, just use STRLOC. */ diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c index 3e44936..76a085e 100644 --- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c +++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c @@ -243,6 +243,22 @@ test_macro (void) { dg-end-multiline-output "" } */ } +void +test_multitoken_macro (void) +{ +#define RANGE ("0123456789") /* { dg-error "unable to
Re: [PATCH 1/2] PR77822
On Fri, Nov 18, 2016 at 08:02:08AM -0600, Segher Boessenkool wrote: > On Fri, Nov 18, 2016 at 01:09:24PM +0100, Dominik Vogt wrote: > > IN_RANGE(POS...) makes sure that POS is a non-negative number > > smaller than UPPER, so (UPPER) - (POS) does not wrap. Or is there > > some case that the new macro does not handle? > > I think it works fine. > > With a name like UPPER, you may want to have SIZE_POS_IN_RANGE work like > IN_RANGE, i.e. UPPER is inclusive then. Dunno. Yeah, maybe rather call it RANGE to avoid too much similarity. Some name that is so vague that one has to read the documentation. I'll post an updated patch later. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
libgo patch committed: Remove long-obsolete "old" directories
The directories old/regexp and old/template were removed from the master Go library in 2012 (https://golang.org/cl/5979046) but somehow that was not reflected in libgo. This patch removes them from libgo. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian patch.txt.bz2 Description: BZip2 compressed data
Re: [PATCH] S390: Lower requirements for successful htm tests.
On Fri, Nov 18, 2016 at 02:48:30PM +0100, Dominik Vogt wrote: > gcc/testsuite/ChangeLog > > * gcc.target/s390/htm-builtins-1.c (DEFAULT_MAX_REPETITIONS) > (DEFAULT_REQUIRED_QUORUM, NUM_WARMUP_RUNS): Lower requirements for > successful test. > * gcc.target/s390/htm-builtins-2.c (DEFAULT_MAX_REPETITIONS) > (DEFAULT_REQUIRED_QUORUM): Likewise. Applied. Thanks! -Andreas-
Re: [PATCH v3] PR77359: Properly align local variables in functions calling alloca.
On Fri, Nov 11, 2016 at 10:33:16AM +0100, Dominik Vogt wrote: > gcc/ChangeLog > > * config/rs6000/rs6000.c (rs6000_stack_info): PR/77359: Properly align > local variables in functions calling alloca. Also update the ASCII > drawings > * config/rs6000/rs6000.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET): > PR/77359: Likewise. > * config/rs6000/aix.h (STARTING_FRAME_OFFSET, STACK_DYNAMIC_OFFSET): > PR/77359: Copy AIX specific versions of the rs6000.h macros to aix.h. Applied. Thanks! -Andreas-
Re: [Patch, Fortran] PR 78392: ICE in gfc_trans_auto_array_allocation, at fortran/trans-array.c:5979
Hi Janus, > the attached patch fixes an ice-on-valid problem, simply by removing an > assert. ... I have several instances in my test suite showing that the proposed patch removes the ICE but generates wrong code: pr42359, second test, => ICE on another place pr54613, sixth and eighth tests, Thanks for working on the issue, Dominique
Re: [PATCH] debug/PR78112: remove recent duplicates for DW_TAG_subprogram attributes
On 10 November 2016 at 12:06, Pierre-Marie de Rodatwrote: > On 11/09/2016 10:02 AM, Richard Biener wrote: >> >> Using scan-assembler-times on the dwarf? > > > I always have a bad feeling about this kind of check as I imagine it can > break very easily with legit changes. But I have nothing better to > contribute, so I’ve added one such testcase. ;-) > >>> Ok to commit? >> >> >> Ok. > > > This is committed as r242035. Thanks! > Hi, Part of the new testcase added with this commit fails on arm* targets: g++.dg/pr78112.C scan-assembler-times DW_AT_object_pointer 37 Christophe > -- > Pierre-Marie de Rodat
Re: [PATCH 1/2] PR77822
On Fri, Nov 18, 2016 at 01:09:24PM +0100, Dominik Vogt wrote: > How about this: > > -- > /* A convenience macro to determine whether a SIZE lies inclusively >within [1, UPPER], POS lies inclusively within between >[0, UPPER - 1] and the sum lies inclusively within [1, UPPER]. >UPPER must be >= 1. */ > #define SIZE_POS_IN_RANGE(SIZE, POS, UPPER) \ > (IN_RANGE ((POS), 0, (unsigned HOST_WIDE_INT) (UPPER) - 1) \ >&& IN_RANGE ((SIZE), 1, (unsigned HOST_WIDE_INT) (UPPER) \ >- (unsigned HOST_WIDE_INT)(POS))) ^ missing space here > IN_RANGE(POS...) makes sure that POS is a non-negative number > smaller than UPPER, so (UPPER) - (POS) does not wrap. Or is there > some case that the new macro does not handle? I think it works fine. With a name like UPPER, you may want to have SIZE_POS_IN_RANGE work like IN_RANGE, i.e. UPPER is inclusive then. Dunno. Segher
Re: [v3 PATCH] LWG 2766, LWG 2749
On 18 November 2016 at 15:54, Jonathan Wakelywrote: > I agree, but if we'd just refused to support such undefined behaviour > in stage 1 we wouldn't now be in a position of saying we can't change > it in stage 3. I want to support the code that the previous attempt breaks. I don't think I can do so without concepts.
Re: [v3 PATCH] LWG 2766, LWG 2749
On 17/11/16 23:38 +0200, Ville Voutilainen wrote: The first patch does everything but the container adaptor parts (which are wrong in the p/r) and the tuple part (which is icky). The second part does the tuple parts, and needs some serious RFC. I know it's ugly as sin, but it works. I don't think we should try to teach our tuple to give the right answer for is_constructible etc., because the last attempt at it broke boost and would break reasonable existing code in addition to just boost - such things are not Stage 3 material, imnsho. I agree, but if we'd just refused to support such undefined behaviour in stage 1 we wouldn't now be in a position of saying we can't change it in stage 3. Oh well, I'll review this ASAP.
Re: Rework subreg_get_info
Joseph Myerswrites: > On Tue, 15 Nov 2016, Richard Sandiford wrote: >> Richard Sandiford writes: >> > This isn't intended to change the behaviour, just rewrite the >> > existing logic in a different (and hopefully clearer) way. >> > The new form -- particularly the part based on the "block" >> > concept -- is easier to convert to polynomial sizes. >> > >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Sorry, I should have said: this was also tested by compiling the >> testsuite before and after the change at -O2 -ftree-vectorize on: >> >> aarch64-linux-gnueabi alpha-linux-gnu arc-elf arm-linux-gnueabi >> arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf >> epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf >> hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu >> i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf >> m68k-linux-gnu mcore-elf microblaze-elf mips-linux-gnu >> mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf >> nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnu >> powerpc-eabispe powerpc64-linux-gnu powerpc-ibm-aix7.0 rl78-elf > > e500 double (both DFmode and TFmode) was the case that motivated the > original creation of subreg_get_info. I think powerpc-linux-gnuspe > --enable-e500-double would be a good case for verifying the patch doesn't > change generated code. (Though when I tried building it from mainline > sources last week I got an ICE in LRA building __multc3, so testing it > might be problematic at present.) Thanks for the pointer. I've now tried comparing the testsuite output on that combination and there were no differences. 31 tests triggered an internal compiler error but >17800 compiled successfully, so hopefully the test was at least somewhat meaningful. (For the record: I got a build failure due to addsi3 FAILing, but I hacked around that by converting it to a force_reg/DONE.) Thanks, Richard
[PATCH] S390: Lower requirements for successful htm tests.
The attached patch makes the htm tests on s390 less sensitive to spurious abort. Please check the commit comment for details. The modified tests have been run once on a zEC12. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/testsuite/ChangeLog * gcc.target/s390/htm-builtins-1.c (DEFAULT_MAX_REPETITIONS) (DEFAULT_REQUIRED_QUORUM, NUM_WARMUP_RUNS): Lower requirements for successful test. * gcc.target/s390/htm-builtins-2.c (DEFAULT_MAX_REPETITIONS) (DEFAULT_REQUIRED_QUORUM): Likewise. >From cd354b024c5e129575465b9088b0e0d03340c8f0 Mon Sep 17 00:00:00 2001 From: Dominik VogtDate: Fri, 18 Nov 2016 14:06:28 +0100 Subject: [PATCH] S390: Lower requirements for successful htm tests. Due to spurious aborts, requiring four successful tests out of five fails too often; accept four out of seven instead. Reduce number of warmup passes. --- gcc/testsuite/gcc.target/s390/htm-builtins-1.c | 6 +++--- gcc/testsuite/gcc.target/s390/htm-builtins-2.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-1.c b/gcc/testsuite/gcc.target/s390/htm-builtins-1.c index c90490f..ff43be9 100644 --- a/gcc/testsuite/gcc.target/s390/htm-builtins-1.c +++ b/gcc/testsuite/gcc.target/s390/htm-builtins-1.c @@ -13,9 +13,9 @@ /* local definitions -- */ -#define DEFAULT_MAX_REPETITIONS 5 -#define DEFAULT_REQUIRED_QUORUM ((DEFAULT_MAX_REPETITIONS) - 1) -#define NUM_WARMUP_RUNS 10 +#define DEFAULT_MAX_REPETITIONS 7 +#define DEFAULT_REQUIRED_QUORUM 4 +#define NUM_WARMUP_RUNS 2 /* local macros --- */ diff --git a/gcc/testsuite/gcc.target/s390/htm-builtins-2.c b/gcc/testsuite/gcc.target/s390/htm-builtins-2.c index 15b0d12..bb9d346 100644 --- a/gcc/testsuite/gcc.target/s390/htm-builtins-2.c +++ b/gcc/testsuite/gcc.target/s390/htm-builtins-2.c @@ -13,8 +13,8 @@ /* local definitions -- */ -#define DEFAULT_MAX_REPETITIONS 5 -#define DEFAULT_REQUIRED_QUORUM ((DEFAULT_MAX_REPETITIONS) - 1) +#define DEFAULT_MAX_REPETITIONS 7 +#define DEFAULT_REQUIRED_QUORUM 4 #define DEFAULT_ABORT_ADDRESS (0x12345678u) /* local macros --- */ -- 2.3.0
[patch,avr] Disallow LDS / STS for .rodata on AVR_TINY.
Currently, Binutils still comes with an AVR_TINY linker description file which puts .rodata into RAM so that LDS is applicable for reading such objects. However, as these devices come with a linearised address model, linker descriptions which put .rodata into flash are much more reasonable. This patch caters for such situations: As .rodata virtual addresses will be offset by 0x4000, respective objects may no more be accessed by LDS. Moreover, objects with a section attribute won't be accessed by LDS. Ok for trunk? Johann PR target/78093 * config/avr/avr.c (avr_decl_maybe_lds_p): New static function. (avr_encode_section_info) [TARGET_ABSDATA && AVR_TINY]: Use it. Index: config/avr/avr.c === --- config/avr/avr.c (revision 242544) +++ config/avr/avr.c (working copy) @@ -10102,6 +10102,33 @@ avr_section_type_flags (tree decl, const } +/* A helper for the next function. NODE is a decl that is associated with + a symbol. Return TRUE if the respective object may be accessed by LDS. + There migth still be other reasons for why LDS is not appropriate. + This function is only useful for AVR_TINY. */ + +static bool +avr_decl_maybe_lds_p (tree node) +{ + if (!node + || TREE_CODE (node) != VAR_DECL + || DECL_SECTION_NAME (node) != NULL) +return false; + + if (TREE_READONLY (node)) +return true; + + // C++ requires peeling arrays. + + do +node = TREE_TYPE (node); + while (ARRAY_TYPE == TREE_CODE (node)); + + return (node != error_mark_node + && !TYPE_READONLY (node)); +} + + /* Implement `TARGET_ENCODE_SECTION_INFO'. */ static void @@ -10193,7 +10220,8 @@ avr_encode_section_info (tree decl, rtx if (avr_decl_absdata_p (decl, DECL_ATTRIBUTES (decl)) || (TARGET_ABSDATA && !progmem_p - && !addr_attr) + && !addr_attr + && avr_decl_maybe_lds_p (decl)) || (addr_attr // If addr_attr is non-null, it has an argument. Peek into it. && TREE_INT_CST_LOW (TREE_VALUE (TREE_VALUE (addr_attr))) < 0xc0))
Re: [PATCH, vec-tails] Support loop epilogue vectorization
On 15 November 2016 at 15:41, Yuri Rumyantsevwrote: > Hi All, > > Here is patch for non-masked epilogue vectoriziation. > > Bootstrap and regression testing did not show any new failures. > > Is it OK for trunk? > > Thanks. > Changelog: > > 2016-11-15 Yuri Rumyantsev > > * params.def (PARAM_VECT_EPILOGUES_NOMASK): New. > * tree-if-conv.c (tree_if_conversion): Make public. > * * tree-if-conv.h: New file. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid > dynamic alias checks for epilogues. > * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog. > * tree-vect-loop.c: include tree-if-conv.h. > (new_loop_vec_info): Add zeroing orig_loop_info field. > (vect_analyze_loop_2): Don't try to enhance alignment for epilogues. > (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL > if epilogue is vectorized, set up orig_loop_info field of loop_vinfo > using passed argument. > (vect_transform_loop): Check if created epilogue should be returned > for further vectorization with less vf. If-convert epilogue if > required. Print vectorization success for epilogue. > * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization > if it is required, pass loop_vinfo produced during vectorization of > loop body to vect_analyze_loop. > * tree-vectorizer.h (struct _loop_vec_info): Add new field > orig_loop_info. > (LOOP_VINFO_ORIG_LOOP_INFO): New. > (LOOP_VINFO_EPILOGUE_P): New. > (LOOP_VINFO_ORIG_VECT_FACTOR): New. > (vect_do_peeling): Change prototype to return epilogue. > (vect_analyze_loop): Add argument of loop_vec_info type. > (vect_transform_loop): Return created loop. > > gcc/testsuite/ > > * lib/target-supports.exp (check_avx2_hw_available): New. > (check_effective_target_avx2_runtime): New. > * gcc.dg/vect/vect-tail-nomask-1.c: New test. > Hi, This new test fails on arm-none-eabi (using default cpu/fpu/mode): gcc.dg/vect/vect-tail-nomask-1.c -flto -ffat-lto-objects execution test gcc.dg/vect/vect-tail-nomask-1.c execution test It does pass on the same target if configured --with-cpu=cortex-a9. Christophe > > 2016-11-14 20:04 GMT+03:00 Richard Biener : >> On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev >> wrote: >>>Richard, >>> >>>I checked one of the tests designed for epilogue vectorization using >>>patches 1 - 3 and found out that build compiler performs vectorization >>>of epilogues with --param vect-epilogues-nomask=1 passed: >>> >>>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o >>>t1.new-nomask.s -fdump-tree-vect-details >>>$ grep VECTORIZED -c t1.c.156t.vect >>>4 >>> Without param only 2 loops are vectorized. >>> >>>Should I simply add a part of tests related to this feature or I must >>>delete all not necessary changes also? >> >> Please remove all not necessary changes. >> >> Richard. >> >>>Thanks. >>>Yuri. >>> >>>2016-11-14 16:40 GMT+03:00 Richard Biener : On Mon, 14 Nov 2016, Yuri Rumyantsev wrote: > Richard, > > In my previous patch I forgot to remove couple lines related to aux >>>field. > Here is the correct updated patch. Yeah, I noticed. This patch would be ok for trunk (together with necessary parts from 1 and 2) if all not required parts are removed (and you'd add the testcases covering non-masked tail vect). Thus, can you please produce a single complete patch containing only non-masked epilogue vectoriziation? Thanks, Richard. > Thanks. > Yuri. > > 2016-11-14 15:51 GMT+03:00 Richard Biener : > > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote: > > > >> Richard, > >> > >> I prepare updated 3 patch with passing additional argument to > >> vect_analyze_loop as you proposed (untested). > >> > >> You wrote: > >> tw, I wonder if you can produce a single patch containing just > >> epilogue vectorization, that is combine patches 1-3 but rip out > >> changes only needed by later patches? > >> > >> Did you mean that I exclude all support for vectorization >>>epilogues, > >> i.e. exclude from 2-nd patch all non-related changes > >> like > >> > >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > >> index 11863af..32011c1 100644 > >> --- a/gcc/tree-vect-loop.c > >> +++ b/gcc/tree-vect-loop.c > >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop) > >>LOOP_VINFO_PEELING_FOR_GAPS (res) = false; > >>LOOP_VINFO_PEELING_FOR_NITER (res) = false; > >>LOOP_VINFO_OPERANDS_SWAPPED (res) = false; > >> + LOOP_VINFO_CAN_BE_MASKED (res) = false; > >> + LOOP_VINFO_REQUIRED_MASKS (res) = 0; > >> + LOOP_VINFO_COMBINE_EPILOGUE (res) = false; > >> + LOOP_VINFO_MASK_EPILOGUE (res) = false; > >> + LOOP_VINFO_NEED_MASKING (res) = false; > >> + LOOP_VINFO_ORIG_LOOP_INFO (res) =
Re: [PATCH] Fix PR78189
On 11 November 2016 at 09:56, Richard Bienerwrote: > On Thu, 10 Nov 2016, Christophe Lyon wrote: > >> On 10 November 2016 at 09:34, Richard Biener wrote: >> > On Wed, 9 Nov 2016, Christophe Lyon wrote: >> > >> >> On 9 November 2016 at 09:36, Bin.Cheng wrote: >> >> > On Tue, Nov 8, 2016 at 9:11 AM, Richard Biener >> >> > wrote: >> >> >> On Mon, 7 Nov 2016, Christophe Lyon wrote: >> >> >> >> >> >>> Hi Richard, >> >> >>> >> >> >>> >> >> >>> On 7 November 2016 at 09:01, Richard Biener wrote: >> >> >>> > >> >> >>> > The following fixes an oversight when computing alignment in the >> >> >>> > vectorizer. >> >> >>> > >> >> >>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to >> >> >>> > trunk. >> >> >>> > >> >> >>> > Richard. >> >> >>> > >> >> >>> > 2016-11-07 Richard Biener >> >> >>> > >> >> >>> > PR tree-optimization/78189 >> >> >>> > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): >> >> >>> > Fix >> >> >>> > alignment computation. >> >> >>> > >> >> >>> > * g++.dg/torture/pr78189.C: New testcase. >> >> >>> > >> >> >>> > Index: gcc/testsuite/g++.dg/torture/pr78189.C >> >> >>> > === >> >> >>> > --- gcc/testsuite/g++.dg/torture/pr78189.C (revision 0) >> >> >>> > +++ gcc/testsuite/g++.dg/torture/pr78189.C (working copy) >> >> >>> > @@ -0,0 +1,41 @@ >> >> >>> > +/* { dg-do run } */ >> >> >>> > +/* { dg-additional-options "-ftree-slp-vectorize >> >> >>> > -fno-vect-cost-model" } */ >> >> >>> > + >> >> >>> > +#include >> >> >>> > + >> >> >>> > +struct A >> >> >>> > +{ >> >> >>> > + void * a; >> >> >>> > + void * b; >> >> >>> > +}; >> >> >>> > + >> >> >>> > +struct alignas(16) B >> >> >>> > +{ >> >> >>> > + void * pad; >> >> >>> > + void * misaligned; >> >> >>> > + void * pad2; >> >> >>> > + >> >> >>> > + A a; >> >> >>> > + >> >> >>> > + void Null(); >> >> >>> > +}; >> >> >>> > + >> >> >>> > +void B::Null() >> >> >>> > +{ >> >> >>> > + a.a = nullptr; >> >> >>> > + a.b = nullptr; >> >> >>> > +} >> >> >>> > + >> >> >>> > +void __attribute__((noinline,noclone)) >> >> >>> > +NullB(void * misalignedPtr) >> >> >>> > +{ >> >> >>> > + B* b = reinterpret_cast(reinterpret_cast> >> >>> > *>(misalignedPtr) - offsetof(B, misaligned)); >> >> >>> > + b->Null(); >> >> >>> > +} >> >> >>> > + >> >> >>> > +int main() >> >> >>> > +{ >> >> >>> > + B b; >> >> >>> > + NullB(); >> >> >>> > + return 0; >> >> >>> > +} >> >> >>> > diff --git gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c >> >> >>> > index 9346cfe..b03cb1e 100644 >> >> >>> > --- gcc/tree-vect-data-refs.c >> >> >>> > +++ gcc/tree-vect-data-refs.c >> >> >>> > @@ -773,10 +773,25 @@ vect_compute_data_ref_alignment (struct >> >> >>> > data_reference *dr) >> >> >>> >base = ref; >> >> >>> >while (handled_component_p (base)) >> >> >>> > base = TREE_OPERAND (base, 0); >> >> >>> > + unsigned int base_alignment; >> >> >>> > + unsigned HOST_WIDE_INT base_bitpos; >> >> >>> > + get_object_alignment_1 (base, _alignment, _bitpos); >> >> >>> > + /* As data-ref analysis strips the MEM_REF down to its base >> >> >>> > operand >> >> >>> > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we >> >> >>> > have to >> >> >>> > + adjust things to make base_alignment valid as the alignment of >> >> >>> > + DR_BASE_ADDRESS. */ >> >> >>> >if (TREE_CODE (base) == MEM_REF) >> >> >>> > -base = build2 (MEM_REF, TREE_TYPE (base), base_addr, >> >> >>> > - build_int_cst (TREE_TYPE (TREE_OPERAND (base, >> >> >>> > 1)), 0)); >> >> >>> > - unsigned int base_alignment = get_object_alignment (base); >> >> >>> > +{ >> >> >>> > + base_bitpos -= mem_ref_offset (base).to_short_addr () * >> >> >>> > BITS_PER_UNIT; >> >> >>> > + base_bitpos &= (base_alignment - 1); >> >> >>> > +} >> >> >>> > + if (base_bitpos != 0) >> >> >>> > +base_alignment = base_bitpos & -base_bitpos; >> >> >>> > + /* Also look at the alignment of the base address DR analysis >> >> >>> > + computed. */ >> >> >>> > + unsigned int base_addr_alignment = get_pointer_alignment >> >> >>> > (base_addr); >> >> >>> > + if (base_addr_alignment > base_alignment) >> >> >>> > +base_alignment = base_addr_alignment; >> >> >>> > >> >> >>> >if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) >> >> >>> > DR_VECT_AUX (dr)->base_element_aligned = true; >> >> >>> >> >> >>> Since you committed this patch (r241892), I'm seeing execution >> >> >>> failures: >> >> >>> gcc.dg/vect/pr40074.c -flto -ffat-lto-objects execution test >> >> >>> gcc.dg/vect/pr40074.c execution test >> >> >>> on armeb-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9 >> >> >>> --with-fpu=neon-fp16 >> >> >>> (using qemu as simulator) >> >> >> >> >> >> The difference is that we now vectorize
Re: Fix PR78154
On 17 November 2016 at 15:24, Richard Bienerwrote: > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote: > >> On 17 November 2016 at 14:21, Richard Biener wrote: >> > On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> Hi Richard, >> >> Following your suggestion in PR78154, the patch checks if stmt >> >> contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p >> >> and returns true in that case. >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu. >> >> Cross-testing on arm*-*-*, aarch64*-*-* in progress. >> >> Would it be OK to commit this patch in stage-3 ? >> > >> > As people noted we have returns_nonnull for this and that is already >> > checked. So please make sure the builtins get this attribute instead. >> OK thanks, I will add the returns_nonnull attribute to the required >> string builtins. >> I noticed some of the string builtins don't have RET1 in builtins.def: >> strcat, strncpy, strncat have ATTR_NOTHROW_NONNULL_LEAF. >> Should they instead be having ATTR_RET1_NOTHROW_NONNULL_LEAF similar >> to entries for memmove, strcpy ? > > Yes, I think so. Hi, In the attached patch I added returns_nonnull attribute to ATTR_RET1_NOTHROW_NONNULL_LEAF, and changed few builtins like strcat, strncpy, strncat and corresponding _chk builtins to use ATTR_RET1_NOTHROW_NONNULL_LEAF. Does the patch look correct ? Thanks, Prathamesh > > Richard. diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def index 8dc59c9..da82da5 100644 --- a/gcc/builtin-attrs.def +++ b/gcc/builtin-attrs.def @@ -108,6 +108,7 @@ DEF_ATTR_IDENT (ATTR_TYPEGENERIC, "type generic") DEF_ATTR_IDENT (ATTR_TM_REGPARM, "*tm regparm") DEF_ATTR_IDENT (ATTR_TM_TMPURE, "transaction_pure") DEF_ATTR_IDENT (ATTR_RETURNS_TWICE, "returns_twice") +DEF_ATTR_IDENT (ATTR_RETURNS_NONNULL, "returns_nonnull") DEF_ATTR_TREE_LIST (ATTR_NOVOPS_LIST, ATTR_NOVOPS, ATTR_NULL, ATTR_NULL) @@ -195,8 +196,11 @@ DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL, ATTR_CONST, ATTR_NULL, \ ATTR_NOTHROW_NONNULL) /* Nothrow leaf functions whose pointer parameter(s) are all nonnull, and which return their first argument. */ -DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF, ATTR_FNSPEC, ATTR_LIST_STR1, \ +DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF_1, ATTR_RETURNS_NONNULL, ATTR_NULL, \ ATTR_NOTHROW_NONNULL_LEAF) +DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF, ATTR_FNSPEC, ATTR_LIST_STR1, \ + ATTR_RET1_NOTHROW_NONNULL_LEAF_1) + /* Nothrow const leaf functions whose pointer parameter(s) are all nonnull. */ DEF_ATTR_TREE_LIST (ATTR_CONST_NOTHROW_NONNULL_LEAF, ATTR_CONST, ATTR_NULL, \ ATTR_NOTHROW_NONNULL_LEAF) diff --git a/gcc/builtins.def b/gcc/builtins.def index 219feeb..c697b0a 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -646,13 +646,13 @@ DEF_LIB_BUILTIN(BUILT_IN_MEMCHR, "memchr", BT_FN_PTR_CONST_PTR_INT_SIZE, DEF_LIB_BUILTIN(BUILT_IN_MEMCMP, "memcmp", BT_FN_INT_CONST_PTR_CONST_PTR_SIZE, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_MEMCPY, "memcpy", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_MEMMOVE, "memmove", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) -DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_MEMPCPY, "mempcpy", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_MEMPCPY, "mempcpy", BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_MEMSET, "memset", BT_FN_PTR_PTR_INT_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_RINDEX, "rindex", BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF) -DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_STPCPY, "stpcpy", BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF) -DEF_EXT_LIB_BUILTIN(BUILT_IN_STPNCPY, "stpncpy", BT_FN_STRING_STRING_CONST_STRING_SIZE, ATTR_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN_CHKP (BUILT_IN_STPCPY, "stpcpy", BT_FN_STRING_STRING_CONST_STRING, ATTR_RET1_NOTHROW_NONNULL_LEAF) +DEF_EXT_LIB_BUILTIN(BUILT_IN_STPNCPY, "stpncpy", BT_FN_STRING_STRING_CONST_STRING_SIZE, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_EXT_LIB_BUILTIN(BUILT_IN_STRCASECMP, "strcasecmp", BT_FN_INT_CONST_STRING_CONST_STRING, ATTR_PURE_NOTHROW_NONNULL_LEAF) -DEF_LIB_BUILTIN_CHKP (BUILT_IN_STRCAT, "strcat", BT_FN_STRING_STRING_CONST_STRING, ATTR_NOTHROW_NONNULL_LEAF) +DEF_LIB_BUILTIN_CHKP (BUILT_IN_STRCAT, "strcat", BT_FN_STRING_STRING_CONST_STRING, ATTR_RET1_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_STRCHR, "strchr", BT_FN_STRING_CONST_STRING_INT, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN(BUILT_IN_STRCMP, "strcmp", BT_FN_INT_CONST_STRING_CONST_STRING, ATTR_PURE_NOTHROW_NONNULL_LEAF) DEF_LIB_BUILTIN_CHKP (BUILT_IN_STRCPY, "strcpy", BT_FN_STRING_STRING_CONST_STRING,
[Patch, Fortran] PR 78392: ICE in gfc_trans_auto_array_allocation, at fortran/trans-array.c:5979
Hi all, the attached patch fixes an ice-on-valid problem, simply by removing an assert. The generated code works as expected and the patch regtests cleanly on x86_64-linux-gnu. Ok for trunk? Cheers, Janus 2016-11-18 Janus WeilPR fortran/78392 * trans-array.c (gfc_trans_auto_array_allocation): Remove an assert. 2016-11-18 Janus Weil PR fortran/78392 * gfortran.dg/saved_automatic_2.f90: New test case. Index: gcc/fortran/trans-array.c === --- gcc/fortran/trans-array.c (Revision 242586) +++ gcc/fortran/trans-array.c (Arbeitskopie) @@ -5976,7 +5976,6 @@ gfc_trans_auto_array_allocation (tree decl, gfc_sy type = TREE_TYPE (type); gcc_assert (!sym->attr.use_assoc); - gcc_assert (!TREE_STATIC (decl)); gcc_assert (!sym->module); if (sym->ts.type == BT_CHARACTER ! { dg-do run } ! ! PR 78392: ICE in gfc_trans_auto_array_allocation, at fortran/trans-array.c:5979 ! ! Contributed by Janus Weil module mytypes implicit none contains pure integer function get_i () get_i = 13 end function end module program test use mytypes implicit none integer, dimension(get_i()), save :: x if (size(x) /= 13) call abort() end