Re: [PATCH] Fix force_to_mode not to modify in place the passed rtl (PR rtl-optimization/58726)
On Tue, Dec 03, 2013 at 09:43:51PM -0700, Jeff Law wrote: On 12/03/13 15:46, Jakub Jelinek wrote: As described in the PR, the problem here is that during combine the i2 pattern is substituted into more than one place in the i3 pattern, unique_copy is 0 (but, even if it were non-zero, this could happen if the comparison was processed before the normal set inside of the parallel) and thus the same RTL is (temporarily) shared between two locations. force_to_mode is first called with mask 0xdc36 (which is actually fine for both occurrences in the andhi_2 pattern) and later on inside of the comparison again with mask 0x8000, and as it modifies the IF_THEN_ELSE in place, it also modifies the other location (the 0x8000 mask is fine for the comparison, but not for the other spot). As in the end we fold it to a constant, we don't undo it and use an incorrect constant. Fixed by making sure force_to_mode doesn't modify x in place. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8? 2013-12-03 Jakub Jelinek ja...@redhat.com PR rtl-optimization/58726 * combine.c (force_to_mode): Fix comment typo. Don't destructively modify x for ROTATE, ROTATERT and IF_THEN_ELSE. * gcc.c-torture/execute/pr58726.c: New test. I'd worry there are other latent bugs of this nature and whether we'd be better off avoiding the temporary sharing. We have structure sharing rules for a reason -- I'd hate to think of all the code that would need auditing to ensure it was safe with this bogus sharing. I'm afraid I'm not familiar enough with the unwritten rules of combine.c to know whether the above fix is all we need, or if there are further issues still latent. Any thoughts on how painful it'd be to avoid the sharing to start with? Perhaps most of the subst function could be moved to subst_1, and subst as a wrapper (with the unique_copy argument dropped?)
could do for_each_rtx first to find out how many occurrences of from there are in x, and if it is more than one, subst_1 would then copy_rtx each returned to other than the last one. IMHO doing copy_rtx unconditionally would be too expensive for the common case and would create too much garbage. But it doesn't look like a safe thing to do on the release branches, at least not until it has been on the trunk for a few months. Jakub
RE: [REPOST] Invalid Code when reading from unaligned zero-sized array
On Tue, 3 Dec 2013 15:12:05, Richard Biener wrote: On Tue, Dec 3, 2013 at 2:10 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Dec 3, 2013 at 1:48 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Hi Jeff, please find attached the patch (incl. test cases) for the unaligned read bug that I found while investigating on PR#57748: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748 one test case is this one: pr57748-3.c: /* PR middle-end/57748 */ /* { dg-do run } */ /* wrong code in expand_expr_real_1. */ #include <stdlib.h> extern void abort (void); typedef long long V __attribute__ ((vector_size (2 * sizeof (long long)), may_alias)); typedef struct S { V a; V b[0]; } P __attribute__((aligned (1))); struct __attribute__((packed)) T { char c; P s; }; void __attribute__((noinline, noclone)) check (P *p) { if (p->b[0][0] != 3 || p->b[0][1] != 4) abort (); } void __attribute__((noinline, noclone)) foo (struct T *t) { V a = { 3, 4 }; t->s.b[0] = a; } int main () { struct T *t = (struct T *) calloc (128, 1); foo (t); check (&t->s); free (t); return 0; } and the other one is pr57748-4.c: /* PR middle-end/57748 */ /* { dg-do run } */ /* wrong code in expand_expr_real_1. */ #include <stdlib.h> extern void abort (void); typedef long long V __attribute__ ((vector_size (2 * sizeof (long long)), may_alias)); typedef struct S { V b[1]; } P __attribute__((aligned (1))); struct __attribute__((packed)) T { char c; P s; }; void __attribute__((noinline, noclone)) check (P *p) { if (p->b[1][0] != 3 || p->b[1][1] != 4) abort (); } void __attribute__((noinline, noclone)) foo (struct T *t) { V a = { 3, 4 }; t->s.b[1] = a; } int main () { struct T *t = (struct T *) calloc (128, 1); foo (t); check (&t->s); free (t); return 0; } The patch adds a boolean expand_reference parameter to expand_expr_real and expand_expr_real_1. I pass true when I intend to use the returned memory context as an array reference, instead of a value.
At places where mis-aligned values are extracted, I do not return a register with the extracted mis-aligned value if expand_reference is true. When I have a VIEW_CONVERT_EXPR I pay attention to pass down the outer expand_reference to the inner expand_expr_real call. expand_reference is pretty much similar to the expand_modifier EXPAND_MEMORY. Boot-strapped and regression-tested on x86_64-pc-linux-gnu (many times). Ok for trunk? It still feels like papering over the underlying issue. Let me have a second (or third?) look. Few comments on your patch. @@ -9520,6 +9526,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac align = get_object_alignment (exp); if (modifier != EXPAND_WRITE && modifier != EXPAND_MEMORY + && !expand_reference && mode != BLKmode && align < GET_MODE_ALIGNMENT (mode) /* If the target does not have special handling for unaligned (TARGET_MEM_REF), expand_reference should never be true here, there may be no component-refs around TARGET_MEM_REFs. Ok, I was not sure. Removed - Thanks. You miss adjusting the VIEW_CONVERT_EXPR path? (line-numbers are off a lot in your patch, context doesn't help very much :/ Does not seem to be against 4.8 either ...) Sorry, the line-numbers moved a lot (about 100 lines). The patch is against 4.9-trunk. Re-generated. Index: gcc/cfgexpand.c === --- gcc/cfgexpand.c (revision 204411) +++ gcc/cfgexpand.c (working copy) @@ -2189,7 +2189,7 @@ expand_call_stmt (gimple stmt) if (lhs) expand_assignment (lhs, exp, false); else - expand_expr_real_1 (exp, const0_rtx, VOIDmode, EXPAND_NORMAL, NULL); + expand_expr_real_1 (exp, const0_rtx, VOIDmode, EXPAND_NORMAL, NULL, false); mark_transaction_restart_calls (stmt); } this should use expand_expr (exp, const0_rtx, VOIDmode, EXPAND_NORMAL); anyway. expand_expr_real_1 is expand_expr_real without the ERROR check? Ok, changed to expand_expr.
@@ -10286,7 +10297,10 @@ expand_expr_real_1 (tree exp, rtx target, enum mac op0 = copy_rtx (op0); set_mem_align (op0, MAX (MEM_ALIGN (op0), TYPE_ALIGN (type))); } - else if (mode != BLKmode + else if (modifier != EXPAND_WRITE + && modifier != EXPAND_MEMORY + && !expand_reference + && mode != BLKmode && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode) /* If the target does have special handling for unaligned loads of mode then use them. */ @@ -10307,6 +10321,9 @@ expand_expr_real_1 (tree exp, rtx target, enum mac return reg; } else if (STRICT_ALIGNMENT + && modifier != EXPAND_WRITE + && modifier != EXPAND_MEMORY + && !expand_reference && mode != BLKmode && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode)) { why the unrelated change to add the modifier checks? Looks like both if cases are close by and factoring out common code like else if (!expand_reference && mode != BLKmode && MEM_ALIGN (... { if (...) else if (STRICT_ALIGNMENT) would be better, also matching the other
Re: libsanitizer merge from upstream r196090
On Wed, Dec 04, 2013 at 06:28:31AM +0100, Konstantin Serebryany wrote: We don't have any .cfi stuff in sanitizer_common (and I don't think we really need it in the internal_clone). Before fixing the tsan sources I'd like to see some indication that anyone cares about tsan working on older systems. Of course various parties care about that. But that doesn't necessarily imply they can or want to run buildbots for a different compiler they don't care about; there are e.g. security implications to that, resources etc. For GCC new ports etc. we never require people to run buildbots, people just report issues they encounter and those issues are fixed. tsan makes many assumptions about the system (address space, etc.) which may not hold on old systems anyway. And tsan, even if it builds, will not work reliably. I really think that we need to disable libsanitizer on old systems until someone volunteers to set up a proper testing process upstream. If no one volunteers -- no one really needs it. The problem is that the definition of old systems is extremely fuzzy; for you it is clearly != Ubuntu 12.04 or similar, but the unwritten assumptions libsanitizer makes are not tied to one exact version of a single dependent component, it is a matter of compiler, compiler configuration, binutils, glibc, kernel, kernel headers, ... So, how do you disable libsanitizer on old systems when it is so fuzzy? There may be assumptions you just don't know about, and fixing bug reports by just adding the reporter's toolchain combination to kill lists is certainly not the way GCC is developed. Jakub
RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
Hi Uros! Attached is the revised patch. The target-independent part has already been approved and added. This revision of the patch adds an x86 tune definition and checks it while deciding the unroll factor. Accommodated the comments given by you except one: *x will never be null for active insns. Since every rtx in the insn is checked for memory references, the NULL_RTX check is required. Regards Ganesh -Original Message- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Friday, November 22, 2013 1:46 PM To: Gopalasubramanian, Ganesh Cc: gcc-patches@gcc.gnu.org; Richard Guenther richard.guent...@gmail.com (richard.guent...@gmail.com); borntrae...@de.ibm.com; H.J. Lu (hjl.to...@gmail.com); Jakub Jelinek (ja...@redhat.com) Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4 On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: Steamroller processors contain a loop predictor and a loop buffer, which may make unrolling small loops less important. When unrolling small loops for steamroller, making the unrolled loop fit in the loop buffer should be a priority. This patch uses a heuristic approach (number of memory references) to decide the unrolling factor for small loops. This patch has some noise in SPEC 2006 results. Bootstrapping passes. I would like to know your comments before committing. Please split the patch into a target-dependent and a target-independent part, and get the target-independent part reviewed first. This part: + if (ix86_tune != PROCESSOR_BDVER3 && ix86_tune != PROCESSOR_BDVER4) + { + return nunroll; + } is wrong. You should introduce a tune variable (as H.J. suggested) and check that variable here. Target-dependent tuning options should be in x86-tune.def, so everything regarding tuning can be found in one place.
+ if (INSN_P (insn) && INSN_CODE (insn) != -1) + for_each_rtx (&insn, (rtx_function) ix86_loop_memcount, &mem_count); if (NONDEBUG_INSN_P (insn)) for_each_rtx (&PATTERN (insn), ...); otherwise your heuristics will depend on the -g compile option. + if ( (mem_count*nunroll) >= 32) Extra parenthesis. +static int +ix86_loop_memcount (rtx *x, unsigned *mem_count) { + if (*x != NULL_RTX && MEM_P (*x)) *x will never be null for active insns. Uros. unroll-adjust.patch Description: unroll-adjust.patch
Re: [PATCH] Fix SSE (pre-AVX) alignment handling (PR target/59163)
On Mon, Dec 2, 2013 at 11:58 PM, Jakub Jelinek ja...@redhat.com wrote: As discussed in the PR, the combiner can combine e.g. an unaligned integral load (e.g. TImode) together with some SSE instruction that requires an aligned load, but doesn't actually check it. For AVX, most of the instructions actually allow unaligned operands, except for a few vmov* instructions where the patterns typically handle the misaligned mems through misaligned_operand checks, and some non-temporal move insns that have UNSPECs that should prevent combination. The following patch attempts to solve this by rejecting combining of unaligned memory loads/stores into SSE insns that don't allow it. I've added an ssememalign attribute for that, but only later realized that even the insns which load/store 16-byte memory values don't require their arguments to be aligned at all if strict alignment checking isn't turned on in hw, so perhaps instead of ssememalign in bits all we could have is a boolean attribute saying whether an insn requires pre-AVX memory operands to be as aligned as their mode, or not (with the default that it does require that). I think that we should only prevent invalid compiler transformations, so the proposed approach is correct. In the attached example, the data is properly aligned, but the compiler transforms a correct program into an invalid one. The code created with the patched compiler is valid, even if someone enables trap on unaligned. When trap on unaligned is activated, a program should not trap differently due to compiler transformations. The optimized and unoptimized code should run (and trap) in the same way. However, I don't think the ix86_expand_special_args_builtin change is needed. It is the duty of the programmer to pass properly aligned addresses to these builtins. The compiler shouldn't magically fix invalid code, in the same way as it shouldn't break valid code as in the case above. Uros.
Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
On Wed, Dec 4, 2013 at 9:39 AM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: Attached is the revised patch. The target-independent part has already been approved and added. This revision of the patch adds an x86 tune definition and checks it while deciding the unroll factor. Accommodated the comments given by you except one: *x will never be null for active insns. Since every rtx in the insn is checked for memory references, the NULL_RTX check is required. Yes, you are correct. for_each_rtx also passes NULL_RTX; I was distracted by the "There are no sub-expressions." comment. + if (NONDEBUG_INSN_P (insn) && INSN_CODE (insn) != -1) Do you need to check INSN_CODE here? IIRC, checking NONDEBUG_INSN_P is enough. + for_each_rtx (&insn, (rtx_function) ix86_loop_memcount, &mem_count); +} + free (bbs); + + if (mem_count <= 32) + return 32 / mem_count; Ouch... mem_count can be zero. Is there a reason to change this part from the previous patch? Uros.
Re: [PING] 3 patches waiting for approval/review
Hi Jeff, to my understanding a leaf function is characterized by not calling another function in its function body. So for the patch the question is whether we consider the mcount call belonging to the function body or not. As I see it from the backends for all uses of is_leaf the function body is what comes between the function prologue and the epilogue. Hence calling mcount *before* the function prologue does not speak against the leafness of a function. What do you think? Bye, -Andreas- On 22/10/13 21:28, Andreas Krebbel wrote: On 16/10/13 22:25, Jeff Law wrote: ... I still really feel this should be a target hook that is off by default so that the target maintainers can audit their target to ensure it operates correctly. Maybe I'm missing something, so perhaps another approach. Can you explain why you think it is safe to ignore calls to mcount when trying to determine if a function is a leaf or not? In general it is not safe to ignore calls to mcount. But if a target does insert the call to mcount before the function prologue then mcount will not use the stack space allocated by the current function. It will instead use the stack space allocated by the caller for the current function. The current function can still be a leaf function since the call does not happen within the scope of its stack frame. The difference between PROFILE_HOOKS and FUNCTION_PROFILER is that for the first the call to mcount is inserted always after the function prologue no matter what the backend returns for the profile_before_prologue target hook.
[gomp4] Restore GIMPLE_OACC_PARALLEL functionality (was: r205231 - in /branches/gomp-4_0-branch: ./ Chan...)
Hi! On Thu, 21 Nov 2013 20:20:45 -, ja...@gcc.gnu.org wrote: Author: jakub Date: Thu Nov 21 20:20:44 2013 New Revision: 205231 URL: http://gcc.gnu.org/viewcvs?rev=205231&root=gcc&view=rev Log: svn merge -r204964:205223 svn+ssh://gcc.gnu.org/svn/gcc/trunk Jakub, many thanks for handling the vast majority of the merge changes! I only had to fix one additional case, r205658: gcc/ * gimple.h (is_a_helper): Handle GIMPLE_OACC_PARALLEL. --- gcc/gimple.h +++ gcc/gimple.h @@ -969,7 +969,8 @@ template inline bool is_a_helper <const gimple_statement_omp_parallel>::test (const_gimple gs) { - return gs->code == GIMPLE_OMP_PARALLEL || gs->code == GIMPLE_OMP_TASK || gs->code == GIMPLE_OMP_TARGET; + return gs->code == GIMPLE_OMP_PARALLEL || gs->code == GIMPLE_OMP_TASK + || gs->code == GIMPLE_OMP_TARGET || gs->code == GIMPLE_OACC_PARALLEL; } template Grüße, Thomas
Re: [PATCH] Fix force_to_mode not to modify in place the passed rtl (PR rtl-optimization/58726)
Fixed by making sure force_to_mode doesn't modify x in place. I think that it's the way to go, force_to_mode doesn't modify its argument except for these 2 cases. I'm not sure what the story is, but calling SUBST for these 2 cases doesn't seem really necessary. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8? 2013-12-03 Jakub Jelinek ja...@redhat.com PR rtl-optimization/58726 * combine.c (force_to_mode): Fix comment typo. Don't destructively modify x for ROTATE, ROTATERT and IF_THEN_ELSE. * gcc.c-torture/execute/pr58726.c: New test. IMO it's the best fix at this point of the release cycles. -- Eric Botcazou
Re: [PATCH] Fix SSE (pre-AVX) alignment handling (PR target/59163)
On Wed, Dec 04, 2013 at 10:20:20AM +0100, Uros Bizjak wrote: When trap on unaligned is activated, a program should not trap differently due to compiler transformations. The optimized and unoptimized code should run (and trap) in the same way. Ok. Note, it seems I had missed an i386/sse-1.c test regression (apparently because an earlier version of the patch failed that test and I compared testresults against that instead of an earlier regtest), and from extra testing that recorded all insns rejected by the new return false in the ix86_legitimate_combined_insn hook, sse4_1-movntdqa.c also regressed. So the attached updated patch fixes those two. I had missed ssememalign on the sse_loadlpd insn, where none of the alternatives #UD on misaligned operands, and previously also skipped sse2_load[lh]pd because I was afraid about the splitters, but the splitter creates a DFmode store, which is always fine misaligned. Incremental diff is: --- gcc/config/i386/i386.c 2013-12-02 19:57:39.116438744 +0100 +++ gcc/config/i386/i386.c 2013-12-04 10:39:43.614269591 +0100 @@ -32563,6 +32543,15 @@ nargs = 1; klass = load; memory = 0; + switch (icode) + { + case CODE_FOR_sse4_1_movntdqa: + case CODE_FOR_avx2_movntdqa: + aligned_mem = true; + break; + default: + break; + } break; case VOID_FTYPE_PV2SF_V4SF: case VOID_FTYPE_PV4DI_V4DI: --- gcc/config/i386/sse.md 2013-12-02 18:02:00.325523050 +0100 +++ gcc/config/i386/sse.md 2013-12-04 11:08:48.589128840 +0100 @@ -5278,6 +5278,7 @@ %vmovlps\t{%2, %0|%q0, %2} [(set_attr "isa" "noavx,avx,noavx,avx,*") (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov") + (set_attr "ssememalign" "64") (set_attr "length_immediate" "1,1,*,*,*") (set_attr "prefix" "orig,vex,orig,vex,maybe_vex") (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")]) @@ -7066,6 +7067,7 @@ # [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*") (set_attr "type" "ssemov,ssemov,sselog,sselog,ssemov,fmov,imov") + (set_attr "ssememalign" "64") (set_attr "prefix_data16" "1,*,*,*,*,*,*") (set_attr "prefix" "orig,vex,orig,vex,*,*,*") (set_attr "mode" "V1DF,V1DF,V2DF,V2DF,DF,DF,DF")]) @@ -7134,6 +7136,7 @@ (const_string "imov") ] (const_string "ssemov"))) + (set_attr "ssememalign" "64") (set_attr "prefix_data16" "*,1,*,*,*,*,1,*,*,*,*") (set_attr "length_immediate" "*,*,*,*,*,1,*,*,*,*,*") (set_attr "prefix" "maybe_vex,orig,vex,orig,vex,orig,orig,vex,*,*,*") However, I don't think ix86_expand_special_args_builtin is needed. It is the duty of the programmer to pass properly aligned addresses to these builtins. The compiler shouldn't magically fix invalid code, in the same way as it shouldn't break valid code as in the case above. What I'm trying to do in ix86_expand_special_args_builtin is not in any way fixing invalid code. The problem is that those builtins don't have a memory as an argument, they have a pointer, and the MEM is created by the compiler from scratch. As it uses just gen_rtx_MEM, those MEMs always get just BITS_PER_UNIT alignment, so the combiner hook would reject any changes whatsoever on such instructions. get_pointer_alignment (added in the patch) adds alignment info from what the compiler can find out (e.g. if the pointer points to a decl with certain alignment etc.), but if it is e.g. an argument of a function, it will still likely return BITS_PER_UNIT even when in valid code it will always be properly aligned (you'd need to use __builtin_assume_aligned or similar to force the alignment there). I went through the special builtins for which gen_rtx_MEM is called and they apparently fall into just two categories: one is various builtins for unaligned vector loads or for 16-byte load/store instructions (type 5 in the AVX spec, so no #UD on misalign, at most #AC if enabled), and the other is non-temporal loads/stores, which both pre-AVX and AVX+ require to be strictly aligned.
Only for the latter category I'm forcing the mode alignment, because even if the compiler can't prove correct alignment based on ccp etc., a valid program must have such MEMs properly aligned, and if we create MEMs with smaller alignment, the combine hook will just always punt on them. So, ok for trunk this way? I guess the 4.8 patch should wait some time before being backported. 2013-12-04 Jakub Jelinek ja...@redhat.com Uros Bizjak ubiz...@gmail.com PR target/59163 * config/i386/i386.c (ix86_legitimate_combined_insn): If for !TARGET_AVX there is misaligned MEM operand with vector mode and get_attr_ssememalign is 0, return false. (ix86_expand_special_args_builtin): Add get_pointer_alignment computed alignment and for non-temporal loads/stores also at least GET_MODE_ALIGNMENT as MEM_ALIGN. * config/i386/sse.md (<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>,
Re: [PATCH] Fix force_to_mode not to modify in place the passed rtl (PR rtl-optimization/58726)
I'd worry there's other latent bugs of this nature and if we'd be better off avoiding the temporary sharing. We have structure sharing rules for a reason -- I'd hate to think of all the code that would need auditing to ensure it was safe with this bogus sharing. I wouldn't throw the baby out with the bathwater here; it's one of the numerous PRs opened by Zhendong Su, which clearly look machine-generated. It fails with 4.4.x onwards and apparently nobody noticed the problem in real life. We know that we have latent sharing issues in the combiner because of the way it's designed, but they are quite rare in practice. -- Eric Botcazou
Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
On Tue, Dec 3, 2013 at 11:04 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: On Tue, 2013-12-03 at 21:35 +0100, Richard Biener wrote: Yufeng Zhang yufeng.zh...@arm.com wrote: On 12/03/13 14:20, Richard Biener wrote: On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang yufeng.zh...@arm.com wrote: On 12/03/13 06:48, Jeff Law wrote: On 12/02/13 08:47, Yufeng Zhang wrote: Ping~ http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html Thanks, Yufeng On 11/26/13 15:02, Yufeng Zhang wrote: On 11/26/13 12:45, Richard Biener wrote: On Thu, Nov 14, 2013 at 12:25 AM, Yufeng Zhang yufeng.zh...@arm.com wrote: On 11/13/13 20:54, Bill Schmidt wrote: The second version of your original patch is ok with me with the following changes. Sorry for the little side adventure into the next-interp logic; in the end that's going to hurt more than it helps in this case. Thanks for having a look at it, anyway. Thanks also for cleaning up this version to be less intrusive to common interfaces; I appreciate it. Thanks a lot for the review. I've attached an updated patch with the suggested changes incorporated. For the next-interp adventure, I was quite happy to do the experiment; it's a good chance of gaining insight into the pass. Many thanks for your prompt replies and patience in guiding! Everything else looks OK to me. Please ask Richard for final approval, as I'm not a maintainer. First a note, I need to check on voting for Bill as the slsr maintainer from the steering committee. Voting was in progress just before the close of stage1 development so I haven't tallied the results :-) Looking forward to some good news! :) Yes, you are right that non-trivial 'base' trees are rarely shared. The cache is introduced mainly because get_alternative_base () may be called twice on the same 'base' tree, once in find_basis_for_candidate () for look-up and the other time in alloc_cand_and_find_basis () for record_potential_basis (). I'm happy to leave out the cache if you think the benefit is trivial.
Without some sense of how expensive the lookups are vs how often the cache hits, it's awful hard to know if the cache is worth it. I'd say take it out unless you have some sense it's really saving time. It's a pretty minor implementation detail either way. I think the affine tree routines are generally expensive; it is worth having a cache to avoid calling them too many times. I ran the slsr-*.c tests under gcc.dg/tree-ssa/ and found that the cache hit rates range from 55.6% to 90%, with 73.5% as the average. The samples may not well represent the real-world scenario, but they do show that the 'base' tree can be shared to some extent. So I'd like to keep the cache in the patch. +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-slsr" } */ + +typedef int arr_2[50][50]; + +void foo (arr_2 a2, int v1) +{ + int i, j; + + i = v1 + 5; + j = i; + a2 [i-10] [j] = 2; + a2 [i] [j++] = i; + a2 [i+20] [j++] = i; + a2 [i-3] [i-1] += 1; + return; +} + +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */ +/* { dg-final { cleanup-tree-dump "slsr" } } */ scanning for 5 MEMs looks non-sensical. What transform do you expect? I see other slsr testcases do similar non-sensical checking, which is bad, too. As slsr optimizes CAND_REF candidates by simply lowering them to MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs is an effective check. Alternatively, I can add a follow-up patch to add some dumping facility in replace_ref () to print out the replacing actions when -fdump-tree-slsr-details is on. I think adding some details to the dump and scanning for them would be better. That's the only change that is required for this to move forward. I've updated the patch to dump more details when -fdump-tree-slsr-details is on. The tests have also been updated to scan for these new dumps instead of MEMs. I suggest doing it quickly. We're well past stage1 close at this point. The bootstrapping on x86_64 is still running. OK to commit if it succeeds?
I still don't like it. It's using the wrong and too expensive tools to do stuff. What kind of bases are we ultimately interested in? Browsing the code it looks like we're having /* Base expression for the chain of candidates: often, but not always, an SSA name. */ tree base_expr; which isn't really too informative but I suppose they are all kind-of-gimple_val()s? That said, I wonder if you can simply use get_addr_base_and_unit_offset in place of get_alternative_base (), ignoring the returned offset. 'base_expr' is essentially the base address of a handled_component_p, e.g. ARRAY_REF, COMPONENT_REF, etc. In most
RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4
Ouch... mem_count can be zero. Is there a reason to change this part from the previous patch? Oops! You're right. I will correct this. The idea is to count the memory references and decide on the unrolling factor. The previous patch does that in two steps; I thought of doing it in a single step. (I think I missed my step here ;) ) Regards Ganesh -Original Message- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Wednesday, December 04, 2013 3:17 PM To: Gopalasubramanian, Ganesh Cc: gcc-patches@gcc.gnu.org; Richard Guenther richard.guent...@gmail.com (richard.guent...@gmail.com) Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4 On Wed, Dec 4, 2013 at 9:39 AM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: Attached is the revised patch. The target-independent part has already been approved and added. This revision of the patch adds an x86 tune definition and checks it while deciding the unroll factor. Accommodated the comments given by you except one. *x will never be null for active insns. Since every rtx in the insn is checked for memory references, the NULL_RTX check is required. Yes, you are correct. for_each_rtx also passes NULL_RTX; I was distracted by the "There are no sub-expressions." comment. +if (NONDEBUG_INSN_P (insn) && INSN_CODE (insn) != -1) Do you need to check INSN_CODE here? IIRC, checking NONDEBUG_INSN_P is enough. +for_each_rtx (&insn, (rtx_function) ix86_loop_memcount, &mem_count); +} + free (bbs); + + if (mem_count <= 32) +return 32 / mem_count; Ouch... mem_count can be zero. Is there a reason to change this part from the previous patch? Uros.
Re: Fix a bug in points-to solver
On Tue, Dec 3, 2013 at 11:54 PM, Xinliang David Li davi...@google.com wrote: Done. Retested with the suggested change. Ok for trunk? Ok. Thanks, Richard. thanks, David On Tue, Dec 3, 2013 at 2:13 AM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Dec 2, 2013 at 6:38 PM, Xinliang David Li davi...@google.com wrote: The points-to solver has a bug that can cause complex constraints to be skipped, leading to wrong points-to results. In the case that exposed the problem, there is an sd constraint: x = *y which is never processed. 'y's final points-to set is { NULL READONLY ESCAPED NOLOCAL }, but 'x's points-to set is {}. What happens is that before 'y' is processed, it is merged with another node 'z' during cycle elimination (the complex constraints get transferred to 'z'), but 'z' is not marked as 'changed', so it is skipped in a later iteration. The attached patch fixes the problem. The problem is exposed by a large program built with -fprofile-generate in LIPO mode -- so there is no small testcase attached. Bootstrapped and regression tested on x86_64-unknown-linux-gnu, OK for trunk? Hmm, the unify_nodes call in eliminate_indirect_cycles is supposed to set the changed bit... which in this special case (updating of complex constraints, not the solution!) doesn't happen because unify_nodes doesn't consider this as a change. Which needs to change. So, can you please update your patch to return a bool from merge_node_constraints (any change happened?) and update changed accordingly in unify_nodes? Thanks, Richard. Index: ChangeLog === --- ChangeLog (revision 205579) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2013-12-02 Xinliang David Li davi...@google.com + + * tree-ssa-structalias.c (solve_graph): Mark rep node changed + after cycle elimination.
+ 2013-12-01 Eric Botcazou ebotca...@adacore.com * config/i386/winnt.c (i386_pe_asm_named_section): Be prepared for an Index: tree-ssa-structalias.c === --- tree-ssa-structalias.c (revision 205579) +++ tree-ssa-structalias.c (working copy) @@ -2655,8 +2655,13 @@ solve_graph (constraint_graph_t graph) /* In certain indirect cycle cases, we may merge this variable to another. */ - if (eliminate_indirect_cycles (i) && find (i) != i) - continue; + if (eliminate_indirect_cycles (i)) +{ + unsigned int rep = find (i); + bitmap_set_bit (changed, rep); + if (i != rep) + continue; +} /* If the node has changed, we need to process the complex constraints and outgoing edges again. */
Re: Add TREE_INT_CST_OFFSET_NUNITS
On Wed, Dec 4, 2013 at 1:03 AM, Richard Sandiford rdsandif...@googlemail.com wrote: Richard Biener richard.guent...@gmail.com writes: Looking at the implementation it seems it would also work with return MIN (TREE_INT_CST_EXT_NUNITS (m_t), N / HOST_BITS_PER_WIDE_INT); Yeah, the MIN in the patch was probably bogus, sorry. It only works if we can assume that no bitsizetype will have ADDR_MAX_PRECISION significant (non-sign) bits -- in particular that there's no such thing as an all-1s _unsigned_ bitsizetype. That might be true in practice given the way we use offsets, but it isn't safe to generalise that to all N. A safer form would be: if (ext_len > OFFSET_INT_ELTS) TREE_INT_CST_OFFSET_NUNITS (t) = len; else TREE_INT_CST_OFFSET_NUNITS (t) = ext_len; The reason the general form doesn't work for all N is because of the compressed representation. E.g. the example I gave a while ago about a 256-bit all-1s unsigned number being { -1 } as a 256-bit number and { -1, -1, -1, -1, 0 } as a 257+-bit number. But the point of the patch is to avoid any runtime checks here, so the TYPE_PRECISION case is never actually used now. I just kept it around for completeness, since we'd been using it successfully so far. I can put in a gcc_unreachable if you prefer. Yeah, I'd prefer a gcc_unreachable and a comment. OK, how about this version? Tested on x86_64-linux-gnu. Ok. Thanks, Richard. Thanks, Richard Index: gcc/ChangeLog.wide-int === --- gcc/ChangeLog.wide-int 2013-12-03 23:55:26.142873345 +0000 +++ gcc/ChangeLog.wide-int 2013-12-03 23:59:18.823744425 +0000 @@ -617,6 +617,7 @@ (TREE_INT_CST_HIGH): Delete. (TREE_INT_CST_NUNITS): New. (TREE_INT_CST_EXT_NUNITS): Likewise. + (TREE_INT_CST_OFFSET_NUNITS): Likewise. (TREE_INT_CST_ELT): Likewise. (INT_CST_LT): Use wide-int interfaces. (INT_CST_LE): New.
Index: gcc/tree-core.h === --- gcc/tree-core.h 2013-12-03 23:55:26.142873345 +0000 +++ gcc/tree-core.h 2013-12-04 00:02:22.910222722 +0000 @@ -764,11 +764,17 @@ struct GTY(()) tree_base { struct { /* The number of HOST_WIDE_INTs if the INTEGER_CST is accessed in its native precision. */ - unsigned short unextended; + unsigned char unextended; /* The number of HOST_WIDE_INTs if the INTEGER_CST is extended to wider precisions based on its TYPE_SIGN. */ - unsigned short extended; + unsigned char extended; + + /* The number of HOST_WIDE_INTs if the INTEGER_CST is accessed in +offset_int precision, with smaller integers being extended +according to their TYPE_SIGN. This is equal to one of the two +fields above but is cached for speed. */ + unsigned char offset; } int_length; /* VEC length. This field is only used with TREE_VEC. */ Index: gcc/tree.c === --- gcc/tree.c 2013-12-03 23:55:26.142873345 +0000 +++ gcc/tree.c 2013-12-03 23:59:18.821744409 +0000 @@ -1285,6 +1285,7 @@ wide_int_to_tree (tree type, const wide_ /* Make sure no one is clobbering the shared constant. */ gcc_checking_assert (TREE_TYPE (t) == type && TREE_INT_CST_NUNITS (t) == 1 + && TREE_INT_CST_OFFSET_NUNITS (t) == 1 && TREE_INT_CST_EXT_NUNITS (t) == 1 && TREE_INT_CST_ELT (t, 0) == hwi); else @@ -1964,6 +1965,13 @@ make_int_cst_stat (int len, int ext_len TREE_SET_CODE (t, INTEGER_CST); TREE_INT_CST_NUNITS (t) = len; TREE_INT_CST_EXT_NUNITS (t) = ext_len; + /* to_offset can only be applied to trees that are offset_int-sized + or smaller. EXT_LEN is correct if it fits, otherwise the constant + must be exactly the precision of offset_int and so LEN is correct.
*/ + if (ext_len <= OFFSET_INT_ELTS) +TREE_INT_CST_OFFSET_NUNITS (t) = ext_len; + else +TREE_INT_CST_OFFSET_NUNITS (t) = len; TREE_CONSTANT (t) = 1; Index: gcc/tree.h === --- gcc/tree.h 2013-12-03 23:55:26.142873345 +0000 +++ gcc/tree.h 2013-12-04 00:01:48.258944485 +0000 @@ -907,6 +907,8 @@ #define TREE_INT_CST_NUNITS(NODE) \ (INTEGER_CST_CHECK (NODE)->base.u.int_length.unextended) #define TREE_INT_CST_EXT_NUNITS(NODE) \ (INTEGER_CST_CHECK (NODE)->base.u.int_length.extended) +#define TREE_INT_CST_OFFSET_NUNITS(NODE) \ + (INTEGER_CST_CHECK (NODE)->base.u.int_length.offset) #define TREE_INT_CST_ELT(NODE, I) TREE_INT_CST_ELT_CHECK (NODE, I) #define TREE_INT_CST_LOW(NODE) \ ((unsigned HOST_WIDE_INT) TREE_INT_CST_ELT (NODE, 0)) @@ -4623,11 +4625,15 @@
Re: [PATCH] Fix force_to_mode not to modify in place the passed rtl (PR rtl-optimization/58726)
On Wed, Dec 4, 2013 at 11:07 AM, Eric Botcazou ebotca...@adacore.com wrote: Fixed by making sure force_to_mode doesn't modify x in place. I think that it's the way to go, force_to_mode doesn't modify its argument except for these 2 cases. I'm not sure what the story is, but calling SUBST for these 2 cases doesn't seem really necessary. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8? 2013-12-03 Jakub Jelinek ja...@redhat.com PR rtl-optimization/58726 * combine.c (force_to_mode): Fix comment typo. Don't destructively modify x for ROTATE, ROTATERT and IF_THEN_ELSE. * gcc.c-torture/execute/pr58726.c: New test. IMO it's the best fix at this point of the release cycles. I agree. Richard. -- Eric Botcazou
Re: [PATCH] Fix SSE (pre-AVX) alignment handling (PR target/59163)
On Wed, Dec 4, 2013 at 11:15 AM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Dec 04, 2013 at 10:20:20AM +0100, Uros Bizjak wrote: When trap on unaligned is activated, a program should not trap in a different way due to compiler transformations. The optimized and unoptimized code should run (and trap) in the same way. Ok. Note, it seems I had missed an i386/sse-1.c test regression (apparently because an earlier version of the patch failed that test and I had compared testresults against that instead of an earlier regtest), and from extra testing that recorded all insns rejected by the new return false in the ix86_legitimate_combined_insn hook, sse4_1-movntdqa.c also regressed. So the attached updated patch fixes those two. I had missed ssememalign on the sse_loadlpd insn, where none of the alternatives #UD on misaligned operands, and previously also skipped sse2_load[lh]pd because I was afraid about the splitters, but the splitter creates a DFmode store, which is always fine misaligned. The incremental diff is: --- gcc/config/i386/i386.c 2013-12-02 19:57:39.116438744 +0100 +++ gcc/config/i386/i386.c 2013-12-04 10:39:43.614269591 +0100 @@ -32563,6 +32543,15 @@ nargs = 1; klass = load; memory = 0; + switch (icode) + { + case CODE_FOR_sse4_1_movntdqa: + case CODE_FOR_avx2_movntdqa: + aligned_mem = true; + break; + default: + break; + } break; case VOID_FTYPE_PV2SF_V4SF: case VOID_FTYPE_PV4DI_V4DI: --- gcc/config/i386/sse.md 2013-12-02 18:02:00.325523050 +0100 +++ gcc/config/i386/sse.md 2013-12-04 11:08:48.589128840 +0100 @@ -5278,6 +5278,7 @@ "%vmovlps\t{%2, %0|%q0, %2}" [(set_attr "isa" "noavx,avx,noavx,avx,*") (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov") + (set_attr "ssememalign" "64") (set_attr "length_immediate" "1,1,*,*,*") (set_attr "prefix" "orig,vex,orig,vex,maybe_vex") (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")]) @@ -7066,6 +7067,7 @@ # [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*") (set_attr "type" "ssemov,ssemov,sselog,sselog,ssemov,fmov,imov") + (set_attr "ssememalign" "64") (set_attr
"prefix_data16" "1,*,*,*,*,*,*") (set_attr "prefix" "orig,vex,orig,vex,*,*,*") (set_attr "mode" "V1DF,V1DF,V2DF,V2DF,DF,DF,DF")]) @@ -7134,6 +7136,7 @@ (const_string "imov") ] (const_string "ssemov"))) + (set_attr "ssememalign" "64") (set_attr "prefix_data16" "*,1,*,*,*,*,1,*,*,*,*") (set_attr "length_immediate" "*,*,*,*,*,1,*,*,*,*,*") (set_attr "prefix" "maybe_vex,orig,vex,orig,vex,orig,orig,vex,*,*,*") However, I don't think the ix86_expand_special_args_builtin change is needed. It is the duty of the programmer to pass a properly aligned address to these builtins. The compiler shouldn't magically fix invalid code, in the same way as it shouldn't break valid code as in the case above. What I'm trying to do in ix86_expand_special_args_builtin is not in any way fixing invalid code. The problem is that those builtins don't have a memory as an argument, they have a pointer, and the MEM is compiler-created from scratch. As it uses just gen_rtx_MEM, it always means just BITS_PER_UNIT alignment on those MEMs, so the problem is that the combiner hook will reject any changes whatsoever on such instructions. get_pointer_alignment (added in the patch) adds alignment info from what the compiler can find out (e.g. if the pointer points to a decl with certain alignment etc.), but if it is e.g. an argument of a function, it will still likely return BITS_PER_UNIT even when in valid code it will always be properly aligned (you'd need to use __builtin_assume_aligned or similar to force the alignment there). I went through the special builtins for which gen_rtx_MEM is called and they apparently fall into just two categories: one is various builtins for unaligned vector loads or for 16-byte load/store instructions (type 5 in the AVX spec, so no #UD on misalign, at most #AC if enabled), and the other is non-temporal loads/stores, which both pre-AVX and AVX+ require strict alignment for.
Only for the latter category am I forcing the mode alignment, because even if the compiler can't prove correct alignment based on CCP etc., a valid program must have such MEMs properly aligned, and if we create MEMs with smaller alignment, the combine hook will just always punt on them. Thanks for the explanation! Maybe you should add a short comment like the above in the code. So, ok for trunk this way? I guess the 4.8 patch should wait some time before being backported. 2013-12-04 Jakub Jelinek ja...@redhat.com Uros Bizjak ubiz...@gmail.com PR target/59163 * config/i386/i386.c (ix86_legitimate_combined_insn): If for !TARGET_AVX there is a misaligned MEM operand with vector mode and get_attr_ssememalign is 0, return false. (ix86_expand_special_args_builtin): Add
[Ada] Fix corrupted string with case expression and concatenation
In Ada 2012 we now have case expressions, i.e. conditional expressions with more than 2 choices, and this exposes an issue with the way such conditional constructs are translated in gigi, resulting in released stack storage being used in some cases. Tested on x86_64-suse-linux, applied on the mainline. 2013-12-04 Eric Botcazou ebotca...@adacore.com * gcc-interface/trans.c (Case_Statement_to_gnu): Do not push a binding level for each branch if this is a case expression in Ada 2012. (gnat_to_gnu) <case N_Expression_With_Actions>: Adjust comment. -- Eric Botcazou Index: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 205654) +++ gcc-interface/trans.c (working copy) @@ -2348,12 +2348,17 @@ Case_Statement_to_gnu (Node_Id gnat_node } } - /* Push a binding level here in case variables are declared as we want - them to be local to this set of statements instead of to the block - containing the Case statement. */ + /* This construct doesn't define a scope so we shouldn't push a binding + level around the statement list. Except that we have always done so + historically and this makes it possible to reduce stack usage. As a + compromise, we keep doing it for case statements, for which this has + never been problematic, but not for case expressions in Ada 2012.
*/ if (choices_added_p) { - tree group = build_stmt_group (Statements (gnat_when), true); + const bool is_case_expression + = (Nkind (Parent (gnat_node)) == N_Expression_With_Actions); + tree group + = build_stmt_group (Statements (gnat_when), !is_case_expression); bool group_may_fallthru = block_may_fallthru (group); add_stmt (group); if (group_may_fallthru) @@ -7002,8 +7007,8 @@ gnat_to_gnu (Node_Id gnat_node) // case N_Expression_With_Actions: - /* This construct doesn't define a scope so we don't wrap the statement - list in a BIND_EXPR; however, we wrap it in a SAVE_EXPR to protect it + /* This construct doesn't define a scope so we don't push a binding level + around the statement list; but we wrap it in a SAVE_EXPR to protect it from unsharing. */ gnu_result = build_stmt_group (Actions (gnat_node), false); gnu_result = build1 (SAVE_EXPR, void_type_node, gnu_result);
Re: [PATCH][ARM]Use of vcvt for float to fixed point conversions.
Sorry about the slow response. Been on holiday. On 20/11/13 16:27, Renlin Li wrote: Hi all, This patch will make the arm back-end use vcvt for float to fixed point conversions when applicable. Test on arm-none-linux-gnueabi has been done on the model. Okay for trunk? + (define_insn "*combine_vcvtf2i" + [(set (match_operand:SI 0 "s_register_operand" "=r") + (fix:SI (fix:SF (mult:SF (match_operand:SF 1 "s_register_operand" "t") +(match_operand 2 +"const_double_vcvt_power_of_two" "Dp")] + "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math" + "vcvt%?.s32.f32\\t%1, %1, %v2\;vmov%?\\t%0, %1" + [(set_attr "predicable" "yes") +(set_attr "predicable_short_it" "no") +(set_attr "ce_count" "2") +(set_attr "type" "f_cvtf2i")] + ) + You need to set "length" to 8. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c @@ -0,0 +1,15 @@ +/* Check that vcvt is used for fixed and float data conversions. */ +/* { dg-do compile } */ +/* { dg-options "-O1 -mfpu=vfp3" } */ +/* { dg-require-effective-target arm_vfp_ok } */ +float fixed_to_float(int i) +{ +return ((float)i / (1 << 16)); +} + +int float_to_fixed(float f) +{ +return ((int)(f * (1 << 16))); +} +/* { dg-final { scan-assembler "vcvt.f32.s32" } } */ +/* { dg-final { scan-assembler "vcvt.s32.f32" } } */ GNU coding style for functions. Ok with those changes. regards Ramana Kind regards, Renlin Li gcc/ChangeLog: 2013-11-20 Renlin Li renlin...@arm.com * config/arm/arm-protos.h (vfp_const_double_for_bits): Declare. * config/arm/constraints.md (Dp): Define new constraint. * config/arm/predicates.md (const_double_vcvt_power_of_two): Define new predicate. * config/arm/arm.c (arm_print_operand): Add print for new function. (vfp3_const_double_for_bits): New function. * config/arm/vfp.md (combine_vcvtf2i): Define new instruction. gcc/testsuite/ChangeLog: 2013-11-20 Renlin Li renlin...@arm.com * gcc.target/arm/fixed_float_conversion.c: New test case.
[Ada] Fix layout of packed record type with zero-sized component
In Ada we have the equivalent of bit-fields with zero size but they don't behave as in C (or rather according to most ABIs) when it comes to the layout. This patch makes it so that they are laid out manually instead of being passed to stor-layout.c as for the other fields. Tested on x86_64-suse-linux, applied on the mainline. 2013-12-04 Eric Botcazou ebotca...@adacore.com * gcc-interface/decl.c (components_to_record): Add specific handling for fields with zero size and no representation clause. 2013-12-04 Eric Botcazou ebotca...@adacore.com * gnat.dg/pack19.adb: New test. -- Eric BotcazouIndex: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 205654) +++ gcc-interface/decl.c (working copy) @@ -6932,6 +6932,7 @@ components_to_record (tree gnu_record_ty tree gnu_rep_list = NULL_TREE; tree gnu_var_list = NULL_TREE; tree gnu_self_list = NULL_TREE; + tree gnu_zero_list = NULL_TREE; /* For each component referenced in a component declaration create a GCC field and add it to the list, skipping pragmas in the GNAT list. */ @@ -7262,6 +7263,10 @@ components_to_record (tree gnu_record_ty to do this in a separate pass since we want to handle the discriminants but can't play with them until we've used them in debugging data above. + Similarly, pull out the fields with zero size and no rep clause, as they + would otherwise modify the layout and thus very likely run afoul of the + Ada semantics, which are different from those of C here. + ??? If we reorder them, debugging information will be wrong but there is nothing that can be done about this at the moment. 
*/ gnu_last = NULL_TREE; @@ -7300,6 +7305,19 @@ components_to_record (tree gnu_record_ty continue; } + if (DECL_SIZE (gnu_field) && integer_zerop (DECL_SIZE (gnu_field))) + { + DECL_FIELD_OFFSET (gnu_field) = size_zero_node; + SET_DECL_OFFSET_ALIGN (gnu_field, BIGGEST_ALIGNMENT); + DECL_FIELD_BIT_OFFSET (gnu_field) = bitsize_zero_node; + if (field_is_aliased (gnu_field)) + TYPE_ALIGN (gnu_record_type) + = MAX (TYPE_ALIGN (gnu_record_type), + TYPE_ALIGN (TREE_TYPE (gnu_field))); + MOVE_FROM_FIELD_LIST_TO (gnu_zero_list); + continue; + } + gnu_last = gnu_field; } @@ -7392,6 +7410,11 @@ components_to_record (tree gnu_record_ty finish_record_type (gnu_record_type, gnu_field_list, layout_with_rep ? 1 : 0, debug_info && !maybe_unused); + /* Chain the fields with zero size at the beginning of the field list. */ + if (gnu_zero_list) +TYPE_FIELDS (gnu_record_type) + = chainon (gnu_zero_list, TYPE_FIELDS (gnu_record_type)); + return (gnu_rep_list && !p_gnu_rep_list) || variants_have_rep; } -- { dg-do run } procedure Pack19 is subtype Always_False is Boolean range False .. False; type Rec1 is record B1 : Boolean; B2 : Boolean; B3 : Boolean; B4 : Boolean; B5 : Boolean; B6 : Boolean; B7 : Always_False; B8 : Boolean; end record; pragma Pack (Rec1); subtype Always_True is Boolean range True .. True; type Rec2 is record B1 : Boolean; B2 : Boolean; B3 : Boolean; B4 : Boolean; B5 : Boolean; B6 : Boolean; B7 : Always_True; B8 : Boolean; end record; pragma Pack (Rec2); R1 : Rec1 := (True, True, True, True, True, True, False, False); R2 : Rec2 := (False, False, False, False, False, False, True, True); begin R1.B8 := True; if R1.B7 /= False then raise Program_Error; end if; R1.B7 := False; if R1.B7 /= False then raise Program_Error; end if; R2.B8 := False; if R2.B7 /= True then raise Program_Error; end if; R2.B7 := True; if R2.B7 /= True then raise Program_Error; end if; end;
Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
On 12/04/13 10:30, Richard Biener wrote: On Wed, Dec 4, 2013 at 11:26 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Dec 3, 2013 at 11:04 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: On Tue, 2013-12-03 at 21:35 +0100, Richard Biener wrote: Yufeng Zhang yufeng.zh...@arm.com wrote: On 12/03/13 14:20, Richard Biener wrote: On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang yufeng.zh...@arm.com wrote: On 12/03/13 06:48, Jeff Law wrote: On 12/02/13 08:47, Yufeng Zhang wrote: Ping~ http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html Thanks, Yufeng On 11/26/13 15:02, Yufeng Zhang wrote: On 11/26/13 12:45, Richard Biener wrote: On Thu, Nov 14, 2013 at 12:25 AM, Yufeng Zhang yufeng.zh...@arm.com wrote: On 11/13/13 20:54, Bill Schmidt wrote: The second version of your original patch is ok with me with the following changes. Sorry for the little side adventure into the next-interp logic; in the end that's going to hurt more than it helps in this case. Thanks for having a look at it, anyway. Thanks also for cleaning up this version to be less intrusive to common interfaces; I appreciate it. Thanks a lot for the review. I've attached an updated patch with the suggested changes incorporated. For the next-interp adventure, I was quite happy to do the experiment; it's a good chance of gaining insight into the pass. Many thanks for your prompt replies and patience in guiding! Everything else looks OK to me. Please ask Richard for final approval, as I'm not a maintainer. First a note, I need to check on voting for Bill as the slsr maintainer from the steering committee. Voting was in progress just before the close of stage1 development so I haven't tallied the results :-) Looking forward to some good news! :) Yes, you are right about the non-trivial 'base' trees being rarely shared.
The cache is introduced mainly because get_alternative_base () may be called twice on the same 'base' tree, once in find_basis_for_candidate () for look-up and the other time in alloc_cand_and_find_basis () for record_potential_basis (). I'm happy to leave out the cache if you think the benefit is trivial. Without some sense of how expensive the lookups are vs. how often the cache hits, it's awfully hard to know if the cache is worth it. I'd say take it out unless you have some sense it's really saving time. It's a pretty minor implementation detail either way. I think the affine tree routines are generally expensive; it is worth having a cache to avoid calling them too many times. I ran the slsr-*.c tests under gcc.dg/tree-ssa/ and found that the cache hit rates range from 55.6% to 90%, with 73.5% as the average. The samples may not represent the real-world scenario well, but they do show that the 'base' tree can be shared to some extent. So I'd like to keep the cache in the patch. +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-slsr" } */ + +typedef int arr_2[50][50]; + +void foo (arr_2 a2, int v1) +{ + int i, j; + + i = v1 + 5; + j = i; + a2 [i-10] [j] = 2; + a2 [i] [j++] = i; + a2 [i+20] [j++] = i; + a2 [i-3] [i-1] += 1; + return; +} + +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */ +/* { dg-final { cleanup-tree-dump "slsr" } } */ Scanning for 5 MEMs looks nonsensical. What transform do you expect? I see other slsr testcases do similarly nonsensical checking, which is bad, too. As slsr optimizes CAND_REF candidates by simply lowering them to MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs is an effective check. Alternatively, I can add a follow-up patch to add some dumping facility in replace_ref () to print out the replacing actions when -fdump-tree-slsr-details is on. I think adding some details to the dump and scanning for them would be better.
That's the only change that is required for this to move forward. I've updated the patch to dump more details when -fdump-tree-slsr-details is on. The tests have also been updated to scan for these new dumps instead of MEMs. I suggest doing it quickly. We're well past stage1 close at this point. The bootstrapping on x86_64 is still running. OK to commit if it succeeds? I still don't like it. It's using the wrong and too expensive tools to do stuff. What kind of bases are we ultimately interested in? Browsing the code it looks like we're having /* Base expression for the chain of candidates: often, but not always, an SSA name. */ tree base_expr; which isn't really too informative, but I suppose they are all kind-of-gimple_val()s? That said, I wonder if you can simply use get_addr_base_and_unit_offset in place of get_alternative_base (), ignoring the returned offset. 'base_expr' is essentially the base address of a handled_component_p, e.g. ARRAY_REF, COMPONENT_REF, etc. In most cases, it is the address of the object returned by
Re: [RFC][LIBGCC][2 of 2] 64 bit divide implementation for processor without hw divide instruction
Committed on Kugan's behalf as rev 205666. Christophe. On 3 December 2013 20:47, Jeff Law l...@redhat.com wrote: On 12/02/13 23:39, Kugan wrote: +2013-11-27 Kugan Vivekanandarajah kug...@linaro.org + + * config/arm/bpabi-lib.h (TARGET_HAS_NO_HW_DIVIDE): Define for + architectures that do not have a hardware divide instruction, + i.e. architectures that do not define __ARM_ARCH_EXT_IDIV__. + Is this OK for trunk now? Yes, this part is fine too. AFAICT, it just implements what Richard E. suggested ;-) jeff
Re: libsanitizer merge from upstream r196090
Of course various parties care about that. But that doesn't necessarily imply they can or want to run buildbots for a different compiler they don't care about; there are e.g. security implications to that, resources, etc. This brings us back to the initial issue: the way asan & co. are developed. The fact of life is that the development happens in the LLVM tree. The only strategy for keeping GCC's version in sync with upstream that works for us is to have a verbatim copy of the sources in GCC and to have the merge process purely mechanical. It has not been purely mechanical with the last merge. Any other strategy would mean that someone else does the merges. This is totally fine as long as the sources do not diverge over time; but I'm afraid they will diverge. Testing asan & co. upstream does not necessarily require testing the rest of the LLVM compiler. It would be really great to have someone test the entire LLVM tree on e.g. old Fedora, but we can limit such testing just to the compiler-rt project. Ideally for me, GCC would use sanitizer sources from compiler-rt completely verbatim, retaining the entire directory structure with all the tests and not adding any extra files inside that tree. Maybe even use svn externals for greater simplicity. The GCC build files (Makefile.am, etc.) and GCC-specific tests would live outside of that source tree. This way it would be easy to set up a build bot that takes fresh compiler-rt, puts it into the GCC tree, and runs the GCC testing. The merges then will become purely mechanical: check the bots, put the new sources into gcc (or even update the svn externals revision!). For GCC new ports etc. we never require people to run some buildbots; people just report issues they encounter and those issues are fixed. tsan makes many assumptions about the system (address space, etc.) which may not hold on old systems anyway. And tsan, even if it builds, will not work reliably.
I really think that we need to disable libsanitizer on old systems until someone volunteers to set up a proper testing process upstream. If no one volunteers -- no one really needs it. The problem is that the definition of old systems is extremely fuzzy; for you it is clearly != Ubuntu 12.04 or similar, but the unwritten assumptions libsanitizer makes are not related to one exact version of a single dependent component, it is a matter of compiler, compiler configuration, binutils, glibc, kernel, kernel headers, ... So, how do you disable libsanitizer on old systems when it is so fuzzy? I would start from kernel version and glibc version, this should cover the majority of use cases. --kcc There may be assumptions you just don't know about, and fixing bug reports by just adding the reporter's toolchain combinations to kill lists is certainly not the way GCC is developed. Jakub
Re: [wide-int] Add fast path for hosts with HWI widening multiplication
Richard Sandiford rdsandif...@googlemail.com writes: This patch handles multiplications using a single HWIxHWI->2HWI multiplication on hosts that have one. This removes all uses of the slow (half-HWI) path for insn-recog.ii. The slow path is still used 58 times for cp/parser.ii and 168 times for fold-const.ii, but at that kind of level it shouldn't matter much. I followed Joseph's suggestion and reused longlong.h. I copied it from libgcc rather than glibc since it seemed better for GCC to have a single version across both gcc/ and libgcc/. I can put it in include/ if that seems better. I've committed the patch to move longlong.h to trunk and merged back to the branch, so all that's left is the wide-int.cc patch. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2013-12-03 23:59:08.133658567 +0000 +++ gcc/wide-int.cc 2013-12-04 12:55:28.466895358 +0000 @@ -27,6 +27,16 @@ along with GCC; see the file COPYING3. #include "tree.h" #include "dumpfile.h" +#if GCC_VERSION >= 3000 +#define W_TYPE_SIZE HOST_BITS_PER_WIDE_INT +typedef unsigned HOST_HALF_WIDE_INT UHWtype; +typedef unsigned HOST_WIDE_INT UWtype; +typedef unsigned int UQItype __attribute__ ((mode (QI))); +typedef unsigned int USItype __attribute__ ((mode (SI))); +typedef unsigned int UDItype __attribute__ ((mode (DI))); +#include "longlong.h" +#endif + /* This is the maximal size of the buffer needed for dump. */ const unsigned int MAX_SIZE = (4 * (MAX_BITSIZE_MODE_ANY_INT / 4 + (MAX_BITSIZE_MODE_ANY_INT @@ -1255,8 +1265,8 @@ wi_pack (unsigned HOST_WIDE_INT *result, record in *OVERFLOW whether the result overflowed. SGN controls the signedness and is used to check overflow or if HIGH is set.
*/ unsigned int -wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1, - unsigned int op1len, const HOST_WIDE_INT *op2, +wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1val, + unsigned int op1len, const HOST_WIDE_INT *op2val, unsigned int op2len, unsigned int prec, signop sgn, bool *overflow, bool high) { @@ -1285,24 +1295,53 @@ wi::mul_internal (HOST_WIDE_INT *val, co if (needs_overflow) *overflow = false; + wide_int_ref op1 = wi::storage_ref (op1val, op1len, prec); + wide_int_ref op2 = wi::storage_ref (op2val, op2len, prec); + /* This is a surprisingly common case, so do it first. */ - if ((op1len == 1 && op1[0] == 0) || (op2len == 1 && op2[0] == 0)) + if (op1 == 0 || op2 == 0) { val[0] = 0; return 1; } +#ifdef umul_ppmm + if (sgn == UNSIGNED) +{ + /* If the inputs are single HWIs and the output has room for at +least two HWIs, we can use umul_ppmm directly. */ + if (prec <= HOST_BITS_PER_WIDE_INT * 2 + && wi::fits_uhwi_p (op1) + && wi::fits_uhwi_p (op2)) + { + umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ()); + return 1 + (val[1] != 0 || val[0] < 0); + } + /* Likewise if the output is a full single HWI, except that the +upper HWI of the result is only used for determining overflow. +(We handle this case inline when overflow isn't needed.) */ + else if (prec == HOST_BITS_PER_WIDE_INT) + { + unsigned HOST_WIDE_INT upper; + umul_ppmm (upper, val[0], op1.ulow (), op2.ulow ()); + if (needs_overflow) + *overflow = (upper != 0); + return 1; + } +} +#endif + /* Handle multiplications by 1.
*/ - if (op1len == 1 && op1[0] == 1) + if (op1 == 1) { for (i = 0; i < op2len; i++) - val[i] = op2[i]; + val[i] = op2val[i]; return op2len; } - if (op2len == 1 && op2[0] == 1) + if (op2 == 1) { for (i = 0; i < op1len; i++) - val[i] = op1[i]; + val[i] = op1val[i]; return op1len; } @@ -1316,13 +1355,13 @@ wi::mul_internal (HOST_WIDE_INT *val, co if (sgn == SIGNED) { - o0 = sext_hwi (op1[0], prec); - o1 = sext_hwi (op2[0], prec); + o0 = op1.to_shwi (); + o1 = op2.to_shwi (); } else { - o0 = zext_hwi (op1[0], prec); - o1 = zext_hwi (op2[0], prec); + o0 = op1.to_uhwi (); + o1 = op2.to_uhwi (); } r = o0 * o1; @@ -1344,9 +1383,9 @@ wi::mul_internal (HOST_WIDE_INT *val, co } /* We do unsigned mul and then correct it. */ - wi_unpack (u, (const unsigned HOST_WIDE_INT*)op1, op1len, + wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len, half_blocks_needed, prec, SIGNED); - wi_unpack (v, (const unsigned HOST_WIDE_INT*)op2, op2len, + wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len, half_blocks_needed, prec, SIGNED); /* The 2 is for a full mult. */ @@ -1371,7 +1410,7 @@ wi::mul_internal
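The umul_ppmm fast path above computes a full HWIxHWI->2HWI product in one step; on hosts without it, the product has to be assembled from half-width partial products. A minimal C sketch of that kind of slow path (the function name and structure are my illustration, not wide-int.cc's actual code):

```c
#include <stdint.h>
#include <assert.h>

/* Full 64x64 -> 128-bit unsigned multiply built from four 32-bit
   half-products -- the semantics umul_ppmm provides in one instruction.
   (hi,lo) receives the 128-bit product of a and b.  */
void mul_full (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
  uint64_t a0 = (uint32_t) a, a1 = a >> 32;
  uint64_t b0 = (uint32_t) b, b1 = b >> 32;

  uint64_t p00 = a0 * b0;   /* low  x low  */
  uint64_t p01 = a0 * b1;   /* low  x high */
  uint64_t p10 = a1 * b0;   /* high x low  */
  uint64_t p11 = a1 * b1;   /* high x high */

  /* Sum the two middle terms; a wraparound here contributes 2^96,
     i.e. 2^32 into the high word.  */
  uint64_t mid = p01 + p10;
  uint64_t mid_carry = (mid < p01) ? ((uint64_t) 1 << 32) : 0;

  *lo = p00 + (mid << 32);
  *hi = p11 + (mid >> 32) + mid_carry + (*lo < p00);
}
```

The overflow checks mirror why wide-int's slow path unpacks operands into half-HWI blocks: each partial product must fit in a full HWI without losing carries.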
Re: libsanitizer merge from upstream r196090
On Wed, Dec 04, 2013 at 04:49:22PM +0400, Konstantin Serebryany wrote: I would start from kernel version and glibc version, this should cover the majority of use cases. Well, for the kernel headers what we perhaps can do is just add a libsanitizer/include/linux/ tree that will be maintained by GCC and will contain (where needed) wrappers for kernel headers or their replacements to make sure things compile, if you don't care about it in the compiler-rt tree. But for the ppc32 stuff, we can't avoid modifying sanitizer_common (the first patch I've posted recently; btw, I wonder if it works on sparc*, we'll need to wait for somebody to test it), or e.g. for the .cfi_* stuff (if you just apply the .cfi_* related part of the patch I've posted with, say, the macros __GCC_HAVE_* replaced by SANITIZER_USE_CFI_DIRECTIVES or similar), I guess we can provide the right definition for that outside of the compiler-rt maintained files. Regarding the TLS size, can you e.g. just only do it for glibc 2.13 and later, rather than having (even for glibc 2.11/2.12 incorrect) values for older glibcs? Jakub
Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
On Wed, 2013-12-04 at 11:26 +0100, Richard Biener wrote: On Tue, Dec 3, 2013 at 11:04 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: On Tue, 2013-12-03 at 21:35 +0100, Richard Biener wrote: Yufeng Zhang yufeng.zh...@arm.com wrote: On 12/03/13 14:20, Richard Biener wrote: On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang yufeng.zh...@arm.com wrote: On 12/03/13 06:48, Jeff Law wrote: On 12/02/13 08:47, Yufeng Zhang wrote: Ping~ http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html Thanks, Yufeng On 11/26/13 15:02, Yufeng Zhang wrote: On 11/26/13 12:45, Richard Biener wrote: On Thu, Nov 14, 2013 at 12:25 AM, Yufeng Zhang yufeng.zh...@arm.com wrote: On 11/13/13 20:54, Bill Schmidt wrote: The second version of your original patch is ok with me with the following changes. Sorry for the little side adventure into the next-interp logic; in the end that's going to hurt more than it helps in this case. Thanks for having a look at it, anyway. Thanks also for cleaning up this version to be less intrusive to common interfaces; I appreciate it. Thanks a lot for the review. I've attached an updated patch with the suggested changes incorporated. For the next-interp adventure, I was quite happy to do the experiment; it's a good chance of gaining insight into the pass. Many thanks for your prompt replies and patience in guiding! Everything else looks OK to me. Please ask Richard for final approval, as I'm not a maintainer. First a note, I need to check on voting for Bill as the slsr maintainer from the steering committee. Voting was in progress just before the close of stage1 development so I haven't tallied the results :-) Looking forward to some good news! :) Yes, you are right that non-trivial 'base' trees are rarely shared. The cache is introduced mainly because get_alternative_base () may be called twice on the same 'base' tree, once in find_basis_for_candidate () for look-up and the other time in alloc_cand_and_find_basis () for record_potential_basis ().
I'm happy to leave out the cache if you think the benefit is trivial. Without some sense of how expensive the lookups are vs how often the cache hits it's awfully hard to know if the cache is worth it. I'd say take it out unless you have some sense it's really saving time. It's a pretty minor implementation detail either way. I think the affine tree routines are generally expensive; it is worth having a cache to avoid calling them too many times. I ran the slsr-*.c tests under gcc.dg/tree-ssa/ and found that the cache hit rates range from 55.6% to 90%, with 73.5% as the average. The samples may not well represent the real world scenario, but they do show that the 'base' tree can be shared to some extent. So I'd like to have the cache in the patch. +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-slsr" } */ + +typedef int arr_2[50][50]; + +void foo (arr_2 a2, int v1) +{ + int i, j; + + i = v1 + 5; + j = i; + a2 [i-10] [j] = 2; + a2 [i] [j++] = i; + a2 [i+20] [j++] = i; + a2 [i-3] [i-1] += 1; + return; +} + +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */ +/* { dg-final { cleanup-tree-dump "slsr" } } */ scanning for 5 MEMs looks non-sensical. What transform do you expect? I see other slsr testcases do similar non-sensical checking which is bad, too. As the slsr optimizes CAND_REF candidates by simply lowering them to MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs is an effective check. Alternatively, I can add a follow-up patch to add some dumping facility in replace_ref () to print out the replacing actions when -fdump-tree-slsr-details is on. I think adding some details to the dump and scanning for them would be better. That's the only change that is required for this to move forward. I've updated the patch to dump more details when -fdump-tree-slsr-details is on. The tests have also been updated to scan for these new dumps instead of MEMs. I suggest doing it quickly. We're well past stage1 close at this point.
The bootstrapping on x86_64 is still running. OK to commit if it succeeds? I still don't like it. It's using the wrong and too expensive tools to do stuff. What kind of bases are we ultimately interested in? Browsing the code it looks like we're having /* Base expression for the chain of candidates: often, but not always, an SSA name. */ tree base_expr; which isn't really too informative but I suppose they are all kind-of-gimple_val()s? That said, I wonder if you can
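For readers unfamiliar with the transform under test: SLSR's CAND_REF lowering rewrites the array references in the testcase as MEM_REFs off a shared base address plus constant offsets. A hand-written C sketch of the equivalent rewrite (the lowered form and its offsets are my illustration, not the pass's actual output):

```c
#include <string.h>
#include <assert.h>

typedef int arr_2[50][50];

/* The testcase as written: four ARRAY_REFs that share the base &a2[i][i].  */
void foo (arr_2 a2, int v1)
{
  int i = v1 + 5, j = i;
  a2[i - 10][j] = 2;
  a2[i][j++] = i;
  a2[i + 20][j++] = i;
  a2[i - 3][i - 1] += 1;
}

/* The same stores after hand-lowering: one base pointer plus constant
   element offsets (one row is 50 ints).  */
void foo_lowered (arr_2 a2, int v1)
{
  int i = v1 + 5;
  int *base = &a2[i][i];        /* the shared base expression */
  base[-10 * 50] = 2;           /* a2[i-10][j],  j == i     */
  base[0] = i;                  /* a2[i][j],     j == i     */
  base[20 * 50 + 1] = i;        /* a2[i+20][j],  j == i + 1 */
  base[-3 * 50 - 1] += 1;       /* a2[i-3][i-1]             */
}
```

This is why the alternative-base lookup matters: all four candidates become basis/dependent chains off one base expression instead of four unrelated address computations.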
Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
On Wed, 2013-12-04 at 11:30 +0100, Richard Biener wrote: On Wed, Dec 4, 2013 at 11:26 AM, Richard Biener richard.guent...@gmail.com wrote: [...]
Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
On Wed, 2013-12-04 at 11:32 +, Yufeng Zhang wrote: On 12/04/13 10:30, Richard Biener wrote: On Wed, Dec 4, 2013 at 11:26 AM, Richard Biener richard.guent...@gmail.com wrote: [...]
Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs
On Wed, 2013-12-04 at 07:13 -0600, Bill Schmidt wrote: On Wed, 2013-12-04 at 11:30 +0100, Richard Biener wrote: [...]
Re: libsanitizer merge from upstream r196090
On Wed, Dec 4, 2013 at 5:02 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Dec 04, 2013 at 04:49:22PM +0400, Konstantin Serebryany wrote: I would start from kernel version and glibc version, this should cover the majority of use cases. Well, for the kernel headers what we perhaps can do is just add libsanitizer/include/linux/ tree that will be maintained by GCC and will if that works for you, no objections. contain (where needed) wrappers for kernel headers or their replacements to make sure things compile, if you don't care about it in the compiler-rt tree. But for the ppc32 stuff, we can't avoid modifying sanitizer_common (the first patch I've posted recently; btw, I wonder if it works on sparc*, we'll need to wait for somebody to test it), or e.g. for the .cfi_* stuff (if you just apply the .cfi_* related part of the patch I've posted with, say, the macros __GCC_HAVE_* replaced by SANITIZER_USE_CFI_DIRECTIVES or similar), I guess we can provide the right definition for that outside of the compiler-rt maintained files. .cfi is used only in tsan sources now, and tsan is not supported anywhere but x86_64. ppc32 never worked (last time I tried there were several different issues so we disabled the 32-bit build) -- we should just disable it in GCC too. There is no value in building code that does not run. Regarding the TLS size, can you e.g. just only do it for glibc 2.13 and later, rather than having (even for glibc 2.11/2.12 incorrect) values for older glibcs? That would work for me, although it may bring some surprises later. If we incorrectly compute the tls boundaries, lsan may produce false positives or false negatives. Having kThreadDescriptorSize=0 means that we include the stack descriptor in the lsan's root set and thus may miss a leak (with rather low probability). I can live with this. Like this (tested only on my box)?
Index: sanitizer_linux_libcdep.cc === --- sanitizer_linux_libcdep.cc (revision 196375) +++ sanitizer_linux_libcdep.cc (working copy) @@ -207,12 +207,12 @@ #if defined(__x86_64__) || defined(__i386__) // sizeof(struct thread) from glibc. -// There has been a report of this being different on glibc 2.11 and 2.13. We -// don't know when this change happened, so 2.14 is a conservative estimate. -#if __GLIBC_PREREQ(2, 14) +// This may change between glibc versions, we only support the versions we know +// about (>= 2.13). For others we set kThreadDescriptorSize to 0. +#if __GLIBC_PREREQ(2, 13) const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1216, 2304); #else -const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1168, 2304); +const uptr kThreadDescriptorSize = 0; // Unknown. #endif uptr ThreadDescriptorSize() { @@ -255,7 +255,7 @@ *stk_addr = stack_bottom; *stk_size = stack_top - stack_bottom; - if (!main) { + if (!main && kThreadDescriptorSize) { // If stack and tls intersect, make them non-intersecting. if (*tls_addr > *stk_addr && *tls_addr < *stk_addr + *stk_size) { CHECK_GT(*tls_addr + *tls_size, *stk_addr); Index: tests/sanitizer_linux_test.cc === --- tests/sanitizer_linux_test.cc (revision 196375) +++ tests/sanitizer_linux_test.cc (working copy) @@ -224,6 +224,7 @@ TEST(SanitizerLinux, ThreadDescriptorSize) { pthread_t tid; + if (!ThreadDescriptorSize()) return; void *result; ASSERT_EQ(0, pthread_create(&tid, 0, thread_descriptor_size_test_func, 0)); ASSERT_EQ(0, pthread_join(tid, &result)); <grumbling> If I had a buildbot with old Fedora, I would simply submit the change and see if it broke/fixed it. </grumbling> --kcc Jakub
Re: libsanitizer merge from upstream r196090
On Wed, Dec 04, 2013 at 05:28:40PM +0400, Konstantin Serebryany wrote: Well, for the kernel headers what we perhaps can do is just add libsanitizer/include/linux/ tree that will be maintained by GCC and will if that works for you, no objections. I haven't tried to do that yet, so don't know how much work it will be, but at least from the second patch posted recently it might work fine, at least for now. .cfi is used only in tsan sources now, and tsan is not supported anywhere but x86_64 But the .cfi_* issue is platform independent. Whether the compiler decides to emit them or not depends on how it was configured, on the assembler and on compiler flags. I don't see how it can be a maintenance problem to just guard the few (right now two) .cfi_* occurrences in the C++ sources, or using CFI_* macros instead of .cfi_* directives directly in the assembly file. Other projects (e.g. glibc) manage to do that for years without any trouble. ppc32 never worked (last time I tried there were several different issues so we disabled the 32-bit build) -- we should just disable it in GCC too. There is no value in building code that does not run. That doesn't mean it can't be made to work, and the patch I've posted is at least an (IMHO correct) step towards that. Note, I had just much bigger problems on ppc64 with the addr2line symbolization because of the ppc64 opd/plt stuff, though supposedly that might go away once I patch libsanitizer to use libbacktrace for symbolization. There is no inherent reason why ppc32 wouldn't work and ppc64 would, after all ppc64 with its weirdo function descriptor stuff is much harder to support. Regarding the TLS size, can you e.g. just only do it for glibc 2.13 and later, rather than having (even for glibc 2.11/2.12 incorrect) values for older glibcs? That would work for me, although it may bring some surprises later. If we incorrectly compute the tls boundaries, lsan may produce false positives or false negatives.
But is that solely for lsan and nothing else? Because the assertion was failing in asan tests, without any asan options to request leak checking. And for non-i?86/x86_64 you ignore the tls boundaries too. Having kThreadDescriptorSize=0 means that we include the stack descriptor in the lsan's root set and thus may miss a leak (with rather low probability). I can live with this. Like this (tested only on my box)? --- sanitizer_linux_libcdep.cc (revision 196375) +++ sanitizer_linux_libcdep.cc (working copy) @@ -207,12 +207,12 @@ #if defined(__x86_64__) || defined(__i386__) // sizeof(struct thread) from glibc. -// There has been a report of this being different on glibc 2.11 and 2.13. We -// don't know when this change happened, so 2.14 is a conservative estimate. -#if __GLIBC_PREREQ(2, 14) +// This may change between glibc versions, we only support the versions we know +// about (>= 2.13). For others we set kThreadDescriptorSize to 0. +#if __GLIBC_PREREQ(2, 13) const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1216, 2304); #else -const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1168, 2304); +const uptr kThreadDescriptorSize = 0; // Unknown. Depends (as I've asked earlier) on whether you need the exact precise value or if, say, a conservatively smaller value is fine. Then you could say for glibc >= 2.5 pick the minimum of the values I've gathered. Jakub
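The scheme being settled on — report a measured descriptor size only for glibc versions that were actually verified, and 0 ("unknown") otherwise so callers skip the stack/TLS fixup — can be sketched in plain C. The function and its parameters below are illustrative, not sanitizer_common's actual API:

```c
#include <assert.h>

/* Illustrative stand-in for the kThreadDescriptorSize selection: return
   the measured sizeof(struct pthread) only for glibc >= 2.13, where the
   values (1216 for 32-bit, 2304 for 64-bit) were verified; return 0
   ("unknown") otherwise, so the caller skips the stack/TLS-intersection
   adjustment instead of using a possibly wrong size.  */
unsigned long thread_descriptor_size (int major, int minor, int is_64bit)
{
  if (major > 2 || (major == 2 && minor >= 13))
    return is_64bit ? 2304 : 1216;   /* FIRST_32_SECOND_64(1216, 2304) */
  return 0;                          /* unknown glibc: don't adjust */
}
```

The design trade-off discussed above is visible here: returning 0 for old glibcs avoids using a wrong constant, at the cost of occasionally including the descriptor in lsan's root set.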
[PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)
And this is the i?86 specific part of -fsanitize=signed-integer-overflow, split out of the huge patch. It really is dependent on the generic parts; when committing, I'll put both parts together. Uros, would you mind taking a look at this? Regtested/bootstrapped on x86_64-linux. Ok for trunk? 2013-12-04 Jakub Jelinek ja...@redhat.com Marek Polacek pola...@redhat.com * config/i386/i386.md (addv<mode>4, subv<mode>4, mulv<mode>4, negv<mode>3, negv<mode>3_1): Define expands. (*addv<mode>4, *subv<mode>4, *mulv<mode>4, *negv<mode>3): Define insns. --- gcc/config/i386/i386.md.mp 2013-12-04 12:15:33.508905947 +0100 +++ gcc/config/i386/i386.md 2013-12-04 12:15:39.608929341 +0100 @@ -6153,6 +6153,42 @@ [(set_attr "type" "alu") (set_attr "mode" "QI")]) +(define_mode_attr widerintmode [(QI "HI") (HI "SI") (SI "DI") (DI "TI")]) + +;; Add with jump on overflow. +(define_expand "addv<mode>4" + [(parallel [(set (reg:CCO FLAGS_REG) + (eq:CCO (plus:<widerintmode> + (sign_extend:<widerintmode> + (match_operand:SWI 1 "register_operand")) + (sign_extend:<widerintmode> + (match_operand:SWI 2 "general_operand"))) + (sign_extend:<widerintmode> + (plus:SWI (match_dup 1) (match_dup 2))))) + (set (match_operand:SWI 0 "register_operand") + (plus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (eq (reg:CCO FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] + "") + +(define_insn "*addv<mode>4" + [(set (reg:CCO FLAGS_REG) + (eq:CCO (plus:<widerintmode> + (sign_extend:<widerintmode> + (match_operand:SWI 1 "nonimmediate_operand" "%0,0")) + (sign_extend:<widerintmode> + (match_operand:SWI 2 "general_operand" "g,ri"))) + (sign_extend:<widerintmode> + (plus:SWI (match_dup 1) (match_dup 2))))) + (set (match_operand:SWI 0 "nonimmediate_operand" "=r,rm") + (plus:SWI (match_dup 1) (match_dup 2)))] + "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)" + "add{<imodesuffix>}\t{%2, %0|%0, %2}" + [(set_attr "type" "alu") + (set_attr "mode" "<MODE>")]) + ;; The lea patterns for modes less than 32 bits need to be matched by ;; several insns converted to real lea by splitters.
@@ -6390,6 +6426,40 @@ [(set_attr "type" "alu") (set_attr "mode" "SI")]) +;; Subtract with jump on overflow. +(define_expand "subv<mode>4" + [(parallel [(set (reg:CCO FLAGS_REG) + (eq:CCO (minus:<widerintmode> + (sign_extend:<widerintmode> + (match_operand:SWI 1 "register_operand")) + (sign_extend:<widerintmode> + (match_operand:SWI 2 "general_operand"))) + (sign_extend:<widerintmode> + (minus:SWI (match_dup 1) (match_dup 2))))) + (set (match_operand:SWI 0 "register_operand") + (minus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (eq (reg:CCO FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] + "") + +(define_insn "*subv<mode>4" + [(set (reg:CCO FLAGS_REG) + (eq:CCO (minus:<widerintmode> + (sign_extend:<widerintmode> + (match_operand:SWI 1 "nonimmediate_operand" "0,0")) + (sign_extend:<widerintmode> + (match_operand:SWI 2 "general_operand" "ri,rm"))) + (sign_extend:<widerintmode> + (minus:SWI (match_dup 1) (match_dup 2))))) + (set (match_operand:SWI 0 "nonimmediate_operand" "=rm,r") + (minus:SWI (match_dup 1) (match_dup 2)))] + "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)" + "sub{<imodesuffix>}\t{%2, %0|%0, %2}" + [(set_attr "type" "alu") + (set_attr "mode" "<MODE>")]) + (define_insn "*sub<mode>_3" [(set (reg FLAGS_REG) (compare (match_operand:SWI 1 "nonimmediate_operand" "0,0") @@ -6704,6 +6774,59 @@ (set_attr "bdver1_decode" "direct") (set_attr "mode" "QI")]) +;; Multiply with jump on overflow. +(define_expand "mulv<mode>4" + [(parallel [(set (reg:CCO FLAGS_REG) + (eq:CCO (mult:<widerintmode> + (sign_extend:<widerintmode> + (match_operand:SWI48 1 "register_operand")) + (sign_extend:<widerintmode> + (match_operand:SWI48 2 "general_operand"))) + (sign_extend:<widerintmode> + (mult:SWI48 (match_dup 1) (match_dup 2))))) + (set (match_operand:SWI48 0 "register_operand") + (mult:SWI48 (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (eq (reg:CCO FLAGS_REG) (const_int 0)) + (label_ref
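The patterns above compute the result twice — once in the wider mode after sign extension, once truncated in the operand mode — and set the CCO flag when the two disagree. The same condition written out in C for 32-bit operands (my illustration of the semantics, not GCC code):

```c
#include <stdint.h>
#include <assert.h>

/* Signed-add overflow test mirroring the addv<mode>4 condition: the
   exact sum (a 64-bit sum of sign-extended 32-bit inputs) overflows
   iff it differs from the sign-extended truncated sum.  Returns nonzero
   on overflow; *res receives the wrapped result either way.  */
int add_overflows (int32_t a, int32_t b, int32_t *res)
{
  int64_t wide = (int64_t) a + (int64_t) b;  /* plus:<widerintmode>     */
  *res = (int32_t) wide;                     /* plus:SWI, truncated     */
  return wide != (int64_t) *res;             /* the eq:CCO comparison   */
}
```

The subv and mulv patterns encode the identical idea with minus/mult in place of plus.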
[PATCH 1/2] Implement -fsanitize=signed-integer-overflow (generic parts)
This is a repost of the rebased version of the signed-integer-overflow patch, split into generic parts and i?86 parts. By i?86 parts I mean the stuff that resides in config/i386, I haven't really tried to untangle it more. Besides the two formatting fixes, I also moved various PROB_ macros into predict.h and made the users include it, rather than duplicating the defines everywhere. Regtested/bootstrapped on x86_64-linux. Ok for trunk? There are still things to do, but I'd like to get this in first. 2013-12-04 Jakub Jelinek ja...@redhat.com Marek Polacek pola...@redhat.com * opts.c (common_handle_option): Handle -fsanitize=signed-integer-overflow. * sanitizer.def (BUILT_IN_UBSAN_HANDLE_ADD_OVERFLOW, BUILT_IN_UBSAN_HANDLE_SUB_OVERFLOW, BUILT_IN_UBSAN_HANDLE_MUL_OVERFLOW, BUILT_IN_UBSAN_HANDLE_NEGATE_OVERFLOW): Define. * ubsan.h (PROB_VERY_UNLIKELY, PROB_EVEN, PROB_VERY_LIKELY, PROB_ALWAYS): Define. (ubsan_build_overflow_builtin): Declare. * gimple-fold.c (gimple_fold_stmt_to_constant_1): Add folding of internal functions. * ubsan.c (PROB_VERY_UNLIKELY): Don't define here. (ubsan_build_overflow_builtin): New function. (instrument_si_overflow): Likewise. (ubsan_pass): Add signed integer overflow checking. (gate_ubsan): Enable the pass also when SANITIZE_SI_OVERFLOW. * flag-types.h (enum sanitize_code): Add SANITIZE_SI_OVERFLOW. * internal-fn.c: Include ubsan.h and target.h. (ubsan_expand_si_overflow_addsub_check): New function. (ubsan_expand_si_overflow_neg_check): Likewise. (ubsan_expand_si_overflow_mul_check): Likewise. (expand_UBSAN_CHECK_ADD): Likewise. (expand_UBSAN_CHECK_SUB): Likewise. (expand_UBSAN_CHECK_MUL): Likewise. * fold-const.c (fold_binary_loc): Don't fold A + (-B) -> A - B and (-A) + B -> B - A when doing the signed integer overflow checking. * internal-fn.def (UBSAN_CHECK_ADD, UBSAN_CHECK_SUB, UBSAN_CHECK_MUL): Define. * tree-vrp.c (extract_range_basic): Handle internal calls. * optabs.def (addv4_optab, subv4_optab, mulv4_optab, negv4_optab): New optabs.
* asan.c: Include predict.h. (PROB_VERY_UNLIKELY, PROB_ALWAYS): Don't define here. * predict.c: Move the PROB_* macros... * predict.h (enum br_predictor): ...here. (PROB_LIKELY, PROB_UNLIKELY): Define. * trans-mem.c: Include predict.h. (PROB_VERY_UNLIKELY, PROB_ALWAYS, PROB_VERY_LIKELY, PROB_LIKELY, PROB_UNLIKELY): Don't define here. c-family/ * c-gimplify.c (c_gimplify_expr): If doing the integer-overflow sanitization, call unsigned_type_for only when !TYPE_OVERFLOW_WRAPS. testsuite/ * c-c++-common/ubsan/overflow-mul-2.c: New test. * c-c++-common/ubsan/overflow-add-1.c: New test. * c-c++-common/ubsan/overflow-add-2.c: New test. * c-c++-common/ubsan/overflow-mul-1.c: New test. * c-c++-common/ubsan/overflow-sub-1.c: New test. * c-c++-common/ubsan/overflow-sub-2.c: New test. * c-c++-common/ubsan/overflow-negate-1.c: New test. --- gcc/opts.c.mp 2013-12-04 12:15:33.517905987 +0100 +++ gcc/opts.c 2013-12-04 12:15:39.640929478 +0100 @@ -1460,6 +1460,8 @@ common_handle_option (struct gcc_options { "vla-bound", SANITIZE_VLA, sizeof "vla-bound" - 1 }, { "return", SANITIZE_RETURN, sizeof "return" - 1 }, { "null", SANITIZE_NULL, sizeof "null" - 1 }, + { "signed-integer-overflow", SANITIZE_SI_OVERFLOW, + sizeof "signed-integer-overflow" - 1 }, { NULL, 0, 0 } }; const char *comma; --- gcc/predict.h.mp 2013-12-04 12:15:33.520905999 +0100 +++ gcc/predict.h 2013-12-04 12:15:39.645929498 +0100 @@ -20,6 +20,16 @@ along with GCC; see the file COPYING3. #ifndef GCC_PREDICT_H #define GCC_PREDICT_H +/* Random guesstimation given names. + PROB_VERY_UNLIKELY should be small enough so basic block predicted + by it gets below HOT_BB_FREQUENCY_FRACTION.
*/ +#define PROB_VERY_UNLIKELY (REG_BR_PROB_BASE / 2000 - 1) +#define PROB_EVEN (REG_BR_PROB_BASE / 2) +#define PROB_VERY_LIKELY (REG_BR_PROB_BASE - PROB_VERY_UNLIKELY) +#define PROB_ALWAYS (REG_BR_PROB_BASE) +#define PROB_UNLIKELY (REG_BR_PROB_BASE / 5 - 1) +#define PROB_LIKELY (PROB_ALWAYS - PROB_VERY_LIKELY) + #define DEF_PREDICTOR(ENUM, NAME, HITRATE, FLAGS) ENUM, enum br_predictor { --- gcc/c-family/c-gimplify.c.mp 2013-12-04 12:15:33.506905939 +0100 +++ gcc/c-family/c-gimplify.c 2013-12-04 12:15:39.598929297 +0100 @@ -199,7 +199,9 @@ c_gimplify_expr (tree *expr_p, gimple_se tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 0)); if (INTEGRAL_TYPE_P (type) && c_promoting_integer_type_p (type)) { - if
Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)
On Wed, Dec 4, 2013 at 2:44 PM, Marek Polacek pola...@redhat.com wrote: And this is the i?86 specific part of -fsanitize=signed-integer-overflow, split out of the huge patch. It really is dependent on the generic parts; when committing, I'll put both parts together. Just a question (I will review the patch later today): shouldn't generic parts also work without new target patterns and use __addv* stuff from libgcc when patterns are not present? Uros.
Re: [wide-int] Add fast path for hosts with HWI widening multiplication
On 12/04/2013 07:56 AM, Richard Sandiford wrote: Richard Sandiford rdsandif...@googlemail.com writes: This patch handles multiplications using a single HWIxHWI->2HWI multiplication on hosts that have one. This removes all uses of the slow (half-HWI) path for insn-recog.ii. The slow path is still used 58 times for cp/parser.ii and 168 times for fold-const.ii, but at that kind of level it shouldn't matter much. I followed Joseph's suggestion and reused longlong.h. I copied it from libgcc rather than glibc since it seemed better for GCC to have a single version across both gcc/ and libgcc/. I can put it in include/ if that seems better. I've committed the patch to move longlong.h to trunk and merged back to the branch, so all that's left is the wide-int.cc patch. OK to install? Thanks, Richard Index: gcc/wide-int.cc === --- gcc/wide-int.cc 2013-12-03 23:59:08.133658567 +0000 +++ gcc/wide-int.cc 2013-12-04 12:55:28.466895358 +0000 @@ -27,6 +27,16 @@ along with GCC; see the file COPYING3. #include "tree.h" #include "dumpfile.h" +#if GCC_VERSION >= 3000 +#define W_TYPE_SIZE HOST_BITS_PER_WIDE_INT +typedef unsigned HOST_HALF_WIDE_INT UHWtype; +typedef unsigned HOST_WIDE_INT UWtype; +typedef unsigned int UQItype __attribute__ ((mode (QI))); +typedef unsigned int USItype __attribute__ ((mode (SI))); +typedef unsigned int UDItype __attribute__ ((mode (DI))); +#include "longlong.h" +#endif + /* This is the maximal size of the buffer needed for dump. */ const unsigned int MAX_SIZE = (4 * (MAX_BITSIZE_MODE_ANY_INT / 4 + (MAX_BITSIZE_MODE_ANY_INT @@ -1255,8 +1265,8 @@ wi_pack (unsigned HOST_WIDE_INT *result, record in *OVERFLOW whether the result overflowed. SGN controls the signedness and is used to check overflow or if HIGH is set.
*/ unsigned int -wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1, - unsigned int op1len, const HOST_WIDE_INT *op2, +wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1val, + unsigned int op1len, const HOST_WIDE_INT *op2val, unsigned int op2len, unsigned int prec, signop sgn, bool *overflow, bool high) { @@ -1285,24 +1295,53 @@ wi::mul_internal (HOST_WIDE_INT *val, co if (needs_overflow) *overflow = false; + wide_int_ref op1 = wi::storage_ref (op1val, op1len, prec); + wide_int_ref op2 = wi::storage_ref (op2val, op2len, prec); + /* This is a surprisingly common case, so do it first. */ - if ((op1len == 1 && op1[0] == 0) || (op2len == 1 && op2[0] == 0)) + if (op1 == 0 || op2 == 0) { val[0] = 0; return 1; } +#ifdef umul_ppmm + if (sgn == UNSIGNED) +{ + /* If the inputs are single HWIs and the output has room for at +least two HWIs, we can use umul_ppmm directly. */ + if (prec >= HOST_BITS_PER_WIDE_INT * 2 + && wi::fits_uhwi_p (op1) + && wi::fits_uhwi_p (op2)) + { + umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ()); + return 1 + (val[1] != 0 || val[0] < 0); + } + /* Likewise if the output is a full single HWI, except that the +upper HWI of the result is only used for determining overflow. +(We handle this case inline when overflow isn't needed.) */ + else if (prec == HOST_BITS_PER_WIDE_INT) + { + unsigned HOST_WIDE_INT upper; + umul_ppmm (upper, val[0], op1.ulow (), op2.ulow ()); + if (needs_overflow) + *overflow = (upper != 0); + return 1; + } +} +#endif + /* Handle multiplications by 1.
*/ - if (op1len == 1 && op1[0] == 1) + if (op1 == 1) { for (i = 0; i < op2len; i++) - val[i] = op2[i]; + val[i] = op2val[i]; return op2len; } - if (op2len == 1 && op2[0] == 1) + if (op2 == 1) { for (i = 0; i < op1len; i++) - val[i] = op1[i]; + val[i] = op1val[i]; return op1len; } @@ -1316,13 +1355,13 @@ wi::mul_internal (HOST_WIDE_INT *val, co if (sgn == SIGNED) { - o0 = sext_hwi (op1[0], prec); - o1 = sext_hwi (op2[0], prec); + o0 = op1.to_shwi (); + o1 = op2.to_shwi (); } else { - o0 = zext_hwi (op1[0], prec); - o1 = zext_hwi (op2[0], prec); + o0 = op1.to_uhwi (); + o1 = op2.to_uhwi (); } r = o0 * o1; @@ -1344,9 +1383,9 @@ wi::mul_internal (HOST_WIDE_INT *val, co } /* We do unsigned mul and then correct it. */ - wi_unpack (u, (const unsigned HOST_WIDE_INT*)op1, op1len, + wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len, half_blocks_needed, prec, SIGNED); - wi_unpack (v, (const unsigned HOST_WIDE_INT*)op2, op2len, + wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len, half_blocks_needed, prec, SIGNED);
Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)
On Wed, Dec 04, 2013 at 02:52:25PM +0100, Uros Bizjak wrote: On Wed, Dec 4, 2013 at 2:44 PM, Marek Polacek pola...@redhat.com wrote: And this is the i?86 specific part of -fsanitize=signed-integer-overflow, split out of the huge patch. It really is dependent on the generic parts; when committing, I'll put both parts together. Just a question (I will review the patch later today): shouldn't generic parts also work without new target patterns and use __addv* stuff from libgcc when patterns are not present? They work (except for multiplication checking with widest supported mode, to be supported later), but they can't use __addv* and co., because those functions __builtin_trap () on overflow, while for -fsanitize=signed-integer-overflow, if we wanted a library solution, we'd need library functions that would return us both the result and a bool saying whether overflow happened. As addition/subtraction/negation overflow checking is short and easily inlinable, that is always done inline now, and for multiplication the code right now expands WIDEN_MULT_EXPR if possible. Note that using get_range_info the generic expansion could supposedly be improved; for add/sub we right now at runtime compare op1 against zero and do one thing if it is negative and another if non-negative. If VRP info tells us that either op0 or op1 is known to be non-negative or known to be negative, we could just simplify the expansion. I guess similarly for the multiplication, but after all, I think the VRP info could be useful even for normal multiplication expansion, e.g. if we want to do a WIDEN_MULT_EXPR, but know that given the operand ranges we can actually do a MULT_EXPR only and then just sign/zero extend the result, that will likely be cheaper. If VRP figures out there will never be an overflow, then we already optimize the UBSAN_* internal builtins into normal PLUS_EXPR etc. Jakub
Re: [PATCH] Fix --with-long-double-128 for sparc32 when defaulting to 64-bit
On Wed, Dec 04, 2013 at 08:53:50AM +0100, Jakub Jelinek wrote: On Wed, Dec 04, 2013 at 08:49:32AM +0100, Aurelien Jarno wrote: On sparc, the --with-long-double-128 option doesn't change anything for a 64-bit compiler, as it always defaults to 128-bit long doubles. For a 32/64-bit compiler defaulting to 32-bit this correctly controls the size of long double of the 32-bit compiler; however for a 32/64-bit compiler defaulting to 64-bit, the built-in specs force the -mlong-double-64 option. This makes the option useless in this case. The patch below fixes that by removing the -mlong-double-64 from the built-in spec, using the default instead. So how do you configure a 64/32-bit compiler defaulting to 64-bit, where 32-bit defaults to -mlong-double-64? Naively I would have said by *not* passing --with-long-double-128 to configure, like for a 64/32-bit compiler defaulting to 32-bit, but it still defaults to 128-bit long doubles with my patch. Actually it's also the case for a 64/32-bit compiler defaulting to 32-bit, which makes the --with-long-double-128 option completely useless on sparc64. Whatever the option, the result would always be the same with the current SVN: 64/32-bit compiler defaulting to 32-bit: - 128-bit long doubles for -m32 - 128-bit long doubles for -m64 64/32-bit compiler defaulting to 64-bit: - 64-bit long doubles for -m32 - 128-bit long doubles for -m64 I have to dig a bit more to see how to fix that, but even the current code is not really consistent. Changelog gcc/ 2013-12-04 Aurelien Jarno aurel...@aurel32.net * config/sparc/linux64.h (CC1_SPEC): When defaulting to 64-bit, don't force -mlong-double-64 when -m32 or -mv8plus is given.
Index: gcc/config/sparc/linux64.h === --- gcc/config/sparc/linux64.h (revision 205647) +++ gcc/config/sparc/linux64.h (working copy) @@ -162,9 +162,9 @@ #else #define CC1_SPEC %{profile:-p} \ %{m32:%{m64:%emay not use both -m32 and -m64}} \ -%{m32:-mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \ +%{m32:-mptr32 -mno-stack-bias \ %{!mcpu*:-mcpu=cypress}} \ -%{mv8plus:-mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \ +%{mv8plus:-mptr32 -mno-stack-bias \ %{!mcpu*:-mcpu=v9}} \ %{!m32:%{!mcpu*:-mcpu=ultrasparc}} \ %{!mno-vis:%{!m32:%{!mcpu=v9:-mvis}}} \ Jakub -- Aurelien Jarno GPG: 1024D/F1BCDB73 aurel...@aurel32.net http://www.aurel32.net
Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)
On Wed, Dec 04, 2013 at 02:52:25PM +0100, Uros Bizjak wrote: On Wed, Dec 4, 2013 at 2:44 PM, Marek Polacek pola...@redhat.com wrote: And this is the i?86 specific part of -fsanitize=signed-integer-overflow, split out of the huge patch. It really is dependent on the generic parts; when committing, I'll put both parts together. Just a question (I will review the patch later today): shouldn't Perfect, thanks! generic parts also work without new target patterns and use __addv* stuff from libgcc when patterns are not present? If we can't use target patterns, we fall back to a generic implementation, using emit_cmp_and_jump_insns/emit_jump etc. This generic implementation is indeed modelled after the libgcc routines. Marek
.cfi in sanitizer code
[new subject. was: libsanitizer merge from upstream r196090] .cfi is used only in tsan sources now, and tsan is not supported anywhere but x86_64. But the .cfi_* issue is platform independent. Whether the compiler decides to emit them or not depends on how it was configured, on the assembler and on compiler flags. I don't see how it can be a maintenance problem to just guard the few (right now two) .cfi_* occurrences in the C++ sources, or to use CFI_* macros instead of .cfi_* directives directly in the assembly file. Other projects (e.g. glibc) manage to do that for years without any trouble. This is a maintenance problem because we cannot test if we broke something during development, e.g. clang doesn't seem to support -fno-dwarf2-cfi-asm. Then, if we get notified about the problem we spend 10x more time fixing it because 1. the context is different 2. the patch you or other GCC folks send applies to the GCC tree while we need to apply it to LLVM (e.g. your patch has tsan/tsan_rtl.h but our tree has tsan/rtl/tsan_rtl.h and even with that fixed it does not apply) 3. we still can't easily verify the fix. I can commit a change similar to your cfi-related changes (guarded by SANITIZER_DONT_USE_CFI_ASM instead of __GCC_HAVE_DWARF2_CFI_ASM), but the problem will arise again --kcc
Re: [PATCH] Add signed integer overflow checking to ubsan
On Tue, Dec 03, 2013 at 02:14:17PM -0700, Jeff Law wrote: Perhaps split this patch into two parts which can be reviewed independently, but go into the tree at the same time. The obvious hope would be that Uros or one of the other x86 backend folks could chime in on that part. I posted the i?86 bits separately. --- gcc/ubsan.h.mp 2013-11-27 08:46:28.046629473 +0100 +++ gcc/ubsan.h 2013-11-27 08:46:57.578753342 +0100 @@ -21,6 +21,12 @@ along with GCC; see the file COPYING3. #ifndef GCC_UBSAN_H #define GCC_UBSAN_H +/* From predict.c. */ +#define PROB_VERY_UNLIKELY (REG_BR_PROB_BASE / 2000 - 1) +#define PROB_EVEN (REG_BR_PROB_BASE / 2) +#define PROB_VERY_LIKELY (REG_BR_PROB_BASE - PROB_VERY_UNLIKELY) +#define PROB_ALWAYS (REG_BR_PROB_BASE) Seems like this should factor out rather than get duplicated. I moved them all into predict.h; the users now include predict.h. --- gcc/gimple-fold.c.mp 2013-11-27 08:46:27.979629191 +0100 +++ gcc/gimple-fold.c 2013-11-27 08:46:57.556753251 +0100 @@ -2660,8 +2660,30 @@ gimple_fold_stmt_to_constant_1 (gimple s tree fn; if (gimple_call_internal_p (stmt)) - /* No folding yet for these functions. */ - return NULL_TREE; + { +enum tree_code subcode = ERROR_MARK; +switch (gimple_call_internal_fn (stmt)) + { + case IFN_UBSAN_CHECK_ADD: subcode = PLUS_EXPR; break; + case IFN_UBSAN_CHECK_SUB: subcode = MINUS_EXPR; break; + case IFN_UBSAN_CHECK_MUL: subcode = MULT_EXPR; break; Minor detail, put the case value and associated codes on separate lines. case FU: code; more code break; case BAR blah; break; Done.
--- gcc/tree-vrp.c.mp 2013-11-27 08:46:28.043629459 +0100 +++ gcc/tree-vrp.c 2013-11-27 08:46:57.570753307 +0100 @@ -3757,6 +3757,40 @@ extract_range_basic (value_range_t *vr, break; } } + else if (is_gimple_call (stmt) + && gimple_call_internal_p (stmt)) +{ + enum tree_code subcode = ERROR_MARK; + switch (gimple_call_internal_fn (stmt)) +{ +case IFN_UBSAN_CHECK_ADD: subcode = PLUS_EXPR; break; +case IFN_UBSAN_CHECK_SUB: subcode = MINUS_EXPR; break; +case IFN_UBSAN_CHECK_MUL: subcode = MULT_EXPR; break; +default: break; Formatting again. Done. Overall the stuff outside the i386 directory looks pretty good, though it needs some minor updates. I'd suggest extracting the i386 bits and pinging them as a separate patch in the hope that we'll get Uros's attention. Done, I posted the split version of the patch. Thanks for the review. Marek
Re: .cfi in sanitizer code
On Wed, Dec 04, 2013 at 06:09:56PM +0400, Konstantin Serebryany wrote: This is a maintenance problem because we can not test if we broke something during development. e.g. clang doesn't seem to support -fno-dwarf2-cfi-asm It does, at least both clang 3.3 (from Fedora 19) and clang 3.4 r194685 (which I've built myself some time ago just to look at the use-after-return etc. sanitization). I can commit a change similar to your cfi-related changes (guarded by SANITIZER_DONT_USE_CFI_ASM instead of __GCC_HAVE_DWARF2_CFI_ASM), but the problem will arise again Why? Is it so hard to remember that when you add .cfi_* directives they should be guarded by that macro? Even if the patch author forgets about that, patch reviewer should catch that. Jakub
Re: libsanitizer merge from upstream r196090
On Wed, Dec 4, 2013 at 5:44 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Dec 04, 2013 at 05:28:40PM +0400, Konstantin Serebryany wrote: Well, for the kernel headers what we perhaps can do is just add a libsanitizer/include/linux/ tree that will be maintained by GCC and will if that works for you, no objections. I haven't tried to do that yet, so don't know how much work it will be, but at least from the second patch posted recently it might work fine, at least for now. .cfi is used only in tsan sources now, and tsan is not supported anywhere but x86_64. But the .cfi_* issue is platform independent. Whether the compiler decides to emit them or not depends on how it was configured, on the assembler and on compiler flags. I don't see how it can be a maintenance problem to just guard the few (right now two) .cfi_* occurrences in the C++ sources, or to use CFI_* macros instead of .cfi_* directives directly in the assembly file. Other projects (e.g. glibc) manage to do that for years without any trouble. replied separately. ppc32 never worked (last time I tried there were several different issues so we disabled the 32-bit build) -- we should just disable it in GCC too. There is no value in building code that does not run. That doesn't mean it can't be made to work, and the patch I've posted is at least an (IMHO correct) step towards that. Sure it can. But all my previous grumbling about maintenance cost and our inability to test changes, etc. applies here. Note, I had just much bigger problems on ppc64 with the addr2line symbolization because of the ppc64 opd/plt stuff, though supposedly that might go away once I patch libsanitizer to use libbacktrace for symbolization. There is no inherent reason why ppc32 wouldn't work and ppc64 would; after all, ppc64 with its weirdo function descriptor stuff is much harder to support. Regarding the TLS size, can you e.g.
just only do it for glibc 2.13 and later, rather than having (even for glibc 2.11/2.12 incorrect) values for older glibcs? That would work for me, although it may bring some surprises later. If we incorrectly compute the tls boundaries, lsan may produce false positives or false negatives. But is that solely for lsan and nothing else? Mmm. I *think* yes, today this is lsan-only. Because, the assertion was failing in asan tests, without any asan options to request leak checking. And for non-i?86/x86_64 you ignore the tls boundaries too. My patch above should remove the assertion on < 2.13. Having kThreadDescriptorSize=0 means that we include the stack descriptor in lsan's root set and thus may miss a leak (with rather low probability). I can live with this. Like this (tested only on my box)? --- sanitizer_linux_libcdep.cc (revision 196375) +++ sanitizer_linux_libcdep.cc (working copy) @@ -207,12 +207,12 @@ #if defined(__x86_64__) || defined(__i386__) // sizeof(struct thread) from glibc. -// There has been a report of this being different on glibc 2.11 and 2.13. We -// don't know when this change happened, so 2.14 is a conservative estimate. -#if __GLIBC_PREREQ(2, 14) +// This may change between glibc versions, we only support the versions we know +// about (>= 2.13). For others we set kThreadDescriptorSize to 0. +#if __GLIBC_PREREQ(2, 13) const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1216, 2304); #else -const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1168, 2304); +const uptr kThreadDescriptorSize = 0; // Unknown. Depends (as I've asked earlier) on whether you need the exact precise value or if, say, a conservatively smaller value is fine. Then you could say for glibc >= 2.5 pick the minimum of the values I've gathered. Precise is better, otherwise we may lose leaks. Jakub
Re: [wwwdocs] Update obvious fix commit policy
On Tue, Dec 3, 2013 at 6:55 PM, Gerald Pfeifer ger...@pfeifer.com wrote: On Thu, 28 Nov 2013, Richard Biener wrote: Why remove ChangeLog files, web pages and comments? I was going to complain about web pages being removed. :-) On Thu, 28 Nov 2013, Diego Novillo wrote: -<p>Fixes for obvious typos in ChangeLog files, docs, web pages, comments -and similar stuff. Just check in the fix and copy it to -<code>gcc-patches</code>. We don't want to get overly anal-retentive -about checkin policies.</p> +<p>Obvious fixes can be committed without prior approval. Just check +in the fix and copy it to <code>gcc-patches</code>. A good test to +determine whether a fix is obvious: <q>will the person who objects to +my work the most be able to find a fault with my fix?</q> If the fix +is later found to be faulty, it can always be rolled back. We don't +want to get overly restrictive about checkin policies.</p> I am in favor of this change. To some extent, this is more a clarification of what I have seen as our current policy than a change in policy, though to a lawyer-minded person it surely looks like the latter. Not sure what kind of approval this needs? Mind it has. I have not received any feedback against this change. I will wait another 48 hours and commit. Diego.
[PATCH][ARM][3/3] Implement crypto intrinsics in AArch32 ARMv8-A - documentation
Hi all, This is the final patch in the series, adding the documentation for the intrinsics. Most of it is autogenerated from neon-docgen.ml and the ones that are not are added explicitly in neon-docgen.ml so that they appear in the generated .texi file. Not much else to say on this patch. Ok for trunk? Thanks, Kyrill 2013-12-04 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/neon-docgen.ml: Add crypto intrinsics documentation. * doc/arm-neon-intrinsics.texi: Regenerate. diff --git a/gcc/config/arm/neon-docgen.ml b/gcc/config/arm/neon-docgen.ml index f17314f..41ae059 100644 --- a/gcc/config/arm/neon-docgen.ml +++ b/gcc/config/arm/neon-docgen.ml @@ -36,8 +36,8 @@ open Neon -(* The combined ops and reinterp table. *) -let ops_reinterp = reinterp @ ops +(* The combined ops and reinterp tables. *) +let ops_reinterp = reinterp @ reinterpq @ ops (* Helper functions for extracting things from the ops table. *) let single_opcode desired_opcode () = @@ -329,6 +329,77 @@ let gnu_header chan = @c This file is generated automatically using gcc/config/arm/neon-docgen.ml; @c Please do not edit manually.] 
+let crypto_doc = + +@itemize @bullet +@item poly128_t vldrq_p128(poly128_t const *) +@end itemize + +@itemize @bullet +@item void vstrq_p128(poly128_t *, poly128_t) +@end itemize + +@itemize @bullet +@item uint32_t vsha1h_u32 (uint32_t) +@*@emph{Form of expected instruction(s):} @code{sha1h.32 @var{q0}, @var{q1}} +@end itemize + +@itemize @bullet +@item uint32x4_t vsha1cq_u32 (uint32x4_t, uint32_t, uint32x4_t) +@*@emph{Form of expected instruction(s):} @code{sha1c.32 @var{q0}, @var{q1}, @var{q2}} +@end itemize + +@itemize @bullet +@item uint32x4_t vsha1pq_u32 (uint32x4_t, uint32_t, uint32x4_t) +@*@emph{Form of expected instruction(s):} @code{sha1p.32 @var{q0}, @var{q1}, @var{q2}} +@end itemize + +@itemize @bullet +@item uint32x4_t vsha1mq_u32 (uint32x4_t, uint32_t, uint32x4_t) +@*@emph{Form of expected instruction(s):} @code{sha1m.32 @var{q0}, @var{q1}, @var{q2}} +@end itemize + +@itemize @bullet +@item uint32x4_t vsha1su0q_u32 (uint32x4_t, uint32x4_t, uint32x4_t) +@*@emph{Form of expected instruction(s):} @code{sha1su0.32 @var{q0}, @var{q1}, @var{q2}} +@end itemize + +@itemize @bullet +@item uint32x4_t vsha1su1q_u32 (uint32x4_t, uint32x4_t) +@*@emph{Form of expected instruction(s):} @code{sha1su1.32 @var{q0}, @var{q1}, @var{q2}} +@end itemize + +@itemize @bullet +@item uint32x4_t vsha256hq_u32 (uint32x4_t, uint32x4_t, uint32x4_t) +@*@emph{Form of expected instruction(s):} @code{sha256h.32 @var{q0}, @var{q1}, @var{q2}} +@end itemize + +@itemize @bullet +@item uint32x4_t vsha256h2q_u32 (uint32x4_t, uint32x4_t, uint32x4_t) +@*@emph{Form of expected instruction(s):} @code{sha256h2.32 @var{q0}, @var{q1}, @var{q2}} +@end itemize + +@itemize @bullet +@item uint32x4_t vsha256su0q_u32 (uint32x4_t, uint32x4_t) +@*@emph{Form of expected instruction(s):} @code{sha256su0.32 @var{q0}, @var{q1}} +@end itemize + +@itemize @bullet +@item uint32x4_t vsha256su1q_u32 (uint32x4_t, uint32x4_t, uint32x4_t) +@*@emph{Form of expected instruction(s):} @code{sha256su1.32 @var{q0}, 
@var{q1}, @var{q2}} +@end itemize + +@itemize @bullet +@item poly128_t vmull_p64 (poly64_t a, poly64_t b) +@*@emph{Form of expected instruction(s):} @code{vmull.p64 @var{q0}, @var{d1}, @var{d2}} +@end itemize + +@itemize @bullet +@item poly128_t vmull_high_p64 (poly64x2_t a, poly64x2_t b) +@*@emph{Form of expected instruction(s):} @code{vmull.p64 @var{q0}, @var{d1}, @var{d2}} +@end itemize + + (* Program entry point. *) let _ = if Array.length Sys.argv <> 2 then @@ -339,6 +410,7 @@ let _ = let chan = open_out file in gnu_header chan; List.iter (document_group chan) intrinsic_groups; +Printf.fprintf chan "%s\n" crypto_doc; close_out chan with Sys_error sys -> failwith ("Could not create output file " ^ file ^ ": " ^ sys)
[PATCH][ARM][0/3] Implement crypto intrinsics in AArch32 ARMv8-A
Hi all, This patch series implements the new arm_neon.h intrinsics that map down to the ARMv8-A cryptographic instructions. The instructions are considered to be part of NEON and they can be enabled by specifying -mfpu=crypto-neon-fp-armv8 (of course we still need the hard or softfp float ABI). Two of the intrinsics: vmull_p64 and vmull_high_p64 use the new poly64_t and poly128_t types and therefore these patches also add support for these types and most of the intrinsics associated with creating, reinterpreting, loading, storing and extracting these types. Most of these auxiliary intrinsics are autogenerated from the neon.ml scripts in the arm backend, but some had to be hardcoded because they don't follow a regular pattern. Note that these types and intrinsics are not available unless you specify the crypto-neon-fp-armv8 FPU. The __ARM_FEATURE_CRYPTO feature test macro is defined and is used throughout arm_neon.h to gate the new types and intrinsics. Patches 2 and 3 add the testsuite and documentation respectively. Most of it is autogenerated. Bootstrapped on arm-none-linux-gnueabihf and tested on a model. Note, this patch series' context depends on the CRC32 intrinsics patch that is in review at: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02351.html Thanks, Kyrill P.S. These patches only touch the arm backend and do not affect any other parts of the compiler.
Re: [patch] combine ICE fix
On 12/03/2013 02:38 PM, Jeff Law wrote: On 12/03/13 12:25, Kenneth Zadeck wrote: On 12/03/2013 01:52 PM, Mike Stump wrote: On Dec 2, 2013, at 10:26 PM, Jeff Law l...@redhat.com wrote: On 11/27/13 17:13, Cesar Philippidis wrote: I looked into adding support for incremental DF scanning from df*.[ch] in combine but there are a couple of problems. First of all, combine does its own DF analysis. It does so because its usage falls under this category (df-core.c): c) If the pass modifies insns several times, this incremental updating may be expensive. Furthermore, combine's DF relies on the DF scanning to be deferred, so the DF_REF_DEF_COUNT values would be off. Eg, calls to SET_INSN_DELETED take place before it updates the notes for those insns. Also, combine has a tendency to undo its changes occasionally. I think at this stage of the release cycle, converting combine to incremental DF is probably a no-go. However, we should keep it in mind for the future -- while hairy I'd really like to see that happen in the long term. I think Kenny has some thoughts in this area. I'll cc him to ensure he sees it. It is the tendency to undo things (I would use the word "frequently" rather than "occasionally") that kept me from doing this years ago. Shove a bunch of things together, simplify, then try to recognize the result. If that fails, undo everything. In theory, this could be replaced by making a copy of the original, doing the combination/simplification, then recognition. If successful, then update DF and remove the original I3, if not successful, drop the copy. That avoids the undo nonsense. jeff that could certainly work.
Re: [PATCH/AARCH64 3/6] Fix up multi-lib options
Looks good to me, but I cannot approve it. Yufeng On 12/03/13 21:24, Andrew Pinski wrote: Hi, The arguments to --with-multilib-list for AARCH64 are exclusive but are currently being treated as ones which are not. This causes problems in that we get four library sets with --with-multilib-list=lp64,ilp32: empty, lp64, ilp32, lp64/ilp32. The first and last ones do not make sense and should not be there. This patch changes the definition of MULTILIB_OPTIONS so we have a / in between the options rather than a space. OK? Built and tested on aarch64-elf with both --with-multilib-list=lp64,ilp32 and without it. Thanks, Andrew Pinski * config/aarch64/t-aarch64 (MULTILIB_OPTIONS): Fix definition so that options are conflicting ones. --- gcc/ChangeLog|2 +- gcc/config/aarch64/t-aarch64 |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64 index 9f8d8cd..98a30d8 100644 --- a/gcc/config/aarch64/t-aarch64 +++ b/gcc/config/aarch64/t-aarch64 @@ -41,5 +41,5 @@ aarch-common.o: $(srcdir)/config/arm/aarch-common.c $(CONFIG_H) $(SYSTEM_H) \ $(srcdir)/config/arm/aarch-common.c comma=, -MULTILIB_OPTIONS= $(patsubst %, mabi=%, $(subst $(comma), ,$(TM_MULTILIB_CONFIG))) +MULTILIB_OPTIONS= $(subst $(comma),/, $(patsubst %, mabi=%, $(subst $(comma),$(comma)mabi=,$(TM_MULTILIB_CONFIG)))) MULTILIB_DIRNAMES = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
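The nested $(subst)/$(patsubst) pipeline is easier to see by reproducing it outside the GCC build (illustrative demo only; file name made up). With TM_MULTILIB_CONFIG set to lp64,ilp32, the new definition yields slash-separated, i.e. mutually exclusive, options:

```shell
# Generate a tiny makefile exercising the same text-function pipeline:
# 1. insert "mabi=" after each comma, 2. prefix the first word,
# 3. turn the commas into slashes (mutually exclusive options).
printf '%s\n' \
  'comma := ,' \
  'TM_MULTILIB_CONFIG := lp64,ilp32' \
  'MULTILIB_OPTIONS := $(subst $(comma),/,$(patsubst %,mabi=%,$(subst $(comma),$(comma)mabi=,$(TM_MULTILIB_CONFIG))))' \
  'all: ; @echo $(MULTILIB_OPTIONS)' \
  > multilib-demo.mk
make -sf multilib-demo.mk   # prints: mabi=lp64/mabi=ilp32
```

With the old space-separated form, GNU Make's multilib machinery would build every combination of the options, including the empty and the lp64/ilp32 sets that the patch description calls out.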
[commited] Fix up testcase
I'm applying the following as obvious, GCC 4.7 doesn't grok -Wpedantic. Sorry for not testing that properly. 2013-12-04 Marek Polacek pola...@redhat.com PR c/59351 testsuite/ * gcc.dg/pr59351.c: Use -pedantic instead of -Wpedantic. --- gcc/testsuite/gcc.dg/pr59351.c.mp3 2013-12-04 16:49:17.232824975 +0100 +++ gcc/testsuite/gcc.dg/pr59351.c 2013-12-04 16:49:30.380873769 +0100 @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-std=c99 -Wpedantic" } */ +/* { dg-options "-std=c99 -pedantic" } */ unsigned int foo (void) Marek
Re: libsanitizer merge from upstream r196090
Well, it regresses against 4.8, so it still is a P1 regression. Does anyone care? Well, you’re one of the maintainers of libsanitizer for GCC, so if you do not care about regressions in your code, it makes little sense for GCC (the whole project) to keep libsanitizer. I’ve posted this regression a month ago, it was not addressed. I’m not sure under what specific arrangement libsanitizer was added to GCC, but in general there is a responsibility of maintainers not to break bootstrap in their code. Yes, it’s a cost, and if you are not willing to do it, why did you contribute in the first place? Or is it a “hit and run” approach to maintainership? FX
Re: [Patch, RTL] Eliminate redundant vec_select moves.
Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: The problem is that one reg rtx can span several hard registers. E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32), but it might instead represent two 32-bit registers (nos. 32 and 33). Obviously the latter's not very likely for vectors this small, but more likely for larger ones (including on NEON IIRC). So if we had 2 32-bit registers being treated as a V4HI, it would be: --3233-- msb lsb msb lsb --32-- for big endian and: --3332-- msb lsb msb lsb --32-- for little endian. Ah, ok, that makes things clearer. Thanks for that. I can't find any helper function that figures out if we're writing partial or full result regs. Would something like REGNO (src) == REGNO (dst) && HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1 be a sane check for partial result regs? Yeah, that should work. I think a more general alternative would be: simplify_subreg_regno (REGNO (src), GET_MODE (src), offset, GET_MODE (dst)) == (int) REGNO (dst) where: offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0, 0)) That offset is the byte offset of the first selected element from the start of a vector in memory, which is also the way that SUBREG_BYTEs are counted. For little-endian it gives the offset of the lsb of the slice, while for big-endian it gives the offset of the msb (which is also how SUBREG_BYTEs work). The simplify_subreg_regno should cope with both single-register vectors and multi-register vectors. Sorry for the delayed response to this. Thanks for the tip. Here's an improved patch that implements the simplify_subreg_regno () method of eliminating redundant moves. Regarding the test case, I failed to get the ppc back-end to generate the RTL pattern that this patch checks for.
I can easily write a test case for aarch64 (big and little endian) on these lines: typedef float float32x4_t __attribute__ ((__vector_size__ (16))); float foo_be (float32x4_t x) { return x[3]; } float foo_le (float32x4_t x) { return x[0]; } where I know that the vector indexing will generate a vec_select on the same src and dst regs that could be optimized away and hence test it. But I'm struggling to get a test case that the ppc altivec back-end will generate such a vec_select for. I see that altivec does not define vec_extract, so a simple indexing like this seems to happen via memory. Also, I don't know enough about the ppc PCS or architecture to write a test that will check for this optimization opportunity on same src and dst hard-registers. Any hints? Me neither, sorry. FWIW, the MIPS tests: typedef float float32x2_t __attribute__ ((__vector_size__ (8))); void bar (float); void foo_be (float32x2_t x) { bar (x[1]); } void foo_le (float32x2_t x) { bar (x[0]); } also exercise it, but I don't think they add anything over the aarch64 versions. I can add them to the testsuite anyway if it helps though. diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 0cd0c7e..ca25ce5 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set) dst = SUBREG_REG (dst); } + /* It is a NOOP if destination overlaps with selected src vector + elements. */ + if (GET_CODE (src) == VEC_SELECT + && REG_P (XEXP (src, 0)) && REG_P (dst) + && HARD_REGISTER_P (XEXP (src, 0)) + && HARD_REGISTER_P (dst)) +{ + rtx par = XEXP (src, 1); + rtx src0 = XEXP (src, 0); + HOST_WIDE_INT offset = + GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0)); + + return simplify_subreg_regno (REGNO (src0), GET_MODE (src0), + offset, GET_MODE (dst)) == (int) REGNO (dst); +} + Since this also (correctly) triggers for vector results, we need to keep the check for consecutive indices that you had originally.
(It's always the first index that should be used for the simplify_subreg_regno though.) Looks good to me otherwise, thanks. Thanks Richard. Here is a revised patch. Sorry about the delay - I was investigating to make sure an LRA ICE I was seeing on aarch64 was unrelated to this patch. I've added a test case that I expect to pass for aarch64. I've also added the tests that you suggested for MIPS, but haven't checked for the target because I'm not sure what optimizations happen on MIPS. OK for trunk? Thanks, Tejas. 2013-12-04 Tejas Belagod tejas.bela...@arm.com gcc/ * rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select for overlapping register lanes. testsuite/ * config/gcc.dg/vect/vect-nop-move.c: New. diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 0cd0c7e..e1388c8 100644 ---
[PATCH, ARM] Implement __builtin_trap
Hi, Currently, on ARM, you have to either call abort() or raise(SIGTRAP) to achieve a handy crash. This patch allows you to instead call __builtin_trap() which is much more efficient at falling over because it becomes just a single instruction that will trap for you. Two testcases have been added (for ARM and Thumb) and both pass. Note: This is a modified version of a patch originally submitted by Mark Mitchell back in 2010, which came in response to PR target/59091. http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00639.html http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59091 The main update, other than cosmetic differences, is that we've chosen the same ARM encoding as LLVM for practical purposes. (The Thumb encoding in Mark's patch already matched LLVM.) OK for trunk? Cheers, Ian 2013-12-04 Ian Bolton ian.bol...@arm.com Mark Mitchell m...@codesourcery.com gcc/ * config/arm/arm.md (trap): New pattern. * config/arm/types.md: Added a type for trap. testsuite/ * gcc.target/arm/builtin-trap.c: New test. * gcc.target/arm/thumb-builtin-trap.c: Likewise. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index dd73366..3b7a827 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -9927,6 +9927,22 @@ (set_attr "type" "mov_reg")] ) +(define_insn "trap" + [(trap_if (const_int 1) (const_int 0))] + "" + "* + if (TARGET_ARM) +return \".inst\\t0xe7f000f0\"; + else +return \".inst\\t0xdeff\"; + " + [(set (attr "length") + (if_then_else (eq_attr "is_thumb" "yes") + (const_int 2) + (const_int 4))) + (set_attr "type" "trap")] +) + ;; Patterns to allow combination of arithmetic, cond code and shifts diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md index 1c4b9e3..6351f08 100644 --- a/gcc/config/arm/types.md +++ b/gcc/config/arm/types.md @@ -152,6 +152,7 @@ ; store2 store 2 words to memory from arm registers. ; store3 store 3 words to memory from arm registers. ; store4 store 4 (or more) words to memory from arm registers. +; trap cause a trap in the kernel.
; udiv unsigned division. ; umaal unsigned multiply accumulate accumulate long. ; umlal unsigned multiply accumulate long. @@ -645,6 +646,7 @@ store2,\ store3,\ store4,\ + trap,\ udiv,\ umaal,\ umlal,\ diff --git a/gcc/testsuite/gcc.target/arm/builtin-trap.c b/gcc/testsuite/gcc.target/arm/builtin-trap.c new file mode 100644 index 000..4ff8d25 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/builtin-trap.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm32 } */ + +void +trap () +{ + __builtin_trap (); +} + +/* { dg-final { scan-assembler "0xe7f000f0" { target { arm_nothumb } } } } */ diff --git a/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c new file mode 100644 index 000..22e90e7 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-mthumb" } */ +/* { dg-require-effective-target arm_thumb1_ok } */ + +void +trap () +{ + __builtin_trap (); +} + +/* { dg-final { scan-assembler "0xdeff" } } */
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, Dec 4, 2013 at 8:06 AM, Tejas Belagod tbela...@arm.com wrote: Thanks Richard. Here is a revised patch. Sorry about the delay - I was investigating to make sure an LRA ICE I was seeing on aarch64 was unrelated to this patch. I've added a test case that I expect to pass for aarch64. I've also added the tests that you suggested for MIPS, but haven't checked for the target because I'm not sure what optimizations happen on MIPS. OK for trunk? Thanks, Tejas. 2013-12-04 Tejas Belagod tejas.bela...@arm.com gcc/ * rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select for overlapping register lanes. testsuite/ * config/gcc.dg/vect/vect-nop-move.c: New. diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 0cd0c7e..e1388c8 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -1180,6 +1180,26 @@ set_noop_p (const_rtx set) dst = SUBREG_REG (dst); } + /* It is a NOOP if destination overlaps with selected src vector + elements. */ + if (GET_CODE (src) == VEC_SELECT + && REG_P (XEXP (src, 0)) && REG_P (dst) + && HARD_REGISTER_P (XEXP (src, 0)) + && HARD_REGISTER_P (dst)) +{ + int i; + rtx par = XEXP (src, 1); + rtx src0 = XEXP (src, 0); + int c0 = INTVAL (XVECEXP (par, 0, 0)); + HOST_WIDE_INT offset = GET_MODE_UNIT_SIZE (GET_MODE (src0)) * c0; + + for (i = 1; i < XVECLEN (par, 0); i++) + if (INTVAL (XVECEXP (par, 0, i)) != c0 + i) + return 0; + return simplify_subreg_regno (REGNO (src0), GET_MODE (src0), + offset, GET_MODE (dst)) == (int) REGNO (dst); +} + return (REG_P (src) && REG_P (dst) && REGNO (src) == REGNO (dst)); } diff --git a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c new file mode 100644 index 000..1941933 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c @@ -0,0 +1,64 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_float } */ +/* { dg-options "-O3 -fdump-rtl-combine-details" } */ + +extern void abort (void); + +#define NOINLINE __attribute__((noinline)) + +typedef float float32x4_t __attribute__
((__vector_size__ (16))); +typedef float float32x2_t __attribute__ ((__vector_size__ (8))); + +NOINLINE float +foo32x4_be (float32x4_t x) +{ + return x[3]; +} + +NOINLINE float +foo32x4_le (float32x4_t x) +{ + return x[0]; +} + +NOINLINE float +bar (float a) +{ + return a; +} + +NOINLINE float +foo32x2_be (float32x2_t x) +{ + return bar (x[1]); +} + +NOINLINE float +foo32x2_le (float32x2_t x) +{ + return bar (x[0]); +} + +int +main() +{ + float32x4_t a = { 0.0f, 1.0f, 2.0f, 3.0f }; + float32x2_t b = { 0.0f, 1.0f }; + + if (foo32x4_be (a) != 3.0f) +abort (); + + if (foo32x4_le (a) != 0.0f) +abort (); + + if (foo32x2_be (b) != 1.0f) +abort (); + + if (foo32x2_le (b) != 0.0f) +abort (); + + return 0; +} + +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target aarch64*-*-* } } } */ Any particular reason why it doesn't work for x86? +/* { dg-final { cleanup-rtl-dump "combine" } } */ Thanks. -- H.J.
Re: [PATCH] Fix force_to_mode not to modify in place the passed rtl (PR rtl-optimization/58726)
On 12/04/13 03:40, Richard Biener wrote: On Wed, Dec 4, 2013 at 11:07 AM, Eric Botcazou ebotca...@adacore.com wrote: Fixed by making sure force_to_mode doesn't modify x in place. I think that it's the way to go, force_to_mode doesn't modify its argument except for these 2 cases. I'm not sure what the story is, but calling SUBST for these 2 cases doesn't seem really necessary. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8? 2013-12-03 Jakub Jelinek ja...@redhat.com PR rtl-optimization/58726 * combine.c (force_to_mode): Fix comment typo. Don't destructively modify x for ROTATE, ROTATERT and IF_THEN_ELSE. * gcc.c-torture/execute/pr58726.c: New test. IMO it's the best fix at this point of the release cycles. I agree. I can live with the nagging feeling that we've got a deeper problem here :-) So I won't object to this approach. jeff
Re: [PATCH/AARCH64 6/6] Support ILP32 multi-lib
I think together with this patch, the default value for --with-multilib-list when it is absent can be updated to lp64,ilp32 from lp64 only. This will make the multi-lib default setting on aarch64*-*-linux* consistent with that on aarch64*-*-elf. See gcc/config.gcc. Thanks, Yufeng P.S. Copy/paste of the related configury snippet. aarch64*-*-linux*) tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h" tm_file="${tm_file} aarch64/aarch64-elf.h aarch64/aarch64-linux.h" tmake_file="${tmake_file} aarch64/t-aarch64 aarch64/t-aarch64-linux" case $target in aarch64_be-*) tm_defines="${tm_defines} TARGET_BIG_ENDIAN_DEFAULT=1" ;; esac aarch64_multilibs="${with_multilib_list}" if test "$aarch64_multilibs" = "default"; then # TODO: turn on ILP32 multilib build after its support is mature. # aarch64_multilibs="lp64,ilp32" aarch64_multilibs="lp64" fi On 12/03/13 21:24, Andrew Pinski wrote: Hi, This is the final patch which adds support for the dynamic linker and multi-lib directories for ILP32. I did not change multi-arch support as I did not know what it should be changed to and internally here at Cavium, we don't use multi-arch. OK? Built and tested for aarch64-linux-gnu with and without --with-multilib-list=lp64,ilp32. Thanks, Andrew Pinski * config/aarch64/aarch64-linux.h (GLIBC_DYNAMIC_LINKER): /lib/ld-linux32-aarch64.so.1 is used for ILP32. (LINUX_TARGET_LINK_SPEC): Add linker script file whose name depends on -mabi= and -mbig-endian. * config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle LP64 better and handle ilp32 too. (MULTILIB_OPTIONS): Delete. (MULTILIB_DIRNAMES): Delete.
--- gcc/ChangeLog | 11 +++ gcc/config/aarch64/aarch64-linux.h | 5 +++-- gcc/config/aarch64/t-aarch64-linux | 7 ++- 3 files changed, 16 insertions(+), 7 deletions(-) diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h index 83efad4..408297a 100644 --- a/gcc/config/aarch64/aarch64-linux.h +++ b/gcc/config/aarch64/aarch64-linux.h @@ -21,7 +21,7 @@ #ifndef GCC_AARCH64_LINUX_H #define GCC_AARCH64_LINUX_H -#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64.so.1" +#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux%{mabi=ilp32:32}-aarch64.so.1" #define CPP_SPEC "%{pthread:-D_REENTRANT}" @@ -32,7 +32,8 @@ %{rdynamic:-export-dynamic}\ -dynamic-linker " GNU_USER_DYNAMIC_LINKER " \ -X \ - %{mbig-endian:-EB} %{mlittle-endian:-EL}" + %{mbig-endian:-EB} %{mlittle-endian:-EL}\ + -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}" #define LINK_SPEC LINUX_TARGET_LINK_SPEC diff --git a/gcc/config/aarch64/t-aarch64-linux b/gcc/config/aarch64/t-aarch64-linux index ca1525e..5032ea9 100644 --- a/gcc/config/aarch64/t-aarch64-linux +++ b/gcc/config/aarch64/t-aarch64-linux @@ -22,10 +22,7 @@ LIB1ASMSRC = aarch64/lib1funcs.asm LIB1ASMFUNCS = _aarch64_sync_cache_range AARCH_BE = $(if $(findstring TARGET_BIG_ENDIAN_DEFAULT=1, $(tm_defines)),_be) -MULTILIB_OSDIRNAMES = .=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu) +MULTILIB_OSDIRNAMES = mabi.lp64=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu) MULTIARCH_DIRNAME = $(call if_multiarch,aarch64$(AARCH_BE)-linux-gnu) -# Disable the multilib for linux-gnu targets for the time being; focus -# on the baremetal targets. -MULTILIB_OPTIONS= -MULTILIB_DIRNAMES = +MULTILIB_OSDIRNAMES += mabi.ilp32=../lib32
Re: libsanitizer merge from upstream r196090
On Wed, Dec 4, 2013 at 8:04 AM, FX fxcoud...@gmail.com wrote: Well, it regresses against 4.8, so it still is a P1 regression. Does anyone care? Well, you’re one of the maintainers of libsanitizer for GCC, so if you do not care about regressions in your code, it makes little sense for GCC (the whole project) to keep libsanitizer. I’ve posted this regression a month ago, it was not addressed. I’m not sure under what specific arrangement libsanitizer was added to GCC, but in general there is a responsibility of maintainers not to break bootstrap in their code. Yes, it’s a cost, and if you are not willing to do it, why did you contribute in the first place? Or is it a “hit and run” approach to maintainership? I believe this is a case where the GCC project gets more benefit from libsanitizer than libsanitizer gets from being part of the GCC project. We should work with the libsanitizer developers to make this work, not just push everything back on them. Ian
Re: [wwwdocs] Update obvious fix commit policy
On 12/04/13 07:20, Diego Novillo wrote: On Tue, Dec 3, 2013 at 6:55 PM, Gerald Pfeifer ger...@pfeifer.com wrote: On Thu, 28 Nov 2013, Richard Biener wrote: Why remove ChangeLog files, web pages and comments? I was going to complain about web pages being removed. :-) On Thu, 28 Nov 2013, Diego Novillo wrote: -pFixes for obvious typos in ChangeLog files, docs, web pages, comments -and similar stuff. Just check in the fix and copy it to -codegcc-patches/code. We don't want to get overly anal-retentive -about checkin policies./p +pObvious fixes can be committed without prior approval. Just check +in the fix and copy it to codegcc-patches/code. A good test to +determine whether a fix is obvious: qwill the person who objects to +my work the most be able to find a fault with my fix?/q If the fix +is later found to be faulty, it can always be rolled back. We don't +want to get overly restrictive about checkin policies./p I am in favor of this change. To some extent, this is more a clarification of what I have seen as our current policy than a change in policy, though to a laywer-minded person it surely looks like the latter. Not sure what kind of approval this needs? Mind it has. I have not received any feedback against this change. I will wait another 48 hours and commit. Here's feedback. Install it now :-) jeff
Re: libsanitizer merge from upstream r196090
I believe this is a case where the GCC project gets more benefit from libsanitizer than libsanitizer gets from being part of the GCC project. We should work with the libsanitizer developers to make this work, not just push everything back on them. You’re vastly better qualified than me to make this assessment, of course. My point is: unless someone (or multiple someones) is actually responsible for the thing, it cannot just work out of a sense of “someone should really do something about it”. The merge model of “we can break any target, except the single one we’re testing, every time we merge” seems poised for failure. FX
Re: libsanitizer merge from upstream r196090
On Wed, Dec 4, 2013 at 8:41 AM, Ian Lance Taylor i...@google.com wrote: On Wed, Dec 4, 2013 at 8:04 AM, FX fxcoud...@gmail.com wrote: Well, it regresses against 4.8, so it still is a P1 regression. Does anyone care? Well, you’re one of the maintainers of libsanitizer for GCC, so if you do not care about regressions in your code, it makes little sense for GCC (the whole project) to keep libsanitizer. I’ve posted this regression a month ago, it was not addressed. I’m not sure under what specific arrangement libsanitizer was added to GCC, but in general there is a responsibility of maintainers not to break bootstrap in their code. Yes, it’s a cost, and if you are not willing to do it, why did you contribute in the first place? Or is it a “hit and run” approach to maintainership? I believe this is a case where the GCC project gets more benefit from libsanitizer than libsanitizer gets from being part of the GCC project. We should work with the libsanitizer developers to make this work, not just push everything back on them. I think libsanitizer should be disabled automatically if kernel or glibc are too old. BTW, fixincludes should fix the bad kernel header files from SuSE. -- H.J.
Re: libsanitizer merge from upstream r196090
I think libsanitizer should be disabled automatically if kernel or glibc are too old. I think pretty much everyone agrees. But noone’s willing to put forward a patch, and so far the libsanitizer maintainers have failed to even document the requirements. (There are also binutils requirements, as I learnt the hard way.) FX
Re: libsanitizer merge from upstream r196090
On Wed, Dec 4, 2013 at 8:50 AM, FX fxcoud...@gmail.com wrote: I think libsanitizer should be disabled automatically if kernel or glibc are too old. I think pretty much everyone agrees. But noone’s willing to put forward a patch, What are the agreed-upon minimum kernel and glibc? I can give it a try. and so far the libsanitizer maintainers have failed to even document the requirements. (There are also binutils requirements, as I learnt the hard way.) What is the minimum binutils for libsanitizer? -- H.J.
Re: libsanitizer merge from upstream r196090
On Wed, Dec 04, 2013 at 08:47:41AM -0800, H.J. Lu wrote: I believe this is a case where the GCC project gets more benefit from libsanitizer than libsanitizer gets from being part of the GCC project. We should work with the libsanitizer developers to make this work, not just push everything back on them. I think libsanitizer should be disabled automatically if kernel or glibc are too old. For very old I agree, I just strongly disagree with saying that anything older than a year and half is too old. So, as very old and unsupportable I'd probably consider e.g. Linux kernels without futex support, libsanitizer apparently uses those in various places and doesn't have a fallback. The question is how to do that though, because libraries are now disabled through lib*/configure.tgt UNSUPPORTED=1, and that is sourced in by toplevel configure, so any configure checks would need to be in toplevel configure. Or of course, we could in those cases configure the libsanitizer directory, but just decide not to build anything in there. Anyway, my preference right now would be if the ppc32 bits would be acceptable to Kostya (either by committing them upstream or just applying them as GCC local change for the time being), so that we don't break bootstrap on powerpc*-linux*, add those and commit the merge, then deal with the older kernel headers through include/linux subdirectory (I'll work on it), very old headers through configure, the CFI I hope Kostya would accept some macro, even if it is always enabled in the compiler-rt build and just GCC can disable .cfi_* addition if compiler doesn't use those, and then we can start fixing rest of portability issues. Jakub
[PATCH i386] Introduce __readeflags () and __writeeflags () intrinsics.
Hello, MSVC and ICC (currently the Windows version, Linux version soon) have dedicated intrinsics to read/set the EFLAGS register ([1], [2]). The patch introduces these intrinsics and tests for them. Bootstrapped. New tests pass. Although the gate is closed, the patch is obvious. So, is it ok for trunk? ChangeLog/ * config/i386/ia32intrin.h (__readeflags): New. (__writeeflags): Ditto. testsuite/ChangeLog/ * gcc.target/i386/readeflags-1.c: New. * gcc.target/i386/writeeflags-1.c: Ditto. [1] - http://msdn.microsoft.com/en-us/library/aa983406(v=vs.90).aspx [2] - http://msdn.microsoft.com/en-us/library/aa983392(v=vs.90).aspx -- Thanks, K diff --git a/gcc/config/i386/ia32intrin.h b/gcc/config/i386/ia32intrin.h index b26dc46..c9e68c5 100644 --- a/gcc/config/i386/ia32intrin.h +++ b/gcc/config/i386/ia32intrin.h @@ -238,6 +238,34 @@ __rorq (unsigned long long __X, int __C) return (__X >> __C) | (__X << (64 - __C)); } +/* Read flags register */ +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__readeflags (void) +{ + unsigned long long result = 0; + __asm__ __volatile__ ("pushf\n\t" + "popq %0\n" + : "=r" (result) + : + : + ); + return result; +} + +/* Write flags register */ +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__writeeflags (unsigned long long X) +{ + __asm__ __volatile__ ("pushq %0\n\t" + "popf\n" + : + : "r" (X) + : "flags" + ); +} + #define _bswap64(a) __bswapq(a) #define _popcnt64(a) __popcntq(a) #define _lrotl(a,b) __rolq((a), (b)) @@ -245,6 +273,35 @@ __rorq (unsigned long long __X, int __C) #else #define _lrotl(a,b) __rold((a), (b)) #define _lrotr(a,b) __rord((a), (b)) + +/* Read flags register */ +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__readeflags (void) +{ + unsigned int result = 0; + __asm__ __volatile__ ("pushf\n\t" + "popl %0\n" + : "=r" (result) + : + : + ); + return result; +} + +/* Write flags register */ +extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__writeeflags (unsigned int X) +{ + __asm__ __volatile__ ("pushl %0\n\t" + "popf\n" + : + : "r" (X) + : "flags" + ); +} + #endif #define _bit_scan_forward(a) __bsfd(a) diff --git a/gcc/testsuite/gcc.target/i386/readeflags-1.c b/gcc/testsuite/gcc.target/i386/readeflags-1.c new file mode 100644 index 000..6b2fa7e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/readeflags-1.c @@ -0,0 +1,40 @@ +/* { dg-do run } */ +/* { dg-options "-O0" } */ + +#include <x86intrin.h> + +#ifdef __x86_64__ +#define EFLAGS_TYPE unsigned long long int +#else +#define EFLAGS_TYPE unsigned int +#endif + +static EFLAGS_TYPE +readeflags_test (unsigned int a, unsigned int b) +{ + unsigned x = (a == b); + return __readeflags (); +} + +int +main () +{ + EFLAGS_TYPE flags; + + flags = readeflags_test (100, 100); + + if ((flags & 1) != 0) /* Read CF */ +abort (); + + flags = readeflags_test (100, 101); + + if ((flags & 1) == 0) /* Read CF */ +abort (); + +#ifdef DEBUG +printf ("PASSED\n"); +#endif + + return 0; +} + diff --git a/gcc/testsuite/gcc.target/i386/writeeflags-1.c b/gcc/testsuite/gcc.target/i386/writeeflags-1.c new file mode 100644 index 000..446840c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/writeeflags-1.c @@ -0,0 +1,30 @@ +/* { dg-do run } */ +/* { dg-options "-O0" } */ + +#include <x86intrin.h> + +#ifdef __x86_64__ +#define EFLAGS_TYPE unsigned long long int +#else +#define EFLAGS_TYPE unsigned int +#endif + +int +main () +{ + EFLAGS_TYPE flags = 0xD7; /* 11010111b */ + + __writeeflags (flags); + + flags = __readeflags (); + + if ((flags & 0xFF) != 0xD7) +abort (); + +#ifdef DEBUG +printf ("PASSED\n"); +#endif + + return 0; +} +
Re: libsanitizer merge from upstream r196090
On Wed, Dec 4, 2013 at 8:58 AM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Dec 04, 2013 at 08:47:41AM -0800, H.J. Lu wrote: I believe this is a case where the GCC project gets more benefit from libsanitizer than libsanitizer gets from being part of the GCC project. We should work with the libsanitizer developers to make this work, not just push everything back on them. I think libsanitizer should be disabled automatically if kernel or glibc are too old. For very old I agree, I just strongly disagree with saying that anything older than a year and half is too old. So, as very old and unsupportable I'd probably consider e.g. Linux kernels without futex support, libsanitizer apparently uses those in various places and doesn't have a fallback. The question is how to do that though, because libraries are now disabled through lib*/configure.tgt UNSUPPORTED=1, and that is sourced in by toplevel configure, so any configure checks would need to be in toplevel configure. Or of course, we could in those cases configure the libsanitizer directory, but just decide not to build anything in there. The kernel and glibc check should be done at the toplevel. So what are the minimum kernel and glibc we want to support? -- H.J.
Re: [PATCH 1/2] Implement -fsanitize=signed-integer-overflow (generic parts)
On 12/04/13 06:44, Marek Polacek wrote: This is a repost of rebased version of the signed-integer-overflow patch, split into generic parts and i?86 parts. By i?86 parts I mean the stuff that resides in config/i386, I haven't really tried to untangle it more. Except the two formatting fixes I also moved various PROB_ macros into predict.h and made the users include it, rather than duplicating the defines everywhere. Regtested/bootstrapped on x86_64-linux. Ok for trunk? Yes, it's OK. If it works without the x86 backend changes, you can install it now. If it requires the x86 backend changes, wait until those are approved and check in both together. jeff
Re: [PATCH, ARM] Implement __builtin_trap
On Wed, 4 Dec 2013, Ian Bolton wrote: The main update, other than cosmetic differences, is that we've chosen the same ARM encoding as LLVM for practical purposes. (The Thumb encoding in Mark's patch already matched LLVM.) Do the encodings match what plain udf does in recent-enough gas (too recent for us to assume it in GCC or glibc for now), or is it something else? -- Joseph S. Myers jos...@codesourcery.com
Re: [wwwdocs] Update obvious fix commit policy
On Wed, Dec 4, 2013 at 11:24 AM, Jeff Law l...@redhat.com wrote: Here's feedback. Install it now :-) Works for me :) Committed. Diego.
Re: libsanitizer merge from upstream r196090
On Wed, 4 Dec 2013, H.J. Lu wrote: The kernel and glibc check should be done at the toplevel. So what are the minimum kernel and glibc we want to support? Checking those at toplevel is tricky in general because you're checking something for the target rather than the host. You'd need to move the logic from gcc/configure.ac to compute target_header_dir and glibc_version_*, and GCC_GLIBC_VERSION_GTE_IFELSE from gcc/acinclude.m4, to something in toplevel config/ (and that logic depends on lots of other things in gcc/configure.ac). For binutils it's both easier to check (although the logic for binutils is also in gcc/acinclude.m4 at present) and more reasonable to require comparatively recent versions (for targets using binutils, which should cover everything supporting libsanitizer except Darwin) - I think there should be a minimum binutils version requirement generally when binutils is used with GCC, so we can reduce the need for conditionals on binutils features (unless of course the conditional code is still needed to support non-GNU assemblers and linkers for some target). It can be useful to build new tools for a target with old kernel and glibc in order to build binaries that will work on systems with a wide range of glibc versions. The oldest kernel and glibc versions I've used in that context with any post-4.3 GCC have been Linux 2.6.16 and glibc 2.4 (but the kernel headers were more recent than that, and this use case for old sysroots does *not* mean libsanitizer should necessarily be supported for them, simply that it's useful for the compiler and those libraries that may be used in production applications to be supported). If GCC were to desupport e.g. glibc before 2.4 you still get to deal with other libraries such as uClibc which pretends to be an older glibc (but again, you may well declare it unsupported for libsanitizer). -- Joseph S. Myers jos...@codesourcery.com
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On 12/04/13 09:14, H.J. Lu wrote: + +/* { dg-final { scan-rtl-dump deleting noop move combine { target aarch64*-*-* } } } */ Any particular reason why it doesn't work for x86? I don't think so. I'm pretty sure Tejas is focused on ARM platforms for the obvious reason. jeff
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, Dec 4, 2013 at 9:29 AM, Jeff Law l...@redhat.com wrote: On 12/04/13 09:14, H.J. Lu wrote: + +/* { dg-final { scan-rtl-dump deleting noop move combine { target aarch64*-*-* } } } */ Any particular reason why it doesn't work for x86? I don't think so. I'm pretty sure Tejas is focused on ARM platforms for the obvious reason. Then please add i?86-*-* x86_64-*-*. Thanks. -- H.J.
Re: [PATCH] Use DW_LANG_D for D
On 3 December 2013 19:42, Cary Coutant ccout...@google.com wrote: This patches gen_compile_unit_die to use the DW_LANG_D DWARF language code for D. It is in relation to some other D language fixes that are going to be submitted to gdb. Is this for a private front end? I'm not aware of any front ends that set the language name to GNU D. Since it's so trivial, though, I have no problem with this patch for Stage 3 -- if you do have a separate front end that sets that language string, then it's arguably a bug fix. If this patch is preparation for more substantial changes to the GCC tree, however, I suspect you're going to need to wait for Stage 1 to reopen anyway. So, if this is a standalone patch, it's OK, but you also need a ChangeLog entry. -cary The frontend isn't private, but is currently external to GCC. I've had plans to get the frontend merged for some time now, and was advised last time I submitted the code for review to send patches that can be merged into GCC prior to re-submitting the frontend - which, as you have already said, will have to wait for Stage 1 to reopen. Will make a changelog entry for the patch. Regards Iain.
Re: [Patch, RTL] Eliminate redundant vec_select moves.
Tejas Belagod tbela...@arm.com writes: Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: The problem is that one reg rtx can span several hard registers. E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32), but it might instead represent two 32-bit registers (nos. 32 and 33). Obviously the latter's not very likely for vectors this small, but more likely for larger ones (including on NEON IIRC). So if we had 2 32-bit registers being treated as a V4HI, it would be: --3233-- msb lsb msb lsb --32-- for big endian and: --3332-- msb lsb msb lsb --32-- for little endian. Ah, ok, that makes things clearer. Thanks for that. I can't find any helper function that figures out if we're writing partial or full result regs. Would something like REGNO (src) == REGNO (dst) HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1 be a sane check for partial result regs? Yeah, that should work. I think a more general alternative would be: simplify_subreg_regno (REGNO (src), GET_MODE (src), offset, GET_MODE (dst)) == (int) REGNO (dst) where: offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0)) That offset is the byte offset of the first selected element from the start of a vector in memory, which is also the way that SUBREG_BYTEs are counted. For little-endian it gives the offset of the lsb of the slice, while for big-endian it gives the offset of the msb (which is also how SUBREG_BYTEs work). The simplify_subreg_regno should cope with both single-register vectors and multi-register vectors. Sorry for the delayed response to this. Thanks for the tip. Here's an improved patch that implements the simplify_sureg_regno () method of eliminating redundant moves. Regarding the test case, I failed to get the ppc back-end to generate RTL pattern that this patch checks for. 
I can easily write a test case for aarch64 (big and little endian) on these lines: typedef float float32x4_t __attribute__ ((__vector_size__ (16))); float foo_be (float32x4_t x) { return x[3]; } float foo_le (float32x4_t x) { return x[0]; } where I know that the vector indexing will generate a vec_select on the same src and dst regs that could be optimized away, and hence test it. But I'm struggling to get a test case that the ppc altivec back-end will generate such a vec_select for. I see that altivec does not define vec_extract, so a simple indexing like this seems to happen via memory. Also, I don't know enough about the ppc PCS or architecture to write a test that will check for this optimization opportunity on same src and dst hard-registers. Any hints? Me neither, sorry. FWIW, the MIPS tests: typedef float float32x2_t __attribute__ ((__vector_size__ (8))); void bar (float); void foo_be (float32x2_t x) { bar (x[1]); } void foo_le (float32x2_t x) { bar (x[0]); } also exercise it, but I don't think they add anything over the aarch64 versions. I can add them to the testsuite anyway if it helps though. diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 0cd0c7e..ca25ce5 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set) dst = SUBREG_REG (dst); } + /* It is a NOOP if destination overlaps with selected src vector + elements. */ + if (GET_CODE (src) == VEC_SELECT + && REG_P (XEXP (src, 0)) && REG_P (dst) + && HARD_REGISTER_P (XEXP (src, 0)) + && HARD_REGISTER_P (dst)) +{ + rtx par = XEXP (src, 1); + rtx src0 = XEXP (src, 0); + HOST_WIDE_INT offset = + GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0)); + + return simplify_subreg_regno (REGNO (src0), GET_MODE (src0), + offset, GET_MODE (dst)) == (int) REGNO (dst); +} + Since this also (correctly) triggers for vector results, we need to keep the check for consecutive indices that you had originally.
(It's always the first index that should be used for the simplify_subreg_regno though.) Looks good to me otherwise, thanks. Thanks Richard. Here is a revised patch. Sorry about the delay - I was investigating to make sure an LRA ICE I was seeing on aarch64 was unrelated to this patch. I've added a test case that I expect to pass for aarch64. I've also added the tests that you suggested for MIPS, but haven't checked for the target because I'm not sure what optimizations happen on MIPS. Thanks, looks good to me, but I can't approve it. Just one minor formatting nit: + return simplify_subreg_regno (REGNO (src0), GET_MODE (src0), + offset, GET_MODE (dst)) == (int)REGNO
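To make the offset arithmetic in the review above concrete, here is a small standalone C model. The function names are invented for illustration; real GCC uses GET_MODE_UNIT_SIZE and simplify_subreg_regno, and the latter also validates that the subreg is representable at all, which this toy version ignores:

```c
#include <assert.h>

/* Byte offset of the first selected vector element, counted the same way
   SUBREG_BYTEs are: unit size times the element index taken from the
   VEC_SELECT's parallel.  */
static int lane_byte_offset (int unit_size, int first_lane)
{
  return unit_size * first_lane;
}

/* Toy stand-in for simplify_subreg_regno: which hard register of a
   multi-register vector starting at FIRST_REGNO holds the slice that
   begins at BYTE_OFFSET, with registers of REG_BYTES bytes each.  */
static int selected_regno (int first_regno, int byte_offset, int reg_bytes)
{
  return first_regno + byte_offset / reg_bytes;
}
```

With a V4HI (2-byte units) spread over two 4-byte registers 32 and 33, lanes 0-1 map to register 32 and lanes 2-3 to register 33, so a vec_select of lane 0 into register 32 is exactly the kind of no-op move set_noop_p can now recognize.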
Re: [PATCH/middle-end 2/6] __builtin_thread_pointer and AARCH64 ILP32
On 12/03/13 21:24, Andrew Pinski wrote: Hi, With ILP32 AARCH64, Pmode (DImode) != ptr_mode (SImode), so the variable decl has a mode of SImode while the register is DImode. So the target that gets passed down to expand_builtin_thread_pointer is NULL, as expand does not know how to get a subreg for a pointer type. This fixes the problem by handling a NULL target the same way we already handle a non-register or wrong-mode target inside expand_builtin_thread_pointer. OK? Built and tested for aarch64-elf with no regressions. Thanks, Andrew Pinski * builtins.c (expand_builtin_thread_pointer): Create a new target when the target is NULL. --- gcc/ChangeLog | 5 + gcc/builtins.c | 2 +- 2 files changed, 6 insertions(+), 1 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 4f1c818..66797fa 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -5699,7 +5699,7 @@ expand_builtin_thread_pointer (tree exp, rtx target) if (icode != CODE_FOR_nothing) { struct expand_operand op; - if (!REG_P (target) || GET_MODE (target) != Pmode) + if (target == NULL_RTX || !REG_P (target) || GET_MODE (target) != Pmode) target = gen_reg_rtx (Pmode); create_output_operand (&op, target, Pmode); expand_insn (icode, 1, &op); Shouldn't the thread pointer have ptr_mode instead? I'm aware that on AArch64 the thread pointer system register tpidr_el0 is 64-bit wide regardless of ILP32 or not, but in the abstracted view of the AArch64 ILP32 world, the thread pointer shall be a 32-bit pointer; the OS should have taken care of the hardware register tpidr_el0 by having its higher 32 bits cleared. I think expand_builtin_thread_pointer and expand_builtin_set_thread_pointer should use ptr_mode instead. Correct me if I missed anything. Adding Chung-Lin Tang to the CC list; Chung-Lin wrote these builtins in r192364. Yufeng
Re: PR37132 – RFC patch for generation of DWARF symbol for Fortran's namelists (DW_TAG_namelist)
gcc/ 2013-11-24 Tobias Burnus bur...@net-b.de PR debug/37132 * lto-streamer.h (LTO_tags): Add LTO_namelist_decl_ref. * tree.def (NAMELIST_DECL): Add. * tree.h (NAMELIST_DECL_ASSOCIATED_DECL): New macro. * tree.c (initialize_tree_contains_struct): Add asserts for it. * dwarf2out.c (gen_namelist_decl): New function. (gen_decl_die, dwarf2out_decl): Call it. (dwarf2out_imported_module_or_decl_1): Handle NAMELIST_DECL. * lto-streamer-in.c (lto_input_tree_ref): Handle NAMELIST_DECL. (lto_input_tree_ref, lto_input_tree_1): Update lto_tag_check_range call. * lto-streamer-out.c (lto_output_tree_ref): Handle NAMELIST_DECL. gcc/fortran 2013-11-24 Tobias Burnus bur...@net-b.de PR debug/37132 * trans-decl.c (generate_namelist_decl, create_module_nml_decl): New static functions. (gfc_generate_module_vars, generate_local_vars): Call them. (gfc_trans_use_stmts): Handle namelists for debug generation. The DWARF parts of this patch are OK with me. -cary On Sun, Nov 24, 2013 at 2:12 AM, Tobias Burnus bur...@net-b.de wrote: Hi all, attached is an updated version of the patch. Change: Tobias Burnus wrote: But for USE mod_name, only: nml, one is supposed to generate a DW_TAG_imported_declaration. And there I am stuck. For normal variables, the DW_TAG_imported_declaration refers to a DW_TAG_variable die. Analogously, for a namelist one would have to refer to a DW_TAG_namelist die. But such a DW_TAG_namelist comes with a DW_TAG_namelist_item list. And for the latter, one needs to have the die of all variables in the namelist. But with use-only the symbols aren't use-associated and no decl or die exists. (Failing call tree with the patch: gfc_trans_use_stmts -> dwarf2out_imported_module_or_decl_1 -> force_decl_die.) With the attached patch, one now generates DW_TAG_namelist with no DW_TAG_namelist_item and sets DW_AT_declaration.
Thus, for (first file):

    module mm
      integer :: ii
      real :: rr
      namelist /nml/ ii, rr
    end module mm

and (second file):

    subroutine test
      use mm, only: nml
      write(*,nml)
    end subroutine test

One now generates (first file):

    <1><1e>: Abbrev Number: 2 (DW_TAG_module)
       <1f>   DW_AT_name        : mm
       <22>   DW_AT_decl_file   : 1
       <23>   DW_AT_decl_line   : 1
       <24>   DW_AT_sibling     : <0x59>
    <2><28>: Abbrev Number: 3 (DW_TAG_variable)
       <29>   DW_AT_name        : ii
       <2c>   DW_AT_decl_file   : 1
       <2d>   DW_AT_decl_line   : 2
       <2e>   DW_AT_linkage_name: (indirect string, offset: 0x15): __mm_MOD_ii
       <32>   DW_AT_type        : <0x59>
       <36>   DW_AT_external    : 1
       <36>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
    <2><40>: Abbrev Number: 3 (DW_TAG_variable)
       <41>   DW_AT_name        : rr
       <44>   DW_AT_decl_file   : 1
       <45>   DW_AT_decl_line   : 2
       <46>   DW_AT_linkage_name: (indirect string, offset: 0x9): __mm_MOD_rr
       <4a>   DW_AT_type        : <0x60>
       <4e>   DW_AT_external    : 1
       <4e>   DW_AT_location    : 9 byte block: 3 4 0 0 0 0 0 0 0 (DW_OP_addr: 4)
    <2><58>: Abbrev Number: 0
    <1><59>: Abbrev Number: 4 (DW_TAG_base_type)
       <5a>   DW_AT_byte_size   : 4
       <5b>   DW_AT_encoding    : 5 (signed)
       <5c>   DW_AT_name        : (indirect string, offset: 0x29): integer(kind=4)
    <1><60>: Abbrev Number: 4 (DW_TAG_base_type)
       <61>   DW_AT_byte_size   : 4
       <62>   DW_AT_encoding    : 4 (float)
       <63>   DW_AT_name        : (indirect string, offset: 0x12c): real(kind=4)
    <1><67>: Abbrev Number: 5 (DW_TAG_namelist)
       <68>   DW_AT_name        : nml
    <2><6c>: Abbrev Number: 6 (DW_TAG_namelist_item)
       <6d>   DW_AT_namelist_items: <0x28>
    <2><71>: Abbrev Number: 6 (DW_TAG_namelist_item)
       <72>   DW_AT_namelist_items: <0x40>

Second file:

    <2><4f>: Abbrev Number: 3 (DW_TAG_imported_declaration)
       <50>   DW_AT_decl_file   : 1
       <51>   DW_AT_decl_line   : 2
       <52>   DW_AT_import      : <0x70>  [Abbrev Number: 6 (DW_TAG_namelist)]
    <2><56>: Abbrev Number: 4 (DW_TAG_lexical_block)
       <57>   DW_AT_low_pc      : 0xb
       <5f>   DW_AT_high_pc     : 0xb0
    <2><67>: Abbrev Number: 0
    <1><68>: Abbrev Number: 5 (DW_TAG_module)
       <69>   DW_AT_name        : mm
       <6c>   DW_AT_declaration : 1
       <6c>   DW_AT_sibling     : <0x76>
    <2><70>: Abbrev Number: 6 (DW_TAG_namelist)
       <71>   DW_AT_name        : nml
       <75>   DW_AT_declaration : 1
    <2><75>: Abbrev Number: 0

Do the dumps look okay?
For the first file, DW_TAG_namelist doesn't come directly after DW_TAG_module but after its sibling 0x59; does one still see that nml belongs to that module? (On dwarf2out level, context die should point to the module tag, but I don't understand the readelf/eu-readelf output well enough to see whether that's also the case for the generated dwarf.) I assume that the compiler can see from the DWARF of the second file that nml comes from module mm and doesn't search the value elsewhere. (It is possible to have multiple
[PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)
In C99, one way to deal with inline functions is to put the definition of the function into a header: inline void foo (void) { /* ... */ } and put the declaration into exactly one .c file, with the extern keyword (it can also have the inline keyword): extern void foo (void); But in this case, we shouldn't issue the missing prototype warning. So the following suppresses that warning in C99 mode, when -fgnu89-inline is not in effect. (But the function could still have the gnu_inline attribute, so it might be better to disable that warning for all inline functions?) Regtested/bootstrapped on x86_64-unknown-linux-gnu. Ok for trunk? 2013-12-04 Marek Polacek pola...@redhat.com PR c/54113 c/ * c-decl.c (start_function): Don't warn for missing prototype for inline functions in C99+. testsuite/ * gcc.dg/pr54113.c: New test. --- gcc/c/c-decl.c.mp3 2013-12-04 17:11:43.063878926 +0100 +++ gcc/c/c-decl.c 2013-12-04 18:32:17.567008028 +0100 @@ -7974,7 +7974,10 @@ start_function (struct c_declspecs *decl old_decl != error_mark_node && TREE_PUBLIC (decl1) && !MAIN_NAME_P (DECL_NAME (decl1)) - && C_DECL_ISNT_PROTOTYPE (old_decl)) + && C_DECL_ISNT_PROTOTYPE (old_decl) + && !(DECL_DECLARED_INLINE_P (decl1) + && flag_isoc99 + && !flag_gnu89_inline)) warning_at (loc, OPT_Wmissing_prototypes, "no previous prototype for %qD", decl1); /* Optionally warn of any def with no previous prototype --- gcc/testsuite/gcc.dg/pr54113.c.mp3 2013-12-04 17:52:45.671288940 +0100 +++ gcc/testsuite/gcc.dg/pr54113.c 2013-12-04 17:36:43.0 +0100 @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c99" } */ + +inline int foo (void) { return 42; } /* { dg-bogus "no previous prototype" } */ +extern int foo(void); Marek
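The pattern being whitelisted can be reproduced in a single translation unit; in a real project the inline definition would live in a header and the extern declaration in exactly one .c file:

```c
/* C99 inline definition: on its own this provides only an inline
   definition and no externally visible symbol.  */
inline int foo (void) { return 42; }

/* The one extern declaration that turns the inline definition above
   into the external definition of foo -- exactly the idiom the
   -Wmissing-prototypes warning used to trip on.  */
extern int foo (void);

/* An ordinary caller; the compiler is free to inline foo here.  */
int call_foo (void) { return foo (); }
```

Under gnu89 inline semantics the same code also links, since there a plain `inline` definition is emitted like a normal function, which is why the warning suppression is keyed off C99 mode without -fgnu89-inline.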
Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)
On Wed, Dec 04, 2013 at 06:47:09PM +0100, Marek Polacek wrote: In C99, one way to deal with inline functions is to put the definition of the function into a header: inline void foo (void) { /* ... */ } and put the declaration into exactly one .c file, with the extern keyword (it can also have the inline keyword): extern void foo (void); But in this case, we shouldn't issue the missing prototype warning. So the following suppresses that warning in C99 mode, when -fgnu89-inline is not in effect. (But the function could still have the gnu_inline attribute, so it might be better to disable that warning for all inline functions?) Regtested/bootstrapped on x86_64-unknown-linux-gnu. Ok for trunk? 2013-12-04 Marek Polacek pola...@redhat.com PR c/54113 c/ * c-decl.c (start_function): Don't warn for missing prototype for inline functions in C99+. testsuite/ * gcc.dg/pr54113.c: New test. --- gcc/c/c-decl.c.mp3 2013-12-04 17:11:43.063878926 +0100 +++ gcc/c/c-decl.c 2013-12-04 18:32:17.567008028 +0100 @@ -7974,7 +7974,10 @@ start_function (struct c_declspecs *decl old_decl != error_mark_node && TREE_PUBLIC (decl1) && !MAIN_NAME_P (DECL_NAME (decl1)) - && C_DECL_ISNT_PROTOTYPE (old_decl)) + && C_DECL_ISNT_PROTOTYPE (old_decl) + && !(DECL_DECLARED_INLINE_P (decl1) + && flag_isoc99 + && !flag_gnu89_inline)) warning_at (loc, OPT_Wmissing_prototypes, "no previous prototype for %qD", decl1); /* Optionally warn of any def with no previous prototype --- gcc/testsuite/gcc.dg/pr54113.c.mp3 2013-12-04 17:52:45.671288940 +0100 +++ gcc/testsuite/gcc.dg/pr54113.c 2013-12-04 17:36:43.0 +0100 @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c99" } */ + +inline int foo (void) { return 42; } /* { dg-bogus "no previous prototype" } */ +extern int foo(void); -Wmissing-prototypes is missing here; in my local copy of the patch this is fixed. Marek
Re: [C++ PATCH] Don't ICE on POINTER_PLUS_EXPR during tsubst* (PR c++/59268)
OK. Jason
Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)
On Wed, Dec 04, 2013 at 06:47:09PM +0100, Marek Polacek wrote: In C99, one way how to deal with inline functions is to put definition of the function into header: inline void foo (void) { /* ... */ } and put the declaration into exactly one .c file, with extern keyword (it can also have inline keyword): extern void foo (void); But in this case, we shouldn't issue the missing prototype warning. So the following should suppress that warning in C99 mode, when -fgnu89-inline is not in effect. (But the function could still have the gnu_inline attribute, so it might be better to disable that warning for all inline functions?) A function definition can't have attributes after the (), and start_function is called with the attributes argument, so you can just look through those for gnu_inline attribute. Jakub
Re: PR37132 – RFC patch for generation of DWARF symbol for Fortran's namelists (DW_TAG_namelist)
On Wed, Dec 04, 2013 at 09:47:36AM -0800, Cary Coutant wrote: gcc/ 2013-11-24 Tobias Burnus bur...@net-b.de PR debug/37132 * lto-streamer.h (LTO_tags): Add LTO_namelist_decl_ref. * tree.def (NAMELIST_DECL): Add. * tree.h (NAMELIST_DECL_ASSOCIATED_DECL): New macro. * tree.c (initialize_tree_contains_struct): Add asserts for it. * dwarf2out.c (gen_namelist_decl): New function. (gen_decl_die, dwarf2out_decl): Call it. (dwarf2out_imported_module_or_decl_1): Handle NAMELIST_DECL. * lto-streamer-in.c (lto_input_tree_ref): Handle NAMELIST_DECL. (lto_input_tree_ref, lto_input_tree_1): Update lto_tag_check_range call. * lto-streamer-out.c (lto_output_tree_ref): Handle NAMELIST_DECL. gcc/fortran 2013-11-24 Tobias Burnus bur...@net-b.de PR debug/37132 * trans-decl.c (generate_namelist_decl, create_module_nml_decl): New static functions. (gfc_generate_module_vars, generate_local_vars): Call them. (gfc_trans_use_stmts): Handle namelists for debug genertion. The DWARF parts of this patch are OK with me. The rest is okay too. Jakub
Re: [PATCH i386] Introduce __readeflags () and __writeeflags () intrinsics.
Hello, On 04 Dec 19:59, Kirill Yukhin wrote: So, is it ok for trunk? Small correction. I think it is better to use popfql/pushfql instead of popf/pushf (however they're encoded equally). -- Thanks, K
Ping: [tilegx] Avoid genrecog warning
Ping for this patch, which is the only one of the series that hasn't been approved. Thanks, Richard Richard Sandiford rdsandif...@googlemail.com writes: I have a patch to upgrade most genrecog warnings into errors. This patch fixes those for tilegx. There seemed to be two sources of warnings: - the intrinsics often used matched pointer_operands in an addition, so that the destination accepted constant pointers. I think the direct translation would be pmode_register_operand, but since these additions have a specific mode, I think a modeful register_operand is more natural. - some instructions used reg_or_0_operand as a destination. Tested by building tilegx-elf with the warnings turned to errors, and by comparing the before and after assembly output at -O2 for gcc.c-torture, gcc.dg and g++.dg. OK to install? Thanks, Richard gcc/ * config/tilegx/tilegx.md (insn_ld_addbitsuffix): Use register_operand rather than pointer_operand. Add modes to the operands. (insn_ldna_addbitsuffix): Likewise. (insn_ldI124MODE:ns_addI48MODE:bitsuffix): Likewise. (insn_ldnt_addbitsuffix): Likewise. (insn_ldntI124MODE:ns_addI48MODE:bitsuffix): Likewise. (insn_ld_add_L2bitsuffix): Likewise. (insn_ldna_add_L2bitsuffix): Likewise. (insn_ldI124MODE:ns_add_L2I48MODE:bitsuffix): Likewise. (insn_ldnt_add_L2bitsuffix): Likewise. (insn_ldntI124MODE:ns_add_L2I48MODE:bitsuffix): Likewise. (insn_ld_add_missbitsuffix): Likewise. (insn_ldna_add_missbitsuffix): Likewise. (insn_ldI124MODE:ns_add_missI48MODE:bitsuffix): Likewise. (insn_ldnt_add_missbitsuffix): Likewise. (insn_ldntI124MODE:ns_add_missI48MODE:bitsuffix): Likewise. (insn_st_addbitsuffix): Likewise. (insn_stI124MODE:n_addI48MODE:bitsuffix): Likewise. (*insn_stI124MODE:n_addI48MODE:bitsuffix): Likewise. (insn_stnt_addbitsuffix): Likewise. (insn_stntI124MODE:n_addI48MODE:bitsuffix): Likewise. (*insn_stntI124MODE:n_addI48MODE:bitsuffix): Likewise. (vec_pack_pack_optab_v4hi): Use register_operand rather than reg_or_0_operand for operand 0. 
(insn_v2pack_insn): Likewise. (vec_pack_hipart_v4hi): Likewise. (insn_v2packh): Likewise. (vec_pack_ssat_v2si): Likewise. (insn_v4packsc): Likewise. Index: gcc/config/tilegx/tilegx.md === --- gcc/config/tilegx/tilegx.md 2013-11-16 21:52:15.083787117 + +++ gcc/config/tilegx/tilegx.md 2013-11-16 21:59:07.745113525 + @@ -3284,9 +3284,9 @@ (define_expand insn_ld ) (define_insn insn_ld_addbitsuffix - [(set (match_operand:I48MODE 1 pointer_operand =r) -(plus:I48MODE (match_operand 3 pointer_operand 1) - (match_operand 2 s8bit_cint_operand i))) + [(set (match_operand:I48MODE 1 register_operand =r) +(plus:I48MODE (match_operand:I48MODE 3 register_operand 1) + (match_operand:I48MODE 2 s8bit_cint_operand i))) (set (match_operand:DI 0 register_operand =r) (mem:DI (match_dup 3)))] @@ -3302,9 +3302,9 @@ (define_insn insn_ldna [(set_attr type X1_2cycle)]) (define_insn insn_ldna_addbitsuffix - [(set (match_operand:I48MODE 1 pointer_operand =r) -(plus:I48MODE (match_operand 3 pointer_operand 1) - (match_operand 2 s8bit_cint_operand i))) + [(set (match_operand:I48MODE 1 register_operand =r) +(plus:I48MODE (match_operand:I48MODE 3 register_operand 1) + (match_operand:I48MODE 2 s8bit_cint_operand i))) (set (match_operand:DI 0 register_operand =r) (mem:DI (and:DI (match_dup 3) (const_int -8] @@ -3318,9 +3318,9 @@ (define_expand insn_ldns ) (define_insn insn_ldI124MODE:ns_addI48MODE:bitsuffix - [(set (match_operand:I48MODE 1 pointer_operand =r) -(plus:I48MODE (match_operand 3 pointer_operand 1) - (match_operand 2 s8bit_cint_operand i))) + [(set (match_operand:I48MODE 1 register_operand =r) +(plus:I48MODE (match_operand:I48MODE 3 register_operand 1) + (match_operand:I48MODE 2 s8bit_cint_operand i))) (set (match_operand:DI 0 register_operand =r) (any_extend:DI (mem:I124MODE (match_dup 3] @@ -3338,9 +3338,9 @@ (define_insn insn_ldnt [(set_attr type X1_2cycle)]) (define_insn insn_ldnt_addbitsuffix - [(set (match_operand:I48MODE 1 pointer_operand =r) -(plus:I48MODE (match_operand 3 
pointer_operand 1) - (match_operand 2 s8bit_cint_operand i))) + [(set (match_operand:I48MODE 1 register_operand =r) +(plus:I48MODE (match_operand:I48MODE 3 register_operand 1) + (match_operand:I48MODE 2 s8bit_cint_operand i))) (set (match_operand:DI 0 register_operand =r) (unspec:DI [(mem:DI (match_dup 3))]
Re: [PATCH i386] Introduce __readeflags () and __writeeflags () intrinsics.
On Wed, Dec 4, 2013 at 9:58 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, On 04 Dec 19:59, Kirill Yukhin wrote: So, is it ok for trunk? Small correction. I think it is better to use popfql/pushfql instead of popf/pushf (however they're encoded equally). If you define the proper type, you can use pushf/pop and push/popf in the same readeflags/writeflags implementation for -m32/-mx32/-m64. -- H.J.
RE: [PATCH, ARM] Implement __builtin_trap
On Wed, 4 Dec 2013, Ian Bolton wrote: The main update, other than cosmetic differences, is that we've chosen the same ARM encoding as LLVM for practical purposes. (The Thumb encoding in Mark's patch already matched LLVM.) Do the encodings match what plain udf does in recent-enough gas (too recent for us to assume it in GCC or glibc for now), or is it something else? Hi Joseph, Yes, these encodings match the UDF instruction that is defined in the most recent edition of the ARM Architecture Reference Manual. Thumb: 0xde00 | imm8 (we chose 0xff for the imm8) ARM: 0xe7f000f0 | (imm12 << 8) | imm4 (we chose to use 0 for both imms) So as not to break old versions of gas that don't recognise UDF, the encoding is output directly. Apologies if I have over-explained there! Cheers, Ian
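For reference, the emitted instruction words can be computed from the fields Ian lists above; this is a quick standalone check, not GCC code:

```c
#include <assert.h>
#include <stdint.h>

/* ARM-state UDF: base encoding 0xe7f000f0 with imm12 in bits [19:8]
   and imm4 in bits [3:0].  The patch uses 0 for both immediates.  */
static uint32_t arm_udf (uint32_t imm12, uint32_t imm4)
{
  return 0xe7f000f0u | ((imm12 & 0xfffu) << 8) | (imm4 & 0xfu);
}

/* Thumb-state UDF: base encoding 0xde00 with an 8-bit immediate.
   The patch uses 0xff.  */
static uint32_t thumb_udf (uint32_t imm8)
{
  return 0xde00u | (imm8 & 0xffu);
}
```

So the words output directly by the compiler are 0xe7f000f0 in ARM state and 0xdeff in Thumb state.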
Re: [PATCH, ARM] Implement __builtin_trap
On 04/12/13 16:05, Ian Bolton wrote: Hi, Currently, on ARM, you have to either call abort() or raise(SIGTRAP) to achieve a handy crash. This patch allows you to instead call __builtin_trap() which is much more efficient at falling over because it becomes just a single instruction that will trap for you. Two testcases have been added (for ARM and Thumb) and both pass. Note: This is a modified version of a patch originally submitted by Mark Mitchell back in 2010, which came in response to PR target/59091. The PR came as a result of the A64 implementation of __builtin_trap. The original patch was much earlier than that :) http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00639.html http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59091 The main update, other than cosmetic differences, is that we've chosen the same ARM encoding as LLVM for practical purposes. (The Thumb encoding in Mark's patch already matched LLVM.) OK for trunk? This is OK for trunk. Please put the PR numbers in the changelog entries before committing i.e. PR target/59091. FTR, these match with the encodings for the udf mnemonic with an immediate value of 0 in ARM state and #0xff in Thumb state. Obviously we cannot put out the udf mnemonic out because an older gas will not support it. These immediates were chosen to match the values as in other compiler implementations (I know these match with LLVM as something I can point to externally) and have been double checked with folks who have an avid interest in the kernel world. Thanks, Ramana
Re: [PATCH/AARCH64 5/6] Fix TLS for ILP32.
On 12/03/13 21:24, Andrew Pinski wrote: Hi, With ILP32, some simple usage of TLS variables causes an unrecognizable instruction due to needing to use SImode for loading pointers from memory. This fixes the three (tlsie_small, tlsle_small, tlsdesc_small) patterns to support SImode for pointers. OK? Build and tested on aarch64-elf with no regressions. Thanks, Andrew Pinski * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Handle TLS for ILP32. * config/aarch64/aarch64.md (tlsie_small): Change to an expand to handle ILP32. (tlsie_small_mode): New pattern. (tlsle_small): Change to an expand to handle ILP32. (tlsle_small_mode): New pattern. (tlsdesc_small): Change to an expand to handle ILP32. (tlsdesc_small_mode): New pattern. --- gcc/ChangeLog | 12 ++ gcc/config/aarch64/aarch64.c | 23 ++-- gcc/config/aarch64/aarch64.md | 76 ++--- 3 files changed, 94 insertions(+), 17 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b1b4eef..a3e4532 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -628,22 +628,37 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, case SYMBOL_SMALL_TLSDESC: { - rtx x0 = gen_rtx_REG (Pmode, R0_REGNUM); + enum machine_mode mode = GET_MODE (dest); + rtx x0 = gen_rtx_REG (mode, R0_REGNUM); rtx tp; + gcc_assert (mode == Pmode || mode == ptr_mode); + emit_insn (gen_tlsdesc_small (imm)); tp = aarch64_load_tp (NULL); - emit_insn (gen_rtx_SET (Pmode, dest, gen_rtx_PLUS (Pmode, tp, x0))); + + if (mode != Pmode) + tp = gen_lowpart (mode, tp); + + emit_insn (gen_rtx_SET (mode, dest, gen_rtx_PLUS (mode, tp, x0))); set_unique_reg_note (get_last_insn (), REG_EQUIV, imm); return; } case SYMBOL_SMALL_GOTTPREL: { - rtx tmp_reg = gen_reg_rtx (Pmode); + enum machine_mode mode = GET_MODE (dest); + rtx tmp_reg = gen_reg_rtx (mode); rtx tp = aarch64_load_tp (NULL); + + gcc_assert (mode == Pmode || mode == ptr_mode); + emit_insn (gen_tlsie_small (tmp_reg, imm)); - emit_insn 
(gen_rtx_SET (Pmode, dest, gen_rtx_PLUS (Pmode, tp, tmp_reg))); + + if (mode != Pmode) + tp = gen_lowpart (mode, tp); + + emit_insn (gen_rtx_SET (mode, dest, gen_rtx_PLUS (mode, tp, tmp_reg))); set_unique_reg_note (get_last_insn (), REG_EQUIV, imm); return; } diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 313517f..08fcc94 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3577,35 +3577,85 @@ [(set_attr type call) (set_attr length 16)]) -(define_insn tlsie_small - [(set (match_operand:DI 0 register_operand =r) -(unspec:DI [(match_operand:DI 1 aarch64_tls_ie_symref S)] +(define_expand tlsie_small + [(set (match_operand 0 register_operand =r) +(unspec [(match_operand 1 aarch64_tls_ie_symref S)] + UNSPEC_GOTSMALLTLS))] + +{ + if (TARGET_ILP32) +{ + operands[0] = gen_lowpart (ptr_mode, operands[0]); + emit_insn (gen_tlsie_small_si (operands[0], operands[1])); +} + else +emit_insn (gen_tlsie_small_di (operands[0], operands[1])); + DONE; +}) + +(define_insn tlsie_small_mode + [(set (match_operand:PTR 0 register_operand =r) +(unspec:PTR [(match_operand 1 aarch64_tls_ie_symref S)] UNSPEC_GOTSMALLTLS))] - adrp\\t%0, %A1\;ldr\\t%0, [%0, #%L1] + adrp\\t%0, %A1\;ldr\\t%w0, [%0, #%L1] [(set_attr type load1) (set_attr length 8)] ) -(define_insn tlsle_small - [(set (match_operand:DI 0 register_operand =r) -(unspec:DI [(match_operand:DI 1 register_operand r) - (match_operand:DI 2 aarch64_tls_le_symref S)] + +(define_expand tlsle_small + [(set (match_operand 0 register_operand =r) +(unspec [(match_operand 1 register_operand r) + (match_operand 2 aarch64_tls_le_symref S)] + UNSPEC_GOTSMALLTLS))] + +{ + if (TARGET_ILP32) +{ + rtx temp = gen_reg_rtx (ptr_mode); + operands[1] = gen_lowpart (ptr_mode, operands[1]); + emit_insn (gen_tlsle_small_si (temp, operands[1], operands[2])); + emit_move_insn (operands[0], gen_lowpart (GET_MODE (operands[0]), temp)); +} Looks like you hit the similar issue where the matched RTX can 
have either SImode or DImode in ILP32. The mechanism looks OK but I think the approach that 'add_losym' adopts is neater, which checks on the mode instead of TARGET_ILP32 and calls gen_add_losym_di or gen_add_losym_si accordingly. Note that the iterator used in add_losym_mode is P instead of PTR. Same for tlsie_small above. + else +emit_insn (gen_tlsle_small_di (operands[0], operands[1],
Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)
On Wed, Dec 04, 2013 at 06:55:52PM +0100, Jakub Jelinek wrote: On Wed, Dec 04, 2013 at 06:47:09PM +0100, Marek Polacek wrote: In C99, one way how to deal with inline functions is to put definition of the function into header: inline void foo (void) { /* ... */ } and put the declaration into exactly one .c file, with extern keyword (it can also have inline keyword): extern void foo (void); But in this case, we shouldn't issue the missing prototype warning. So the following should suppress that warning in C99 mode, when -fgnu89-inline is not in effect. (But the function could still have the gnu_inline attribute, so it might be better to disable that warning for all inline functions?) A function definition can't have attributes after the (), and start_function is called with the attributes argument, so you can just look through those for gnu_inline attribute. I can, the question is whether we want that. Anyway, this is version which looks for the gnu_inline attribute. 2013-12-04 Marek Polacek pola...@redhat.com PR c/54113 c/ * c-decl.c (start_function): Don't warn for missing prototype for inline functions in C99+. testsuite/ * gcc.dg/pr54113.c: New test. 
--- gcc/c/c-decl.c.mp3 2013-12-04 17:11:43.063878926 +0100 +++ gcc/c/c-decl.c 2013-12-04 19:13:29.043160116 +0100 @@ -7974,7 +7974,12 @@ start_function (struct c_declspecs *decl old_decl != error_mark_node && TREE_PUBLIC (decl1) && !MAIN_NAME_P (DECL_NAME (decl1)) - && C_DECL_ISNT_PROTOTYPE (old_decl)) + && C_DECL_ISNT_PROTOTYPE (old_decl) + && !(DECL_DECLARED_INLINE_P (decl1) + && flag_isoc99 + && !flag_gnu89_inline + && !lookup_attribute ("gnu_inline", + DECL_ATTRIBUTES (decl1)))) warning_at (loc, OPT_Wmissing_prototypes, "no previous prototype for %qD", decl1); /* Optionally warn of any def with no previous prototype --- gcc/testsuite/gcc.dg/pr54113.c.mp3 2013-12-04 17:52:45.671288940 +0100 +++ gcc/testsuite/gcc.dg/pr54113.c 2013-12-04 18:48:31.012682675 +0100 @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c99 -Wmissing-prototypes" } */ + +inline int foo (void) { return 42; } /* { dg-bogus "no previous prototype" } */ +extern int foo(void); Marek
Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)
On Wed, 4 Dec 2013, Marek Polacek wrote: In C99, one way how to deal with inline functions is to put definition of the function into header: inline void foo (void) { /* ... */ } and put the declaration into exactly one .c file, with extern keyword (it can also have inline keyword): extern void foo (void); But in this case, we shouldn't issue the missing prototype warning. So the following should suppress that warning in C99 mode, when -fgnu89-inline is not in effect. (But the function could still have the gnu_inline attribute, so it might be better to disable that warning for all inline functions?) Regtested/bootstrapped on x86_64-unknown-linux-gnu. Ok for trunk? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)
On Wed, Dec 04, 2013 at 06:22:28PM +, Joseph S. Myers wrote: On Wed, 4 Dec 2013, Marek Polacek wrote: In C99, one way how to deal with inline functions is to put definition of the function into header: inline void foo (void) { /* ... */ } and put the declaration into exactly one .c file, with extern keyword (it can also have inline keyword): extern void foo (void); But in this case, we shouldn't issue the missing prototype warning. So the following should suppress that warning in C99 mode, when -fgnu89-inline is not in effect. (But the function could still have the gnu_inline attribute, so it might be better to disable that warning for all inline functions?) Regtested/bootstrapped on x86_64-unknown-linux-gnu. Ok for trunk? OK. Should I commit the version with or without the lookup for gnu_inline attribute? Thanks, Marek
Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)
On Wed, 4 Dec 2013, Marek Polacek wrote: I can, the question is whether we want that. Anyway, this is version which looks for the gnu_inline attribute. If anything, I'd think it should apply to all inline functions. The point of this warning is that non-static functions should be declared in header files, separate from their definition outside a header file, and inline functions in general are expected to be defined directly in a header file, so making a separate declaration redundant. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)
On Wed, Dec 04, 2013 at 06:30:37PM +, Joseph S. Myers wrote: On Wed, 4 Dec 2013, Marek Polacek wrote: I can, the question is whether we want that. Anyway, this is a version which looks for the gnu_inline attribute. If anything, I'd think it should apply to all inline functions. The point of this warning is that non-static functions should be declared in header files, separate from their definition outside a header file, and inline functions in general are expected to be defined directly in a header file, so making a separate declaration redundant. In that case, I'll apply this one after one more regtest. Thanks. 2013-12-04 Marek Polacek pola...@redhat.com PR c/54113 c/ * c-decl.c (start_function): Don't warn for missing prototype for inline functions. testsuite/ * gcc.dg/pr54113.c: New test. --- gcc/c/c-decl.c.mp3 2013-12-04 17:11:43.063878926 +0100 +++ gcc/c/c-decl.c 2013-12-04 19:33:00.581512253 +0100 @@ -7974,7 +7974,8 @@ start_function (struct c_declspecs *decl old_decl != error_mark_node && TREE_PUBLIC (decl1) && !MAIN_NAME_P (DECL_NAME (decl1)) - && C_DECL_ISNT_PROTOTYPE (old_decl)) + && C_DECL_ISNT_PROTOTYPE (old_decl) + && !DECL_DECLARED_INLINE_P (decl1)) warning_at (loc, OPT_Wmissing_prototypes, "no previous prototype for %qD", decl1); /* Optionally warn of any def with no previous prototype --- gcc/testsuite/gcc.dg/pr54113.c.mp3 2013-12-04 17:52:45.671288940 +0100 +++ gcc/testsuite/gcc.dg/pr54113.c 2013-12-04 18:48:31.012682675 +0100 @@ -0,0 +1,5 @@ +/* { dg-do compile } */ +/* { dg-options "-Wmissing-prototypes" } */ + +inline int foo (void) { return 42; } /* { dg-bogus "no previous prototype" } */ +extern int foo(void); Marek
Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)
On Wed, Dec 4, 2013 at 2:44 PM, Marek Polacek pola...@redhat.com wrote: And this is the i?86 specific part of -fsanitize=signed-integer-overflow, split out of the huge patch. It really is dependent on the generic parts, when commiting, I'll put both parts together. Uros, would you mind taking a look at this? Regtested/bootstrapped on x86_64-linux. Ok for trunk? 2013-12-04 Jakub Jelinek ja...@redhat.com Marek Polacek pola...@redhat.com * config/i386/i386.md (addvmode4, subvmode4, mulvmode4, negvmode3, negvmode3_1): Define expands. (*addvmode4, *subvmode4, *mulvmode4, *negvmode3): Define insns. --- gcc/config/i386/i386.md.mp 2013-12-04 12:15:33.508905947 +0100 +++ gcc/config/i386/i386.md 2013-12-04 12:15:39.608929341 +0100 @@ -6153,6 +6153,42 @@ [(set_attr type alu) (set_attr mode QI)]) +(define_mode_attr widerintmode [(QI HI) (HI SI) (SI DI) (DI TI)]) Please name this widerint and put it just above existing DWI/dwi mode attribute definitions. We will merge them together. + +;; Add with jump on overflow. +(define_expand addvmode4 + [(parallel [(set (reg:CCO FLAGS_REG) + (eq:CCO (plus:widerintmode + (sign_extend:widerintmode +(match_operand:SWI 1 register_operand)) + (sign_extend:widerintmode +(match_operand:SWI 2 general_operand))) + (sign_extend:widerintmode + (plus:SWI (match_dup 1) (match_dup 2) + (set (match_operand:SWI 0 register_operand) + (plus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (eq (reg:CCO FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] + ) Please use nonimmediate_operand for operand 1 and fixup input operands with ix86_fixup_binary_operands_no_copy. Ideally, we could use nonimmediate_operand also for operand 0, but in this case, we would need to fixup output operand _after_ the PLUS pattern is emitted - not worth, IMO. Please also change sub expander below in this way. 
+(define_insn *addvmode4 + [(set (reg:CCO FLAGS_REG) + (eq:CCO (plus:widerintmode + (sign_extend:widerintmode + (match_operand:SWI 1 nonimmediate_operand %0,0)) + (sign_extend:widerintmode + (match_operand:SWI 2 general_operand g,ri))) + (sign_extend:widerintmode + (plus:SWI (match_dup 1) (match_dup 2) + (set (match_operand:SWI 0 nonimmediate_operand =r,rm) + (plus:SWI (match_dup 1) (match_dup 2)))] + ix86_binary_operator_ok (PLUS, MODEmode, operands) + add{imodesuffix}\t{%2, %0|%0, %2} + [(set_attr type alu) + (set_attr mode MODE)]) + ;; The lea patterns for modes less than 32 bits need to be matched by ;; several insns converted to real lea by splitters. @@ -6390,6 +6426,40 @@ [(set_attr type alu) (set_attr mode SI)]) +;; Subtract with jump on overflow. +(define_expand subvmode4 + [(parallel [(set (reg:CCO FLAGS_REG) + (eq:CCO (minus:widerintmode + (sign_extend:widerintmode +(match_operand:SWI 1 register_operand)) + (sign_extend:widerintmode +(match_operand:SWI 2 general_operand))) + (sign_extend:widerintmode + (minus:SWI (match_dup 1) (match_dup 2) + (set (match_operand:SWI 0 register_operand) + (minus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (eq (reg:CCO FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] + ) + +(define_insn *subvmode4 + [(set (reg:CCO FLAGS_REG) + (eq:CCO (minus:widerintmode + (sign_extend:widerintmode + (match_operand:SWI 1 nonimmediate_operand 0,0)) + (sign_extend:widerintmode + (match_operand:SWI 2 general_operand ri,rm))) + (sign_extend:widerintmode + (minus:SWI (match_dup 1) (match_dup 2) + (set (match_operand:SWI 0 nonimmediate_operand =rm,r) + (minus:SWI (match_dup 1) (match_dup 2)))] + ix86_binary_operator_ok (MINUS, MODEmode, operands) + sub{imodesuffix}\t{%2, %0|%0, %2} + [(set_attr type alu) + (set_attr mode MODE)]) + (define_insn *submode_3 [(set (reg FLAGS_REG) (compare (match_operand:SWI 1 nonimmediate_operand 0,0) @@ -6704,6 +6774,59 @@ (set_attr bdver1_decode direct) (set_attr mode QI)]) 
+;; Multiply with jump on overflow. +(define_expand mulvmode4 + [(parallel [(set (reg:CCO FLAGS_REG) + (eq:CCO (mult:widerintmode
Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)
On Wed, Dec 04, 2013 at 07:58:20PM +0100, Uros Bizjak wrote: @@ -8617,6 +8740,49 @@ [(set_attr "type" "negnot") (set_attr "mode" "SI")])

+;; Negate with jump on overflow.
+(define_expand "negv<mode>3"
+  [(parallel [(set (reg:CCO FLAGS_REG)
+		   (ne:CCO (match_operand:SWI 1 "register_operand")
+			   (const_int 0)))
+	      (set (match_operand:SWI 0 "register_operand")
+		   (neg:SWI (match_dup 1)))])
+   (set (pc) (if_then_else
+	       (eq (reg:CCO FLAGS_REG) (const_int 0))
+	       (label_ref (match_operand 2))
+	       (pc)))]
+  ""
+{
+  rtx minv = GEN_INT (HOST_WIDE_INT_M1U
+		      << (GET_MODE_BITSIZE (<MODE>mode) - 1));
+  emit_insn (gen_negv<mode>3_1 (operands[0], operands[1], minv, operands[2]));
+  DONE;
+})

No, please use operands[3] = GEN_INT (); and use (match_dup 3) in the pattern. The pattern below is not needed then. My memory is fuzzy about that, but I think that was my first version which didn't work, because with match_dup then it requires on the internal-fn.c side to pass 4 arguments instead of just 3. I can try again though. BTW: can we use gen_int_mode (1 << (GET_MODE_BITSIZE (mode) - 1), mode) instead? With HOST_WIDE_INT_1U instead of 1 and s/mode/<MODE>mode/g perhaps. Jakub
Re: [PATCH i386] Introduce __readeflags () and __writeeflags () intrinsics.
On Wed, Dec 4, 2013 at 5:59 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: MSVC and ICC (currently the Windows version, Linux version soon) have dedicated intrinsics to read/set the EFLAGS register ([1], [2]). The patch introduces these intrinsics and tests for them. Bootstrapped. New tests pass. Although the gate is closed, the patch is obvious. So, is it ok for trunk? ChangeLog/ * config/i386/ia32intrin.h (__readeflags): New. (__writeeflags): Ditto. testsuite/ChangeLog/ * gcc.target/i386/readeflags-1.c: New. * gcc.target/i386/writeeflags-1.c: Ditto. [1] - http://msdn.microsoft.com/en-us/library/aa983406(v=vs.90).aspx [2] - http://msdn.microsoft.com/en-us/library/aa983392(v=vs.90).aspx -- Thanks, K

diff --git a/gcc/config/i386/ia32intrin.h b/gcc/config/i386/ia32intrin.h
index b26dc46..c9e68c5 100644
--- a/gcc/config/i386/ia32intrin.h
+++ b/gcc/config/i386/ia32intrin.h
@@ -238,6 +238,34 @@ __rorq (unsigned long long __X, int __C)
   return (__X >> __C) | (__X << (64 - __C));
 }
 
+/* Read flags register */
+extern __inline unsigned long long
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__readeflags (void)
+{
+  unsigned long long result = 0;
+  __asm__ __volatile__ ("pushf\n\t"
+			"popq %0\n"
+			: "=r" (result)
+			:
+			:
+			);
+  return result;
+}
+
+/* Write flags register */
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__writeeflags (unsigned long long X)
+{
+  __asm__ __volatile__ ("pushq %0\n\t"
+			"popf\n"
+			:
+			: "r" (X)
+			: "flags"
+			);
+}
+

Oh, no. We don't want assembly in this century ;) The proper implementation is to introduce a __builtin_readflags/__builtin_writeflags that expand the sequence by calling gen_push and gen_pop functions.
You will need new patterns for pushfl and popfl, something like:

(define_insn "*pushfl<mode>"
  [(set (match_operand:DWIH 0 "push_operand" "=<")
	(match_operand:DWIH 0 "flags_reg_operand"))]
  ""
  "pushf{<mode>}"
  [(set_attr "type" "push")
   (set_attr "mode" "<MODE>")])

(define_insn "*popfl<mode>1"
  [(set (match_operand:DWIH 0 "flags_reg_operand")
	(match_operand:DWIH 1 "pop_operand" ">"))]
  ""
  "popf{<imodesuffix>}\t%0"
  [(set_attr "type" "pop")
   (set_attr "mode" "<MODE>")])

Uros.
Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)
On Wed, Dec 4, 2013 at 8:07 PM, Jakub Jelinek ja...@redhat.com wrote: @@ -8617,6 +8740,49 @@ [(set_attr "type" "negnot") (set_attr "mode" "SI")])

+;; Negate with jump on overflow.
+(define_expand "negv<mode>3"
+  [(parallel [(set (reg:CCO FLAGS_REG)
+		   (ne:CCO (match_operand:SWI 1 "register_operand")
+			   (const_int 0)))
+	      (set (match_operand:SWI 0 "register_operand")
+		   (neg:SWI (match_dup 1)))])
+   (set (pc) (if_then_else
+	       (eq (reg:CCO FLAGS_REG) (const_int 0))
+	       (label_ref (match_operand 2))
+	       (pc)))]
+  ""
+{
+  rtx minv = GEN_INT (HOST_WIDE_INT_M1U
+		      << (GET_MODE_BITSIZE (<MODE>mode) - 1));
+  emit_insn (gen_negv<mode>3_1 (operands[0], operands[1], minv, operands[2]));
+  DONE;
+})

No, please use operands[3] = GEN_INT (); and use (match_dup 3) in the pattern. The pattern below is not needed then. My memory is fuzzy about that, but I think that was my first version which didn't work, because with match_dup then it requires on the internal-fn.c side to pass 4 arguments instead of just 3. I can try again though. I believe it should work, please see for example the expNcorexf3 expander and many of its (match_dup X) expressions. BTW: can we use gen_int_mode (1 << (GET_MODE_BITSIZE (mode) - 1), mode) instead? With HOST_WIDE_INT_1U instead of 1 and s/mode/<MODE>mode/g perhaps. gen_int_mode calls trunc_int_for_mode that is introduced by the comment: /* Truncate and perhaps sign-extend C as appropriate for MODE. */ But, admittedly, I didn't test it... Uros.
Re: [PATCH] Add reference binding instrumentation
On 12/03/2013 02:45 PM, Marek Polacek wrote: You're right. I wanted to use cp_save_expr and/or stabilize_expr, but that didn't work out. So I resorted to restrict the condition a bit and only pass INDIRECT_REFs to the ubsan routine (which, after all, has if (!INDIRECT_REF_P (init)) return init; And in that case, it seems we don't have to worry about multiple evaluation of the initializer. Hmm? You can have an INDIRECT_REF where the operand has side-effect, i.e *f() where f returns a pointer. stabilize_expr ought to work. Your main problem with that was probably that you were trying to call it here, at which point init is just what the user wrote, whereas you want to wait until you have an expression with REFERENCE_TYPE. Try adding the instrumentation in store_init_value instead. Jason
Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)
On Wed, Dec 04, 2013 at 08:23:22PM +0100, Uros Bizjak wrote: My memory is fuzzy about that, but I think that was my first version which didn't work, because with match_dup then it requires on the internal-fn.c side to pass 4 arguments instead of just 3. I can try again though. Weird, now it works, dunno what I have done differently before. Though, I've discovered a bug in internal-fn.c for the negation case. So is this everything you wanted? 2013-12-04 Jakub Jelinek ja...@redhat.com Marek Polacek pola...@redhat.com * config/i386/i386.md (DWI, dwi): Add QImode and HImode cases. (addvmode4, subvmode4, mulvmode4, negvmode3): New expanders. (*addvmode4, *subvmode4, *mulvmode4, *negvmode3): New insns. * internal-fn.c (ubsan_expand_si_overflow_neg_check): The return value lives in res rather than target. --- gcc/config/i386/i386.md.jj 2013-12-04 12:05:46.689185140 +0100 +++ gcc/config/i386/i386.md 2013-12-04 20:40:25.417309596 +0100 @@ -905,8 +905,8 @@ (define_mode_iterator DWI [(DI !TARGET_ (TI TARGET_64BIT)]) ;; Double word integer modes as mode attribute. -(define_mode_attr DWI [(SI DI) (DI TI)]) -(define_mode_attr dwi [(SI di) (DI ti)]) +(define_mode_attr DWI [(QI HI) (HI SI) (SI DI) (DI TI)]) +(define_mode_attr dwi [(QI hi) (HI si) (SI di) (DI ti)]) ;; Half mode for double word integer modes. (define_mode_iterator DWIH [(SI !TARGET_64BIT) @@ -6160,6 +6160,41 @@ (define_insn *addqi_ext_2 [(set_attr type alu) (set_attr mode QI)]) +;; Add with jump on overflow. 
+(define_expand addvmode4 + [(parallel [(set (reg:CCO FLAGS_REG) + (eq:CCO (plus:DWI + (sign_extend:DWI +(match_operand:SWI 1 nonimmediate_operand)) + (sign_extend:DWI +(match_operand:SWI 2 general_operand))) + (sign_extend:DWI + (plus:SWI (match_dup 1) (match_dup 2) + (set (match_operand:SWI 0 register_operand) + (plus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (eq (reg:CCO FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] + + ix86_fixup_binary_operands_no_copy (PLUS, MODEmode, operands);) + +(define_insn *addvmode4 + [(set (reg:CCO FLAGS_REG) + (eq:CCO (plus:DWI + (sign_extend:DWI + (match_operand:SWI 1 nonimmediate_operand %0,0)) + (sign_extend:DWI + (match_operand:SWI 2 general_operand g,ri))) + (sign_extend:DWI + (plus:SWI (match_dup 1) (match_dup 2) + (set (match_operand:SWI 0 nonimmediate_operand =r,rm) + (plus:SWI (match_dup 1) (match_dup 2)))] + ix86_binary_operator_ok (PLUS, MODEmode, operands) + add{imodesuffix}\t{%2, %0|%0, %2} + [(set_attr type alu) + (set_attr mode MODE)]) + ;; The lea patterns for modes less than 32 bits need to be matched by ;; several insns converted to real lea by splitters. @@ -6397,6 +6432,41 @@ (define_insn *subsi_2_zext [(set_attr type alu) (set_attr mode SI)]) +;; Subtract with jump on overflow. 
+(define_expand subvmode4 + [(parallel [(set (reg:CCO FLAGS_REG) + (eq:CCO (minus:DWI + (sign_extend:DWI +(match_operand:SWI 1 nonimmediate_operand)) + (sign_extend:DWI +(match_operand:SWI 2 general_operand))) + (sign_extend:DWI + (minus:SWI (match_dup 1) (match_dup 2) + (set (match_operand:SWI 0 register_operand) + (minus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (eq (reg:CCO FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] + + ix86_fixup_binary_operands_no_copy (MINUS, MODEmode, operands);) + +(define_insn *subvmode4 + [(set (reg:CCO FLAGS_REG) + (eq:CCO (minus:DWI + (sign_extend:DWI + (match_operand:SWI 1 nonimmediate_operand 0,0)) + (sign_extend:DWI + (match_operand:SWI 2 general_operand ri,rm))) + (sign_extend:DWI + (minus:SWI (match_dup 1) (match_dup 2) + (set (match_operand:SWI 0 nonimmediate_operand =rm,r) + (minus:SWI (match_dup 1) (match_dup 2)))] + ix86_binary_operator_ok (MINUS, MODEmode, operands) + sub{imodesuffix}\t{%2, %0|%0, %2} + [(set_attr type alu) + (set_attr mode MODE)]) + (define_insn *submode_3 [(set (reg FLAGS_REG) (compare (match_operand:SWI 1 nonimmediate_operand 0,0) @@ -6711,6 +6781,58 @@ (define_insn *mulqi3_1 (set_attr bdver1_decode direct) (set_attr mode QI)]) +;; Multiply
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On 12/04/13 09:06, Tejas Belagod wrote: Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: The problem is that one reg rtx can span several hard registers. E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32), but it might instead represent two 32-bit registers (nos. 32 and 33). Obviously the latter's not very likely for vectors this small, but more likely for larger ones (including on NEON IIRC). So if we had 2 32-bit registers being treated as a V4HI, it would be: --3233-- msb lsb msb lsb --32-- for big endian and: --3332-- msb lsb msb lsb --32-- for little endian. Ah, ok, that makes things clearer. Thanks for that. I can't find any helper function that figures out if we're writing partial or full result regs. Would something like REGNO (src) == REGNO (dst) && HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1 be a sane check for partial result regs? Yeah, that should work. I think a more general alternative would be: simplify_subreg_regno (REGNO (src), GET_MODE (src), offset, GET_MODE (dst)) == (int) REGNO (dst) where: offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0, 0)) That offset is the byte offset of the first selected element from the start of a vector in memory, which is also the way that SUBREG_BYTEs are counted. For little-endian it gives the offset of the lsb of the slice, while for big-endian it gives the offset of the msb (which is also how SUBREG_BYTEs work). The simplify_subreg_regno should cope with both single-register vectors and multi-register vectors. Sorry for the delayed response to this. Thanks for the tip. Here's an improved patch that implements the simplify_subreg_regno () method of eliminating redundant moves. Regarding the test case, I failed to get the ppc back-end to generate the RTL pattern that this patch checks for.
I can easily write a test case for aarch64 (big and little endian) on these lines typedef float float32x4_t __attribute__ ((__vector_size__ (16))); float foo_be (float32x4_t x) { return x[3]; } float foo_le (float32x4_t x) { return x[0]; } where I know that the vector indexing will generate a vec_select on the same src and dst regs that could be optimized away and hence test it. But I'm struggling to get a test case that the ppc altivec back-end will generate such a vec_select for. I see that altivec does not define vec_extract, so a simple indexing like this seems to happen via memory. Also, I don't know enough about the ppc PCS or architecture to write a test that will check for this optimization opportunity on same src and dst hard-registers. Any hints? Me neither, sorry. FWIW, the MIPS tests: typedef float float32x2_t __attribute__ ((__vector_size__ (8))); void bar (float); void foo_be (float32x2_t x) { bar (x[1]); } void foo_le (float32x2_t x) { bar (x[0]); } also exercise it, but I don't think they add anything over the aarch64 versions. I can add them to the testsuite anyway if it helps though.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 0cd0c7e..ca25ce5 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
       dst = SUBREG_REG (dst);
     }
 
+  /* It is a NOOP if destination overlaps with selected src vector
+     elements.  */
+  if (GET_CODE (src) == VEC_SELECT
+      && REG_P (XEXP (src, 0)) && REG_P (dst)
+      && HARD_REGISTER_P (XEXP (src, 0))
+      && HARD_REGISTER_P (dst))
+    {
+      rtx par = XEXP (src, 1);
+      rtx src0 = XEXP (src, 0);
+      HOST_WIDE_INT offset =
+	GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0));
+
+      return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
+				    offset, GET_MODE (dst)) == (int) REGNO (dst);
+    }
+

Since this also (correctly) triggers for vector results, we need to keep the check for consecutive indices that you had originally.
(It's always the first index that should be used for the simplify_subreg_regno though.) Looks good to me otherwise, thanks. Thanks Richard. Here is a revised patch. Sorry about the delay - I was investigating to make sure an LRA ICE I was seeing on aarch64 was unrelated to this patch. I've added a test case that I expect to pass for aarch64. I've also added the tests that you suggested for MIPS, but haven't checked for the target because I'm not sure what optimizations happen on MIPS. OK for trunk? Thanks, Tejas. 2013-12-04 Tejas Belagod tejas.bela...@arm.com gcc/ * rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select for overlapping register lanes. testsuite/ * config/gcc.dg/vect/vect-nop-move.c: New. Per HJ's request please test vect-nop-move on x86/x86_64 and if the redundant move
Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)
On Wed, Dec 4, 2013 at 9:01 PM, Jakub Jelinek ja...@redhat.com wrote: My memory is fuzzy about that, but I think that was my first version which didn't work, because with match_dup then it requires on the internal-fn.c side to pass 4 arguments instead of just 3. I can try again though. Weird, now it works, dunno what I have done differently before. Though, I've discovered a bug in internal-fn.c for the negation case. So is this everything you wanted? Yes, thanks! 2013-12-04 Jakub Jelinek ja...@redhat.com Marek Polacek pola...@redhat.com * config/i386/i386.md (DWI, dwi): Add QImode and HImode cases. (addvmode4, subvmode4, mulvmode4, negvmode3): New expanders. (*addvmode4, *subvmode4, *mulvmode4, *negvmode3): New insns. * internal-fn.c (ubsan_expand_si_overflow_neg_check): The return value lives in res rather than target. The i386 part is OK. Thanks, Uros.
Re: [C++ Patch] Avoid pairs of error calls in duplicate_decls
OK, thanks. Jason
Re: Ping: [tilegx] Avoid genrecog warning
On 12/04/13 11:01, Richard Sandiford wrote: Ping for this patch, which is the only one of the series that hasn't been approved. Thanks, Richard Richard Sandiford rdsandif...@googlemail.com writes: I have a patch to upgrade most genrecog warnings into errors. This patch fixes those for tilegx. There seemed to be two sources of warnings: - the intrinsics often used matched pointer_operands in an addition, so that the destination accepted constant pointers. I think the direct translation would be pmode_register_operand, but since these additions have a specific mode, I think a modeful register_operand is more natural. - some instructions used reg_or_0_operand as a destination. Tested by building tilegx-elf with the warnings turned to errors, and by comparing the before and after assembly output at -O2 for gcc.c-torture, gcc.dg and g++.dg. OK to install? Thanks, Richard gcc/ * config/tilegx/tilegx.md (insn_ld_addbitsuffix): Use register_operand rather than pointer_operand. Add modes to the operands. (insn_ldna_addbitsuffix): Likewise. (insn_ldI124MODE:ns_addI48MODE:bitsuffix): Likewise. (insn_ldnt_addbitsuffix): Likewise. (insn_ldntI124MODE:ns_addI48MODE:bitsuffix): Likewise. (insn_ld_add_L2bitsuffix): Likewise. (insn_ldna_add_L2bitsuffix): Likewise. (insn_ldI124MODE:ns_add_L2I48MODE:bitsuffix): Likewise. (insn_ldnt_add_L2bitsuffix): Likewise. (insn_ldntI124MODE:ns_add_L2I48MODE:bitsuffix): Likewise. (insn_ld_add_missbitsuffix): Likewise. (insn_ldna_add_missbitsuffix): Likewise. (insn_ldI124MODE:ns_add_missI48MODE:bitsuffix): Likewise. (insn_ldnt_add_missbitsuffix): Likewise. (insn_ldntI124MODE:ns_add_missI48MODE:bitsuffix): Likewise. (insn_st_addbitsuffix): Likewise. (insn_stI124MODE:n_addI48MODE:bitsuffix): Likewise. (*insn_stI124MODE:n_addI48MODE:bitsuffix): Likewise. (insn_stnt_addbitsuffix): Likewise. (insn_stntI124MODE:n_addI48MODE:bitsuffix): Likewise. (*insn_stntI124MODE:n_addI48MODE:bitsuffix): Likewise. 
(vec_pack_pack_optab_v4hi): Use register_operand rather than reg_or_0_operand for operand 0. (insn_v2pack_insn): Likewise. (vec_pack_hipart_v4hi): Likewise. (insn_v2packh): Likewise. (vec_pack_ssat_v2si): Likewise. (insn_v4packsc): Likewise. This looks pretty mechanical. Hopefully there wasn't a compelling reason to use pointer_operand instead of either register_operand or other alternatives. Let's give Walt a bit more time to chime in just in case there was a particular reason for the prior choice of pointer_operand. Perhaps Monday morning. If you haven't heard from Walt by then, consider the patch approved by me. jeff