Re: std::regex: inserting std::wregex to std::vector loses some std::wregex values
On Tue, Sep 16, 2014 at 5:28 PM, Tim Shen tims...@google.com wrote:
> So I'll change the patch to move _M_traits to _NFA, and add a new
> basic_regex::_M_loc member.

Here it is :).

Bootstrapped and tested with debug flag.

Should the abi compatible fix be another patch for branch 4.9? In which the move ctor is not noexcept and calls the copy ctor? I'll make another patch for it.

--
Regards,
Tim Shen

commit 58b73dfbd04eefcfa4a1ff570e38de83b2f0daa9
Author: Tim Shen tims...@google.com
Date:   Sun Sep 21 16:23:13 2014 -0700

    PR libstdc++/63199
    * include/bits/regex.h (basic_regex::basic_regex, basic_regex::assign,
    basic_regex::imbue, basic_regex::getloc, basic_regex::swap): Add
    _M_loc for basic_regex.
    * include/bits/regex_automaton.h: Add _M_traits for _NFA.
    * include/bits/regex_compiler.h (_Compiler::_M_get_nfa, __compile_nfa):
    Make _Compiler::_M_nfa heap allocated.
    * include/bits/regex_compiler.tcc (_Compiler::_Compiler): Make
    _Compiler::_M_nfa heap allocated.
    * include/bits/regex_executor.h (_Executor::_M_is_word): Fix accessing
    _M_traits.
    * include/bits/regex_executor.tcc (_Executor::_M_dfs): Fix accessing
    _M_traits.
    * testsuite/28_regex/algorithms/regex_match/ecma/wchar_t/63199.cc: New
    testcase.

diff --git a/libstdc++-v3/include/bits/regex.h b/libstdc++-v3/include/bits/regex.h
index 5205089..4ec20d7 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -64,7 +64,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     inline std::shared_ptr<_NFA<_TraitsT>>
     __compile_nfa(const typename _TraitsT::char_type* __first,
                   const typename _TraitsT::char_type* __last,
-                  const _TraitsT& __traits,
+                  const typename _TraitsT::locale_type& __loc,
                   regex_constants::syntax_option_type __flags);
 
 _GLIBCXX_END_NAMESPACE_VERSION
@@ -433,7 +433,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        * character sequence.
        */
       basic_regex()
-      : _M_flags(ECMAScript), _M_automaton(nullptr)
+      : _M_flags(ECMAScript), _M_loc(), _M_original_str(), _M_automaton(nullptr)
       { }
 
       /**
@@ -481,10 +481,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        *
        * @param __rhs A @p regex object.
        */
-      basic_regex(const basic_regex&& __rhs) noexcept
-      : _M_flags(__rhs._M_flags), _M_traits(__rhs._M_traits),
-        _M_automaton(std::move(__rhs._M_automaton))
-      { }
+      basic_regex(basic_regex&& __rhs) noexcept = default;
 
       /**
        * @brief Constructs a basic regular expression from the string
@@ -520,12 +517,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
         basic_regex(_FwdIter __first, _FwdIter __last,
                     flag_type __f = ECMAScript)
         : _M_flags(__f),
+          _M_loc(),
           _M_original_str(__first, __last),
-          _M_automaton(__detail::__compile_nfa(_M_original_str.c_str(),
-                                               _M_original_str.c_str()
-                                                 + _M_original_str.size(),
-                                               _M_traits,
-                                               _M_flags))
+          _M_automaton(__detail::__compile_nfa<_Rx_traits>(
+            _M_original_str.c_str(),
+            _M_original_str.c_str() + _M_original_str.size(),
+            _M_loc,
+            _M_flags))
         { }
 
       /**
@@ -662,9 +660,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
           _M_flags = __flags;
           _M_original_str.assign(__s.begin(), __s.end());
           auto __p = _M_original_str.c_str();
-          _M_automaton = __detail::__compile_nfa(__p,
-                                                 __p + _M_original_str.size(),
-                                                 _M_traits, _M_flags);
+          _M_automaton = __detail::__compile_nfa<_Rx_traits>(
+            __p,
+            __p + _M_original_str.size(),
+            _M_loc,
+            _M_flags);
           return *this;
         }
 
@@ -728,9 +728,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       locale_type
       imbue(locale_type __loc)
       {
-        auto __ret = _M_traits.imbue(__loc);
-        this->assign(_M_original_str, _M_flags);
-        return __ret;
+        std::swap(__loc, _M_loc);
+        if (_M_automaton != nullptr)
+          this->assign(_M_original_str, _M_flags);
+        return __loc;
       }
 
       /**
@@ -739,7 +740,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        */
       locale_type
       getloc() const
-      { return _M_traits.getloc(); }
+      { return _M_loc; }
 
       // [7.8.6] swap
       /**
@@ -751,7 +752,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       swap(basic_regex& __rhs)
       {
         std::swap(_M_flags, __rhs._M_flags);
-        std::swap(_M_traits, __rhs._M_traits);
+        std::swap(_M_loc, __rhs._M_loc);
+        std::swap(_M_original_str, __rhs._M_original_str);
         std::swap(_M_automaton,
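For readers who want to see the symptom from the subject line, the following is a minimal sketch of the kind of check involved: several std::wregex objects are pushed into a std::vector (which may reallocate and move them) and are then used for matching. The patterns, inputs and the function name are assumptions made for illustration; this is not the 63199.cc testcase from the patch.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns true if every regex still matches its
// input after being moved around inside a growing vector.
bool all_regexes_survive_vector_growth()
{
  // Assumed patterns and inputs, chosen only for illustration.
  const std::vector<std::wstring> patterns = {L"a+", L"[0-9]+", L"x.z"};
  const std::vector<std::wstring> inputs   = {L"aaa", L"2014", L"xyz"};

  std::vector<std::wregex> regexes;
  for (const auto& p : patterns)
    regexes.push_back(std::wregex(p));  // may reallocate and move elements

  bool ok = true;
  for (std::size_t i = 0; i < regexes.size(); ++i)
    ok = ok && std::regex_match(inputs[i], regexes[i]);
  return ok;
}
```

With a library that moves and copies basic_regex correctly, the function should return true regardless of how often the vector reallocates.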
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
2014-09-23 20:06 GMT+04:00 Jeff Law l...@redhat.com:
> On 09/23/14 10:01, Steven Bosscher wrote:
>> On Fri, Sep 19, 2014 at 10:03 PM, Jeff Law l...@redhat.com wrote:
>>> On 09/19/14 13:36, Ilya Enkovich wrote:
>>>> Hi,
>>>>
>>>> During my work on enabling pseudo PIC register I've found that cfg
>>>> cleanup may remove labels with LABEL_PRESERVE_P set to 1. In my case
>>>> I generated SET_RIP during expand pass and cfg cleanup removed the
>>>> label it used as an operand.
>>>>
>>>> Below is a patch that fixes it. It is not actually required for our
>>>> latest PIC related patch but still seems to make sense.
>>>>
>>>> Bootstrapped and tested on linux-x86_64.
>>>>
>>>> Thanks,
>>>> Ilya
>>>> --
>>>> 2014-09-19 Ilya Enkovich ilya.enkov...@intel.com
>>>>
>>>> * cfgcleanup.c (try_optimize_cfg): Do not remove label
>>>> with LABEL_PRESERVE_P flag set.
>>>
>>> OK. Please install.
>>>
>>> Note for those not following the x86 32 bit PIC register discussion,
>>> I asked Ilya to submit this separately. It was something an earlier
>>> version of his patch triggered and it stood out as something that
>>> ought to be fixed regardless of the final form of the PIC register
>>> changes that are in progress.
>>
>> Jeff,
>>
>> Are you sure this patch is necessary, and is not just papering over
>> another problem? In the past, all cases I've seen where labels were
>> removed inadvertently were caused by incorrect reference counting or
>> missing REG_LABEL_* notes.

The description of LABEL_PRESERVE_P says "label that should always be considered to be needed". That means that even if we do not have any uses we shouldn't remove it. Why can't we add some additional uses later?

>> Did the label use count drop to zero? Is there a REG_LABEL_TARGET note
>> for the label operand?

In the current code of ix86_expand_prologue I don't see any note generation for the set_rip_rex64 instruction, which actually uses the label. But IMO this is another potential issue and we still shouldn't remove labels with LABEL_PRESERVE_P.

> The way it was described to me is, yes, the label count dropped to zero.
>
> In simplest terms, it was a single use label that was marked with
> LABEL_PRESERVE_P. The combiner removed the last reference, then
> cfgcleanup came along and *boom*.

There was also another case, on a 64-bit target with the large code model, where I had a problem unrelated to the combiner: a label was removed while it was still used by an existing set_rip_rex64.

Ilya

> It was with some ongoing development work that's going in a slightly
> different direction, so we don't have a testcase to include.
>
> jeff
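The rule Ilya argues for can be stated in a few lines. The sketch below is a toy model only: the Label struct and may_delete_label are hypothetical names invented here and are not GCC's rtx representation or the code in cfgcleanup.c. It just shows that a use count of zero is not sufficient for deletion once a preserve flag exists.

```cpp
// Toy model; these names are assumptions, not GCC internals.
struct Label
{
  int  nuses;     // number of remaining references to the label
  bool preserve;  // stands in for LABEL_PRESERVE_P
};

// A label may be deleted only if nothing references it and it has not
// been marked as "always needed".
bool may_delete_label(const Label& label)
{
  return label.nuses == 0 && !label.preserve;
}
```

In the scenario Jeff describes, the combiner drops the last reference, so nuses becomes 0; with the preserve check in place the label still survives cleanup.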
Re: Enable EBX for x86 in 32bits PIC code
2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com:
> On 09/23/14 10:03, Jakub Jelinek wrote:
>> On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:
>>> On 09/23/14 08:34, Jakub Jelinek wrote:
>>>> On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
>>>>> use fixed EBX at least until we make sure pseudo PIC doesn't harm
>>>>> debug info generation. If we have such option then
>>>>> gcc.target/i386/pic-1.c and
>>>>
>>>> For debug info, it seems you are already handling this in
>>>> delegitimize_address target hook, I'd suggest just building some
>>>> very large shared library at -O2 -g -fpic on i?86 and either look
>>>> at the sizes of .debug_info/.debug_loc sections with/without the
>>>> patch, or use the locstat utility from elfutils (talk to Petr
>>>> Machata if needed).
>>>
>>> Can't hurt, but I really don't see how changing from a fixed to an
>>> allocatable register is going to muck up debug info in any
>>> significant way.
>>
>> What matters is if the delegitimize_address target hook is as
>> efficient in delegitimization as before. E.g. if it previously
>> matched only when seeing %ebx + gotoff or similar, and wouldn't match
>> anything now, some vars could have debug locations including UNSPEC
>> and be dropped on the floor.
>
> Ah, yea, that makes sense.
>
> jeff

After register allocation we have no idea where the GOT address is, and therefore the delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. This is what I see now when I build GCC with the patch applied:

../../../../gcc/libgfortran/generated/sum_r4.c: In function 'msum_r4':
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
 msum_r4 (gfc_array_r4 * const restrict retarray,
 ^
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c: In function 'msum_r8':
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
 msum_r8 (gfc_array_r8 * const restrict retarray,
 ^
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location

Ilya
Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI
2014-09-23 22:01 GMT+04:00 Jeff Law l...@redhat.com:
> On 09/23/14 00:31, Ilya Enkovich wrote:
>> I did this change a couple of years ago and don't remember exactly
>> what problem was caused by PARALLEL. But from my comment it seems
>> PARALLEL led to the values in BND0 and BND1 not being actually
>> defined by the call from the DF point of view. I'll try to reproduce
>> the problem I had.
>
> Please do. That would indicate a bug in the DF infrastructure. I'm not
> real familiar with the DF implementation, but a quick glance at
> df_def_record_1 seems to indicate it's got support for a set
> destination being a PARALLEL.
>
>>> This kind of scheme also doesn't tend to play well with exception
>>> handling and scheduling because you can't guarantee the sets and the
>>> call are in the same block and scheduled as a single group.
>>
>> How can the sets and the call not be in the same block/group if all
>> of them are parts of a single instruction?
>
> Obviously in the cases where we've had these problems in the past they
> were distinct instructions. So EH interactions aren't going to be an
> issue for MPX.
>
> However, we've still got the problem that the RTL you've generated is
> ill-formed. If I understand things correctly, the assignments are the
> result of the call; that should be modeled by having the destination
> be a PARALLEL as mentioned earlier.

OK. Will try it.

BTW call_value_pop patterns have two sets: one for the returned value and one for the stack register. How does that differ much from what I do with the bound regs?

Thanks,
Ilya

> Jeff
Re: [gomp4] OpenACC wait directive
Hi Cesar!

Thank you for the patch!

On 24.09.2014 02:29, Cesar Philippidis wrote:
> This patch adds support for the async clause in the wait directive in
> fortran. It should be pretty straight forward. The fortran FE already
> supports the wait directive, but the async clause was introduced to
> the wait directive in OpenACC 2.0 and that was missing in
> gomp-4_0-branch.

Yes, I've mostly focused on spec. ver. 1.0.

> Is this OK for gomp-4_0-branch?

No, it isn't. According to the spec and this presentation: http://www.pgroup.com/lit/presentations/cea-3.pdf (see slide 1-35), it is possible to write a construct like:

!$acc wait(1) async(2)

However, your patch doesn't support this.

Also, don't forget to check whether a queue waits on itself (for example, wait(1) async(1)).

In addition, it breaks the current support of the directive (for example, wait(1)).

> Note that this patch doesn't actually implement the async or wait
> clause in the middle end yet, because that requires additional
> runtime support.
>
> Thanks,
> Cesar

--
Ilmir.
Re: [PATCH, Pointer Bounds Checker 22/x] Inline
2014-09-23 23:55 GMT+04:00 Jeff Law l...@redhat.com:
> On 08/18/14 09:35, Ilya Enkovich wrote:
>> Here is an updated version.
>>
>> Thanks,
>> Ilya
>> --
>> 2014-08-15 Ilya Enkovich ilya.enkov...@intel.com
>>
>> * ipa-inline.c (early_inliner): Check edge has summary allocated.
>> * tree-inline.c: Include tree-chkp.h.
>> (declare_return_variable): Add arg holding
>> returned bounds slot. Create and initialize returned bounds var.
>> (remap_gimple_stmt): Handle returned bounds.
>> Return sequence of statements instead of a single statement.
>> (insert_init_stmt): Add declaration.
>> (remap_gimple_seq): Adjust to new remap_gimple_stmt signature.
>> (copy_bb): Adjust to changed return type of remap_gimple_stmt.
>> (expand_call_inline): Handle returned bounds. Add bounds copy
>> for generated mem to mem assignments.
>> * tree-inline.h (copy_body_data): Add fields retbnd and
>> assign_stmts.
>> * cgraph.c: Include tree-chkp.h.
>> (cgraph_redirect_edge_call_stmt_to_callee): Support
>> returned bounds.
>> * value-prof.c: Include tree-chkp.h.
>> (gimple_ic): Support returned bounds.
>
> OK for the trunk.
>
> FWIW, when building up gimple (or RTL if you were ever to do that one
> day), it's sometimes helpful to the reviewer to show what you're
> doing. For example, it took me a bit of time to realize that you
> needed the output from the direct call as an argument to the
> duplicated RETBND statement. It looked for quite a while like you'd
> simply made a mistake.

Got it. Will try to give more useful descriptions for my patches in the future.

> I'm a bit curious why you removed the original RETBND statement in
> value-prof, only to reinsert it. Is there some reason you needed to do
> that?

After the call transformation we have something like this:

  if (condition)
    new_lhs = direct_call (...);
  else
    old_lhs = call (...);
  old_bnd = __builtin_retbnd (old_lhs);

The removal and reinsertion of the original retbnd statement is used to transform it into:

  if (condition)
    new_lhs = direct_call (...);
  else
    {
      old_lhs = call (...);
      old_bnd = __builtin_retbnd (old_lhs);
    }

The rest of the code inserts bounds for new_lhs and creates a phi node for the bounds, similar to what is done for the call return value.

Thanks,
Ilya

> Richi -- in response to your comment about working around a bug
> earlier in this thread. As Ilya mentioned, he just cloned existing
> practice in that code for creating the copy of the call.
>
> Jeff
Re: [PATCH] Fix PR63266: Keep track of impact of sign extension in bswap
On Tue, Sep 16, 2014 at 12:24 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote:
> Hi all,
>
> The fix for PR61306 disabled bswap when a sign extension is detected.
> However this led to a test case regression (and potential performance
> regression) in case where a sign extension happens but its effect is
> canceled by other bit manipulation. This patch aims to fix that by
> having a special marker to track bytes whose value is unpredictable
> due to sign extension. If the final result of a bit manipulation
> doesn't contain any such marker then the bswap optimization can
> proceed.

Nice and simple idea.

Ok.

Thanks,
Richard.

> *** gcc/ChangeLog ***
>
> 2014-09-15 Thomas Preud'homme thomas.preudho...@arm.com
>
> PR tree-optimization/63266
> * tree-ssa-math-opts.c (struct symbolic_number): Add comment about
> marker for unknown byte value.
> (MARKER_MASK): New macro.
> (MARKER_BYTE_UNKNOWN): New macro.
> (HEAD_MARKER): New macro.
> (do_shift_rotate): Mark bytes with unknown values due to sign
> extension when doing an arithmetic right shift. Replace hardcoded
> mask for marker by new MARKER_MASK macro.
> (find_bswap_or_nop_1): Likewise and adjust ORing of two symbolic
> numbers accordingly.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2014-09-15 Thomas Preud'homme thomas.preudho...@arm.com
>
> PR tree-optimization/63266
> * gcc.dg/optimize-bswapsi-1.c (swap32_d): New bswap pass test.
>
> Testing:
>
> * Built an arm-none-eabi-gcc cross-compiler and used it to run the
> testsuite on QEMU emulating Cortex-M3 without any regression
> * Bootstrapped on x86_64-linux-gnu target and testsuite was run
> without regression
>
> Ok for trunk?
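The case Thomas describes, a sign extension whose effect is canceled by later bit manipulation, can be illustrated with a hand-written byte swap on a signed value. This is a sketch under assumptions: the function names are made up here, the body is not copied from the swap32_d test, and it relies on the right shift of a negative value being an arithmetic shift, as it is with GCC.

```cpp
#include <cstdint>

// Byte swap written with shifts on a *signed* 32-bit value.
// Hypothetical example, in the spirit of the new test only.
std::uint32_t swap32_signed(std::int32_t in)
{
  // For negative inputs, in >> 24 fills the upper 24 bits with copies
  // of the sign bit; the "& 0xFF" mask throws those bytes away again,
  // so no byte of the final result depends on the sign extension.
  std::uint32_t b0 = static_cast<std::uint32_t>((in >> 0) & 0xFF);
  std::uint32_t b1 = static_cast<std::uint32_t>((in >> 8) & 0xFF);
  std::uint32_t b2 = static_cast<std::uint32_t>((in >> 16) & 0xFF);
  std::uint32_t b3 = static_cast<std::uint32_t>((in >> 24) & 0xFF);
  return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
}

// Reference byte swap on an unsigned value, with no sign extension.
std::uint32_t swap32_reference(std::uint32_t in)
{
  return (in << 24) | ((in << 8) & 0x00FF0000u)
         | ((in >> 8) & 0x0000FF00u) | (in >> 24);
}
```

Both functions compute the same result for every input, which is why a pass that tracks the sign-extended bytes as "unknown" and sees them masked off can still treat the first function as a byte swap.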
Re: [PATCH, testsuite]: PR 58757: Check for FP denormal values without triggering denormal exceptions
On Tue, Sep 23, 2014 at 8:40 PM, Marc Glisse marc.gli...@inria.fr wrote:
>>>>> Attached patch avoids triggering denormal exceptions when FP insns
>>>>> are used to check for non-zero denormal values.
>>>>
>>>> But I thought the point of the test was to verify that the
>>>> compiler's understanding of existence of subnormal values was
>>>> consistent with the processor. If the processor is in a mode
>>>> supporting such values, the exceptions should be masked. That is,
>>>> the present test should pass unconditionally, if it doesn't pass
>>>> that indicates a bug (which might be appropriate for XFAILing).
>>>
>>> Alpha needs special instruction mode to process denormals. Without
>>> this special mode the insn traps as soon as denormal value is
>>> processed.
>>
>> Yes, but I thought the point of that PR was that unless -mieee was
>> given to support such values, *_TRUE_MIN should be the same as *_MIN,
>> reflecting that they aren't supported. And so the failure is showing
>> that this bug is present (and so XFAILing with a comment referring to
>> the bug is appropriate, rather than changing the test to pass).
>
> That's also my understanding, I am sorry Uros that I wasn't clear
> enough in the PR...

I see the intention now.

However, alpha *does* support all IEEE features; the only problem is in its default model, which is for some reason High-Performance IEEE-Format Arithmetic (please see alpha AHB [1], section 4.7.6.5). This model does not require the overhead of an operating system completion handler and can be the fastest of the three IEEE models. Unfortunately, this model also notifies applications of all exceptional floating-point operations. Denormals are considered non-finite IEEE values, so they trap.

When the target is in a certain high-speed mode, it is up to the user to obey all the limitations, in this particular case, that only IEEE finite numbers are provided. This is not the case with the original testcase, so I'd say that the test is out of specs.

It beats me why -mieee is not the default on alpha, since the current default suits -ffast-math more, but it looks like we have to live with this mess.

To avoid traps on denormals, -mieee has to be specified. This option enables FP software completion that completes denormal handling, so there is no need to notify the application. IMO, instead of XFAILing the test, we should simply provide -mieee. __*_DENORM_MIN__ should indeed apply to the underlying FP format, not to some target-dependent model and its implementation details.

[1] http://www.compaq.com/cpq-alphaserver/technology/literature/alphaahb.pdf

Uros.
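The consistency property being argued about can be written down as a small check. The sketch below uses std::numeric_limits rather than the __*_DENORM_MIN__ macros from the testcase, and the function name is an assumption made for illustration. It only encodes what the C++ standard says about denorm_min(); per the discussion above, on a target that traps on denormals (such as alpha without -mieee) the comparisons themselves may raise an exception.

```cpp
#include <limits>

// Hypothetical helper: is the library's description of subnormals
// self-consistent for type T?
template <typename T>
bool denorm_min_consistent()
{
  using limits = std::numeric_limits<T>;
  if (limits::has_denorm != std::denorm_absent)
    // Subnormals are (possibly) available: the smallest one must be
    // strictly between zero and the smallest normal value.
    return limits::denorm_min() > T(0) && limits::denorm_min() < limits::min();
  // No subnormals: denorm_min() is specified to equal min().
  return limits::denorm_min() == limits::min();
}
```

A call such as denorm_min_consistent<double>() is expected to return true on a target whose floating-point mode matches what the compiler advertises.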
Re: [PATCH, Pointer Bounds Checker 19/x] Support bounds in expand
2014-09-24 0:58 GMT+04:00 Jeff Law l...@redhat.com:
> On 06/05/14 08:46, Ilya Enkovich wrote:
>> 2014-06-05 Ilya Enkovich ilya.enkov...@intel.com
>>
>> * calls.c: Include tree-chkp.h, rtl-chkp.h, bitmap.h.
>> (arg_data): Add fields special_slot, pointer_arg and
>> pointer_offset.
>> (store_bounds): New.
>> (emit_call_1): Propagate instrumentation flag for CALL.
>> (initialize_argument_information): Compute pointer_arg,
>> pointer_offset and special_slot for pointer bounds arguments.
>> (finalize_must_preallocate): Preallocate when storing bounds
>> in bounds table.
>> (compute_argument_addresses): Skip pointer bounds.
>> (expand_call): Store bounds into tables separately. Return
>> result joined with resulting bounds.
>> * cfgexpand.c: Include tree-chkp.h, rtl-chkp.h.
>> (expand_call_stmt): Propagate bounds flag for CALL_EXPR.
>> (expand_return): Add returned bounds arg. Handle returned bounds.
>> (expand_gimple_stmt_1): Adjust to new expand_return signature.
>> (gimple_expand_cfg): Reset rtx bounds map.
>> * expr.c: Include tree-chkp.h, rtl-chkp.h.
>> (expand_assignment): Handle returned bounds.
>> (store_expr_with_bounds): New. Replaces store_expr with new bounds
>> target argument. Handle bounds returned by calls.
>> (store_expr): Now wraps store_expr_with_bounds.
>> * expr.h (store_expr_with_bounds): New.
>> * function.c: Include tree-chkp.h, rtl-chkp.h.
>> (bounds_parm_data): New.
>> (use_register_for_decl): Do not registerize decls used for bounds
>> stores and loads.
>> (assign_parms_augmented_arg_list): Add bounds of the result
>> structure pointer as the second argument.
>> (assign_parm_find_entry_rtl): Mark bounds are never passed on
>> the stack.
>> (assign_parm_is_stack_parm): Likewise.
>> (assign_parm_load_bounds): New.
>> (assign_bounds): New.
>> (assign_parms): Load bounds and determine a location for
>> returned bounds.
>> (diddle_return_value_1): New.
>> (diddle_return_value): Handle returned bounds.
>> * function.h (rtl_data): Add field for returned bounds.
>>
>> diff --git a/gcc/calls.c b/gcc/calls.c
>> index e1dc8eb..5fbbe9f 100644
>> --- a/gcc/calls.c
>> +++ b/gcc/calls.c
>> @@ -44,11 +44,14 @@ along with GCC; see the file COPYING3. If not see
>>  #include "tm_p.h"
>>  #include "timevar.h"
>>  #include "sbitmap.h"
>> +#include "bitmap.h"
>>  #include "langhooks.h"
>>  #include "target.h"
>>  #include "cgraph.h"
>>  #include "except.h"
>>  #include "dbgcnt.h"
>> +#include "tree-chkp.h"
>> +#include "rtl-chkp.h"
>>
>>  /* Like PREFERRED_STACK_BOUNDARY but in units of bytes, not bits. */
>>  #define STACK_BYTES (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT)
>> @@ -76,6 +79,15 @@ struct arg_data
>>    /* If REG is a PARALLEL, this is a copy of VALUE pulled into the correct
>>       form for emit_group_move. */
>>    rtx parallel_value;
>> +  /* If value is passed in neither reg nor stack, this field holds a number
>> +     of a special slot to be used. */
>> +  rtx special_slot;
>
> I really dislike special_slot and the comment here. The comment that
> it's neither a reg nor stack is just bogus. What hardware resource
> does special_slot refer to? It's a register, but one that we do not
> typically expose.
>
> Let's at least clarify the comment and then we'll see if something
> other than special_slot as a name makes sense.
>
> Yes, I realize this is a bit of bikeshedding, but when the
> comments/terminology is confusing, the code becomes even harder to
> understand.

A special slot is not a register. When bounds are passed in a register, everything works as if we passed any other argument in a register. A special slot is used when we are out of bounds registers and need to pass bounds for a pointer passed in a register. It doesn't refer to any hardware resource. In the MPX ABI we state that special Bounds Table entries (related to the stack pointer value (and lower) right before a call) are used. In a software implementation it may also be some other place, like vars in TLS.

> I'm a bit concerned that this is exposing more details of the MPX
> implementation than is advisable to the front/middle end. On the
> other hand, I'd expect any other implementation that seeks to work in
> a transparent manner is going to have many of the same implementation
> properties as we see with MPX, so perhaps it's not a major problem.

I'm trying not to introduce any hardware dependencies into the middle end. Several months ago I created a simple prototype of generic target support in Pointer Bounds Checker which used library calls instead of MPX instructions, TLS for bounds passing etc. I did it to check that our design is not bound to MPX and allows such an implementation. It was very useful and showed some MPX details that had soaked into the GIMPLE part. E.g. the chkp_initialize_bounds and chkp_make_bounds_constant hooks appeared during that work. Special slots mechanism worked well
[PATCH i386 AVX512] [51/n] Add pd2dq and dq2pd converts.
Hello,

The patch at the bottom adds support for pd2dq and dq2pd conversions.

Bootstrapped.
AVX-512* tests on top of patch-set all pass under simulator.

Is it ok for trunk?

gcc/
	* config/i386/i386.c
	(avx512f_ufix_notruncv8dfv8si_mask_round): Rename to ...
	(ufix_notruncv8dfv8si2_mask_round): this.
	* config/i386/sse.md
	(define_insn avx512f_cvtdq2pd512_2): Update TARGET check.
	(define_insn avx_cvtdq2pd256_2): Add EVEX version.
	(define_insn sse2_cvtdq2pdmask_name): Add masking.
	(define_insn avx_cvtpd2dq256mask_name): Ditto.
	(define_expand sse2_cvtpd2dq): Delete.
	(define_insn sse2_cvtpd2dqmask_name): Add masking.
	(define_insn avx512f_ufix_notruncv8dfv8simask_nameround_name): Delete.
	(define_mode_attr pd2udqsuff): New.
	(define_insn ufix_notruncmodesi2dfmodelower2mask_nameround_name): Ditto.
	(define_insn ufix_notruncv2dfv2si2mask_name): Ditto.
	(define_insn *avx_cvttpd2dq256_2): Delete.
	(define_expand sse2_cvttpd2dq): Ditto.
	(define_insn sse2_cvttpd2dqmask_name): Add masking.

--
Thanks, K

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d70420d..1aec70f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -30246,7 +30246,7 @@ static const struct builtin_description bdesc_round_args[] =
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_floatv16siv16sf2_mask_round, __builtin_ia32_cvtdq2ps512_mask, IX86_BUILTIN_CVTDQ2PS512, UNKNOWN, (int) V16SF_FTYPE_V16SI_V16SF_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_cvtpd2dq512_mask_round, __builtin_ia32_cvtpd2dq512_mask, IX86_BUILTIN_CVTPD2DQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_cvtpd2ps512_mask_round, __builtin_ia32_cvtpd2ps512_mask, IX86_BUILTIN_CVTPD2PS512, UNKNOWN, (int) V8SF_FTYPE_V8DF_V8SF_QI_INT },
-  { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_ufix_notruncv8dfv8si_mask_round, __builtin_ia32_cvtpd2udq512_mask, IX86_BUILTIN_CVTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT },
+  { OPTION_MASK_ISA_AVX512F, CODE_FOR_ufix_notruncv8dfv8si2_mask_round, __builtin_ia32_cvtpd2udq512_mask, IX86_BUILTIN_CVTPD2UDQ512, UNKNOWN, (int) V8SI_FTYPE_V8DF_V8SI_QI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_vcvtph2ps512_mask_round, __builtin_ia32_vcvtph2ps512_mask, IX86_BUILTIN_CVTPH2PS512, UNKNOWN, (int) V16SF_FTYPE_V16HI_V16SF_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_fix_notruncv16sfv16si_mask_round, __builtin_ia32_cvtps2dq512_mask, IX86_BUILTIN_CVTPS2DQ512, UNKNOWN, (int) V16SI_FTYPE_V16SF_V16SI_HI_INT },
   { OPTION_MASK_ISA_AVX512F, CODE_FOR_avx512f_cvtps2pd512_mask_round, __builtin_ia32_cvtps2pd512_mask, IX86_BUILTIN_CVTPS2PD512, UNKNOWN, (int) V8DF_FTYPE_V8SF_V8DF_QI_INT },
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 287fd11..b2e1d4f 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4463,33 +4463,33 @@
 		     (const_int 2) (const_int 3)
 		     (const_int 4) (const_int 5)
 		     (const_int 6) (const_int 7)]]
-  TARGET_AVX
+  TARGET_AVX512F
   vcvtdq2pd\t{%t1, %0|%0, %t1}
   [(set_attr type ssecvt)
    (set_attr prefix evex)
    (set_attr mode V8DF)])

 (define_insn avx_cvtdq2pd256_2
-  [(set (match_operand:V4DF 0 register_operand =x)
+  [(set (match_operand:V4DF 0 register_operand =v)
 	(float:V4DF
 	  (vec_select:V4SI
-	    (match_operand:V8SI 1 nonimmediate_operand xm)
+	    (match_operand:V8SI 1 nonimmediate_operand vm)
 	    (parallel [(const_int 0) (const_int 1)
 		       (const_int 2) (const_int 3)]]
   TARGET_AVX
   vcvtdq2pd\t{%x1, %0|%0, %x1}
   [(set_attr type ssecvt)
-   (set_attr prefix vex)
+   (set_attr prefix maybe_evex)
    (set_attr mode V4DF)])

-(define_insn sse2_cvtdq2pd
-  [(set (match_operand:V2DF 0 register_operand =x)
+(define_insn sse2_cvtdq2pdmask_name
+  [(set (match_operand:V2DF 0 register_operand =v)
 	(float:V2DF
 	  (vec_select:V2SI
-	    (match_operand:V4SI 1 nonimmediate_operand xm)
+	    (match_operand:V4SI 1 nonimmediate_operand vm)
 	    (parallel [(const_int 0) (const_int 1)]]
-  TARGET_SSE2
-  %vcvtdq2pd\t{%1, %0|%0, %q1}
+  TARGET_SSE2 mask_avx512vl_condition
+  %vcvtdq2pd\t{%1, %0mask_operand2|%0mask_operand2, %q1}
   [(set_attr type ssecvt)
    (set_attr prefix maybe_vex)
    (set_attr ssememalign 64)
@@ -4506,14 +4506,14 @@
    (set_attr prefix evex)
    (set_attr mode OI)])

-(define_insn avx_cvtpd2dq256
-  [(set (match_operand:V4SI 0 register_operand =x)
-	(unspec:V4SI [(match_operand:V4DF 1 nonimmediate_operand xm)]
+(define_insn avx_cvtpd2dq256mask_name
+  [(set (match_operand:V4SI 0 register_operand =v)
+	(unspec:V4SI [(match_operand:V4DF 1 nonimmediate_operand vm)]
 		     UNSPEC_FIX_NOTRUNC))]
-  TARGET_AVX
-  vcvtpd2dq{y}\t{%1, %0|%0, %1}
[PATCH i386 AVX512] [52/n] Add convert ps2pd and ps2dq.
Hello,

The patch at the bottom adds support for ps2dq and ps2pd conversions.

Bootstrapped.
AVX-512* tests on top of patch-set all pass under simulator.

Is it ok for trunk?

gcc/
	* config/i386/sse.md
	(define_c_enum unspec): Add UNSPEC_CVTINT2MASK.
	(define_insn fixsuffixfix_truncmodesselongvecmodelower2mask_nameround_saeonly_name): New.
	(define_insn fixsuffixfix_truncv2sfv2di2mask_name): Ditto.
	(define_insn ufix_truncmodesseintvecmodelower2mask_name): Ditto.
	(define_insn sse2_cvtss2sdround_saeonly_name): Change
	nonimmediate_operand to round_saeonly_nimm_predicate.
	(define_insn avx_cvtpd2ps256mask_name): Add masking.
	(define_expand sse2_cvtpd2ps_mask): New.
	(define_insn *sse2_cvtpd2psmask_name): Add masking.
	(define_insn avx512_cvtssemodesuffix2maskmode): New.
	(define_insn avx512_cvtmask2ssemodesuffixmode): Ditto.
	(define_insn sse2_cvtps2pdmask_name): Add masking.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b2e1d4f..c9d6e00 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -132,6 +132,7 @@
   ;; For AVX512BW support
   UNSPEC_PSHUFHW
   UNSPEC_PSHUFLW
+  UNSPEC_CVTINT2MASK

   ;; For AVX512DQ support
   UNSPEC_REDUCE
@@ -4659,6 +4660,38 @@
    (set_attr prefix evex)
    (set_attr mode sseintvecmode2)])

+(define_insn fixsuffixfix_truncmodesselongvecmodelower2mask_nameround_saeonly_name
+  [(set (match_operand:sselongvecmode 0 register_operand =v)
+	(any_fix:sselongvecmode
+	  (match_operand:VF1_128_256VL 1 round_saeonly_nimm_predicate round_saeonly_constraint)))]
+  TARGET_AVX512DQ round_saeonly_modev8sf_condition
+  vcvttps2fixsuffixqq\t{round_saeonly_mask_op2%1, %0mask_operand2|%0mask_operand2, %1round_saeonly_mask_op2}
+  [(set_attr type ssecvt)
+   (set_attr prefix evex)
+   (set_attr mode sseintvecmode3)])
+
+(define_insn fixsuffixfix_truncv2sfv2di2mask_name
+  [(set (match_operand:V2DI 0 register_operand =v)
+	(any_fix:V2DI
+	  (vec_select:V2SF
+	    (match_operand:V4SF 1 nonimmediate_operand vm)
+	    (parallel [(const_int 0) (const_int 1)]]
+  TARGET_AVX512DQ TARGET_AVX512VL
+  vcvttps2fixsuffixqq\t{%1, %0mask_operand2|%0mask_operand2, %1}
+  [(set_attr type ssecvt)
+   (set_attr prefix evex)
+   (set_attr mode TI)])
+
+(define_insn ufix_truncmodesseintvecmodelower2mask_name
+  [(set (match_operand:sseintvecmode 0 register_operand =v)
+	(unsigned_fix:sseintvecmode
+	  (match_operand:VF1_128_256VL 1 nonimmediate_operand vm)))]
+  TARGET_AVX512VL
+  vcvttps2udq\t{%1, %0mask_operand2|%0mask_operand2, %1}
+  [(set_attr type ssecvt)
+   (set_attr prefix evex)
+   (set_attr mode sseintvecmode2)])
+
 (define_expand avx_cvttpd2dq256_2
   [(set (match_operand:V8SI 0 register_operand)
 	(vec_concat:V8SI
@@ -4713,7 +4746,7 @@
 	(vec_merge:V2DF
 	  (float_extend:V2DF
 	    (vec_select:V2SF
-	      (match_operand:V4SF 2 nonimmediate_operand x,m,round_saeonly_constraint)
+	      (match_operand:V4SF 2 round_saeonly_nimm_predicate x,m,round_saeonly_constraint)
 	      (parallel [(const_int 0) (const_int 1)])))
 	  (match_operand:V2DF 1 register_operand 0,0,v)
 	  (const_int 1)))]
@@ -4741,14 +4774,14 @@
    (set_attr prefix evex)
    (set_attr mode V8SF)])

-(define_insn avx_cvtpd2ps256
-  [(set (match_operand:V4SF 0 register_operand =x)
+(define_insn avx_cvtpd2ps256mask_name
+  [(set (match_operand:V4SF 0 register_operand =v)
 	(float_truncate:V4SF
-	  (match_operand:V4DF 1 nonimmediate_operand xm)))]
-  TARGET_AVX
-  vcvtpd2ps{y}\t{%1, %0|%0, %1}
+	  (match_operand:V4DF 1 nonimmediate_operand vm)))]
+  TARGET_AVX mask_avx512vl_condition
+  vcvtpd2ps{y}\t{%1, %0mask_operand2|%0mask_operand2, %1}
   [(set_attr type ssecvt)
-   (set_attr prefix vex)
+   (set_attr prefix maybe_evex)
    (set_attr btver2_decode vector)
    (set_attr mode V4SF)])

@@ -4761,16 +4794,28 @@
   TARGET_SSE2
   operands[2] = CONST0_RTX (V2SFmode);)

-(define_insn *sse2_cvtpd2ps
-  [(set (match_operand:V4SF 0 register_operand =x)
+(define_expand sse2_cvtpd2ps_mask
+  [(set (match_operand:V4SF 0 register_operand)
+	(vec_merge:V4SF
+	  (vec_concat:V4SF
+	    (float_truncate:V2SF
+	      (match_operand:V2DF 1 nonimmediate_operand))
+	    (match_dup 4))
+	  (match_operand:V4SF 2 register_operand)
+	  (match_operand:QI 3 register_operand)))]
+  TARGET_SSE2
+  operands[4] = CONST0_RTX (V2SFmode);)
+
+(define_insn *sse2_cvtpd2psmask_name
+  [(set (match_operand:V4SF 0 register_operand =v)
 	(vec_concat:V4SF
 	  (float_truncate:V2SF
-	    (match_operand:V2DF 1 nonimmediate_operand xm))
+	    (match_operand:V2DF 1 nonimmediate_operand vm))
 	  (match_operand:V2SF 2 const0_operand)))]
-  TARGET_SSE2
+  TARGET_SSE2 mask_avx512vl_condition
 {
   if (TARGET_AVX)
-return
[PATCH] Fix asan optimization for aligned accesses. (PR sanitizer/63316)
On Tue, Sep 02, 2014 at 07:09:50PM +0400, Marat Zakirov wrote:
> Here's a simple optimization patch for Asan. It stores alignment
> information into ASAN_CHECK which is then extracted by sanopt to reduce
> number of "and 0x7" instructions for sufficiently aligned accesses. I
> checked it on linux kernel by comparing results of objdump -d -j .text
> vmlinux | grep "and.*0x7", for optimized and regular cases. It
> eliminates 12% of and 0x7's.
>
> No regressions. Sanitized GCC was successfully Asan-bootstrapped. No
> false positives were found in kernel.

Unfortunately it broke PR63316.  The problem is that you've just replaced
base_addr & 7 with base_addr in the
(base_addr & 7) + (real_size_in_bytes - 1) >= shadow
computation.  & 7 is of course not useless there, & ~7 would be.
For known sufficiently aligned base_addr, instead we know that
(base_addr & 7) is always 0 and thus can simplify the test to
(real_size_in_bytes - 1) >= shadow
where (real_size_in_bytes - 1) is a constant.

Fixed thusly, committed to trunk.

BTW, I've noticed that perhaps using BIT_AND_EXPR for the
(shadow != 0) & ((base_addr & 7) + (real_size_in_bytes - 1) >= shadow)
tests isn't best, maybe we could get better code if we expanded it as
(shadow != 0) && ((base_addr & 7) + (real_size_in_bytes - 1) >= shadow)
(i.e. an extra basic block containing the second half of the test
and fastpath for the shadow == 0 case if it is sufficiently common
(probably it is)).  Will try to code this up unless somebody beats me to
that, but if somebody volunteered to benchmark such a change, it would
be very much appreciated.

2014-09-24  Jakub Jelinek  ja...@redhat.com

	PR sanitizer/63316
	* asan.c (asan_expand_check_ifn): Fix up align >= 8 optimization.

	* c-c++-common/asan/pr63316.c: New test.

--- gcc/asan.c.jj 2014-09-24 08:26:49.0 +0200
+++ gcc/asan.c 2014-09-24 11:00:59.380298362 +0200
@@ -2585,19 +2585,26 @@ asan_expand_check_ifn (gimple_stmt_itera
       gimple shadow_test = build_assign (NE_EXPR, shadow, 0);
       gimple_seq seq = NULL;
       gimple_seq_add_stmt (&seq, shadow_test);
-      /* Aligned (>= 8 bytes) access do not need & 7.  */
+      /* Aligned (>= 8 bytes) can test just
+         (real_size_in_bytes - 1 >= shadow), as base_addr & 7 is known
+         to be 0.  */
       if (align < 8)
-        gimple_seq_add_stmt (&seq, build_assign (BIT_AND_EXPR,
-                                                 base_addr, 7));
-      gimple_seq_add_stmt (&seq, build_type_cast (shadow_type,
-                                                  gimple_seq_last (seq)));
-      if (real_size_in_bytes > 1)
-        gimple_seq_add_stmt (&seq,
-                             build_assign (PLUS_EXPR, gimple_seq_last (seq),
-                                           real_size_in_bytes - 1));
-      gimple_seq_add_stmt (&seq, build_assign (GE_EXPR,
+        {
+          gimple_seq_add_stmt (&seq, build_assign (BIT_AND_EXPR,
+                                                   base_addr, 7));
+          gimple_seq_add_stmt (&seq,
+                               build_type_cast (shadow_type,
+                                                gimple_seq_last (seq)));
+          if (real_size_in_bytes > 1)
+            gimple_seq_add_stmt (&seq,
+                                 build_assign (PLUS_EXPR,
                                                gimple_seq_last (seq),
-                                               shadow));
+                                               real_size_in_bytes - 1));
+          t = gimple_assign_lhs (gimple_seq_last_stmt (seq));
+        }
+      else
+        t = build_int_cst (shadow_type, real_size_in_bytes - 1);
+      gimple_seq_add_stmt (&seq, build_assign (GE_EXPR, t, shadow));
       gimple_seq_add_stmt (&seq, build_assign (BIT_AND_EXPR, shadow_test,
                                                gimple_seq_last (seq)));
       t = gimple_assign_lhs (gimple_seq_last (seq));
--- gcc/testsuite/c-c++-common/asan/pr63316.c.jj 2014-09-24 10:57:21.879454411 +0200
+++ gcc/testsuite/c-c++-common/asan/pr63316.c 2014-09-24 11:04:16.773241665 +0200
@@ -0,0 +1,22 @@
+/* PR sanitizer/63316 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=address -O2" } */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+extern void *malloc (__SIZE_TYPE__);
+extern void free (void *);
+#ifdef __cplusplus
+}
+#endif
+
+int
+main ()
+{
+  int *p = (int *) malloc (sizeof (int));
+  *p = 3;
+  asm volatile ("" : : "r" (p) : "memory");
+  free (p);
+  return 0;
+}

	Jakub
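
For readers following the arithmetic, here is a minimal C sketch of the slow-path test discussed in this message. It is not code from asan.c: the function names and sample addresses are made up for illustration, and it only assumes the usual shadow encoding (0 means the whole 8-byte granule is addressable, 1..7 means only the first k bytes are, negative means poisoned).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only; these helpers are hypothetical and not part of GCC.  */

/* General test for an access of `size` bytes (1..8) at base_addr.  */
bool
access_is_bad (uintptr_t base_addr, unsigned size, signed char shadow)
{
  int last_byte = (int) (base_addr & 7) + (int) (size - 1);
  return (shadow != 0) & (last_byte >= shadow);
}

/* Same test when base_addr is known to be 8-byte aligned:
   (base_addr & 7) is 0, so only the constant size - 1 remains.  */
bool
aligned_access_is_bad (unsigned size, signed char shadow)
{
  return (shadow != 0) & ((int) (size - 1) >= shadow);
}

/* Model of the broken form: "& 7" is dropped, so the low byte of the
   address itself (truncated to a signed char) enters the comparison.  */
bool
broken_aligned_access_is_bad (uintptr_t base_addr, unsigned size,
                              signed char shadow)
{
  int low_byte = (int) (base_addr & 0xff);
  if (low_byte > 127)
    low_byte -= 256;            /* two's complement truncation */
  return (shadow != 0) & (low_byte + (int) (size - 1) >= shadow);
}

void
check_examples (void)
{
  /* 4-byte object: shadow byte 4, pointer at a made-up aligned address.  */
  assert (!access_is_bad (0x1010, 4, 4));
  assert (!aligned_access_is_bad (4, 4));
  /* The broken form reports a false positive: 0x10 + 3 >= 4.  */
  assert (broken_aligned_access_is_bad (0x1010, 4, 4));
  /* Reading 4 bytes at offset 4 of that granule is a real overflow.  */
  assert (access_is_bad (0x1014, 4, 4));
}
```

The false positive in the third assertion has the same shape as the new test case: a 4-byte malloc'ed object accessed through a sufficiently aligned pointer.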
Re: libsanitizer merge from upstream r218156
On Tue, Sep 23, 2014 at 11:03:55AM -0700, Konstantin Serebryany wrote:
> > OT, will you please look at the underaligned asan malloc etc.?
> > GCC assumes that even malloc (1) or malloc (7) is sizeof (void *)
> > aligned on Linux (and can and will assume 2 * sizeof (void *) alignment
> > hopefully soon).
>
> What's wrong here?
> I am pretty confident that asan's malloc always returns 16-aligned pointers.

Sorry, that was just my guess, I haven't really analyzed PR63316 before
writing this.  Analyzed it now and fixed.

	Jakub
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
On Wed, Sep 24, 2014 at 8:41 AM, Ilya Enkovich wrote:
> 2014-09-23 20:06 GMT+04:00 Jeff Law:
>> On 09/23/14 10:01, Steven Bosscher wrote:
>>> Are you sure this patch is necessary, and is not just papering over
>>> another problem? In the past, all cases I've seen where labels were
>>> removed inadvertently were caused by incorrect reference counting or
>>> missing REG_LABEL_* notes.
>
> Description of LABEL_PRESERVE_P says "label that should always be
> considered to be needed".

It's more specific than that, really:

@item LABEL_PRESERVE_P (@var{x})
In a @code{code_label} or @code{note}, indicates that the label is
referenced by code or data not visible to the RTL of a given function.

The "not visible" part is important. If there are visible references
to a label, then they should never be removed (obviously) and that
should work through LABEL_NUSES. Unfortunately we are not very good at
keeping LABEL_NUSES up-to-date (this is why all the
rebuild_jump_labels() are still required).

What appears to be the case here, is that you have a label between two
basic blocks B1 and B2, and the label acts as a control flow barrier:
B1 and B2 cannot be merged. Then this should be expressed in the CFG.
Otherwise: What else prevents the merge_blocks CFG hooks from deleting
the label?

> That means even if we do not have any usages we shouldn't remove it.

Sorry, no. Even a LABEL_PRESERVE_P label can be deleted: It will be
replaced by a NOTE_INSN_DELETED_LABEL. See cfgrtl.c:delete_insn().

If you really want to prevent a label from being deleted, then
LABEL_PRESERVE_P is not a sufficient condition.

> Why can't we add some additional usages later?

If you add the usages later, then you're lying to the compiler ;-)

>> Did the label use count drop to zero?
>> Is there a REG_LABEL_TARGET note for the label operand?
>
> In the current code of ix86_expand_prologue I don't see any notes
> generation for set_rip_rex64 instruction which actually uses label.
> But IMO this is another potential issue and we still shouldn't remove
> labels with LABEL_PRESERVE_P.

Notes are generated in jump.c:rebuild_jump_labels. They are
automatically added when a label is not

Ciao!
Steven
Re: [patch] Implement move semantics for iostreams
On 22/09/14 14:35 +0100, Jonathan Wakely wrote:
>This adds move and swap functions to the iostream classes.

This fixes a silly typo.

Tested x86_64-linux, committed to trunk.

commit acaef9854dff5f37d86b80fc8236df5fd90b0ca5
Author: Jonathan Wakely jwak...@redhat.com
Date:   Wed Sep 24 10:10:28 2014 +0100

    PR libstdc++/63353
    * src/c++11/ios.cc (ios_base::_M_swap): Fix typo.

diff --git a/libstdc++-v3/src/c++11/ios.cc b/libstdc++-v3/src/c++11/ios.cc
index b5124ec..0e136d4 100644
--- a/libstdc++-v3/src/c++11/ios.cc
+++ b/libstdc++-v3/src/c++11/ios.cc
@@ -229,7 +229,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       std::swap(_M_local_word, __rhs._M_local_word); // array swap
     else
       {
-        if (!__lhs_local && !__lhs_local)
+        if (!__lhs_local && !__rhs_local)
           std::swap(_M_word, __rhs._M_word);
         else
           {
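
To see why the one-character typo matters, here is a small C sketch that models only the branch selection in the hunk above. It is not the libstdc++ code: the enum and function names are invented for the example, and the branches return a tag instead of swapping anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three cases; names are made up.  */
enum swap_kind { SWAP_ARRAYS, SWAP_POINTERS, SWAP_MIXED };

/* Before the fix: the second operand repeats the first, so rhs_local
   is never consulted in the inner test.  */
enum swap_kind
classify_with_typo (bool lhs_local, bool rhs_local)
{
  if (lhs_local && rhs_local)
    return SWAP_ARRAYS;
  if (!lhs_local && !lhs_local)
    return SWAP_POINTERS;
  return SWAP_MIXED;
}

/* After the fix: both sides must be non-local to take the pointer swap.  */
enum swap_kind
classify_fixed (bool lhs_local, bool rhs_local)
{
  if (lhs_local && rhs_local)
    return SWAP_ARRAYS;
  if (!lhs_local && !rhs_local)
    return SWAP_POINTERS;
  return SWAP_MIXED;
}

void
check_swap_kinds (void)
{
  /* The only combination where the two versions disagree.  */
  assert (classify_with_typo (false, true) == SWAP_POINTERS);
  assert (classify_fixed (false, true) == SWAP_MIXED);
  /* The other three combinations are unaffected by the typo.  */
  assert (classify_with_typo (true, true) == classify_fixed (true, true));
  assert (classify_with_typo (true, false) == classify_fixed (true, false));
  assert (classify_with_typo (false, false) == classify_fixed (false, false));
}
```

So the typo only misroutes the case where the left side uses heap storage and the right side uses its local array, which may explain why it went unnoticed at first.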
Re: [PATCH 1/14][AArch64] Temporarily remove aarch64_gimple_fold_builtin code for reduction operations
On 18 September 2014 12:45, Alan Lawrence alan.lawre...@arm.com wrote: The gimple folding ties the AArch64 backend to the tree representation of the midend via the neon intrinsics. This code enables constant folding of Neon intrinsics reduction ops, so improves performance, but is not necessary for correctness. By temporarily removing it (here), we can then change the midend representation independently of the AArch64 backend + intrinsics. However, I'm leaving the code in place, as a later patch will bring it all back in a very similar form (but enabled for bigendian). Bootstrapped on aarch64-none-linux; tested aarch64.exp on aarch64-none-elf and aarch64_be-none-elf. (The removed code was already disabled for bigendian; and this is solely a __builtin-folding mechanism, i.e. used only for Neon/ACLE intrinsics.) gcc/ChangeLog: * config/aarch64/aarch64.c (TARGET_GIMPLE_FOLD_BUILTIN): Comment out. * config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin): Remove using preprocessor directives. OK /Marcus
Re: [PATCH 4/14][AArch64] Use new reduc_plus_scal optabs, inc. for __builtins
On 18 September 2014 12:59, Alan Lawrence alan.lawre...@arm.com wrote:
> This migrates AArch64 over to the new optab for 'plus' reductions, i.e. so
> the define_expands produce scalars by generating a MOV to a GPR.
> Effectively, this moves the vget_lane inside every arm_neon.h intrinsic,
> into the inside of the define_expand.
>
> Tested: aarch64.exp vect.exp on aarch64-none-elf and aarch64_be-none-elf
> (full check-gcc on next patch for reduc_min/max)

> +(define_expand "reduc_splus_<mode>"
> +

Can't we just drop the define_expands for the old optabs altogether?

/Marcus
Re: [PATCH 5/14][AArch64] Use new reduc_[us](min|max)_scal optabs, inc. for builtins
On 18 September 2014 13:02, Alan Lawrence alan.lawre...@arm.com wrote:
> Similarly to the previous patch (r/2205), this migrates AArch64 to the new
> reduce-to-scalar optabs for min and max. For consistency we apply the same
> treatment to the smax_nan and smin_nan patterns (used for __builtins), even
> though reduc_smin_nan_scal (etc.) is not a standard name.
>
> Tested: check-gcc on aarch64-none-elf and aarch64_be-none-elf.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd-builtins.def (reduc_smax_, reduc_smin_,
> reduc_umax_, reduc_umin_, reduc_smax_nan_, reduc_smin_nan_): Remove.
> (reduc_smax_scal_, reduc_smin_scal_, reduc_umax_scal_,
> reduc_umin_scal_, reduc_smax_nan_scal_, reduc_smin_nan_scal_): New.
>
> * config/aarch64/aarch64-simd.md
> (reduc_maxmin_uns_mode): Rename VDQV_S variant to...
> (reduc_maxmin_uns_internalmode): ...this.
> (reduc_maxmin_uns_mode): New (VDQ_BHSI).
> (reduc_maxmin_uns_scal_mode): New (*2).
>
> (reduc_maxmin_uns_v2si): Combine with below, renaming...
> (reduc_maxmin_uns_mode): Combine V2F with above, renaming...
> (reduc_maxmin_uns_internal_mode): ...to this (VDQF).
>
> * config/aarch64/arm_neon.h (vmaxv_f32, vmaxv_s8, vmaxv_s16,
> vmaxv_s32, vmaxv_u8, vmaxv_u16, vmaxv_u32, vmaxvq_f32, vmaxvq_f64,
> vmaxvq_s8, vmaxvq_s16, vmaxvq_s32, vmaxvq_u8, vmaxvq_u16, vmaxvq_u32,
> vmaxnmv_f32, vmaxnmvq_f32, vmaxnmvq_f64, vminv_f32, vminv_s8,
> vminv_s16, vminv_s32, vminv_u8, vminv_u16, vminv_u32, vminvq_f32,
> vminvq_f64, vminvq_s8, vminvq_s16, vminvq_s32, vminvq_u8, vminvq_u16,
> vminvq_u32, vminnmv_f32, vminnmvq_f32, vminnmvq_f64): Update to use
> __builtin_aarch64_reduc_..._scal; remove vget_lane wrapper.

If we don't need the old optabs, I think it would be better to drop those
define_expands, otherwise OK.

/Marcus
Re: [PATCH 6/14][AArch64] Restore gimple_folding of reduction intrinsics
On 18 September 2014 13:05, Alan Lawrence alan.lawre...@arm.com wrote: This gives us back the constant-folding of the neon-intrinsics that was removed in the first patch, but is now OK for bigendian too. bootstrapped on aarch64-none-linux-gnu. check-gcc on aarch64-none-elf and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64.c (TARGET_GIMPLE_FOLD_BUILTIN): Define again. * config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin): Restore, enable for bigendian, update to use __builtin..._scal... OK /Marcus
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
2014-09-24 13:30 GMT+04:00 Steven Bosscher stevenb@gmail.com:
> On Wed, Sep 24, 2014 at 8:41 AM, Ilya Enkovich wrote:
>> 2014-09-23 20:06 GMT+04:00 Jeff Law:
>>> On 09/23/14 10:01, Steven Bosscher wrote:
>>>> Are you sure this patch is necessary, and is not just papering over
>>>> another problem? In the past, all cases I've seen where labels were
>>>> removed inadvertently were caused by incorrect reference counting or
>>>> missing REG_LABEL_* notes.
>>
>> Description of LABEL_PRESERVE_P says "label that should always be
>> considered to be needed".
>
> It's more specific than that, really:
>
> @item LABEL_PRESERVE_P (@var{x})
> In a @code{code_label} or @code{note}, indicates that the label is
> referenced by code or data not visible to the RTL of a given function.

I read another description:

/* 1 if RTX is a code_label that should always be considered to be needed.  */
#define LABEL_PRESERVE_P(RTX) \
  (RTL_FLAG_CHECK2 ("LABEL_PRESERVE_P", (RTX), CODE_LABEL, NOTE)->in_struct)

> The "not visible" part is important. If there are visible references
> to a label, then they should never be removed (obviously) and that
> should work through LABEL_NUSES. Unfortunately we are not very good at
> keeping LABEL_NUSES up-to-date (this is why all the
> rebuild_jump_labels() are still required).

Does rebuild handle all kinds of instructions including those which use UNSPEC?

> What appears to be the case here, is that you have a label between two
> basic blocks B1 and B2, and the label acts as a control flow barrier:
> B1 and B2 cannot be merged. Then this should be expressed in the CFG.
> Otherwise: What else prevents the merge_blocks CFG hooks from deleting
> the label?

Label acts as a barrier here but it is a side effect. I don't care
about block merging. I just don't want label with usages to be
removed.

>> That means even if we do not have any usages we shouldn't remove it.
>
> Sorry, no. Even a LABEL_PRESERVE_P label can be deleted: It will be
> replaced by a NOTE_INSN_DELETED_LABEL. See cfgrtl.c:delete_insn().

According to description you quoted label marked by LABEL_PRESERVE_P
is used by some code or data. Let this use be not visible to the RTL
of a given function. It is still used, right? How can you remove it?

Ilya

> If you really want to prevent a label from being deleted, then
> LABEL_PRESERVE_P is not a sufficient condition.
>
>> Why can't we add some additional usages later?
>
> If you add the usages later, then you're lying to the compiler ;-)
>
>>> Did the label use count drop to zero?
>>> Is there a REG_LABEL_TARGET note for the label operand?
>>
>> In the current code of ix86_expand_prologue I don't see any notes
>> generation for set_rip_rex64 instruction which actually uses label.
>> But IMO this is another potential issue and we still shouldn't remove
>> labels with LABEL_PRESERVE_P.
>
> Notes are generated in jump.c:rebuild_jump_labels. They are
> automatically added when a label is not
>
> Ciao!
> Steven
Re: [patch] Implement move semantics for iostreams
On Wed, Sep 24, 2014 at 10:40:09AM +0100, Jonathan Wakely wrote:
> On 22/09/14 14:35 +0100, Jonathan Wakely wrote:
> >This adds move and swap functions to the iostream classes.
>
> This fixes a silly typo.
>
> Tested x86_64-linux, committed to trunk.

> commit acaef9854dff5f37d86b80fc8236df5fd90b0ca5
> Author: Jonathan Wakely jwak...@redhat.com
> Date:   Wed Sep 24 10:10:28 2014 +0100
>
>     PR libstdc++/63353
>     * src/c++11/ios.cc (ios_base::_M_swap): Fix typo.
>
> diff --git a/libstdc++-v3/src/c++11/ios.cc b/libstdc++-v3/src/c++11/ios.cc
> index b5124ec..0e136d4 100644
> --- a/libstdc++-v3/src/c++11/ios.cc
> +++ b/libstdc++-v3/src/c++11/ios.cc
> @@ -229,7 +229,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        std::swap(_M_local_word, __rhs._M_local_word); // array swap
>      else
>        {
> -        if (!__lhs_local && !__lhs_local)
> +        if (!__lhs_local && !__rhs_local)
>            std::swap(_M_word, __rhs._M_word);
>          else
>            {

Wouldn't this be something for a (non-Wall?) warning?
I mean if && or || contains the same conditions, perhaps we should warn.

	Jakub
Re: Fix libgomp crash without TLS (PR42616)
*Ping*

2014-09-19 15:41 GMT+04:00 Varvara Rainchik varvara.s.rainc...@gmail.com:

I've corrected my patch according to what you said. To differentiate
the second case in the destructor I've added pthread_setspecific
(gomp_tls_key, NULL) at the end of gomp_thread_start. So, the destructor
can simply skip the case when pthread_getspecific (gomp_tls_key) returns 0.
I also think that it's better to set 0 in gomp_thread_start explicitly as
thread data is initialized by a local variable in this function.

But I see that pthread_getspecific always returns 0 in the destructor
because the data pointer is implicitly set to 0 before the destructor call
in glibc:

(pthread_create.c):

  /* Always clear the data.  */
  level2[inner].data = NULL;

  /* Make sure the data corresponds to a valid
     key.  This test fails if the key was
     deallocated and also if it was
     re-allocated.  It is the user's
     responsibility to free the memory in this
     case.  */
  if (level2[inner].seq
      == __pthread_keys[idx].seq
      /* It is not necessary to register a destructor
         function.  */
      && __pthread_keys[idx].destr != NULL)
    /* Call the user-provided destructor.  */
    __pthread_keys[idx].destr (data);

I suppose it's not necessary if everything is cleaned up in
gomp_thread_start and the destructor. What do you think?

Changes are bootstrapped and regtested on x86_64-linux.

2014-09-19  Varvara Rainchik  varvara.rainc...@intel.com

	* libgomp.h (gomp_thread): For non TLS case create thread data.
	* team.c (non_tls_thread_data_destructor,
	create_non_tls_thread_data): New functions.

--- diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index bcd5b34..2f33d99 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -467,9 +467,15 @@ static inline struct gomp_thread *gomp_thread (void) } #else extern pthread_key_t gomp_tls_key; -static inline struct gomp_thread *gomp_thread (void) +extern struct gomp_thread *create_non_tls_thread_data (void); +static struct gomp_thread *gomp_thread (void) { - return pthread_getspecific (gomp_tls_key); + struct gomp_thread *thr = pthread_getspecific (gomp_tls_key); + if (thr == NULL) + { +thr = create_non_tls_thread_data (); + } + return thr; } #endif diff --git a/libgomp/team.c b/libgomp/team.c index e6a6d8f..a692df8 100644 --- a/libgomp/team.c +++ b/libgomp/team.c @@ -41,6 +41,7 @@ pthread_key_t gomp_thread_destructor; __thread struct gomp_thread gomp_tls_data; #else pthread_key_t gomp_tls_key; +struct gomp_thread initial_thread_tls_data; #endif @@ -130,6 +131,7 @@ gomp_thread_start (void *xdata) gomp_sem_destroy (thr-release); thr-thread_pool = NULL; thr-task = NULL; + pthread_setspecific (gomp_tls_key, NULL); return NULL; } @@ -222,8 +224,16 @@ gomp_free_pool_helper (void *thread_pool) void gomp_free_thread (void *arg __attribute__((unused))) { - struct gomp_thread *thr = gomp_thread (); - struct gomp_thread_pool *pool = thr-thread_pool; + struct gomp_thread *thr; + struct gomp_thread_pool *pool; +#ifdef HAVE_TLS + thr = gomp_thread (); +#else + thr = pthread_getspecific (gomp_tls_key); + if (thr == NULL) +return; +#endif + pool = thr-thread_pool; if (pool) { if (pool-threads_used 0) @@ -910,6 +920,21 @@ gomp_team_end (void) } } +/* Destructor for data created in create_non_tls_thread_data. 
*/ + +#ifndef HAVE_TLS +void +non_tls_thread_data_destructor (void *arg __attribute__((unused))) +{ + struct gomp_thread *thr = pthread_getspecific (gomp_tls_key); + if (thr != NULL thr != initial_thread_tls_data) + { +gomp_free_thread (arg); +free (thr); +pthread_setspecific (gomp_tls_key, NULL); + } +} +#endif /* Constructors for this file. */ @@ -917,9 +942,7 @@ static void __attribute__((constructor)) initialize_team (void) { #ifndef HAVE_TLS - static struct gomp_thread initial_thread_tls_data; - - pthread_key_create (gomp_tls_key, NULL); + pthread_key_create (gomp_tls_key, non_tls_thread_data_destructor); pthread_setspecific (gomp_tls_key, initial_thread_tls_data); #endif @@ -927,6 +950,19 @@ initialize_team (void) gomp_fatal (could not create thread pool destructor.); } +/* Create data for thread created by pthread_create. */ + +#ifndef HAVE_TLS +struct gomp_thread *create_non_tls_thread_data (void) +{ + struct gomp_thread *thr = gomp_malloc_cleared (sizeof (struct gomp_thread)); + pthread_setspecific (gomp_tls_key, thr); + gomp_sem_init (thr-release, 0); + + return thr; +} +#endif + static void __attribute__((destructor)) team_destructor (void) { 2014-09-02 14:36 GMT+04:00 Varvara Rainchik varvara.s.rainc...@gmail.com: May I use gomp_free_thread as a destructor for pthread_key_create? Then I'll make initial_thread_tls_data global for the first case, but how can I differentiate thread created by gomp_thread_start (second case)? 2014-09-01 14:51 GMT+04:00 Jakub Jelinek
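
As a side note on the destructor behaviour discussed in this thread, the following is a minimal C sketch of lazily created per-thread data with a pthread key destructor. It is not libgomp code: struct thread_data and all function names are hypothetical stand-ins, and it assumes a POSIX threads implementation, where the key's value is set to NULL before the destructor runs and the old value is passed as the destructor's argument.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-thread structure such as gomp_thread.  */
struct thread_data { int uses; };

static pthread_key_t data_key;
static pthread_once_t data_key_once = PTHREAD_ONCE_INIT;

/* Demo bookkeeping; only touched by one worker at a time here.  */
int destroyed_count;
int key_was_null_in_destructor;

/* Runs at thread exit.  The key has already been reset to NULL, so the
   data must be taken from ARG rather than from pthread_getspecific.  */
static void
destroy_thread_data (void *arg)
{
  if (pthread_getspecific (data_key) == NULL)
    key_was_null_in_destructor = 1;
  free (arg);
  destroyed_count++;
}

static void
make_key (void)
{
  if (pthread_key_create (&data_key, destroy_thread_data) != 0)
    abort ();
}

/* Return this thread's data, creating it on first use.  */
struct thread_data *
get_thread_data (void)
{
  pthread_once (&data_key_once, make_key);
  struct thread_data *d = pthread_getspecific (data_key);
  if (d == NULL)
    {
      d = calloc (1, sizeof *d);
      if (d == NULL || pthread_setspecific (data_key, d) != 0)
        abort ();
    }
  d->uses++;
  return d;
}

static void *
worker (void *arg)
{
  struct thread_data *first = get_thread_data ();
  struct thread_data *second = get_thread_data ();
  /* Non-NULL result means both calls saw the same object.  */
  return (first == second && second->uses == 2) ? arg : NULL;
}

/* Start one thread, wait for it, and report whether it succeeded.  */
int
run_worker_once (void)
{
  pthread_t tid;
  void *result = NULL;
  int token = 0;
  if (pthread_create (&tid, NULL, worker, &token) != 0)
    return 0;
  if (pthread_join (tid, &result) != 0)
    return 0;
  return result != NULL;
}
```

In this sketch the destructor never needs pthread_getspecific to find the data, which sidesteps the "always returns 0" observation above. Whether the same shape fits gomp_free_thread is a separate question for the patch.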
[PATCH][match-and-simplify][2/2] Delay for lowering
This delays for lowering by recording fors to apply in simplify similar to how we record ifs. Bootstrapped on x86_64-unknown-linux-gnu. Richard. 2014-09-24 Richard Biener rguent...@suse.de * genmatch.c (id_base): Derive from typed_noop_remove. (struct user_id): New id_base derivative. (struct simplify): Add vector of fors. (lower_commutative): Adjust. (lower_opt_convert): Likewise. (replace_id): Work with user_id / id_base pairs. (lower_for): New function, split out from ... (parse_for): ... here. Maintain a stack of active fors, record substitutes in user_id. (everywhere): Adjust for simplify constructor change and maintaining of the stack of active fors. * match-bitwise.pd: Enable truth_valued_p for comparison codes using for. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 215546) +++ gcc/genmatch.c (working copy) @@ -153,7 +153,7 @@ END_BUILTINS /* Hashtable of known pattern operators. This is pre-seeded from all known tree codes and all known builtin function ids. */ -struct id_base : typed_free_removeid_base +struct id_base : typed_noop_removeid_base { enum id_kind { CODE, FN, PREDICATE, USER_DEFINED } kind; @@ -221,6 +221,14 @@ struct predicate_id : public id_base int nargs; }; +struct user_id : public id_base +{ + user_id (const char *id_) +: id_base (id_base::USER_DEFINED, id_), substitutes (vNULL), nargs(-1) {} + vecid_base * substitutes; + int nargs; +}; + template template inline bool @@ -439,16 +447,17 @@ struct if_or_with { struct simplify { simplify (operand *match_, source_location match_location_, - struct operand *result_, source_location result_location_, vecif_or_with ifexpr_vec_ = vNULL) + struct operand *result_, source_location result_location_, vecif_or_with ifexpr_vec_, vecvecuser_id * for_vec_) : match (match_), match_location (match_location_), result (result_), result_location (result_location_), - ifexpr_vec (ifexpr_vec_) {} + ifexpr_vec (ifexpr_vec_), for_vec (for_vec_) {} operand *match; source_location match_location; struct 
operand *result; source_location result_location; vecif_or_with ifexpr_vec; + vecvecuser_id * for_vec; }; struct dt_node @@ -686,7 +695,8 @@ lower_commutative (simplify *s, vecsimp for (unsigned i = 0; i matchers.length (); ++i) { simplify *ns = new simplify (matchers[i], s-match_location, - s-result, s-result_location, s-ifexpr_vec); + s-result, s-result_location, s-ifexpr_vec, + s-for_vec); simplifiers.safe_push (ns); } } @@ -814,7 +824,8 @@ lower_opt_convert (simplify *s, vecsimp for (unsigned i = 0; i matchers.length (); ++i) { simplify *ns = new simplify (matchers[i], s-match_location, - s-result, s-result_location, s-ifexpr_vec); + s-result, s-result_location, s-ifexpr_vec, + s-for_vec); simplifiers.safe_push (ns); } } @@ -837,48 +848,105 @@ check_operator (id_base *op, unsigned n_ else fatal (%s expects %u operands, got %u operands, opr-id, opr-get_required_nargs (), n_ops); } - + +/* In AST operand O replace operator ID with operator WITH. */ + operand * -replace_id (operand *o, const char *user_id, const char *oper) +replace_id (operand *o, user_id *id, id_base *with) { - if (o-type == operand::OP_CAPTURE) + if (capture *c = dyn_castcapture * (o)) { - capture *c = static_castcapture * (o); if (!c-what) return c; - capture *nc = new capture (c-where, replace_id (c-what, user_id, oper)); - return nc; + return new capture (c-where, replace_id (c-what, id, with)); } + /* For c_expr we simply record a string replacement table which is + applied at code-generation time. 
*/ if (c_expr *ce = dyn_castc_expr * (o)) { - id_base *idb = get_operator (oper); vecc_expr::id_tab ids = ce-ids.copy (); - ids.safe_push (c_expr::id_tab (user_id, idb-id)); + ids.safe_push (c_expr::id_tab (id-id, with-id)); return new c_expr (ce-r, ce-code, ce-nr_stmts, ids); } - if (o-type != operand::OP_EXPR) + expr *e = dyn_castexpr * (o); + if (!e) return o; - expr *e = static_castexpr * (o); expr *ne; - - if (e-operation-kind == id_base::USER_DEFINED - strcmp (e-operation-id, user_id) == 0) + if (e-operation == id) { - ne = new expr (get_operator (oper), e-is_commutative); + ne = new expr (with, e-is_commutative); check_operator (ne-operation, e-ops.length ()); } else ne = new expr (e-operation, e-is_commutative); for (unsigned i = 0; i e-ops.length
Re: [patch] Implement move semantics for iostreams
On Wed, Sep 24, 2014 at 12:01:13PM +0200, Jakub Jelinek wrote:
> > -        if (!__lhs_local && !__lhs_local)
> > +        if (!__lhs_local && !__rhs_local)
> >            std::swap(_M_word, __rhs._M_word);
> >          else
> >            {
>
> Wouldn't this be something for a (non-Wall?) warning?
> I mean if && or || contains the same conditions, perhaps we should warn.

Yeah, I think it'd make sense to warn.  I don't think we have an option for
this (-Wlogical-op does something little bit different).  Hence:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63357

	Marek
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
On Wed, Sep 24, 2014 at 11:57 AM, Ilya Enkovich wrote:
> 2014-09-24 13:30 GMT+04:00 Steven Bosscher :
>>> Description of LABEL_PRESERVE_P says "label that should always be
>>> considered to be needed".
>>
>> It's more specific than that, really:
>>
>> @item LABEL_PRESERVE_P (@var{x})
>> In a @code{code_label} or @code{note}, indicates that the label is
>> referenced by code or data not visible to the RTL of a given function.
>
> I read another description:
>
> /* 1 if RTX is a code_label that should always be considered to be needed.  */
> #define LABEL_PRESERVE_P(RTX) \
>   (RTL_FLAG_CHECK2 ("LABEL_PRESERVE_P", (RTX), CODE_LABEL, NOTE)->in_struct)

Yes, from rtl.h. I'd recommend to always read the descriptions in doc/
(in this case doc/rtl.texi). The documentation in the header files is
often not very comprehensive.

>> The "not visible" part is important. If there are visible references
>> to a label, then they should never be removed (obviously) and that
>> should work through LABEL_NUSES. Unfortunately we are not very good at
>> keeping LABEL_NUSES up-to-date (this is why all the
>> rebuild_jump_labels() are still required).
>
> Does rebuild handle all kinds of instructions including those which use UNSPEC?

Yes. Patterns are walked (deep) and REG_LABEL notes are added for all
labels encountered that are not already the JUMP_LABEL of INSN. If the
label is reachable from XEXP(UNSPEC, 0) -- the 'E' operand -- then
that label is visible.

>> What appears to be the case here, is that you have a label between two
>> basic blocks B1 and B2, and the label acts as a control flow barrier:
>> B1 and B2 cannot be merged. Then this should be expressed in the CFG.
>> Otherwise: What else prevents the merge_blocks CFG hooks from deleting
>> the label?
>
> Label acts as a barrier here but it is a side effect. I don't care
> about block merging. I just don't want label with usages to be
> removed.

Understood. Only, LABEL_PRESERVE_P is not the right means to achieve that.

So let's get back to basics and see what the usages look like.

AFAIU now, you emit the code label early, and add the references much
later (in machine reorg?). Does your UNSPEC have the code_label as an
operand? If so, what breaks if cfgcleanup removes the label? Is the
insn no longer recognized? Or does the label not end up in the
assembly output? Or ...? I can try to help figure out what breaks if
you have a test case.

FWIW, the LABEL_PRESERVE_P uses in config/i386/i386.c look suspect. It
probably only works because those labels are added late, and the code
paths that use (x86_64 large PIC code model) are not tested all that
well...

>>> That means even if we do not have any usages we shouldn't remove it.
>>
>> Sorry, no. Even a LABEL_PRESERVE_P label can be deleted: It will be
>> replaced by a NOTE_INSN_DELETED_LABEL. See cfgrtl.c:delete_insn().
>
> According to description you quoted label marked by LABEL_PRESERVE_P
> is used by some code or data. Let this use be not visible to the RTL
> of a given function. It is still used, right? How can you remove it?

The code_label rtx is removed, but the label itself is still output to
the object file. The label number is retained in the CODE_LABEL_NUMBER
of the NOTE_INSN_DELETED_LABEL. Look for how NOTE_INSN_DELETED_LABEL
is handled in final.c.

It's a hack IMHO, but that's how it has been since day 0 (see
https://gcc.gnu.org/r104).

Ciao!
Steven
[PATCH] Provide global var location info for asan
On Tue, Sep 23, 2014 at 11:03:55AM -0700, Konstantin Serebryany wrote:
(asan_add_global): Ditto.

I'll handle creation of location aggregates as follow-up.

Here it is, only lightly tested so far:

int a = 1;
int b = 2;
int c = 3;
int *
foo (int x)
{
  return x ? &b : &c;
}
int
main ()
{
  char *p = (char *) foo (1);
  int x = p[sizeof (int)];
  asm ("" : : "r" (x));
  return 0;
}

used to print:

0x00601104 is located 60 bytes to the left of global variable 'a' defined in 'aa.c' (0x601140) of size 4
0x00601104 is located 0 bytes to the right of global variable 'b' defined in 'aa.c' (0x601100) of size 4

but now does:

0x00601104 is located 60 bytes to the left of global variable 'a' defined in 'aa.c:1:5' (0x601140) of size 4
0x00601104 is located 0 bytes to the right of global variable 'b' defined in 'aa.c:2:5' (0x601100) of size 4

I think this test is too fragile for the testsuite though, the order
of the vars in the data section can be arbitrary etc.

make -j16 -k check-gcc check-g++ check-gfortran RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} asan.exp'
passed.

For Marek: the patch also uses just one __ubsan_source_location
RECORD_TYPE everywhere, I've been really surprised we created
a new type node each time we needed it.

Ok for trunk?

2014-09-24  Jakub Jelinek  ja...@redhat.com

	* ubsan.h (ubsan_get_source_location): New prototype.
	* ubsan.c (ubsan_source_location_type): New variable.
	Function renamed to ...
	(ubsan_get_source_location_type): ... this.  Cache
	return value in ubsan_source_location_type variable.
	(ubsan_source_location, ubsan_create_data): Use
	ubsan_get_source_location_type instead of
	ubsan_source_location_type.
	* asan.c (asan_protect_global): Don't protect globals
	with ubsan_get_source_location_type () type.
	(asan_add_global): Provide global decl location info
	if possible.

--- gcc/ubsan.h.jj 2014-09-24 08:26:49.635418299 +0200 +++ gcc/ubsan.h 2014-09-24 11:35:05.231330166 +0200 @@ -47,6 +47,6 @@ extern tree ubsan_encode_value (tree, bo extern bool is_ubsan_builtin_p (tree); extern tree ubsan_build_overflow_builtin (tree_code, location_t, tree, tree, tree); extern tree ubsan_instrument_float_cast (location_t, tree, tree); +extern tree ubsan_get_source_location_type (void); #endif /* GCC_UBSAN_H */ - --- gcc/ubsan.c.jj 2014-09-24 08:26:49.639418278 +0200 +++ gcc/ubsan.c 2014-09-24 11:35:56.662054997 +0200 @@ -197,6 +197,9 @@ ubsan_type_descriptor_type (void) return ret; } +/* Cached ubsan_get_source_location_type () return value. */ +static GTY(()) tree ubsan_source_location_type; + /* Build struct __ubsan_source_location { @@ -206,12 +209,15 @@ ubsan_type_descriptor_type (void) } type. */ -static tree -ubsan_source_location_type (void) +tree +ubsan_get_source_location_type (void) { static const char *field_names[3] = { __filename, __line, __column }; tree fields[3], ret; + if (ubsan_source_location_type) +return ubsan_source_location_type; + tree const_char_type = build_qualified_type (char_type_node, TYPE_QUAL_CONST); @@ -229,6 +235,7 @@ ubsan_source_location_type (void) TYPE_FIELDS (ret) = fields[0]; TYPE_NAME (ret) = get_identifier (__ubsan_source_location); layout_type (ret); + ubsan_source_location_type = ret; return ret; } @@ -239,7 +246,7 @@ static tree ubsan_source_location (location_t loc) { expanded_location xloc; - tree type = ubsan_source_location_type (); + tree type = ubsan_get_source_location_type (); xloc = expand_location (loc); tree str; @@ -484,7 +491,7 @@ ubsan_create_data (const char *name, int { gcc_checking_assert (i 2); fields[i] = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, - ubsan_source_location_type ()); + ubsan_get_source_location_type ()); DECL_CONTEXT (fields[i]) = ret; if (i) DECL_CHAIN (fields[i - 1]) = fields[i]; --- gcc/asan.c.jj 2014-09-24 11:13:43.548211574 +0200 +++ gcc/asan.c 2014-09-24 
12:06:13.122500445 +0200 @@ -1316,7 +1316,8 @@ asan_protect_global (tree decl) || DECL_SIZE (decl) == 0 || ASAN_RED_ZONE_SIZE * BITS_PER_UNIT MAX_OFILE_ALIGNMENT || !valid_constant_size_p (DECL_SIZE_UNIT (decl)) - || DECL_ALIGN_UNIT (decl) 2 * ASAN_RED_ZONE_SIZE) + || DECL_ALIGN_UNIT (decl) 2 * ASAN_RED_ZONE_SIZE + || TREE_TYPE (decl) == ubsan_get_source_location_type ()) return false; rtl = DECL_RTL (decl); @@ -2224,8 +2225,38 @@ asan_add_global (tree decl, tree type, v int has_dynamic_init = vnode ? vnode-dynamically_initialized : 0; CONSTRUCTOR_APPEND_ELT (vinner, NULL_TREE, build_int_cst (uptr, has_dynamic_init)); - CONSTRUCTOR_APPEND_ELT (vinner, NULL_TREE, - build_int_cst
[PATCH][match-and-simplify] Cleanup operator arity related diagnostics
The following removes a bunch of code dealing with late verifying of operator presence and matching arity. This can now all be verified at parsing, giving proper locations and operator names. Bootstrap pending. Richard. 2014-09-24 Richard Biener rguent...@suse.de * genmatch.c (struct id_base): Move nargs member here. (check_operator): Remove. (check_no_user_id): Likewise. (parse_operation): Fix error locations, handle convert0/2 properly. (parse_expr): Error on non-matching arity. (parse_for): Compute arity of user-ids and complain for inconsistent substitutions. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 215550) +++ gcc/genmatch.c (working copy) @@ -157,9 +157,10 @@ struct id_base : typed_noop_removeid_ba { enum id_kind { CODE, FN, PREDICATE, USER_DEFINED } kind; - id_base (id_kind, const char *); + id_base (id_kind, const char *, int = -1); hashval_t hashval; + int nargs; const char *id; /* hash_table support. */ @@ -185,10 +186,11 @@ id_base::equal (const value_type *op1, static hash_tableid_base *operators; -id_base::id_base (id_kind kind_, const char *id_) +id_base::id_base (id_kind kind_, const char *id_, int nargs_) { kind = kind_; id = id_; + nargs = nargs_; hashval = htab_hash_string (id); } @@ -196,11 +198,8 @@ struct operator_id : public id_base { operator_id (enum tree_code code_, const char *id_, unsigned nargs_, const char *tcc_) - : id_base (id_base::CODE, id_), - code (code_), nargs (nargs_), tcc (tcc_) {} - unsigned get_required_nargs () const { return nargs; } + : id_base (id_base::CODE, id_, nargs_), code (code_), tcc (tcc_) {} enum tree_code code; - unsigned nargs; const char *tcc; }; @@ -216,17 +215,15 @@ struct simplify; struct predicate_id : public id_base { predicate_id (const char *id_) -: id_base (id_base::PREDICATE, id_), matchers (vNULL), nargs(-1) {} +: id_base (id_base::PREDICATE, id_), matchers (vNULL) {} vecsimplify * matchers; - int nargs; }; struct user_id : public id_base { user_id (const char *id_) -: id_base 
(id_base::USER_DEFINED, id_), substitutes (vNULL), nargs(-1) {} +: id_base (id_base::USER_DEFINED, id_), substitutes (vNULL) {} vecid_base * substitutes; - int nargs; }; template @@ -830,36 +827,27 @@ lower_opt_convert (simplify *s, vecsimp } } -void -check_operator (id_base *op, unsigned n_ops, const cpp_token *token = 0) -{ - if (!op) -return; - - if (op-kind != id_base::CODE) -return; - - operator_id *opr = static_castoperator_id * (op); - if (opr-get_required_nargs () == n_ops) -return; - - if (token) -fatal_at (token, %s expects %u operands, got %u operands, opr-id, opr-get_required_nargs (), n_ops); - else -fatal (%s expects %u operands, got %u operands, opr-id, opr-get_required_nargs (), n_ops); -} - /* In AST operand O replace operator ID with operator WITH. */ operand * replace_id (operand *o, user_id *id, id_base *with) { + /* Deep-copy captures and expressions, replacing operations as + needed. */ if (capture *c = dyn_castcapture * (o)) { if (!c-what) return c; return new capture (c-where, replace_id (c-what, id, with)); } + else if (expr *e = dyn_castexpr * (o)) +{ + expr *ne = new expr (e-operation == id ? with : e-operation, + e-is_commutative); + for (unsigned i = 0; i e-ops.length (); ++i) + ne-append_op (replace_id (e-ops[i], id, with)); + return ne; +} /* For c_expr we simply record a string replacement table which is applied at code-generation time. */ @@ -870,23 +858,7 @@ replace_id (operand *o, user_id *id, id_ return new c_expr (ce-r, ce-code, ce-nr_stmts, ids); } - expr *e = dyn_castexpr * (o); - if (!e) -return o; - - expr *ne; - if (e-operation == id) -{ - ne = new expr (with, e-is_commutative); - check_operator (ne-operation, e-ops.length ()); -} - else -ne = new expr (e-operation, e-is_commutative); - - for (unsigned i = 0; i e-ops.length (); ++i) -ne-append_op (replace_id (e-ops[i], id, with)); - - return ne; + return o; } /* Lower recorded fors for SIN and output to SIMPLIFIERS. 
*/ @@ -947,40 +919,6 @@ lower_for (simplify *sin, vecsimplify * simplifiers.safe_push (worklist[i]); } -void -check_no_user_id (operand *o) -{ - if (o-type == operand::OP_CAPTURE) -{ - capture *c = static_castcapture * (o); - if (c-what c-what-type == operand::OP_EXPR) - { - o = c-what; - goto check_expr; - } - return; -} - - if (o-type != operand::OP_EXPR) -return; - -check_expr: - expr *e = static_castexpr * (o); - if (e-operation-kind == id_base::USER_DEFINED) -fatal (%s is not defined in for,
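The idea behind this cleanup, checking arity as soon as an expression is parsed instead of in a later verification pass, can be sketched outside genmatch. This is only a toy model: the `ops` table and the names `lookup_op` and `check_arity` are hypothetical. The one thing taken from the patch is the convention that an arity of -1 means "not known yet".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operator table; nargs == -1 means "arity not known".  */
struct op_info
{
  const char *id;
  int nargs;
};

static const struct op_info ops[] = {
  { "plus_expr", 2 },
  { "negate_expr", 1 },
  { "user_pred", -1 },
};

/* Return the table entry for ID, or NULL if the operator is unknown.  */
static const struct op_info *
lookup_op (const char *id)
{
  assert (id != NULL);
  for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
    if (strcmp (ops[i].id, id) == 0)
      return &ops[i];
  return NULL;
}

/* Check an expression with N_OPS operands right after parsing it.
   Returns 0 if it is fine, 1 for an unknown operator and 2 for an
   arity mismatch.  */
static int
check_arity (const char *id, int n_ops)
{
  const struct op_info *op = lookup_op (id);
  if (!op)
    return 1;
  if (op->nargs != -1 && op->nargs != n_ops)
    return 2;
  return 0;
}
```

Because the check runs while the parser still holds the offending token, a real implementation can report the source location and the operator name together, which is the benefit the mail describes.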
[PATCH][match-and-simplify] Remove outlining of C exprs
It no longer works in the face of (with { } ...), which would need to pass down all named temporaries. Instead we can simply inline all C exprs now that we pass 'output' to all gen_transform calls.

Bootstrap pending on x86_64-unknown-linux-gnu.

Richard.

2014-09-24  Richard Biener  <rguent...@suse.de>

	* genmatch.c (c_expr::output_code): Remove and inline into ...
	(c_expr::gen_transform): ... here.
	(outline_c_exprs): Remove.
	(main): Do not call outline_c_exprs.

Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 215551) +++ gcc/genmatch.c (working copy) @@ -355,7 +355,6 @@ struct c_expr : public operand vecid_tab ids; virtual void gen_transform (FILE *f, const char *, bool, int, const char *, dt_operand **); - void output_code (FILE *f, bool); }; struct capture : public operand @@ -1051,9 +1050,17 @@ expr::gen_transform (FILE *f, const char fprintf (f, }\n); } +/* Generate code for a c_expr which is either the expression inside + an if statement or a sequence of statements which computes a + result to be stored to DEST. */ + void -c_expr::output_code (FILE *f, bool for_fn) +c_expr::gen_transform (FILE *f, const char *dest, + bool, int, const char *, dt_operand **) { + if (dest nr_stmts == 1) +fprintf (f, %s = , dest); + unsigned stmt_nr = 1; for (unsigned i = 0; i code.length (); ++i) { @@ -1100,35 +1107,14 @@ c_expr::output_code (FILE *f, bool for_f if (token-type == CPP_SEMICOLON) { stmt_nr++; - if (for_fn stmt_nr == nr_stmts) - fputs (\n return , f); + if (dest stmt_nr == nr_stmts) + fprintf (f, \n %s = , dest); else fputc ('\n', f); } } } - -void -c_expr::gen_transform (FILE *f, const char *dest, bool, int, const char *, dt_operand **) -{ - /* If this expression has an outlined function variant, call it. */ - if (fname) -{ - fprintf (f, %s = %s (type, captures);\n, dest, fname); - return; -} - - /* All multi-stmt expressions should have been outlined. Expressions - with nr_stmts == 0 are used for if-expressions. 
*/ - gcc_assert (nr_stmts = 1); - - if (nr_stmts == 1) -fprintf (f, %s = , dest); - - output_code (f, false); -} - void capture::gen_transform (FILE *f, const char *dest, bool gimple, int depth, const char *in_type, dt_operand **indexes) { @@ -2172,40 +2158,6 @@ write_predicate (FILE *f, predicate_id * static void -outline_c_exprs (FILE *f, struct operand *op) -{ - if (op-type == operand::OP_C_EXPR) -{ - c_expr *e = static_cast c_expr *(op); - static unsigned fnnr = 1; - if (e-nr_stmts 1 - !e-fname) - { - e-fname = (char *)xmalloc (sizeof (cexprfn) + 4); - sprintf (e-fname, cexprfn%d, fnnr); - fprintf (f, \nstatic tree\ncexprfn%d (tree type, tree *captures)\n, - fnnr); - fprintf (f, {\n); - e-output_code (f, true); - fprintf (f, }\n); - fnnr++; - } -} - else if (op-type == operand::OP_CAPTURE) -{ - capture *c = static_cast capture *(op); - if (c-what) - outline_c_exprs (f, c-what); -} - else if (op-type == operand::OP_EXPR) -{ - expr *e = static_cast expr *(op); - for (unsigned i = 0; i e-ops.length (); ++i) - outline_c_exprs (f, e-ops[i]); -} -} - -static void write_header (FILE *f, const char *head) { fprintf (f, /* Generated automatically by the program `genmatch' from\n); @@ -3001,10 +2953,6 @@ add_operator (CONVERT2, CONVERT2, tcc if (verbose) dt.print (stderr); - /* Outline complex C expressions to helper functions. */ - for (unsigned i = 0; i out_simplifiers.length (); ++i) -outline_c_exprs (stdout, out_simplifiers[i]-result); - if (gimple) dt.gen_gimple (stdout); else
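The inlining scheme described in this mail, emitting the statements of a C expression in place and assigning the value of the last one to DEST, can be illustrated with a small stand-alone sketch. It is not the genmatch code: `emit_stmts` and its string-array input are hypothetical, and the real implementation works on preprocessor tokens rather than on whole statements.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the statements STMTS[0..N-1] into OUT (capacity CAP), one per
   line.  If DEST is non-NULL, the last statement becomes
   "DEST = <stmt>;".  Returns the number of characters written, or -1
   if OUT is too small.  */
static int
emit_stmts (char *out, size_t cap, const char *dest,
            const char *const *stmts, size_t n)
{
  size_t used = 0;

  assert (out != NULL && cap > 0);
  out[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      int len;
      if (dest && i + 1 == n)
        len = snprintf (out + used, cap - used, "%s = %s;\n",
                        dest, stmts[i]);
      else
        len = snprintf (out + used, cap - used, "%s;\n", stmts[i]);
      if (len < 0 || (size_t) len >= cap - used)
        return -1;
      used += (size_t) len;
    }
  return (int) used;
}
```

With inlining, named temporaries declared by earlier statements stay in scope for the final one, so nothing has to be passed down to a helper function.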
RE: RFA: another patch to fix PR61360
> The r-x alternative results in vector decoding on amdfam10. This is
> AMD-speak for microcoded instructions, and AMD optimization manual
> strongly recommends avoiding them. I have CC'd Ganesh, maybe he can
> provide more relevant data on the performance impact.

Thanks Uros!

Yes, the AMD SWOG recommends precisely what Uros mentions.

<snip from SWOG for BD>
When moving data from a GPR to an XMM register, use separate store and load instructions to move the data first from the source register to a temporary location in memory and then from memory into the destination register.
</snip>

This is listed as an optimization too. It holds for all amdfam10 and BD family processors.

I have to dig through the performance numbers and will try to get them.

Regards
Ganesh
Re: [PATCH IRA] update_equiv_regs fails to set EQUIV reg-note for pseudo with more than one definition
Hi Jeff,

Thanks for the comments. I updated the patch with some enhancements. Bootstrapped on x86_64-suse-linux. Please apply this patch if it is OK for trunk.

Three points:

1. Once a multiple-set register has been marked by no_equiv, it no longer qualifies for an equiv note. The patch is updated with this consideration.
2. Regarding the new rtx_insn_list interface, I noticed that the old-style XEXP accessor macros are still used in function no_equiv. I chose to keep the old-style macros in this patch and will come up with another patch to fix this issue, OK?
3. Regarding an insn on the init_insns list that does not have a note: I reconsidered this and found that it can never happen, so I replaced the check with a gcc assertion.

Index: gcc/ChangeLog === --- gcc/ChangeLog(revision 215550) +++ gcc/ChangeLog(working copy) @@ -1,3 +1,11 @@ +2014-09-24 Felix Yang felix.y...@huawei.com + +* ira.c (struct equivalence): Add no_equiv member. +(no_equiv): Set no_equiv of struct equivalence if register is marked +as having no known equivalence. +(update_equiv_regs): Check all definitions for a multiple-set +register to make sure that the RHS have the same value. + 2014-09-24 Jakub Jelinek ja...@redhat.com PR sanitizer/63316 Index: gcc/ira.c === --- gcc/ira.c(revision 215550) +++ gcc/ira.c(working copy) @@ -2900,6 +2900,8 @@ struct equivalence /* Set when an attempt should be made to replace a register with the associated src_p entry. */ char replace; + /* Set if this register has no known equivalence. 
*/ + char no_equiv; }; /* reg_equiv[N] (where N is a pseudo reg number) is the equivalence @@ -3247,6 +3249,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE if (!REG_P (reg)) return; regno = REGNO (reg); + reg_equiv[regno].no_equiv = 1; list = reg_equiv[regno].init_insns; if (list == const0_rtx) return; @@ -3258,7 +3261,7 @@ no_equiv (rtx reg, const_rtx store ATTRIBUTE_UNUSE return; ira_reg_equiv[regno].defined_p = false; ira_reg_equiv[regno].init_insns = NULL; - for (; list; list = XEXP (list, 1)) + for (; list; list = XEXP (list, 1)) { rtx insn = XEXP (list, 0); remove_note (insn, find_reg_note (insn, REG_EQUIV, NULL_RTX)); @@ -3373,7 +3376,7 @@ update_equiv_regs (void) /* If this insn contains more (or less) than a single SET, only mark all destinations as having no known equivalence. */ - if (set == 0) + if (set == NULL_RTX) { note_stores (PATTERN (insn), no_equiv, NULL); continue; @@ -3467,16 +3470,48 @@ update_equiv_regs (void) if (note GET_CODE (XEXP (note, 0)) == EXPR_LIST) note = NULL_RTX; - if (DF_REG_DEF_COUNT (regno) != 1 - (! note + if (DF_REG_DEF_COUNT (regno) != 1) +{ + rtx list; + bool equal_p = true; + + /* Check if it is possible that this multiple-set register has + a known equivalence. */ + if (reg_equiv[regno].no_equiv) +continue; + + if (! note || rtx_varies_p (XEXP (note, 0), 0) || (reg_equiv[regno].replacement ! rtx_equal_p (XEXP (note, 0), -reg_equiv[regno].replacement -{ - no_equiv (dest, set, NULL); - continue; +reg_equiv[regno].replacement))) +{ + no_equiv (dest, set, NULL); + continue; +} + + list = reg_equiv[regno].init_insns; + for (; list; list = XEXP (list, 1)) +{ + rtx note_tmp, insn_tmp; + + insn_tmp = XEXP (list, 0); + note_tmp = find_reg_note (insn_tmp, REG_EQUAL, NULL_RTX); + gcc_assert (note_tmp); + if (! rtx_equal_p (XEXP (note, 0), XEXP (note_tmp, 0))) +{ + equal_p = false; + break; +} +} + + if (! equal_p) +{ + no_equiv (dest, set, NULL); + continue; +} } + /* Record this insn as initializing this register. 
*/ reg_equiv[regno].init_insns = gen_rtx_INSN_LIST (VOIDmode, insn, reg_equiv[regno].init_insns); @@ -3505,10 +3540,9 @@ update_equiv_regs (void) a register used only in one basic block from a MEM. If so, and the MEM remains unchanged for the life of the register, add a REG_EQUIV note. */ - note = find_reg_note (insn, REG_EQUIV, NULL_RTX); - if (note == 0 REG_BASIC_BLOCK (regno) = NUM_FIXED_BLOCKS + if (note == NULL_RTX REG_BASIC_BLOCK (regno) = NUM_FIXED_BLOCKS MEM_P (SET_SRC (set)) validate_equiv_mem (insn, dest, SET_SRC (set))) note = set_unique_reg_note (insn, REG_EQUIV,
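The rule the patch introduces, that a register with several definitions may keep an equivalence only if every definition yields the same value, can be modelled in a few lines. This is a sketch under simplifying assumptions: `struct def` and `all_defs_equal_p` are hypothetical, and strings stand in for the RTL expressions that the real code compares with rtx_equal_p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an initializing insn carrying a REG_EQUAL-style
   note.  */
struct def
{
  const char *equal_note;
  struct def *next;
};

/* Return true if NEW_NOTE matches the note of every definition already
   recorded in LIST.  */
static bool
all_defs_equal_p (const struct def *list, const char *new_note)
{
  assert (new_note != NULL);
  for (; list; list = list->next)
    {
      /* Every recorded definition is expected to carry a note; this
         mirrors the assertion mentioned in point 3 of the mail.  */
      assert (list->equal_note != NULL);
      if (strcmp (list->equal_note, new_note) != 0)
        return false;
    }
  return true;
}
```

When the check fails, the caller would mark the register as having no known equivalence and skip it, as the patch does by calling no_equiv.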
[PATCH, bootstrap PR63235] Fix bootstrap.
Hello,

The patch at the bottom fixes bootstrap (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63235).

gcc/
	* varpool.c (varpool_node::add): Pass decl attributes to
	lookup_attribute.

Is it ok for trunk?

--
Thanks, K

diff --git a/gcc/varpool.c b/gcc/varpool.c
index 8001c93..3761f14 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -449,7 +449,7 @@ varpool_node::add (tree decl)
   symtab->call_varpool_insertion_hooks (node);
   if (node->externally_visible_p ())
     node->externally_visible = true;
-  if (lookup_attribute ("no_reorder", decl))
+  if (lookup_attribute ("no_reorder", DECL_ATTRIBUTES (decl)))
     node->no_reorder = 1;
 }
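To see why the one-argument change matters: an attribute lookup walks a list, so it has to be handed the list hanging off the declaration, not the declaration itself. The toy model below uses hypothetical types and names (`struct attr`, `struct decl`, `DECL_ATTRS`, `find_attr`). In GCC both arguments have type `tree`, so the compiler has no way to flag the original call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical singly linked attribute list.  */
struct attr
{
  const char *name;
  struct attr *next;
};

/* Hypothetical declaration that owns an attribute list.  */
struct decl
{
  const char *name;
  struct attr *attributes;
};

#define DECL_ATTRS(d) ((d)->attributes)

/* Return the first attribute called NAME in LIST, or NULL.  */
static const struct attr *
find_attr (const char *name, const struct attr *list)
{
  assert (name != NULL);
  for (; list; list = list->next)
    if (strcmp (list->name, name) == 0)
      return list;
  return NULL;
}
```

With distinct types, as in this sketch, calling `find_attr ("no_reorder", &some_decl)` is diagnosed by the compiler as an incompatible pointer type.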
Re: [PATCH, bootstrap PR63235] Fix bootstrap.
On Wed, Sep 24, 2014 at 04:16:50PM +0400, Kirill Yukhin wrote:
> Hello,
> Patch in the bottom fixes bootstrap (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63235)
>
> gcc/
> 	* varpool.c (varpool_node::add): Pass decl attributes to
> 	lookup_attribute.
>
> Is it ok for trunk?

Ok, thanks.

> diff --git a/gcc/varpool.c b/gcc/varpool.c
> index 8001c93..3761f14 100644
> --- a/gcc/varpool.c
> +++ b/gcc/varpool.c
> @@ -449,7 +449,7 @@ varpool_node::add (tree decl)
>    symtab->call_varpool_insertion_hooks (node);
>    if (node->externally_visible_p ())
>      node->externally_visible = true;
> -  if (lookup_attribute ("no_reorder", decl))
> +  if (lookup_attribute ("no_reorder", DECL_ATTRIBUTES (decl)))
>      node->no_reorder = 1;
>  }

	Jakub
Re: [PATCH, testsuite]: PR 58757: Check for FP denormal values without triggering denormal exceptions
On Wed, 24 Sep 2014, Uros Bizjak wrote:

However, alpha *does* support all IEEE features; the only problem is its default model, which is for some reason "High-Performance IEEE-Format Arithmetic" (please see alpha AHB [1], section 4.7.6.5). "This model does not require the overhead of an operating system completion handler and can be the fastest of the three IEEE models." Unfortunately, this model also notifies applications of all exceptional floating-point operations. Denormals are considered non-finite IEEE values, so they trap.

When the target is in a certain high-speed mode, it is up to the user to obey all the limitations, in this particular case, that only IEEE finite numbers are provided. This is not the case with the original testcase, so I'd say that the test is out of spec.

It beats me why -mieee is not the default on alpha, since the current default suits -ffast-math more, but it looks like we have to live with this mess.

(I believe -mieee is the default on some alpha platforms, maybe debian or bsd?)

To avoid traps on denormals, -mieee has to be specified. This option enables FP software completion that completes denormal handling, so there is no need to notify the application. IMO, instead of XFAILing the test, we should simply provide -mieee. __*_DENORM_MIN__ should indeed apply to the underlying FP format, not to some target-dependent model and its implementation details.

[1] http://www.compaq.com/cpq-alphaserver/technology/literature/alphaahb.pdf

In 4.7.6.5, I see: "Underflow results are set to zero." So this is a functional model without denormals. According to the C11 standard, this means DBL_HAS_SUBNORM should be 0 and DBL_TRUE_MIN should be the same as DBL_MIN. The same is probably true on x86 with -ffast-math. Giving DBL_TRUE_MIN an unusable value (zero or trapping) is not very useful, while providing the real usable minimum lets users do something meaningful with it. The main issue is using incompatible flags in different objects or at link time...

-- 
Marc Glisse
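The C11 macros mentioned above can be queried portably. The following is a minimal sketch, assuming only a hosted C implementation: the helper names are made up for illustration, and the `#ifdef` guards cover pre-C11 headers that lack `DBL_TRUE_MIN` and `DBL_HAS_SUBNORM`.

```c
#include <assert.h>
#include <float.h>

/* Smallest positive double the implementation advertises:
   DBL_TRUE_MIN where C11 provides it, DBL_MIN otherwise.  */
static double
smallest_positive_double (void)
{
#ifdef DBL_TRUE_MIN
  return DBL_TRUE_MIN;
#else
  return DBL_MIN;
#endif
}

/* 1 if subnormals are supported, 0 if they are absent, -1 if this is
   indeterminable or the macro is not available.  */
static int
double_has_subnormals (void)
{
#ifdef DBL_HAS_SUBNORM
  return DBL_HAS_SUBNORM;
#else
  return -1;
#endif
}

/* The consistency rule from the mail: without subnormals the two
   minima should coincide.  */
static void
check_minima (void)
{
  if (double_has_subnormals () == 0)
    assert (smallest_positive_double () == DBL_MIN);
}
```

A program that needs a usable lower bound can call `smallest_positive_double ()` instead of hard-coding a denormal constant, which is the use the mail argues a meaningful `DBL_TRUE_MIN` makes possible.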
Re: [PATCH][AArch64] Use __aarch64_vget_lane* macros for getting the lane in some lane multiply intrinsics
Must have slipped through the cracks.
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg00586.html

Ping?

Thanks,
Kyrill

On 08/09/14 11:29, Kyrill Tkachov wrote:
> Hi all,
>
> The included testcase currently ICEs at -O0 because vget_lane_f64 is a
> function, so if it's properly called with a constant argument but
> without constant propagation it will not be recognised as constant,
> causing an ICE. This patch changes it to use the macro version directly.
>
> I think there is work being done to fix this issue up as part of a more
> general rework, but until that comes this patch implements the concerned
> intrinsics using the __aarch64_vget_lane* macros like the other lane
> intrinsics around them.
>
> Tested aarch64-none-elf.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2014-09-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
> 	* config/aarch64/arm_neon.h (vmuld_lane_f64): Use macro for getting
> 	the lane.
> 	(vmuld_laneq_f64): Likewise.
> 	(vmuls_lane_f32): Likewise.
> 	(vmuls_laneq_f32): Likewise.
>
> 2014-09-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
> 	* gcc.target/aarch64/simd/vmul_lane_const_lane_1.c: New test.
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
2014-09-24 14:35 GMT+04:00 Steven Bosscher stevenb@gmail.com: On Wed, Sep 24, 2014 at 11:57 AM, Ilya Enkovich wrote: 2014-09-24 13:30 GMT+04:00 Steven Bosscher : Description of LABEL_PRESERVE_P says label that should always be considered to be needed. It's more specific than that, really: @item LABEL_PRESERVE_P (@var{x}) In a @code{code_label} or @code{note}, indicates that the label is referenced by code or data not visible to the RTL of a given function. I read another description: /* 1 if RTX is a code_label that should always be considered to be needed. */ #define LABEL_PRESERVE_P(RTX) \ (RTL_FLAG_CHECK2 (LABEL_PRESERVE_P, (RTX), CODE_LABEL, NOTE)-in_struct) Yes, from rtl.h. I'd recommend to always read the descriptions in doc/ (in this case doc/rtl.texi). The documentation in the header files is often not very comprehensive. The not visible part is important. If there are visible references to a label, then they should never be removed (obviously) and that should work through LABEL_NUSES. Unfortunately we are not very good at keeping LABEL_NUSES up-to-date (this is why all the rebuild_jump_labels() are still required). Does rebuild handle all kinds of instructions including those which use UNSPEC? Yes. Patterns are walked (deep) and REG_LABEL notes are added for all labels encountered that are not already the JUMP_LABEL of INSN. If the label is reachable from XEXP(UNSPEC, 0) -- the 'E' operand -- then that label is visible. What appears to be the case here, is that you have a label between two basic blocks B1 and B2, and the label acts as a control flow barrier: B1 and B2 cannot be merged. Then this should be expressed in the CFG. Otherwise: What else prevents the merge_blocks CFG hooks from deleting the label? Label acts as a barrier here but it is a side effect. I don't care about block merging. I just don't want label with usages to be removed. Understood. Only, LABEL_PRESERVE_P is not the right means to achieve that. 
So let's get back to basics and see what the usages look like. AFAIU now, you emit the code label early, and add the references much later (in machine reorg?). Does your UNSPEC have the code_label as an operand? If so, what breaks if cfgcleanup removes the label? Is the insn no longer recognized? Or does the label not end up in the assembly output? Or ...? I can try to help figure out what breaks if you have a test case. FWIW, the LABEL_PRESERVE_P uses in config/i386/i386.c look suspect. It probably only works because those labels are added late, and the code paths that use (x86_64 large PIC code model) are not tested all that well... I didn't generate references separately from label. Now I found an old patch and a test where this problem appeared. In this patch I moved set_rip generation currently performed in ix86_expand_prologue into expand pass. And I got following code in expand dump for testsuite/gcc.target/i386/pr55154.c test: (note 7 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note/s 2 7 3 2 NOTE_INSN_DELETED_LABEL 2) (insn 3 2 4 2 (set (reg:DI 85) (unspec:DI [ (label_ref [2 deleted]) ] UNSPEC_SET_RIP)) /export/users/ienkovic/issues/4161/gcc/gcc/testsuite/gcc.target/i386/pr55154.c:9 -1 (insn_list:REG_LABEL_OPERAND 2 (nil))) There is a REG_LABEL_OPERAND generated but label is still removed. Ilya That means even if we do not have any usages we shouldn't remove it. Sorry, no. Even a LABEL_PRESERVE_P label can be deleted: It will be replaced by a NOTE_INSN_DELETED_LABEL. See cfgrtl.c:delete_insn(). According to description you quoted label marked by LABEL_PRESERVE_P is used by some code or data. Let this use be not visible to the RTL of a given function. It is still used, right? How can you remove it? The code_label rtx is removed, but the label itself is still output to the object file. The label number is retained in the CODE_LABEL_NUMBER of the NOTE_INSN_DELETED_LABEL. Look for how NOTE_INSN_DELETED_LABEL is handled in final.c. 
It's a hack IMHO, but that's how it has been since day 0 (see https://gcc.gnu.org/r104). Ciao! Steven
Re: [PATCH 0/10] OpenACC 2.0 support for libgomp
On Tue, Sep 23, 2014 at 07:17:25PM +0100, Julian Brown wrote:
> The upcoming patch series constitutes our current (still in-progress)
> implementation of run-time support for OpenACC 2.0 in libgomp. We've
> tried to build on top of the (also currently WIP) support for OpenMP
> 4.0's target construct, sharing code where possible: because of this,
> I've also prepared versions of (a fairly minimal, hopefully correct set
> of) prerequisite patches that apply to current mainline (and were
> previously on the gomp 4.0 branch), although in many cases we weren't
> the original authors of those.
>
> Other parts of the OpenACC support for GCC are being sent upstream
> concurrently with this runtime support (and are co-dependent with it),
> so unfortunately, though the main part of the implementation (part 7/10)
> works on our internal branch, I haven't yet been able to convincingly
> test the series I'm about to post upstream. However this code will be
> useful to others who are posting their bits of OpenACC support upstream,
> so perhaps it'd be useful to commit it anyway (we have to start
> somewhere!).
>
> I've tried to retain proper attribution for all the forthcoming patches,
> but I may have made mistakes. Please let me know if so!

Just random comments about all the 10 patches:

> --- libgomp/Makefile.am (revision 215546)
> +++ libgomp/Makefile.am (working copy)
> @@ -14,13 +14,35 @@ libsubincludedir = $(libdir)/gcc/$(targe
>  vpath % $(strip $(search_path))
>
> -AM_CPPFLAGS = $(addprefix -I, $(search_path))
> +AM_CPPFLAGS = $(addprefix -I, $(search_path)) \
> +	$(addprefix -I, $(search_path)/../include)
>  AM_CFLAGS = $(XCFLAGS)
>  AM_LDFLAGS = $(XLDFLAGS) $(SECTION_LDFLAGS) $(OPT_LDFLAGS)

This looks wrong, search_path is typically something like:

  $(top_srcdir)/config/linux/x86 $(top_srcdir)/config/linux \
  $(top_srcdir)/config/posix $(top_srcdir)

so $(search_path)/../include means you duplicate all the */config/* paths again. Just add -I$(top_srcdir)/../include to AM_CPPFLAGS.

As for plugins, my preference would be to move their sources to a libgomp/plugins/ subdirectory and build them in that subdirectory (for mic, which builds its plugin inside of libmicoffload it could copy it there).

> # TODO: not for OpenACC?

libgomp really needs to be built against libpthread, so if you don't want that, you'd need to move the openacc bits to a separate shared library. In general, I'd prefer if the stuff that gets committed to trunk contains as few TODO: and FIXME: comments as possible, keep them on the branch if you really need them.

>  static void
> +goacc_parse_device_num (void)
> +{

Any reason why you don't want to use parse_int for this? Does the standard require you parse and don't reject negative numbers?

> oacc-init.c:__thread void *ACC_handle;
> oacc-init.c:static __thread int handle_num = -1;
> oacc-init.c:static __thread struct gomp_device_descr const *saved_bound_dev;
> oacc-mem.c:__thread struct memmap_t *ACC_memmap;
> oacc-parallel.c:static __thread struct devgeom devgeom = { 1, 1, 1 };
> oacc-parallel.c:static __thread struct target_mem_desc *mapped_data = NULL;

Do you really need all those __thread vars? As libgomp uses IE model for performance reasons, growing the total size too much might very well mean that the dynamic linker will refuse to dlopen it. Couldn't you e.g. use just a single __thread pointer to a struct that will contain all of this? Also, note that libgomp must be supported also for the !HAVE_TLS case, where you shouldn't use __thread at all, use pthread_getspecific etc. instead (so it would really help if you'd just use a single pointer).

> +void
> +gomp_notify(const char *msg, ...)

Formatting, missing space before (.

>    char bind_var;
> +  int acc_notify_var; /* Internal ICV. */
>    struct target_mem_desc *target_data;

This is again in TLS, and duplicated/copied on any OpenMP parallel/task etc., so it also affects performance of #pragma omp parallel/task. Why do you need to put ACC stuff in there? Can't it live in target_data or elsewhere?

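The suggestion to keep all per-thread state behind a single pointer can be sketched as follows. This is only an illustration of the !HAVE_TLS path using POSIX thread-specific data: `struct acc_thread`, its fields and the function names are hypothetical, not the libgomp implementation. With TLS available, the same struct would sit behind one `__thread` pointer instead of a key.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical bundle of the per-thread OpenACC state that the patch
   keeps in separate __thread variables.  */
struct acc_thread
{
  void *handle;
  int handle_num;
  int gangs, workers, vectors;
};

static pthread_key_t acc_key;
static pthread_once_t acc_key_once = PTHREAD_ONCE_INIT;

static void
acc_make_key (void)
{
  /* free () releases the struct when a thread exits.  */
  int err = pthread_key_create (&acc_key, free);
  assert (err == 0);
  (void) err;
}

/* Return the calling thread's state, allocating it on first use.  */
static struct acc_thread *
acc_get_thread (void)
{
  struct acc_thread *thr;

  pthread_once (&acc_key_once, acc_make_key);
  thr = pthread_getspecific (acc_key);
  if (thr == NULL)
    {
      thr = calloc (1, sizeof *thr);
      if (thr == NULL)
        abort ();
      thr->handle_num = -1;
      thr->gangs = thr->workers = thr->vectors = 1;
      if (pthread_setspecific (acc_key, thr) != 0)
        abort ();
    }
  return thr;
}
```

Only one pointer-sized slot is consumed per thread regardless of how many fields the struct grows, which addresses the concern about the size of the initial-exec TLS block.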
> +	gomp_plugin_malloc;
> +	gomp_plugin_malloc_cleared;
> ...

Please use GOMP_PLUGIN_ instead. Also, please make sure the entrypoints libgomp looks for in the plugins have similar/same prefix.

> +__attribute__((used)) static void
> +dump_mappings (FILE *f, splay_tree_node node)
> +{

IMHO this should be guarded by some define, while it can be useful for debugging the library, it is unneeded for production libgomp.

> +  if (device->get_caps_func () & TARGET_CAP_OPENMP_400)
> +    DLSYM (device_run);
> +  if (device->get_caps_func () & TARGET_CAP_OPENACC_200)

Cache the return value? Also, I must say I'm not particularly excited about different plugins not supporting both OpenMP 4.0 and OpenACC 2.0 offloading. Why is that needed?

> +  /* Make sure all the CUDA functions are there if any of them are. */
> +  if (optional_present && optional_present != optional_total)
> +    {
> +      err = "plugin missing OpenACC CUDA handler function";
> +      goto out;
> +    }

So, any plugin
Re: [PATCH][AArch64] Use __aarch64_vget_lane* macros for getting the lane in some lane multiply intrinsics
On 8 September 2014 11:29, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
> Hi all,
>
> The included testcase currently ICEs at -O0 because vget_lane_f64 is a
> function, so if it's properly called with a constant argument but
> without constant propagation it will not be recognised as constant,
> causing an ICE. This patch changes it to use the macro version directly.
>
> I think there is work being done to fix this issue up as part of a more
> general rework, but until that comes this patch implements the concerned
> intrinsics using the __aarch64_vget_lane* macros like the other lane
> intrinsics around them.
>
> Tested aarch64-none-elf.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2014-09-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
> 	* config/aarch64/arm_neon.h (vmuld_lane_f64): Use macro for getting
> 	the lane.
> 	(vmuld_laneq_f64): Likewise.
> 	(vmuls_lane_f32): Likewise.
> 	(vmuls_laneq_f32): Likewise.
>
> 2014-09-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>
>
> 	* gcc.target/aarch64/simd/vmul_lane_const_lane_1.c: New test.

OK
/Marcus
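The difference between a macro and a function for a "must be constant" lane argument can be shown in portable C. This is a sketch only: `struct vec2`, `LANE_CHECK`, `GET_LANE`, `get_lane_fn` and `mul_by_lane` are hypothetical and unrelated to the real arm_neon.h definitions. The compile-time check relies on the rule that a bit-field width must be an integer constant expression.

```c
#include <assert.h>

/* Toy two-lane vector; not the real float64x2_t.  */
struct vec2
{
  double lane[2];
};

/* Compile-time lane check: a bit-field width must be an integer
   constant expression, so a non-constant or out-of-range IDX fails to
   compile, at any optimization level.  */
#define LANE_CHECK(idx) \
  ((void) sizeof (struct { int ok : ((idx) >= 0 && (idx) < 2) ? 1 : -1; }))

/* Macro form: IDX reaches the check exactly as the caller wrote it.  */
#define GET_LANE(v, idx) (LANE_CHECK (idx), (v).lane[(idx)])

/* Function form: inside the body IDX is an ordinary parameter, so
   without constant propagation (for example at -O0) it is not a
   constant and can only be checked at run time.  */
static double
get_lane_fn (struct vec2 v, int idx)
{
  assert (idx >= 0 && idx < 2);
  return v.lane[idx];
}

/* A lane multiply written on top of the macro.  */
static double
mul_by_lane (double a, struct vec2 v)
{
  return a * GET_LANE (v, 1);
}
```

Writing `GET_LANE (v, n)` with a run-time `n` is rejected by the compiler, whereas `get_lane_fn (v, n)` compiles and defers the check, which parallels why the intrinsics in the patch are built on the macro.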
Re: [PATCH] PR63300 'const volatile' sometimes stripped in debug info.
On Tue, Sep 23 2014, Mark Wielaard wrote:
> This certainly looks nicer than how I wrote it. It took me a while (again) to realize why this works. We rely on the fact that earlier in the function a match would have been found if there was already a fully qualified type available. So here we know some subset will be found and at least one qualifier we need will not be in the result returned by get_nearest_type_subqualifiers. Maybe add a comment saying that to the code?
>
> Could you add the testcases I wrote for my variant of the fix to your patch and make sure they PASS?

OK, here's the adjusted patch with a comment added and your testcases included. I changed the patch a bit further, to reduce unnecessary iterations and recursions, and tested it again.

A few style aspects I'm not sure about:

* Is it OK to use __builtin_popcount in tree.c?

* Is it acceptable to add such a specialized function as get_nearest_type_subqualifiers to the tree interface? Or would it be preferable to move it as a static function to dwarf2out.c, since that's the only user right now?

-- >8 --
Subject: [PATCH v3] PR63300 'const volatile' sometimes stripped in debug info.

When adding DW_TAG_restrict_type the handling of multiple modifiers was adjusted incorrectly. This patch fixes it with the help of a new tree function get_nearest_type_subqualifiers.

The old tests didn't catch this case because there always was an existing sub-qualified type already. The new guality testcase fails before and succeeds after this patch. The new dwarf2 testcases make sure the optimization works and doesn't introduce unnecessary type tags.

gcc/ChangeLog

	* tree.c (check_base_type): New.
	(check_qualified_type): Exploit new helper function above.
	(get_nearest_type_subqualifiers): New.
	* tree.h (get_nearest_type_subqualifiers): New prototype.
	* dwarf2out.c (modified_type_die): Fix handling for qualifiers.
	Next qualifier to peel off is now determined with the help of
	get_nearest_type_subqualifiers.

gcc/testsuite/ChangeLog

	* gcc.dg/debug/dwarf2/stacked-qualified-types-1.c: New testcase.
	* gcc.dg/debug/dwarf2/stacked-qualified-types-2.c: Likewise.
	* gcc.dg/guality/pr63300-const-volatile.c: New testcase.
---
 gcc/dwarf2out.c                                | 62 +++---
 .../debug/dwarf2/stacked-qualified-types-1.c   | 18 +++
 .../debug/dwarf2/stacked-qualified-types-2.c   | 19 +++
 .../gcc.dg/guality/pr63300-const-volatile.c    | 12 +
 gcc/tree.c                                     | 52 --
 gcc/tree.h                                     |  7 +++
 6 files changed, 135 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/stacked-qualified-types-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/stacked-qualified-types-2.c
 create mode 100644 gcc/testsuite/gcc.dg/guality/pr63300-const-volatile.c

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index e87ade2..abd9df9 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -10474,12 +10474,14 @@ modified_type_die (tree type, int cv_quals, dw_die_ref context_die)
   tree qualified_type;
   tree name, low, high;
   dw_die_ref mod_scope;
+  /* Only these cv-qualifiers are currently handled.  */
+  const int cv_qual_mask = (TYPE_QUAL_CONST | TYPE_QUAL_VOLATILE
+                            | TYPE_QUAL_RESTRICT);

   if (code == ERROR_MARK)
     return NULL;

-  /* Only these cv-qualifiers are currently handled.  */
-  cv_quals &= (TYPE_QUAL_CONST | TYPE_QUAL_VOLATILE | TYPE_QUAL_RESTRICT);
+  cv_quals &= cv_qual_mask;

   /* Don't emit DW_TAG_restrict_type for DWARFv2, since it is a type
      tag modifier (and not an attribute) old consumers won't be able
@@ -10530,7 +10532,7 @@ modified_type_die (tree type, int cv_quals, dw_die_ref context_die)
       else
         {
           int dquals = TYPE_QUALS_NO_ADDR_SPACE (dtype);
-          dquals &= (TYPE_QUAL_CONST | TYPE_QUAL_VOLATILE | TYPE_QUAL_RESTRICT);
+          dquals &= cv_qual_mask;
           if ((dquals & ~cv_quals) != TYPE_UNQUALIFIED
               || (cv_quals == dquals && DECL_ORIGINAL_TYPE (name) != type))
             /* cv-unqualified version of named type.  Just use
@@ -10543,33 +10545,33 @@ modified_type_die (tree type, int cv_quals, dw_die_ref context_die)

   mod_scope = scope_die_for (type, context_die);

-  if ((cv_quals & TYPE_QUAL_CONST)
-      /* If there are multiple type modifiers, prefer a path which
-         leads to a qualified type.  */
-      && (((cv_quals & ~TYPE_QUAL_CONST) == TYPE_UNQUALIFIED)
-          || get_qualified_type (type, cv_quals) == NULL_TREE
-          || (get_qualified_type (type, cv_quals & ~TYPE_QUAL_CONST)
-              != NULL_TREE)))
-    {
-      mod_type_die = new_die (DW_TAG_const_type, mod_scope, type);
-      sub_die = modified_type_die (type, cv_quals & ~TYPE_QUAL_CONST,
-                                   context_die);
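
The selection rule discussed in this message (pick the existing variant whose qualifiers form the largest strict subset of the wanted ones, then peel off whatever is still missing) can be sketched outside of GCC as follows. This is only an illustration under stated assumptions: the QUAL_* constants, the vector of existing qualifier sets, and both function names are hypothetical stand-ins, not the tree interface used by the patch.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for GCC's TYPE_QUAL_* bits; values are illustrative.
enum : unsigned { QUAL_CONST = 1u, QUAL_VOLATILE = 2u, QUAL_RESTRICT = 4u };

// Number of qualifier bits set in q.
inline int rank_of(unsigned q) {
    return static_cast<int>(std::bitset<32>(q).count());
}

// Among the qualifier sets that already exist as variants of a type,
// return the largest strict subset of `wanted`, or 0 if there is none.
inline unsigned nearest_subqualifiers(const std::vector<unsigned>& existing,
                                      unsigned wanted) {
    unsigned best = 0;
    int best_rank = 0;
    for (unsigned q : existing) {
        const bool strict_subset = (q & wanted) == q && q != wanted;
        if (strict_subset && rank_of(q) > best_rank) {
            best_rank = rank_of(q);
            best = q;
        }
    }
    return best;
}

// Qualifiers that still have to be added on top of the nearest variant.
// Because the nearest variant is a strict subset, at least one bit remains.
inline unsigned qualifiers_to_peel(const std::vector<unsigned>& existing,
                                   unsigned wanted) {
    assert(wanted != 0);
    return wanted & ~nearest_subqualifiers(existing, wanted);
}
```

With existing variants {const, const volatile} and a wanted set of const volatile restrict, the sketch picks const volatile as the base and leaves only restrict to be peeled off.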
[PATCH, rs6000] Teach analyze_swaps to avoid vec_ste
Hi,

The analyze_swaps pass performs special handling on certain non-swapping loads and stores so that computations involving them can still be optimized. However, the intent was to avoid this for lvx, stvx, lve*, and stve*. The existing logic excludes these by looking for a PARALLEL as the rtx code for the insn body. It turns out this works for lvx, stvx, and lve*, but stve* was implemented slightly differently, so this check doesn't catch it.

This patch fixes the problem by looking for the pattern that matches stve* as well; we now exclude any store with an UNSPEC as its SET_SRC.

I've added a new compile-time test case to verify the fix. The test ICEs on existing trunk but passes with the new changes. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk?

Thanks,
Bill


[gcc]

2014-09-24  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* config/rs6000/rs6000.c (insn_is_swappable_p): Don't provide
	special handling for stores whose SET_SRC is an UNSPEC (such as
	UNSPEC_STVE).

[gcc/testsuite]

2014-09-24  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* gcc.target/powerpc/swaps-p8-17.c: New test.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 215486)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -33793,9 +33793,10 @@ insn_is_swappable_p (swap_web_entry *insn_entry, r
     return 0;

   /* Loads and stores seen here are not permuting, but we can still
-     fix them up by converting them to permuting ones.  Exception:
-     UNSPEC_LVX and UNSPEC_STVX, which have a PARALLEL body instead
-     of a SET.  */
+     fix them up by converting them to permuting ones.  Exceptions:
+     UNSPEC_LVE, UNSPEC_LVX, and UNSPEC_STVX, which have a PARALLEL
+     body instead of a SET; and UNSPEC_STVE, which has an UNSPEC
+     for the SET source.  */
   rtx body = PATTERN (insn);
   int i = INSN_UID (insn);

@@ -33812,7 +33813,7 @@ insn_is_swappable_p (swap_web_entry *insn_entry, r

   if (insn_entry[i].is_store)
     {
-      if (GET_CODE (body) == SET)
+      if (GET_CODE (body) == SET && GET_CODE (SET_SRC (body)) != UNSPEC)
         {
           *special = SH_NOSWAP_ST;
           return 1;
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-17.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-17.c (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-17.c (working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-options "-mcpu=power8 -O1" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "xxpermdi" } } */
+
+/* Verify that we don't try to do permute removal in the presence of
+   vec_ste.  This used to ICE.  */
+#include <altivec.h>
+
+void f (void *p)
+{
+  vector unsigned int u32 = vec_vsx_ld (1, (const unsigned int *)p);
+  vec_ste (u32, 1, (unsigned int *)p);
+}
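
To make the changed condition easier to follow, here is a toy model of the three store shapes the description distinguishes. The Code enum, the Insn struct and the function name are hypothetical and far simpler than GCC's real rtx; only the boolean condition mirrors the patch.

```cpp
#include <cassert>

// Toy model of the RTL shapes discussed above (hypothetical types).
enum class Code { Set, Parallel, Unspec, Reg };

struct Insn {
    Code body;     // rtx code of the insn body, i.e. PATTERN (insn)
    Code set_src;  // rtx code of SET_SRC; meaningful only if body == Code::Set
    bool is_store;
};

// Mirrors the patched test: a non-permuting store is converted to a
// permuting one only if its body is a plain SET whose source is not an
// UNSPEC.
inline bool store_gets_special_handling(const Insn& insn) {
    assert(insn.is_store);
    return insn.body == Code::Set && insn.set_src != Code::Unspec;
}
```

In this model an ordinary vector store is {Set, Reg}, an stvx-style store is {Parallel, ...} and an stve*-style store is {Set, Unspec}; only the first one is accepted.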
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
On Wed, Sep 24, 2014 at 2:30 PM, Ilya Enkovich wrote:
> I didn't generate references separately from label. Now I found an old patch and a test where this problem appeared. In this patch I moved set_rip generation currently performed in ix86_expand_prologue into expand pass. And I got following code in expand dump for testsuite/gcc.target/i386/pr55154.c test:
>
> (note 7 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (note/s 2 7 3 2 NOTE_INSN_DELETED_LABEL 2)
> (insn 3 2 4 2 (set (reg:DI 85)
>         (unspec:DI [
>                 (label_ref [2 deleted])
>             ] UNSPEC_SET_RIP)) /export/users/ienkovic/issues/4161/gcc/gcc/testsuite/gcc.target/i386/pr55154.c:9 -1
>      (insn_list:REG_LABEL_OPERAND 2 (nil)))
>
> There is a REG_LABEL_OPERAND generated but label is still removed.

Because it should be a REG_LABEL_TARGET? AFAUI this is a control flow insn, so I'd expect it to be a jump_insn (and the note will be a TARGET note). But it's not a PC-set insn, and a jump target the compiler will interpret as an infinite loop (if the insns are really in the order as above), which is clearly not what you want. So if you emit it as a jump_insn I'm not sure what will happen...

Is it necessary to emit the label into a basic block?

Ciao!
Steven
[PATCH i386 AVX512] [53/n] Update vec_set<mode>_0 pattern constraints.
Hello,
The patch below extends the constraints of the vec_set<mode>_0 insn pattern to EVEX.

Bootstrapped. AVX-512* tests on top of the patch set all pass under the simulator.

Is it OK for trunk?

gcc/
	* config/i386/sse.md
	(define_insn "vec_set<mode>_0"): Add EVEX version.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c9d6e00..5f2fe5b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6259,13 +6259,13 @@
 ;; see comment above inline_secondary_memory_needed function in i386.c
 (define_insn "vec_set<mode>_0"
   [(set (match_operand:VI4F_128 0 "nonimmediate_operand"
-          "=x,x,x ,x,x,x,x  ,x  ,m ,m   ,m")
+          "=v,v,v ,x,x,v,x  ,x  ,m ,m   ,m")
         (vec_merge:VI4F_128
           (vec_duplicate:VI4F_128
             (match_operand:<ssescalarmode> 2 "general_operand"
-          " x,m,*r,m,x,x,*rm,*rm,!x,!*re,!*fF"))
+          " v,m,*r,m,x,v,*rm,*rm,!x,!*re,!*fF"))
           (match_operand:VI4F_128 1 "vector_move_operand"
-          " C,C,C ,C,0,x,0  ,x  ,0 ,0   ,0")
+          " C,C,C ,C,0,v,0  ,x  ,0 ,0   ,0")
           (const_int 1)))]
   "TARGET_SSE"
   "@
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
2014-09-24 16:47 GMT+04:00 Steven Bosscher stevenb@gmail.com:
> On Wed, Sep 24, 2014 at 2:30 PM, Ilya Enkovich wrote:
>> I didn't generate references separately from label. Now I found an old patch and a test where this problem appeared. In this patch I moved set_rip generation currently performed in ix86_expand_prologue into expand pass. And I got following code in expand dump for testsuite/gcc.target/i386/pr55154.c test:
>>
>> (note 7 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> (note/s 2 7 3 2 NOTE_INSN_DELETED_LABEL 2)
>> (insn 3 2 4 2 (set (reg:DI 85)
>>         (unspec:DI [
>>                 (label_ref [2 deleted])
>>             ] UNSPEC_SET_RIP)) /export/users/ienkovic/issues/4161/gcc/gcc/testsuite/gcc.target/i386/pr55154.c:9 -1
>>      (insn_list:REG_LABEL_OPERAND 2 (nil)))
>>
>> There is a REG_LABEL_OPERAND generated but label is still removed.
>
> Because it should be a REG_LABEL_TARGET? AFAUI this is a control flow insn, so I'd expect it to be a jump_insn (and the note will be a TARGET note). But it's not a PC-set insn, and a jump target the compiler will interpret as an infinite loop (if the insns are really in the order as above), which is clearly not what you want. So if you emit it as a jump_insn I'm not sure what will happen...
>
> Is it necessary to emit the label into a basic block?

It is not a control flow instruction. It copies the value of the instruction pointer into a general purpose register. Therefore REG_LABEL_OPERAND seems to be correct.

Ilya

> Ciao!
> Steven
[PATCH i386 AVX512] [54/n] Add mov[dlh]dup insns support.
Hello,
The patch below introduces support for vmov[dlh]dup insns.

Bootstrapped. AVX-512* tests on top of the patch set all pass under the simulator.

Is it OK for trunk?

gcc/
	* config/i386/sse.md
	(define_insn "avx_movshdup256<mask_name>"): Add masking.
	(define_insn "sse3_movshdup<mask_name>"): Ditto.
	(define_insn "avx_movsldup256<mask_name>"): Ditto.
	(define_insn "sse3_movsldup<mask_name>"): Ditto.
	(define_insn "vec_dupv2df<mask_name>"): Ditto.
	(define_insn "*vec_concatv2df"): Add EVEX version.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5f2fe5b..862c280 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5776,34 +5776,34 @@
 ;; These are modeled with the same vec_concat as the others so that we
 ;; capture users of shufps that can use the new instructions
-(define_insn "avx_movshdup256"
-  [(set (match_operand:V8SF 0 "register_operand" "=x")
+(define_insn "avx_movshdup256<mask_name>"
+  [(set (match_operand:V8SF 0 "register_operand" "=v")
         (vec_select:V8SF
           (vec_concat:V16SF
-            (match_operand:V8SF 1 "nonimmediate_operand" "xm")
+            (match_operand:V8SF 1 "nonimmediate_operand" "vm")
             (match_dup 1))
           (parallel [(const_int 1) (const_int 1)
                      (const_int 3) (const_int 3)
                      (const_int 5) (const_int 5)
                      (const_int 7) (const_int 7)])))]
-  "TARGET_AVX"
-  "vmovshdup\t{%1, %0|%0, %1}"
+  "TARGET_AVX && <mask_avx512vl_condition>"
+  "vmovshdup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])

-(define_insn "sse3_movshdup"
-  [(set (match_operand:V4SF 0 "register_operand" "=x")
+(define_insn "sse3_movshdup<mask_name>"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
         (vec_select:V4SF
           (vec_concat:V8SF
-            (match_operand:V4SF 1 "nonimmediate_operand" "xm")
+            (match_operand:V4SF 1 "nonimmediate_operand" "vm")
             (match_dup 1))
           (parallel [(const_int 1)
                      (const_int 1)
                      (const_int 7)
                      (const_int 7)])))]
-  "TARGET_SSE3"
-  "%vmovshdup\t{%1, %0|%0, %1}"
+  "TARGET_SSE3 && <mask_avx512vl_condition>"
+  "%vmovshdup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix_rep" "1")
    (set_attr "prefix" "maybe_vex")
@@ -5829,34 +5829,34 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "V16SF")])

-(define_insn "avx_movsldup256"
-  [(set (match_operand:V8SF 0 "register_operand" "=x")
+(define_insn "avx_movsldup256<mask_name>"
+  [(set (match_operand:V8SF 0 "register_operand" "=v")
         (vec_select:V8SF
           (vec_concat:V16SF
-            (match_operand:V8SF 1 "nonimmediate_operand" "xm")
+            (match_operand:V8SF 1 "nonimmediate_operand" "vm")
             (match_dup 1))
           (parallel [(const_int 0) (const_int 0)
                      (const_int 2) (const_int 2)
                      (const_int 4) (const_int 4)
                      (const_int 6) (const_int 6)])))]
-  "TARGET_AVX"
-  "vmovsldup\t{%1, %0|%0, %1}"
+  "TARGET_AVX && <mask_avx512vl_condition>"
+  "vmovsldup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])

-(define_insn "sse3_movsldup"
-  [(set (match_operand:V4SF 0 "register_operand" "=x")
+(define_insn "sse3_movsldup<mask_name>"
+  [(set (match_operand:V4SF 0 "register_operand" "=v")
         (vec_select:V4SF
           (vec_concat:V8SF
-            (match_operand:V4SF 1 "nonimmediate_operand" "xm")
+            (match_operand:V4SF 1 "nonimmediate_operand" "vm")
             (match_dup 1))
           (parallel [(const_int 0)
                      (const_int 0)
                      (const_int 6)
                      (const_int 6)])))]
-  "TARGET_SSE3"
-  "%vmovsldup\t{%1, %0|%0, %1}"
+  "TARGET_SSE3 && <mask_avx512vl_condition>"
+  "%vmovsldup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "type" "sse")
    (set_attr "prefix_rep" "1")
    (set_attr "prefix" "maybe_vex")
@@ -8342,24 +8342,24 @@
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex,orig,orig,vex,maybe_vex")
    (set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")])

-(define_insn "vec_dupv2df"
-  [(set (match_operand:V2DF 0 "register_operand" "=x,x")
+(define_insn "vec_dupv2df<mask_name>"
+  [(set (match_operand:V2DF 0 "register_operand" "=x,v")
         (vec_duplicate:V2DF
-          (match_operand:DF 1 "nonimmediate_operand" " 0,xm")))]
-  "TARGET_SSE2"
+          (match_operand:DF 1 "nonimmediate_operand" " 0,vm")))]
+  "TARGET_SSE2 && <mask_avx512vl_condition>"
   "@
    unpcklpd\t%0, %0
-   %vmovddup\t{%1, %0|%0, %1}"
+   %vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,sse3")
    (set_attr "type" "sselog1")
    (set_attr "prefix" "orig,maybe_vex")
    (set_attr "mode" "V2DF,DF")])

 (define_insn "*vec_concatv2df"
-  [(set (match_operand:V2DF 0 "register_operand" "=x,x,x,x,x,x,x,x")
+  [(set (match_operand:V2DF 0 "register_operand" "=x,v,v,x,x,v,x,x")
         (vec_concat:V2DF
-
[PATCH i386 AVX512] [55/n] Extend `perm' insn patterns.
Hello,
The patch below extends the `perm' insn patterns.

Bootstrapped. AVX-512* tests on top of the patch set all pass under the simulator.

Is it OK for trunk?

gcc/
	* config/i386/sse.md
	(define_expand "avx2_avx512f_perm<mode>"): Rename to ...
	(define_expand "avx2_avx512bw_perm<mode>"): this.
	(define_expand "avx512_perm<mode>_mask"): Add 128/256-bit wide
	version.
	(define_insn "avx2_avx512f_perm<mode>_1<mask_name>"): Rename to ...
	(define_insn "avx2_avx512bw_perm<mode>_1<mask_name>"): this.

--
Thanks, K

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 862c280..7c02629 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15962,14 +15962,14 @@
    (set_attr "prefix" "<mask_prefix2>")
    (set_attr "mode" "<sseinsnmode>")])

-(define_expand "avx2_avx512f_perm<mode>"
+(define_expand "avx2_avx512_perm<mode>"
   [(match_operand:VI8F_256_512 0 "register_operand")
    (match_operand:VI8F_256_512 1 "nonimmediate_operand")
    (match_operand:SI 2 "const_0_to_255_operand")]
   "TARGET_AVX2"
 {
   int mask = INTVAL (operands[2]);
-  emit_insn (gen_avx2_avx512f_perm<mode>_1 (operands[0], operands[1],
+  emit_insn (gen_avx2_avx512_perm<mode>_1 (operands[0], operands[1],
                                             GEN_INT ((mask >> 0) & 3),
                                             GEN_INT ((mask >> 2) & 3),
                                             GEN_INT ((mask >> 4) & 3),
@@ -15977,16 +15977,16 @@
   DONE;
 })

-(define_expand "avx512f_perm<mode>_mask"
-  [(match_operand:V8FI 0 "register_operand")
-   (match_operand:V8FI 1 "nonimmediate_operand")
+(define_expand "avx512_perm<mode>_mask"
+  [(match_operand:VI8F_256_512 0 "register_operand")
+   (match_operand:VI8F_256_512 1 "nonimmediate_operand")
    (match_operand:SI 2 "const_0_to_255_operand")
-   (match_operand:V8FI 3 "vector_move_operand")
+   (match_operand:VI8F_256_512 3 "vector_move_operand")
    (match_operand:<avx512fmaskmode> 4 "register_operand")]
   "TARGET_AVX512F"
 {
   int mask = INTVAL (operands[2]);
-  emit_insn (gen_avx2_avx512f_perm<mode>_1_mask (operands[0], operands[1],
+  emit_insn (gen_avx2_avx512_perm<mode>_1_mask (operands[0], operands[1],
                                                  GEN_INT ((mask >> 0) & 3),
                                                  GEN_INT ((mask >> 2) & 3),
                                                  GEN_INT ((mask >> 4) & 3),
@@ -15995,7 +15995,7 @@
   DONE;
 })

-(define_insn "avx2_avx512f_perm<mode>_1<mask_name>"
+(define_insn "avx2_avx512_perm<mode>_1<mask_name>"
   [(set (match_operand:VI8F_256_512 0 "register_operand" "=v")
         (vec_select:VI8F_256_512
           (match_operand:VI8F_256_512 1 "nonimmediate_operand" "vm")
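
The expanders in this patch split the 8-bit permute immediate into 2-bit element selectors with a shift and a mask. A small sketch of that decomposition follows; the function name is hypothetical, and the fourth selector (bits 6-7) is assumed by analogy, since the hunks above end after the third GEN_INT.

```cpp
#include <array>
#include <cassert>

// Split an 8-bit permute immediate into four 2-bit element selectors,
// lowest bits first, in the style of GEN_INT ((mask >> n) & 3).
inline std::array<int, 4> split_perm_mask(int mask) {
    assert(mask >= 0 && mask <= 255);  // const_0_to_255_operand
    return {{(mask >> 0) & 3,
             (mask >> 2) & 3,
             (mask >> 4) & 3,
             (mask >> 6) & 3}};  // assumed: not visible in the diff above
}
```

For example, the immediate 0x1B (binary 00 01 10 11) yields the selectors 3, 2, 1, 0, i.e. a reversal of the four elements.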
[wwwdocs] Update C++1y status page now that C++14 is finished.
C++14 is no longer the next standard, it's here, so update the project page.

Committed to cvs.

Index: projects/cxx1y.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/cxx1y.html,v
retrieving revision 1.15
retrieving revision 1.16
diff -u -u -r1.15 -r1.16
--- projects/cxx1y.html 23 Aug 2014 06:56:45 - 1.15
+++ projects/cxx1y.html 24 Sep 2014 12:46:05 - 1.16
@@ -13,28 +13,28 @@
 <body>
 <h1>C++1y/C++14 Support in GCC</h1>

-  <p>GCC is beginning to introduce support for the next revision of the C++
-  standard, which is expected to be published in 2014.</p>
+  <p>GCC is beginning to introduce support for the latest revision of the C++
+  standard, which was published in 2014.</p>

-  <p>C++1y features are available as part of the mainline GCC
+  <p>C++14 features are available as part of the mainline GCC
   compiler in the trunk of <a href="../svn.html">GCC's Subversion
-  repository</a> and in GCC 4.8 and later. To enable C++1y
-  support, add the command-line parameter <code>-std=c++1y</code>
+  repository</a> and in GCC 4.8 and later. To enable C++14
+  support, add the command-line parameter <code>-std=c++14</code>
   to your <code>g++</code> command line. Or, to enable GNU
   extensions in addition to C++0x extensions,
-  add <code>-std=gnu++1y</code> to your <code>g++</code> command
+  add <code>-std=gnu++14</code> to your <code>g++</code> command
   line.</p>

-  <p><strong>Important</strong>: Because the ISO C++14 draft is still
-  evolving, GCC's support is <strong>experimental</strong>. No attempt
+  <p><strong>Important</strong>: Because the final ISO C++14 standard was only
+  recently published, GCC's support is <strong>experimental</strong>. No attempt
   will be made to maintain backward compatibility with implementations of
-  C++1y features that do not reflect the final standard.</p>
+  C++14 features that do not reflect the final standard.</p>

-<h2>C++1y Language Features</h2>
+<h2>C++14 Language Features</h2>

-  <p>The following table lists new language features that have been
-  accepted into the C++1y standard. The Proposal column
+  <p>The following table lists new language features that are part of
+  the C++14 standard. The Proposal column
   provides a link to the ISO C++ committee proposal that describes the
   feature, while the Available in GCC? column indicates the first
   version of GCC that contains an implementation of this feature (if
@@ -127,11 +127,11 @@
     </tr>
   </table>

-  <!-- <h2>C++11 Library Features</h2> -->
+  <h2>C++14 Library Features</h2>

-  <!-- <p>The status of the library implementation can be tracked in this -->
-  <!-- <a href="https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.200x">table</a> -->
-  <!-- </p> -->
+  <p>The status of the library implementation can be tracked in this
+  <a href="https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2014">table</a>
+  </p>

 <h2>Development Branches</h2>
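
As a quick way to check that a given g++ accepts the option named on the updated page, a few C++14 language features can be exercised in one small file. This is only a sketch; the function and variable names are made up for the example, and it should build with g++ -std=c++14 on a compiler that implements these features.

```cpp
#include <cassert>
#include <string>

// Return type deduction for normal functions (C++14).
inline auto twice(int x) { return 2 * x; }

// Generic lambda (C++14): parameter types are deduced per call.
inline auto make_adder() {
    return [](auto a, auto b) { return a + b; };
}

// Binary literal with digit separators (C++14).
constexpr int flags = 0b1010'0101;
static_assert(flags == 0xA5, "binary literal and hex literal agree");
```

The same generic lambda can add two ints or concatenate two std::string objects, which is not possible with a C++11 lambda.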
[wwwdocs] Update libstdc++ section of gcc-5/changes.html
I forgot to send this patch a couple of days ago.

Document recent libstdc++ changes on trunk. Also tweak the 4.9 notes slightly for consistency.

Committed to cvs.

Index: gcc-5/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
retrieving revision 1.9
retrieving revision 1.10
diff -u -r1.9 -r1.10
--- gcc-5/changes.html 5 Sep 2014 08:25:46 - 1.9
+++ gcc-5/changes.html 22 Sep 2014 15:31:58 - 1.10
@@ -84,12 +84,16 @@
     <li><a href="https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011">
       Improved support for C++11</a>, including:
       <ul>
+        <li> <code>std::deque</code> meets the allocator-aware container requirements;</li>
+        <li> movable and swappable iostream classes;</li>
         <li> support for <code>std::aligned_union</code>;</li>
       </ul>
     </li>
     <li>An implementation of <code>std::experimental::any</code>.</li>
     <li>New random number distributions <code>logistic_distribution</code> and
       <code>uniform_on_sphere_distribution</code> as extensions.</li>
+    <li><a href="https://sourceware.org/gdb/current/onlinedocs/gdb/Xmethods-In-Python.html">GDB
+      Xmethods</a> for <code>std::vector</code> and <code>std::unique_ptr</code>;</li>
   </ul>

 <h3 id="fortran">Fortran</h3>
Index: gcc-4.9/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.79
retrieving revision 1.80
diff -u -r1.79 -r1.80
--- gcc-4.9/changes.html 23 Aug 2014 15:58:55 - 1.79
+++ gcc-4.9/changes.html 22 Sep 2014 15:31:57 - 1.80
@@ -344,7 +344,7 @@
       <li> implemention of <code>std::make_unique</code>; </li>
       <li> implemention of <code>std::shared_lock</code>; </li>
       <li> making <code>std::result_of</code> SFINAE-friendly; </li>
-      <li> adding <code>operator()</code> to <code>integral_constant</code>; </li>
+      <li> adding <code>operator()</code> to <code>std::integral_constant</code>; </li>
       <li> adding user-defined literals for standard library types
            <code>std::basic_string</code>, <code>std::chrono::duration</code>,
            and <code>std::complex</code>; </li>
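
For readers wondering what "movable and swappable iostream classes" means in practice, the following sketch returns a string stream from a helper by value and then swaps it into another stream. It uses only standard C++11 library calls; the helper names are invented for the example, and it needs a libstdc++ that already has the change listed above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Build a stream in a helper and return it by value; this relies on the
// C++11 move constructor of std::ostringstream (streams are not copyable).
inline std::ostringstream make_prefixed(const std::string& prefix) {
    std::ostringstream os;
    os << prefix;
    return os;
}

inline std::string demo() {
    std::ostringstream a = make_prefixed("x=");
    a << 42;            // keeps writing after the moved-in prefix
    std::ostringstream b;
    b.swap(a);          // member swap exchanges the stream contents
    return b.str();
}
```

With a standard library that lacks the move constructor, make_prefixed fails to compile because the copy constructor is deleted.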
Re: [PATCH] PR63300 'const volatile' sometimes stripped in debug info.
On Wed, Sep 24, 2014 at 02:40:14PM +0200, Andreas Arnez wrote:
> A few style aspects I'm not sure about:
>
> * Is it OK to use __builtin_popcount in tree.c?

Definitely not, you can use popcount_hwi instead, which for a GCC host compiler (>= 3.4) will use __builtin_popcount*, and otherwise falls back to a library function.

> * Is it acceptable to add such a specialized function as get_nearest_type_subqualifiers to the tree interface? Or would it be preferable to move it as a static function to dwarf2out.c, since that's the only user right now?

I agree it should be kept in dwarf2out.c, it is too specialized.

	Jakub
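
The builtin-with-fallback pattern described in this reply can be sketched as below. This is not GCC's popcount_hwi; the name popcount_portable is hypothetical, and the fallback here is an inline loop rather than a separate library function.

```cpp
#include <cassert>

// Portable population count: use the builtin when the host compiler is
// GCC-compatible, otherwise fall back to a simple bit-clearing loop.
inline int popcount_portable(unsigned long x) {
#if defined(__GNUC__)
    return __builtin_popcountl(x);
#else
    int n = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
#endif
}
```

Callers see one function regardless of the host compiler, which is the reason for routing such uses through a wrapper instead of calling the builtin directly.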
Re: [PATCH, rs6000] Teach analyze_swaps to avoid vec_ste
On Wed, Sep 24, 2014 at 8:46 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote:
> Hi,
>
> The analyze_swaps pass performs special handling on certain non-swapping loads and stores so that computations involving them can still be optimized. However, the intent was to avoid this for lvx, stvx, lve*, and stve*. The existing logic excludes these by looking for a PARALLEL as the rtx code for the insn body. It turns out this works for lvx, stvx, and lve*, but stve* was implemented slightly differently, so this check doesn't catch it.
>
> This patch fixes the problem by looking for the pattern that matches stve* as well; we now exclude any store with an UNSPEC as its SET_SRC.
>
> I've added a new compile-time test case to verify the fix. The test ICEs on existing trunk but passes with the new changes. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2014-09-24  Bill Schmidt  wschm...@linux.vnet.ibm.com
>
>         * config/rs6000/rs6000.c (insn_is_swappable_p): Don't provide
>         special handling for stores whose SET_SRC is an UNSPEC (such as
>         UNSPEC_STVE).
>
> [gcc/testsuite]
>
> 2014-09-24  Bill Schmidt  wschm...@linux.vnet.ibm.com
>
>         * gcc.target/powerpc/swaps-p8-17.c: New test.

Okay.

Thanks, David
Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI
2014-09-24 11:05 GMT+04:00 Ilya Enkovich enkovich@gmail.com:
> 2014-09-23 22:01 GMT+04:00 Jeff Law l...@redhat.com:
>> On 09/23/14 00:31, Ilya Enkovich wrote:
>>> I did this change a couple of years ago and don't remember exactly what problem was caused by PARALLEL. But from my comment it seems PARALLEL led to the values in BND0 and BND1 not actually being defined by the call from the DF point of view. I'll try to reproduce a problem I had.
>>
>> Please do. That would indicate a bug in the DF infrastructure. I'm not real familiar with the DF implementation, but a quick glance at df_def_record_1 seems to indicate it's got support for a set destination being a PARALLEL.
>>
>>>> This kind of scheme also doesn't tend to play well with exception handling & scheduling because you can't guarantee the sets and the call are in the same block and scheduled as a single group.
>>>
>>> How can the sets and the call not be in the same block/group if all of them are parts of a single instruction?
>>
>> Obviously in the cases where we've had these problems in the past they were distinct instructions. So EH interactions isn't going to be an issue for MPX.
>>
>> However, we've still got the problem that the RTL you've generated is ill-formed. If I understand things correctly, the assignments are the result of the call, that should be modeled by having the destination be a PARALLEL as mentioned earlier.
>
> OK. Will try it. BTW call_value_pop patterns have two sets. One for returned value and one for stack register. How does that differ much from what I do with bound regs?
>
> Thanks,
> Ilya
>
>> Jeff

I tried to generate a PARALLEL with all regs set by the call. Here is a memset call I got:

(call_insn 23 22 24 2 (set (parallel [
                (expr_list:REG_DEP_TRUE (reg:DI 0 ax)
                    (const_int 0 [0]))
                (expr_list:REG_DEP_TRUE (reg:BND64 77 bnd0)
                    (const_int 0 [0]))
                (expr_list:REG_DEP_TRUE (reg:BND64 78 bnd1)
                    (const_int 0 [0]))
            ])
        (call/j (mem:QI (symbol_ref:DI ("memset") [flags 0x41] <function_decl 0x77f79400 memset.chkp>) [0 __builtin_memset S1 A8])
            (const_int 0 [0]))) /export/users/ienkovic/mpx/tests/own-tests/255/test-255.c:11 652 {*call_value}
     (expr_list:REG_RETURNED (reg/f:DI 100)
        (expr_list:REG_DEAD (reg:DI 5 di)
            (expr_list:REG_DEAD (reg:SI 4 si)
                (expr_list:REG_DEAD (reg:DI 1 dx)
                    (expr_list:REG_UNUSED (reg:BND64 78 bnd1)
                        (expr_list:REG_UNUSED (reg:BND64 77 bnd0)
                            (expr_list:REG_UNUSED (reg:DI 0 ax)
                                (expr_list:REG_CALL_DECL (symbol_ref:DI ("memset") [flags 0x41] <function_decl 0x77f79400 memset.chkp>)
                                    (expr_list:REG_EH_REGION (const_int 0 [0])
                                        (nil))
    (expr_list:DI (set (reg:DI 0 ax)
            (reg:DI 5 di))
        (expr_list:DI (use (reg:DI 5 di))
            (expr_list:BND64 (use (reg:BND64 77 bnd0))
                (expr_list:SI (use (reg:SI 4 si))
                    (expr_list:DI (use (reg:DI 1 dx))
                        (nil)))

During register allocation LRA generated a weird move instruction:

(insn 63 0 0 (set (reg/f:DI 100)
        (parallel [
                (expr_list:REG_DEP_TRUE (reg:DI 0 ax)
                    (const_int 0 [0]))
                (expr_list:REG_DEP_TRUE (reg:BND64 77 bnd0)
                    (const_int 0 [0]))
                (expr_list:REG_DEP_TRUE (reg:BND64 78 bnd1)
                    (const_int 0 [0]))
            ])) -1
     (nil))

This caused an ICE later in LRA. The move happens because of REG_RETURNED (reg/f:DI 100) (see the condition in inherit_in_ebb at lra-constraints.c:5312). Thus this code in LRA doesn't accept a PARALLEL dest for calls.

So my question is: should I go to the trouble of enabling a PARALLEL call destination, or are the current sets OK, taking into account that we would still have multiple sets due to the call_value_pop patterns?

Thanks,
Ilya
[GOOGLE] Fix new tests
The new tests added for -mpatch-functions-for-instrumentation did not correctly restrict themselves to x86_64, since tree-prof.exp doesn't support dg-do. Work around this by using target selectors on the dg-options: the -mpatch and related options are applied only on x86_64; on other targets the tests simply do splitting.

Ok for google branches?
Teresa

2014-09-24  Teresa Johnson  tejohn...@google.com

	* testsuite/gcc.dg/tree-prof/cold_partition_patch.c:
	* testsuite/g++.dg/tree-prof/partition_patch.C:

Index: testsuite/gcc.dg/tree-prof/cold_partition_patch.c
===================================================================
--- testsuite/gcc.dg/tree-prof/cold_partition_patch.c (revision 215525)
+++ testsuite/gcc.dg/tree-prof/cold_partition_patch.c (working copy)
@@ -1,8 +1,7 @@
 /* Check if patching works with function splitting. */
-/* { dg-do compile { target x86_64-*-* } } */
 /* { dg-require-effective-target freorder } */
-/* { dg-options "-O2 -freorder-blocks-and-partition -save-temps -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls" } */
-
+/* { dg-options "-O2 -freorder-blocks-and-partition -save-temps" { target { ! x86_64-*-* } } }
+/* { dg-options "-O2 -freorder-blocks-and-partition -save-temps -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls" { target x86_64-*-* } } */

 #define SIZE 1

 const char *sarr[SIZE];
Index: testsuite/g++.dg/tree-prof/partition_patch.C
===================================================================
--- testsuite/g++.dg/tree-prof/partition_patch.C (revision 215525)
+++ testsuite/g++.dg/tree-prof/partition_patch.C (working copy)
@@ -1,7 +1,7 @@
 // Check if patching works with function splitting.
-// { dg-do compile { target x86_64-*-* } }
 // { dg-require-effective-target freorder }
-// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls" }
+// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition" { target { ! x86_64-*-* } } }
+// { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls" { target x86_64-*-* } }

 int k;

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [patch] Implement move semantics for iostreams
On 23/09/14 15:58 +0200, Rainer Orth wrote:
> This patch broke Solaris bootstrap with Sun ld: when linking libstdc++.so, ld complains
>
> ld: fatal: libstdc++-symbols.ver-sun: 4520: symbol 'std::basic_ios<char, std::char_traits<char> >::move(std::basic_ios<char, std::char_traits<char> >&&)': symbol version conflict
>
> and many more. In that case, I find that this symbol is matched by both the GLIBCXX_3.4 and GLIBCXX_3.4.21 patterns:
>
> GLIBCXX_3.4
>     ##std::basic_i[g-r]* (cxx)
>     _ZNSt9basic_iosIcSt11char_traitsIcEE4moveEOS2_;
>
> GLIBCXX_3.4.21
>     ##_ZNSt9basic_iosI[cw]St11char_traitsI[cw]EE4moveE[OR]S2_ (glob)
>     _ZNSt9basic_iosIcSt11char_traitsIcEE4moveEOS2_;

Rainer, I think this patch should fix it, could you test it please?

(I tried installing Solaris in a VM but couldn't get it to work, maybe I should use the VirtualBox image instead of trying qemu/kvm.)

commit 61937e94b69fb848efd7925364fbb965ade8a444
Author: Jonathan Wakely jwak...@redhat.com
Date:   Wed Sep 24 14:24:38 2014 +0100

    * config/abi/pre/gnu.ver: Make GLIBCXX_3.4 patterns stricter so the
    new GLIBCXX_3.4.21 symbols don't match them.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index 58c90d6..f736240 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -39,10 +39,11 @@ GLIBCXX_3.4 { std::basic_[g-h]*; std::basic_i[a-e]*; # std::basic_ifstream; - std::basic_i[g-r]*; +# std::basic_ios; +# std::basic_iostream; std::basic_istr[a-d]*; # std::basic_istream; - std::basic_istr[f-z]*; +# std::basic_istringstream; std::basic_i[t-z]*; std::basic_[j-n]*; std::basic_o[a-e]*; @@ -50,12 +51,12 @@ GLIBCXX_3.4 { # std::basic_o[g-z]*; std::basic_o[g-r]*; std::basic_ostr[a-d]*; - std::basic_ostr[f-z]*; +# std::basic_ostringstream; std::basic_[p-r]*; # std::basic_streambuf # std::basic_string # std::basic_stringbuf - std::basic_stringstream*; +# std::basic_stringstream; std::basic_[t-z]*; std::ba[t-z]*; std::b[b-z]*; @@ -94,7 +95,7 @@ GLIBCXX_3.4 { std::i[p-r]*; # std::istream # std::istreambuf_iterator - std::istringstream*; +# std::istringstream*; std::istrstream*; std::i[t-z]*; std::[A-Zj-k]*; @@ -306,12 +307,14 @@ GLIBCXX_3.4 { # std::basic_streambuf _ZNSt15basic_streambufI[cw]St11char_traitsI[cw]EE[CD]*; _ZNKSt15basic_streambufI[cw]St11char_traitsI[cw]EE[0-9]*; -_ZNSt15basic_streambufI[cw]St11char_traitsI[cw]EE[0-9][a-z][^t]*; +_ZNSt15basic_streambufI[cw]St11char_traitsI[cw]EE4set[gp]*; +_ZNSt15basic_streambufI[cw]St11char_traitsI[cw]EE4sync*; +_ZNSt15basic_streambufI[cw]St11char_traitsI[cw]EE[5-9][a-z][^t]*; _ZNSt15basic_streambufI[cw]St11char_traitsI[cw]EE[0-9][0-9][a-z][^t]*; _ZNSt15basic_streambufI[cw]St11char_traitsI[cw]EEaSERKS2_; # std::basic_stringbuf -_ZNSt15basic_stringbufI[cw]St11char_traitsI[cw]ESaI[cw]EEC*; +_ZNSt15basic_stringbufI[cw]St11char_traitsI[cw]ESaI[cw]EEC[12]E[RS]*; _ZNSt15basic_stringbufI[cw]St11char_traitsI[cw]ESaI[cw]EED[^2]*; _ZNSt15basic_stringbufI[cw]St11char_traitsI[cw]ESaI[cw]EE[0-9][a-r]*; _ZNSt15basic_stringbufI[cw]St11char_traitsI[cw]ESaI[cw]EE[0-9]seek*; @@ -325,12 +328,46 @@ GLIBCXX_3.4 { 
_ZNSt15basic_stringbufI[cw]St11char_traitsI[cw]ESaI[cw]EE[0-9]_M_[q-z]*; _ZNSt15basic_stringbufI[cw]St11char_traitsI[cw]ESaI[cw]EE[0-9][0-9]_M_[a-z]*; -# std::basic_iostream constructors, destructors -_ZNSdC*; +# std::basic_istringstream +_ZNSt19basic_istringstreamI[cw]St11char_traitsI[cw]ESaI[cw]EEC[12]E[RS]*; +_ZNSt19basic_istringstreamI[cw]St11char_traitsI[cw]ESaI[cw]EED*; +_ZNSt19basic_istringstreamI[cw]St11char_traitsI[cw]ESaI[cw]EE3str*; +_ZNKSt19basic_istringstream*; + +# std::basic_ostringstream +_ZNSt19basic_ostringstreamI[cw]St11char_traitsI[cw]ESaI[cw]EEC[12]E[RS]*; +_ZNSt19basic_ostringstreamI[cw]St11char_traitsI[cw]ESaI[cw]EED*; +_ZNSt19basic_ostringstreamI[cw]St11char_traitsI[cw]ESaI[cw]EE3str*; +_ZNKSt19basic_ostringstream*; + +# std::basic_stringstream +_ZNSt18basic_stringstreamI[cw]St11char_traitsI[cw]ESaI[cw]EEC[12]E[RS]*; +_ZNSt18basic_stringstreamI[cw]St11char_traitsI[cw]ESaI[cw]EED*; +_ZNSt18basic_stringstreamI[cw]St11char_traitsI[cw]ESaI[cw]EE3str*; +_ZNKSt18basic_stringstream*; + +# std::basic_iostream constructors (except move), destructors +_ZNSdC[12]Ev; +_ZNSdC[12]EP*; _ZNSdD*; +_ZNSt14basic_iostreamIwSt11char_traitsIwEEC[12]Ev; +_ZNSt14basic_iostreamIwSt11char_traitsIwEEC[12]EP*; +_ZNSt14basic_iostreamIwSt11char_traitsIwEED*; + +# std::basic_ios constructors, destructors +_ZNSt9basic_iosI[cw]St11char_traitsI[cw]EEC*; +_ZNSt9basic_iosI[cw]St11char_traitsI[cw]EED*; + +# std::basic_ios members (except move, swap, set_rdbuf) +
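
The double match reported at the top of this thread can be reproduced outside the linker with POSIX fnmatch, used here only as an approximation of how a version script glob is applied; the linker's own matching rules may differ in detail. The helper name glob_matches is made up for the example, and the symbol strings are the ones from the error message.

```cpp
#include <cassert>
#include <fnmatch.h>  // POSIX, not part of standard C++

// True if `symbol` matches the shell-style glob `pattern`.
inline bool glob_matches(const char* pattern, const char* symbol) {
    return fnmatch(pattern, symbol, 0) == 0;
}

// Demangled and mangled spellings of the symbol from the ld error.
static const char* const demangled_move =
    "std::basic_ios<char, std::char_traits<char> >::move("
    "std::basic_ios<char, std::char_traits<char> >&&)";
static const char* const mangled_move =
    "_ZNSt9basic_iosIcSt11char_traitsIcEE4moveEOS2_";

// Old, broad GLIBCXX_3.4 pattern and the new GLIBCXX_3.4.21 pattern.
static const char* const old_pattern = "std::basic_i[g-r]*";
static const char* const new_pattern =
    "_ZNSt9basic_iosI[cw]St11char_traitsI[cw]EE4moveE[OR]S2_";
```

Both glob_matches(old_pattern, demangled_move) and glob_matches(new_pattern, mangled_move) hold, which is the conflict; a narrower pattern such as std::basic_i[a-e]* no longer catches the symbol.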
Re: [PATCH] PR63300 'const volatile' sometimes stripped in debug info.
On Wed, Sep 24 2014, Jakub Jelinek wrote: On Wed, Sep 24, 2014 at 02:40:14PM +0200, Andreas Arnez wrote: A few style aspects I'm not sure about: * Is it OK to use __builtin_popcount in tree.c? Definitely not, you can use popcount_hwi instead, which for GCC host compiler (= 3.4) will use __builtin_popcount*, otherwise fallback to a library function. * Is it acceptable to add such a specialized function as get_nearest_type_subqualifiers to the tree interface? Or would it be preferable to move it as a static function to dwarf2out.c, since that's the only user right now? I agree it should be kept in dwarf2out.c, it is too specialized. Jakub OK, I'm using popcount_hwi now and moved get_nearest_type_subqualifiers to dwarf2out.c. Does this look OK? -- 8 -- Subject: [PATCH v4] PR63300 'const volatile' sometimes stripped in debug info. When adding DW_TAG_restrict_type the handling of multiple modifiers was adjusted incorrectly. This patch fixes it with the help of a new tree function get_nearest_type_subqualifiers. The old tests didn't catch this case because there always was an existing sub-qualified type already. The new guality testcase fails before and succeeds after this patch. The new dwarf2 testcases make sure the optimization works and doesn't introduce unnecessary type tags. gcc/ChangeLog * tree.c (check_base_type): New. (check_qualified_type): Exploit new helper function above. * tree.h (check_base_type): New prototype. * dwarf2out.c (get_nearest_type_subqualifiers): New. (modified_type_die): Fix handling for qualifiers. Qualifiers to peel off are now determined using get_nearest_type_subqualifiers. gcc/testsuite/ChangeLog * gcc.dg/debug/dwarf2/stacked-qualified-types-1.c: New testcase. * gcc.dg/debug/dwarf2/stacked-qualified-types-2.c: Likewise. * gcc.dg/guality/pr63300-const-volatile.c: New testcase. 
---
 gcc/dwarf2out.c | 96 +++---
 .../debug/dwarf2/stacked-qualified-types-1.c | 18
 .../debug/dwarf2/stacked-qualified-types-2.c | 19 +
 .../gcc.dg/guality/pr63300-const-volatile.c | 12 +++
 gcc/tree.c | 16 +++-
 gcc/tree.h | 4 +
 6 files changed, 131 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/stacked-qualified-types-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/stacked-qualified-types-2.c
 create mode 100644 gcc/testsuite/gcc.dg/guality/pr63300-const-volatile.c

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index e87ade2..e15b42b 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -10461,6 +10461,40 @@ decl_quals (const_tree decl)
 	     ? TYPE_QUAL_VOLATILE : TYPE_UNQUALIFIED));
 }
 
+/* Determine the TYPE whose qualifiers match the largest strict subset
+   of the given TYPE_QUALS, and return its qualifiers.  Ignore all
+   qualifiers outside QUAL_MASK.  */
+
+static int
+get_nearest_type_subqualifiers (tree type, int type_quals, int qual_mask)
+{
+  tree t;
+  int best_rank = 0, best_qual = 0, max_rank;
+
+  type_quals &= qual_mask;
+  max_rank = popcount_hwi (type_quals) - 1;
+
+  for (t = TYPE_MAIN_VARIANT (type); t && best_rank < max_rank;
+       t = TYPE_NEXT_VARIANT (t))
+    {
+      int q = TYPE_QUALS (t) & qual_mask;
+
+      if ((q & type_quals) == q && q != type_quals
+	  && check_base_type (t, type))
+	{
+	  int rank = popcount_hwi (q);
+
+	  if (rank > best_rank)
+	    {
+	      best_rank = rank;
+	      best_qual = q;
+	    }
+	}
+    }
+
+  return best_qual;
+}
+
 /* Given a pointer to an arbitrary ..._TYPE tree node, return a debugging
    entry that chains various modifiers in front of the given type.  */
 
@@ -10474,12 +10508,14 @@ modified_type_die (tree type, int cv_quals, dw_die_ref context_die)
   tree qualified_type;
   tree name, low, high;
   dw_die_ref mod_scope;
+  /* Only these cv-qualifiers are currently handled.  */
+  const int cv_qual_mask = (TYPE_QUAL_CONST | TYPE_QUAL_VOLATILE
+			    | TYPE_QUAL_RESTRICT);
 
   if (code == ERROR_MARK)
     return NULL;
 
-  /* Only these cv-qualifiers are currently handled.  */
-  cv_quals &= (TYPE_QUAL_CONST | TYPE_QUAL_VOLATILE | TYPE_QUAL_RESTRICT);
+  cv_quals &= cv_qual_mask;
 
   /* Don't emit DW_TAG_restrict_type for DWARFv2, since it is a type
      tag modifier (and not an attribute) old consumers won't be able
@@ -10530,7 +10566,7 @@ modified_type_die (tree type, int cv_quals, dw_die_ref context_die)
       else
 	{
 	  int dquals = TYPE_QUALS_NO_ADDR_SPACE (dtype);
-	  dquals &= (TYPE_QUAL_CONST | TYPE_QUAL_VOLATILE | TYPE_QUAL_RESTRICT);
+	  dquals &= cv_qual_mask;
 	  if ((dquals & ~cv_quals) != TYPE_UNQUALIFIED
 	      || (cv_quals == dquals && DECL_ORIGINAL_TYPE (name) != type))
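The selection rule described in the patch (pick the existing variant whose qualifiers form the largest strict subset of the wanted ones) can be tried outside GCC. The following is only a minimal sketch under stated assumptions: the qualifier bit values, the plain array standing in for the TYPE_NEXT_VARIANT chain, and the names `popcount_int` and `nearest_subqualifiers` are all made up for illustration, and the `check_base_type` test from the patch is left out.

```c
#include <stddef.h>

/* Illustrative qualifier bits; the values are assumptions for this
   sketch and are not taken from GCC's headers.  */
enum { QUAL_CONST = 1, QUAL_VOLATILE = 2, QUAL_RESTRICT = 4 };

/* Portable population count, standing in for popcount_hwi.  */
static int
popcount_int (unsigned v)
{
  int n = 0;
  while (v)
    {
      v &= v - 1;		/* clear the lowest set bit */
      n++;
    }
  return n;
}

/* Among the N qualifier sets in VARIANTS, return the one that is the
   largest strict subset of WANTED.  Bits outside MASK are ignored.
   Returns 0 (unqualified) if no better candidate exists.  */
int
nearest_subqualifiers (const int *variants, size_t n, int wanted, int mask)
{
  int best_rank = 0, best_qual = 0, max_rank;

  wanted &= mask;
  /* A strict subset can have at most one bit fewer than WANTED.  */
  max_rank = popcount_int ((unsigned) wanted) - 1;

  for (size_t i = 0; i < n && best_rank < max_rank; i++)
    {
      int q = variants[i] & mask;

      if ((q & wanted) == q && q != wanted)
	{
	  int rank = popcount_int ((unsigned) q);

	  if (rank > best_rank)
	    {
	      best_rank = rank;
	      best_qual = q;
	    }
	}
    }

  return best_qual;
}
```

With variants { unqualified, const, const volatile } and a request for const volatile restrict, the sketch picks const volatile, so only one extra modifier has to be stacked on top of an existing type.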
Re: std::regex: inserting std::wregex to std::vector loses some std::wregex values
On 23/09/14 23:11 -0700, Tim Shen wrote: So I'll change the patch to move _M_traits to _NFA, and add a new basic_regex::_M_loc member. Here it is :). Bootstrapped and tested with debug flag. OK for trunk - thanks.
Re: [PATCH] Fix asan optimization for aligned accesses. (PR sanitizer/63316)
BTW, I've noticed that perhaps using BIT_AND_EXPR for the (shadow != 0) & ((base_addr & 7) + (real_size_in_bytes - 1) >= shadow) tests isn't best, maybe we could get better code if we expanded it as (shadow != 0) && ((base_addr & 7) + (real_size_in_bytes - 1) >= shadow) (i.e. an extra basic block containing the second half of the test and fastpath for the shadow == 0 case if it is sufficiently common (probably it is)). BIT_AND_EXPR allows efficient branchless implementation on platforms which allow chained conditional compares (e.g. ARM). You can't repro this on current trunk though because I'm still waiting for ccmp patches from Zhenqiang Chen to be approved :( Will try to code this up unless somebody beats me to that, but if somebody volunteered to benchmark such a change, it would be very much appreciated. AFAIK LLVM team recently got some 1% on SPEC from this. -Y
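To make the two expansions being compared concrete, here is a minimal sketch in plain C. It is not the code GCC emits; the function names and the choice of a signed 8-bit shadow value are assumptions made for illustration. Both functions compute the same predicate, the first with a bitwise AND (no branch needed), the second with an early exit for the shadow == 0 case.

```c
#include <stdint.h>

/* Branchless form: both halves are always evaluated and combined with
   a bitwise AND, mirroring the BIT_AND_EXPR expansion.  */
int
bad_access_bitand (int8_t shadow, uintptr_t base_addr, unsigned size)
{
  return (shadow != 0) & ((int) ((base_addr & 7) + size - 1) >= shadow);
}

/* Short-circuit form: the common shadow == 0 case returns early and
   the second half of the test lives in its own basic block.  */
int
bad_access_shortcircuit (int8_t shadow, uintptr_t base_addr, unsigned size)
{
  if (shadow == 0)
    return 0;			/* fast path: whole granule addressable */
  return (int) ((base_addr & 7) + size - 1) >= shadow;
}
```

Which form is faster depends on the target: the first suits machines with chained conditional compares, the second relies on the shadow == 0 branch being well predicted.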
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
On Wed, Sep 24, 2014 at 2:51 PM, Ilya Enkovich wrote: 2014-09-24 16:47 GMT+04:00 Steven Bosscher : It is not a control flow instruction. It copies value of instruction pointer into a general purpose register. Therefore REG_LABEL_OPERAND seems to be correct. OK - sorry for being a bit slow on the up-take, I got confused by the asm syntax :-) So I'm going to speculate a bit more... What you want to have is: foo: insns... L2: leaq L2(%rip), rXX What happens is that L2 is deleted, which is to say converted to a NOTE_INSN_DELETED_LABEL. Then the notes are re-ordered (NOTE_INSN_DELETED_LABEL notes are not tied to anything in the insns stream and can end up anywhere) so you end up with something like, foo: L2: # (was deleted) insns... leaq L2(%rip),rXX I bet you'd find that in the failing test case the label is output to the assembly file but it's simply in the wrong place. For the large code model, we get away with it because the prologue is output late and the order of the insns is not adjusted (a few passes later, the CFG doesn't even exist anymore so you don't go through cfgcleanup). But if you emit the label early and let it go through the entire RTL pipeline then anything can happen. If the above makes sense, then you'll want to emit the label late, or not at all, to the insns stream. If you emit the label late into the insns stream, you'd rewrite the set_rip as a define_insn_and_split that emits the label as part of the last splitting pass. But there is no splitting pass late enough to guarantee that the label and insns won't get separated. If you don't emit the label to the insns stream, you would write ix86_output_set_rip() and call that from the define_insns for set_rip. You'd not emit the label in the expander. You'd create it and make it an operand, but not emit it. Your ix86_output_set_rip() would write the label and the set_rip instruction. This is probably the only way to make 100% sure that the label is always exactly at the set_rip instruction. 
Something like below (completely untested, etc...). Hope this helps, Ciao! Steven

Index: config/i386/i386-protos.h
===================================================================
--- config/i386/i386-protos.h (revision 215483)
+++ config/i386/i386-protos.h (working copy)
@@ -303,6 +303,7 @@ extern enum attr_cpu ix86_schedule;
 #endif
 
 extern const char * ix86_output_call_insn (rtx_insn *insn, rtx call_op);
+extern const char * ix86_output_set_rip_insn (rtx *operands);
 
 #ifdef RTX_CODE
 /* Target data for multipass lookahead scheduling.
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 215483)
+++ config/i386/i386.c (working copy)
@@ -11225,8 +11225,6 @@ ix86_expand_prologue (void)
 	  gcc_assert (Pmode == DImode);
 	  label = gen_label_rtx ();
-	  emit_label (label);
-	  LABEL_PRESERVE_P (label) = 1;
 	  tmp_reg = gen_rtx_REG (Pmode, R11_REG);
 	  gcc_assert (REGNO (pic_offset_table_rtx) != REGNO (tmp_reg));
 	  insn = emit_insn (gen_set_rip_rex64 (pic_offset_table_rtx,
@@ -12034,8 +12032,6 @@ ix86_expand_split_stack_prologue (void)
 	  rtx x;
 
 	  label = gen_label_rtx ();
-	  emit_label (label);
-	  LABEL_PRESERVE_P (label) = 1;
 	  emit_insn (gen_set_rip_rex64 (reg10, label));
 	  emit_insn (gen_set_got_offset_rex64 (reg11, label));
 	  emit_insn (ix86_gen_add3 (reg10, reg10, reg11));
@@ -25016,6 +25012,17 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
 
   return "";
 }
+
+/* Output the assembly for a SET_RIP instruction.  We do so with this output
+   function to ensure that the label and %rip load instruction are together.  */
+
+const char *
+ix86_output_set_rip_insn (rtx *operands)
+{
+  output_asm_label (operands[1]);
+  output_asm_insn ("lea{q}\t{%l1(%%rip), %0|%0, %l1[rip]}", operands);
+  return "";
+}
 
 /* Clear stack slot assignments remembered from previous functions.
    This is called from INIT_EXPANDERS once before RTL is emitted for each
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md (revision 215483)
+++ config/i386/i386.md (working copy)
@@ -12010,7 +12010,7 @@
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(unspec:DI [(label_ref (match_operand 1))] UNSPEC_SET_RIP))]
   "TARGET_64BIT"
-  "lea{q}\t{%l1(%%rip), %0|%0, %l1[rip]}"
+  "* return ix86_output_set_rip_insn (operands);"
   [(set_attr "type" "lea")
    (set_attr "length_address" "4")
    (set_attr "mode" "DI")])
Re: [PATCH] Fix asan optimization for aligned accesses. (PR sanitizer/63316)
AFAIK LLVM team recently got some 1% on SPEC from this. On x64 that is.
Re: Updated no_reorder patchkit
On 09/16/2014 05:15 AM, Andi Kleen wrote: This version addresses earlier comments and has an updated testsuite (still no LTO tests however). The assembler statements also no stay in order with ordered statements. It doesn't disable sorting of paritions with ordered symbols. I think that's an existing bug and is best addressed separately. Passed LTO boot strap and test on x86_64-linux, plus build of a large project that needs LTO order. -Andi Hello. I've just merged trunk to my branch and observed regression connected to this patchset: ../../../libcilkrts/runtime/config/x86/os-unix-sysdep.c:114:5: internal compiler error: tree check: expected tree_list, have var_decl in get_attribute_name, at attribs.c:679 if (__builtin_cpu_supports(sse)) ^ 0xc757a4 tree_check_failed(tree_node const*, char const*, int, char const*, ...) ../../gcc/tree.c:9167 0x566a35 tree_check ../../gcc/tree.h:2967 0x566a35 get_attribute_name(tree_node const*) ../../gcc/attribs.c:679 0xc788b5 private_lookup_attribute(char const*, unsigned long, tree_node*) ../../gcc/tree.c:5753 0xcd0468 lookup_attribute ../../gcc/tree.h:3773 0xcd0468 varpool_node::add(tree_node*) ../../gcc/varpool.c:452 0xced982 fold_builtin_cpu ../../gcc/config/i386/i386.c:32480 0x6826ef fold_builtin_call_array(unsigned int, tree_node*, tree_node*, int, tree_node**) ../../gcc/builtins.c:10565 0x59ec54 build_function_call_vec(unsigned int, vecunsigned int, va_heap, vl_ptr, tree_node*, vectree_node*, va_gc, vl_embed*, vectree_node*, va_gc, vl_embed*) ../../gcc/c/c-typeck.c:2958 0x5c659e c_parser_postfix_expression_after_primary ../../gcc/c/c-parser.c:7770 0x5b97bb c_parser_postfix_expression ../../gcc/c/c-parser.c:7590 0x5bbe6a c_parser_unary_expression ../../gcc/c/c-parser.c:6517 0x5c1ff6 c_parser_cast_expression ../../gcc/c/c-parser.c:6355 0x5c2235 c_parser_binary_expression ../../gcc/c/c-parser.c:6170 0x5c2de5 c_parser_conditional_expression ../../gcc/c/c-parser.c:5946 0x5c3420 c_parser_expr_no_commas ../../gcc/c/c-parser.c:5864 
0x5c3ae6 c_parser_expression ../../gcc/c/c-parser.c:7897
0x5c45a9 c_parser_expression_conv ../../gcc/c/c-parser.c:7930
0x5c4622 c_parser_condition ../../gcc/c/c-parser.c:5050
0x5c46b7 c_parser_paren_condition ../../gcc/c/c-parser.c:5069

There's missing DECL_ATTRIBUTES in varpool.c in lookup_attribute call. Ready for trunk? Martin

diff --git a/gcc/varpool.c b/gcc/varpool.c
index 8001c93..3761f14 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -449,7 +449,7 @@ varpool_node::add (tree decl)
   symtab->call_varpool_insertion_hooks (node);
   if (node->externally_visible_p ())
     node->externally_visible = true;
-  if (lookup_attribute ("no_reorder", decl))
+  if (lookup_attribute ("no_reorder", DECL_ATTRIBUTES (decl)))
     node->no_reorder = 1;
 }
Re: Updated no_reorder patchkit
There's missing DECL_ATTRIBUTES in varpool.c in lookup_attribute call. Ready for trunk?

OK, thanks Honza

Martin

diff --git a/gcc/varpool.c b/gcc/varpool.c
index 8001c93..3761f14 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -449,7 +449,7 @@ varpool_node::add (tree decl)
   symtab->call_varpool_insertion_hooks (node);
   if (node->externally_visible_p ())
     node->externally_visible = true;
-  if (lookup_attribute ("no_reorder", decl))
+  if (lookup_attribute ("no_reorder", DECL_ATTRIBUTES (decl)))
     node->no_reorder = 1;
 }
Re: Updated no_reorder patchkit
On Wed, Sep 24, 2014 at 04:16:44PM +0200, Martin Liška wrote: On 09/16/2014 05:15 AM, Andi Kleen wrote: This version addresses earlier comments and has an updated testsuite (still no LTO tests however). The assembler statements also no stay in order with ordered statements. It doesn't disable sorting of paritions with ordered symbols. I think that's an existing bug and is best addressed separately. Passed LTO boot strap and test on x86_64-linux, plus build of a large project that needs LTO order. -Andi Hello. I've just merged trunk to my branch and observed regression connected to this patchset: This is already fixed, see r215552. Jakub
Re: Updated no_reorder patchkit
On 09/24/2014 04:17 PM, Jan Hubicka wrote: There's missing DECL_ATTRIBUTES in varpool.c in lookup_attribute call. Ready for trunk? OK, thanks Honza

Ah, it has been fixed in r215552. Martin

Martin

diff --git a/gcc/varpool.c b/gcc/varpool.c
index 8001c93..3761f14 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -449,7 +449,7 @@ varpool_node::add (tree decl)
   symtab->call_varpool_insertion_hooks (node);
   if (node->externally_visible_p ())
     node->externally_visible = true;
-  if (lookup_attribute ("no_reorder", decl))
+  if (lookup_attribute ("no_reorder", DECL_ATTRIBUTES (decl)))
     node->no_reorder = 1;
 }
Re: [PATCH 2/5] Existing call graph infrastructure enhancement
On 06/13/2014 12:26 PM, mliska wrote: Hi, this small patch prepares remaining needed infrastructure for the new pass. Changelog: 2014-06-13 Martin Liska mli...@suse.cz Honza Hubicka hubi...@ucw.cz * ipa-utils.h (polymorphic_type_binfo_p): Function marked external instead of static. * ipa-devirt.c (polymorphic_type_binfo_p): Likewise. * ipa-prop.h (count_formal_params): Likewise. * ipa-prop.c (count_formal_params): Likewise. * ipa-utils.c (ipa_merge_profiles): Be more tolerant if we merge profiles for semantically equivalent functions. * passes.c (do_per_function): If we load body of a function during WPA, this condition should behave same. * varpool.c (ctor_for_folding): More tolerant assert for variable aliases created during WPA. diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c index d733461..18592d7 100644 --- a/gcc/ipa-devirt.c +++ b/gcc/ipa-devirt.c @@ -176,7 +176,7 @@ struct GTY(()) odr_type_d inheritance (because vtables are shared). Look up the BINFO of type and check presence of its vtable. */ -static inline bool +bool polymorphic_type_binfo_p (tree binfo) { /* See if BINFO's type has an virtual table associtated with it. */ diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c index b67deed..60bda71 100644 --- a/gcc/ipa-prop.c +++ b/gcc/ipa-prop.c @@ -210,7 +210,7 @@ ipa_populate_param_decls (struct cgraph_node *node, /* Return how many formal parameters FNDECL has. */ -static inline int +int count_formal_params (tree fndecl) { tree parm; diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h index cb23698..87573ff 100644 --- a/gcc/ipa-prop.h +++ b/gcc/ipa-prop.h @@ -529,6 +529,7 @@ void ipa_free_all_edge_args (void); void ipa_free_all_structures_after_ipa_cp (void); void ipa_free_all_structures_after_iinln (void); void ipa_register_cgraph_hooks (void); +int count_formal_params (tree fndecl); /* This function ensures the array of node param infos is big enough to accommodate a structure for all nodes and reallocates it if not. 
*/ diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c index 8e7c7cb..bc2b958 100644 --- a/gcc/ipa-utils.c +++ b/gcc/ipa-utils.c @@ -660,13 +660,8 @@ ipa_merge_profiles (struct cgraph_node *dst, if (dst-tp_first_run src-tp_first_run src-tp_first_run) dst-tp_first_run = src-tp_first_run; - if (src-profile_id) -{ - if (!dst-profile_id) - dst-profile_id = src-profile_id; - else - gcc_assert (src-profile_id == dst-profile_id); -} + if (src-profile_id !dst-profile_id) +dst-profile_id = src-profile_id; if (!dst-count) return; diff --git a/gcc/ipa-utils.h b/gcc/ipa-utils.h index a2c985a..996249a 100644 --- a/gcc/ipa-utils.h +++ b/gcc/ipa-utils.h @@ -72,6 +72,8 @@ struct odr_type_d; typedef odr_type_d *odr_type; void build_type_inheritance_graph (void); void update_type_inheritance_graph (void); +bool polymorphic_type_binfo_p (tree binfo); + vec cgraph_node * possible_polymorphic_call_targets (tree, HOST_WIDE_INT, ipa_polymorphic_call_context, diff --git a/gcc/passes.c b/gcc/passes.c index 4366251..9fdfe51 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -1506,7 +1506,7 @@ do_per_function (void (*callback) (function *, void *data), void *data) { struct cgraph_node *node; FOR_EACH_DEFINED_FUNCTION (node) - if (node-analyzed gimple_has_body_p (node-decl) + if (node-analyzed (gimple_has_body_p (node-decl) !in_lto_p) (!node-clone_of || node-decl != node-clone_of-decl)) callback (DECL_STRUCT_FUNCTION (node-decl), data); } diff --git a/gcc/varpool.c b/gcc/varpool.c index ff67127..5cc558e 100644 --- a/gcc/varpool.c +++ b/gcc/varpool.c @@ -293,6 +293,7 @@ ctor_for_folding (tree decl) if (decl != real_decl) { gcc_assert (!DECL_INITIAL (decl) + || (node-alias varpool_alias_target (node) == real_node) || DECL_INITIAL (decl) == error_mark_node); if (lookup_attribute (weakref, DECL_ATTRIBUTES (decl))) { Hi. Following patch enhances API functions to be ready for main patch of this patchset. Ready for thunk? 
Thank you, Martin gcc/ChangeLog: 2014-09-21 Martin Liška mli...@suse.cz * cgraph.c (cgraph_node::release_body): New argument keep_arguments introduced. * cgraph.h: Likewise. * cgraphunit.c (cgraph_node::create_wrapper): Usage of new argument introduced. * ipa-devirt.c (polymorphic_type_binfo_p): Safe check for binfos created by Java. * tree-ssa-alias.c (ao_ref_base_alias_set): Static function transformed to global. * tree-ssa-alias.h: Likewise. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 8f04284..d40a2922 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1637,13 +1637,15 @@ release_function_body (tree decl) are free'd in final.c via
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
Hello Michael:

Firstly, thank you very much for always helping me with microblaze.

At present, after running the testsuite, the result is much better than my original attempt. Please help check the result: is it good enough for our microblaze testsuite (can we say it passes)?

Current result:

# of expected passes		65987
# of unexpected failures	82
# of unexpected successes	1
# of expected failures		97
# of unresolved testcases	16378
# of unsupported tests		1810

Original result:

# of expected passes		48408
# of unexpected failures	17253
# of unexpected successes	1
# of expected failures		97
# of unresolved testcases	16570
# of unsupported tests		1854

After checking the current result log, I find many messages related to remote target tests. Do we have to deal with them? e.g. Download to microblaze-xilinx-gdb failed, couldn't execute rcp: no such file or directory.

And I guess, it is a glibc bug: it still adds the root directory (e.g. /upstream/release) in 'libc.so' even when --with-sysroot is already given to configure. Oh, sorry, glibc should also need --with-sysroot. I shall try it today, and hope it will make everything OK.

After adding --with-sysroot for glibc, this issue still exists, so I removed the redundant directory manually from libc.so and libpthread.so. If our microblaze testsuite is OK, I will skip this issue (since I do not have enough time to spend on glibc at present).

Thanks.
--
Chen Gang

Open share and attitude like air water and life which God blessed
[jit] Update the various *_c_finalize functions
Joseph - thanks for looking through the jit diff. I plan to fix the issues you raise as a series of separate patches. Here's the first: On Tue, 2014-09-23 at 23:27 +, Joseph S. Myers wrote: Various *_finalize functions are missing comments explaining their semantics. Also the return type should be on the line before the function name. I've committed the following fix to branch dmalcolm/jit: Five of the *_c_finalize functions were empty, since their files contain no state [1][2]. Delete them. Fix up the formatting of the remaining *_c_finalize functions, and ensure they have descriptive leading comments. [1] Most of these lost their state when the symbol_table class was introduced, in r214422. [2] predict.c has state in the form of these variables: static sreal real_zero, real_one, real_almost_one, real_br_prob_base, real_inv_br_prob_base, real_one_half, real_bb_freq_max; and, within function estimate_bb_frequencies: static int real_values_initialized = 0; but it seems to me that this state doesn't need to be reset between repeated in-process invocations. gcc/ChangeLog.jit: * cgraph.h (cgraphbuild_c_finalize): Delete prototype of empty function. (ipa_c_finalize): Likewise. (predict_c_finalize): Likewise. (symtab_c_finalize): Likewise. (varpool_c_finalize): Likewise. * cgraph.c (cgraph_c_finalize): Add leading comment. Put return type on line before function name. * cgraphunit.c (cgraphunit_c_finalize): Likewise. * dwarf2out.c (dwarf2out_c_finalize): Likewise. * gcse.c (gcse_c_finalize): Likewise. * ipa-cp.c (ipa_cp_c_finalize): Likewise. * ipa-reference.c (ipa_reference_c_finalize): Likewise. * params.c (params_c_finalize): Update leading comment to match format of the others mentioned above. * cgraphbuild.c (cgraphbuild_c_finalize): Delete empty function. * ipa.c (ipa_c_finalize): Likewise. * predict.c (predict_c_finalize): Likewise. * symtab.c (symtab_c_finalize): Likewise. * varpool.c (varpool_c_finalize): Likewise. 
* toplev.c (toplev::finalize): Remove calls to empty functions cgraphbuild_c_finalize, ipa_c_finalize, predict_c_finalize, symtab_c_finalize, varpool_c_finalize. --- gcc/cgraph.c| 6 +- gcc/cgraph.h| 9 - gcc/cgraphbuild.c | 4 gcc/cgraphunit.c| 6 +- gcc/dwarf2out.c | 6 +- gcc/gcse.c | 6 +- gcc/ipa-cp.c| 3 +++ gcc/ipa-reference.c | 6 +- gcc/ipa.c | 4 gcc/params.c| 3 ++- gcc/predict.c | 4 gcc/symtab.c| 4 gcc/toplev.c| 5 - gcc/varpool.c | 4 14 files changed, 30 insertions(+), 40 deletions(-) diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 736dd73..1721634 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -3078,7 +3078,11 @@ gimple_check_call_matching_types (gimple call_stmt, tree callee, return true; } -void cgraph_c_finalize (void) +/* Reset all state within cgraph.c so that we can rerun the compiler + within the same process. For use by toplev::finalize. */ + +void +cgraph_c_finalize (void) { symtab = NULL; diff --git a/gcc/cgraph.h b/gcc/cgraph.h index c407a3b..fd45e01 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -1958,25 +1958,16 @@ void tree_function_versioning (tree, tree, vecipa_replace_map *, va_gc *, /* In cgraphbuild.c */ int compute_call_stmt_bb_frequency (tree, basic_block bb); void record_references_in_initializer (tree, bool); -void cgraphbuild_c_finalize (void); /* In ipa.c */ void cgraph_build_static_cdtor (char which, tree body, int priority); void ipa_discover_readonly_nonaddressable_vars (void); -void ipa_c_finalize (void); /* In ipa-cp.c */ void ipa_cp_c_finalize (void); -/* In predict.c */ -void predict_c_finalize (void); - -/* In symtab.c */ -void symtab_c_finalize (void); - /* In varpool.c */ tree ctor_for_folding (tree); -void varpool_c_finalize (void); /* Return true when the symbol is real symbol, i.e. it is not inline clone or abstract function kept for debug info purposes only. 
*/ diff --git a/gcc/cgraphbuild.c b/gcc/cgraphbuild.c index 5610064..96d7015 100644 --- a/gcc/cgraphbuild.c +++ b/gcc/cgraphbuild.c @@ -576,7 +576,3 @@ make_pass_remove_cgraph_callee_edges (gcc::context *ctxt) { return new pass_remove_cgraph_callee_edges (ctxt); } - -void cgraphbuild_c_finalize (void) -{ -} diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index 1f52d35..9a3834a 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -2288,7 +2288,11 @@ symbol_table::finalize_compilation_unit (void) timevar_pop (TV_CGRAPH); } -void cgraphunit_c_finalize (void) +/* Reset all state within cgraphunit.c so that we can rerun the compiler + within the same process. For use by toplev::finalize. */ + +void +cgraphunit_c_finalize (void) {
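The idea behind the *_c_finalize functions (reset a file's static state so the compiler can run again in the same process) can be shown with a small stand-alone sketch. Everything here is hypothetical: `table`, `entries`, `module_add` and `module_c_finalize` are invented names that only mimic the pattern and do not correspond to any GCC source file.

```c
#include <stdlib.h>

#define TABLE_CAPACITY 16

/* Hypothetical file-level state, standing in for globals such as
   `symtab' that survive between in-process invocations.  */
static int *table;		/* lazily allocated on first use */
static int entries;

/* Append VALUE to the table, allocating it on first use.  Returns the
   number of entries afterwards, or -1 on failure.  */
int
module_add (int value)
{
  if (table == NULL)
    {
      table = calloc (TABLE_CAPACITY, sizeof *table);
      if (table == NULL)
	return -1;
    }
  if (entries == TABLE_CAPACITY)
    return -1;
  table[entries++] = value;
  return entries;
}

/* Reset all state within this file so that the module can be rerun
   within the same process.  */
void
module_c_finalize (void)
{
  free (table);
  table = NULL;
  entries = 0;
}
```

A file with no such static state has nothing to reset, which is why an empty finalizer can simply be deleted.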
Re: [PATCH] Do not remove labels with LABEL_PRESERVE_P
2014-09-24 17:50 GMT+04:00 Steven Bosscher stevenb@gmail.com: On Wed, Sep 24, 2014 at 2:51 PM, Ilya Enkovich wrote: 2014-09-24 16:47 GMT+04:00 Steven Bosscher : It is not a control flow instruction. It copies value of instruction pointer into a general purpose register. Therefore REG_LABEL_OPERAND seems to be correct. OK - sorry for being a bit slow on the up-take, I got confused by the asm syntax :-) So I'm going to speculate a bit more... What you want to have is: foo: insns... L2: leaq L2(%rip), rXX What happens is that L2 is deleted, which is to say converted to a NOTE_INSN_DELETED_LABEL. Then the notes are re-ordered (NOTE_INSN_DELETED_LABEL notes are not tied to anything in the insns stream and can end up anywhere) so you end up with something like, foo: L2: # (was deleted) insns... leaq L2(%rip),rXX I bet you'd find that in the failing test case the label is output to the assembly file but it's simply in the wrong place. For the large code model, we get away with it because the prologue is output late and the order of the insns is not adjusted (a few passes later, the CFG doesn't even exist anymore so you don't go through cfgcleanup). But if you emit the label early and let it go through the entire RTL pipeline then anything can happen. Actually label removal causes ICE later in CSE, so there is no output to examine. Having back a patch which allows me to reproduce a problem I can finally answer your initial questions :) This should not be necessary, you're probably papering over another problem. Did the label use count drop to zero? Is there a reg note for the label operand? 
Here is a dump of basic_block I got in debugger right before label is removed: (code_label/s 2 1 7 2 2 [3 uses]) (note 7 2 3 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn 3 7 4 2 (set (reg:DI 85) (unspec:DI [ (label_ref 2) ] UNSPEC_SET_RIP)) /gnumnt/msticlxl7_users/ienkovic/issues/4161/gcc/gcc/testsuite/gcc.target/i386/pr55154.c:9 -1 (insn_list:REG_LABEL_OPERAND 2 (nil))) So we have non zero uses count and appropriate reg note. But label is still removed. BTW in my patch I should check LABEL_NUSES instead of LABEL_PRESERVE_P. I assumed it is possible to have LABEL_PRESERVE_P and zero uses but now I see init_label_info called by rebuild_jump_labels sets uses to 1 for all LABEL_PRESERVE_P labels. Condition I modified doesn't care about count of label usages at all. It just checks that BBs only predecessor doesn't jump to it. Thus all label uses by non jump instructions are ignored. So I propose a new patch (not tested): diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c index 9325ea0..fe2a444 100644 --- a/gcc/cfgcleanup.c +++ b/gcc/cfgcleanup.c @@ -2701,17 +2701,7 @@ try_optimize_cfg (int mode) (single_pred_edge (b)-flags EDGE_FALLTHRU) !(single_pred_edge (b)-flags EDGE_COMPLEX) LABEL_P (BB_HEAD (b)) - !LABEL_PRESERVE_P (BB_HEAD (b)) - /* If the previous block ends with a branch to this -block, we can't delete the label. Normally this -is a condjump that is yet to be simplified, but -if CASE_DROPS_THRU, this can be a tablejump with -some element going to the same place as the -default (fallthru). */ - (single_pred (b) == ENTRY_BLOCK_PTR_FOR_FN (cfun) - || !JUMP_P (BB_END (single_pred (b))) - || ! label_is_jump_target_p (BB_HEAD (b), - BB_END (single_pred (b) + !LABEL_NUSES (BB_HEAD (b))) { delete_insn (BB_HEAD (b)); if (dump_file) If the above makes sense, then you'll want to emit the label late, or not at all, to the insns stream. 
If you emit the label late into the insns stream, you'd rewrite the set_rip as a define_insn_and_split that emits the label as part of the last splitting pass. But there is no splitting pass late enough to guarantee that the label and insns won't get separated. If you don't emit the label to the insns stream, you would write ix86_output_set_rip() and call that from the define_insns for set_rip. You'd not emit the label in the expander. You'd create it and make it an operand, but not emit it. Your ix86_output_set_rip() would write the label and the set_rip instruction. This is probably the only way to make 100% sure that the label is always exactly at the set_rip instruction. Something like below (completely untested, etc...). Hope this helps, Your point about misplaced label is quite reasonable. I didn't see such problems but agree that might happen. Thank you for proposed patch! I think we should try to make changes you propose to securely insert set_rip instructions any time we want.
Re: [patch] Implement move semantics for iostreams
Hi Jonathan, On 23/09/14 15:58 +0200, Rainer Orth wrote: This patch broke Solaris bootstrap with Sun ld: when linking libstdc++.so, ld complains ld: fatal: libstdc++-symbols.ver-sun: 4520: symbol 'std::basic_ioschar, std::char_traitschar ::move(std::basic_ioschar, std::char_traitschar )': symbol version conflict and many more. In that case, I find that this symbols is matched by both the GLIBCXX_3.4 and GLIBCXX_3.4.21 patterns: GLIBCXX_3.4 ##std::basic_i[g-r]* (cxx) _ZNSt9basic_iosIcSt11char_traitsIcEE4moveEOS2_; GLIBCXX_3.4.21 ##_ZNSt9basic_iosI[cw]St11char_traitsI[cw]EE4moveE[OR]S2_ (glob) _ZNSt9basic_iosIcSt11char_traitsIcEE4moveEOS2_; Rainer, I think this patch should fix it, could you test it please? almost there: now I only get ld: fatal: libstdc++-symbols.ver-sun: 4622: symbol 'std::basic_ostreamwchar_t, std::char_traitswchar_t ::basic_ostream(std::basic_iostreamwchar_t, std::char_traitswchar_t )': symbol version conflict ld: fatal: libstdc++-symbols.ver-sun: 4623: symbol 'std::basic_ostreamwchar_t, std::char_traitswchar_t ::basic_ostream(std::basic_iostreamwchar_t, std::char_traitswchar_t )': symbol version conflict from GLIBCXX_3.4: ##_ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]E[RP]* (glob) _ZNSt13basic_ostreamIwSt11char_traitsIwEEC1ERSt14basic_iostreamIwS1_E; _ZNSt13basic_ostreamIwSt11char_traitsIwEEC2ERSt14basic_iostreamIwS1_E; GLIBCXX_3.4.21: ##_ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]ERSt14basic_iostreamIwS1_E (glob) _ZNSt13basic_ostreamIwSt11char_traitsIwEEC1ERSt14basic_iostreamIwS1_E; _ZNSt13basic_ostreamIwSt11char_traitsIwEEC2ERSt14basic_iostreamIwS1_E; The glob in the 3.4 version also matches _ZNSt13basic_ostreamIwSt11char_traitsIwEEC1EPSt15basic_streambufIwS1_E; _ZNSt13basic_ostreamIwSt11char_traitsIwEEC2EPSt15basic_streambufIwS1_E; (I tried installing Solaris in a VM but couldn't get it to work, maybe I should use the VirtualBox image instead of trying qemu/kvm.) 
VirtualBox works for me in principle, but I often found bootstrapping gcc inside some VM almost intolerably slow... There's been some talk on getting Solaris up and running in the compile farm. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
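The kind of overlap reported above can be checked mechanically. The sketch below uses POSIX `fnmatch` to count how many glob patterns claim a given mangled name; it assumes the linker's glob semantics are close to `fnmatch` with no flags, which may not hold exactly for Sun ld, and `count_matching_patterns` is a made-up helper name.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Return how many of the N glob PATTERNS match SYMBOL.  A result
   above 1 means the symbol would be claimed by more than one
   version node.  */
size_t
count_matching_patterns (const char *const *patterns, size_t n,
			 const char *symbol)
{
  size_t hits = 0;

  for (size_t i = 0; i < n; i++)
    if (fnmatch (patterns[i], symbol, 0) == 0)
      hits++;
  return hits;
}
```

Fed the two patterns quoted in the message, the basic_iostream constructor symbol is matched by both, while the basic_streambuf constructor symbol is matched only by the broader GLIBCXX_3.4 glob.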
PING: Re: [patch] rename DECL_ABSTRACT to DECL_ABSTRACT_P
On 09/18/14 12:39, Aldy Hernandez wrote: Yeah, sure, either way it's a good cleanup ;). No strong opinions. Though I think true/false are the way we want folks to write new code. Given that's the long term direction, might as well fix that nit for DECL_ABSTRACT_P. Alright... fixed. OK? Ping.
Re: [PATCH, i386, Pointer Bounds Checker 32/x] Pointer Bounds Checker hooks for i386 target
2014-09-23 21:17 GMT+04:00 Jeff Law l...@redhat.com: On 09/23/14 08:10, Ilya Enkovich wrote: Please use fold_convert (size_ptr, build_fold_addr_expr (var)). Is 'var' always accessed via a size_t effective type? Watch out for TBAA issues if not. (if it is, why is 'var' not of type size_t or size_t[]?) var has pointer bounds type. I have to initialize it by parts and thus access it as a couple of integers having size of a pointer (I use integer instead of pointer because non pointer arithmetic is used). Size type is not the best for this purpose and therefore I replace it with pointer_sized_int_node. So I have accesses of var's parts as integers and accesses of whole var as bounds. Should I expect some problems from TBAA here? How can I avoid problems with TBAA if any exists? In general, anytime you access a hunk of memory using two different types, then you run the risk of problems with TBAA. In the case of bounds, we aren't exposing them to usercode, so you just have to worry about the refs/sets that you create. I think you could create an alias set for the bounds and attach it to every load/store if you aren't type safe for all the loads/stores. That will create a dependency between all the bounds loads/stores, but not with unrelated loads/stores. Alternately ensure all the loads/stores are in alias set 0, but that will likely have performance implications.

I access parts of bounds using pointer_sized_int_node only in constructors which initialize static bound variables. These constructors do not have other usages of these vars and all other usages of these vars in other functions use bounds type for access. That should make me safe from TBAA point of view. Ilya

Jeff
Re: [PATCH] PR63300 'const volatile' sometimes stripped in debug info.
On 09/23/2014 06:53 PM, Mark Wielaard wrote: And for the default case (gcc doesn't create type sections by default) the optimization is useful. I'm skeptical. These DIEs are very small, and I wouldn't expect a hole in the qualifier space like this to come up that often. Jason
Re: [patch] Implement move semantics for iostreams
On 24/09/14 16:38 +0200, Rainer Orth wrote:

Hi Jonathan,

On 23/09/14 15:58 +0200, Rainer Orth wrote:

This patch broke Solaris bootstrap with Sun ld: when linking libstdc++.so, ld complains

ld: fatal: libstdc++-symbols.ver-sun: 4520: symbol 'std::basic_ios<char, std::char_traits<char> >::move(std::basic_ios<char, std::char_traits<char> >&&)': symbol version conflict

and many more. In that case, I find that this symbol is matched by both the GLIBCXX_3.4 and GLIBCXX_3.4.21 patterns:

GLIBCXX_3.4
##std::basic_i[g-r]* (cxx)
_ZNSt9basic_iosIcSt11char_traitsIcEE4moveEOS2_;
GLIBCXX_3.4.21
##_ZNSt9basic_iosI[cw]St11char_traitsI[cw]EE4moveE[OR]S2_ (glob)
_ZNSt9basic_iosIcSt11char_traitsIcEE4moveEOS2_;

Rainer, I think this patch should fix it, could you test it please?

almost there: now I only get

ld: fatal: libstdc++-symbols.ver-sun: 4622: symbol 'std::basic_ostream<wchar_t, std::char_traits<wchar_t> >::basic_ostream(std::basic_iostream<wchar_t, std::char_traits<wchar_t> >&)': symbol version conflict
ld: fatal: libstdc++-symbols.ver-sun: 4623: symbol 'std::basic_ostream<wchar_t, std::char_traits<wchar_t> >::basic_ostream(std::basic_iostream<wchar_t, std::char_traits<wchar_t> >&)': symbol version conflict

from

GLIBCXX_3.4:
##_ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]E[RP]* (glob)
_ZNSt13basic_ostreamIwSt11char_traitsIwEEC1ERSt14basic_iostreamIwS1_E;
_ZNSt13basic_ostreamIwSt11char_traitsIwEEC2ERSt14basic_iostreamIwS1_E;
GLIBCXX_3.4.21:
##_ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]ERSt14basic_iostreamIwS1_E (glob)
_ZNSt13basic_ostreamIwSt11char_traitsIwEEC1ERSt14basic_iostreamIwS1_E;
_ZNSt13basic_ostreamIwSt11char_traitsIwEEC2ERSt14basic_iostreamIwS1_E;

Doh, yes, this additional tweak should solve that:

index f736240..95fc3c7 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -460,7 +460,7 @@ GLIBCXX_3.4 {
 # std::basic_ostream<wchar_t>
 _ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]Ev;
-_ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]E[RP]*;
+_ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]EP*;
 _ZNSt13basic_ostreamIwSt11char_traitsIwEED*;
 _ZNKSt13basic_ostreamIwSt11char_traitsIwEE[0-9][a-z]*;
 _ZNSt13basic_ostreamIwSt11char_traitsIwEE3putEw;

The glob in the 3.4 version also matches

_ZNSt13basic_ostreamIwSt11char_traitsIwEEC1EPSt15basic_streambufIwS1_E;
_ZNSt13basic_ostreamIwSt11char_traitsIwEEC2EPSt15basic_streambufIwS1_E;

Yes, that's all it needs to match, so changing [RP] to just P should work.

(I tried installing Solaris in a VM but couldn't get it to work, maybe I should use the VirtualBox image instead of trying qemu/kvm.)

VirtualBox works for me in principle, but I often found bootstrapping gcc inside some VM almost intolerably slow... There's been some talk on getting Solaris up and running in the compile farm.

That would be very useful. Thanks for the quick testing and analysis.
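The overlap between the two version nodes can be reproduced outside the linker. The following is only a rough sketch, not part of the patch: it assumes the version-script globs behave like the shell-style patterns of Python's fnmatch module, and the helper name conflicting_symbols and the dictionaries are made up for illustration. The symbol names and patterns are the ones quoted in the thread.

```python
from fnmatch import fnmatchcase

# The four wchar_t basic_ostream constructor symbols quoted in the thread.
SYMBOLS = [
    "_ZNSt13basic_ostreamIwSt11char_traitsIwEEC1ERSt14basic_iostreamIwS1_E",
    "_ZNSt13basic_ostreamIwSt11char_traitsIwEEC2ERSt14basic_iostreamIwS1_E",
    "_ZNSt13basic_ostreamIwSt11char_traitsIwEEC1EPSt15basic_streambufIwS1_E",
    "_ZNSt13basic_ostreamIwSt11char_traitsIwEEC2EPSt15basic_streambufIwS1_E",
]


def conflicting_symbols(symbols, nodes):
    """Return the symbols matched by patterns of more than one version node.

    Hypothetical helper: `nodes` maps a version node name to its glob list.
    """
    conflicts = []
    for sym in symbols:
        owners = [name for name, patterns in nodes.items()
                  if any(fnmatchcase(sym, pat) for pat in patterns)]
        if len(owners) > 1:
            conflicts.append(sym)
    return conflicts


PREFIX = "_ZNSt13basic_ostreamIwSt11char_traitsIwEEC[12]E"
NEW_NODE = [PREFIX + "RSt14basic_iostreamIwS1_E"]

# Before the tweak the old node uses [RP]*, afterwards only P*.
before = {"GLIBCXX_3.4": [PREFIX + "[RP]*"], "GLIBCXX_3.4.21": NEW_NODE}
after = {"GLIBCXX_3.4": [PREFIX + "P*"], "GLIBCXX_3.4.21": NEW_NODE}

print(conflicting_symbols(SYMBOLS, before))
print(conflicting_symbols(SYMBOLS, after))
```

Under that assumption, the two reference-taking constructors are claimed by both nodes with the [RP]* pattern, and by only one node once the pattern is narrowed to P*, while the streambuf-pointer constructors stay in GLIBCXX_3.4.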
Re: parallel check output changes?
On 09/23/2014 11:33 AM, Richard Sandiford wrote:

Segher Boessenkool seg...@kernel.crashing.org writes:

On Thu, Sep 18, 2014 at 01:44:55PM -0500, Segher Boessenkool wrote:

I am testing a patch that is just

diff --git a/contrib/dg-extract-results.py b/contrib/dg-extract-results.py
index cccbfd3..3781423 100644
--- a/contrib/dg-extract-results.py
+++ b/contrib/dg-extract-results.py
@@ -117,7 +117,7 @@ class Prog:
         self.tool_re = re.compile (r'^\t\t=== (.*) tests ===$')
         self.result_re = re.compile (r'^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED'
                                      r'|WARNING|ERROR|UNSUPPORTED|UNTESTED'
-                                     r'|KFAIL):\s*(\S+)')
+                                     r'|KFAIL):\s*(.+)')
         self.completed_re = re.compile (r'.* completed at (.*)')
         # Pieces of text to write at the head of the output.
         # start_line is a pair in which the first element is a datetime

Tested that with four runs on powerpc64-linux, four configs each time; test-summary shows the same in all cases. Many lines have moved compared to without the patch, but that cannot be helped. Okay for mainline?

2014-09-19 Segher Boessenkool seg...@kernel.crashing.org

contrib/
* dg-extract-results.py (Prog.result_re): Include options in test name.

FWIW, the \S+ thing was deliberate. When one test is run multiple times with different options, those options aren't necessarily tried in alphabetical order. The old sh/awk script therefore used just the test name as the key and kept tests with the same name in the order that they were encountered:

/^(PASS|XPASS|FAIL|XFAIL|UNRESOLVED|WARNING|ERROR|UNSUPPORTED|UNTESTED|KFAIL):/ {
  testname=\$2
  # Ugly hack for gfortran.dg/dg.exp
  if ("$TOOL" == "gfortran" && testname ~ /^gfortran.dg\/g77\//)
    testname="h"testname
}

(note the $2). This means that the output of the script is in the same order as it would be for non-parallel runs. I was following (or trying to follow) that behaviour in the python script. Your patch instead sorts based on the full test name, including options, which means that the output no longer matches what you'd get from a non-parallel run. AFAICT, it also no longer matches what you'd get from the .sh version. That might be OK, just thought I'd mention it.

Thanks, Richard

Is this supposed to be resolved now? I'm still seeing some issues with a branch cut from mainline from yesterday. This is from the following sequence: check out revision 215511, build, make -j16 check, make -j16 check, then compare all the .sum files:

PASS: gcc.dg/tls/asm-1.c (test for errors, line 7)
PASS: gcc.dg/tls/asm-1.c (test for excess errors)
PASS: gcc.dg/tls/debug-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 4)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 5)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 6)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 7)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 11)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 12)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 13)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 14)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 17)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 18)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 19)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 20)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 22)

and then

PASS: gcc.dg/tls/asm-1.c (test for errors, line 7)
PASS: gcc.dg/tls/asm-1.c (test for excess errors)
PASS: gcc.dg/tls/debug-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-1.c (test for excess errors)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 11)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 12)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 13)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 14)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 17)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 18)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 19)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 20)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 22)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 4)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 5)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 6)
PASS: gcc.dg/tls/diag-2.c (test for errors, line 7)

it looks like the first time sorted by line numerically (or just happened to leave the run order) and the second time did the sort alphabetically...

Andrew
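The effect of the two capture groups on ordering can be illustrated in isolation. This is a simplified sketch and not the logic of dg-extract-results.py itself: it assumes the results are ordered with a stable sort on the captured key, the regexes are cut down to two result kinds, and sort_by_key is a made-up helper.

```python
import re

# Reduced versions of the two result_re variants from the patch.
OLD_RE = re.compile(r'^(PASS|FAIL):\s*(\S+)')  # key is the test name only
NEW_RE = re.compile(r'^(PASS|FAIL):\s*(.+)')   # key is the rest of the line

# Result lines in the order a non-parallel run would emit them.
LINES = [
    "PASS: gcc.dg/tls/diag-2.c (test for errors, line 4)",
    "PASS: gcc.dg/tls/diag-2.c (test for errors, line 7)",
    "PASS: gcc.dg/tls/diag-2.c (test for errors, line 11)",
    "PASS: gcc.dg/tls/diag-1.c (test for excess errors)",
]


def sort_by_key(lines, regex):
    """Stable sort on the second capture group (hypothetical helper)."""
    return sorted(lines, key=lambda line: regex.match(line).group(2))


for line in sort_by_key(LINES, OLD_RE):
    print("old:", line)
for line in sort_by_key(LINES, NEW_RE):
    print("new:", line)
```

With the name-only key, the three diag-2.c lines compare equal and keep their encounter order (4, 7, 11). With the full-line key they are compared as strings, so "line 11" sorts before "line 4", which is the alphabetical ordering seen in the second .sum listing.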
Re: [PATCH 2/5] Existing call graph infrastructure enhancement
Hi. Following patch enhances API functions to be ready for main patch of this patchset. Ready for trunk? Thank you, Martin

gcc/ChangeLog:

2014-09-21 Martin Liška mli...@suse.cz

* cgraph.c (cgraph_node::release_body): New argument keep_arguments introduced.
* cgraph.h: Likewise.
* cgraphunit.c (cgraph_node::create_wrapper): Usage of new argument introduced.
* ipa-devirt.c (polymorphic_type_binfo_p): Safe check for binfos created by Java.
* tree-ssa-alias.c (ao_ref_base_alias_set): Static function transformed to global.
* tree-ssa-alias.h: Likewise.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 8f04284..d40a2922 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1637,13 +1637,15 @@ release_function_body (tree decl)
    are free'd in final.c via free_after_compilation(). */
 
 void
-cgraph_node::release_body (void)
+cgraph_node::release_body (bool keep_arguments)
 {
   ipa_transforms_to_apply.release ();
   if (!used_as_abstract_origin && symtab->state != PARSING)
     {
       DECL_RESULT (decl) = NULL;
-      DECL_ARGUMENTS (decl) = NULL;
+
+      if (!keep_arguments)
+        DECL_ARGUMENTS (decl) = NULL;
     }
   /* If the node is abstract and needed, then do not clear DECL_INITIAL
      of its associated function function declaration because it's
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index a316e40..19ce3b8 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -915,7 +915,7 @@ public:
      Use this only for functions that are released before being translated to
      target code (i.e. RTL). Functions that are compiled to RTL and beyond
      are free'd in final.c via free_after_compilation(). */
-  void release_body (void);
+  void release_body (bool keep_arguments = false);

Please add documentation for KEEP_ARGUMENTS explaining that it is useful only if you want to rebuild body as thunk.

   /* cgraph_node is no longer nested function; update cgraph accordingly. */
   void unnest (void);
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 3e3b8d2..c4597e2 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2300,7 +2300,7 @@ cgraph_node::create_wrapper (cgraph_node *target)
   tree decl_result = DECL_RESULT (decl);
 
   /* Remove the function's body. */

I would say Remove the function's body but keep arguments to be reused for thunk.

-  release_body ();
+  release_body (true);
   reset ();
 
   DECL_RESULT (decl) = decl_result;
diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index af42c6d..f374933 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -225,7 +225,7 @@ static inline bool
 polymorphic_type_binfo_p (tree binfo)
 {
   /* See if BINFO's type has an virtual table associtated with it. */
-  return BINFO_VTABLE (TYPE_BINFO (BINFO_TYPE (binfo)));
+  return BINFO_TYPE (binfo) && BINFO_VTABLE (TYPE_BINFO (BINFO_TYPE (binfo)));

Aha, this change was for Java, right? Please add comment that Java produces BINFOs without BINFO_TYPE set.

 }
 
 /* Return TRUE if all derived types of T are known and thus
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 442112a..1bf88e2 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -559,7 +559,7 @@ ao_ref_base (ao_ref *ref)
 
 /* Returns the base object alias set of the memory reference *REF. */
 
-static alias_set_type
+alias_set_type
 ao_ref_base_alias_set (ao_ref *ref)
 {
   tree base_ref;
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index 436381a..0d35283 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -98,6 +98,7 @@ extern void ao_ref_init (ao_ref *, tree);
 extern void ao_ref_init_from_ptr_and_size (ao_ref *, tree, tree);
 extern tree ao_ref_base (ao_ref *);
 extern alias_set_type ao_ref_alias_set (ao_ref *);
+extern alias_set_type ao_ref_base_alias_set (ao_ref *);

I can not approve this change, but I suppose it is what Richard suggested? Patch is OK except for the tree-ssa-alias bits.

Honza

 extern bool ptr_deref_may_alias_global_p (tree);
 extern bool ptr_derefs_may_alias_p (tree, tree);
 extern bool ref_may_alias_global_p (tree);
Re: [PATCH 2/14][Vectorizer] Make REDUC_xxx_EXPR tree codes produce a scalar result
So it looks like patches 1-6 (reduc_foo) are relatively close to final, and given these fix PR/61114, I'm gonna try to land these while working on a respin of the second half (vec_shr)...(summary: yes I like the vec_perm idea too, but the devil is in the detail!)

However my CompileFarm account is still pending, so to that end, if you were able to test patch 2/14 (attached inc. Richie's s/VIEW_CONVERT_EXPR/NOP_EXPR/) on the CompileFarm PowerPC machine, that'd be great, many thanks indeed. It should apply on its own without patch 1.

I'll aim to get an alternative patch 3 back to the list shortly, and follow up with .md updates to the various backends.

Cheers, Alan

Richard Biener wrote:

On Thu, Sep 18, 2014 at 1:50 PM, Alan Lawrence alan.lawre...@arm.com wrote:

This fixes PR/61114 by redefining the REDUC_{MIN,MAX,PLUS}_EXPR tree codes. These are presently documented as producing a vector with the result in element 0, and this is inconsistent with their use in tree-vect-loop.c (which on bigendian targets pulls the bits out of the wrong end of the vector result). This leads to bugs on bigendian targets - see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114.

I discounted fixing the vectorizer (to read from element 0) and then making bigendian targets (whose architectural insn produces the result in lane N-1) permute the result vector, as optimization of vectors in RTL seems unlikely to remove such a permute and would lead to a performance regression.

Instead it seems more natural for the tree code to produce a scalar result (producing a vector with the result in lane 0 has already caused confusion, e.g. https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01100.html).

However, this patch preserves the meaning of the optab (producing a result in lane 0 on little-endian architectures or N-1 on bigendian), thus generally avoiding the need to change backends. Thus, expr.c extracts an endianness-dependent element from the optab result to give the result expected for the tree code.

Previously posted as an RFC https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html , now with an extra VIEW_CONVERT_EXPR if the types of the reduction/result do not match.

Huh. Does that ever happen? Please use a NOP_EXPR instead of a VIEW_CONVERT_EXPR. Ok with that change. Thanks, Richard.

Testing:
x86_64-none-linux-gnu: bootstrap, check-gcc, check-g++
aarch64-none-linux-gnu: bootstrap
aarch64-none-elf: check-gcc, check-g++
arm-none-eabi: check-gcc
aarch64_be-none-elf: check-gcc, showing
FAIL->PASS: gcc.dg/vect/no-scevccp-outer-7.c execution test
FAIL->PASS: gcc.dg/vect/no-scevccp-outer-13.c execution test
Passes the (previously-failing) reduced testcase on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61114
Have also assembler/stage-1 tested that testcase on PowerPC, also fixed.

gcc/ChangeLog:

* expr.c (expand_expr_real_2): For REDUC_{MIN,MAX,PLUS}_EXPR, add extract_bit_field around optab result.
* fold-const.c (fold_unary_loc): For REDUC_{MIN,MAX,PLUS}_EXPR, produce scalar not vector.
* tree-cfg.c (verify_gimple_assign_unary): Check result vs operand type for REDUC_{MIN,MAX,PLUS}_EXPR.
* tree-vect-loop.c (vect_analyze_loop): Update comment. (vect_create_epilog_for_reduction): For direct vector reduction, use result of tree code directly without extract_bit_field.
* tree.def (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR): Update comment.

commit a7b173d5efc6f08589b04fffeec9b3942b6282a0
Author: Alan Lawrence alan.lawre...@arm.com
Date: Tue Jul 29 11:46:01 2014 +0100

Make tree codes produce scalar, with NOP_EXPRs. (tree-vect-loop.c mess)

diff --git a/gcc/expr.c b/gcc/expr.c
index a6233f3..c792028 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9044,7 +9044,17 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
       {
         op0 = expand_normal (treeop0);
         this_optab = optab_for_tree_code (code, type, optab_default);
-        temp = expand_unop (mode, this_optab, op0, target, unsignedp);
+        enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
+        temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
+        gcc_assert (temp);
+        /* The tree code produces a scalar result, but (somewhat by convention)
+           the optab produces a vector with the result in element 0 if
+           little-endian, or element N-1 if big-endian. So pull the scalar
+           result out of that element. */
+        int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0;
+        int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode));
+        temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp,
+                                  target, mode, mode);
         gcc_assert (temp);
         return temp;
       }
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index
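The element selection in the expr.c hunk above can be modelled in a few lines. This is a toy model for illustration only, not GCC code: vectors are plain Python lists indexed in the element numbering the patch comment uses, and the function names are invented here.

```python
def reduction_lane(nunits, big_endian):
    """Element that holds the optab result: 0 if little-endian, N-1 otherwise."""
    return nunits - 1 if big_endian else 0


def extract_reduction(lanes, lane_bits, big_endian):
    """Return (bit offset, scalar) for a reduced vector given as a list.

    Mirrors the index/bitsize arithmetic of the hunk: offset = bitsize * index.
    """
    index = reduction_lane(len(lanes), big_endian)
    return lane_bits * index, lanes[index]


# A hypothetical 4 x 32-bit optab result whose reduction value is 42.
little = [42, 0, 0, 0]
big = [0, 0, 0, 42]

print(extract_reduction(little, 32, big_endian=False))
print(extract_reduction(big, 32, big_endian=True))
# Always reading element 0 on the big-endian layout picks the wrong lane:
print(extract_reduction(big, 32, big_endian=False))
```

The last call shows the kind of wrong-end read described in the PR: on the big-endian layout, element 0 does not contain the reduction result.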
[AArch64] Wire up vqdmullh_laneq_s16 and vqdmullh_laneq_s32
Hi, As per the subject line this patch adds support for two arm_neon.h intrinsics that we had missed. We also need to fix the signature of vqdmulls_lane_s32, which is an obvious extension to this patch while we are in the area. Tested for simd.exp and aarch64.exp with no issues. OK? Thanks, James --- gcc/ 2014-09-24 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64-simd-builtins.def (sqdmull_laneq): Expand iterator. * config/aarch64/aarch64-simd.md (aarch64_sqdmull_laneqmode): Expand iterator. * config/aarch64/arm_neon.h (vqdmullh_laneq_s16): New. (vqdmulls_lane_s32): Fix return type. (vqdmulls_laneq_s32): New. gcc/testsuite/ 2014-09-24 James Greenhalgh james.greenha...@arm.com * gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: New. * gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise. * gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Fix return type. * gcc.target/aarch64/scalar_intrinsics.c (test_vqdmulls_s32): Fix return type. diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index de264c4..2367436 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -155,7 +155,7 @@ BUILTIN_VSD_HSI (BINOP, sqdmull, 0) BUILTIN_VSD_HSI (TERNOP, sqdmull_lane, 0) - BUILTIN_VD_HSI (TERNOP, sqdmull_laneq, 0) + BUILTIN_VSD_HSI (TERNOP, sqdmull_laneq, 0) BUILTIN_VD_HSI (BINOP, sqdmull_n, 0) BUILTIN_VQ_HSI (BINOP, sqdmull2, 0) BUILTIN_VQ_HSI (TERNOP, sqdmull2_lane, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 493e88628c2a7ef2c4f87031d86d1a5edcbca06b..45ea9d7895e93d4c4b137de1c01f6a1e93942d11 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3398,7 +3398,7 @@ (define_expand aarch64_sqdmull_lanemod (define_expand aarch64_sqdmull_laneqmode [(match_operand:VWIDE 0 register_operand =w) - (match_operand:VD_HSI 1 register_operand w) + (match_operand:VSD_HSI 1 register_operand w) 
(match_operand:VCONQ 2 register_operand vwx) (match_operand:SI 3 immediate_operand i)] TARGET_SIMD diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index feca00e..9b1873f 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -19420,16 +19420,28 @@ vqdmullh_lane_s16 (int16_t __a, int16x4_t __b, const int __c) return __builtin_aarch64_sqdmull_lanehi (__a, __b, __c); } +__extension__ static __inline int32_t __attribute__ ((__always_inline__)) +vqdmullh_laneq_s16 (int16_t __a, int16x8_t __b, const int __c) +{ + return __builtin_aarch64_sqdmull_laneqhi (__a, __b, __c); +} + __extension__ static __inline int64_t __attribute__ ((__always_inline__)) vqdmulls_s32 (int32_t __a, int32_t __b) { return __builtin_aarch64_sqdmullsi (__a, __b); } -__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +__extension__ static __inline int64_t __attribute__ ((__always_inline__)) vqdmulls_lane_s32 (int32_t __a, int32x2_t __b, const int __c) { - return (int64x1_t) {__builtin_aarch64_sqdmull_lanesi (__a, __b, __c)}; + return __builtin_aarch64_sqdmull_lanesi (__a, __b, __c); +} + +__extension__ static __inline int64_t __attribute__ ((__always_inline__)) +vqdmulls_laneq_s32 (int32_t __a, int32x4_t __b, const int __c) +{ + return __builtin_aarch64_sqdmull_laneqsi (__a, __b, __c); } /* vqmovn */ diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c index c07c94c..ea29066 100644 --- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c +++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c @@ -501,7 +501,7 @@ test_vqdmulls_s32 (int32_t a, int32_t b) /* { dg-final { scan-assembler-times \\tsqdmull\\td\[0-9\]+, s\[0-9\]+, v 1 } } */ -int64x1_t +int64_t test_vqdmulls_lane_s32 (int32_t a, int32x2_t b) { return vqdmulls_lane_s32 (a, b, 1); diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c 
b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c new file mode 100644 index 000..947ebf4 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vqdmullh_laneq_s16.c @@ -0,0 +1,15 @@ +/* Test the vqdmullh_laneq_s16 AArch64 SIMD intrinsic. */ + +/* { dg-do compile } */ +/* { dg-options -save-temps -O3 -fno-inline } */ + +#include arm_neon.h + +int32_t +t_vqdmullh_laneq_s16 (int16_t a, int16x8_t b) +{ + return vqdmullh_laneq_s16 (a, b, 0); +} + +/* { dg-final { scan-assembler-times sqdmull\[ \t\]+\[sS\]\[0-9\]+, ?\[hH\]\[0-9\]+, ?\[vV\]\[0-9\]+\.\[hH\]\\\[0\\\]\n 1 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c b/gcc/testsuite/gcc.target/aarch64/simd/vqdmulls_lane_s32.c index 6ed8e3a..24daaab 100644 ---
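For readers unfamiliar with the instruction behind these intrinsics, the arithmetic of a signed saturating doubling multiply long can be modelled in plain Python. This is a reference model written for illustration, not the intrinsic or its test: the function names ending in _model are invented here, and the lane argument is assumed to simply select one element of the 8-element vector.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def saturate(value, lo, hi):
    """Clamp value into the closed range [lo, hi]."""
    return max(lo, min(hi, value))


def sqdmull_h_model(a, b):
    """16-bit x 16-bit -> 32-bit: double the product, then saturate."""
    for operand in (a, b):
        if not INT16_MIN <= operand <= INT16_MAX:
            raise ValueError("operand does not fit in int16_t")
    return saturate(2 * a * b, INT32_MIN, INT32_MAX)


def vqdmullh_laneq_s16_model(a, b_vec, lane):
    """Model of the scalar-by-lane form with an int16x8_t-like list."""
    if len(b_vec) != 8:
        raise ValueError("expected 8 lanes")
    if not 0 <= lane < 8:
        raise ValueError("lane out of range")
    return sqdmull_h_model(a, b_vec[lane])


print(vqdmullh_laneq_s16_model(2, [3, 0, 0, 0, 0, 0, 0, 0], 0))
print(vqdmullh_laneq_s16_model(INT16_MIN, [0] * 7 + [INT16_MIN], 7))
```

The only input pair that saturates in the 16-bit form is INT16_MIN times INT16_MIN, whose doubled product is one past INT32_MAX. The result always fits a 32-bit scalar, which is consistent with the patch returning int32_t (and int64_t for the 32-bit variants) rather than a one-element vector type.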
Re: [PATCH] Fix PR 58867: asan and ubsan tests not run for installed testing.
Hi Andrew! I tried to run the ASan and UBSan tests against an installed toolchain, but this failed because GCC currently doesn't support that. I see you had fixed this issue (http://patchwork.ozlabs.org/patch/286866/), but the patch wasn't applied to GCC. So I wonder if you are going to commit it. -Maxim
Re: PING: Re: [patch] rename DECL_ABSTRACT to DECL_ABSTRACT_P
On 09/24/14 08:40, Aldy Hernandez wrote: On 09/18/14 12:39, Aldy Hernandez wrote: Yeah, sure, either way it's a good cleanup ;). No strong opinions. Though I think true/false are the way we want folks to write new code. Given that's the long term direction, might as well fix that nit for DECL_ABSTRACT_P. Alright... fixed. OK? Ping. OK for the trunk. Sorry I didn't pre-approve the trivial update. Jeff
Re: [PATCH, Pointer Bounds Checker 22/x] Inline
On 09/24/14 01:28, Ilya Enkovich wrote:

I'm a bit curious why you removed the original RETBND statement in value-prof, only to reinsert it. Is there some reason you needed to do that?

After call transformation we have something like this:

if (condition)
  new_lhs = direct_call (...);
else
  old_lhs = call (...);
old_bnd = __builtin_retbnd (old_lhs);

Original retbnd statement removal + reinsertion is used to transform it into:

if (condition)
  new_lhs = direct_call (...);
else
  {
    old_lhs = call (...);
    old_bnd = __builtin_retbnd (old_lhs);
  }

The rest of code inserts bounds for new_lhs and creates phi node for bounds similar to what is done for call return value.

Oh yea, makes perfect sense, the earlier code inserted the conditional, but left the bounds setting bits in their prior (now the merge point) location.

Thanks, Jeff
Re: [GOOGLE] Fix new tests
not sure if there is a better way, but ok. David On Wed, Sep 24, 2014 at 6:20 AM, Teresa Johnson tejohn...@google.com wrote: The new tests added for -mpatch-functions-for-instrumentation did not correctly restrict themselves to x86_64 since tree-prof.exp doesn't support dg-do. Work around this by using target selectors on the dg-options. I apply the -mpatch and related options only if it is x86_64, otherwise it simply does splitting. Ok for google branches? Teresa 2014-09-24 Teresa Johnson tejohn...@google.com * testsuite/gcc.dg/tree-prof/cold_partition_patch.c: * testsuite/g++.dg/tree-prof/partition_patch.C: Index: testsuite/gcc.dg/tree-prof/cold_partition_patch.c === --- testsuite/gcc.dg/tree-prof/cold_partition_patch.c (revision 215525) +++ testsuite/gcc.dg/tree-prof/cold_partition_patch.c (working copy) @@ -1,8 +1,7 @@ /* Check if patching works with function splitting. */ -/* { dg-do compile { target x86_64-*-* } } */ /* { dg-require-effective-target freorder } */ -/* { dg-options -O2 -freorder-blocks-and-partition -save-temps -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls } */ - +/* { dg-options -O2 -freorder-blocks-and-partition -save-temps { target { ! x86_64-*-* } } } +/* { dg-options -O2 -freorder-blocks-and-partition -save-temps -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls { target x86_64-*-* } } */ #define SIZE 1 const char *sarr[SIZE]; Index: testsuite/g++.dg/tree-prof/partition_patch.C === --- testsuite/g++.dg/tree-prof/partition_patch.C(revision 215525) +++ testsuite/g++.dg/tree-prof/partition_patch.C(working copy) @@ -1,7 +1,7 @@ // Check if patching works with function splitting. -// { dg-do compile { target x86_64-*-* } } // { dg-require-effective-target freorder } -// { dg-options -O2 -fnon-call-exceptions -freorder-blocks-and-partition -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls } +// { dg-options -O2 -fnon-call-exceptions -freorder-blocks-and-partition { target { ! 
x86_64-*-* } } } +// { dg-options -O2 -fnon-call-exceptions -freorder-blocks-and-partition -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls { target x86_64-*-* } } int k; -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[Patch] Fix PR61889 for the w64-mingw32 case
The following patch fixes PR61889 for x86_64-w64-mingw32. Details can be found at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61889

The patch was bootstrapped on x86_64-w64-mingw32. If the patch is OK, Kai, would you apply it, please?

Rainer

2014-09-24 Rainer Emrich rai...@emrich-ebersheim.de

PR gcov-profile/61889
* gcc/gcov-tool.c: Remove wrong #if !defined(_WIN32)
* libgcc/libgcov-driver-system.c: undefine clashing macro for mkdir

Index: gcc/gcov-tool.c
===
--- gcc/gcov-tool.c (Revision 215554)
+++ gcc/gcov-tool.c (Arbeitskopie)
@@ -89,11 +89,7 @@ gcov_output_files (const char *out, stru
   /* Try to make directory if it doesn't already exist. */
   if (access (out, F_OK) == -1)
     {
-#if !defined(_WIN32)
       if (mkdir (out, S_IRWXU | S_IRWXG | S_IRWXO) == -1 && errno != EEXIST)
-#else
-      if (mkdir (out) == -1 && errno != EEXIST)
-#endif
         fatal_error ("Cannot make directory %s", out);
     }
   else
     unlink_profile_dir (out);
Index: libgcc/libgcov-driver-system.c
===
--- libgcc/libgcov-driver-system.c (Revision 215554)
+++ libgcc/libgcov-driver-system.c (Arbeitskopie)
@@ -66,6 +66,9 @@ create_file_directory (char *filename)
 #ifdef TARGET_POSIX_IO
             mkdir (filename, 0755) == -1
 #else
+#ifdef mkdir
+#undef mkdir
+#endif
             mkdir (filename) == -1
 #endif
             /* The directory might have been made by another process. */
Re: [wwwdocs] Update C++1y status page now that C++14 is finished.
On Sep 24, 2014, at 5:54 AM, Jonathan Wakely jwak...@redhat.com wrote:

C++14 is no longer the next standard, it's here, so update the project page.

Can we have a web doc person update the name of the page (projects/cxx1y.html -> projects/cxx14.html) and add a redirect as necessary?
[wwwdocs] IPA/LTO/FDO updates for gcc-5/changes.html
Hi, this patch adds list of changes to IPA/LTO/FDO before I forget about them ;) Honza Index: changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v retrieving revision 1.9 diff -c -p -r1.9 changes.html *** changes.html5 Sep 2014 08:25:46 - 1.9 --- changes.html24 Sep 2014 15:23:35 - *** *** 17,22 --- 17,67 h2 id=generalGeneral Optimizer Improvements/h2 ul + liInter-procedural optimization improvements: + ul + liDevirtualization pass was significantly improved by adding +better support for speculative devirtualization and dynamic type +detection. About 50% of virtual calls in Firefox are speculatively +devirtualized during link-time optimization. + liNew comdat localization pass lets linker to eliminate more dead code +in presence of C++ inline functions./li + liVirtual tables are now optimized. Local aliases are used to reduce +dynamic linking time of C++ virtual tables on ELF targets and +data alignment has been reduced to limit data segment bloat./li + liNew code-fno-semantic-interposition/code flag can be used +to improve code quality of shared libraries where interposition of +exported symbols is not allowed./li + liWrite-only variables are now detected and optimized out./li + liWith profile feedback the function inliner can now bypass +code--param inline-insns-auto/code and code--param +inline-insns-single/code limits for hot calls./li + liIPA reference pass was significantly sped up making it feasible +to enable code-fipa-reference/code with +code-fprofile-generage/code. This also solve bottleneck +seen when optimizing Chromium with link time optimization./li + liSymbol table and call-graph API was reworked to C++ and +simplified./li + /ul/li + liLink-time optimization improvements: + ul + liNew One Definition Rule based merging of C++ types implemented. + Type merging enables better devirtualization and alias analysis. + Streaming extra information needed to merge types adds about 2-6% of + memory size and object size increase. 
This can be controlled by + code-flto-odr-type-merging/code./li + liGCC bootstrap now use slim LTO object files./li + liMemory usage and link times was improved. Tree merging was sped up, + memory usage of GIMPLE declarations and types was reduced, and, + support for on-demand streaming of variable constructors was added./li + /ul/li + liFeedback directed optimization improvements: + ul + liProfile precision was improved in presence of C++ inline and extern + inline functions./li + liNew codegcov-tool/code to manipulate profiles./li + liProfile is now more tolerant to source file changes (this can be + controlled by code--param profile-func-internal-id/code)./li + /ul/li liUndefinedBehaviorSanitizer gained a few new sanitization options: ul licode-fsanitize=float-divide-by-zero/code: detect floating-point *** *** 54,59 --- 99,107 liFull support for a href=https://www.cilkplus.org/;Cilk Plus/a has been added to the GCC compiler. Cilk Plus is an extension to the C and C++ languages to support data and task parallelism./li + liNew attribute codeno_reorder/code prevents reordering of selected symbols. + This enables to link-time optimize Linux kernel without need to use + code-fno-toplevel-reorder/code that disable several optimizations./li /ul h3 id=cC/h3 *** *** 90,95 --- 138,152 liAn implementation of codestd::experimental::any/code./li liNew random number distributions codelogistic_distribution/code and codeuniform_on_sphere_distribution/code as extensions./li + liNew One Definition Rule violation warning (controlled by code-Wodr/code) + detects mismatches in type definitions and virtual table contents + during link-time optimization./li + liNew warnings code-Wsuggest-final-types/code and + code-Wsuggest-final-methods/code helps developers + to annotate programs by codefinal/code specifiers (or anonymous + namespaces) in the cases where code generation improves. 
+ These warnings can be used at compile time, but they are more + useful in combination with link-time optimization./li /ul h3 id=fortranFortran/h3
Re: Enable EBX for x86 in 32bits PIC code
On 09/24/14 00:56, Ilya Enkovich wrote:

2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com:

On 09/23/14 10:03, Jakub Jelinek wrote:

On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:

On 09/23/14 08:34, Jakub Jelinek wrote:

On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:

use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and

For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed).

Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way.

What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor.

Ah, yea, that makes sense. jeff

After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied:

In theory this shouldn't be too hard to fix. I haven't looked at the code, but it might be something looking explicitly for ebx by register #, or something similar. Which case within delegitimize_address isn't firing as it should after your changes?

jeff
Re: [GOOGLE] Fix new tests
On Wed, Sep 24, 2014 at 8:23 AM, Xinliang David Li davi...@google.com wrote: not sure if there is a better way, but ok. I looked through the documentation and other tests last night, but couldn't come up with a better way unfortunately. Teresa David On Wed, Sep 24, 2014 at 6:20 AM, Teresa Johnson tejohn...@google.com wrote: The new tests added for -mpatch-functions-for-instrumentation did not correctly restrict themselves to x86_64 since tree-prof.exp doesn't support dg-do. Work around this by using target selectors on the dg-options. I apply the -mpatch and related options only if it is x86_64, otherwise it simply does splitting. Ok for google branches? Teresa 2014-09-24 Teresa Johnson tejohn...@google.com * testsuite/gcc.dg/tree-prof/cold_partition_patch.c: * testsuite/g++.dg/tree-prof/partition_patch.C: Index: testsuite/gcc.dg/tree-prof/cold_partition_patch.c === --- testsuite/gcc.dg/tree-prof/cold_partition_patch.c (revision 215525) +++ testsuite/gcc.dg/tree-prof/cold_partition_patch.c (working copy) @@ -1,8 +1,7 @@ /* Check if patching works with function splitting. */ -/* { dg-do compile { target x86_64-*-* } } */ /* { dg-require-effective-target freorder } */ -/* { dg-options -O2 -freorder-blocks-and-partition -save-temps -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls } */ - +/* { dg-options -O2 -freorder-blocks-and-partition -save-temps { target { ! x86_64-*-* } } } +/* { dg-options -O2 -freorder-blocks-and-partition -save-temps -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls { target x86_64-*-* } } */ #define SIZE 1 const char *sarr[SIZE]; Index: testsuite/g++.dg/tree-prof/partition_patch.C === --- testsuite/g++.dg/tree-prof/partition_patch.C(revision 215525) +++ testsuite/g++.dg/tree-prof/partition_patch.C(working copy) @@ -1,7 +1,7 @@ // Check if patching works with function splitting. 
-// { dg-do compile { target x86_64-*-* } } // { dg-require-effective-target freorder } -// { dg-options -O2 -fnon-call-exceptions -freorder-blocks-and-partition -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls } +// { dg-options -O2 -fnon-call-exceptions -freorder-blocks-and-partition { target { ! x86_64-*-* } } } +// { dg-options -O2 -fnon-call-exceptions -freorder-blocks-and-partition -mpatch-functions-for-instrumentation -fno-optimize-sibling-calls { target x86_64-*-* } } int k; -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
On 09/24/14 07:31, Chen Gang wrote:

Hello Michael: Firstly, thank you very much for always helping me with microblaze. At present, after trying the testsuite, the result is much better than my original attempt. Please help check the result: is it enough for our microblaze testsuite (can we say it passes checking)?

Current result:

# of expected passes            65987
# of unexpected failures        82
# of unexpected successes       1
# of expected failures          97
# of unresolved testcases       16378
# of unsupported tests          1810

This is good.

Original result:

# of expected passes            48408
# of unexpected failures        17253
# of unexpected successes       1
# of expected failures          97
# of unresolved testcases       16570
# of unsupported tests          1854

After checking the current result log, I find many messages related to remote target tests; do we have to process them? e.g.

Download to microblaze-xilinx-gdb failed, couldn't execute rcp: no such file or directory.

The test suite uses rcp to transfer files to or from the target, either to provide input to a test case or to check the output. Most Linux systems do not install rcp, since it is a security risk.

And I guess it is a glibc bug: it still adds the root directory (e.g. /upstream/release) in 'libc.so' even when --with-sysroot is already given to configure.

Oh, sorry, glibc should also need --with-sysroot. I shall try it today, and hope it will make everything OK.

After adding --with-sysroot for glibc, this issue still exists. And I removed the redundant directory manually for libc.so and libpthread.so. If our microblaze testsuite is OK, I will skip this issue (since I do not have enough time for glibc at present).

OK with me.

-- Michael Eager ea...@eagercon.com 1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
On Sep 24, 2014, at 8:28 AM, Michael Eager ea...@eagerm.com wrote:

After check the current result log, I find many remote target test related sentences, do we have to process it? e.g. Download to microblaze-xilinx-gdb failed, couldn't execute rcp: no such file or directory.

The test suite uses rcp to transfer files to or from the target, either to provide input to a test case or to check the output. Most Linux systems do not install rcp, since it is a security risk.

To clarify:

    if {[board_info $desthost exists rcp_prog]} {
        set RCP [board_info $desthost rcp_prog]
    } else {
        set RCP rcp
    }

So, if you set rcp_prog to something else, you should be able to avoid rsh if you want. Most people use ssh now-a-days. You will want it set up to not require a password for testing.
Re: [PATCH i386 AVX512] [51/n] Add pd2dq and dq2pd converts.
On Wed, Sep 24, 2014 at 10:49 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, Patch in the bottom adds support for pd2dq and dq2pd conversions. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/i386.c (avx512f_ufix_notruncv8dfv8si_mask_round): Rename to ... (ufix_notruncv8dfv8si2_mask_round): this. (ufix_notruncv8dfv8si2_mask_round): ... this. * config/i386/sse.md (define_insn avx512f_cvtdq2pd512_2): Update TARGET check. (define_insn avx_cvtdq2pd256_2): Add EVEX version. (define_insn sse2_cvtdq2pdmask_name): Add masking. (define_insn avx_cvtpd2dq256mask_name): Ditto. (define_expand sse2_cvtpd2dq): Delete. (define_insn sse2_cvtpd2dqmask_name): Add masking. (define_insn avx512f_ufix_notruncv8dfv8simask_nameround_name): Delete. (define_mode_attr pd2udqsuff): New. (define_insn ufix_notruncmodesi2dfmodelower2mask_nameround_name): Ditto. (define_insn ufix_notruncv2dfv2si2mask_name): Ditto. (define_insn *avx_cvttpd2dq256_2): Delete. (define_expand sse2_cvttpd2dq): Ditto. (define_insn sse2_cvttpd2dqmask_name): Add masking. You didn't mention following no-op change (in two places): - (match_operand:V2SI 2 const0_operand)))] + (const_vector:V2SI [(const_int 0) (const_int 0)])))] OK with the updated ChangeLog. Thanks, Uros.
Re: [PATCH, i386, Pointer Bounds Checker 33/x] MPX ABI
On 09/24/14 01:05, Ilya Enkovich wrote:

However, we've still got the problem that the RTL you've generated is ill-formed. If I understand things correctly, the assignments are the result of the call, that should be modeled by having the destination be a PARALLEL as mentioned earlier.

OK. Will try it. BTW call_value_pop patterns have two sets. One for returned value and one for stack register. How comes it differs much from what I do with bound regs?

The semantics of a PARALLEL are that all the values used in the expressions are evaluated, then all the side effects are performed. So:

    (define_insn "*call_pop"
      [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lmBz"))
             (match_operand 1))
       (set (reg:SI SP_REG)
            (plus:SI (reg:SI SP_REG)
                     (match_operand:SI 2 "immediate_operand" "i")))]
      "!TARGET_64BIT && !SIBLING_CALL_P (insn)"
      "* return ix86_output_call_insn (insn, operands[0]);"
      [(set_attr "type" "call")])

The semantics of a PARALLEL indicate that the reference to SP_REG on the RHS of the 2nd assignment expression takes the value of SP_REG *prior to the call*. And those are the semantics we depend on.

So in your case the RHS references to BND0_REG and BND1_REG use the values *before* the call -- and I don't think that's the semantics you want. You might get away with it because of the UNSPEC wrapping, but IMHO, it's still ill-formed RTL.

jeff
Re: parallel check output changes?
On Wed, Sep 24, 2014 at 10:54:57AM -0400, Andrew MacLeod wrote:

On 09/23/2014 11:33 AM, Richard Sandiford wrote:

Your patch instead sorts based on the full test name, including options, which means that the output no longer matches what you'd get from a non-parallel run. AFAICT, it also no longer matches what you'd get from the .sh version. That might be OK, just thought I'd mention it.

With the parallelisation changes the output was in pretty random order. My patch made that a fixed order again, albeit a different one from before.

Is this supposed to be resolved now? I'm still seeing some issues with a branch cut from mainline from yesterday. This is from the following sequence: check out revision 215511, build, make -j16 check, make -j16 check, then compare all the .sum files:

I don't understand what exactly you did; you have left out some steps I think?

Segher
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
On 09/24/2014 11:28 PM, Michael Eager wrote:

On 09/24/14 07:31, Chen Gang wrote:

Hello Michael: Firstly, thank you very much for always helping me with microblaze. At present, after trying the testsuite, the result is much better than my original attempt. Please help check the result: is it enough for our microblaze testsuite (can we say it passes checking)?

Current result:

# of expected passes            65987
# of unexpected failures        82
# of unexpected successes       1
# of expected failures          97
# of unresolved testcases       16378
# of unsupported tests          1810

This is good.

OK, thanks. I shall send a fix patch for ((void (*)(void))0)() tomorrow. It passes the testsuite (old and new get the same result), and the new version fixes the ((void (*)(void))0)() issue. So I guess this fix is valid. :-)

Thanks.

-- Chen Gang Open share and attitude like air water and life which God blessed
Re: [PATCH i386 AVX512] [52/n] Add convert ps2pd and ps2dq.
On Wed, Sep 24, 2014 at 10:54 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, Patch in the bottom adds support for ps2dq and ps2pd conversions. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_c_enum unspec): Add UNSPEC_CVTINT2MASK. (define_insn fixsuffixfix_truncmodesselongvecmodelower2mask_nameround_saeonly_name): New. (define_insn fixsuffixfix_truncv2sfv2di2mask_name): Ditto. (define_insn ufix_truncmodesseintvecmodelower2mask_name): Ditto. (define_insn sse2_cvtss2sdround_saeonly_name): Change nonimmediate_operand to round_saeonly_nimm_predicate. (define_insn avx_cvtpd2ps256mask_name): Add masking. (define_expand sse2_cvtpd2ps_mask): New. (define_insn *sse2_cvtpd2psmask_name): Add masking. (define_insn avx512_cvtssemodesuffix2maskmode): New. (define_insn avx512_cvtmask2ssemodesuffixmode): Ditto. (define_insn sse2_cvtps2pdmask_name): Add masking. OK, modulo UNSPEC_CVTINT2MASK stuff. Please split out and repost UNSPEC_CVTINT2MASK part of the patch, as it doesn't belong in this one. Also, please see the question in the patch. Thanks, Uros. -- Thanks, K diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index b2e1d4f..c9d6e00 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -132,6 +132,7 @@ ;; For AVX512BW support UNSPEC_PSHUFHW UNSPEC_PSHUFLW + UNSPEC_CVTINT2MASK What was the reason to go with an unspec? The pattern that uses generic vector RTX is preferred, unless that kind of pattern is too complex. 
;; For AVX512DQ support UNSPEC_REDUCE @@ -4659,6 +4660,38 @@ (set_attr prefix evex) (set_attr mode sseintvecmode2)]) +(define_insn fixsuffixfix_truncmodesselongvecmodelower2mask_nameround_saeonly_name + [(set (match_operand:sselongvecmode 0 register_operand =v) + (any_fix:sselongvecmode + (match_operand:VF1_128_256VL 1 round_saeonly_nimm_predicate round_saeonly_constraint)))] + TARGET_AVX512DQ round_saeonly_modev8sf_condition + vcvttps2fixsuffixqq\t{round_saeonly_mask_op2%1, %0mask_operand2|%0mask_operand2, %1round_saeonly_mask_op2} + [(set_attr type ssecvt) + (set_attr prefix evex) + (set_attr mode sseintvecmode3)]) + +(define_insn fixsuffixfix_truncv2sfv2di2mask_name + [(set (match_operand:V2DI 0 register_operand =v) + (any_fix:V2DI + (vec_select:V2SF + (match_operand:V4SF 1 nonimmediate_operand vm) + (parallel [(const_int 0) (const_int 1)]] + TARGET_AVX512DQ TARGET_AVX512VL + vcvttps2fixsuffixqq\t{%1, %0mask_operand2|%0mask_operand2, %1} + [(set_attr type ssecvt) + (set_attr prefix evex) + (set_attr mode TI)]) + +(define_insn ufix_truncmodesseintvecmodelower2mask_name + [(set (match_operand:sseintvecmode 0 register_operand =v) + (unsigned_fix:sseintvecmode + (match_operand:VF1_128_256VL 1 nonimmediate_operand vm)))] + TARGET_AVX512VL + vcvttps2udq\t{%1, %0mask_operand2|%0mask_operand2, %1} + [(set_attr type ssecvt) + (set_attr prefix evex) + (set_attr mode sseintvecmode2)]) + (define_expand avx_cvttpd2dq256_2 [(set (match_operand:V8SI 0 register_operand) (vec_concat:V8SI @@ -4713,7 +4746,7 @@ (vec_merge:V2DF (float_extend:V2DF (vec_select:V2SF - (match_operand:V4SF 2 nonimmediate_operand x,m,round_saeonly_constraint) + (match_operand:V4SF 2 round_saeonly_nimm_predicate x,m,round_saeonly_constraint) (parallel [(const_int 0) (const_int 1)]))) (match_operand:V2DF 1 register_operand 0,0,v) (const_int 1)))] @@ -4741,14 +4774,14 @@ (set_attr prefix evex) (set_attr mode V8SF)]) -(define_insn avx_cvtpd2ps256 - [(set (match_operand:V4SF 0 register_operand =x) 
+(define_insn avx_cvtpd2ps256mask_name + [(set (match_operand:V4SF 0 register_operand =v) (float_truncate:V4SF - (match_operand:V4DF 1 nonimmediate_operand xm)))] - TARGET_AVX - vcvtpd2ps{y}\t{%1, %0|%0, %1} + (match_operand:V4DF 1 nonimmediate_operand vm)))] + TARGET_AVX mask_avx512vl_condition + vcvtpd2ps{y}\t{%1, %0mask_operand2|%0mask_operand2, %1} [(set_attr type ssecvt) - (set_attr prefix vex) + (set_attr prefix maybe_evex) (set_attr btver2_decode vector) (set_attr mode V4SF)]) @@ -4761,16 +4794,28 @@ TARGET_SSE2 operands[2] = CONST0_RTX (V2SFmode);) -(define_insn *sse2_cvtpd2ps - [(set (match_operand:V4SF 0 register_operand =x) +(define_expand sse2_cvtpd2ps_mask + [(set (match_operand:V4SF 0 register_operand) + (vec_merge:V4SF + (vec_concat:V4SF + (float_truncate:V2SF + (match_operand:V2DF 1 nonimmediate_operand)) + (match_dup 4)) + (match_operand:V4SF 2
[jit] Add copyright and license headers and footers
On Tue, 2014-09-23 at 23:27 +, Joseph S. Myers wrote: [...] diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c Should start with standard copyright and license header. This applies to all sources in gcc/jit/. [...] I've committed the following to the dmalcolm/jit branch: ChangeLog.jit: * ChangeLog.jit: Add copyright footer. contrib/ChangeLog.jit: * ChangeLog.jit: Add copyright footer. gcc/ChangeLog.jit: * ChangeLog.jit: Add copyright footer. gcc/java/ChangeLog.jit: * ChangeLog.jit: Add copyright footer. gcc/jit/ChangeLog.jit: * ChangeLog.jit: Add copyright footer. * Make-lang.in: Update copyright. * config-lang.in: Update copyright. * docs/examples/install-hello-world.c: Add copyright header. * docs/examples/tut01-square.c: Likewise. * docs/examples/tut02-sum-of-squares.c: Likewise. * docs/examples/tut03-toyvm/toyvm.c: Likewise. * internal-api.c: Likewise. * internal-api.h: Likewise. * libgccjit++.h: Likewise. * libgccjit.c: Likewise. * libgccjit.h: Likewise. * libgccjit.map: Likewise. gcc/testsuite/ChangeLog.jit: * ChangeLog.jit: Add copyright footer. libbacktrace/ChangeLog.jit: * ChangeLog.jit: Add copyright footer. libcpp/ChangeLog.jit: * ChangeLog.jit: Add copyright footer. libdecnumber/ChangeLog.jit: * ChangeLog.jit: Add copyright footer. libiberty/ChangeLog.jit: * ChangeLog.jit: Add copyright footer. zlib/ChangeLog.jit: * ChangeLog.jit: Add copyright footer. 
--- ChangeLog.jit| 10 ++ contrib/ChangeLog.jit| 10 ++ gcc/ChangeLog.jit| 10 ++ gcc/java/ChangeLog.jit | 10 ++ gcc/jit/ChangeLog.jit| 22 ++ gcc/jit/Make-lang.in | 2 +- gcc/jit/config-lang.in | 2 +- gcc/jit/docs/examples/install-hello-world.c | 19 +++ gcc/jit/docs/examples/tut01-square.c | 19 +++ gcc/jit/docs/examples/tut02-sum-of-squares.c | 19 +++ gcc/jit/docs/examples/tut03-toyvm/toyvm.c| 19 ++- gcc/jit/internal-api.c | 20 gcc/jit/internal-api.h | 20 gcc/jit/libgccjit++.h| 19 ++- gcc/jit/libgccjit.c | 21 - gcc/jit/libgccjit.h | 22 +++--- gcc/jit/libgccjit.map| 18 ++ gcc/testsuite/ChangeLog.jit | 10 ++ libbacktrace/ChangeLog.jit | 10 ++ libcpp/ChangeLog.jit | 10 ++ libdecnumber/ChangeLog.jit | 10 ++ libiberty/ChangeLog.jit | 10 ++ zlib/ChangeLog.jit | 10 ++ 23 files changed, 314 insertions(+), 8 deletions(-) diff --git a/ChangeLog.jit b/ChangeLog.jit index 5d2db3f..d2c3941 100644 --- a/ChangeLog.jit +++ b/ChangeLog.jit @@ -1,3 +1,7 @@ +2014-09-24 David Malcolm dmalc...@redhat.com + + * ChangeLog.jit: Add copyright footer. + 2014-09-11 David Malcolm dmalc...@redhat.com * MAINTAINERS (Various Maintainers): Add myself as jit maintainer. @@ -6,3 +10,9 @@ * configure.ac: Add --enable-host-shared * configure: Regenerate. + +Copyright (C) 2013-2014 Free Software Foundation, Inc. + +Copying and distribution of this file, with or without modification, +are permitted in any medium without royalty provided the copyright +notice and this notice are preserved. diff --git a/contrib/ChangeLog.jit b/contrib/ChangeLog.jit index 79be84d..38a315a 100644 --- a/contrib/ChangeLog.jit +++ b/contrib/ChangeLog.jit @@ -1,4 +1,14 @@ +2014-09-24 David Malcolm dmalc...@redhat.com + + * ChangeLog.jit: Add copyright footer. + 2014-01-23 David Malcolm dmalc...@redhat.com * jit-coverage-report.py: New file: a script to print crude code-coverage information for the libgccjit API. + +Copyright (C) 2014 Free Software Foundation, Inc. 
+ +Copying and distribution of this file, with or without modification, +are permitted in any medium without royalty provided the copyright +notice and this notice are preserved. diff --git a/gcc/ChangeLog.jit b/gcc/ChangeLog.jit index 9771913..29307b1 100644 --- a/gcc/ChangeLog.jit +++ b/gcc/ChangeLog.jit @@ -1,5 +1,9 @@ 2014-09-24 David Malcolm dmalc...@redhat.com + * ChangeLog.jit: Add copyright footer. + +2014-09-24 David Malcolm dmalc...@redhat.com + * cgraph.h (cgraphbuild_c_finalize): Delete prototype of empty function. (ipa_c_finalize): Likewise. @@ -280,3 +284,9 @@
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
On 09/24/2014 11:37 PM, Mike Stump wrote:

On Sep 24, 2014, at 8:28 AM, Michael Eager ea...@eagerm.com wrote:

After check the current result log, I find many remote target test related sentences, do we have to process it? e.g. Download to microblaze-xilinx-gdb failed, couldn't execute rcp: no such file or directory.

The test suite uses rcp to transfer files to or from the target, either to provide input to a test case or to check the output. Most Linux systems do not install rcp, since it is a security risk.

To clarify:

    if {[board_info $desthost exists rcp_prog]} {
        set RCP [board_info $desthost rcp_prog]
    } else {
        set RCP rcp
    }

So, if you set rcp_prog to something else, you should be able to avoid rsh if you want. Most people use ssh now-a-days. You will want it set up to not require a password for testing.

OK, thank you for the information. One simple workaround under Fedora is yum install rsh, but then I get another issue:

Download to microblaze-xilinx-gdb failed, microblaze-xilinx-gdb: Unknown host

So I guess the root cause is that I only use a cross-compiling environment under Fedora x86_64, without any real or virtual target for testing.

Thanks.

-- Chen Gang Open share and attitude like air water and life which God blessed
Re: [PATCH i386 AVX512] [54/n] Add mov[dlh]dup insns support.
On Wed, Sep 24, 2014 at 2:51 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, patch in the bottom introduces support for vmov[dlh]dup insns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn avx_movshdup256mask_name): Add masking. (define_insn sse3_movshdupmask_name): Ditto. (define_insn avx_movsldup256mask_name): Ditto. (define_insn sse3_movsldupmask_name): Ditto. (define_insn vec_dupv2dfmask_name): Ditto. (define_insn *vec_concatv2df): Add EVEX version. OK. Thanks, Uros.
Re: parallel check output changes?
On 09/24/2014 12:10 PM, Segher Boessenkool wrote:

On Wed, Sep 24, 2014 at 10:54:57AM -0400, Andrew MacLeod wrote:

On 09/23/2014 11:33 AM, Richard Sandiford wrote:

Your patch instead sorts based on the full test name, including options, which means that the output no longer matches what you'd get from a non-parallel run. AFAICT, it also no longer matches what you'd get from the .sh version. That might be OK, just thought I'd mention it.

With the parallelisation changes the output was in pretty random order. My patch made that a fixed order again, albeit a different one from before.

Is this supposed to be resolved now? I'm still seeing some issues with a branch cut from mainline from yesterday. This is from the following sequence: check out revision 215511, build, make -j16 check, make -j16 check, then compare all the .sum files:

I don't understand what exactly you did; you have left out some steps I think?

What? No, like what? I checked out a tree, did a basic configure and build from scratch (./configure --verbose, make -j16 all), and then ran make check twice in a row, literally make -j16 -i check, with nothing in between. So the compiler and toolchain are exactly the same, yet the results differ. It's the same way I've done it forever, except I am still getting some different results from run to run. The target is a normal build-x86_64-unknown-linux-gnu.

What I'm saying is that something still isn't getting sorted all the time (maybe if a section wasn't split up, it doesn't sort?), or all the patches to fix it aren't in, or there is something else still amok.

Notice it isn't options that is the problem this time; it's the trailing line number of the test case warning. One is in numerical order, the other is in alphabetical order.

I'm running it a third time now. We'll see if it's different from both the others or not.

Andrew
Re: [PATCH i386 AVX512] [53/n] Update vec_setmode_0 pattern constraints.
On Wed, Sep 24, 2014 at 2:48 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello, Patch in the bottom extends to EVEX constraints of vec_setmode_0 insn pattern. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk? gcc/ * config/i386/sse.md (define_insn vec_setmode_0): Add EVEX version. OK. Thanks, Uros.
Re: [patch] libstdc++/29988 Rb_Tree reuse allocated nodes
On 23/09/14 21:58 +0200, François Dumont wrote: On 23/09/2014 13:22, Jonathan Wakely wrote: On 22/09/14 23:51 +0200, François Dumont wrote: New patch in a couple of day then. OK, thanks. It was faster than I though, here is the fixed patch tested under Linux x86_64. [snip] Ok to commit ? Yes, it looks good - thanks! You can close the PR after you commit it (and set Target Milestone to 5.0).
Re: [PATCH] PR63300 'const volatile' sometimes stripped in debug info.
Hi Andreas,

On Wed, 2014-09-24 at 14:40 +0200, Andreas Arnez wrote:

I changed the patch a bit further, to reduce unnecessary iterations and recursions, and tested it again.

Thanks for adding the tests and the testing. I think in general it is a nicer and cleaner fix than I did. I do have a question about the removal of the recursion of modified_type_die while stripping/adding qualifiers though:

+      /* Determine a lesser qualified type that most closely matches
+         this one.  Then generate DW_TAG_* entries for the remaining
+         qualifiers.  */
+      sub_quals = get_nearest_type_subqualifiers (type, cv_quals,
+                                                  cv_qual_mask);
+      mod_type_die = modified_type_die (type, sub_quals, context_die);
+
+      for (i = 0; i < sizeof (qual_info) / sizeof (qual_info[0]); i++)
+        if (qual_info[i].q & cv_quals & ~sub_quals)
+          {
+            dw_die_ref d = new_die (qual_info[i].t, mod_scope, type);
+            if (mod_type_die)
+              add_AT_die_ref (d, DW_AT_type, mod_type_die);
+            mod_type_die = d;
+          }

Are you sure this is completely equivalent to the previous code that recursed into modified_type_die again for each qualifier added? At the top of modified_type_die we check whether there is already a qualified type and if there is then we try to get the DIE for that one with lookup_type_die. If there is no such DIE yet, then at the end of modified_type_die we associate that type with the DIE with a call to equate_type_number_to_die.

In your patch we skip that association in case we need to add more than one qualifier. Is it guaranteed that for these in between qualified type DIES there is no associated real type that get_qualified_type would have been able to find?

O. Yes, of course that is guaranteed. If there was such a type then get_nearest_type_subqualifiers would have returned it. Doh.

OK. Now do I delete this whole email? Or will I just say: Looks good to me after thinking a bit about it. :)

Thanks,

Mark
Re: [PATCH i386 AVX512] [55/n] Extend `perm' insn patterns.
On Wed, Sep 24, 2014 at 2:53 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

Hello, Patch in the bottom extends `perm' insn patterns. Bootstrapped. AVX-512* tests on top of patch-set all pass under simulator. Is it ok for trunk?

gcc/
    * config/i386/sse.md (define_expand avx2_avx512f_permmode): Rename to ...
    (define_expand avx2_avx512bw_permmode): this.

This is not consistent with the patch below. You are renaming to avx2_avx512... Also, please use ellipsis before this.

    (define_expand avx512_permmode_mask): Add 128/256-bit wide version.

Mention also the rename, and Use VI8F_256_512 mode iterator.

    (define_insn avx2_avx512f_permmode_1mask_name): Rename to ...
    (define_insn avx2_avx512bw_permmode_1mask_name): this.

Ellipsis before this.

OK with updated ChangeLog.

Thanks, Uros.
Re: [PATCH 1/4] [AARCH64,NEON] Add patterns + builtins for vld[234](q?)_lane_* intrinsics
Kyril, Tejas, Thanks for the review. I agree with all points and will respin v2 accordingly Charles
[jit] Use standard initial includes
On Tue, 2014-09-23 at 23:27 +, Joseph S. Myers wrote: [...] +#include config.h +#include system.h +#include ansidecl.h +#include coretypes.h The standard initial includes are config.h, system.h, coretypes.h. system.h includes libiberty.h which includes ansidecl.h, so direct ansidecl.h includes shouldn't be needed anywhere. [...] I've committed the following fix for the above to branch dmalcolm/jit: gcc/jit/ChangeLog.jit: * dummy-frontend.c: Update copyright year. Follow standard for initial includes by removing redundant include of ansidecl.h. * internal-api.c: Follow standard for initial includes by removing redundant include of ansidecl.h. * jit-builtins.c: Likewise. * libgccjit.c: Likewise. --- gcc/jit/ChangeLog.jit| 9 + gcc/jit/dummy-frontend.c | 3 +-- gcc/jit/internal-api.c | 1 - gcc/jit/jit-builtins.c | 1 - gcc/jit/libgccjit.c | 1 - 5 files changed, 10 insertions(+), 5 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index f451771..4ddd3cb 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,14 @@ 2014-09-24 David Malcolm dmalc...@redhat.com + * dummy-frontend.c: Update copyright year. Follow standard for + initial includes by removing redundant include of ansidecl.h. + * internal-api.c: Follow standard for initial includes by removing + redundant include of ansidecl.h. + * jit-builtins.c: Likewise. + * libgccjit.c: Likewise. + +2014-09-24 David Malcolm dmalc...@redhat.com + * ChangeLog.jit: Add copyright footer. * Make-lang.in: Update copyright. * config-lang.in: Update copyright. diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c index 1b96c91..1d178f9 100644 --- a/gcc/jit/dummy-frontend.c +++ b/gcc/jit/dummy-frontend.c @@ -1,5 +1,5 @@ /* jit.c -- Dummy frontend for use during JIT-compilation. - Copyright (C) 2013 Free Software Foundation, Inc. + Copyright (C) 2013-2014 Free Software Foundation, Inc. This file is part of GCC. @@ -19,7 +19,6 @@ along with GCC; see the file COPYING3. 
If not see #include config.h #include system.h -#include ansidecl.h #include coretypes.h #include opts.h #include signop.h diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index 9e59d92..76ada70 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -20,7 +20,6 @@ along with GCC; see the file COPYING3. If not see #include config.h #include system.h -#include ansidecl.h #include coretypes.h #include opts.h #include tree.h diff --git a/gcc/jit/jit-builtins.c b/gcc/jit/jit-builtins.c index 160ef20..c4b0f59 100644 --- a/gcc/jit/jit-builtins.c +++ b/gcc/jit/jit-builtins.c @@ -19,7 +19,6 @@ along with GCC; see the file COPYING3. If not see #include config.h #include system.h -#include ansidecl.h #include coretypes.h #include opts.h #include tree.h diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c index 510ed86..cb8321c 100644 --- a/gcc/jit/libgccjit.c +++ b/gcc/jit/libgccjit.c @@ -20,7 +20,6 @@ along with GCC; see the file COPYING3. If not see #include config.h #include system.h -#include ansidecl.h #include coretypes.h #include opts.h -- 1.7.11.7
Re: [PATCH] microblaze: microblaze.md: Use 'SI' instead of 'VOID' for operand 1 of 'call_value_intern'
On Sep 24, 2014, at 9:23 AM, Chen Gang gang.chen.5...@gmail.com wrote:

For one simple solving way under fedora: yum install rsh, and I will get another issue:

Download to microblaze-xilinx-gdb failed, microblaze-xilinx-gdb: Unknown host

So I guess the root cause is: I only use cross-compiling environments under fedora x86_64, no any real or virtual target for test.

Yes, if you want to test on a target, you will need a target. You can either have a simulator (see binutils and sim/* for an example of how to write one) or target hardware in some form.