Re: [Ping] [Patch C++] RFC Fix PR57958
Hello, here is another version of the fix. This time, I added a complete_type_or_else call just before aggregate_value_p. Bootstrapped and tested on x86_64-pc-linux-gnu with no new regressions. Ok to apply to trunk? Thanks, Dinar.

On Mon, Apr 7, 2014 at 11:47 PM, Jason Merrill ja...@redhat.com wrote:
On 04/07/2014 03:46 PM, Jason Merrill wrote: I guess we need to call complete_type before aggregate_value_p.
complete_type_or_else, actually. Jason

2014-04-18  Dinar Temirbulatov  dtemirbula...@gmail.com

	PR c++/57958
	* semantics.c (apply_deduced_return_type): Complete non-void type
	before estimating whether the type is aggregate.

Attachment: fix.patch
Re: [Ping] [Patch C++] RFC Fix PR57958
I found a typo in the testcase header; fixed. Ok to apply to trunk? Thanks, Dinar.

On Fri, Apr 18, 2014 at 1:13 PM, Dinar Temirbulatov dtemirbula...@gmail.com wrote:
Hello, here is another version of the fix. This time, I added a complete_type_or_else call just before aggregate_value_p. Bootstrapped and tested on x86_64-pc-linux-gnu with no new regressions. Ok to apply to trunk? Thanks, Dinar.

On Mon, Apr 7, 2014 at 11:47 PM, Jason Merrill ja...@redhat.com wrote:
On 04/07/2014 03:46 PM, Jason Merrill wrote: I guess we need to call complete_type before aggregate_value_p.
complete_type_or_else, actually. Jason

2014-04-18  Dinar Temirbulatov  dtemirbula...@gmail.com

	PR c++/57958
	* semantics.c (apply_deduced_return_type): Complete non-void type
	before estimating whether the type is aggregate.

Attachment: fix.patch
[Ping] [Patch C++] RFC Fix PR57958
Hi, I revised the patch from http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00083.html. This change fixes PR57958 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57958). For this code:

auto fn = [] (const FooData x) { return (x); };
{
  FooData a;
  fn(a);
}

the current trunk generates the following gimple:

main(int, char**)::lambda(const FooData) (const struct __lambda0 * const __closure, const struct Foo x)
{
  struct Foo D.2089;

  FooData::Foo (&D.2089, x);
  try
    {
      return <retval>;
    }
  finally
    {
      FooData::~Foo (&D.2089);
      D.2089 = {CLOBBER};
    }
}

And after the fix is applied, the gimple product looks like this:

main(int, char**)::lambda(const FooData) (const struct __lambda0 * const __closure, const struct Foo x)
{
  FooData::Foo (&<retval>, x);
  return <retval>;
}

Looking at the code that I think caused the issue (cp/typeck.c:8600):

  /* We can't initialize a register from a AGGR_INIT_EXPR.  */
  else if (! cfun->returns_struct
	   && TREE_CODE (retval) == TARGET_EXPR
	   && TREE_CODE (TREE_OPERAND (retval, 1)) == AGGR_INIT_EXPR)
    retval = build2 (COMPOUND_EXPR, TREE_TYPE (retval), retval,
		     TREE_OPERAND (retval, 0));

It is not clear to me what the purpose of constructing a COMPOUND_EXPR here is. The comment says "We can't initialize a register from a AGGR_INIT_EXPR.", but how, on the front-end side, could we determine what is going to end up in a register and what is not? The code looks like a hack for some middle-end issue a while back? I hoped to trigger a code generation problem by removing the code and running the regression testsuite on x86_64-pc-linux-gnu, but no regressions occurred. The attached patch was tested on x86_64-pc-linux-gnu with no new regressions. Ok to apply to trunk? Thanks in advance, Dinar.

Attachment: PR57958.patch
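For reference, the reduced example above can be made into a self-contained program; the copy counter is my addition (it is not in the PR) so that the extra temporary the thread discusses becomes observable without reading gimple dumps:

```cpp
#include <cassert>

// Stand-in for the FooData class in the report: any type with a
// non-trivial copy constructor goes through the code path discussed.
struct FooData {
  int value;
  static int copies;  // counts copy-constructor invocations (my addition)
  explicit FooData(int v) : value(v) {}
  FooData(const FooData &o) : value(o.value) { ++copies; }
};
int FooData::copies = 0;

int run() {
  // A lambda whose deduced return type resolves to a class type --
  // the situation PR57958 is about: before the fix, an extra temporary
  // was constructed and then copied into the return value.
  auto fn = [](const FooData x) { return (x); };
  FooData a(42);
  FooData r = fn(a);
  return r.value;
}
```

How many copies FooData::copies ends up reporting depends on the compiler's elision choices, so the sketch only checks that the value round-trips through the lambda.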
[Patch C++] PR57958 RFC
Hi, the following change fixes gimple production for lambda functions. In the patch I assumed that constructing a COMPOUND_EXPR for the return value of an auto-typed function that resolves to CLASS_TYPE_P is wrong. Tested on x86_64-pc-linux-gnu by applying to trunk with no new regressions. Thanks, Dinar.

Attachment: fix1.patch
[Ping] [Google] Fix profiledbootstrap failure
Ping? Hi, here is the patch, tested by profiledbootstrap. Ok for google gcc-4.8? Thanks, Dinar.

Attachment: profiledbootstrap-fix1.patch
Re: [Google] Fix profiledbootstrap failure
Hi, here is the patch, tested by profiledbootstrap. Ok for google gcc-4.8? Thanks, Dinar.

On Wed, Jul 31, 2013 at 12:01 AM, Rong Xu x...@google.com wrote:
Will do. The patch was in gcc-4_7 by Dehao.

r194713 | dehao | 2012-12-24 16:49:06 -0800 (Mon, 24 Dec 2012) | 5 lines
Fix the profiled bootstrap:
1. Set the default value of gcov-debug to be 0.
2. Merge profile summaries from different instrumented binaries.

On Tue, Jul 30, 2013 at 12:58 PM, Xinliang David Li davi...@google.com wrote:
Ok. Rong, can you help commit the parameter default setting patch? thanks, David

On Tue, Jul 30, 2013 at 12:48 PM, Rong Xu x...@google.com wrote:
We have seen the issue before. It does fail the profiled bootstrap, as it reads a wrong gcda file. I thought it had been fixed. (The fix was, as David mentioned, setting the default value of the parameter to 0.) -Rong

On Tue, Jul 30, 2013 at 12:02 PM, Xinliang David Li davi...@google.com wrote:
I need to understand why this affects the profiled bootstrap -- is this due to a file name conflict? The fix is wrong -- please do not remove the parameter. If it is a problem, a better fix is to change the default parameter value to 0. David

On Tue, Jul 30, 2013 at 11:56 AM, Teresa Johnson tejohn...@google.com wrote:
cc'ing Rong and David since this came from LIPO support. The patch as-is removes the one use of PARAM_GCOV_DEBUG (which is on by default) without removing the parameter itself. What is the failure mode you see from this code? Thanks, Teresa

On Tue, Jul 30, 2013 at 11:50 AM, Dinar Temirbulatov di...@kugelworks.com wrote:
Hello, this change allows profiledbootstrap to complete on the google gcc-4.8 branch; tested with make bootstrap with no new regressions. OK for google 4.8? thanks, Dinar.

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413

Attachment: profiledbootstrap-fix1.patch
[Google] Fix profiledbootstrap failure
Hello, this change allows profiledbootstrap to complete on the google gcc-4.8 branch; tested with make bootstrap with no new regressions. OK for google 4.8? Thanks, Dinar.

Attachment: profiledbootstrap-fix.patch
Re: [Google] Fix profiledbootstrap failure
> I need to understand why this affects profile bootstrap -- is this due to file name conflict?

Yes, it is simple: during profiledbootstrap on the x86_64 platform, for libiberty the compiler picks up an incorrect profile from the current directory (the non-pic version) while compiling the pic version, and profiledbootstrap fails. Here is the log:

if [ x-fpic != x ]; then \
  /home/dinar/bugz/x86_64/build-gcc-4_8-svn-orig/./prev-gcc/xgcc -B/home/dinar/bugz/x86_64/bisect/build-gcc-4_8-svn-orig/./prev-gcc/ -B/tmp/x86_64-unknown-linux-gnu/bin/ -B/tmp/x86_64-unknown-linux-gnu/bin/ -B/tmp/x86_64-unknown-linux-gnu/lib/ -isystem /tmp/x86_64-unknown-linux-gnu/include -isystem /tmp/x86_64-unknown-linux-gnu/sys-include -c -DHAVE_CONFIG_H -g -O2 -fprofile-use -I. -I../../gcc-4_8-svn-orig/libiberty/../include -W -Wall -Wwrite-strings -Wc++-compat -Wstrict-prototypes -pedantic -fpic ../../gcc-4_8-svn-orig/libiberty/dwarfnames.c -o pic/dwarfnames.o; \
else true; fi
yes
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: error: the control flow of function ‘get_DW_CFA_name’ does not match its profile data (counter ‘arcs’) [-Werror=coverage-mismatch]
 ^
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: note: use -Wno-error=coverage-mismatch to tolerate the mismatch but performance may drop if the function is hot
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c: In function ‘get_DW_ATE_name’:
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: error: the control flow of function ‘get_DW_ATE_name’ does not match its profile data (counter ‘arcs’) [-Werror=coverage-mismatch]
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: note: use -Wno-error=coverage-mismatch to tolerate the mismatch but performance may drop if the function is hot
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c: In function ‘get_DW_OP_name’:
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: error: the control flow of function ‘get_DW_OP_name’ does not match its profile data (counter ‘arcs’) [-Werror=coverage-mismatch]
../../gcc-4_8-svn-orig/libiberty/dwarfnames.c:75:0: note: use -Wno-error=coverage-mismatch to tolerate the mismatch but performance may drop if the function is hot

thanks, Dinar.

On Tue, Jul 30, 2013 at 11:02 PM, Xinliang David Li davi...@google.com wrote:
I need to understand why this affects the profiled bootstrap -- is this due to a file name conflict? The fix is wrong -- please do not remove the parameter. If it is a problem, a better fix is to change the default parameter value to 0. David

On Tue, Jul 30, 2013 at 11:56 AM, Teresa Johnson tejohn...@google.com wrote:
cc'ing Rong and David since this came from LIPO support. The patch as-is removes the one use of PARAM_GCOV_DEBUG (which is on by default) without removing the parameter itself. What is the failure mode you see from this code? Thanks, Teresa

On Tue, Jul 30, 2013 at 11:50 AM, Dinar Temirbulatov di...@kugelworks.com wrote:
Hello, this change allows profiledbootstrap to complete on the google gcc-4.8 branch; tested with make bootstrap with no new regressions. OK for google 4.8? thanks, Dinar.

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: Fix PR57268
> I'd guess you don't want to flush on DEBUG_INSN_Ps, because then you'd flush differently between -g and -g0. So perhaps something like:

Yes, if I skip flushing on DEBUG_INSN_Ps, then the dependence lists are the same under -g0 and -g. I bootstrapped Jakub's change on x86_64-linux with no errors. thanks, Dinar.

On Sat, Jun 1, 2013 at 4:55 PM, Jakub Jelinek ja...@redhat.com wrote:
On Sat, Jun 01, 2013 at 10:11:24AM +0200, Jakub Jelinek wrote:
On Sat, Jun 01, 2013 at 08:39:58AM +0400, Dinar Temirbulatov wrote: I am investigating the problem.
I'd guess you don't want to flush on DEBUG_INSN_Ps, because then you'd flush differently between -g and -g0. So perhaps something like:
Now bootstrapped/regtested on x86_64-linux and i686-linux. I see you've already reverted in the mean time, so ok for trunk this way?

2013-06-01  Jakub Jelinek  ja...@redhat.com

	PR rtl-optimization/57268
	* sched-deps.c (sched_analyze_2): Don't flush_pending_lists
	if DEBUG_INSN_P (insn).

	Reapply
	2013-05-31  Dinar Temirbulatov  di...@kugelworks.com

	PR rtl-optimization/57268
	* sched-deps.c (sched_analyze_2): Flush dependence lists if the
	sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

--- gcc/sched-deps.c	(revision 199576)
+++ gcc/sched-deps.c	(revision 199575)
@@ -2690,8 +2690,15 @@
 	    /* Always add these dependencies to pending_reads, since
 	       this insn may be followed by a write.  */
-	    if (!deps->readonly)
-	      add_insn_mem_dependence (deps, true, insn, x);
+	    if (!deps->readonly)
+	      {
+		if ((deps->pending_read_list_length
+		     + deps->pending_write_list_length) > MAX_PENDING_LIST_LENGTH
+		    && !DEBUG_INSN_P (insn))
+		  flush_pending_lists (deps, insn, true, true);
+		add_insn_mem_dependence (deps, true, insn, x);
+	      }
 	    sched_analyze_2 (deps, XEXP (x, 0), insn);

	Jakub
Updated MAINTAINERS
Hi, I updated the MAINTAINERS list with the following change (patch attached). Thanks, Dinar.

Attachment: MAINTAINERS.patch
Re: Fix PR57268
Hi, oh, this is my mistake; I should have bootstrapped the compiler. I am investigating the problem. thanks, Dinar.

On Sat, Jun 1, 2013 at 7:50 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Wed, May 29, 2013 at 9:36 AM, Dinar Temirbulatov di...@kugelworks.com wrote:
Here is the corrected version of the change. Also, I think I need write-after-approval access to commit the change. thanks, Dinar.

	PR rtl-optimization/57268
	* sched-deps.c (sched_analyze_2): Flush dependence lists if the
	sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

On Wed, May 29, 2013 at 6:49 PM, Jeff Law l...@redhat.com wrote:
On 05/29/2013 06:52 AM, Steven Bosscher wrote:
Hello, On Wed, May 29, 2013 at 2:01 PM, Dinar Temirbulatov wrote:
Hi, I noticed that the scheduler created a long dependence list, about ~9000 elements, and slowed compilation down by about an hour. The attached patch flushes the dependence list in case it is longer than MAX_PENDING_LIST_LENGTH. Tested with the gcc testsuite on x86_64-linux-gnu with c and c++ enabled. Ok for trunk? thanks, Dinar.

2013-05-28  Dinar Temirbulatov  dinar at kugelworks dot com

	PR rtl-optimization/57268
	* sched-deps.c (sched_analyze_2): Flush dependence list when it
	is longer than MAX_PENDING_LIST_LENGTH.

	* sched-deps.c (sched_analyze_2): Flush dependence lists if the
	sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

 if (!deps->readonly)
-  add_insn_mem_dependence (deps, true, insn, x);
+  {
+  if ((deps->pending_read_list_length + deps->pending_write_list_length)
+ > MAX_PENDING_LIST_LENGTH)
+ flush_pending_lists (deps, insn, true, true);
+add_insn_mem_dependence (deps, true, insn, x);
+}

The flush_pending_lists, add_insn_mem_dependence and } lines are not indented correctly. The if (...+...) line is too long (max. 80 characters per line). The GCC style would be

if (!deps->readonly)
  {
    if ((deps->pending_read_list_length
	 + deps->pending_write_list_length) > MAX_PENDING_LIST_LENGTH)
      flush_pending_lists (deps, insn, true, true);
    add_insn_mem_dependence (deps, true, insn, x);
  }

(The aesthetics of GCC code style is a matter for debate, but not here and now ;-)

And just to be clear, with Steven's suggested changes, this patch is OK. jeff

This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57494
--
H.J.
Fix PR57268
Hi, I noticed that the scheduler created a long dependence list, about ~9000 elements, which slowed compilation down by about an hour. The attached patch flushes the dependence list in case it is longer than MAX_PENDING_LIST_LENGTH. Tested with the gcc testsuite on x86_64-linux-gnu with c and c++ enabled. Ok for trunk? thanks, Dinar.

2013-05-28  Dinar Temirbulatov  di...@kugelworks.com

	PR rtl-optimization/57268
	* sched-deps.c (sched_analyze_2): Flush dependence list when it
	is longer than MAX_PENDING_LIST_LENGTH.

Attachment: fix.patch
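Outside GCC, the shape of the fix — cap the combined length of the pending read/write dependence lists and flush once the cap is exceeded — can be sketched like this (the container layout and the small cap are illustrative stand-ins, not the sched-deps.c data structures):

```cpp
#include <cstddef>
#include <vector>

const std::size_t MAX_PENDING_LIST_LENGTH = 32;  // the real cap is larger

struct PendingDeps {
  std::vector<int> pending_reads;   // insns with pending memory reads
  std::vector<int> pending_writes;  // insns with pending memory writes

  // Before recording another memory read, flush both lists when their
  // combined length exceeds the cap.  This bounds the quadratic work of
  // scanning every pending entry for each new memory access, which is
  // what made the PR57268 testcase take an extra hour to compile.
  void add_mem_read(int insn) {
    if (pending_reads.size() + pending_writes.size()
        > MAX_PENDING_LIST_LENGTH) {
      pending_reads.clear();   // models flush_pending_lists: drop the
      pending_writes.clear();  // lists in favor of a coarse barrier dep
    }
    pending_reads.push_back(insn);
  }
};

std::size_t list_length_after(int n_reads) {
  PendingDeps deps;
  for (int i = 0; i < n_reads; ++i)
    deps.add_mem_read(i);
  return deps.pending_reads.size() + deps.pending_writes.size();
}
```

The trade-off is the same one discussed in the thread: flushing loses precision (everything after the flush depends on the barrier) but keeps dependence analysis linear in practice.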
Re: Fix PR57268
	* sched-deps.c (sched_analyze_2): Flush dependence lists if the
	sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

On Wed, May 29, 2013 at 6:49 PM, Jeff Law l...@redhat.com wrote:
On 05/29/2013 06:52 AM, Steven Bosscher wrote:
Hello, On Wed, May 29, 2013 at 2:01 PM, Dinar Temirbulatov wrote:
Hi, I noticed that the scheduler created a long dependence list, about ~9000 elements, and slowed compilation down by about an hour. The attached patch flushes the dependence list in case it is longer than MAX_PENDING_LIST_LENGTH. Tested with the gcc testsuite on x86_64-linux-gnu with c and c++ enabled. Ok for trunk? thanks, Dinar.

2013-05-28  Dinar Temirbulatov  dinar at kugelworks dot com

	PR rtl-optimization/57268
	* sched-deps.c (sched_analyze_2): Flush dependence list when it
	is longer than MAX_PENDING_LIST_LENGTH.

	* sched-deps.c (sched_analyze_2): Flush dependence lists if the
	sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

 if (!deps->readonly)
-  add_insn_mem_dependence (deps, true, insn, x);
+  {
+  if ((deps->pending_read_list_length + deps->pending_write_list_length)
+ > MAX_PENDING_LIST_LENGTH)
+ flush_pending_lists (deps, insn, true, true);
+add_insn_mem_dependence (deps, true, insn, x);
+}

The flush_pending_lists, add_insn_mem_dependence and } lines are not indented correctly. The if (...+...) line is too long (max. 80 characters per line). The GCC style would be

if (!deps->readonly)
  {
    if ((deps->pending_read_list_length
	 + deps->pending_write_list_length) > MAX_PENDING_LIST_LENGTH)
      flush_pending_lists (deps, insn, true, true);
    add_insn_mem_dependence (deps, true, insn, x);
  }

(The aesthetics of GCC code style is a matter for debate, but not here and now ;-)

And just to be clear, with Steven's suggested changes, this patch is OK. jeff

Attachment: fix1.patch
Re: Fix PR57268
Here is the corrected version of the change. Also, I think I need write-after-approval access to commit the change. thanks, Dinar.

	PR rtl-optimization/57268
	* sched-deps.c (sched_analyze_2): Flush dependence lists if the
	sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

On Wed, May 29, 2013 at 6:49 PM, Jeff Law l...@redhat.com wrote:
On 05/29/2013 06:52 AM, Steven Bosscher wrote:
Hello, On Wed, May 29, 2013 at 2:01 PM, Dinar Temirbulatov wrote:
Hi, I noticed that the scheduler created a long dependence list, about ~9000 elements, and slowed compilation down by about an hour. The attached patch flushes the dependence list in case it is longer than MAX_PENDING_LIST_LENGTH. Tested with the gcc testsuite on x86_64-linux-gnu with c and c++ enabled. Ok for trunk? thanks, Dinar.

2013-05-28  Dinar Temirbulatov  dinar at kugelworks dot com

	PR rtl-optimization/57268
	* sched-deps.c (sched_analyze_2): Flush dependence list when it
	is longer than MAX_PENDING_LIST_LENGTH.

	* sched-deps.c (sched_analyze_2): Flush dependence lists if the
	sum of the read and write lists exceeds MAX_PENDING_LIST_LENGTH.

 if (!deps->readonly)
-  add_insn_mem_dependence (deps, true, insn, x);
+  {
+  if ((deps->pending_read_list_length + deps->pending_write_list_length)
+ > MAX_PENDING_LIST_LENGTH)
+ flush_pending_lists (deps, insn, true, true);
+add_insn_mem_dependence (deps, true, insn, x);
+}

The flush_pending_lists, add_insn_mem_dependence and } lines are not indented correctly. The if (...+...) line is too long (max. 80 characters per line). The GCC style would be

if (!deps->readonly)
  {
    if ((deps->pending_read_list_length
	 + deps->pending_write_list_length) > MAX_PENDING_LIST_LENGTH)
      flush_pending_lists (deps, insn, true, true);
    add_insn_mem_dependence (deps, true, insn, x);
  }

(The aesthetics of GCC code style is a matter for debate, but not here and now ;-)

And just to be clear, with Steven's suggested changes, this patch is OK. jeff
RFC [Patch, arm] arm_reload_out_hi with strh instruction
Hi, during investigation of a defect I found that for HImode the reload pass generated three QImode instructions, but on modern arm machines that could be done with just one strh instruction. The attached patch adds that capability. thanks, Dinar.

Attachment: reload_out_hi.patch
[ARM] fix for PR49423
Hi Ramana, here is an obvious fix for PR49423: I just added pool ranges for the arm_zero_extendqisi2, arm_zero_extendqisi2_v6, arm_zero_extendhisi2 and arm_zero_extendhisi2_v6 patterns. thanks, Dinar.

Attachment: PR49423.patch
Re: divide 64-bit by constant for 32-bit target machines
Hi Richard, how about if I add umul_highpart_di to libgcc and use it, instead of expanding the multiplication for the high part directly; or add my own function, with pre-shift, post-shift, and the 64-bit constant and 64-bit operand as function parameters, for division below -O3? thanks, Dinar.

On Fri, Jun 15, 2012 at 12:12 PM, Richard Earnshaw rearn...@arm.com wrote:
On 14/06/12 19:46, Dinar Temirbulatov wrote: Hi, OK for trunk? thanks, Dinar.
I'm still not comfortable about the code bloat that this is likely to incur at -O2. R.

On Tue, Jun 12, 2012 at 11:00 AM, Paolo Bonzini bonz...@gnu.org wrote:
On 12/06/2012 08:52, Dinar Temirbulatov wrote: is safe? That is, that the underflows cannot produce a wrong result? [snip] Thanks very much! Paolo

2012-06-14  Dinar Temirbulatov  dtemirbula...@gmail.com
	    Alexey Kravets  mr.kayr...@gmail.com
	    Paolo Bonzini  bonz...@gnu.org

	* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the
	integer double-word division operation.
	* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the
	integer double-word division operation for 32-bit targets.
	* gcc/expmed.c (expand_mult_highpart_optab): Allow to generate the
	higher multiplication product for unsigned double-word integers
	using 32-bit wide registers.

Attachment: 30.patch
Re: divide 64-bit by constant for 32-bit target machines
Hi, OK for trunk? thanks, Dinar.

On Tue, Jun 12, 2012 at 11:00 AM, Paolo Bonzini bonz...@gnu.org wrote:
On 12/06/2012 08:52, Dinar Temirbulatov wrote: is safe? That is, that the underflows cannot produce a wrong result? [snip] Thanks very much! Paolo

2012-06-14  Dinar Temirbulatov  dtemirbula...@gmail.com
	    Alexey Kravets  mr.kayr...@gmail.com
	    Paolo Bonzini  bonz...@gnu.org

	* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the
	integer double-word division operation.
	* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the
	integer double-word division operation for 32-bit targets.
	* gcc/expmed.c (expand_mult_highpart_optab): Allow to generate the
	higher multiplication product for unsigned double-word integers
	using 32-bit wide registers.

Attachment: 30.patch
Re: divide 64-bit by constant for 32-bit target machines
Hi Paolo, here is the new version of the patch. I have tested this version with the gcc testsuite only on i686, without new regressions, for now; mips and arm tests are in progress.

> No need for this gen_reg_rtx, either, by passing a NULL_RTX target below.
> +  carry_result = expand_shift (LSHIFT_EXPR, mode, carry, BITS_PER_WORD, carry_result, 1);
> +
> +  /* Adding 0x1 as carry here if required.  */

Oops, a remnant of 32-bit specific code.

One strange thing I noticed is that I have to add a convert_to_mode () to DImode after emit_store_flag_force (), since emit_store_flag_force () returns the carry in SImode, and without the convert_to_mode () call the compiler fails with this error:

Breakpoint 2, simplify_subreg (outermode=SImode, op=0x756cdf20, innermode=DImode, byte=0) at ../../gcc-20120418-1/gcc/simplify-rtx.c:5423
5423	  gcc_assert (GET_MODE (op) == innermode
(gdb) bt
#0  simplify_subreg (outermode=SImode, op=0x756cdf20, innermode=DImode, byte=0) at ../../gcc-20120418-1/gcc/simplify-rtx.c:5423
#1  0x00aea223 in simplify_gen_subreg (outermode=SImode, op=0x756cdf20, innermode=DImode, byte=0) at ../../gcc-20120418-1/gcc/simplify-rtx.c:5763
#2  0x00733c99 in operand_subword (op=0x756cdf20, offset=0, validate_address=1, mode=DImode) at ../../gcc-20120418-1/gcc/emit-rtl.c:1427
#3  0x00733cc6 in operand_subword_force (op=0x756cdf20, offset=0, mode=DImode) at ../../gcc-20120418-1/gcc/emit-rtl.c:1440
#4  0x00a016b3 in expand_binop (mode=DImode, binoptab=0x195f580, op0=0x756cdf20, op1=0x7583d670, target=0x756cdfa0, unsignedp=1, methods=OPTAB_DIRECT) at ../../gcc-20120418-1/gcc/optabs.c:1779
#5  0x007525af in expand_shift_1 (code=LSHIFT_EXPR, mode=DImode, shifted=0x756cdf20, amount=0x7583d670, target=0x0, unsignedp=1) at ../../gcc-20120418-1/gcc/expmed.c:2273
#6  0x007526b6 in expand_shift (code=LSHIFT_EXPR, mode=DImode, shifted=0x756cdf20, amount=32, target=0x0, unsignedp=1) at ../../gcc-20120418-1/gcc/expmed.c:2318
#7  0x007563e6 in expand_mult_highpart_optab (mode=DImode, op0=0x756cdcc0, op1=0x756b1e00, target=0x0, unsignedp=1, max_cost=188) at ../../gcc-20120418-1/gcc/expmed.c:3581
#8  0x00756747 in expand_mult_highpart (mode=DImode, op0=0x756cdcc0, op1=0x756b1e00, target=0x0, unsignedp=1, max_cost=188) at ../../gcc-20120418-1/gcc/expmed.c:3654

thanks, Dinar.

Attachment: 30.patch
Re: divide 64-bit by constant for 32-bit target machines
Hi, here is a new version of the patch based upon Paolo's review; again tested on arm-7l, mips-32r2 (74k) and i686 without new regressions. thanks, Dinar.

On Sat, May 26, 2012 at 4:45 PM, Paolo Bonzini bonz...@gnu.org wrote:
On 26/05/2012 14:35, Paolo Bonzini wrote:

  /* We have to return z2 + ((u0 + u1) >> GET_MODE_BITSIZE (word_mode)).
     u0 + u1 are the upper two words of the three-word intermediate result
     and they could have up to 2 * GET_MODE_BITSIZE (word_mode) + 1 bits of
     precision.  We compute the extra bit by checking for carry, and add
     1 << GET_MODE_BITSIZE (word_mode) to z2 if there is carry.  */

Oops, GET_MODE_BITSIZE (word_mode) is more concisely BITS_PER_WORD.

+  tmp = expand_binop (mode, add_optab, u0, u1, tmp, 1, OPTAB_LIB_WIDEN);
+  if (!tmp)
+    return 0;
  /* We have to return z2 + (tmp >> 32).  We need
+  /* Checking for overflow.  */

This is not overflow, it's carry (see above).

+  c = gen_reg_rtx (mode);
+  c1 = gen_reg_rtx (mode);
+  cres = gen_reg_rtx (mode);
+
+  emit_store_flag_force (c, GT, u0, tmp, mode, 1, 1);
+  emit_store_flag_force (c1, GT, u1, tmp, mode, 1, 1);
+  result = expand_binop (mode, ior_optab, c, c1, cres, 1, OPTAB_LIB_WIDEN);
+  if (!result)
+    return 0;
+
+  ccst = gen_reg_rtx (mode);
+  ccst = expand_shift (LSHIFT_EXPR, mode, cres, 32, ccst, 1);

This 32 should be GET_MODE_BITSIZE (word_mode). Here, too. Paolo

2012-06-07  Dinar Temirbulatov  dtemirbula...@gmail.com
	    Alexey Kravets  mr.kayr...@gmail.com

	* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the
	integer double-word division operation.
	* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the
	integer double-word division operation for 32-bit targets.
	* gcc/expmed.c (expand_mult_highpart_optab): Allow to generate the
	higher multiplication product for unsigned double-word integers
	using 32-bit wide registers.

Attachment: 28.patch
Re: divide 64-bit by constant for 32-bit target machines
Hi, I have replaced expand_mult with expand_widening_mult and removed all direct references to the DImode and SImode modes in the expand_mult_highpart_optab function. The attached patch was tested on arm-7l, mips-32r2 (74k) and i686 without new regressions. Richard, do you think it is ready now? thanks, Dinar.

On Tue, May 22, 2012 at 7:45 PM, Richard Henderson r...@redhat.com wrote:
On 05/22/12 07:05, Dinar Temirbulatov wrote:

+  if (size - 1 > BITS_PER_WORD
+      && BITS_PER_WORD == 32 && mode == DImode)

Do note that this patch will not go in with hard-coded SImode and DImode references. Which, really, you do not even need. GET_MODE_BITSIZE (mode) == 2 * BITS_PER_WORD is what you wanted for testing for double-word-ness, and word_mode is a good substitute for SImode here.

+  /* Splitting the 64-bit constant for the higher and the lower parts.  */
+  y0 = gen_rtx_CONST_INT (DImode, d & UINT32_MAX);
+  y1 = gen_rtx_CONST_INT (DImode, d >> 32);

Use gen_int_mode.

+  x1 = convert_to_mode (DImode, x1, 1);
+  x0 = convert_to_mode (DImode, x0, 1);
+
+  /* Splitting the 64-bit constant for the higher and the lower parts.  */
+  y0 = gen_rtx_CONST_INT (DImode, d & UINT32_MAX);
+  y1 = gen_rtx_CONST_INT (DImode, d >> 32);
+
+  z2 = gen_reg_rtx (DImode);
+  u0 = gen_reg_rtx (DImode);
+
+  /* Unsigned multiplication of the higher multiplier part
+     and the higher constant part.  */
+  z2 = expand_mult (DImode, x1, y1, z2, 1);
+  /* Unsigned multiplication of the lower multiplier part
+     and the higher constant part.  */
+  u0 = expand_mult (DImode, x0, y1, u0, 1);

I'm fairly sure you really want to be using expand_widening_mult here, rather than using convert_to_mode first. While combine may be able to re-generate a widening multiply out of this sequence, there's no sense making it work too hard. r~

2012-05-25  Dinar Temirbulatov  dtemirbula...@gmail.com
	    Alexey Kravets  mr.kayr...@gmail.com

	* config/arm/arm.c (arm_rtx_costs_1): Add cost estimate for the
	integer double-word division operation.
	* config/mips/mips.c (mips_rtx_costs): Extend cost estimate for the
	integer double-word division operation for 32-bit targets.
	* gcc/expmed.c (expand_mult_highpart_optab): Allow to generate the
	higher multiplication product for unsigned double-word integers
	using 32-bit wide registers.

Attachment: 22.patch
Re: divide 64-bit by constant for 32-bit target machines
Hi, here is the new version of the patch. I have fixed two errors in the previous version. On mips32 the compiler could not expand the division and terminated with an ICE; this change fixed the issue:

   /* Extracting the higher part of the sum.  */
   tmp = gen_highpart (SImode, tmp);
   tmp = force_reg (SImode, tmp);
+  tmp = force_reg (SImode, tmp);
+  tmp = convert_to_mode (DImode, tmp, 1);

And another error on i686 and mips32r2: I found that the overflow check in the multiplication was not working. For example, 0xfe34b4190a392b23/257 produced a wrong result. The following change resolved the issue:

-  emit_store_flag_force (c, GT, u0, tmp, DImode, 1, -1);
+  emit_store_flag_force (c, GT, u0, tmp, DImode, 1, 1);

Tested this new version of the patch on -none-linux-gnu with arm-7l, mips-32r2 (74k) and i686 without new regressions.

On Thu, May 3, 2012 at 5:40 PM, Richard Earnshaw rearn...@arm.com wrote:
On 03/05/12 11:27, Dinar Temirbulatov wrote:

> This clearly needs more work. Comments: need to end with a period and two spaces. Binary operators: need to be surrounded with white space.
Sorry for this; I hope I resolved such issues with the new version.

> As Andrew Haley has already said, some documentation of the algorithm is needed.
General documentation for the issue can be found at gmplib.org/~tege/divcnst-pldi94.pdf. The multiplication to get the higher 128 bits was developed by me and Alexey Kravets; I attached a C version of the algorithm.

> Why is this restricted to BITS_PER_WORD == 32?
I am checking here that we are generating code for a 32-bit target with 32-bit wide general purpose registers; with 64-bit wide registers there is usually an instruction to get the higher 64 bits of a 64x64-bit multiplication cheaply.

> Costs: This code clearly needs to understand the relative cost of doing division this way; there is a limit to the amount of inline expansion that we should tolerate, particularly at -O2, and it's not clear that this will be much better than a library call if we don't have a widening multiply operation (as is true for older ARM chips, and presumably some other CPUs). In essence, you need to work out the cost of a divide instruction (just use rtx costs for this) and the approximate cost of the expanded algorithm.
Added cost calculation.

> Another issue that worries me is the number of live DImode values; on machines with few registers this could cause excessive spilling to start happening.
The cost calculation is supposed to take this into account.

> I also wonder whether it would be beneficial to generate custom functions for division by specific constants (using this algorithm) and then call those functions rather than the standard lib-call. On ELF systems the functions can be marked as hidden and put into common sections so that we only end up with one instance of each function in a program.
Yes, I think it is a good approach; I could redo my patch with such an intrinsic function implementation, with pre-shift, post-shift, and the 64-bit constant as function parameters.

> Finally, do you have a copyright assignment with the FSF? We can't use this code without one.
Yes, I do have a copyright assignment with the FSF. Also, I am in the process of implementing signed 64-bit integer division by a constant. thanks, Dinar.

Attachment: 18.patch

unsigned long long
mul (unsigned long long a, unsigned long long b)
{
  unsigned long long x1 = a >> 32;
  unsigned long long x0 = a & 0xffffffff;
  unsigned long long y1 = b >> 32;
  unsigned long long y0 = b & 0xffffffff;
  unsigned long long z2, z0, tmp, u0, u1;
  unsigned char c = 0;

  z2 = x1 * y1;
  z0 = x0 * y0;
  u0 = x0 * y1 + (z0 >> 32);
  u1 = x1 * y0;
  tmp = u0 + u1;
  c = (tmp < u0) || (tmp < u1);
  return z2 + (tmp >> 32) + (((unsigned long long) c) << 32);
}
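The attached C routine computes the high 64 bits of a 64x64-bit product from 32-bit pieces. Where the compiler provides a 128-bit type (`unsigned __int128` is a GCC/Clang extension), the word-by-word result can be cross-checked against it; the harness below is mine, not part of the patch:

```cpp
// Word-by-word high-part multiply, following the C sketch from the thread.
unsigned long long mulhi64(unsigned long long a, unsigned long long b) {
  unsigned long long x1 = a >> 32, x0 = a & 0xffffffffULL;
  unsigned long long y1 = b >> 32, y0 = b & 0xffffffffULL;
  unsigned long long z2 = x1 * y1;               // contributes bits 64..127
  unsigned long long z0 = x0 * y0;               // contributes bits 0..63
  unsigned long long u0 = x0 * y1 + (z0 >> 32);  // cannot overflow 64 bits
  unsigned long long u1 = x1 * y0;
  unsigned long long tmp = u0 + u1;
  unsigned long long carry = (tmp < u0);         // carry out of u0 + u1
  return z2 + (tmp >> 32) + (carry << 32);
}

// Reference result via the compiler's 128-bit extension type.
unsigned long long mulhi64_ref(unsigned long long a, unsigned long long b) {
  return (unsigned long long) (((unsigned __int128) a * b) >> 64);
}

bool mulhi64_matches(unsigned long long a, unsigned long long b) {
  return mulhi64(a, b) == mulhi64_ref(a, b);
}
```

The carry handling is the part the thread's `emit_store_flag_force` fix corresponds to: if the middle sum `u0 + u1` wraps, the product gains an extra `1 << 32` in its high half.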
Re: divide 64-bit by constant for 32-bit target machines
Hi, here is an updated version of the patch; I added comments describing the algorithm.

> Hi Dinar. I'm afraid it gives the wrong results for some dividends:
> * 82625484914982912 / 2023346444509052928: gives 4096, should be zero
> * 18317463604061229328 / 2023346444509052928: gives 4109, should be 9
> * 12097415865295708879 / 4545815675034402816: gives 130, should be 2
> * 18195490362097456014 / 6999635335417036800: gives 10, should be 2

Oh, I had used a signed multiplication instead of an unsigned one, and that was the reason for the errors above; fixed that typo. Tested on arm-7l with no new regressions. Ok for trunk? thanks, Dinar.

Attachment: 14.patch
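The signed-vs-unsigned mix-up is easy to reproduce with the 128-bit extension type: when the dividend's top bit is set, the signed and unsigned high parts of the product differ by exactly the other operand, which is how a reciprocal-multiply division goes wrong for large dividends. The example values here are mine:

```cpp
#include <cstdint>

// High 64 bits of a 64x64 product, computed unsigned vs. signed
// (unsigned __int128 / __int128 are GCC/Clang extension types).
uint64_t high_unsigned(uint64_t a, uint64_t b) {
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
}
int64_t high_signed(uint64_t a, uint64_t b) {
  return (int64_t) (((__int128) (int64_t) a * (int64_t) b) >> 64);
}

// For a dividend with the top bit set and a small positive b, the signed
// product is (a - 2^64) * b = a*b - b*2^64, so its high part comes out
// exactly b lower than the unsigned one -- a division built on the
// wrong variant therefore returns garbage, as in the reported cases.
int64_t high_part_difference(uint64_t a, uint64_t b) {
  return (int64_t) high_unsigned(a, b) - high_signed(a, b);
}
```

For a dividend without the top bit set the two agree, which is why the bug only showed up for large inputs.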
divide 64-bit by constant for 32-bit target machines
Hi, here is the patch that adds support for dividing a 64-bit value by a constant on 32-bit target machines; this patch was tested on arm-7a with no new regressions. Also, I am not sure how to avoid, for example, i686 targets, since the div operation there is fast compared to other targets, and libc's sysdeps/wordsize-32/divdi3.c __udivdi3 showed better performance than my implementation on the compiler side; it is not clear to me from looking at the udiv_cost[speed][SImode] value. thanks, Dinar.

Attachment: 11.patch
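For background, the transformation being implemented is the classic reciprocal-multiply division (the Granlund–Montgomery scheme cited later in the thread). A minimal single-word instance — unsigned 32-bit division by 3 via a well-known magic constant — looks like this; the 64-bit-by-constant case on a 32-bit target needs the same trick built from the high part of a 64x64 multiply, which is what the patch provides:

```cpp
#include <cstdint>

// Divide an unsigned 32-bit value by 3 without a divide instruction:
// multiply by the 33-bit reciprocal ceil(2^33 / 3) = 0xAAAAAAAB and
// keep the top bits.  This is exact for every 32-bit input.
uint32_t div3(uint32_t n) {
  return (uint32_t) (((uint64_t) n * 0xAAAAAAABULL) >> 33);
}
```

For a 64-bit dividend on a 32-bit machine the widening multiply itself spans several instructions, which is why the thread spends so much effort on cost estimates and on computing the 128-bit high part.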
Re: MIPS Fix PR18141
hi, Richard, This version of the patch showed no regressions on mipsel-unknown-linux-gnu. Thanks, Dinar.

On Wed, Sep 21, 2011 at 4:59 PM, Richard Sandiford rdsandif...@googlemail.com wrote:

Dinar Temirbulatov dtemirbula...@gmail.com writes:

@@ -14696,7 +14696,11 @@ mips_avoid_hazard (rtx after, rtx insn,
      clobber hi and lo.  */
   if (*hilo_delay < 2 && reg_set_p (lo_reg, pattern))
     nops = 2 - *hilo_delay;
-  else if (*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
+  else if ((*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
+           && !((GET_MODE (*delayed_reg) == DFmode && set_after != 0
+                 && (set = single_set (insn)) != NULL_RTX
+                 && GET_MODE (SET_DEST (set)) == DFmode
+                 && XINT (XEXP (pattern, 1), 1) == UNSPEC_LOAD_HIGH
+                 && XINT (XEXP (*set_after, 1), 1) == UNSPEC_LOAD_LOW)))

This isn't safe because the patterns might not be UNSPECs (so XINT (...) would be meaningless). It's better to check the insn code instead. Something like:

  else if (*delayed_reg != 0
           && reg_referenced_p (*delayed_reg, pattern)
           && !(recog_memoized (insn) == CODE_FOR_load_highdf
                && recog_memoized (*set_after) == CODE_FOR_load_lowdf))

(untested). Note that *set_after should always be nonnull if *delayed_reg is. Looks good otherwise. Richard

diff -ruNp gcc-20110912-orig/gcc/config/mips/mips.c gcc-20110912-fixed/gcc/config/mips/mips.c
--- gcc-20110912-orig/gcc/config/mips/mips.c 2011-09-12 17:22:27.576457121 +0400
+++ gcc-20110912-fixed/gcc/config/mips/mips.c 2011-09-23 14:24:45.379778834 +0400
@@ -14659,20 +14659,20 @@ mips_orphaned_high_part_p (htab_t htab,
    INSN and a previous instruction, avoid it by inserting nops after
    instruction AFTER.

-   *DELAYED_REG and *HILO_DELAY describe the hazards that apply at
-   this point.  If *DELAYED_REG is non-null, INSN must wait a cycle
-   before using the value of that register.  *HILO_DELAY counts the
-   number of instructions since the last hilo hazard (that is,
-   the number of instructions since the last MFLO or MFHI).
+   *DELAYED_REG, *SET_AFTER and *HILO_DELAY describe the hazards that
+   apply at this point.  If *DELAYED_REG and *SET_AFTER is non-null,
+   INSN must wait a cycle before using the value of that register.
+   *HILO_DELAY counts the number of instructions since the last hilo hazard
+   (that is, the number of instructions since the last MFLO or MFHI).

-   After inserting nops for INSN, update *DELAYED_REG and *HILO_DELAY
-   for the next instruction.
+   After inserting nops for INSN, update *DELAYED_REG, *SET_AFTER
+   and *HILO_DELAY for the next instruction.

    LO_REG is an rtx for the LO register, used in dependence checking.  */

 static void
 mips_avoid_hazard (rtx after, rtx insn, int *hilo_delay,
-                   rtx *delayed_reg, rtx lo_reg)
+                   rtx *delayed_reg, rtx lo_reg, rtx *set_after)
 {
   rtx pattern, set;
   int nops, ninsns;
@@ -14696,7 +14696,9 @@ mips_avoid_hazard (rtx after, rtx insn,
      clobber hi and lo.  */
   if (*hilo_delay < 2 && reg_set_p (lo_reg, pattern))
     nops = 2 - *hilo_delay;
-  else if (*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
+  else if ((*delayed_reg != 0 && set_after != 0
+            && reg_referenced_p (*delayed_reg, pattern))
+           && !(recog_memoized (insn) == CODE_FOR_load_highdf
+                && recog_memoized (*set_after) == CODE_FOR_load_lowdf))
     nops = 1;
   else
     nops = 0;
@@ -14710,6 +14712,7 @@ mips_avoid_hazard (rtx after, rtx insn,
   /* Set up the state for the next instruction.  */
   *hilo_delay += ninsns;
   *delayed_reg = 0;
+  *set_after = 0;
   if (INSN_CODE (insn) >= 0)
     switch (get_attr_hazard (insn))
       {
@@ -14724,6 +14727,7 @@ mips_avoid_hazard (rtx after, rtx insn,
        set = single_set (insn);
        gcc_assert (set);
        *delayed_reg = SET_DEST (set);
+       *set_after = insn;
        break;
       }
 }
@@ -14736,7 +14740,7 @@ mips_avoid_hazard (rtx after, rtx insn,
 static void
 mips_reorg_process_insns (void)
 {
-  rtx insn, last_insn, subinsn, next_insn, lo_reg, delayed_reg;
+  rtx insn, last_insn, subinsn, next_insn, lo_reg, delayed_reg, set_after;
   int hilo_delay;
   htab_t htab;
@@ -14811,7 +14815,7 @@ mips_reorg_process_insns (void)
           INSN_CODE (subinsn) = CODE_FOR_nop;
         }
       mips_avoid_hazard (last_insn, subinsn, &hilo_delay,
-                         &delayed_reg, lo_reg);
+                         &delayed_reg, lo_reg, &set_after);
     }
   last_insn = insn;
 }
@@ -14832,7 +14836,7 @@ mips_reorg_process_insns (void)
   else
     {
       mips_avoid_hazard (last_insn, insn, &hilo_delay,
-                         &delayed_reg, lo_reg);
+                         &delayed_reg, lo_reg, &set_after);
       last_insn = insn;
     }
 }

ChangeLog Description: Binary data
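The core of the patch is the exemption in the nop-insertion condition: a use of the delayed register normally costs one nop, except when the insn pair is the load_highdf/load_lowdf sequence. That bookkeeping can be sketched as a toy model; the struct, enum, and function here are all hypothetical illustrations, not GCC code:

```c
#include <stddef.h>

/* Toy stand-ins for GCC's insn codes: TOY_LOAD_HIGH/TOY_LOAD_LOW model
   CODE_FOR_load_highdf/CODE_FOR_load_lowdf; everything else is TOY_OTHER.  */
enum toy_code { TOY_OTHER, TOY_LOAD_HIGH, TOY_LOAD_LOW };

struct toy_insn
{
  enum toy_code code;  /* which pattern the insn matched */
  int sets_reg;        /* register the insn defines, or -1 */
  int uses_reg;        /* register the insn reads, or -1 */
};

/* Number of nops needed before INSN when PREV is the insn that produced
   the delayed definition (or NULL if there is none).  Mirrors the patched
   condition in mips_avoid_hazard: reading the delayed register costs one
   nop, unless PREV/INSN are the exempted load-high/load-low pair.  */
static int toy_nops_needed (const struct toy_insn *prev,
                            const struct toy_insn *insn)
{
  if (prev != NULL
      && insn->uses_reg == prev->sets_reg
      && !(prev->code == TOY_LOAD_HIGH && insn->code == TOY_LOAD_LOW))
    return 1;
  return 0;
}
```

In this model a load_lowdf reading the register just defined by a load_highdf needs no nop, while any other consumer of that register still does, which is exactly the distinction the recog_memoized check draws.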
Re: MIPS Fix PR18141
Hi, I found a typo in the patch: instead of checking *set_after != 0 it was checking set_after != 0. Here is the corrected version; I retested the patch without the typo on mipsel-unknown-linux-gnu with no new regressions. thanks, Dinar.

On Fri, Sep 23, 2011 at 3:11 PM, Dinar Temirbulatov dtemirbula...@gmail.com wrote:

hi, Richard, This version of the patch showed no regressions on mipsel-unknown-linux-gnu. Thanks, Dinar.

On Wed, Sep 21, 2011 at 4:59 PM, Richard Sandiford rdsandif...@googlemail.com wrote:

Dinar Temirbulatov dtemirbula...@gmail.com writes:

@@ -14696,7 +14696,11 @@ mips_avoid_hazard (rtx after, rtx insn,
      clobber hi and lo.  */
   if (*hilo_delay < 2 && reg_set_p (lo_reg, pattern))
     nops = 2 - *hilo_delay;
-  else if (*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
+  else if ((*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
+           && !((GET_MODE (*delayed_reg) == DFmode && set_after != 0
+                 && (set = single_set (insn)) != NULL_RTX
+                 && GET_MODE (SET_DEST (set)) == DFmode
+                 && XINT (XEXP (pattern, 1), 1) == UNSPEC_LOAD_HIGH
+                 && XINT (XEXP (*set_after, 1), 1) == UNSPEC_LOAD_LOW)))

This isn't safe because the patterns might not be UNSPECs (so XINT (...) would be meaningless). It's better to check the insn code instead. Something like:

  else if (*delayed_reg != 0
           && reg_referenced_p (*delayed_reg, pattern)
           && !(recog_memoized (insn) == CODE_FOR_load_highdf
                && recog_memoized (*set_after) == CODE_FOR_load_lowdf))

(untested). Note that *set_after should always be nonnull if *delayed_reg is. Looks good otherwise. Richard

diff -ruNp gcc-20110912-orig/gcc/config/mips/mips.c gcc-20110912-fixed/gcc/config/mips/mips.c
--- gcc-20110912-orig/gcc/config/mips/mips.c 2011-09-12 17:22:27.576457121 +0400
+++ gcc-20110912-fixed/gcc/config/mips/mips.c 2011-09-23 18:59:33.329771992 +0400
@@ -14659,20 +14659,20 @@ mips_orphaned_high_part_p (htab_t htab,
    INSN and a previous instruction, avoid it by inserting nops after
    instruction AFTER.

-   *DELAYED_REG and *HILO_DELAY describe the hazards that apply at
-   this point.  If *DELAYED_REG is non-null, INSN must wait a cycle
-   before using the value of that register.  *HILO_DELAY counts the
-   number of instructions since the last hilo hazard (that is,
-   the number of instructions since the last MFLO or MFHI).
+   *DELAYED_REG, *SET_AFTER and *HILO_DELAY describe the hazards that
+   apply at this point.  If *DELAYED_REG and *SET_AFTER is non-null,
+   INSN must wait a cycle before using the value of that register.
+   *HILO_DELAY counts the number of instructions since the last hilo hazard
+   (that is, the number of instructions since the last MFLO or MFHI).

-   After inserting nops for INSN, update *DELAYED_REG and *HILO_DELAY
-   for the next instruction.
+   After inserting nops for INSN, update *DELAYED_REG, *SET_AFTER
+   and *HILO_DELAY for the next instruction.

    LO_REG is an rtx for the LO register, used in dependence checking.  */

 static void
 mips_avoid_hazard (rtx after, rtx insn, int *hilo_delay,
-                   rtx *delayed_reg, rtx lo_reg)
+                   rtx *delayed_reg, rtx lo_reg, rtx *set_after)
 {
   rtx pattern, set;
   int nops, ninsns;
@@ -14696,7 +14696,9 @@ mips_avoid_hazard (rtx after, rtx insn,
      clobber hi and lo.  */
   if (*hilo_delay < 2 && reg_set_p (lo_reg, pattern))
     nops = 2 - *hilo_delay;
-  else if (*delayed_reg != 0 && reg_referenced_p (*delayed_reg, pattern))
+  else if ((*delayed_reg != 0 && *set_after != 0
+            && reg_referenced_p (*delayed_reg, pattern))
+           && !(recog_memoized (insn) == CODE_FOR_load_highdf
+                && recog_memoized (*set_after) == CODE_FOR_load_lowdf))
     nops = 1;
   else
     nops = 0;
@@ -14710,6 +14712,7 @@ mips_avoid_hazard (rtx after, rtx insn,
   /* Set up the state for the next instruction.  */
   *hilo_delay += ninsns;
   *delayed_reg = 0;
+  *set_after = 0;
   if (INSN_CODE (insn) >= 0)
     switch (get_attr_hazard (insn))
       {
@@ -14724,6 +14727,7 @@ mips_avoid_hazard (rtx after, rtx insn,
        set = single_set (insn);
        gcc_assert (set);
        *delayed_reg = SET_DEST (set);
+       *set_after = insn;
        break;
       }
 }
@@ -14736,7 +14740,7 @@ mips_avoid_hazard (rtx after, rtx insn,
 static void
 mips_reorg_process_insns (void)
 {
-  rtx insn, last_insn, subinsn, next_insn, lo_reg, delayed_reg;
+  rtx insn, last_insn, subinsn, next_insn, lo_reg, delayed_reg, set_after;
   int hilo_delay;
   htab_t htab;
@@ -14811,7 +14815,7 @@ mips_reorg_process_insns (void)
           INSN_CODE (subinsn) = CODE_FOR_nop;
         }
       mips_avoid_hazard (last_insn, subinsn, &hilo_delay,
-                         &delayed_reg, lo_reg);
+                         &delayed_reg, lo_reg, &set_after);
     }
   last_insn = insn;
 }
@@ -14832,7 +14836,7 @@ mips_reorg_process_insns (void)
   else
     {
       mips_avoid_hazard (last_insn, insn, &hilo_delay,
-                         &delayed_reg