Re: [PATCH] RISC-V: add option -m(no-)autovec-segment

2024-02-27 Thread Greg McGary
On 2/27/24 8:25 AM, Jeff Law wrote: On 2/25/24 21:53, Greg McGary wrote: Add option -m(no-)autovec-segment to enable/disable autovectorizer from emitting vector segment load/store instructions. This is useful for performance experiments. gcc/ChangeLog: * config/riscv/autovec.md

Re: [PATCH] combine: Don't simplify paradoxical SUBREG on WORD_REGISTER_OPERATIONS [PR113010]

2024-02-27 Thread Greg McGary
On 2/26/24 5:17 PM, Greg McGary wrote: diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113010.c b/gcc/testsuite/gcc.c-torture/execute/pr113010.c new file mode 100644 index 000..a95c613c1df --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr113010.c @@ -0,0 +1,9 @@ +int

[PATCH] combine: Don't simplify paradoxical SUBREG on WORD_REGISTER_OPERATIONS [PR113010]

2024-02-26 Thread Greg McGary
The sign-bit-copies of a sign-extending load cannot be known until runtime on WORD_REGISTER_OPERATIONS targets, except in the case of a zero-extending MEM load. See the fix for PR112758. 2024-02-22 Greg McGary PR rtl-optimization/113010 * combine.cc (simplify_comparison

[PATCH] RISC-V: add option -m(no-)autovec-segment

2024-02-25 Thread Greg McGary
Add option -m(no-)autovec-segment to enable/disable autovectorizer from emitting vector segment load/store instructions. This is useful for performance experiments. gcc/ChangeLog: * config/riscv/autovec.md (vec_mask_len_load_lanes, vec_mask_len_store_lanes): Predicate with

Re: [PATCH] combine: Don't simplify high part of paradoxical-SUBREG-of-MEM on machines that sign-extend loads [PR113010]

2024-02-23 Thread Greg McGary
On 2/22/24 2:08 PM, Jakub Jelinek wrote: On Thu, Feb 22, 2024 at 12:59:18PM -0800, Greg McGary wrote: The sign bit of a sign-extending load cannot be known until runtime, so don't attempt to simplify it in the combiner. 2024-02-22 Greg McGary PR rtl-optimization/113010

[PATCH] combine: Don't simplify high part of paradoxical-SUBREG-of-MEM on machines that sign-extend loads [PR113010]

2024-02-22 Thread Greg McGary
The sign bit of a sign-extending load cannot be known until runtime, so don't attempt to simplify it in the combiner. 2024-02-22 Greg McGary PR rtl-optimization/113010 * combine.cc (simplify_comparison): Don't simplify high part of paradoxical-SUBREG-of-MEM on machines
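The point about the sign bit can be illustrated with a small C sketch. This is not GCC code: `load_sign_extended` and `high_part` are hypothetical helpers that only model a 32-bit load which the hardware sign-extends into a 64-bit register. Two memory cells of the same type yield different upper halves, so nothing about the high part can be decided at compile time.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): models a 32-bit load that the
   hardware sign-extends into a 64-bit register.  */
static int64_t load_sign_extended(const volatile int32_t *addr)
{
    return (int64_t)*addr;
}

/* Upper 32 bits of the 64-bit register after the load.  */
static uint32_t high_part(int64_t reg)
{
    return (uint32_t)((uint64_t)reg >> 32);
}

/* Two memory cells with the same type but different runtime contents.
   The names are illustrative only.  */
static volatile int32_t minus_1 = -1;
static volatile int32_t plus_1 = 1;
```

Calling `high_part(load_sign_extended(&minus_1))` gives all ones, while the same call on `plus_1` gives zero, which is why a simplification that assumes a fixed high part would be unsafe on such a target.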

Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-02-07 Thread Greg McGary
On 2/4/24 9:58 PM, Jeff Law wrote: On 2/2/24 15:48, Greg McGary wrote: input: (sign_extend:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] ) [1 minus_1+0 S4 A32])) result: (subreg:DI (mem/c:SI (symbol_ref:DI ("minus_1") [flags 0x86] ) [1 minus_1+0 S4 A32]) 0)

Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-02-02 Thread Greg McGary
On 2/1/24 10:24 PM, Jeff Law wrote: On 2/1/24 18:24, Greg McGary wrote: However, for a machine where (WORD_REGISTER_OPERATIONS && load_extend_op (inner_mode) == SIGN_EXTEND), the high part of a PSoM is only known at runtime as 0s or 1s. That's the downstream bug. The fix for such

Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-02-01 Thread Greg McGary
On 1/18/24 9:24 AM, Jeff Law wrote: On 1/17/24 20:53, Greg McGary wrote: While the code comment is true, perhaps it obscures the primary intent, which is recognition that the pattern (SIGN_EXTEND (mem ...) ) is destined to expand into a single memory-load instruction and no simplification

Re: [PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-01-17 Thread Greg McGary
On Tue, Jan 16, 2024 at 11:44 PM Richard Biener wrote: > > On Tue, Jan 16, 2024 at 11:20 PM Greg McGary wrote: > > > > > > The sign bit of a sign-extending load cannot be known until runtime, > > > so don't attempt to simplify it in the combiner. >

[PATCH] combine: Don't optimize SIGN_EXTEND of MEM on WORD_REGISTER_OPERATIONS targets [PR113010]

2024-01-16 Thread Greg McGary
The sign bit of a sign-extending load cannot be known until runtime, so don't attempt to simplify it in the combiner. 2024-01-11 Greg McGary PR rtl-optimization/113010 * combine.cc (expand_compound_operation): Don't simplify SIGN_EXTEND of a MEM

Re: Dependences for call-preserved regs on exposed pipeline target?

2012-11-26 Thread Greg McGary
On 11/25/12 23:33, Maxim Kuvyrkov wrote: You essentially need a fix-up pass just before the end of compilation (machine-dependent reorg, if memory serves me right) to space instructions consuming values from CPRs from the CALL_INSNS that set those CPRs. I.e., for the 99% of compilation you

Re: Dependences for call-preserved regs on exposed pipeline target?

2012-11-26 Thread Greg McGary
On 11/26/12 12:46, Maxim Kuvyrkov wrote: I wonder if kludgy fixups refers to the dummy-instruction solution I mentioned above. The complete dependence graph is a myth. You cannot have a complete dependence graph for a function -- scheduler works on DAG regions (and I doubt it will ever

Dependences for call-preserved regs on exposed pipeline target?

2012-11-25 Thread Greg McGary
I'm working on a port to a VLIW DSP with an exposed pipeline (i.e., no interlocks). Some operations OP have as much as 2-cycle latency on values of the call-preserved regs CPR. E.g., if the callee's epilogue restores a CPR in the delay slot of the return instruction, then any OP with that CPR as

INSN_EXACT_TICK scheduler backtrack

2012-09-13 Thread Greg McGary
When the timing requirements are not met upon queueing an insn with INSN_EXACT_TICK, the scheduler backtracks. This seems wasteful. Why not prioritize INSN_EXACT_TICK insns so that we queue them first on the cycle they need?

Maybe expand MAX_RECOG_ALTERNATIVES ?

2012-05-11 Thread Greg McGary
I'm working on a DSP port whose unit reservations are very sensitive to operand signature. E.g., for an assembler mnemonic, there can be 35-50 different combinations of operand register classes, each having different impacts on latencies and function units. For assembler code generation, very

Re: Maybe expand MAX_RECOG_ALTERNATIVES ?

2012-05-11 Thread Greg McGary
On 05/11/12 16:00, Greg McGary wrote: My question is this: does it make sense to double MAX_RECOG_ALTERNATIVES so that I can use insn attributes to identify operand signatures, or should I use another approach? After some exploration, I don't see that another approach is even possible

IRA and two-phase load/store

2012-04-27 Thread Greg McGary
I'm working on a port that does loads and stores in two phases. Every load/store is funneled through the intermediate registers ld and st standing between memory and the rest of the register file. Example: ld=4(rB) ... ... rC=ld st=rD 8(rB)=st rB is

Re: IRA and two-phase load/store

2012-04-27 Thread Greg McGary
On 04/27/12 14:31, Greg McGary wrote: I'm working on a port that does loads and stores in two phases. Every load/store is funneled through the intermediate registers ld and st standing between memory and the rest of the register file. Example: ld=4(rB

Re: where are caller-save addresses legitimized?

2010-05-07 Thread Greg McGary
On 05/05/10 21:27, Jeff Law wrote: On 05/05/10 21:34, Greg McGary wrote: On 05/05/10 20:21, Jeff Law wrote: I'm not sure they are ever legitimized -- IIRC caller-save tries to only generate addressing modes which are safe for precisely this reason. Apparently not so: caller

where are caller-save addresses legitimized?

2010-05-05 Thread Greg McGary
reload() -> setup_save_areas() -> assign_stack_local_1() creates a mem address whose offset is too large to fit into the machine insn's offset operand. Later, reload() -> save_call_clobbered_regs() -> insert_save() -> adjust_address_1() -> change_address_1() asserts because the address is not legitimate.

Re: where are caller-save addresses legitimized?

2010-05-05 Thread Greg McGary
On 05/05/10 20:21, Jeff Law wrote: On 05/05/10 17:45, Greg McGary wrote: reload() -> setup_save_areas() -> assign_stack_local_1() creates a mem address whose offset is too large to fit into the machine insn's offset operand. Later, reload() -> save_call_clobbered_regs() -> insert_save

Re: redundant divmodsi4 not optimized away

2010-04-28 Thread Greg McGary
On 04/28/10 05:58, Michael Matz wrote: On Tue, 27 Apr 2010, Greg McGary wrote: (define_insn *udivmodsi4_libcall [(set (reg:SI 4) (udiv:SI (reg:SI 1) (reg:SI 2))) (set (reg:SI 1) (umod:SI (reg:SI 1) (reg:SI 2))) (clobber (reg:SI 2)) (clobber

Re: redundant divmodsi4 not optimized away

2010-04-27 Thread Greg McGary
On 04/26/10 22:09, Ian Lance Taylor wrote: Greg McGary <g...@mcgary.org> writes: I have a port without div or mod machine instructions. I wrote divmodsi4 patterns that do the libcall directly, hoping that GCC would recognize the opportunity to use a single divmodsi4 to compute both quotient

redundant divmodsi4 not optimized away

2010-04-26 Thread Greg McGary
I have a port without div or mod machine instructions. I wrote divmodsi4 patterns that do the libcall directly, hoping that GCC would recognize the opportunity to use a single divmodsi4 to compute both quotient and remainder. Alas, GCC calls divmodsi4 twice with the same divisor and dividend
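As a rough illustration of why one combined call is attractive, here is a minimal C sketch of a software routine that produces quotient and remainder in a single pass. The name `udivmod32` and its result struct are hypothetical and are not the libcall from the post; the shift-subtract loop is just one plausible way such a routine could work.

```c
#include <stdint.h>

/* Hypothetical result type: both values come out of one call.  */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udivmod32_result;

/* Restoring shift-subtract division.  The divisor d must be nonzero.
   The running remainder is kept in 64 bits so that the shift cannot
   overflow when d is larger than 2^31.  */
static udivmod32_result udivmod32(uint32_t n, uint32_t d)
{
    uint32_t quot = 0;
    uint64_t rem = 0;

    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next dividend bit.  */
        rem = (rem << 1) | ((n >> bit) & 1u);
        if (rem >= d) {
            rem -= d;
            quot |= (uint32_t)1 << bit;
        }
    }

    udivmod32_result r = { quot, (uint32_t)rem };
    return r;
}
```

A caller that needs both `n / d` and `n % d` can take them from one `udivmod32(n, d)` result instead of paying for the loop twice.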

How to deal with 48-bit pointers and 32-bit integers

2009-08-12 Thread Greg McGary
I'm doing a port for an unusual new machine which is 32-bit RISCy in every way, except that it has 48-bit pointers. Pointers have a high-order 16-bit segID and low-order 32-bit seg offset. Most ALU instructions only work on 32 bits, zeroing the upper 16-bit seg ID in the result. A few ALU
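The pointer layout described here can be modeled in C as follows. This is only a sketch under assumptions: the type `ptr48` and all helper names are made up for illustration, a 48-bit pointer is carried in the low bits of a `uint64_t`, and `alu_add32` models only the stated behavior of a 32-bit ALU operation zeroing the seg ID.

```c
#include <stdint.h>

/* Hypothetical representation: a 48-bit pointer held in the low 48 bits
   of a 64-bit integer (16-bit seg ID above a 32-bit offset).  */
typedef uint64_t ptr48;

static ptr48 make_ptr48(uint16_t seg_id, uint32_t offset)
{
    return ((uint64_t)seg_id << 32) | offset;
}

static uint16_t ptr48_seg(ptr48 p)
{
    return (uint16_t)(p >> 32);
}

static uint32_t ptr48_offset(ptr48 p)
{
    return (uint32_t)p;
}

/* Models a 32-bit ALU add applied to a pointer register: only the low
   32 bits take part, and the seg ID in the result is zeroed.  */
static ptr48 alu_add32(ptr48 a, uint32_t b)
{
    return (uint32_t)((uint32_t)a + b);
}
```

In this model, adding 4 to a pointer with `alu_add32` yields the right offset but a seg ID of zero, which shows why ordinary 32-bit integer arithmetic cannot be used naively for pointer arithmetic on such a machine.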

Trouble with powerpc64 mfpgpr patch

2007-07-12 Thread Greg McGary
I extracted the MFPGPR hunks from Peter Bergner's [PATCH] Add POWER6 machine description, posted on 2006-11-01 and dropped them into gcc-4.0.3, but the result fails with error: insn does not satisfy its constraints: .../src/gcc-4.0.3/gcc/config/rs6000/darwin-ldouble.c: In function

insns for register-move between general and floating

2006-03-21 Thread Greg McGary
I'm working on a port that has instructions to move bits between 64-bit floating-point and 64-bit general-purpose regs. I say bits because there's no conversion between float and int: the bit pattern is unaltered. Therefore, it's possible to use scratch FPRs for spilling GPRs and vice versa, and

Insn for direct increment of memory?

2005-09-24 Thread Greg McGary
I'm working with a machine that has a memory-increment insn. It's a network-processor performance hack that allows no-latency accumulation of statistical counters. The insn sends the increment and address to the memory controller which does the add, avoiding the usual long-latency

Re: Insn for direct increment of memory?

2005-09-24 Thread Greg McGary
Paul Brook [EMAIL PROTECTED] writes: It should just work if you have the appropriate movsi pattern/alternative. m68k has a memory-increment instruction (aka add :-). Touche. I've had my head in RISC-land too long... 8^) G

Re: How to use a fast scratchpad-RAM for fill/spill ?

2005-05-11 Thread Greg McGary
Daniel Jacobowitz [EMAIL PROTECTED] writes: ... Or you could try telling the entire compiler to treat them as registers, instead of just reload. That's likely to work as well or better. So, I define these as a separate register class, and only the movM insn patterns get constraints that

Re: emit_no_conflict_block breaks some conditional moves

2005-04-23 Thread Greg McGary
James E Wilson [EMAIL PROTECTED] writes: Greg McGary wrote: I found that emit_no_conflict_block() reordered insns gen'd by expand_doubleword_shift() in a way that violated dependency between compares and associated conditional-move insns that had the target register as destination

emit_no_conflict_block breaks some conditional moves

2005-04-20 Thread Greg McGary
My port failed the DImode part of the rotate regression-tests (gcc.c-torture/execute/20020508-[123].c). I found that emit_no_conflict_block() reordered insns gen'd by expand_doubleword_shift() in a way that violated dependency between compares and associated conditional-move insns that had the