Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Mon, Mar 5, 2012 at 9:11 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Sun, Mar 4, 2012 at 11:47 PM, Uros Bizjak ubiz...@gmail.com wrote:
On Mon, Mar 5, 2012 at 4:53 AM, H.J. Lu hjl.to...@gmail.com wrote:

and compiler does generate the same output. i386.c also has xasm = jmp\t%A0; xasm = call\t%A0; for calls. There are no separate indirect call patterns. For x32, only indirect register calls have to be in DImode. The direct call should be in Pmode (SImode).

Direct call just expects label to some absolute address that is assumed to fit in 32 bits (see constant_call_address_operand_p). call and jmp insn expect word_mode operands, so please change ix86_expand_call and call patterns in the same way as jump instructions above.

Since x86-64 hardware always zero-extends upper 32bits of 64bit registers when loading its lower 32bits, it is safe and easier to just output 64bit registers for %A than zero-extend it by hand for all jump/call patterns.

No, the instruction expects word_mode operands, so we have to extend values to expected mode. I don't think that patching at insn output time is acceptable.

You are right. I found a testcase to show problem:

struct foo { void (*f) (void); int i; };
void __attribute__ ((noinline)) bar (struct foo x) { x.f (); }

x is passed in RDI and the upper 32bits of RDI is int i. Operand 1 of calls must be in Pmode for SYMBOL_REF and word_mode for register.
When I removed :P like

@@ -11423,7 +11428,7 @@
 (define_insn *call_value
   [(set (match_operand 0 )
-	(call (mem:QI (match_operand:P 1 call_insn_operand czw))
+	(call (mem:QI (match_operand 1 call_insn_operand czw))
	      (match_operand 2 )))]
   !SIBLING_CALL_P (insn)
   * return ix86_output_call_insn (insn, operands[1]);

I got

In file included from /net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind-dw2.c:1633:0:
/net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind.inc: In function ‘_Unwind_ForcedUnwind_Phase2’:
/net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind.inc:189:1: error: unable to find a register to spill in class ‘CREG’
/net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind.inc:189:1: error: this is the insn:
(call_insn 62 60 63 9 (set (reg:SI 0 ax)
        (call (mem:QI (reg/f:DI 0 ax [orig:88 D.9044 ] [88]) [0 *D.9044_25 S1 A8])
            (const_int 0 [0]))) /net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind.inc:175 629 {*call_value}
     (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:88 D.9044 ] [88])
        (expr_list:REG_DEAD (reg:DI 37 r8)
            (expr_list:REG_DEAD (reg:SI 5 di)
                (expr_list:REG_DEAD (reg:SI 4 si)
                    (expr_list:REG_DEAD (reg:DI 2 cx)
                        (expr_list:REG_DEAD (reg:DI 1 dx)
                            (nil)))
    (expr_list:REG_BR_PRED (use (reg:SI 5 di))
        (expr_list:REG_BR_PRED (use (reg:SI 4 si))
            (expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 1 dx))
                (expr_list:REG_BR_PRED (use (reg:DI 2 cx))
                    (expr_list:REG_BR_PRED (use (reg:DI 37 r8))
                        (nil)))
/net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind.inc:189:1: internal compiler error: in spill_failure, at reload1.c:2120
Please submit a full bug report, with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Here is a patch to duplicate function symbol to change it from Pmode to word_mode. It seems to work. But I am not sure if it is the right approach. Any suggestions?

Thanks.

-- H.J.
---
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1828cf6..26e23c7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22976,6 +22975,19 @@ construct_plt_address (rtx symbol)
   return tmp;
 }

+static rtx
+duplicate_function_symbol_ref (enum machine_mode mode, rtx fnaddr)
+{
+  rtx dup_symbol_ref;
+  gcc_assert (!SYMBOL_REF_HAS_BLOCK_INFO_P (fnaddr));
+  dup_symbol_ref = gen_rtx_SYMBOL_REF (mode, XSTR (fnaddr, 0));
+  SYMBOL_REF_USED (dup_symbol_ref) = SYMBOL_REF_USED (fnaddr);
+  SYMBOL_REF_WEAK (dup_symbol_ref) = SYMBOL_REF_WEAK (fnaddr);
+  SET_SYMBOL_REF_DECL (dup_symbol_ref, SYMBOL_REF_DECL (fnaddr));
+  SYMBOL_REF_FLAGS (dup_symbol_ref) = SYMBOL_REF_FLAGS (fnaddr);
+  return dup_symbol_ref;
+}
+
 rtx
 ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, rtx callarg2,
@@ -23026,13 +23038,22 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
	   !local_symbolic_operand (XEXP (fnaddr, 0), VOIDmode))
     fnaddr = gen_rtx_MEM (QImode, construct_plt_address (XEXP (fnaddr, 0)));
   else if (sibcall
-	   ? !sibcall_insn_operand (XEXP (fnaddr, 0), Pmode)
-	   : !call_insn_operand (XEXP (fnaddr, 0), Pmode))
+	   ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
+	   : !call_insn_operand (XEXP (fnaddr, 0), word_mode))
     {
       fnaddr = XEXP (fnaddr, 0);
-      if
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Mon, Mar 5, 2012 at 9:01 AM, Uros Bizjak ubiz...@gmail.com wrote:

@@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;			/* displacement */
+  /* Since address override works only on the (reg) part in fs:(reg),
+     we can't use it as memory operand.  */
+  if (Pmode != word_mode && seg == SEG_FS && (base || index))
+    return 0;

Can you explain the above some more? IMO, if the override works on (reg) part, this is just what we want.

When Pmode == SImode, we have fs segment register == 0x1001 and base register (SImode) == -1 (0xffffffff). We are expecting address to be 0x1001 - 1 == 0x1000. But what we get is 0x1001 + 0xffffffff, not 0x1000, since the 0x67 address prefix only applies to the base register and zero-extends 0xffffffff to 64bit.

I would call this a bug in the specification - I guess that 0x1001(%eax) works correctly. We will treat this issue as a bug.

Nice, we have TARGET_TLS_DIRECT_SEG_REFS option. We can just clear it somewhere appropriate when Pmode != word_mode.

Uros.
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Sun, Mar 4, 2012 at 11:47 PM, Uros Bizjak ubiz...@gmail.com wrote:
On Mon, Mar 5, 2012 at 4:53 AM, H.J. Lu hjl.to...@gmail.com wrote:

and compiler does generate the same output. i386.c also has xasm = jmp\t%A0; xasm = call\t%A0; for calls. There are no separate indirect call patterns. For x32, only indirect register calls have to be in DImode. The direct call should be in Pmode (SImode).

Direct call just expects label to some absolute address that is assumed to fit in 32 bits (see constant_call_address_operand_p). call and jmp insn expect word_mode operands, so please change ix86_expand_call and call patterns in the same way as jump instructions above.

Since x86-64 hardware always zero-extends upper 32bits of 64bit registers when loading its lower 32bits, it is safe and easier to just output 64bit registers for %A than zero-extend it by hand for all jump/call patterns.

No, the instruction expects word_mode operands, so we have to extend values to expected mode. I don't think that patching at insn output time is acceptable.

You are right. I found a testcase to show problem:

struct foo { void (*f) (void); int i; };
void __attribute__ ((noinline)) bar (struct foo x) { x.f (); }

x is passed in RDI and the upper 32bits of RDI is int i.

BTW: I propose to split the patch into smaller pieces, dealing with various independent parts separately. Handling jump/call insn is definitely one of them, the other is stringops handling, another prologue/epilogue expansion.

I will do that. Thanks.

-- H.J.
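To make the testcase above concrete, here is a minimal sketch in plain C that models the register contents arithmetically. It is not GCC code: the function names are hypothetical, and the packing (pointer in the low half, int i in the high half) is taken from the statement that x is passed in RDI with i in the upper 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Model of struct foo { void (*f) (void); int i; } passed in one 64-bit
   register under x32: the 32-bit pointer f occupies bits 31:0 and the
   int i occupies bits 63:32.  Names here are illustrative only.  */
uint64_t
pack_foo_in_rdi (uint32_t f, int32_t i)
{
  return ((uint64_t) (uint32_t) i << 32) | f;
}

/* Target reached if the 64-bit register is used for the call as-is.  */
uint64_t
call_target_unextended (uint64_t rdi)
{
  return rdi;
}

/* Target reached if the low 32 bits are first zero-extended to 64 bits.  */
uint64_t
call_target_zero_extended (uint64_t rdi)
{
  return (uint64_t) (uint32_t) rdi;
}

void
check_packed_call_target (void)
{
  uint64_t rdi = pack_foo_in_rdi (0x00401000u, 7);

  /* Using RDI directly mixes int i into the call address.  */
  assert (call_target_unextended (rdi) == 0x0000000700401000ull);
  /* Explicit zero-extension recovers the function pointer.  */
  assert (call_target_zero_extended (rdi) == 0x00401000ull);
}
```

With f = 0x00401000 and i = 7, the unextended register value is 0x0000000700401000, which is why printing a 64-bit register name at output time is not enough on its own.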
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Mon, Mar 5, 2012 at 12:01 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Sun, Mar 4, 2012 at 11:01 PM, H.J. Lu hjl.to...@gmail.com wrote:

@@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;			/* displacement */
+  /* Since address override works only on the (reg) part in fs:(reg),
+     we can't use it as memory operand.  */
+  if (Pmode != word_mode && seg == SEG_FS && (base || index))
+    return 0;

Can you explain the above some more? IMO, if the override works on (reg) part, this is just what we want.

When Pmode == SImode, we have fs segment register == 0x1001 and base register (SImode) == -1 (0xffffffff). We are expecting address to be 0x1001 - 1 == 0x1000. But what we get is 0x1001 + 0xffffffff, not 0x1000, since the 0x67 address prefix only applies to the base register and zero-extends 0xffffffff to 64bit.

I would call this a bug in the specification - I guess that 0x1001(%eax) works correctly.

This is how hardware works.

We will treat this issue as a bug.

I also was surprised by this behavior.

-- H.J.
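The arithmetic in this exchange can be checked with a small C model. This is a sketch under the assumptions stated in the thread (fs base 0x1001, SImode base register holding -1); the function names are hypothetical and the code only mimics the address computation, it does not touch %fs.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address for %fs:(%reg32) with the 0x67 prefix, as described in
   the thread: the register part is zero-extended from 32 to 64 bits and
   only then added to the 64-bit segment base.  */
uint64_t
fs_reg_address_addr32 (uint64_t fs_base, uint32_t reg)
{
  return fs_base + (uint64_t) reg;
}

/* Address the compiler intends: the offset is treated as signed, so a
   negative offset moves below the segment base.  */
uint64_t
fs_reg_address_intended (uint64_t fs_base, int32_t offset)
{
  return fs_base + (uint64_t) (int64_t) offset;
}

void
check_fs_reg_address (void)
{
  /* Intended: 0x1001 - 1.  */
  assert (fs_reg_address_intended (0x1001, -1) == 0x1000ull);
  /* Observed with addr32: 0x1001 + 0xffffffff.  */
  assert (fs_reg_address_addr32 (0x1001, 0xffffffffu) == 0x100001000ull);
}
```

The two results differ by exactly 2^32, which is the carry that a pure 32-bit addition would have discarded.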
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Mon, Mar 5, 2012 at 12:24 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Mon, Mar 5, 2012 at 9:01 AM, Uros Bizjak ubiz...@gmail.com wrote:

@@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;			/* displacement */
+  /* Since address override works only on the (reg) part in fs:(reg),
+     we can't use it as memory operand.  */
+  if (Pmode != word_mode && seg == SEG_FS && (base || index))
+    return 0;

Can you explain the above some more? IMO, if the override works on (reg) part, this is just what we want.

When Pmode == SImode, we have fs segment register == 0x1001 and base register (SImode) == -1 (0xffffffff). We are expecting address to be 0x1001 - 1 == 0x1000. But what we get is 0x1001 + 0xffffffff, not 0x1000, since the 0x67 address prefix only applies to the base register and zero-extends 0xffffffff to 64bit.

I would call this a bug in the specification - I guess that 0x1001(%eax) works correctly. We will treat this issue as a bug.

Nice, we have TARGET_TLS_DIRECT_SEG_REFS option. We can just clear it somewhere appropriate when Pmode != word_mode.

This only applies to fs:(reg) address. fs:offset is OK.

-- H.J.
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Mon, Mar 05, 2012 at 09:13:49AM -0800, H.J. Lu wrote:

We are expecting address to be 0x1001 - 1 == 0x1000. But what we get is 0x1001 + 0xffffffff, not 0x1000, since the 0x67 address prefix only applies to the base register and zero-extends 0xffffffff to 64bit.

I would call this a bug in the specification - I guess that 0x1001(%eax) works correctly.

This is how hardware works.

Do you really need to use addr32 prefixes for the direct TLS seg refs? Without that the addresses will be sign-extended from the 32-bit immediate (which is used in LP64 x86_64 code too) and everything will work fine, won't it?

Jakub
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Mon, Mar 5, 2012 at 9:20 AM, Jakub Jelinek ja...@redhat.com wrote:
On Mon, Mar 05, 2012 at 09:13:49AM -0800, H.J. Lu wrote:

We are expecting address to be 0x1001 - 1 == 0x1000. But what we get is 0x1001 + 0xffffffff, not 0x1000, since the 0x67 address prefix only applies to the base register and zero-extends 0xffffffff to 64bit.

I would call this a bug in the specification - I guess that 0x1001(%eax) works correctly.

This is how hardware works.

Do you really need to use addr32 prefixes for the direct TLS seg refs? Without that the addresses will be sign-extended from the 32-bit immediate (which is used in LP64 x86_64 code too) and everything will work fine, won't it?

32bit immediate is OK. The problem is fs:(32bit register).

-- H.J.
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Mon, Mar 05, 2012 at 09:26:20AM -0800, H.J. Lu wrote:
On Mon, Mar 5, 2012 at 9:20 AM, Jakub Jelinek ja...@redhat.com wrote:
On Mon, Mar 05, 2012 at 09:13:49AM -0800, H.J. Lu wrote:

We are expecting address to be 0x1001 - 1 == 0x1000. But what we get is 0x1001 + 0xffffffff, not 0x1000, since the 0x67 address prefix only applies to the base register and zero-extends 0xffffffff to 64bit.

I would call this a bug in the specification - I guess that 0x1001(%eax) works correctly.

This is how hardware works.

Do you really need to use addr32 prefixes for the direct TLS seg refs? Without that the addresses will be sign-extended from the 32-bit immediate (which is used in LP64 x86_64 code too) and everything will work fine, won't it?

32bit immediate is OK. The problem is fs:(32bit register).

Just require that the MEM uses DImode address in those patterns, even for -mx32?

Jakub
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Mon, Mar 5, 2012 at 9:31 AM, Jakub Jelinek ja...@redhat.com wrote:
On Mon, Mar 05, 2012 at 09:26:20AM -0800, H.J. Lu wrote:
On Mon, Mar 5, 2012 at 9:20 AM, Jakub Jelinek ja...@redhat.com wrote:
On Mon, Mar 05, 2012 at 09:13:49AM -0800, H.J. Lu wrote:

We are expecting address to be 0x1001 - 1 == 0x1000. But what we get is 0x1001 + 0xffffffff, not 0x1000, since the 0x67 address prefix only applies to the base register and zero-extends 0xffffffff to 64bit.

I would call this a bug in the specification - I guess that 0x1001(%eax) works correctly.

This is how hardware works.

Do you really need to use addr32 prefixes for the direct TLS seg refs? Without that the addresses will be sign-extended from the 32-bit immediate (which is used in LP64 x86_64 code too) and everything will work fine, won't it?

32bit immediate is OK. The problem is fs:(32bit register).

Just require that the MEM uses DImode address in those patterns, even for -mx32?

Should SImode offset be zero-extended or sign-extended to DImode? The offset relative to TP can be negative. On the other hand, the upper 32bit address in x32 must be zero. Even if we properly extend it to DImode, it may not be faster than load fs:(reg) to a register first.

-- H.J.
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Tue, Mar 6, 2012 at 6:40 AM, H.J. Lu hjl.to...@gmail.com wrote:

We are expecting address to be 0x1001 - 1 == 0x1000. But what we get is 0x1001 + 0xffffffff, not 0x1000, since the 0x67 address prefix only applies to the base register and zero-extends 0xffffffff to 64bit.

I would call this a bug in the specification - I guess that 0x1001(%eax) works correctly.

This is how hardware works.

Do you really need to use addr32 prefixes for the direct TLS seg refs? Without that the addresses will be sign-extended from the 32-bit immediate (which is used in LP64 x86_64 code too) and everything will work fine, won't it?

32bit immediate is OK. The problem is fs:(32bit register).

Just require that the MEM uses DImode address in those patterns, even for -mx32?

Should SImode offset be zero-extended or sign-extended to DImode? The offset relative to TP can be negative. On the other hand, the upper 32bit address in x32 must be zero. Even if we properly extend it to DImode, it may not be faster than load fs:(reg) to a register first.

As I proposed earlier, just clear TARGET_TLS_DIRECT_SEG_REFS for now. This will always load fs:0 to a Pmode register and add the correctly extended register or immediate to it. We can revisit this issue later to specialize for fs:reg only, but I doubt it is worth any further effort.

Uros.

-- H.J.
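A rough C model of the approach proposed here, loading fs:0 into a Pmode register and doing the addition there, is sketched below. It assumes the usual convention that the word at %fs:0 holds the thread pointer itself; the function names are hypothetical and nothing here is GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: the value loaded from %fs:0 is the thread pointer, which in
   x32 is a 32-bit (Pmode == SImode) quantity.  */
uint32_t
load_thread_pointer (uint32_t fs_base)
{
  return fs_base;
}

/* TLS address formed entirely in Pmode.  The addition is 32 bits wide, so
   it wraps modulo 2^32 and a negative offset lands below the thread
   pointer without any carry into bit 32.  */
uint32_t
tls_address_via_register (uint32_t fs_base, int32_t offset)
{
  uint32_t tp = load_thread_pointer (fs_base);
  return tp + (uint32_t) offset;
}

void
check_tls_via_register (void)
{
  assert (tls_address_via_register (0x1001u, -1) == 0x1000u);
  assert (tls_address_via_register (0x1001u, 0x10) == 0x1011u);
}
```

Because the sum is computed in 32 bits, the result always has a zero upper half when used as a 64-bit address, which matches the x32 requirement mentioned above. The cost is one extra load of fs:0 per access sequence.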
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Sat, Nov 12, 2011 at 3:19 AM, H.J. Lu hongjiu...@intel.com wrote:

The current x32 implementation uses LEAs to convert 32bit address to 64bit. However, we can use addr32 prefix to use 32bit address directly. It improves performance by 5% in SPEC CPU 2K/2006. All changes are done in x86 backend, except for a small unwind library assert change: http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01555.html due to return column size difference.

For x86-64, Pmode can be 32bit or 64bit, but word_mode is always 64bit. push/pop only work on word_mode. Also string instructions take Pmode pointers. I will submit a set of patches to use 32bit Pmode for x32. This is the first patch to properly use Pmode and word_mode. It also adds addr32 prefix to string instructions if needed. OK for trunk?

First round of review comments:

@@ -10252,14 +10260,18 @@ ix86_expand_prologue (void)
   if (r10_live && eax_live)
     {
       t = choose_baseaddr (m->fs.sp_offset - allocate);
-      emit_move_insn (r10, gen_frame_mem (Pmode, t));
+      emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
+		      gen_frame_mem (word_mode, t));
       t = choose_baseaddr (m->fs.sp_offset - allocate - UNITS_PER_WORD);
-      emit_move_insn (eax, gen_frame_mem (Pmode, t));
+      emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
+		      gen_frame_mem (word_mode, t));
     }
   else if (eax_live || r10_live)
     {
       t = choose_baseaddr (m->fs.sp_offset - allocate);
-      emit_move_insn ((eax_live ? eax : r10), gen_frame_mem (Pmode, t));
+      emit_move_insn (gen_rtx_REG (word_mode,
+				   (eax_live ? AX_REG : R10_REG)),
+		      gen_frame_mem (word_mode, t));
     }
 }
 gcc_assert (m->fs.sp_offset == frame.stack_pointer_offset);

Please just change rtx eax = gen_rtx_REG (Pmode, AX_REG); and r10 = gen_rtx_REG (Pmode, R10_REG); around line 10305 and line 10324. You also have gen_push in Pmode, just following the former line. Please review the whole ix86_expand_prologue how AX and R10 are defined and used.
@@ -11060,8 +11072,8 @@ ix86_expand_split_stack_prologue (void)
     {
       rtx rax;
-      rax = gen_rtx_REG (Pmode, AX_REG);
-      emit_move_insn (rax, reg10);
+      rax = gen_rtx_REG (word_mode, AX_REG);
+      emit_move_insn (rax, gen_rtx_REG (word_mode, R10_REG));
       use_reg (call_fusage, rax);
     }

Same here. Please review how AX, R10 and R11 are defined and used. Also, this needs review from split stack author.

@@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;			/* displacement */
+  /* Since address override works only on the (reg) part in fs:(reg),
+     we can't use it as memory operand.  */
+  if (Pmode != word_mode && seg == SEG_FS && (base || index))
+    return 0;

Can you explain the above some more? IMO, if the override works on (reg) part, this is just what we want.

@@ -13637,7 +13665,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
	   gcc_unreachable ();
	 }
-      ix86_print_operand (file, x, 0);
+      ix86_print_operand (file, x,
+			  TARGET_64BIT && REG_P (x) ? 'q' : 0);
       return;

This is too big a hammer. You output everything in DImode, so even if the address is in fact in SImode, you output it in DImode with an addr32 prefix.

Uros.
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Sun, Mar 4, 2012 at 12:09 PM, Uros Bizjak ubiz...@gmail.com wrote:
On Sat, Nov 12, 2011 at 3:19 AM, H.J. Lu hongjiu...@intel.com wrote:

The current x32 implementation uses LEAs to convert 32bit address to 64bit. However, we can use addr32 prefix to use 32bit address directly. It improves performance by 5% in SPEC CPU 2K/2006. All changes are done in x86 backend, except for a small unwind library assert change: http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01555.html due to return column size difference.

For x86-64, Pmode can be 32bit or 64bit, but word_mode is always 64bit. push/pop only work on word_mode. Also string instructions take Pmode pointers. I will submit a set of patches to use 32bit Pmode for x32. This is the first patch to properly use Pmode and word_mode. It also adds addr32 prefix to string instructions if needed. OK for trunk?

First round of review comments:

@@ -10252,14 +10260,18 @@ ix86_expand_prologue (void)
   if (r10_live && eax_live)
     {
       t = choose_baseaddr (m->fs.sp_offset - allocate);
-      emit_move_insn (r10, gen_frame_mem (Pmode, t));
+      emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
+		      gen_frame_mem (word_mode, t));
       t = choose_baseaddr (m->fs.sp_offset - allocate - UNITS_PER_WORD);
-      emit_move_insn (eax, gen_frame_mem (Pmode, t));
+      emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
+		      gen_frame_mem (word_mode, t));
     }
   else if (eax_live || r10_live)
     {
       t = choose_baseaddr (m->fs.sp_offset - allocate);
-      emit_move_insn ((eax_live ? eax : r10), gen_frame_mem (Pmode, t));
+      emit_move_insn (gen_rtx_REG (word_mode,
+				   (eax_live ? AX_REG : R10_REG)),
+		      gen_frame_mem (word_mode, t));
     }
 }
 gcc_assert (m->fs.sp_offset == frame.stack_pointer_offset);

Please just change rtx eax = gen_rtx_REG (Pmode, AX_REG); and r10 = gen_rtx_REG (Pmode, R10_REG);

This is done on purpose. We manipulate stack using AX and R10 as scratch registers in Pmode since stack is in Pmode. But AX and R10 registers have to be saved and restored in word_mode.

around line 10305 and line 10324.
You also have gen_push in Pmode,

In those places, they just want to push a register on stack to save it. Callers don't care how it is done. I changed gen_push to allow Pmode by always pushing registers in word_mode:

  if (REG_P (arg) && GET_MODE (arg) != word_mode)
    arg = gen_rtx_REG (word_mode, REGNO (arg));

just following the former line. Please review the whole ix86_expand_prologue how AX and R10 are defined and used.

The same issue applies here.

@@ -11060,8 +11072,8 @@ ix86_expand_split_stack_prologue (void)
     {
       rtx rax;
-      rax = gen_rtx_REG (Pmode, AX_REG);
-      emit_move_insn (rax, reg10);
+      rax = gen_rtx_REG (word_mode, AX_REG);
+      emit_move_insn (rax, gen_rtx_REG (word_mode, R10_REG));
       use_reg (call_fusage, rax);
     }

Same here. Please review how AX, R10 and R11 are defined and used. Also, this needs review from split stack author.

I CCed Ian. That is the same issue. We need some scratch registers in Pmode to manipulate stack. But we have to save and restore them in word_mode, not Pmode.

@@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;			/* displacement */
+  /* Since address override works only on the (reg) part in fs:(reg),
+     we can't use it as memory operand.  */
+  if (Pmode != word_mode && seg == SEG_FS && (base || index))
+    return 0;

Can you explain the above some more? IMO, if the override works on (reg) part, this is just what we want.

When Pmode == SImode, we have fs segment register == 0x1001 and base register (SImode) == -1 (0xffffffff). We are expecting address to be 0x1001 - 1 == 0x1000. But what we get is 0x1001 + 0xffffffff, not 0x1000, since the 0x67 address prefix only applies to the base register and zero-extends 0xffffffff to 64bit.

@@ -13637,7 +13665,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
	   gcc_unreachable ();
	 }
-      ix86_print_operand (file, x, 0);
+      ix86_print_operand (file, x,
+			  TARGET_64BIT && REG_P (x) ? 'q' : 0);
       return;

This is too big a hammer.
You output everything in DImode, so even if the address is in fact in SImode, you output it in DImode with an addr32 prefix.

%A is only used in jmp\t%A0 and there is no jmp *%eax instruction in 64bit mode, only jmp *%rax:

[hjl@gnu-4 tmp]$ cat j.s
	jmp *%eax
	jmp *%rax
[hjl@gnu-4 tmp]$ gcc -c j.s
j.s: Assembler messages:
j.s:1: Error: operand type mismatch for `jmp'
[hjl@gnu-4 tmp]$

It is OK for x32 since the upper 32bits are zero when we are loading %eax.

-- H.J.
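The point made in this message, that scratch registers are used in Pmode but must be saved and restored in word_mode, can be illustrated with a small C model. This is only a sketch with hypothetical function names; it assumes the register may hold a live 64-bit value at the time it is saved, and it models a 32-bit reload as zero-extending.

```c
#include <assert.h>
#include <stdint.h>

/* Round trip through a stack slot using 32-bit (SImode) moves: only the
   low half reaches the slot, and the 32-bit reload zero-extends.  */
uint64_t
save_restore_simode (uint64_t reg)
{
  uint32_t slot = (uint32_t) reg;
  return (uint64_t) slot;
}

/* Round trip using word_mode (DImode on x86-64) moves: all 64 bits are
   stored and reloaded.  */
uint64_t
save_restore_word_mode (uint64_t reg)
{
  uint64_t slot = reg;
  return slot;
}

void
check_scratch_save_restore (void)
{
  uint64_t r10 = 0x1122334455667788ull;

  /* The SImode round trip loses the upper half of the live value.  */
  assert (save_restore_simode (r10) == 0x55667788ull);
  /* The word_mode round trip preserves it.  */
  assert (save_restore_word_mode (r10) == r10);
}
```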
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Sun, Mar 4, 2012 at 11:01 PM, H.J. Lu hjl.to...@gmail.com wrote:

@@ -13637,7 +13665,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
	   gcc_unreachable ();
	 }
-      ix86_print_operand (file, x, 0);
+      ix86_print_operand (file, x,
+			  TARGET_64BIT && REG_P (x) ? 'q' : 0);
       return;

This is too big a hammer. You output everything in DImode, so even if the address is in fact in SImode, you output it in DImode with an addr32 prefix.

%A is only used in jmp\t%A0 and there is no jmp *%eax instruction in 64bit mode, only jmp *%rax:

[hjl@gnu-4 tmp]$ cat j.s
	jmp *%eax
	jmp *%rax
[hjl@gnu-4 tmp]$ gcc -c j.s
j.s: Assembler messages:
j.s:1: Error: operand type mismatch for `jmp'
[hjl@gnu-4 tmp]$

It is OK for x32 since the upper 32bits are zero when we are loading %eax.

Just zero_extend the register in the wrong mode to DImode in the indirect_jump and tablejump expanders. If the above is true, then gcc will remove this extension automatically.

Uros.
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Sun, Mar 4, 2012 at 2:40 PM, Uros Bizjak ubiz...@gmail.com wrote:
On Sun, Mar 4, 2012 at 11:01 PM, H.J. Lu hjl.to...@gmail.com wrote:

@@ -13637,7 +13665,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
	   gcc_unreachable ();
	 }
-      ix86_print_operand (file, x, 0);
+      ix86_print_operand (file, x,
+			  TARGET_64BIT && REG_P (x) ? 'q' : 0);
       return;

This is too big a hammer. You output everything in DImode, so even if the address is in fact in SImode, you output it in DImode with an addr32 prefix.

%A is only used in jmp\t%A0 and there is no jmp *%eax instruction in 64bit mode, only jmp *%rax:

[hjl@gnu-4 tmp]$ cat j.s
	jmp *%eax
	jmp *%rax
[hjl@gnu-4 tmp]$ gcc -c j.s
j.s: Assembler messages:
j.s:1: Error: operand type mismatch for `jmp'
[hjl@gnu-4 tmp]$

It is OK for x32 since the upper 32bits are zero when we are loading %eax.

Just zero_extend the register in the wrong mode to DImode in the indirect_jump and tablejump expanders. If the above is true, then gcc will remove this extension automatically.

I tried:

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 715e7ea..de5cf67 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11100,10 +11100,15 @@
    (set_attr modrm 0)])

 (define_expand indirect_jump
-  [(set (pc) (match_operand 0 indirect_branch_operand ))])
+  [(set (pc) (match_operand 0 indirect_branch_operand ))]
+
+{
+  if (TARGET_X32)
+    operands[0] = convert_memory_address (word_mode, operands[0]);
+})

 (define_insn *indirect_jump
-  [(set (pc) (match_operand:P 0 indirect_branch_operand rw))]
+  [(set (pc) (match_operand:W 0 indirect_branch_operand rw))]

   jmp\t%A0
   [(set_attr type ibr)
@@ -11145,12 +11150,12 @@
       operands[0] = expand_simple_binop (Pmode, code, op0, op1, NULL_RTX, 0,
					  OPTAB_DIRECT);
     }
-  else if (TARGET_X32)
-    operands[0] = convert_memory_address (Pmode, operands[0]);
+  if (TARGET_X32)
+    operands[0] = convert_memory_address (word_mode, operands[0]);
 })

 (define_insn *tablejump_1
-  [(set (pc) (match_operand:P 0 indirect_branch_operand rw))
+  [(set (pc) (match_operand:W 0 indirect_branch_operand rw))
    (use (label_ref (match_operand 1 )))]

   jmp\t%A0

and compiler does generate the same output. i386.c also has xasm = jmp\t%A0; xasm = call\t%A0; for calls. There are no separate indirect call patterns. For x32, only indirect register calls have to be in DImode. The direct call should be in Pmode (SImode).

Since x86-64 hardware always zero-extends upper 32bits of 64bit registers when loading its lower 32bits, it is safe and easier to just output 64bit registers for %A than zero-extend it by hand for all jump/call patterns.

-- H.J.
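The hardware behaviour this argument relies on can be modelled in a few lines of C. The sketch below uses hypothetical helper names and simply mimics x86-64 partial-register write semantics: a 32-bit write clears bits 63:32, while a 16-bit write leaves the upper bits alone. It shows when an explicit zero extension is a no-op, which is the case where the compiler can drop it.

```c
#include <assert.h>
#include <stdint.h>

/* Writing a 32-bit register (e.g. movl into %eax) clears bits 63:32 of
   the containing 64-bit register, whatever was there before.  */
uint64_t
write_reg32 (uint64_t old, uint32_t v)
{
  (void) old;
  return (uint64_t) v;
}

/* Writing a 16-bit register preserves bits 63:16.  */
uint64_t
write_reg16 (uint64_t old, uint16_t v)
{
  return (old & ~0xffffull) | v;
}

/* Explicit zero extension of the low 32 bits to 64 bits.  */
uint64_t
zero_extend_si_to_di (uint64_t reg)
{
  return (uint64_t) (uint32_t) reg;
}

void
check_register_writes (void)
{
  uint64_t rax = write_reg32 (0xdeadbeef00000000ull, 0x00401000u);
  uint64_t rbx = write_reg16 (0xdeadbeef00000000ull, 0x1234);

  /* After a 32-bit write the extension changes nothing.  */
  assert (rax == 0x00401000ull);
  assert (zero_extend_si_to_di (rax) == rax);
  /* After a 16-bit write the stale upper bits are still present.  */
  assert (rbx == 0xdeadbeef00001234ull);
  assert (zero_extend_si_to_di (rbx) != rbx);
}
```

The extension is only redundant when the value was actually produced by a 32-bit write; a value that arrives already packed in a 64-bit register does not have that guarantee.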
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
H.J. Lu hjl.to...@gmail.com writes: @@ -11060,8 +11072,8 @@ ix86_expand_split_stack_prologue (void) { rtx rax; - rax = gen_rtx_REG (Pmode, AX_REG); - emit_move_insn (rax, reg10); + rax = gen_rtx_REG (word_mode, AX_REG); + emit_move_insn (rax, gen_rtx_REG (word_mode, R10_REG)); use_reg (call_fusage, rax); } Same here. Please review how AX, R10 and R11 are defined and used. Also, this needs review from split stack author. I CCed Ian. That is the same issue. We need some scratch registers in Pmode to manipulate stack. But we have to save and restore them in word_mode, not Pmode. Changing Pmode to word_mode is fine here, if the x86 maintainers approve the rest of the patch. Ian
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Mon, Mar 5, 2012 at 4:53 AM, H.J. Lu hjl.to...@gmail.com wrote:

and compiler does generate the same output. i386.c also has xasm = jmp\t%A0; xasm = call\t%A0; for calls. There are no separate indirect call patterns. For x32, only indirect register calls have to be in DImode. The direct call should be in Pmode (SImode).

Direct call just expects label to some absolute address that is assumed to fit in 32 bits (see constant_call_address_operand_p). call and jmp insn expect word_mode operands, so please change ix86_expand_call and call patterns in the same way as jump instructions above.

Since x86-64 hardware always zero-extends upper 32bits of 64bit registers when loading its lower 32bits, it is safe and easier to just output 64bit registers for %A than zero-extend it by hand for all jump/call patterns.

No, the instruction expects word_mode operands, so we have to extend values to expected mode. I don't think that patching at insn output time is acceptable.

BTW: I propose to split the patch into smaller pieces, dealing with various independent parts separately. Handling jump/call insn is definitely one of them, the other is stringops handling, another prologue/epilogue expansion.

Uros.
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Sat, Nov 12, 2011 at 9:32 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Sat, Nov 12, 2011 at 3:19 AM, H.J. Lu hongjiu...@intel.com wrote:

The current x32 implementation uses LEAs to convert 32bit address to 64bit. However, we can use addr32 prefix to use 32bit address directly. It improves performance by 5% in SPEC CPU 2K/2006. All changes are done in x86 backend, except for a small unwind library assert change: http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01555.html due to return column size difference.

For x86-64, Pmode can be 32bit or 64bit, but word_mode is always 64bit. push/pop only work on word_mode. Also string instructions take Pmode pointers. I will submit a set of patches to use 32bit Pmode for x32. This is the first patch to properly use Pmode and word_mode. It also adds addr32 prefix to string instructions if needed. OK for trunk?

Not for stage3.

Uros.

Now trunk is in stage1. The patch is at http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01572.html OK for trunk?

Thanks.

-- H.J.
Re: PATCH [1/n] addr32: Properly use Pmode and word_mode
On Sat, Nov 12, 2011 at 3:19 AM, H.J. Lu hongjiu...@intel.com wrote:

The current x32 implementation uses LEAs to convert 32bit address to 64bit. However, we can use addr32 prefix to use 32bit address directly. It improves performance by 5% in SPEC CPU 2K/2006. All changes are done in x86 backend, except for a small unwind library assert change: http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01555.html due to return column size difference.

For x86-64, Pmode can be 32bit or 64bit, but word_mode is always 64bit. push/pop only work on word_mode. Also string instructions take Pmode pointers. I will submit a set of patches to use 32bit Pmode for x32. This is the first patch to properly use Pmode and word_mode. It also adds addr32 prefix to string instructions if needed. OK for trunk?

Not for stage3.

Uros.