Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-06 Thread H.J. Lu
On Mon, Mar 5, 2012 at 9:11 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sun, Mar 4, 2012 at 11:47 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Mar 5, 2012 at 4:53 AM, H.J. Lu hjl.to...@gmail.com wrote:

 and compiler does generate the same output. i386.c also has

        xasm = "jmp\t%A0";
        xasm = "call\t%A0";

 for calls.  There are no separate indirect call patterns.  For x32,
 only indirect register calls have to be in DImode.  The direct call
 should be in Pmode (SImode).

 Direct call just expects label to some absolute address that is assumed
 to fit in 32 bits (see constant_call_address_operand_p).

 call and jmp insn expect word_mode operands, so please change
 ix86_expand_call and call patterns in the same way as jump
 instructions above.

 Since x86-64 hardware always zero-extends upper 32bits of 64bit
 registers when loading its lower 32bits, it is safe and easier to just
 output 64bit registers for %A than zero-extend it by hand for all
 jump/call patterns.

 No, the instruction expects word_mode operands, so we have to extend
 values to expected mode. I don't think that patching at insn output
 time is acceptable.

 You are right. I found a testcase to show problem:

 struct foo
 {
  void (*f) (void);
  int i;
 };

 void
 __attribute__ ((noinline))
 bar (struct foo x)
 {
  x.f ();
 }

 x is passed in RDI and the upper 32bits of RDI is int i.


Operand 1 of calls must be in Pmode for SYMBOL_REF and word_mode
for register.  When I removed :P like

@@ -11423,7 +11428,7 @@

 (define_insn "*call_value"
   [(set (match_operand 0 "" "")
-  (call (mem:QI (match_operand:P 1 "call_insn_operand" "czw"))
+  (call (mem:QI (match_operand 1 "call_insn_operand" "czw"))
 (match_operand 2 "" "")))]
   "!SIBLING_CALL_P (insn)"
   "* return ix86_output_call_insn (insn, operands[1]);"

I got

In file included from
/net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind-dw2.c:1633:0:
/net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind.inc: In
function ‘_Unwind_ForcedUnwind_Phase2’:
/net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind.inc:189:1:
error: unable to find a register to spill in class ‘CREG’
/net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind.inc:189:1:
error: this is the insn:
(call_insn 62 60 63 9 (set (reg:SI 0 ax)
(call (mem:QI (reg/f:DI 0 ax [orig:88 D.9044 ] [88]) [0
*D.9044_25 S1 A8])
(const_int 0 [0])))
/net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind.inc:175 629
{*call_value}
 (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:88 D.9044 ] [88])
(expr_list:REG_DEAD (reg:DI 37 r8)
(expr_list:REG_DEAD (reg:SI 5 di)
(expr_list:REG_DEAD (reg:SI 4 si)
(expr_list:REG_DEAD (reg:DI 2 cx)
(expr_list:REG_DEAD (reg:DI 1 dx)
(nil)))
(expr_list:REG_BR_PRED (use (reg:SI 5 di))
(expr_list:REG_BR_PRED (use (reg:SI 4 si))
(expr_list:REG_FRAME_RELATED_EXPR (use (reg:DI 1 dx))
(expr_list:REG_BR_PRED (use (reg:DI 2 cx))
(expr_list:REG_BR_PRED (use (reg:DI 37 r8))
(nil)))
/net/gnu-6/export/gnu/import/git/gcc-addr32/libgcc/unwind.inc:189:1:
internal compiler error: in spill_failure, at reload1.c:2120
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

Here is a patch that duplicates the function symbol to change it from
Pmode to word_mode.  It seems to work, but I am not sure it is the
right approach.  Any suggestions?

Thanks.



-- 
H.J.
---
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1828cf6..26e23c7 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22976,6 +22975,19 @@ construct_plt_address (rtx symbol)
   return tmp;
 }

+static rtx
+duplicate_function_symbol_ref (enum machine_mode mode, rtx fnaddr)
+{
+  rtx dup_symbol_ref;
+  gcc_assert (!SYMBOL_REF_HAS_BLOCK_INFO_P (fnaddr));
+  dup_symbol_ref = gen_rtx_SYMBOL_REF (mode, XSTR (fnaddr, 0));
+  SYMBOL_REF_USED (dup_symbol_ref) = SYMBOL_REF_USED (fnaddr);
+  SYMBOL_REF_WEAK (dup_symbol_ref) = SYMBOL_REF_WEAK (fnaddr);
+  SET_SYMBOL_REF_DECL (dup_symbol_ref, SYMBOL_REF_DECL (fnaddr));
+  SYMBOL_REF_FLAGS (dup_symbol_ref) = SYMBOL_REF_FLAGS (fnaddr);
+  return dup_symbol_ref;
+}
+
 rtx
 ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
  rtx callarg2,
@@ -23026,13 +23038,22 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
!local_symbolic_operand (XEXP (fnaddr, 0), VOIDmode))
 fnaddr = gen_rtx_MEM (QImode, construct_plt_address (XEXP (fnaddr, 0)));
   else if (sibcall
-  ? !sibcall_insn_operand (XEXP (fnaddr, 0), Pmode)
-  : !call_insn_operand (XEXP (fnaddr, 0), Pmode))
+  ? !sibcall_insn_operand (XEXP (fnaddr, 0), word_mode)
+  : !call_insn_operand (XEXP (fnaddr, 0), word_mode))
 {
   fnaddr = XEXP (fnaddr, 0);
-  if 

Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-05 Thread Uros Bizjak
On Mon, Mar 5, 2012 at 9:01 AM, Uros Bizjak ubiz...@gmail.com wrote:

 @@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct
 ix86_address *out)
   else
     disp = addr;                       /* displacement */

 +  /* Since address override works only on the (reg) part in fs:(reg),
 +     we can't use it as memory operand.  */
 +  if (Pmode != word_mode && seg == SEG_FS && (base || index))
 +    return 0;

 Can you explain the above some more? IMO, if the override works on
 (reg) part, this is just what we want.

 When Pmode == SImode, we have

 fs segment register == 0x1001

 and

 base register (SImode) == -1 (0xffffffff).

 We are expecting address to be 0x1001 - 1 == 0x1000.  But, what we get
 is 0x1000 + 0x100000000, not 0x1000 since 0x67 address prefix only applies to
 base register to zero-extend 0xffffffff to 64bit.

 I would call this a bug in the specification - I guess that
 0x1001(%eax) works correctly.

 We will treat this issue as a bug.

Nice, we have TARGET_TLS_DIRECT_SEG_REFS option. We can just clear it
somewhere appropriate when Pmode != word_mode.

Uros.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-05 Thread H.J. Lu
On Sun, Mar 4, 2012 at 11:47 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Mar 5, 2012 at 4:53 AM, H.J. Lu hjl.to...@gmail.com wrote:

 and compiler does generate the same output. i386.c also has

        xasm = "jmp\t%A0";
        xasm = "call\t%A0";

 for calls.  There are no separate indirect call patterns.  For x32,
 only indirect register calls have to be in DImode.  The direct call
 should be in Pmode (SImode).

 Direct call just expects label to some absolute address that is assumed
 to fit in 32 bits (see constant_call_address_operand_p).

 call and jmp insn expect word_mode operands, so please change
 ix86_expand_call and call patterns in the same way as jump
 instructions above.

 Since x86-64 hardware always zero-extends upper 32bits of 64bit
 registers when loading its lower 32bits, it is safe and easier to just
 output 64bit registers for %A than zero-extend it by hand for all
 jump/call patterns.

 No, the instruction expects word_mode operands, so we have to extend
 values to expected mode. I don't think that patching at insn output
 time is acceptable.

You are right. I found a testcase to show problem:

struct foo
{
  void (*f) (void);
  int i;
};

void
__attribute__ ((noinline))
bar (struct foo x)
{
  x.f ();
}

x is passed in RDI and the upper 32bits of RDI is int i.
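
The failure mode can be modelled in plain C.  The sketch below is only
an illustration, not GCC code: pack_foo and the two call_target_*
helpers are hypothetical names, and it assumes the x32 layout described
above, with the 32-bit pointer f in the low half of RDI and int i in
the high half.

```c
#include <stdint.h>

/* Model of how struct foo { void (*f) (void); int i; } would sit in
   RDI under x32 (assumed layout): f in bits 0..31, i in bits 32..63.  */
uint64_t
pack_foo (uint32_t f, int32_t i)
{
  return ((uint64_t) (uint32_t) i << 32) | f;
}

/* "call *%rdi" with no extension: the whole 64-bit register is the
   target, so the bits of i leak into the address.  */
uint64_t
call_target_unextended (uint64_t rdi)
{
  return rdi;
}

/* Zero-extending the SImode pointer to word_mode first drops the
   upper half and leaves only f.  */
uint64_t
call_target_zero_extended (uint64_t rdi)
{
  return (uint64_t) (uint32_t) rdi;
}
```

With f == 0x401000 and i == 7, the unextended target is 0x700401000,
while the zero-extended target is 0x401000.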

 BTW: I propose to split the patch into smaller pieces, dealing with
 various independent parts separately. Handling jump/call insn is
 definitely one of them, the other is stringops handling, another
 prologue/epilogue expansion.


I will do that.

Thanks.

-- 
H.J.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-05 Thread H.J. Lu
On Mon, Mar 5, 2012 at 12:01 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sun, Mar 4, 2012 at 11:01 PM, H.J. Lu hjl.to...@gmail.com wrote:

 @@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct
 ix86_address *out)
   else
     disp = addr;                       /* displacement */

 +  /* Since address override works only on the (reg) part in fs:(reg),
 +     we can't use it as memory operand.  */
 +  if (Pmode != word_mode && seg == SEG_FS && (base || index))
 +    return 0;

 Can you explain the above some more? IMO, if the override works on
 (reg) part, this is just what we want.

 When Pmode == SImode, we have

 fs segment register == 0x1001

 and

 base register (SImode) == -1 (0xffffffff).

 We are expecting address to be 0x1001 - 1 == 0x1000.  But, what we get
 is 0x1000 + 0x100000000, not 0x1000 since 0x67 address prefix only applies to
 base register to zero-extend 0xffffffff to 64bit.

 I would call this a bug in the specification - I guess that
 0x1001(%eax) works correctly.

This is how hardware works.

 We will treat this issue as a bug.


I also was surprised by this behavior.

-- 
H.J.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-05 Thread H.J. Lu
On Mon, Mar 5, 2012 at 12:24 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Mar 5, 2012 at 9:01 AM, Uros Bizjak ubiz...@gmail.com wrote:

 @@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct
 ix86_address *out)
   else
     disp = addr;                       /* displacement */

 +  /* Since address override works only on the (reg) part in fs:(reg),
 +     we can't use it as memory operand.  */
 +  if (Pmode != word_mode && seg == SEG_FS && (base || index))
 +    return 0;

 Can you explain the above some more? IMO, if the override works on
 (reg) part, this is just what we want.

 When Pmode == SImode, we have

 fs segment register == 0x1001

 and

 base register (SImode) == -1 (0xffffffff).

 We are expecting address to be 0x1001 - 1 == 0x1000.  But, what we get
 is 0x1000 + 0x100000000, not 0x1000 since 0x67 address prefix only applies to
 base register to zero-extend 0xffffffff to 64bit.

 I would call this a bug in the specification - I guess that
 0x1001(%eax) works correctly.

 We will treat this issue as a bug.

 Nice, we have TARGET_TLS_DIRECT_SEG_REFS option. We can just clear it
 somewhere appropriate when Pmode != word_mode.


This only applies to fs:(reg) address.  fs:offset is OK.

-- 
H.J.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-05 Thread Jakub Jelinek
On Mon, Mar 05, 2012 at 09:13:49AM -0800, H.J. Lu wrote:
  We are expecting address to be 0x1001 - 1 == 0x1000.  But, what we get
  is 0x1000 + 0x100000000, not 0x1000 since 0x67 address prefix only applies to
  base register to zero-extend 0xffffffff to 64bit.
 
  I would call this a bug in the specification - I guess that
  0x1001(%eax) works correctly.
 
 This is how hardware works.

Do you really need to use addr32 prefixes for the direct TLS seg refs?
Without that the addresses will be sign-extended from the 32-bit immediate
(which is used in LP64 x86_64 code too) and everything will work fine, won't
it?

Jakub


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-05 Thread H.J. Lu
On Mon, Mar 5, 2012 at 9:20 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Mar 05, 2012 at 09:13:49AM -0800, H.J. Lu wrote:
  We are expecting address to be 0x1001 - 1 == 0x1000.  But, what we get
  is 0x1000 + 0x100000000, not 0x1000 since 0x67 address prefix only applies to
  base register to zero-extend 0xffffffff to 64bit.
 
  I would call this a bug in the specification - I guess that
  0x1001(%eax) works correctly.

 This is how hardware works.

 Do you really need to use addr32 prefixes for the direct TLS seg refs?
 Without that the addresses will be sign-extended from the 32-bit immediate
 (which is used in LP64 x86_64 code too) and everything will work fine, won't
 it?


32bit immediate is OK.  The problem is fs:(32bit register).


-- 
H.J.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-05 Thread Jakub Jelinek
On Mon, Mar 05, 2012 at 09:26:20AM -0800, H.J. Lu wrote:
 On Mon, Mar 5, 2012 at 9:20 AM, Jakub Jelinek ja...@redhat.com wrote:
  On Mon, Mar 05, 2012 at 09:13:49AM -0800, H.J. Lu wrote:
   We are expecting address to be 0x1001 - 1 == 0x1000.  But, what we get
   is 0x1000 + 0x100000000, not 0x1000 since 0x67 address prefix only applies to
   base register to zero-extend 0xffffffff to 64bit.
  
   I would call this a bug in the specification - I guess that
   0x1001(%eax) works correctly.
 
  This is how hardware works.
 
  Do you really need to use addr32 prefixes for the direct TLS seg refs?
  Without that the addresses will be sign-extended from the 32-bit immediate
  (which is used in LP64 x86_64 code too) and everything will work fine, won't
  it?
 
 
 32bit immediate is OK.  The problem is fs:(32bit register).
 

Just require that the MEM uses DImode address in those patterns, even for -mx32?

Jakub


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-05 Thread H.J. Lu
On Mon, Mar 5, 2012 at 9:31 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Mar 05, 2012 at 09:26:20AM -0800, H.J. Lu wrote:
 On Mon, Mar 5, 2012 at 9:20 AM, Jakub Jelinek ja...@redhat.com wrote:
  On Mon, Mar 05, 2012 at 09:13:49AM -0800, H.J. Lu wrote:
   We are expecting address to be 0x1001 - 1 == 0x1000.  But, what we get
   is 0x1000 + 0x100000000, not 0x1000 since 0x67 address prefix only applies to
   base register to zero-extend 0xffffffff to 64bit.
  
   I would call this a bug in the specification - I guess that
   0x1001(%eax) works correctly.
 
  This is how hardware works.
 
  Do you really need to use addr32 prefixes for the direct TLS seg refs?
  Without that the addresses will be sign-extended from the 32-bit immediate
  (which is used in LP64 x86_64 code too) and everything will work fine, 
  won't
  it?
 

 32bit immediate is OK.  The problem is fs:(32bit register).


 Just require that the MEM uses DImode address in those patterns, even for 
 -mx32?


Should the SImode offset be zero-extended or sign-extended to
DImode?  The offset relative to TP can be negative.  On the
other hand, the upper 32 bits of an address in x32 must be zero.
Even if we properly extend it to DImode, it may not be faster
than loading fs:(reg) into a register first.
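
A small C model of the two choices, as a sketch only (the helper names
are made up, and tp stands for the thread pointer held in the fs base):

```c
#include <stdint.h>

/* Address formed when the SImode offset is zero-extended to DImode
   before it is added to the 64-bit thread pointer.  */
uint64_t
tls_addr_zero_extended (uint64_t tp, int32_t off)
{
  return tp + (uint64_t) (uint32_t) off;
}

/* Address formed when the SImode offset is sign-extended to DImode
   before it is added to the 64-bit thread pointer.  */
uint64_t
tls_addr_sign_extended (uint64_t tp, int32_t off)
{
  return tp + (uint64_t) (int64_t) off;
}
```

With tp == 0x1001 and off == -1, zero extension gives 0x100001000
(above 4GB) and sign extension gives the wanted 0x1000.  But with
tp == 0x10 and off == -0x20, sign extension gives 0xfffffffffffffff0,
whose upper 32 bits are not zero, so neither extension alone is
correct for every input.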

-- 
H.J.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-05 Thread Uros Bizjak
On Tue, Mar 6, 2012 at 6:40 AM, H.J. Lu hjl.to...@gmail.com wrote:

   We are expecting address to be 0x1001 - 1 == 0x1000.  But, what we get
   is 0x1000 + 0x100000000, not 0x1000 since 0x67 address prefix only applies to
   base register to zero-extend 0xffffffff to 64bit.
  
   I would call this a bug in the specification - I guess that
   0x1001(%eax) works correctly.
 
  This is how hardware works.
 
  Do you really need to use addr32 prefixes for the direct TLS seg refs?
  Without that the addresses will be sign-extended from the 32-bit immediate
  (which is used in LP64 x86_64 code too) and everything will work fine, 
  won't
  it?
 

 32bit immediate is OK.  The problem is fs:(32bit register).


 Just require that the MEM uses DImode address in those patterns, even for 
 -mx32?


 Should SImode offset be zero-extended or sign-extended to
 DImode? The offset relative to TP can be negative.  On the
 other hand, the upper 32bit address in x32 must be zero.
 Even if we properly extend it to DImode, it may not be faster
 than load fs:(reg) to a register first.

As I proposed earlier, just clear TARGET_TLS_DIRECT_SEG_REFS for now.
This will always load fs:0 into a Pmode register and add a correctly
extended register or immediate to it.  We can revisit this issue later
to specialize for fs:reg only, but I doubt it is worth any further
effort.
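
Roughly, and only as a C model (tls_address is a hypothetical name,
and thread_pointer stands for the value read from fs:0 into a 32-bit
Pmode register):

```c
#include <stdint.h>

/* With direct segment references disabled, the thread pointer is
   first loaded from %fs:0 into a 32-bit register and the offset is
   added in 32-bit arithmetic, so the sum wraps modulo 2^32.  */
uint32_t
tls_address (uint32_t thread_pointer, int32_t offset)
{
  return thread_pointer + (uint32_t) offset;
}
```

For thread_pointer == 0x1001 and offset == -1 this yields 0x1000, the
address the fs:(reg) form fails to produce.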

Uros.

 --
 H.J.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-04 Thread Uros Bizjak
On Sat, Nov 12, 2011 at 3:19 AM, H.J. Lu hongjiu...@intel.com wrote:

 The current x32 implementation uses LEAs to convert 32bit address to
 64bit.  However, we can use addr32 prefix to use 32bit address directly.
 It improves performance by 5% in SPEC CPU 2K/2006.  All changes are done
 in x86 backend, except for a small unwind library assert change:

 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01555.html

 due to return column size difference.

 For x86-64, Pmode can be 32bit or 64bit, but word_mode is always 64bit.
 push/pop only work on word_mode.  Also string instructions take Pmode
 pointers.

 I will submit a set of patches to use 32bit Pmode for x32.  This is
 the first patch to properly use Pmode and word_mode.  It also adds
 addr32 prefix to string instructions if needed.  OK for trunk?

First round of review comments:

@@ -10252,14 +10260,18 @@ ix86_expand_prologue (void)
   if (r10_live && eax_live)
 {
  t = choose_baseaddr (m->fs.sp_offset - allocate);
- emit_move_insn (r10, gen_frame_mem (Pmode, t));
+ emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
+ gen_frame_mem (word_mode, t));
  t = choose_baseaddr (m->fs.sp_offset - allocate - UNITS_PER_WORD);
- emit_move_insn (eax, gen_frame_mem (Pmode, t));
+ emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
+ gen_frame_mem (word_mode, t));
}
   else if (eax_live || r10_live)
{
  t = choose_baseaddr (m->fs.sp_offset - allocate);
- emit_move_insn ((eax_live ? eax : r10), gen_frame_mem (Pmode, t));
+ emit_move_insn (gen_rtx_REG (word_mode,
+  (eax_live ? AX_REG : R10_REG)),
+ gen_frame_mem (word_mode, t));
}
 }
   gcc_assert (m->fs.sp_offset == frame.stack_pointer_offset);

Please just change

  rtx eax = gen_rtx_REG (Pmode, AX_REG);

and
  r10 = gen_rtx_REG (Pmode, R10_REG);

around line 10305 and line 10324. You also have gen_push in Pmode,
just following the former line. Please review the whole
ix86_expand_prologue how AX and R10 are defined and used.

@@ -11060,8 +11072,8 @@ ix86_expand_split_stack_prologue (void)
{
  rtx rax;

- rax = gen_rtx_REG (Pmode, AX_REG);
- emit_move_insn (rax, reg10);
+ rax = gen_rtx_REG (word_mode, AX_REG);
+ emit_move_insn (rax, gen_rtx_REG (word_mode, R10_REG));
  use_reg (call_fusage, rax);
}

Same here. Please review how AX, R10 and R11 are defined and used.
Also, this needs review from split stack author.

@@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct
ix86_address *out)
   else
 disp = addr;   /* displacement */

+  /* Since address override works only on the (reg) part in fs:(reg),
+ we can't use it as memory operand.  */
+  if (Pmode != word_mode && seg == SEG_FS && (base || index))
+return 0;

Can you explain the above some more? IMO, if the override works on
(reg) part, this is just what we want.

@@ -13637,7 +13665,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
  gcc_unreachable ();
}

- ix86_print_operand (file, x, 0);
+ ix86_print_operand (file, x,
+ TARGET_64BIT && REG_P (x) ? 'q' : 0);
  return;

This is too big a hammer. You output everything in DImode, so even if
the address is in fact in SImode, you output it in DImode with an
addr32 prefix.

Uros.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-04 Thread H.J. Lu
On Sun, Mar 4, 2012 at 12:09 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sat, Nov 12, 2011 at 3:19 AM, H.J. Lu hongjiu...@intel.com wrote:

 The current x32 implementation uses LEAs to convert 32bit address to
 64bit.  However, we can use addr32 prefix to use 32bit address directly.
 It improves performance by 5% in SPEC CPU 2K/2006.  All changes are done
 in x86 backend, except for a small unwind library assert change:

 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01555.html

 due to return column size difference.

 For x86-64, Pmode can be 32bit or 64bit, but word_mode is always 64bit.
 push/pop only work on word_mode.  Also string instructions take Pmode
 pointers.

 I will submit a set of patches to use 32bit Pmode for x32.  This is
 the first patch to properly use Pmode and word_mode.  It also adds
 addr32 prefix to string instructions if needed.  OK for trunk?

 First round of review comments:

 @@ -10252,14 +10260,18 @@ ix86_expand_prologue (void)
       if (r10_live && eax_live)
         {
          t = choose_baseaddr (m->fs.sp_offset - allocate);
 -         emit_move_insn (r10, gen_frame_mem (Pmode, t));
 +         emit_move_insn (gen_rtx_REG (word_mode, R10_REG),
 +                         gen_frame_mem (word_mode, t));
          t = choose_baseaddr (m->fs.sp_offset - allocate - UNITS_PER_WORD);
 -         emit_move_insn (eax, gen_frame_mem (Pmode, t));
 +         emit_move_insn (gen_rtx_REG (word_mode, AX_REG),
 +                         gen_frame_mem (word_mode, t));
        }
       else if (eax_live || r10_live)
        {
          t = choose_baseaddr (m->fs.sp_offset - allocate);
 -         emit_move_insn ((eax_live ? eax : r10), gen_frame_mem (Pmode, t));
 +         emit_move_insn (gen_rtx_REG (word_mode,
 +                                      (eax_live ? AX_REG : R10_REG)),
 +                         gen_frame_mem (word_mode, t));
        }
     }
   gcc_assert (m->fs.sp_offset == frame.stack_pointer_offset);

 Please just change

      rtx eax = gen_rtx_REG (Pmode, AX_REG);

 and
          r10 = gen_rtx_REG (Pmode, R10_REG);

This is done on purpose.  We manipulate stack using AX and R10 as
scratch registers in Pmode since stack is in Pmode.  But AX and R10
registers have to be saved and restored in word_mode.

 around line 10305 and line 10324. You also have gen_push in Pmode,

In those places, they just want to push a register on stack to save it.
Callers don't care how it is done.  I changed gen_push to allow
Pmode by always pushing registers in word_mode:

 if (REG_P (arg) && GET_MODE (arg) != word_mode)
   arg = gen_rtx_REG (word_mode, REGNO (arg));

 just following the former line. Please review the whole
 ix86_expand_prologue how AX and R10 are defined and used.

The same issue applies here.

 @@ -11060,8 +11072,8 @@ ix86_expand_split_stack_prologue (void)
        {
          rtx rax;

 -         rax = gen_rtx_REG (Pmode, AX_REG);
 -         emit_move_insn (rax, reg10);
 +         rax = gen_rtx_REG (word_mode, AX_REG);
 +         emit_move_insn (rax, gen_rtx_REG (word_mode, R10_REG));
          use_reg (call_fusage, rax);
        }

 Same here. Please review how AX, R10 and R11 are defined and used.
 Also, this needs review from split stack author.

I CCed Ian. That is the same issue.  We need some scratch registers
in Pmode to manipulate stack.  But we have to save and restore them
in word_mode, not Pmode.

 @@ -11388,6 +11400,11 @@ ix86_decompose_address (rtx addr, struct
 ix86_address *out)
   else
     disp = addr;                       /* displacement */

 +  /* Since address override works only on the (reg) part in fs:(reg),
 +     we can't use it as memory operand.  */
 +  if (Pmode != word_mode && seg == SEG_FS && (base || index))
 +    return 0;

 Can you explain the above some more? IMO, if the override works on
 (reg) part, this is just what we want.

When Pmode == SImode, we have

fs segment register == 0x1001

and

base register (SImode) == -1 (0xffffffff).

We are expecting address to be 0x1001 - 1 == 0x1000.  But, what we get
is 0x1000 + 0x100000000, not 0x1000 since 0x67 address prefix only applies to
base register to zero-extend 0xffffffff to 64bit.
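
To make the arithmetic concrete, here is a minimal C model of the
address calculation.  It is only a sketch under the assumptions stated
in this message (fs base 0x1001, base register holding -1 in SImode);
the function names are invented for illustration and are not GCC or
hardware interfaces.

```c
#include <stdint.h>

/* %fs:(%reg) with the 0x67 prefix: the 32-bit effective address is
   zero-extended to 64 bits first, and only then is the segment base
   added, so no wrap-around at 4GB takes place.  */
uint64_t
linear_addr_with_addr32 (uint64_t fs_base, uint32_t reg)
{
  uint64_t effective = (uint64_t) reg;	/* zero-extended */
  return fs_base + effective;
}

/* What x32 code wants: the whole sum truncated to 32 bits.  */
uint64_t
linear_addr_wanted (uint64_t fs_base, uint32_t reg)
{
  return (uint64_t) (uint32_t) (fs_base + reg);
}
```

For fs_base == 0x1001 and reg == 0xffffffff the first function returns
0x100001000 and the second returns 0x1000.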

 @@ -13637,7 +13665,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
              gcc_unreachable ();
            }

 -         ix86_print_operand (file, x, 0);
 +         ix86_print_operand (file, x,
 +                             TARGET_64BIT && REG_P (x) ? 'q' : 0);
          return;

 This is too big a hammer. You output everything in DImode, so even if
 the address is in fact in SImode, you output it in DImode with an
 addr32 prefix.


%A is only used in jmp\t%A0 and there is no jmp *%eax instruction in
64bit mode, only jmp *%rax:

[hjl@gnu-4 tmp]$ cat j.s
jmp *%eax
jmp *%rax
[hjl@gnu-4 tmp]$ gcc -c j.s
j.s: Assembler messages:
j.s:1: Error: operand type mismatch for `jmp'
[hjl@gnu-4 tmp]$

It is OK for x32 since the upper 32bits are zero when we are loading %eax.


-- 
H.J.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-04 Thread Uros Bizjak
On Sun, Mar 4, 2012 at 11:01 PM, H.J. Lu hjl.to...@gmail.com wrote:

 @@ -13637,7 +13665,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
              gcc_unreachable ();
            }

 -         ix86_print_operand (file, x, 0);
 +         ix86_print_operand (file, x,
 +                             TARGET_64BIT && REG_P (x) ? 'q' : 0);
          return;

 This is too big a hammer. You output everything in DImode, so even if
 the address is in fact in SImode, you output it in DImode with an
 addr32 prefix.


 %A is only used in jmp\t%A0 and there is no jmp *%eax instruction in
 64bit mode, only jmp *%rax:

 [hjl@gnu-4 tmp]$ cat j.s
        jmp *%eax
        jmp *%rax
 [hjl@gnu-4 tmp]$ gcc -c j.s
 j.s: Assembler messages:
 j.s:1: Error: operand type mismatch for `jmp'
 [hjl@gnu-4 tmp]$

 It is OK for x32 since the upper 32bits are zero when we are loading %eax.

Just zero_extend a register in the wrong mode to DImode in the
indirect_jump and tablejump expanders. If the above is true, then gcc
will remove this extension automatically.

Uros.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-04 Thread H.J. Lu
On Sun, Mar 4, 2012 at 2:40 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sun, Mar 4, 2012 at 11:01 PM, H.J. Lu hjl.to...@gmail.com wrote:

 @@ -13637,7 +13665,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
              gcc_unreachable ();
            }

 -         ix86_print_operand (file, x, 0);
 +         ix86_print_operand (file, x,
 +                             TARGET_64BIT && REG_P (x) ? 'q' : 0);
          return;

 This is too big a hammer. You output everything in DImode, so even if
 the address is in fact in SImode, you output it in DImode with an
 addr32 prefix.


 %A is only used in jmp\t%A0 and there is no jmp *%eax instruction in
 64bit mode, only jmp *%rax:

 [hjl@gnu-4 tmp]$ cat j.s
        jmp *%eax
        jmp *%rax
 [hjl@gnu-4 tmp]$ gcc -c j.s
 j.s: Assembler messages:
 j.s:1: Error: operand type mismatch for `jmp'
 [hjl@gnu-4 tmp]$

 It is OK for x32 since the upper 32bits are zero when we are loading %eax.

 Just zero_extend register in wrong mode to DImode in indirect_jump and
 tablejump expanders. If above is true, then gcc will remove this
 extension automatically.


I tried:

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 715e7ea..de5cf67 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11100,10 +11100,15 @@
    (set_attr "modrm" "0")])

 (define_expand "indirect_jump"
-  [(set (pc) (match_operand 0 "indirect_branch_operand" ""))])
+  [(set (pc) (match_operand 0 "indirect_branch_operand" ""))]
+  ""
+{
+  if (TARGET_X32)
+    operands[0] = convert_memory_address (word_mode, operands[0]);
+})

 (define_insn "*indirect_jump"
-  [(set (pc) (match_operand:P 0 "indirect_branch_operand" "rw"))]
+  [(set (pc) (match_operand:W 0 "indirect_branch_operand" "rw"))]
   ""
   "jmp\t%A0"
   [(set_attr "type" "ibr")
@@ -11145,12 +11150,12 @@
   operands[0] = expand_simple_binop (Pmode, code, op0, op1, NULL_RTX, 0,
 OPTAB_DIRECT);
 }
-  else if (TARGET_X32)
-operands[0] = convert_memory_address (Pmode, operands[0]);
+  if (TARGET_X32)
+operands[0] = convert_memory_address (word_mode, operands[0]);
 })

 (define_insn "*tablejump_1"
-  [(set (pc) (match_operand:P 0 "indirect_branch_operand" "rw"))
+  [(set (pc) (match_operand:W 0 "indirect_branch_operand" "rw"))
    (use (label_ref (match_operand 1 "" "")))]
   ""
   "jmp\t%A0"

and the compiler does generate the same output.  i386.c also has

        xasm = "jmp\t%A0";
        xasm = "call\t%A0";

for calls.  There are no separate indirect call patterns.  For x32,
only indirect register calls have to be in DImode.  The direct call
should be in Pmode (SImode).

Since x86-64 hardware always zero-extends upper 32bits of 64bit
registers when loading its lower 32bits, it is safe and easier to just
output 64bit registers for %A than zero-extend it by hand for all
jump/call patterns.

-- 
H.J.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-04 Thread Ian Lance Taylor
H.J. Lu hjl.to...@gmail.com writes:

 @@ -11060,8 +11072,8 @@ ix86_expand_split_stack_prologue (void)
        {
          rtx rax;

 -         rax = gen_rtx_REG (Pmode, AX_REG);
 -         emit_move_insn (rax, reg10);
 +         rax = gen_rtx_REG (word_mode, AX_REG);
 +         emit_move_insn (rax, gen_rtx_REG (word_mode, R10_REG));
          use_reg (call_fusage, rax);
        }

 Same here. Please review how AX, R10 and R11 are defined and used.
 Also, this needs review from split stack author.

 I CCed Ian. That is the same issue.  We need some scratch registers
 in Pmode to manipulate stack.  But we have to save and restore them
 in word_mode, not Pmode.

Changing Pmode to word_mode is fine here, if the x86 maintainers approve
the rest of the patch.

Ian


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-04 Thread Uros Bizjak
On Mon, Mar 5, 2012 at 4:53 AM, H.J. Lu hjl.to...@gmail.com wrote:

 and compiler does generate the same output. i386.c also has

        xasm = "jmp\t%A0";
        xasm = "call\t%A0";

 for calls.  There are no separate indirect call patterns.  For x32,
 only indirect register calls have to be in DImode.  The direct call
 should be in Pmode (SImode).

Direct call just expects label to some absolute address that is assumed
to fit in 32 bits (see constant_call_address_operand_p).

call and jmp insn expect word_mode operands, so please change
ix86_expand_call and call patterns in the same way as jump
instructions above.

 Since x86-64 hardware always zero-extends upper 32bits of 64bit
 registers when loading its lower 32bits, it is safe and easier to just
 output 64bit registers for %A than zero-extend it by hand for all
 jump/call patterns.

No, the instruction expects word_mode operands, so we have to extend
values to expected mode. I don't think that patching at insn output
time is acceptable.

BTW: I propose to split the patch into smaller pieces, dealing with
various independent parts separately. Handling jump/call insn is
definitely one of them, the other is stringops handling, another
prologue/epilogue expansion.

Uros.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2012-03-02 Thread H.J. Lu
On Sat, Nov 12, 2011 at 9:32 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sat, Nov 12, 2011 at 3:19 AM, H.J. Lu hongjiu...@intel.com wrote:

 The current x32 implementation uses LEAs to convert 32bit address to
 64bit.  However, we can use addr32 prefix to use 32bit address directly.
 It improves performance by 5% in SPEC CPU 2K/2006.  All changes are done
 in x86 backend, except for a small unwind library assert change:

 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01555.html

 due to return column size difference.

 For x86-64, Pmode can be 32bit or 64bit, but word_mode is always 64bit.
 push/pop only work on word_mode.  Also string instructions take Pmode
 pointers.

 I will submit a set of patches to use 32bit Pmode for x32.  This is
 the first patch to properly use Pmode and word_mode.  It also adds
 addr32 prefix to string instructions if needed.  OK for trunk?

 Not for stage3.

 Uros.


Now trunk is in stage1.  The patch is at

http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01572.html

OK for trunk?

Thanks.

-- 
H.J.


Re: PATCH [1/n] addr32: Properly use Pmode and word_mode

2011-11-12 Thread Uros Bizjak
On Sat, Nov 12, 2011 at 3:19 AM, H.J. Lu hongjiu...@intel.com wrote:

 The current x32 implementation uses LEAs to convert 32bit address to
 64bit.  However, we can use addr32 prefix to use 32bit address directly.
 It improves performance by 5% in SPEC CPU 2K/2006.  All changes are done
 in x86 backend, except for a small unwind library assert change:

 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01555.html

 due to return column size difference.

 For x86-64, Pmode can be 32bit or 64bit, but word_mode is always 64bit.
 push/pop only work on word_mode.  Also string instructions take Pmode
 pointers.

 I will submit a set of patches to use 32bit Pmode for x32.  This is
 the first patch to properly use Pmode and word_mode.  It also adds
 addr32 prefix to string instructions if needed.  OK for trunk?

Not for stage3.

Uros.