from:"Uros Bizjak"

[PATCH, i386]: Remove mode of address_operand predicate from prefetch patterns

2012-09-13 Thread Uros Bizjak

Hello!

The mode of address_operand predicate is ignored in ix86_legitimate_address_p.

2012-08-13  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (prefetch): Do not assert mode of operand 0.
(*prefetch_sse_<mode>): Do not set mode of address_operand predicate.
Rename to ...
(*prefetch_sse): ... this.
(*prefetch_3dnow_<mode>): Do not set mode of address_operand predicate.
Rename to ...
(*prefetch_3dnow): ... this.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and
committed to mainline SVN.

Uros.
Index: i386.md
===
--- i386.md (revision 191240)
+++ i386.md (working copy)
@@ -17800,12 +17800,10 @@
   int locality = INTVAL (operands[2]);
 
   gcc_assert (rw == 0 || rw == 1);
-  gcc_assert (locality >= 0 && locality <= 3);
-  gcc_assert (GET_MODE (operands[0]) == Pmode
- || GET_MODE (operands[0]) == VOIDmode);
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
   if (TARGET_PRFCHW && rw)
 operands[2] = GEN_INT (3);
-
   /* Use 3dNOW prefetch in case we are asking for write prefetch not
  supported by SSE counterpart or the SSE prefetch is not available
  (K6 machines).  Otherwise use SSE prefetch as it allows specifying
@@ -17816,8 +17814,8 @@
 operands[1] = const0_rtx;
 })
 
-(define_insn "*prefetch_sse_<mode>"
-  [(prefetch (match_operand:P 0 "address_operand" "p")
+(define_insn "*prefetch_sse"
+  [(prefetch (match_operand 0 "address_operand" "p")
 (const_int 0)
 (match_operand:SI 1 "const_int_operand"))]
   "TARGET_PREFETCH_SSE"
@@ -17827,7 +17825,7 @@
   };
 
   int locality = INTVAL (operands[1]);
-  gcc_assert (locality >= 0 && locality <= 3);
+  gcc_assert (IN_RANGE (locality, 0, 3));
 
   return patterns[locality];
 }
@@ -17837,8 +17835,8 @@
(symbol_ref "memory_address_length (operands[0])"))
(set_attr "memory" "none")])
 
-(define_insn "*prefetch_3dnow_<mode>"
-  [(prefetch (match_operand:P 0 "address_operand" "p")
+(define_insn "*prefetch_3dnow"
+  [(prefetch (match_operand 0 "address_operand" "p")
 (match_operand:SI 1 "const_int_operand" "n")
 (const_int 3))]
   "TARGET_3DNOW || TARGET_PRFCHW"
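
The patch above also swaps the two-comparison locality assert for an IN_RANGE check. As a rough illustration of why a single unsigned comparison can stand in for both bounds, here is a minimal sketch. The helper names in_range_u and in_range_explicit are hypothetical and are not GCC's macro; the sketch assumes lower <= upper and 64-bit wraparound arithmetic.

```c
#include <stdint.h>

/* Hypothetical stand-in for a range-check macro: returns nonzero when
   lower <= value <= upper, assuming lower <= upper.  The subtraction is
   done in unsigned arithmetic, so a value below LOWER wraps around to a
   very large number and fails the single comparison.  */
static int
in_range_u (int64_t value, int64_t lower, int64_t upper)
{
  return (uint64_t) value - (uint64_t) lower
         <= (uint64_t) upper - (uint64_t) lower;
}

/* The explicit two-comparison form, for reference.  */
static int
in_range_explicit (int64_t value, int64_t lower, int64_t upper)
{
  return value >= lower && value <= upper;
}
```

For the locality operand, both helpers should accept 0 through 3 and reject everything else.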

Re: [PATCH] Fix up _mm_f{,n}m{add,sub}_s{s,d} (PR target/54564)

2012-09-13 Thread Uros Bizjak

On Thu, Sep 13, 2012 at 5:52 PM, Jakub Jelinek ja...@redhat.com wrote:

 The fma-*.c testcases show that these intrinsics are probably meant to
 preserve the high elements (all but the lowest) of the first argument of
 the fmaintrin.h *_s{s,d} intrinsics in the destination (the HW insns
 preserve the destination register there, but which operand that is
 varies: for 132 and 213 it is the first one (but the negation performed
 for _mm_fnm*_s[sd] breaks it anyway), for 231 it is the last one).  What
 the expander did was to put an uninitialized pseudo there, so we ended up
 with pretty random content.  Before H.J's
 http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190492 it happened to
 work by accident, but when things changed slightly and reload chose a
 different alternative, this broke.

 The following patch fixes it, by tweaking the header so that the first
 argument is not negated (we negate the second one instead), as we don't want
 to negate the high elements if e.g. for whatever reason combiner doesn't
 match it.  It fixes the expander to use a dup of the X operand as the high
 element provider for the pattern, removes the 231 alternatives (because
 those provide different destination high elements) and removes commutative
 marker (again, that would mean different high elements).

Can we introduce an additional *fmai_fmadd_<mode>_1 pattern (and
others) that would cover the missing 231 alternative?

 2012-09-13  Jakub Jelinek  ja...@redhat.com

 PR target/54564
 * config/i386/sse.md (fmai_vmfmadd_<mode>): Use (match_dup 1)
 instead of (match_dup 0) as second argument to vec_merge.
 (*fmai_fmadd_<mode>, *fmai_fmsub_<mode>): Likewise.
 Remove third alternative.
 (*fmai_fnmadd_<mode>, *fmai_fnmsub_<mode>): Likewise.  Negate
 operand 2 instead of operand 1, but put it as first argument
 of fma.

 * config/i386/fmaintrin.h (_mm_fnmadd_sd, _mm_fnmadd_ss,
 _mm_fnmsub_sd, _mm_fnmsub_ss): Negate the second argument instead
 of the first.

OK, but the header change should also be reviewed by H.J.

Thanks,
Uros.
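
To make the high-element argument in the quoted description concrete, here is a small scalar model in C. It is only a sketch under stated assumptions: vec4, model_fmadd_ss and the two fnmadd variants are hypothetical names, the multiply-add is not actually fused, and the model simply copies the high elements from the first operand as the description says the intrinsics are meant to.

```c
/* Scalar model of a 4-element float vector; element 0 is the low one.  */
typedef struct { float e[4]; } vec4;

static vec4
negate_all (vec4 v)
{
  for (int i = 0; i < 4; i++)
    v.e[i] = -v.e[i];
  return v;
}

/* Model of a scalar multiply-add: the low element is a*b + c and the
   high elements are copied from the first operand.  */
static vec4
model_fmadd_ss (vec4 a, vec4 b, vec4 c)
{
  vec4 r = a;
  r.e[0] = a.e[0] * b.e[0] + c.e[0];
  return r;
}

/* Negating the first operand: the low result is right, but the high
   elements come back negated.  */
static vec4
fnmadd_negate_first (vec4 a, vec4 b, vec4 c)
{
  return model_fmadd_ss (negate_all (a), b, c);
}

/* Negating the second operand: same low result, and the high elements
   of the first operand pass through unchanged.  */
static vec4
fnmadd_negate_second (vec4 a, vec4 b, vec4 c)
{
  return model_fmadd_ss (a, negate_all (b), c);
}
```

Both variants compute the same low element; they differ only in what ends up in elements 1 to 3, which is the difference the header change is about.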

Re: [RFC PATCH, go]: Port to ALPHA arch - sysinfo.go fixup

2011-07-06 Thread Uros Bizjak

On Wed, Jul 6, 2011 at 7:34 PM, Ian Lance Taylor i...@google.com wrote:

 This seems like a reasonable patch to me, but technically speaking it is
 incomplete.  Go should have IEEE floating point behaviour by default.  I
 believe Java is the same.  Ideally there would be a target-independent
 way for a frontend to request this mode by default.  It's a little bit
 odd because as far as I know every other backend does default to proper
 IEEE arithmetic, and only deviates when using -ffast-math or equivalent.

sh*-*-* also needs -mieee to handle NaN & Inf; spu-*-* simply doesn't
support them.

Uros.

Re: Remove unused t-* fragments

2011-07-06 Thread Uros Bizjak

On Wed, Jul 6, 2011 at 10:14 PM, Joseph S. Myers
jos...@codesourcery.com wrote:
 This patch removes three unused t-* makefile fragments.  (t-pa is
 unused because no target uses it explicitly and all PA targets define
 nonempty tmake_file; t-$cpu_type is only used implicitly if
 tmake_file is empty after config.gcc.)

 Bootstrapped with no regressions on x86_64-unknown-linux-gnu.  OK to
 commit?

 2011-07-06  Joseph Myers  jos...@codesourcery.com

        * config/i386/t-crtpic, config/i386/t-svr3dbx, config/pa/t-pa:
        Remove.

OK for x86.

Thanks,
Uros.

[go]: Port to ALPHA arch - epoll problems

2011-07-07 Thread Uros Bizjak

On Tue, Jul 5, 2011 at 10:12 PM, Ian Lance Taylor i...@google.com wrote:

 What remains is a couple of unrelated failures in the testsuite:

 Epoll unexpected fd=0
 pollServer: unexpected wakeup for fd=0 mode=w
 panic: test timed out
 ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 388:  7123 Aborted
                 ./a.out -test.short -test.timeout=$timeout $@
 FAIL: http
 gmake[2]: *** [http/check] Error 1

 2011/07/05 18:43:28 Test RPC server listening on 127.0.0.1:50334
 2011/07/05 18:43:28 Test HTTP RPC server listening on 127.0.0.1:49010
 2011/07/05 18:43:28 rpc.Serve: accept:accept tcp 127.0.0.1:50334:
 Resource temporarily unavailable
 FAIL: rpc
 gmake[2]: *** [rpc/check] Error 1

 2011/07/05 18:44:22 Test WebSocket server listening on 127.0.0.1:40893
 Epoll unexpected fd=0
 pollServer: unexpected wakeup for fd=0 mode=w
 panic: test timed out
 ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 388: 12993 Aborted
                 ./a.out -test.short -test.timeout=$timeout $@
 FAIL: websocket
 gmake[2]: *** [websocket/check] Error 1

 ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 388: 13945
 Segmentation fault      ./a.out -test.short -test.timeout=$timeout
 $@
 FAIL: compress/flate
 gmake[2]: *** [compress/flate/check] Error 1

 Any ideas how to attack these?

 None of these look familiar to me.

 An Epoll unexpected fd error means that epoll returned information
 about a file descriptor which the program didn't ask about.  Not sure
 why that would happen.  Particularly for fd 0, since epoll is only used
 for network connections, which fd 0 presumably is not.

 The way to look into these is to cd to TARGET/libgo and run make
 GOTESTFLAGS=--keep http/check (or whatever/check).  That will leave a
 directory gotest in your libgo directory.  The executable a.out in
 that directory is the test case.  You can debug the test case using gdb
 in more or less the usual way.  It's a bit painful to set breakpoints by
 function name, but setting breakpoints by file:line works fine.
 Printing variables works as well as it ever does, but the variables are
 printed in C form rather than Go form.

It turned out that the EpollEvent definition in
libgo/syscalls/epoll/socket_epoll.go is non-portable (if not outright
dangerous...). The definition does have a FIXME comment, but does not
take into account the effects of __attribute__((__packed__)) from
system headers. Unlike the alpha header, the x86 sys/epoll.h header
adds __attribute__((__packed__)) to the struct epoll_event definition.

To illustrate the problem, please run the following test:

--cut here--
#include <stdint.h>
#include <stdio.h>

typedef union epoll_data
{
  void *ptr;
  int fd;
  uint32_t u32;
  uint64_t u64;
} epoll_data_t;

struct epoll_event
{
  uint32_t events;
  epoll_data_t data;
};

struct packed_epoll_event
{
  uint32_t events;
  epoll_data_t data;
} __attribute__ ((__packed__));

struct fake_epoll_event
{
  uint32_t events;
  int32_t fd;
  int32_t pad;
};

int
main ()
{
  struct epoll_event *ep;
  struct packed_epoll_event *pep;

  struct fake_epoll_event fep;

  fep.events = 0xfe;
  fep.fd = 9;
  fep.pad = 0;

  ep = (struct epoll_event *) &fep;
  pep = (struct packed_epoll_event *) &fep;

  printf ("%#x %i\n", ep->events, ep->data.fd);
  printf ("%#x %i\n", pep->events, pep->data.fd);
  return 0;
}
--cut here--

./a.out
0xfe 0
0xfe 9

So, the first line simulates the alpha, the second simulates x86_64.
32bit targets are OK in both cases:

./a.out
0xfe 9
0xfe 9

By changing the definition of EpollEvent to the form that suits alpha:

type EpollEvent struct {
  Events uint32;
  Pad int32;
  Fd int32;
};

both timeouts were fixed and the correct fd was passed to and from the syscall.
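
The layout difference described above can also be checked directly with offsetof instead of by casting pointers. The following is a minimal sketch; the type and function names are hypothetical (chosen so they do not clash with sys/epoll.h), and it assumes a compiler that supports the GCC-style packed attribute.

```c
#include <stddef.h>
#include <stdint.h>

typedef union layout_epoll_data
{
  void *ptr;
  int fd;
  uint32_t u32;
  uint64_t u64;
} layout_epoll_data_t;

/* Natural layout: the union is aligned according to the target ABI,
   so padding may be inserted after the 4-byte events field.  */
struct natural_event
{
  uint32_t events;
  layout_epoll_data_t data;
};

/* Packed layout: no padding, data immediately follows events.  */
struct packed_event
{
  uint32_t events;
  layout_epoll_data_t data;
} __attribute__ ((__packed__));

static size_t
natural_data_offset (void)
{
  return offsetof (struct natural_event, data);
}

static size_t
packed_data_offset (void)
{
  return offsetof (struct packed_event, data);
}
```

The packed offset is always 4. The natural offset depends on the target: on a 64-bit target that aligns the union to 8 bytes it would be 8, which matches the Pad-before-Fd layout shown above.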

Uros.

[go]: Many valgrind errors (use of uninit value, jump depends on uninit value) in the testsuite

2011-07-07 Thread Uros Bizjak

On Tue, Jul 5, 2011 at 10:12 PM, Ian Lance Taylor i...@google.com wrote:

 What remains is a couple of unrelated failures in the testsuite:

 ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 388: 13945
 Segmentation fault      ./a.out -test.short -test.timeout=$timeout
 $@
 FAIL: compress/flate
 gmake[2]: *** [compress/flate/check] Error 1

 Any ideas how to attack these?

 None of these look familiar to me.

The compress/flate test sometimes passes and sometimes doesn't. I have run
the resulting executable through valgrind, and there are many
(i.e. hundreds of) warnings about uses and calls that depend on
uninitialized variables, also on x86_64.

ATM, I would like to just report problems with valgrind, and due to
the number of them, it looks to me that something is wrong with the
library.

Uros.

Re: Improve Solaris mudflap support (PR libmudflap/49550)

2011-07-07 Thread Uros Bizjak

Hello!

 diff --git a/libmudflap/testsuite/libmudflap.c/pass47-frag.c 
 b/libmudflap/testsuite/libmudflap.c/pass47-frag.c
  --- a/libmudflap/testsuite/libmudflap.c/pass47-frag.c
  +++ b/libmudflap/testsuite/libmudflap.c/pass47-frag.c
 @@ -8,3 +8,5 @@ int main ()
   tolower (buf[4]) == 'o' && tolower ('X') == 'x' &&
   isdigit (buf[3])) == 0 && isalnum ('4'));
  }
 +
 +/* { dg-warning "cannot track unknown size extern .__ctype." "Solaris 
 __ctype declared without size" { target *-*-solaris2.* } 0 } */

This is handled differently throughout the mudflap testsuite:

/* Ignore a warning that is irrelevant to the purpose of this test.  */
/* { dg-prune-output ".*mudflap cannot track unknown size extern.*" } */

Uros.

Re: [PATCH] Fix UNRESOLVED gcc.dg/graphite/pr37485.c

2011-07-07 Thread Uros Bizjak

Hello!

 Committed.

 Richard.

 2011-07-07  Richard Guenther  rguent...@suse.de

   * gcc.dg/graphite/pr37485.c: Add -floop-block.

Heh, you were faster by a minute!

Uros.

Re: PATCH [1/n] X32: Add initial -x32 support

2011-07-07 Thread Uros Bizjak

On Thu, Jul 7, 2011 at 2:59 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Hi Paolo, DJ, Nathanael, Alexandre, Ralf,

 Is the change
 .
        * configure.ac: Support --enable-x32.
        * configure: Regenerated.

 diff --git a/gcc/configure.ac b/gcc/configure.ac
 index 5f3641b..bddabeb 100644
 --- a/gcc/configure.ac
 +++ b/gcc/configure.ac
 @@ -611,6 +611,11 @@ AC_ARG_ENABLE(multilib,
  [], [enable_multilib=yes])
  AC_SUBST(enable_multilib)

 +# With x32 support
 +AC_ARG_ENABLE(x32,
 +[  --enable-x32            enable x32 library support for multiple ABIs],

 Looks like a very very generic switch for a global configury ... we already
 have --with-multilib-list (SH only), why not extend that to also work
 for x86_64?

 Richard.

 +[], [enable_x32=no])
 +
  # Enable __cxa_atexit for C++.
  AC_ARG_ENABLE(__cxa_atexit,
  [AS_HELP_STRING([--enable-__cxa_atexit], [enable __cxa_atexit for C++])],

 OK?

 Thanks.


 Here is the updated patch to use --with-multilib-list=x32.

 Paolo, DJ, Nathanael, Alexandre, Ralf, Is the configure.ac change

 ---
        * configure.ac: Mention x86-64 for --with-multilib-list.
        * configure: Regenerated.

        * doc/install.texi: Document --with-multilib-list=x32.

 diff --git a/gcc/configure.ac b/gcc/configure.ac
 index 5f3641b..a73f758 100644
 --- a/gcc/configure.ac
 +++ b/gcc/configure.ac
 @@ -795,7 +795,7 @@ esac],
  [enable_languages=c])

  AC_ARG_WITH(multilib-list,
 -[AS_HELP_STRING([--with-multilib-list], [select multilibs (SH only)])],
 +[AS_HELP_STRING([--with-multilib-list], [select multilibs (SH and
 x86-64 only)])],
  :,
  with_multilib_list=default)

 diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
 index 49aac95..a5d266c 100644
 --- a/gcc/doc/install.texi
 +++ b/gcc/doc/install.texi
 @@ -1049,8 +1049,10 @@ sysv, aix.
  @item --with-multilib-list=@var{list}
  @itemx --without-multilib-list
  Specify what multilibs to build.
 -Currently only implemented for sh*-*-*.
 +Currently only implemented for sh*-*-* and x86-64-*-linux*.

 +@table @code
 +@item sh*-*-*
  @var{list} is a comma separated list of CPU names.  These must be of the
  form @code{sh*} or @code{m*} (in which case they match the compiler option
  for that processor).  The list should not contain any endian options -
 @@ -1082,6 +1084,12 @@ only little endian SH4AL:
  --with-multilib-list=sh4al,!mb/m4al
  @end smallexample

 +@item x86-64-*-linux*
 +If @var{list} is @code{x32}, x32 run-time library will be enabled.  By
 +default, x32 run-time library is disabled.
 +
 +@end table
 +
  @item --with-endian=@var{endians}
  Specify what endians to use.
  Currently only implemented for sh*-*-*.
 ---

 OK?

 Thanks.

 --
 H.J.
 ---
 2011-07-06  H.J. Lu  hongjiu...@intel.com

        * config.gcc: Support --with-multilib-list=x32 for x86 Linux
        targets.

        * configure.ac: Mention x86-64 for --with-multilib-list.
        * configure: Regenerated.

        * config/i386/gnu-user64.h (SPEC_64): Support x32.
        (SPEC_32): Likewise.
        (ASM_SPEC): Likewise.
        (LINK_SPEC): Likewise.
        (TARGET_THREAD_SSP_OFFSET): Likewise.
        (TARGET_THREAD_SPLIT_STACK_OFFSET): Likewise.
        (SPEC_X32): New.

        * config/i386/i386.h (TARGET_X32): New.
        (TARGET_LP64): New.
        (LONG_TYPE_SIZE): Likewise.
        (POINTER_SIZE): Likewise.
        (POINTERS_EXTEND_UNSIGNED): Likewise.
        (OPT_ARCH64): Support x32.
        (OPT_ARCH32): Likewise.

        * config/i386/i386.opt (mx32): New.

        * config/i386/kfreebsd-gnu64.h (GNU_USER_LINK_EMULATIONX32): New.
        (GLIBC_DYNAMIC_LINKERX32): Likewise.
        * config/i386/linux64.h (GNU_USER_LINK_EMULATIONX32): Likewise.
        (GLIBC_DYNAMIC_LINKERX32): Likewise.

        * config/i386/t-linux-x32: New.

        * config/linux.h (UCLIBC_DYNAMIC_LINKERX32): New.
        (BIONIC_DYNAMIC_LINKERX32): Likewise.
        (GNU_USER_DYNAMIC_LINKERX32): Likewise.

        * doc/install.texi: Document --with-multilib-list=x32.

        * doc/invoke.texi: Document -mx32.


 Hi Uros,

 This new version only adds a comment to configure.ac.  OK to install?

OK.

Thanks,
Uros.

Re: PATCH: Support -mx32 in GCC tests

2011-07-08 Thread Uros Bizjak

On Fri, Jul 8, 2011 at 1:03 AM, H.J. Lu hjl.to...@gmail.com wrote:

 Here is the updated patch.  I will wait for Uros's comments.


 I attached the wrong file.  Here is the updated patch.

--- a/gcc/testsuite/g++.dg/abi/bitfield3.C
+++ b/gcc/testsuite/g++.dg/abi/bitfield3.C
@@ -4,7 +4,7 @@
 // Cygwin and mingw32 default to MASK_ALIGN_DOUBLE. Override to ensure
 // 4-byte alignment.
 // { dg-options -mno-align-double { target i?86-*-cygwin* i?86-*-mingw* } }
-// { dg-require-effective-target ilp32 }
+// { dg-require-effective-target ia32 }

Please rather change dg-do run command to:

+// { dg-do ... { target { { i?86-*-* x86_64-*-* } && ia32 } } }

and remove dg-require-effective-target entirely. This will ease
grepping for certain target considerably.

+++ b/gcc/testsuite/g++.dg/ext/attrib8.C
+++ b/gcc/testsuite/g++.dg/ext/tmplattr1.C
+++ b/gcc/testsuite/g++.dg/inherit/override-attribs.C
+++ b/gcc/testsuite/g++.dg/opt/life1.C
+++ b/gcc/testsuite/g++.dg/opt/nrv12.C
+++ b/gcc/testsuite/g++.old-deja/g++.ext/attrib1.C
+++ b/gcc/testsuite/g++.old-deja/g++.ext/attrib2.C
+++ b/gcc/testsuite/g++.old-deja/g++.ext/attrib3.C
+++ b/gcc/testsuite/g++.old-deja/g++.pt/asm2.C
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-28.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/prefetch-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/prefetch-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/prefetch-5.c
... and many more.

Same here.

--- a/gcc/testsuite/gcc.dg/20020103-1.c
+++ b/gcc/testsuite/gcc.dg/20020103-1.c
@@ -1,6 +1,6 @@
 /* Verify that constant equivalences get reloaded properly, either by being
spilled to the stack, or regenerated, but not dropped to memory.  */
-/* { dg-do compile { target { { i?86-*-* rs6000-*-* alpha*-*-*
x86_64-*-* } || { powerpc*-*-* && ilp32 } } } } */
+/* { dg-do compile { target { { i?86-*-* rs6000-*-* alpha*-*-*
x86_64-*-* } || { powerpc*-*-* && ia32 } } } } */

Wrong change.

--- a/gcc/testsuite/gcc.dg/pr25023.c
+++ b/gcc/testsuite/gcc.dg/pr25023.c
@@ -1,7 +1,7 @@
 /* PR debug/25023 */
 /* { dg-do compile } */
 /* { dg-options -O2 } */
-/* { dg-options -O2 -mtune=i686 { target { { i?86-*-* || x86_64-*-*
} && ilp32 } } } */
+/* { dg-options -O2 -mtune=i686 { target { { i?86-*-* || x86_64-*-*
} && ia32 } } } */

Please also remove || in the target string.

--- a/gcc/testsuite/gcc.dg/lower-subreg-1.c
+++ b/gcc/testsuite/gcc.dg/lower-subreg-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { { { ! mips64 } && { ! ia64-*-* } } && {
! spu-*-* } } } } */
+/* { dg-do compile { target { { { { ! mips64 } && { ! ia64-*-* } } &&
{ ! spu-*-* } } && { ! { { i?86-*-* x86_64-*-* } && x32 } } } } } */
 /* { dg-options -O -fdump-rtl-subreg1 } */
 /* { dg-require-effective-target ilp32 } */

This change is still present in updated patch, please change according
to Mike's comments. I'd prefer skip-if there, BTW.


BTW: What about using ... && { ! ia32 } instead of ... && { x32 || lp64 } in

+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && { x32 ||
lp64 } } } } */

This will IMO future-proof the testcases.

Otherwise, the patch looks OK to me.

Uros.

[PATCH, fortran]: Fix PR 48926, gfortran.dg/coarray/image_index_1.f90 -fcoarray=single -O2 (test for excess errors)

2011-07-09 Thread Uros Bizjak

Hello!

gfc_get_corank returns an integer value, not bool.  This problem was
triggered by a build configured with --enable-build-with-cxx.

2011-07-09  Uros Bizjak  ubiz...@gmail.com

PR fortran/48926
* expr.c (gfc_get_corank): Change return value to int.
* gfortran.h (gfc_get_corank): Update function prototype.

Patch was regression tested on x86_64-pc-linux-gnu {,-m32} with
--enable-build-with-cxx.

Approved by Tobias Burnus in the PR. Patch was committed to mainline,
will be committed to 4.6 branch soon.

Uros.
Index: expr.c
===
--- expr.c  (revision 176083)
+++ expr.c  (working copy)
@@ -4143,7 +4143,7 @@
 }
 
 
-bool
+int
 gfc_get_corank (gfc_expr *e)
 {
   int corank;
Index: gfortran.h
===
--- gfortran.h  (revision 176083)
+++ gfortran.h  (working copy)
@@ -2734,7 +2734,7 @@
 bool gfc_is_proc_ptr_comp (gfc_expr *, gfc_component **);
 
 bool gfc_is_coindexed (gfc_expr *);
-bool gfc_get_corank (gfc_expr *);
+int gfc_get_corank (gfc_expr *);
 bool gfc_has_ultimate_allocatable (gfc_expr *);
 bool gfc_has_ultimate_pointer (gfc_expr *);
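
As a hedged illustration of why the return type matters once bool is a real boolean type (as it is in a C++ build), the sketch below returns the same count through bool and through int. The function names are hypothetical and only stand in for a corank-style accessor; C99 _Bool from stdbool.h is used here to get the same conversion behaviour.

```c
#include <stdbool.h>

/* Returning a count through a real boolean type collapses every
   nonzero value to 1.  */
static bool
corank_as_bool (int corank)
{
  return corank;
}

/* Returning the same count as int preserves it.  */
static int
corank_as_int (int corank)
{
  return corank;
}
```

A caller that compares the result against an expected corank would see 1 instead of 3 with the bool variant.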

Re: [rfc, i386] Convert output_mi_thunk to rtl

2011-07-10 Thread Uros Bizjak

On Sun, Jul 10, 2011 at 3:34 AM, Richard Henderson r...@redhat.com wrote:
 I developed this patch while working on the dwarf2 pass series.
 This was before I bypassed the entire problem by removing the
 !deep branch prediction paths.

 Ideally, we'd do this generically from gimple.  Less ideally,
 but still better, is to always emit rtl, and support that in
 the middle end without so many hacks in the back end.

Looks good to me!

+  reload_completed = 1;
+  epilogue_completed = 1;

Do we really need these? Perhaps a comment should be added here; it is
not obvious at first sight...

+ tmp_regno = CX_REG;
  if ((ccvt & (IX86_CALLCVT_FASTCALL | IX86_CALLCVT_THISCALL)) != 0)
tmp_regno = AX_REG;

if (...)
  tmp_regno = AX_REG;
else
  tmp_regno = CX_REG;

Uros.

Re: PATCH [2/n] X32: Turn on 64bit and check models for x32

2011-07-10 Thread Uros Bizjak

On Sat, Jul 9, 2011 at 11:22 PM, H.J. Lu hongjiu...@intel.com wrote:

 This patch turns on 64bit and check models for x32.  OK for trunk?

 Thanks.

 H.J.
 ---
 2011-07-09  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_option_override_internal): Turn on
        OPTION_MASK_ISA_64BIT for TARGET_X32.  Only allow small and
        small PIC models for TARGET_X32.

OK.

Thanks,
Uros.

Re: PATCH [3/n] X32: Promote pointers to Pmode

2011-07-10 Thread Uros Bizjak

On Sat, Jul 9, 2011 at 11:28 PM, H.J. Lu hongjiu...@intel.com wrote:

 X32 psABI requires promoting pointers to Pmode when passing/returning
 in registers.  OK for trunk?

 Thanks.

 H.J.
 --
 2011-07-09  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_promote_function_mode): New.
        (TARGET_PROMOTE_FUNCTION_MODE): Likewise.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 04cb07d..c852719 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -7052,6 +7061,23 @@ ix86_function_value (const_tree valtype, const_tree 
 fntype_or_decl,
   return ix86_function_value_1 (valtype, fntype_or_decl, orig_mode, mode);
  }

 +/* Pointer function arguments and return values are promoted to
 +   Pmode.  */
 +
 +static enum machine_mode
 +ix86_promote_function_mode (const_tree type, enum machine_mode mode,
 +                           int *punsignedp, const_tree fntype,
 +                           int for_return)
 +{
 +  if (for_return != 1 && type != NULL_TREE && POINTER_TYPE_P (type))
 +    {
 +      *punsignedp = POINTERS_EXTEND_UNSIGNED;
 +      return Pmode;
 +    }
 +  return default_promote_function_mode (type, mode, punsignedp, fntype,
 +                                       for_return);
 +}

Please rewrite the condition to:

if (for_return == 1)
  /* Do not promote function return values.  */
  ;
else if (type != NULL_TREE && ...)

Also, please add some comments.

Your comment also says that pointer return arguments are promoted to
Pmode. The documentation says that:

 FOR_RETURN allows to distinguish the promotion of arguments and
 return values.  If it is `1', a return value is being promoted and
 `TARGET_FUNCTION_VALUE' must perform the same promotions done here.
 If it is `2', the returned mode should be that of the register in
 which an incoming parameter is copied, or the outgoing result is
 computed; then the hook should return the same mode as
 `promote_mode', though the signedness may be different.

You bypass promotions when FOR_RETURN is 1.

Uros.
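
For readers unfamiliar with what promoting a pointer to a wider mode involves, here is a minimal sketch of the two ways a 32-bit pointer value can be widened to 64 bits. The helper names are hypothetical; the sketch assumes 32-bit pointers held in 64-bit registers, and that an unsigned pointer extension means zero-extension.

```c
#include <stdint.h>

/* Zero-extension: the upper 32 bits are always cleared.  */
static uint64_t
zero_extend_ptr32 (uint32_t p)
{
  return (uint64_t) p;
}

/* Sign-extension: the upper 32 bits copy bit 31.  The conversion to
   int32_t is implementation-defined for values above INT32_MAX; GCC
   wraps modulo 2^32, which is what this sketch relies on.  */
static uint64_t
sign_extend_ptr32 (uint32_t p)
{
  return (uint64_t) (int64_t) (int32_t) p;
}
```

The two agree for addresses below 0x80000000 and differ above it, which is why caller and callee have to agree on which extension is applied to pointer arguments and return values.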

[PATCH, i386]: ix86_trampoline_init: use offset everywhere

2011-07-11 Thread Uros Bizjak

Hello!

A small cleanup, no functional change.  This allows us to assert that
generated code length is less than TRAMPOLINE_SIZE also for 32bit
targets.

2011-07-11  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.c (ix86_trampoline_init): Switch arms of if expr.
Use offset everywhere.  Always assert that offset <= TRAMPOLINE_SIZE.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline.

Uros.
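
The running-offset idea described above can be sketched outside of GCC as a plain byte-buffer builder. This is only an illustration under stated assumptions: build_tramp32, put_le32 and TRAMP32_SIZE are hypothetical names, the size of 10 bytes is assumed for the 32-bit case, and the one-byte adjustment for a pushed static chain is left out.

```c
#include <stddef.h>
#include <stdint.h>

enum { TRAMP32_SIZE = 10 };  /* assumed size of the 32-bit trampoline */

/* Store a 32-bit value in little-endian byte order.  */
static void
put_le32 (uint8_t *dst, uint32_t value)
{
  for (int i = 0; i < 4; i++)
    dst[i] = (uint8_t) (value >> (8 * i));
}

/* Fill BUF (at least TRAMP32_SIZE bytes) and return the number of
   bytes written.  OPCODE is 0xb8 (mov imm32 to eax), 0xb9 (mov imm32
   to ecx) or 0x68 (push imm32); all three take a 4-byte immediate.  */
static size_t
build_tramp32 (uint8_t *buf, uint8_t opcode, uint32_t chain_value,
               uint32_t tramp_addr, uint32_t fn_addr)
{
  size_t offset = 0;

  /* Static chain: one opcode byte plus a 4-byte immediate.  */
  buf[offset] = opcode;
  put_le32 (buf + offset + 1, chain_value);
  offset += 5;

  /* jmp rel32: the displacement is relative to the end of the jmp.  */
  buf[offset] = 0xe9;
  put_le32 (buf + offset + 1,
            fn_addr - (tramp_addr + (uint32_t) offset + 5));
  offset += 5;

  return offset;
}
```

Because every store goes through the same offset variable, a single check that the returned value does not exceed TRAMP32_SIZE covers the whole sequence.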
Index: i386.c
===
--- i386.c  (revision 176159)
+++ i386.c  (working copy)
@@ -22683,54 +22683,14 @@ static void
 ix86_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
 {
   rtx mem, fnaddr;
+  int opcode;
+  int offset = 0;
 
   fnaddr = XEXP (DECL_RTL (fndecl), 0);
 
-  if (!TARGET_64BIT)
-{
-  rtx disp, chain;
-  int opcode;
-
-  /* Depending on the static chain location, either load a register
-with a constant, or push the constant to the stack.  All of the
-instructions are the same size.  */
-  chain = ix86_static_chain (fndecl, true);
-  if (REG_P (chain))
-   {
- if (REGNO (chain) == CX_REG)
-   opcode = 0xb9;
- else if (REGNO (chain) == AX_REG)
-   opcode = 0xb8;
- else
-   gcc_unreachable ();
-   }
-  else
-   opcode = 0x68;
-
-  mem = adjust_address (m_tramp, QImode, 0);
-  emit_move_insn (mem, gen_int_mode (opcode, QImode));
-
-  mem = adjust_address (m_tramp, SImode, 1);
-  emit_move_insn (mem, chain_value);
-
-  /* Compute offset from the end of the jmp to the target function.
-In the case in which the trampoline stores the static chain on
-the stack, we need to skip the first insn which pushes the
-(call-saved) register static chain; this push is 1 byte.  */
-  disp = expand_binop (SImode, sub_optab, fnaddr,
-  plus_constant (XEXP (m_tramp, 0),
- MEM_P (chain) ? 9 : 10),
-  NULL_RTX, 1, OPTAB_DIRECT);
-
-  mem = adjust_address (m_tramp, QImode, 5);
-  emit_move_insn (mem, gen_int_mode (0xe9, QImode));
-
-  mem = adjust_address (m_tramp, SImode, 6);
-  emit_move_insn (mem, disp);
-}
-  else
+  if (TARGET_64BIT)
 {
-  int offset = 0, size;
+  int size;
 
   /* Load the function address to r11.  Try to load address using
 the shorter movl instead of movabs.  We may want to support
@@ -22757,20 +22717,22 @@ ix86_trampoline_init (rtx m_tramp, tree 
  offset += 10;
}
 
-  /* Load static chain using movabs to r10.  */
-  mem = adjust_address (m_tramp, HImode, offset);
-  /* Use the shorter movl instead of movabs for x32.  */
+  /* Load static chain using movabs to r10.  Use the
+shorter movl instead of movabs for x32.  */
   if (TARGET_X32)
{
+ opcode = 0xba41;
  size = 6;
- emit_move_insn (mem, gen_int_mode (0xba41, HImode));
}
   else
{
+ opcode = 0xba49;
  size = 10;
- emit_move_insn (mem, gen_int_mode (0xba49, HImode));
}
 
+  mem = adjust_address (m_tramp, HImode, offset);
+  emit_move_insn (mem, gen_int_mode (opcode, HImode));
+
   mem = adjust_address (m_tramp, ptr_mode, offset + 2);
   emit_move_insn (mem, chain_value);
   offset += size;
@@ -22780,10 +22742,56 @@ ix86_trampoline_init (rtx m_tramp, tree 
   mem = adjust_address (m_tramp, SImode, offset);
   emit_move_insn (mem, gen_int_mode (0x90e3ff49, SImode));
   offset += 4;
+}
+  else
+{
+  rtx disp, chain;
 
-  gcc_assert (offset <= TRAMPOLINE_SIZE);
+  /* Depending on the static chain location, either load a register
+with a constant, or push the constant to the stack.  All of the
+instructions are the same size.  */
+  chain = ix86_static_chain (fndecl, true);
+  if (REG_P (chain))
+   {
+ switch (REGNO (chain))
+   {
+   case AX_REG:
+ opcode = 0xb8; break;
+   case CX_REG:
+ opcode = 0xb9; break; 
+   default:
+ gcc_unreachable ();
+   }
+   }
+  else
+   opcode = 0x68;
+
+  mem = adjust_address (m_tramp, QImode, offset);
+  emit_move_insn (mem, gen_int_mode (opcode, QImode));
+
+  mem = adjust_address (m_tramp, SImode, offset + 1);
+  emit_move_insn (mem, chain_value);
+  offset += 5;
+
+  mem = adjust_address (m_tramp, QImode, offset);
+  emit_move_insn (mem, gen_int_mode (0xe9, QImode));
+
+  mem = adjust_address (m_tramp, SImode, offset + 1);
+
+  /* Compute offset from the end of the jmp to the target function.
+In the case in which the trampoline stores the static chain on
+the stack, we need to skip the first insn which pushes the
+(call-saved) register static chain; this push is 1 byte

Re: AMD bdver2 enablement.

2011-07-12 Thread Uros Bizjak

Hello!

 2011-07-11  Harsha Jagasia  harsha.jaga...@amd.com

   AMD bdver2 Enablement
   * config.gcc (i[34567]86-*-linux* | ...): Add bdver2.
   (case ${target}): Add bdver2.
   * config/i386/driver-i386.c (host_detect_local_cpu): Let
   -march=native recognize bdver2 processors.
   * config/i386/i386-c.c (ix86_target_macros_internal): Add
   bdver2 def_and_undef
   * config/i386/i386.c (struct processor_costs bdver2_cost): New
   bdver2 cost table.
   (m_BDVER2): New definition.
   (m_AMD_MULTIPLE): Includes m_BDVER2.
   (initial_ix86_tune_features): Add bdver2 tuning.
   (processor_target_table): Add bdver2 entry.
   (static const char *const cpu_names): Add bdver2 entry.
   (ix86_option_override_internal): Add bdver2 instruction sets.
   (ix86_issue_rate): Add bdver2.
   (ix86_adjust_cost): Add bdver2.
   (has_dispatch): Add bdver2.
   * config/i386/i386.h (TARGET_BDVER2): New definition.
   (enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver2.
   (enum processor_type): Add PROCESSOR_BDVER2.
   * config/i386/i386.md (define_attr cpu): Add bdver2.
   * config/i386/i386.opt ( mdispatch-scheduler): Add bdver2 to
   description.

OK, with a small change - see below.

@@ -1813,8 +1900,10 @@ const struct processor_costs *ix86_cost
 #define m_ATHLON_K8  (m_K8 | m_ATHLON)
 #define m_AMDFAM10  (1<<PROCESSOR_AMDFAM10)
 #define m_BDVER1  (1<<PROCESSOR_BDVER1)
+#define m_BDVER2  (1<<PROCESSOR_BDVER2)
 #define m_BTVER1  (1<<PROCESSOR_BTVER1)
-#define m_AMD_MULTIPLE  (m_K8 | m_ATHLON | m_AMDFAM10 | m_BDVER1 | m_BTVER1)
+#define m_BDVER (m_BDVER1 | m_BDVER2)
+#define m_AMD_MULTIPLE  (m_ATHLON_K8 | m_AMDFAM10 | m_BDVER | m_BTVER1)

 #define m_GENERIC32 (1<<PROCESSOR_GENERIC32)
 #define m_GENERIC64 (1<<PROCESSOR_GENERIC64)
@@ -1856,8 +1945,8 @@ static unsigned int initial_ix86_tune_fe
   ~m_386,

   /* X86_TUNE_USE_SAHF */
-  m_ATOM | m_PPRO | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER1 | m_BTVER1
-  | m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC,
+  m_ATOM | m_PPRO | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER1 | m_BDVER2
+  | m_BTVER1 | m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC,

Please use newly introduced m_BDVER in tune flags instead of m_BDVER1
| m_BDVER2.

Thanks,
Uros.
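
The request to use a combined mask in the tune flags can be illustrated with a small stand-alone sketch of per-processor bitmasks. The enum values and macro names below are hypothetical simplifications of the ones in the patch, and tuned_for is an invented helper.

```c
/* Hypothetical processor enumeration; each value is a bit position.  */
enum processor
{
  PROC_K8,
  PROC_ATHLON,
  PROC_AMDFAM10,
  PROC_BDVER1,
  PROC_BDVER2,
  PROC_BTVER1
};

#define M_K8        (1u << PROC_K8)
#define M_ATHLON    (1u << PROC_ATHLON)
#define M_AMDFAM10  (1u << PROC_AMDFAM10)
#define M_BDVER1    (1u << PROC_BDVER1)
#define M_BDVER2    (1u << PROC_BDVER2)
#define M_BTVER1    (1u << PROC_BTVER1)

/* Combined masks: a new member of a family only has to be added here.  */
#define M_ATHLON_K8     (M_K8 | M_ATHLON)
#define M_BDVER         (M_BDVER1 | M_BDVER2)
#define M_AMD_MULTIPLE  (M_ATHLON_K8 | M_AMDFAM10 | M_BDVER | M_BTVER1)

/* Nonzero when processor P is selected by MASK.  */
static int
tuned_for (unsigned int mask, enum processor p)
{
  return (mask >> p) & 1u;
}
```

With the family mask in place, a tuning entry written in terms of M_BDVER picks up both family members without listing them individually.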

Re: Use of vector instructions in memmov/memset expanding

2011-07-13 Thread Uros Bizjak

Hello!

 Please don't use -m32/-m64 in testcases directly.
 You should use

 /* { dg-do compile { target { ! ia32 } } } */

 for 64bit insns and

 /* { dg-do compile { target { ia32 } } } */

 for 32bit insns.

Also, there is no need to add -mtune if -march is already specified;
-mtune will follow -march.
To scan for the %xmm register, you don't have to add -dp to the compile
flags. -dp also dumps the pattern name to the file, so unless you are
looking for a specific pattern name, you should omit -dp.

Uros.

Re: PATCH [3/n] X32: Promote pointers to Pmode

2011-07-13 Thread Uros Bizjak

On Wed, Jul 13, 2011 at 3:17 PM, H.J. Lu hjl.to...@gmail.com wrote:
 PING.

 2011-07-10  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_promote_function_mode): New.
        (TARGET_PROMOTE_FUNCTION_MODE): Likewise.

You have discussed this with rth, the final approval should be from him.

Uros.

[PATCH, testsuite]: Use istarget everywhere

2011-07-13 Thread Uros Bizjak

Hello!

The attached patch converts several places that use string match or regexp
on $target_triplet to use istarget instead.  The patch also removes quotes
around the target string.

2011-07-13  Uros Bizjak  ubiz...@gmail.com

* lib/g++.exp (g++_init):  Use istarget.  Remove target_triplet global.
* lib/obj-c++.exp (obj-c++_init): Ditto.
* lib/file-format.exp (gcc_target_object_format): Ditto.
* lib/target-supports-dg.exp (dg-require-dll): Ditto.
* lib/target-supports-dg-exp (check_weak_available): Ditto.
(check_visibility_available): Ditto.
(check_effective_target_tls_native): Ditto.
(check_effective_target_tls_emulated): Ditto.
(check_effective_target_function_sections): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.
Index: lib/g++.exp
===
--- lib/g++.exp (revision 176236)
+++ lib/g++.exp (working copy)
@@ -188,7 +188,6 @@
 global TOOL_EXECUTABLE TOOL_OPTIONS
 global GXX_UNDER_TEST
 global TESTING_IN_BUILD_TREE
-global target_triplet
 global gcc_warning_prefix
 global gcc_error_prefix
 
@@ -263,7 +262,7 @@
 set gcc_warning_prefix warning:
 set gcc_error_prefix error:
 
-if { [string match *-*-darwin* $target_triplet] } {
+if { [istarget *-*-darwin*] } {
lappend ALWAYS_CXXFLAGS ldflags=-multiply_defined suppress
}
 
Index: lib/obj-c++.exp
===
--- lib/obj-c++.exp (revision 176236)
+++ lib/obj-c++.exp (working copy)
@@ -210,7 +210,6 @@
 global TOOL_EXECUTABLE TOOL_OPTIONS
 global OBJCXX_UNDER_TEST
 global TESTING_IN_BUILD_TREE
-global target_triplet
 global gcc_warning_prefix
 global gcc_error_prefix
 
@@ -270,7 +269,7 @@
 set gcc_warning_prefix warning:
 set gcc_error_prefix error:
 
-if { [string match *-*-darwin* $target_triplet] } {
+if { [istarget *-*-darwin*] } {
lappend ALWAYS_OBJCXXFLAGS ldflags=-multiply_defined suppress
 }
 
@@ -299,7 +298,7 @@
 # we need to add the include path for the gnu runtime if that is in
 # use.
 # First, set the default...
-if { [istarget "*-*-darwin*"] } {
+if { [istarget *-*-darwin*] } {
set nextruntime 1
 } else {
set nextruntime 0
Index: lib/scanasm.exp
===
--- lib/scanasm.exp (revision 176236)
+++ lib/scanasm.exp (working copy)
@@ -461,10 +461,10 @@
}
 }
 
-if { [istarget "hppa*-*-*"] } {
+if { [istarget hppa*-*-*] } {
set pattern [format {\t;[^:]+:%d\n(\t[^\t]+\n)+%s:\n\t.PROC} \
  $line $symbol]
-} elseif { [istarget "mips-sgi-irix*"] } {
+} elseif { [istarget mips-sgi-irix*] } {
set pattern [format {\t\.loc [0-9]+ %d 0( 
[^\n]*)?\n\t\.set\t(no)?mips16\n\t\.ent\t%s\n\t\.type\t%s, @function\n%s:\n} \
 $line $symbol $symbol $symbol]
 } else {
Index: lib/file-format.exp
===
--- lib/file-format.exp (revision 176236)
+++ lib/file-format.exp (working copy)
@@ -24,17 +24,16 @@
 
 proc gcc_target_object_format { } { 
 global gcc_target_object_format_saved
-global target_triplet
 global tool
 
 if [info exists gcc_target_object_format_saved] {
 verbose gcc_target_object_format returning saved 
$gcc_target_object_format_saved 2
-} elseif { [string match *-*-darwin* $target_triplet] } {
+} elseif { [istarget *-*-darwin*] } {
# Darwin doesn't necessarily have objdump, so hand-code it.
set gcc_target_object_format_saved mach-o
-} elseif { [string match hppa*-*-hpux* $target_triplet] } {
+} elseif { [istarget hppa*-*-hpux*] } {
# HP-UX doesn't necessarily have objdump, so hand-code it.
-   if { [string match hppa*64*-*-hpux* $target_triplet] } {
+   if { [istarget hppa*64*-*-hpux*] } {
  set gcc_target_object_format_saved elf
} else {
  set gcc_target_object_format_saved som
Index: lib/target-libpath.exp
===
--- lib/target-libpath.exp  (revision 176236)
+++ lib/target-libpath.exp  (working copy)
@@ -272,11 +272,11 @@
 proc get_shlib_extension { } {
 global shlib_ext
 
-if { [ istarget *-*-darwin* ] } {
+if { [istarget *-*-darwin*] } {
set shlib_ext dylib
-} elseif { [ istarget *-*-cygwin* ] || [ istarget *-*-mingw* ] } {
+} elseif { [istarget *-*-cygwin*] || [istarget *-*-mingw*] } {
set shlib_ext dll
-} elseif { [ istarget hppa*-*-hpux* ] } {
+} elseif { [istarget hppa*-*-hpux*] } {
set shlib_ext sl
 } else {
set shlib_ext so
Index: lib/go-torture.exp
===
--- lib/go-torture.exp  (revision 176236

Re: [build] Move crtfastmath to toplevel libgcc

2011-07-14 Thread Uros Bizjak

On Thu, Jul 14, 2011 at 12:09 PM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Andreas Schwab sch...@redhat.com writes:

 Same on ia64:

 Configuration mismatch!
 Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o
 Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o 
 crtfastmath.o

Alpha needs the same fix. I need the following patch to bootstrap the compiler:

--cut here--
Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 176282)
+++ gcc/config.gcc  (working copy)
@@ -757,6 +757,7 @@
extra_options=${extra_options} alpha/elf.opt
target_cpu_default=MASK_GAS
tmake_file=${tmake_file} alpha/t-alpha alpha/t-ieee alpha/t-linux
+   extra_parts=$extra_parts crtfastmath.o
;;
 alpha*-*-freebsd*)
tm_file=${tm_file} ${fbsd_tm_file} alpha/elf.h alpha/freebsd.h
--cut here--

Uros.

Re: PATCH [5/n] X32: Support 32bit address

2011-07-15 Thread Uros Bizjak

On Sun, Jul 10, 2011 at 12:20 AM, H.J. Lu hongjiu...@intel.com wrote:

 TARGET_MEM_REF only works on ptr_mode.  That means base and index parts
 of x86 address operand in x32 mode may be in ptr_mode.  This patch
 supports 32bit base and index parts in x32 mode.  OK for trunk?

 Thanks.


 H.J.
 ---
 2011-07-09  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_simplify_base_index_disp): New.
        (ix86_decompose_address): Support 32bit address in x32 mode.
        (ix86_legitimate_address_p): Likewise.
        (ix86_fixup_binary_operands): Likewise.

Why don't you handle translations in TARGET_LEGITIMIZE_ADDRESS (or
maybe also LEGITIMIZE_RELOAD_ADDRESS) ?

Uros.

Re: PATCH [5/n] X32: Support 32bit address

2011-07-15 Thread Uros Bizjak

On Fri, Jul 15, 2011 at 3:03 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Jul 15, 2011 at 5:49 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sun, Jul 10, 2011 at 12:20 AM, H.J. Lu hongjiu...@intel.com wrote:

 TARGET_MEM_REF only works on ptr_mode.  That means base and index parts
 of x86 address operand in x32 mode may be in ptr_mode.  This patch
 supports 32bit base and index parts in x32 mode.  OK for trunk?

 Thanks.


 H.J.
 ---
 2011-07-09  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_simplify_base_index_disp): New.
        (ix86_decompose_address): Support 32bit address in x32 mode.
        (ix86_legitimate_address_p): Likewise.
        (ix86_fixup_binary_operands): Likewise.

 Why don't you handle translations in TARGET_LEGITIMIZE_ADDRESS (or
 maybe also LEGITIMIZE_RELOAD_ADDRESS) ?


 It is because ix86_decompose_address is also called from:

 predicates.md:  ok = ix86_decompose_address (op, parts);
 predicates.md:  ok = ix86_decompose_address (op, parts);
 predicates.md:  ok = ix86_decompose_address (XEXP (op, 0), parts);
 predicates.md:  ok = ix86_decompose_address (XEXP (op, 0), parts);
 predicates.md:  ok = ix86_decompose_address (XEXP (op, 0), parts);

Yes, but you should legitimize the address created by reload before it
enters into predicates.

So, the questions are:

+   (set (reg:SI 40 r11)
+(plus:SI (plus:SI (mult:SI (reg:SI 1 dx)
+  (const_int 8))
+ (subreg:SI (plus:DI (reg/f:DI 7 sp)
+ (const_int CONST1)) 0))
+(const_int CONST2)))
+
+   We translate it into
+
+   (set (reg:SI 40 r11)
+(plus:SI (plus:SI (mult:SI (reg:SI 1 dx)
+  (const_int 8))
+ (reg/f:SI 7 sp))
+(const_int [CONST1 + CONST2])))

If the first form of the address is not OK (it does not represent the
hardware operation), then it should not enter the insn stream.
This means that it should be fixed (legitimized) to the second form by
the appropriate function (it looks like LEGITIMIZE_RELOAD_ADDRESS should
fix it, since the incorrect address is generated by IRA/reload). After
this operation, the various predicates based on ix86_decompose_address
will start to work, since they will decompose valid memory addresses.
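
A small sketch of why the two forms compute the same 32-bit value
follows. It is plain C arithmetic modulo 2^32, not RTL, and it says
nothing about which form is valid in the insn stream; the function and
parameter names are made up for illustration.

```c
#include <stdint.h>

/* First form: take the low 32 bits (the SImode subreg) of the 64-bit
   sum sp + c1, then add index * 8 and c2 in 32-bit arithmetic.  */
uint32_t
lea_first_form (uint64_t sp, uint32_t index, int64_t c1, int32_t c2)
{
  uint32_t low = (uint32_t) (sp + (uint64_t) c1);
  return index * 8u + low + (uint32_t) c2;
}

/* Second form: use the low 32 bits of sp directly and fold both
   constants into a single displacement.  */
uint32_t
lea_second_form (uint64_t sp, uint32_t index, int64_t c1, int32_t c2)
{
  uint32_t disp = (uint32_t) ((uint64_t) c1 + (uint64_t) (int64_t) c2);
  return index * 8u + (uint32_t) sp + disp;
}
```

Truncation to 32 bits commutes with addition, which is why folding
CONST1 and CONST2 into one displacement does not change the result.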

Uros.

Re: PATCH [5/n] X32: Support 32bit address

2011-07-15 Thread Uros Bizjak

On Fri, Jul 15, 2011 at 5:44 PM, H.J. Lu hjl.to...@gmail.com wrote:

 TARGET_MEM_REF only works on ptr_mode.  That means base and index parts
 of x86 address operand in x32 mode may be in ptr_mode.  This patch
 supports 32bit base and index parts in x32 mode.  OK for trunk?

 Thanks.


 H.J.
 ---
 2011-07-09  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_simplify_base_index_disp): New.
        (ix86_decompose_address): Support 32bit address in x32 mode.
        (ix86_legitimate_address_p): Likewise.
        (ix86_fixup_binary_operands): Likewise.

 Why don't you handle translations in TARGET_LEGITIMIZE_ADDRESS (or
 maybe also LEGITIMIZE_RELOAD_ADDRESS) ?


 It is because ix86_decompose_address is also called from:

 predicates.md:  ok = ix86_decompose_address (op, parts);
 predicates.md:  ok = ix86_decompose_address (op, parts);
 predicates.md:  ok = ix86_decompose_address (XEXP (op, 0), parts);
 predicates.md:  ok = ix86_decompose_address (XEXP (op, 0), parts);
 predicates.md:  ok = ix86_decompose_address (XEXP (op, 0), parts);

 Yes, but you should legitimize the address created by reload before it
 enters into predicates.

 So, the questions are:

 +   (set (reg:SI 40 r11)
 +        (plus:SI (plus:SI (mult:SI (reg:SI 1 dx)
 +                                  (const_int 8))
 +                         (subreg:SI (plus:DI (reg/f:DI 7 sp)
 +                                             (const_int CONST1)) 0))
 +                (const_int CONST2)))
 +
 +   We translate it into
 +
 +   (set (reg:SI 40 r11)
 +        (plus:SI (plus:SI (mult:SI (reg:SI 1 dx)
 +                                  (const_int 8))
 +                         (reg/f:SI 7 sp))
 +                (const_int [CONST1 + CONST2])))

 If the first form of the address is not OK (it does not represent the
 hardware operation), then it should not enter into the insn stream.
 This means, that it should be fixed (legitimized) to second form by
 appropriate function (it looks that LEGITIMIZE_RELOAD_ADDRESS should
 fix it, since the incorrect address is generated by IRA/reload). After
 this operation, various predicates, based on ix86_decompose_address
 will start to work, since they will decompose valid memory addresses.


 IRA/.RELOAD isn't prepared to deal with it and it just ICEs.  I opened
 a few GCC bugs on this.

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47744

 is one of them.  That is why I went this route.

Hm, but it crashed in postreload pass since the address was not in the
legitimate form.  This is exactly what LEGITIMIZE_RELOAD_ADDRESS
fixes. Did you try to go this route?

Uros.

Re: PATCH [5/n] X32: Support 32bit address

2011-07-15 Thread Uros Bizjak

On Fri, Jul 15, 2011 at 6:07 PM, H.J. Lu hjl.to...@gmail.com wrote:

 If the first form of the address is not OK (it does not represent the
 hardware operation), then it should not enter into the insn stream.
 This means, that it should be fixed (legitimized) to second form by
 appropriate function (it looks that LEGITIMIZE_RELOAD_ADDRESS should
 fix it, since the incorrect address is generated by IRA/reload). After
 this operation, various predicates, based on ix86_decompose_address
 will start to work, since they will decompose valid memory addresses.


 IRA/.RELOAD isn't prepared to deal with it and it just ICEs.  I opened
 a few GCC bugs on this.

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47744

 is one of them.  That is why I went this route.

 Hm, but it crashed in postreload pass since the address was not in the
 legitimate form.  This is exactly what LEGITIMIZE_RELOAD_ADDRESS
 fixes. Did you try to go this route?


 It ran into various ICEs like:

 /export/build/gnu/gcc-x32/build-x86_64-linux/gcc/xgcc
 -B/export/build/gnu/gcc-x32/build-x86_64-linux/gcc/ -S -o m.s -mx32 -std=gnu99
 -O2 -fPIC    m.i
 m.i: In function ‘__kernel_rem_pio2’:
 m.i:18:1: error: insn does not satisfy its constraints:
 (insn 108 106 186 3 (set (reg:SI 40 r11 [207])
        (plus:SI (plus:SI (mult:SI (reg:SI 1 dx [205])
                    (const_int 8 [0x8]))
                (subreg:SI (plus:DI (reg/f:DI 7 sp)
                        (const_int 208 [0xd0])) 0))
            (const_int -160 [0xff60]))) m.i:3 251 {*lea_1_x32}
     (nil))
 m.i:18:1: internal compiler error: in reload_cse_simplify_operands, at
 postreload.c:403
 Please submit a full bug report,
 with preprocessed source if appropriate.
 See http://gcc.gnu.org/bugs.html for instructions.
 make: *** [m.s] Error 1

Yes, this is an example from the PR I am referring to. Did you try to
define LEGITIMIZE_RELOAD_ADDRESS? It is supposed to fix this.

Uros.

Re: PATCH [5/n] X32: Support 32bit address

2011-07-16 Thread Uros Bizjak

On Sat, Jul 16, 2011 at 6:47 PM, H.J. Lu hjl.to...@gmail.com wrote:


 Yes, this is an example from PR I am referring to. Did you try to
 define LEGITIMIZE_RELOAD_ADDRESS? It is supposed to fix this.


 They make things even more complex. ix86_simplify_base_index_disp
 is called after reload is done since we can do this translation safely
 only on hard registers, not on pseudo registers.


 Hi Uros,

 The current implementation  has been tested extensively. I'd like to keep
 it ASIS so that we can have a working x32 support.  We will revisit it later:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49765

 after we have a working x32 GCC.

This cannot be my decision alone; I have CCed the other x86 maintainers
and RMs for their opinion on this question.

Uros.

[PATCH, i386]: Fix PR47744; [x32] ICE: in reload_cse_simplify_operands, at postreload.c:403 [was: Re: PATCH [5/n] X32: Support 32bit address]

2011-07-18 Thread Uros Bizjak

Hello!

This alternative patch fixes the problem in ix86_decompose_address that
was uncovered by the x32 branch. Since the x32 branch generates lots of
SImode subregs of DImode values to handle Pmode vs. ptr_mode
restrictions, a latent bug in ix86_decompose_address allowed addresses
in the following (invalid, see the SImode subreg of a DImode operation) form:

(insn 108 106 186 3 (set (reg:SI 40 r11 [207])
(plus:SI (plus:SI (mult:SI (reg:SI 1 dx [205])
(const_int 8 [0x8]))
(subreg:SI (plus:DI (reg/f:DI 7 sp)
(const_int 208 [0xd0])) 0))
(const_int -160 [0xff60]))) m.i:3 251 {*lea_1_x32}
 (nil))

This form later caused an ICE with "error: insn does not
satisfy its constraints" in reload_cse_simplify_operands, at
postreload.c:403.

The invalid RTX in this example was created by reload trying to
eliminate the frame pointer register to RSP+offset.

The solution is to prevent subregs of DImode operations in PLUS
address sequences. We can still allow subregs of hard registers, since
we are sure that combine won't touch them and reload won't try to
eliminate them to some reg+offset. Effectively, instead of the above
RTX, gcc generates a more correct sequence that handles SImode and
DImode properly:

(insn 185 87 89 3 (set (reg:DI 0 ax)
(plus:DI (reg/f:DI 7 sp)
(const_int 200 [0xc8]))) pr47744.c:5 248 {*lea_1}
 (nil))

(insn 89 185 90 3 (set (reg:SI 40 r11 [177])
(plus:SI (plus:SI (mult:SI (reg:SI 40 r11 [175])
(const_int 8 [0x8]))
(reg:SI 0 ax))
(const_int -160 [0xff60]))) pr47744.c:5 286
{*lea_general_3}
 (nil))

So, there is no need for special lea_* patterns. In addition,
this simple patch removes a huge amount of problematic kludges from
the current x32 branch.

Also, the patch prevents invalid address RTXes for current x86 targets
(32 and 64 bit). There is in fact no protection against, e.g., an SImode
subreg of an HImode operation being combined into an invalid address
RTX. On a related note, an SImode subreg of a DImode hard register is
also OK for 32bit targets; reload will choose the lower SImode register
of a DImode pair.
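
The rule can be illustrated with a toy model. The types below are
simplified stand-ins, not GCC's rtx, and the real
register_no_elim_operand test is reduced here to a plain "DImode hard
register" flag, as the comment in the patch describes it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for RTL; these are NOT GCC's rtx types.  */
enum toy_code { TOY_REG, TOY_SUBREG, TOY_PLUS };
enum toy_mode { TOY_HI, TOY_SI, TOY_DI };

struct toy_rtx
{
  enum toy_code code;
  enum toy_mode mode;
  bool hard_reg;                /* Meaningful for TOY_REG only.  */
  const struct toy_rtx *inner;  /* Operand of a TOY_SUBREG.  */
};

/* Accept an addend of a PLUS chain as base/index only if it is a
   register, or a subreg whose operand is a DImode hard register.  */
bool
toy_addend_ok (const struct toy_rtx *op)
{
  switch (op->code)
    {
    case TOY_SUBREG:
      return (op->inner != NULL
              && op->inner->code == TOY_REG
              && op->inner->mode == TOY_DI
              && op->inner->hard_reg);
    case TOY_REG:
      return true;
    default:
      return false;
    }
}
```

With this check, a subreg wrapped around a DImode PLUS is refused as a
base or index, while a subreg of a DImode hard register is still taken.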

Oh, and BTW: patched gcc bootstrapped faster for me on x86_64 SNB for
default configure and make -j 8:

(unpatched)

real    28m40.314s
user    154m2.612s
sys     8m16.934s

vs:

(patched)

real    27m8.057s
user    142m42.522s
sys     7m41.875s

(see PR for details).

2011-07-18  Uros Bizjak  ubiz...@gmail.com

PR target/47744
* config/i386/i386.c (ix86_decompose_address): Allow only subregs
of DImode hard registers in PLUS address chains.

Patch was bootstrapped on x86_64-pc-linux-gnu {,-m32}. H.J. tested it
on x32 target, where the patch fixed all reported failures.

Patch was committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 176386)
+++ config/i386/i386.c  (working copy)
@@ -11149,8 +11149,13 @@ ix86_decompose_address (rtx addr, struct
return 0;
  break;
 
-   case REG:
case SUBREG:
+ /* Allow only subregs of DImode hard regs in PLUS chains.  */
+ if (!register_no_elim_operand (SUBREG_REG (op), DImode))
+   return 0;
+ /* FALLTHRU */
+
+   case REG:
  if (!base)
base = op;
  else if (!index)

Re: PATCH [6/n] X32: Support 32bit address

2011-07-18 Thread Uros Bizjak

On Mon, Jul 18, 2011 at 8:39 PM, H.J. Lu hongjiu...@intel.com wrote:

 TARGET_MEM_REF only works on ptr_mode.   This patch allows 32bit address
 in x32 mode.  OK for trunk?

Do you perhaps have a testcase to help in analyzing the problem?

Uros.

Re: PATCH [6/n] X32: Support 32bit address

2011-07-18 Thread Uros Bizjak

On Mon, Jul 18, 2011 at 8:48 PM, H.J. Lu hjl.to...@gmail.com wrote:

 TARGET_MEM_REF only works on ptr_mode.   This patch allows 32bit address
 in x32 mode.  OK for trunk?

 Do you perhaps have a testcase to help in analyzing the problem?


 See:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49780

I don't think that tree-ssa-address/addr_for_mem_ref is correct when
REALLY_EXPAND is false. It constructs an RTX template in pointer_mode,
which is not necessarily valid and is rejected by
ix86_legitimate_address_p. When really expanding the expression, we have
a conversion at the end:

  gen_addr_rtx (pointer_mode, sym, bse, idx, st, off, address, NULL, NULL);
  if (pointer_mode != address_mode)
address = convert_memory_address (address_mode, address);
  return address;

This is in fact your r175912 change in the fix for PR47383 - you need
to do something with template as well...

Uros.

Re: PATCH [6/n] X32: Support 32bit address

2011-07-18 Thread Uros Bizjak

On Mon, Jul 18, 2011 at 10:25 PM, H.J. Lu hjl.to...@gmail.com wrote:

 TARGET_MEM_REF only works on ptr_mode.   This patch allows 32bit address
 in x32 mode.  OK for trunk?

 Do you perhaps have a testcase to help in analyzing the problem?


 See:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49780

 I don't think that tree-ssa-address/addr_for_mem_ref is correct when
 REALLY_EXPAND is false. It constructs RTX template in pointer_mode,
 which is not necessary valid and is rejected from
 ix86_validate_address_p. When really expanding the expression, we have
 a conversion at the end:

  gen_addr_rtx (pointer_mode, sym, bse, idx, st, off, address, NULL, NULL);
  if (pointer_mode != address_mode)
    address = convert_memory_address (address_mode, address);
  return address;

 This is in fact your r175912 change in the fix for PR47383 - you need
 to do something with template as well...


 Since TARGET_MEM_REF only works on ptr_mode, I don't think
 we can change template.  We just need to accept TARGET_MEM_REF
 in ptr_mode and fix it up later.

No, a template is used to get some insight into the supported address
structure. If there is a mismatch, this approach fails; we might as well
give the compiler whatever fake template we want.

Uros.

Re: PATCH [5/n] X32: Support 32bit address

2011-07-19 Thread Uros Bizjak

On Tue, Jul 19, 2011 at 1:25 PM, Richard Sandiford
richard.sandif...@linaro.org wrote:

 On Sat, Jul 16, 2011 at 6:47 PM, H.J. Lu hjl.to...@gmail.com wrote:
 Yes, this is an example from PR I am referring to. Did you try to
 define LEGITIMIZE_RELOAD_ADDRESS? It is supposed to fix this.


 They make things even more complex. ix86_simplify_base_index_disp
 is called after reload is done since we can do this translation safely
 only on hard registers, not on pseudo registers.


 Hi Uros,

 The current implementation  has been tested extensively. I'd like to keep
 it ASIS so that we can have a working x32 support.  We will revisit it 
 later:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49765

 after we have a working x32 GCC.

 This can not be only my decision, I have CCd other x86 maintainers and
 RMs for their opinion on this question.

 FWIW, I agree with you that things like:

   (set (reg:SI 40 r11)
        (plus:SI (plus:SI (mult:SI (reg:SI 1 dx)
                                   (const_int 8))
                          (subreg:SI (plus:DI (reg/f:DI 7 sp)
                                              (const_int CONST1)) 0))
                 (const_int CONST2)))

 do not look like things that should ever enter the insn stream.
 They're liable to confuse other code besides the x86 predicates.
 The target of the conversion:

   (set (reg:SI 40 r11)
        (plus:SI (plus:SI (mult:SI (reg:SI 1 dx)
                                   (const_int 8))
                          (reg/f:SI 7 sp))
                 (const_int [CONST1 + CONST2])))

 looks like the generally preferred form.  It isn't an x32-ism.

 LEGITIMIZE_RELOAD_ADDRESS is supposed to be for optimisation only,
 not correctness.  Why doesn't reload have enough information to
 generate the correct form itself?

Please see the solution at [1]. The problem was that the x86 target
allowed SImode subregs of DImode operations (e.g. PLUS).  When these
are rejected, everything works as expected.

IMO, LEGITIMIZE_RELOAD_ADDRESS can not optimize resulting RTX, as shown in [1].

[1] http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01427.html

Uros.

Re: PATCH [6/n] X32: Support 32bit address

2011-07-19 Thread Uros Bizjak

On Tue, Jul 19, 2011 at 3:47 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Attached patch simply removes these two checks, as it seems they are
 not needed. This also follows how other Pmode != ptr_mode targets.

 2011-07-19  Uros Bizjak  ubiz...@gmail.com

        PR target/49780
        * config/i386/i386.c (ix86_legitimate_address_p): Remove checks that
        base and index registers are in Pmode.

 Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
 {,-m32}. Can you please re-test it on x32?

 Comparing with my patch, which only allows DImode and SImode,
 it caused the following regressions:

 FAIL: libgomp.fortran/omp_atomic1.f90  -O1  execution test
 FAIL: libgomp.fortran/omp_atomic1.f90  -O2  execution test
 FAIL: libgomp.fortran/omp_atomic1.f90  -O3 -fomit-frame-pointer  execution 
 test
 FAIL: libgomp.fortran/omp_atomic1.f90  -O3 -fomit-frame-pointer
 -funroll-all-loops -finline-functions  execution test
 FAIL: libgomp.fortran/omp_atomic1.f90  -O3 -fomit-frame-pointer
 -funroll-loops  execution test
 FAIL: libgomp.fortran/omp_atomic1.f90  -O3 -g  execution test
 FAIL: libgomp.fortran/omp_atomic1.f90  -Os  execution test

 BTW: I still think that template should return the same address
 structure as expansion, but this won't crash the compiler anymore.

There are no non-DImode addresses in the insn stream, so I doubt the bug
is due to my change.

Uros.

Re: PATCH [6/n] X32: Support 32bit address

2011-07-19 Thread Uros Bizjak

On Tue, Jul 19, 2011 at 4:42 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Jul 19, 2011 at 7:04 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Jul 19, 2011 at 3:47 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Attached patch simply removes these two checks, as it seems they are
 not needed. This also follows how other Pmode != ptr_mode targets.

 2011-07-19  Uros Bizjak  ubiz...@gmail.com

        PR target/49780
        * config/i386/i386.c (ix86_legitimate_address_p): Remove checks that
        base and index registers are in Pmode.

 Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
 {,-m32}. Can you please re-test it on x32?

 Comparing with my patch, which only allows DImode and SImode,
 it caused the following regressions:

 FAIL: libgomp.fortran/omp_atomic1.f90  -O1  execution test
 FAIL: libgomp.fortran/omp_atomic1.f90  -O2  execution test
 FAIL: libgomp.fortran/omp_atomic1.f90  -O3 -fomit-frame-pointer  execution 
 test
 FAIL: libgomp.fortran/omp_atomic1.f90  -O3 -fomit-frame-pointer
 -funroll-all-loops -finline-functions  execution test
 FAIL: libgomp.fortran/omp_atomic1.f90  -O3 -fomit-frame-pointer
 -funroll-loops  execution test
 FAIL: libgomp.fortran/omp_atomic1.f90  -O3 -g  execution test
 FAIL: libgomp.fortran/omp_atomic1.f90  -Os  execution test

 BTW: I still think that template should return the same address
 structure as expansion, but this won't crash the compiler anymore.

 There is no non-DImode addresses in insn stream, so I doubt the bug is
 due to my change.


 I saw the same failures on x86-64:

 http://gcc.gnu.org/ml/gcc-testresults/2011-07/msg02224.html

 Can you take a look?

Sometimes, the compiler is really creative in inventing instructions:

(insn 47 46 49 7 (set (reg:SI 68 [ D.1686 ])
(subreg:SI (plus:SF (reg:SF 159 [ D.1685 ])
(reg:SF 159 [ D.1685 ])) 0)) omp_atomic1.f90:17 247 {*lea_2}
 (expr_list:REG_DEAD (reg:SF 159 [ D.1685 ])
(nil)))

Really funny.

Uros.

Re: PATCH [6/n] X32: Support 32bit address

2011-07-19 Thread Uros Bizjak

On Tue, Jul 19, 2011 at 6:30 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Jul 19, 2011 at 06:26:33PM +0200, Uros Bizjak wrote:
 Sometimes, the compiler is really creative in inventing instructions:

 (insn 47 46 49 7 (set (reg:SI 68 [ D.1686 ])
         (subreg:SI (plus:SF (reg:SF 159 [ D.1685 ])
                 (reg:SF 159 [ D.1685 ])) 0)) omp_atomic1.f90:17 247 {*lea_2}
      (expr_list:REG_DEAD (reg:SF 159 [ D.1685 ])
         (nil)))

 Really funny.

 That's the job of combiner to try all kinds of stuff and it is the
 responsibility of the backend to reject those.  I think it would be better
 to get back to testing Pmode in the legitimate address hook, perhaps
 allowing ptr_mode too in addition to Pmode (which for -m32/-m64 won't mean
 any change, just for -mx32).

Actually, there is a bypass in ix86_decompose_address, and this RTX
squeezed through. IMO constructs like this should be rejected in
ix86_decompose_address, which effectively just moves the Pmode/ptr_mode
check there.

I'm looking into it.

Uros.

Re: PATCH [6/n] X32: Support 32bit address

2011-07-19 Thread Uros Bizjak

On Tue, Jul 19, 2011 at 6:37 PM, Uros Bizjak ubiz...@gmail.com wrote:
 Sometimes, the compiler is really creative in inventing instructions:

 (insn 47 46 49 7 (set (reg:SI 68 [ D.1686 ])
         (subreg:SI (plus:SF (reg:SF 159 [ D.1685 ])
                 (reg:SF 159 [ D.1685 ])) 0)) omp_atomic1.f90:17 247 {*lea_2}
      (expr_list:REG_DEAD (reg:SF 159 [ D.1685 ])
         (nil)))

 Really funny.

 That's the job of combiner to try all kinds of stuff and it is the
 responsibility of the backend to reject those.  I think it would be better
 to get back to testing Pmode in the legitimate address hook, perhaps
 allowing ptr_mode too in addition to Pmode (which for -m32/-m64 won't mean
 any change, just for -mx32).

 Actually, there is a bypass in ix86_decompose_address, and this RTX
 squeezed through. IMO constructs like this should be rejected in
 i_d_a, which effectively only moves Pmode/ptr_mode check here.

 I'm looking into it.

The problem was in fact the declaration of the no_seg_address_operand
predicate: it was defined as a special predicate and thus ignored
the mode of the operand.
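
The difference between the two kinds of predicate can be sketched with a
toy model. The types and names below are invented for illustration and
are not GCC's generated code; the only point is that a predicate written
with define_predicate gets an automatic mode test, while one written
with define_special_predicate does not.

```c
#include <stdbool.h>

/* Toy machine modes; PM_VOID plays the role of VOIDmode.  */
enum pm_mode { PM_VOID, PM_SI, PM_DI, PM_SF };

struct pm_operand
{
  enum pm_mode mode;
  bool looks_like_address;   /* Result of the hand-written body.  */
};

/* define_special_predicate flavour: no automatic mode test, so the
   requested mode is ignored.  */
bool
pm_special_predicate (const struct pm_operand *op, enum pm_mode mode)
{
  (void) mode;
  return op->looks_like_address;
}

/* define_predicate flavour: the requested mode must be PM_VOID or
   match the operand's mode before the body is consulted.  */
bool
pm_predicate (const struct pm_operand *op, enum pm_mode mode)
{
  if (mode != PM_VOID && op->mode != mode)
    return false;
  return op->looks_like_address;
}
```

In this model an operand whose mode does not match the requested one
passes the special flavour but is refused by the ordinary one.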

The attached patch also includes a check for DImode SUBREGs of the base
register, to eventually save x32 some trouble in the future.

I'm currently regression testing this patch on top of the patch that
removed the Pmode checks.

H.J., can you please test it on x32?

Uros.
Index: predicates.md
===
--- predicates.md   (revision 176462)
+++ predicates.md   (working copy)
@@ -796,7 +796,7 @@
 
 ;; Return true if op if a valid address, and does not contain
 ;; a segment override.
-(define_special_predicate no_seg_address_operand
+(define_predicate no_seg_address_operand
   (match_operand 0 address_operand)
 {
   struct ix86_address parts;
Index: i386.c
===
--- i386.c  (revision 176462)
+++ i386.c  (working copy)
@@ -11085,8 +11085,16 @@ ix86_decompose_address (rtx addr, struct
   int retval = 1;
   enum ix86_address_seg seg = SEG_DEFAULT;
 
-  if (REG_P (addr) || GET_CODE (addr) == SUBREG)
+  if (REG_P (addr))
 base = addr;
+  else if (GET_CODE (addr) == SUBREG)
+{
+  /* Allow only subregs of DImode hard regs.  */
+  if (register_no_elim_operand (SUBREG_REG (addr), DImode))
+   base = addr;
+  else
+   return 0;
+}
   else if (GET_CODE (addr) == PLUS)
 {
   rtx addends[4], op;

Re: PATCH [6/n] X32: Support 32bit address

2011-07-19 Thread Uros Bizjak

On Tue, Jul 19, 2011 at 7:33 PM, Uros Bizjak ubiz...@gmail.com wrote:
 Sometimes, the compiler is really creative in inventing instructions:

 (insn 47 46 49 7 (set (reg:SI 68 [ D.1686 ])
         (subreg:SI (plus:SF (reg:SF 159 [ D.1685 ])
                 (reg:SF 159 [ D.1685 ])) 0)) omp_atomic1.f90:17 247 
 {*lea_2}
      (expr_list:REG_DEAD (reg:SF 159 [ D.1685 ])
         (nil)))

 Really funny.

 That's the job of combiner to try all kinds of stuff and it is the
 responsibility of the backend to reject those.  I think it would be better
 to get back to testing Pmode in the legitimate address hook, perhaps
 allowing ptr_mode too in addition to Pmode (which for -m32/-m64 won't mean
 any change, just for -mx32).

 Actually, there is a bypass in ix86_decompose_address, and this RTX
 squeezed through. IMO constructs like this should be rejected in
 i_d_a, which effectively only moves Pmode/ptr_mode check here.

 I'm looking into it.

 The problem was in fact the declaration of no_seg_address_operand
 predicate that was defined as special predicate and this way ignoring
 the mode of the operand.

This change should be backported to 4.6 and 4.5.

Uros.

Re: PATCH [8/n] X32: Convert to Pmode if needed

2011-07-19 Thread Uros Bizjak

On Tue, Jul 19, 2011 at 6:47 AM, H.J. Lu hongjiu...@intel.com wrote:

 This patch adds the missing Pmode check and conversion.  OK for trunk?

 2011-07-18  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_legitimize_address): Convert to
        Pmode if needed.
        (ix86_expand_move): Likewise.
        (ix86_expand_call): Likewise.
        (ix86_expand_special_args_builtin): Likewise.
        (ix86_expand_builtin): Likewise.

copy_addr_to_reg ?

Uros.

Re: PATCH [6/n] X32: Support 32bit address

2011-07-19 Thread Uros Bizjak

On Tue, Jul 19, 2011 at 6:30 PM, Jakub Jelinek ja...@redhat.com wrote:

 Sometimes, the compiler is really creative in inventing instructions:

 (insn 47 46 49 7 (set (reg:SI 68 [ D.1686 ])
         (subreg:SI (plus:SF (reg:SF 159 [ D.1685 ])
                 (reg:SF 159 [ D.1685 ])) 0)) omp_atomic1.f90:17 247 {*lea_2}
      (expr_list:REG_DEAD (reg:SF 159 [ D.1685 ])
         (nil)))

 Really funny.

 That's the job of combiner to try all kinds of stuff and it is the
 responsibility of the backend to reject those.  I think it would be better
 to get back to testing Pmode in the legitimate address hook, perhaps
 allowing ptr_mode too in addition to Pmode (which for -m32/-m64 won't mean
 any change, just for -mx32).

I agree that we still need to check naked registers. However, for
64bit targets it is OK to pass both SImode and DImode registers. We
are sure that SImode values in DImode regs have the top 32 bits equal
to 0 in address calculations. This is not true for QImode regs
(assignment to lowpart only). We also have to reject non-integer
registers.
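
The register behaviour behind this argument can be modelled in a few
lines of C. On x86-64 a write to a 32-bit register zero-extends into the
full 64-bit register, while a write to an 8-bit register leaves the
remaining bits alone. The function names are made up for illustration.

```c
#include <stdint.h>

/* Model of writing an SImode value to a 64-bit general register:
   the old contents are discarded and the value is zero-extended.  */
uint64_t
write_si (uint64_t reg, uint32_t value)
{
  (void) reg;
  return (uint64_t) value;
}

/* Model of writing a QImode value: only the low 8 bits change,
   the upper 56 bits of the old contents survive.  */
uint64_t
write_qi (uint64_t reg, uint8_t value)
{
  return (reg & ~(uint64_t) 0xff) | value;
}
```

After write_si the full register is usable as an address; after
write_qi it may still carry stale upper bits.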

Attached is my final version of the patch.

Uros.
Index: predicates.md
===
--- predicates.md   (revision 176462)
+++ predicates.md   (working copy)
@@ -796,7 +796,7 @@
 
 ;; Return true if op if a valid address, and does not contain
 ;; a segment override.
-(define_special_predicate no_seg_address_operand
+(define_predicate no_seg_address_operand
   (match_operand 0 address_operand)
 {
   struct ix86_address parts;
Index: i386.c
===
--- i386.c  (revision 176462)
+++ i386.c  (working copy)
@@ -11085,8 +11085,16 @@ ix86_decompose_address (rtx addr, struct
   int retval = 1;
   enum ix86_address_seg seg = SEG_DEFAULT;
 
-  if (REG_P (addr) || GET_CODE (addr) == SUBREG)
+  if (REG_P (addr))
 base = addr;
+  else if (GET_CODE (addr) == SUBREG)
+{
+  /* Allow only subregs of DImode hard regs.  */
+  if (register_no_elim_operand (SUBREG_REG (addr), DImode))
+   base = addr;
+  else
+   return 0;
+}
   else if (GET_CODE (addr) == PLUS)
 {
   rtx addends[4], op;
@@ -11643,8 +11651,7 @@ ix86_legitimate_address_p (enum machine_
/* Base is not a register.  */
return false;
 
-  if (GET_MODE (base) != Pmode)
-   /* Base is not in Pmode.  */
+  if (GET_MODE (base) != SImode && GET_MODE (base) != DImode)
return false;
 
   if ((strict && ! REG_OK_FOR_BASE_STRICT_P (reg))
@@ -11672,8 +11679,7 @@ ix86_legitimate_address_p (enum machine_
/* Index is not a register.  */
return false;
 
-  if (GET_MODE (index) != Pmode)
-   /* Index is not in Pmode.  */
+  if (GET_MODE (index) != SImode && GET_MODE (index) != DImode)
return false;
 
   if ((strict && ! REG_OK_FOR_INDEX_STRICT_P (reg))

Re: [PATCH, testsuite] Fix for PR47440 - Use LCM for vzeroupper insertion

2011-07-20 Thread Uros Bizjak

Hello!

  * a/gcc/gcse.c (alloc_gcse_mem): Added code to run in PRE2.

 And this is necessary because...???

 Why not just make it a separate pass in ix86-reorg that uses LCM? Look at 
 mode switching for an example.

I was also expecting that vzeroupper would be inserted in the same way
as I387 mode switching instructions are inserted. To expand on
Steven's suggestion, please see i386.h for OPTIMIZE_MODE_SWITCHING and
following macros.

At the moment, there are four separate entities, each handling its own
(independent) mode-switching insertions for x87, one for each mode of
the fistp or frndint instruction. Mode switching inserts the
calculation of the new x87 control word (CW) at optimal points and
stores this new CW (together with the old CW) to a known stack slot,
to be consumed by the fistp/frndint insn.

You can add a new entity to enum ix86_entity (say, AVX_VZEROUPPER)
and update OPTIMIZE_MODE_SWITCHING to perform mode insertion for the
AVX_VZEROUPPER entity when needed. The modes for AVX_VZEROUPPER
are defined in NUM_MODES_FOR_MODE_SWITCHING, mode transitions in
MODE_NEEDED and insn insertions in EMIT_MODE_SET.

Please note that LCM handles all entities in parallel, so there is no
need for extra passes. The real worker for mode switching is
ix86_mode_needed, but don't forget that you can disable the mode
switching pass per function, when it is not needed, through the
OPTIMIZE_MODE_SWITCHING macro.

FYI: Existing x87 CW initialization insertion works this way:
- fistp/frndint is inserted into the insn stream and the corresponding
OPTIMIZE_MODE_SWITCHING flag is set.
- the inserted insn has an i387_cw attribute that defines the requested
mode in which the insn operates. Based on this attribute, MODE_NEEDED
handles mode transitions for each entity (please note that there are
four independent entities).
- EMIT_MODE_SET emits CW initializations. These are further optimized
by follow-up optimization passes, so two consecutive initializations
at the same place are CSEd, etc.
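
To make the "mode needed" / "emit mode set" split a bit more concrete, here is a toy sketch for a single entity and a single straight-line insn sequence. This is an assumption-laden illustration only: the toy_* names are hypothetical, nothing here is GCC's API, and the real pass uses LCM over the whole CFG rather than a linear walk.

```c
#include <stddef.h>

/* Toy modes for one entity.  TOY_MODE_ANY means "insn does not care".
   These names are hypothetical, not GCC's.  */
enum toy_cw_mode { TOY_MODE_ANY, TOY_MODE_A, TOY_MODE_B };

/* Each toy insn carries the mode it needs, which is the information
   MODE_NEEDED derives from the insn attribute in the real pass.  */
struct toy_insn { enum toy_cw_mode needed; };

/* Walk a straight-line sequence and count the mode sets that would
   have to be emitted (the job EMIT_MODE_SET does in the real pass).
   A set is needed only when an insn requires a specific mode that
   differs from the mode currently in effect.  */
size_t
toy_count_mode_sets (const struct toy_insn *insns, size_t n,
                     enum toy_cw_mode entry_mode)
{
  enum toy_cw_mode cur = entry_mode;
  size_t sets = 0;

  for (size_t i = 0; i < n; i++)
    {
      enum toy_cw_mode need = insns[i].needed;

      if (need == TOY_MODE_ANY || need == cur)
        continue;  /* No switch required before this insn.  */

      cur = need;
      sets++;
    }
  return sets;
}
```

Insns that do not care about the mode never force a switch, and consecutive insns that need the same mode share one set. That is the property that makes a vzeroupper-style entity cheap to add.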

Uros.

Re: PATCH [7/n] X32: Handle address output and calls patterns

2011-07-20 Thread Uros Bizjak

On Wed, Jul 20, 2011 at 4:51 AM, H.J. Lu hjl.to...@gmail.com wrote:

 I had it in my x32 tree. But I reverted:

 http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00954.html

 since Pmode is used in non-PIC tablejump, we have to put 64bit value for
 labels with 0 upper 32bits in tablejump for x32.

 The mode is completely controlled by CASE_VECTOR_MODE.


 Here is the updated patch.  OK for trunk?


 A small change.  It always uses a 64bit register for indirect branch.

- ix86_print_operand (file, x, 0);
+ /* Always use 64bit register for indirect branch.  */
+ ix86_print_operand (file, x,
+ REG_P (x) && TARGET_64BIT ? 'q' : 0);
  return;

/* Always use 64bit register for indirect branch.  */
if (REG_P (x) && TARGET_64BIT)
  print_reg (x, 'q', file);
else
  ix86_print_operand (file, x, 0);

 (define_insn *indirect_jump
-  [(set (pc) (match_operand:P 0 nonimmediate_operand rm))]
+  [(set (pc) (match_operand:P 0 x32_indirect_branch_operand rm))]

Just name it indirect_branch_operand.

 (define_insn_and_split *call_vzeroupper
-  [(call (mem:QI (match_operand:P 0 call_insn_operand czm))
+  [(call (mem:QI (match_operand:P 0 x32_call_insn_operand czm))

Don't introduce a new predicate; change call_insn_operand instead to
conditionally disable memory_operand on x32. You will also need to
change the "czm" register constraint to "cz" on x32, otherwise you
will get ICEs.

And i386.c also calls call_insn_operand in one place.

Uros.

Re: PATCH [7/n] X32: Handle address output and calls patterns

2011-07-20 Thread Uros Bizjak

On Wed, Jul 20, 2011 at 9:53 AM, Uros Bizjak ubiz...@gmail.com wrote:

 since Pmode is used in non-PIC tablejump, we have to put 64bit value for
 labels with 0 upper 32bits in tablejump for x32.

 The mode is completely controlled by CASE_VECTOR_MODE.


 Here is the updated patch.  OK for trunk?


 A small change.  It always uses a 64bit register for indirect branch.

 -         ix86_print_operand (file, x, 0);
 +         /* Always use 64bit register for indirect branch.  */
 +         ix86_print_operand (file, x,
 +                             REG_P (x) && TARGET_64BIT ? 'q' : 0);
          return;

 /* Always use 64bit register for indirect branch.  */
 if (REG_P (x) && TARGET_64BIT)
  print_reg (x, 'q', file);
 else
  ix86_print_operand (file, x, 0);

  (define_insn *indirect_jump
 -  [(set (pc) (match_operand:P 0 nonimmediate_operand rm))]
 +  [(set (pc) (match_operand:P 0 x32_indirect_branch_operand rm))]

 Just name it indirect_branch_operand.

  (define_insn_and_split *call_vzeroupper
 -  [(call (mem:QI (match_operand:P 0 call_insn_operand czm))
 +  [(call (mem:QI (match_operand:P 0 x32_call_insn_operand czm))

 Don't introduce new predicate, change call_insn_operand instead to
 conditionally disable memory_operand on x32. You will need to change
 czm register constraint to cz on x32, otherwise you will get
 ICEs.

Use new constraint here, something like (untested):

Index: constraints.md
===
--- constraints.md  (revision 176494)
+++ constraints.md  (working copy)
@@ -127,6 +127,11 @@
   @internal Constant call address operand.
   (match_operand 0 constant_call_address_operand))

+(define_constraint w
+  @internal Call memory operand.
+  (and (match_test !TARGET_X32)
+   (match_operand 0 memory_operand))
+
 ;; Integer constant constraints.
 (define_constraint I
   Integer constant in the range 0 @dots{} 31, for 32-bit shifts.

Uros.

Re: PATCH [6/n] X32: Supprot 32bit address

2011-07-20 Thread Uros Bizjak

On Wed, Jul 20, 2011 at 2:54 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Sometimes, the compiler is really creative in inventing instructions:

 (insn 47 46 49 7 (set (reg:SI 68 [ D.1686 ])
         (subreg:SI (plus:SF (reg:SF 159 [ D.1685 ])
                 (reg:SF 159 [ D.1685 ])) 0)) omp_atomic1.f90:17 247 
 {*lea_2}
      (expr_list:REG_DEAD (reg:SF 159 [ D.1685 ])
         (nil)))

 Really funny.

 That's the job of combiner to try all kinds of stuff and it is the
 responsibility of the backend to reject those.  I think it would be better
 to get back to testing Pmode in the legitimate address hook, perhaps
 allowing ptr_mode too in addition to Pmode (which for -m32/-m64 won't mean
 any change, just for -mx32).

 I agree that we still need to check naked registers. However, for
 64bit targets it is OK to pass both, SImode and DImode registers. We
 are sure that SImode values in DImode regs have top 32bits equal to 0
 in address calculations. This is not true for QImode regs (assignment
 to lowpart only). We also have to prevent non-integer registers.

 Attached is my final version of the patch.


 It works fine.  Can you check it in?

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN
with the following ChangeLog:

2011-07-20  Uros Bizjak  ubiz...@gmail.com

PR target/49780
* config/i386/predicates.md (no_seg_address_operand): No more special.
* config/i386/i386.c (ix86_decompose_address): Allow only subregs
of DImode hard registers in base.
(ix86_legitimate_address_p): Allow SImode and DImode base and index
registers.

Uros.

Re: PATCH [7/n] X32: Handle address output and calls patterns

2011-07-20 Thread Uros Bizjak

On Wed, Jul 20, 2011 at 3:18 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Jul 20, 2011 at 1:19 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Wed, Jul 20, 2011 at 9:53 AM, Uros Bizjak ubiz...@gmail.com wrote:

 since Pmode is used in non-PIC tablejump, we have to put 64bit value for
 labels with 0 upper 32bits in tablejump for x32.

 The mode is completely controlled by CASE_VECTOR_MODE.


 Here is the updated patch.  OK for trunk?


 A small change.  It always uses a 64bit register for indirect branch.

 -         ix86_print_operand (file, x, 0);
 +         /* Always use 64bit register for indirect branch.  */
 +         ix86_print_operand (file, x,
 +                             REG_P (x) && TARGET_64BIT ? 'q' : 0);
          return;

 /* Always use 64bit register for indirect branch.  */
 if (REG_P (x) && TARGET_64BIT)
  print_reg (x, 'q', file);
 else
  ix86_print_operand (file, x, 0);

  (define_insn *indirect_jump
 -  [(set (pc) (match_operand:P 0 nonimmediate_operand rm))]
 +  [(set (pc) (match_operand:P 0 x32_indirect_branch_operand rm))]

 Just name it indirect_branch_operand.

  (define_insn_and_split *call_vzeroupper
 -  [(call (mem:QI (match_operand:P 0 call_insn_operand czm))
 +  [(call (mem:QI (match_operand:P 0 x32_call_insn_operand czm))

 Don't introduce new predicate, change call_insn_operand instead to
 conditionally disable memory_operand on x32. You will need to change
 czm register constraint to cz on x32, otherwise you will get
 ICEs.

 Use new constraint here, something like (untested):

 Index: constraints.md
 ===
 --- constraints.md      (revision 176494)
 +++ constraints.md      (working copy)
 @@ -127,6 +127,11 @@
   @internal Constant call address operand.
   (match_operand 0 constant_call_address_operand))

 +(define_constraint w
 +  @internal Call memory operand.
 +  (and (match_test !TARGET_X32)
 +       (match_operand 0 memory_operand))
 +
  ;; Integer constant constraints.
  (define_constraint I
   Integer constant in the range 0 @dots{} 31, for 32-bit shifts.

 Uros.


 Here is the updated patch.  OK for trunk?

 Thanks.

 --
 H.J.
 -
 2011-07-20  H.J. Lu  hongjiu...@intel.com
            Uros Bizjak  ubiz...@gmail.com

        * config/i386/constraints.md (w): New.

        * config/i386/i386.c (ix86_print_operand): Always use 64bit
        register for indirect branch.
        (ix86_output_addr_vec_elt): Check TARGET_LP64 instead of
        TARGET_64BIT for ASM_QUAD.

        * config/i386/i386.h (CASE_VECTOR_MODE): Check TARGET_LP64
        instead of TARGET_64BIT.

        * config/i386/i386.md (*indirect_jump): Replace
        nonimmediate_operand with indirect_branch_operand.
        (*tablejump_1): Likewise.
        (*call_vzeroupper): Replace constraint m with w.
        (*call): Likewise.
        (*call_rex64_ms_sysv_vzeroupper): Likewise.
        (*call_rex64_ms_sysv): Likewise.
        (*call_value_vzeroupper): Likewise.
        (*call_value): Likewise.
        (*call_value_rex64_ms_sysv_vzeroupper): Likewise.
        (*call_value_rex64_ms_sysv): Likewise.
        (*tablejump_1_x32): New.
        (set_got_offset_rex64): Check TARGET_LP64 instead of
        TARGET_64BIT.

        * config/i386/predicates.md (indirect_branch_operand): New.
        (call_insn_operand): Support x32.


+
+(define_insn *tablejump_1_x32
+  [(set (pc) (match_operand:SI 0 register_operand r))
+   (use (label_ref (match_operand 1  )))]
+  TARGET_X32
+  jmp\t%A0
+  [(set_attr type ibr)
+   (set_attr length_immediate 0)])

This pattern should include zero_extend from operand 0. Please fix the
tablejump expander to generate correct pattern.

Also, indirect jump needs to generate zero_extend from SImode register for x32.

Other than that, the patch looks OK to me. Please also wait for rth's approval.

Thanks,
Uros.

Re: PATCH [7/n] X32: Handle address output and calls patterns

2011-07-20 Thread Uros Bizjak

On Wed, Jul 20, 2011 at 4:09 PM, H.J. Lu hjl.to...@gmail.com wrote:
Hello!

 +(define_insn *tablejump_1_x32
 +  [(set (pc) (match_operand:SI 0 register_operand r))
 +   (use (label_ref (match_operand 1  )))]
 +  TARGET_X32
 +  jmp\t%A0
 +  [(set_attr type ibr)
 +   (set_attr length_immediate 0)])

 This pattern should include zero_extend from operand 0. Please fix the
 tablejump expander to generate correct pattern.

 Also, indirect jump needs to generate zero_extend from SImode register for 
 x32.


 I am testing this patch on top of the last one. We don't need to zero-extend
 indirect jump since it takes operand 0 in Pmode, which is DImode.


Looks good to me, but please wait for rth's approval.

Thanks,
Uros.

[PATCH, i386]: Allow subregs of multi-word values in addresses

2011-07-20 Thread Uros Bizjak

On Wed, Jul 20, 2011 at 9:46 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Note that SUBREG_PROMOTED_UNSIGNED_P wasn't designed for paradoxical subregs,
 but for regular subregs (typically of word-sized objects).  You should check
 that the ones created for x32 (because of POINTERS_EXTEND_UNSIGNED I guess)
 are legitimate.

I have left out the paradoxical subreg stuff ATM and committed the
following patch that allows subregs of multi-word values in addresses.

2011-07-20  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.c (ix86_decompose_address): Allow only subregs
of DImode hard registers in index.
(ix86_legitimate_address_p): Allow subregs of base and index to span
more than a word.  Assert that subregs of base and index satisfy
register_no_elim_operand predicates.  Reject addresses where
base and index have different modes.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32} and committed to mainline SVN.

(I will prepare a followup [RFC] patch that also allows paradoxical
(?) subregs for experimenting and testing on x32 target).
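
For readers who want the resulting mode rules in one place, here is a toy restatement in plain C. The toy_* types are hypothetical stand-ins, not GCC's rtx or enum machine_mode. The sketch covers only the mode checks (SImode or DImode, and the same mode for base and index), not the subreg or hard-register checks:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for machine modes; not GCC's definitions.  */
enum toy_mmode { TOY_QImode, TOY_SImode, TOY_DImode, TOY_SFmode };

struct toy_reg { enum toy_mmode mode; };

/* base and index may be NULL when the address does not use them.  */
struct toy_address
{
  const struct toy_reg *base;
  const struct toy_reg *index;
};

/* Only integer registers of SImode or DImode may appear in an address.  */
static bool
toy_mode_ok (enum toy_mmode m)
{
  return m == TOY_SImode || m == TOY_DImode;
}

bool
toy_legitimate_address_p (const struct toy_address *a)
{
  if (a->base && !toy_mode_ok (a->base->mode))
    return false;
  if (a->index && !toy_mode_ok (a->index->mode))
    return false;
  /* Index and base should have the same mode.  */
  if (a->base && a->index && a->base->mode != a->index->mode)
    return false;
  return true;
}
```

So a DImode base with a DImode index passes, a DImode base with an SImode index is rejected by the new same-mode check, and QImode or SFmode registers are rejected outright.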

Uros.
Index: i386.c
===
--- i386.c  (revision 176533)
+++ i386.c  (working copy)
@@ -11197,6 +11197,16 @@ ix86_decompose_address (rtx addr, struct
   else
 disp = addr;   /* displacement */
 
+  if (index)
+{
+  if (REG_P (index))
+   ;
+  /* Allow only subregs of DImode hard regs.  */
+  else if (GET_CODE (index) == SUBREG
+   && !register_no_elim_operand (SUBREG_REG (index), DImode))
+   return 0;
+}
+
   /* Extract the integral value of scale.  */
   if (scale_rtx)
 {
@@ -11630,23 +11640,18 @@ ix86_legitimate_address_p (enum machine_
   disp = parts.disp;
   scale = parts.scale;
 
-  /* Validate base register.
-
- Don't allow SUBREG's that span more than a word here.  It can lead to spill
- failures when the base is one word out of a two word structure, which is
- represented internally as a DImode int.  */
-
+  /* Validate base register.  */
   if (base)
 {
   rtx reg;
 
   if (REG_P (base))
reg = base;
-  else if (GET_CODE (base) == SUBREG
-   && REG_P (SUBREG_REG (base))
-   && GET_MODE_SIZE (GET_MODE (SUBREG_REG (base)))
- <= UNITS_PER_WORD)
-   reg = SUBREG_REG (base);
+  else if (GET_CODE (base) == SUBREG && REG_P (SUBREG_REG (base)))
+   {
+ reg = SUBREG_REG (base);
+ gcc_assert (register_no_elim_operand (reg, DImode));
+   }
   else
/* Base is not a register.  */
return false;
@@ -11660,21 +11665,18 @@ ix86_legitimate_address_p (enum machine_
return false;
 }
 
-  /* Validate index register.
-
- Don't allow SUBREG's that span more than a word here -- same as above.  */
-
+  /* Validate index register.  */
   if (index)
 {
   rtx reg;
 
   if (REG_P (index))
reg = index;
-  else if (GET_CODE (index) == SUBREG
-   && REG_P (SUBREG_REG (index))
-   && GET_MODE_SIZE (GET_MODE (SUBREG_REG (index)))
- <= UNITS_PER_WORD)
-   reg = SUBREG_REG (index);
+  else if (GET_CODE (index) == SUBREG && REG_P (SUBREG_REG (index)))
+   {
+ reg = SUBREG_REG (index);
+ gcc_assert (register_no_elim_operand (reg, DImode));
+   }
   else
/* Index is not a register.  */
return false;
@@ -11688,6 +11690,11 @@ ix86_legitimate_address_p (enum machine_
return false;
 }
 
+  /* Index and base should have the same mode.  */
+  if (base && index
+   && GET_MODE (base) != GET_MODE (index))
+return false;
+
   /* Validate scale factor.  */
   if (scale != 1)
 {

Re: PATCH [8/n] X32: Convert to Pmode if needed

2011-07-21 Thread Uros Bizjak

On Tue, Jul 19, 2011 at 6:47 AM, H.J. Lu hongjiu...@intel.com wrote:

So, since copy_to_reg & co. expects x in Pmode or VOIDmode constant
(due to force_reg that won't do mode conversion), we have to implement
them with a mode conversion...

 This patch adds the missing Pmode check and conversion.  OK for trunk?

 2011-07-18  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.c (ix86_legitimize_address): Convert to
        Pmode if needed.
        (ix86_expand_move): Likewise.
        (ix86_expand_call): Likewise.
        (ix86_expand_special_args_builtin): Likewise.
        (ix86_expand_builtin): Likewise.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index c268899..1ed451b 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -12618,7 +12667,11 @@ ix86_legitimize_address (rtx x, rtx oldx 
 ATTRIBUTE_UNUSED,
          rtx temp = gen_reg_rtx (Pmode);
          rtx val  = force_operand (XEXP (x, 1), temp);
          if (val != temp)
 -           emit_move_insn (temp, val);
 +           {
 +             if (GET_MODE (val) != Pmode)
 +               val = convert_to_mode (Pmode, val, 1);
 +             emit_move_insn (temp, val);
 +           }

          XEXP (x, 1) = temp;
          return x;

OK.

 @@ -12629,7 +12682,11 @@ ix86_legitimize_address (rtx x, rtx oldx 
 ATTRIBUTE_UNUSED,
          rtx temp = gen_reg_rtx (Pmode);
          rtx val  = force_operand (XEXP (x, 0), temp);
          if (val != temp)
 -           emit_move_insn (temp, val);
 +           {
 +             if (GET_MODE (val) != Pmode)
 +               val = convert_to_mode (Pmode, val, 1);
 +             emit_move_insn (temp, val);
 +           }

OK.

 @@ -14956,6 +15023,8 @@ ix86_expand_move (enum machine_mode mode, rtx 
 operands[])
       if (model)
        {
          op1 = legitimize_tls_address (op1, model, true);
 +         if (GET_MODE (op1) != mode)
 +           op1 = convert_to_mode (mode, op1, 1);
          op1 = force_operand (op1, op0);
          if (op1 == op0)
            return;

Please write this part in the same form as above two changes. This
way, force_operand will emit instructions in narrower mode (i.e.
SImode, not in DImode).

 @@ -21475,7 +21554,10 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx 
 callarg1,
           ? !sibcall_insn_operand (XEXP (fnaddr, 0), Pmode)
           : !call_insn_operand (XEXP (fnaddr, 0), Pmode))
     {
 -      fnaddr = copy_to_mode_reg (Pmode, XEXP (fnaddr, 0));
 +      fnaddr = XEXP (fnaddr, 0);
 +      if (GET_MODE (fnaddr) != Pmode)
 +       fnaddr = convert_to_mode (Pmode, fnaddr, 1);
 +      fnaddr = copy_to_mode_reg (Pmode, fnaddr);
       fnaddr = gen_rtx_MEM (QImode, fnaddr);
     }


Use force_reg (Pmode, ...) instead of copy_to_mode_reg (Pmode, ...).
We know we have Pmode here. No need to copy a register if
convert_to_mode returned a register.

Better yet:

fnaddr = gen_rtx_MEM (QImode, force_reg (Pmode, fnaddr));

 @@ -26700,7 +26782,11 @@ ix86_expand_special_args_builtin (const struct 
 builtin_description *d,
       op = expand_normal (arg);
       gcc_assert (target == 0);
       if (memory)
 -       target = gen_rtx_MEM (tmode, copy_to_mode_reg (Pmode, op));
 +       {
 +         if (GET_MODE (op) != Pmode)
 +           op = convert_to_mode (Pmode, op, 1);
 +         target = gen_rtx_MEM (tmode, copy_to_mode_reg (Pmode, op));
 +       }
       else
        target = force_reg (tmode, op);
       arg_adjust = 1;

Use force_reg.

 @@ -26743,6 +26829,8 @@ ix86_expand_special_args_builtin (const struct 
 builtin_description *d,
          if (i == memory)
            {
              /* This must be the memory operand.  */
 +             if (GET_MODE (op) != Pmode)
 +               op = convert_to_mode (Pmode, op, 1);
              op = gen_rtx_MEM (mode, copy_to_mode_reg (Pmode, op));
              gcc_assert (GET_MODE (op) == mode
                          || GET_MODE (op) == VOIDmode);

Same here.

 @@ -26969,6 +27057,8 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
 subtarget ATTRIBUTE_UNUSED,
       mode1 = insn_data[icode].operand[1].mode;
       mode2 = insn_data[icode].operand[2].mode;

 +      if (GET_MODE (op0) != Pmode)
 +       op0 = convert_to_mode (Pmode, op0, 1);
       op0 = force_reg (Pmode, op0);
       op0 = gen_rtx_MEM (mode1, op0);


op0 = gen_rtx_MEM (mode1, force_reg (Pmode, op0));

 @@ -27001,7 +27091,11 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
 subtarget ATTRIBUTE_UNUSED,
        op0 = expand_normal (arg0);
        icode = CODE_FOR_sse2_clflush;
        if (!insn_data[icode].operand[0].predicate (op0, Pmode))
 +         {
 +           if (GET_MODE (op0) != Pmode)
 +             op0 = convert_to_mode (Pmode, op0, 1);
            op0 = copy_to_mode_reg (Pmode, op0);
 +         }

        emit_insn (gen_sse2_clflush (op0));
        return 0;

Use force_reg.

 @@ -27014,7 +27108,11 @@ ix86_expand_builtin (tree exp, rtx target, rtx 
 subtarget ATTRIBUTE_UNUSED,
       op1 = expand_normal (arg1);

[PATCH, i386]: Reject wrong RTXes from index early

2011-07-21 Thread Uros Bizjak

Hello!

Just a small optimization, we can reject non-register RTXes and wrong
subregs from index early.  No functional change - these RTXes were
rejected in ix86_legitimate_address_p anyway.

2011-07-21  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.c (ix86_decompose_address): Reject all but
register operands and DImode hard registers in index.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.

Index: i386.c
===
--- i386.c  (revision 176550)
+++ i386.c  (working copy)
@@ -11203,7 +11203,9 @@
;
   /* Allow only subregs of DImode hard regs.  */
   else if (GET_CODE (index) == SUBREG
-   && !register_no_elim_operand (SUBREG_REG (index), DImode))
+   && register_no_elim_operand (SUBREG_REG (index), DImode))
+   ;
+  else
return 0;
 }

[PATCH, testsuite]: Introduce check_avx_os_support_available

2011-07-21 Thread Uros Bizjak

Hello!

This is the same functionality as recently added to glibc [1].

2011-07-21  Uros Bizjak  ubiz...@gmail.com

* lib/target-supports.exp (check_avx_os_support_available): New.
(check_effective_target_avx_runtime): Use it.

Tested on x86_64-pc-linux-gnu {,-m32} AVX and non-AVX targets,
committed to mainline SVN.
The patch will be backported to release branches.

[1] 
http://sourceware.org/git/?p=glibc.git;a=commitdiff;h=5644ef5461b5d3ff266206d8ee70d4b575ea6658
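
For reference, the bit test behind the new check can be sketched as a pure C function. This is only an illustration: the toy_* names are made up here, and the function takes the low 32 bits of XCR0 as an argument instead of executing xgetbv, so it can be run on any machine. It assumes the documented XCR0 layout, where bit 1 covers SSE (XMM) state and bit 2 covers AVX (YMM) state:

```c
#include <stdint.h>

/* XCR0 state-component bits; the macro names are local to this sketch.  */
#define TOY_XCR0_SSE (1u << 1)
#define TOY_XCR0_AVX (1u << 2)

/* Return 1 if the OS has enabled saving of both XMM and YMM state,
   given the low 32 bits of XCR0 (what xgetbv with ecx = 0 leaves in
   eax).  The mask is the value 6 used in the testsuite check.  */
int
toy_os_saves_avx_state (uint32_t xcr0_low)
{
  const uint32_t need = TOY_XCR0_SSE | TOY_XCR0_AVX;

  return (xcr0_low & need) == need;
}
```

Note that the test program in the patch returns the inverted condition, so that an exit status of 0 means "supported".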

Uros.
Index: lib/target-supports.exp
===
--- lib/target-supports.exp (revision 176571)
+++ lib/target-supports.exp (working copy)
@@ -1070,8 +1070,8 @@
check_runtime_nocache sse_os_support_available {
int main ()
{
-   __asm__ volatile ("movaps %xmm0,%xmm0");
-   return 0;
+ asm volatile ("movaps %xmm0,%xmm0");
+ return 0;
}
} -msse
} else {
@@ -1080,6 +1080,29 @@
 }]
 }
 
+# Return 1 if the target OS supports running AVX executables, 0
+# otherwise.  Cache the result.
+
+proc check_avx_os_support_available { } {
+return [check_cached_effective_target avx_os_support_available {
+   # If this is not the right target then we can skip the test.
+   if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } {
+   expr 0
+   } else {
+   # Check that OS has AVX and SSE saving enabled.
+   check_runtime_nocache avx_os_support_available {
+   int main ()
+   {
+ unsigned int eax, edx;
+
+ asm ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
+ return (eax & 6) != 6;
+   }
+   } ""
+   }
+}]
+}
+
 # Return 1 if the target supports executing SSE instructions, 0
 # otherwise.  Cache the result.
 
@@ -1176,7 +1199,8 @@
 
 proc check_effective_target_avx_runtime { } {
 if { [check_effective_target_avx]
- && [check_avx_hw_available] } {
+ && [check_avx_hw_available]
+ && [check_avx_os_support_available] } {
return 1
 }
 return 0

Re: PATCH [9/n] X32: PR target/49798: Zero-extend symbol address to 64bit if needed

2011-07-21 Thread Uros Bizjak

On Thu, Jul 21, 2011 at 6:28 PM, H.J. Lu hjl.to...@gmail.com wrote:

 .quad  symbol isn't really valid for 32bit.

 Why not?  We certainly know what value to put there.


 x32 doesn't support 64bit relocation, like R_X86_64_64.
 In many cases, generating

 .long symbol
 .long 0

 for .quad symbol is wrong. Please see:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47446

 for some examples.

Please also see:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49798#c12

on why I think this is middle-end/tree-optimization issue.

Uros.

Re: PATCH [9/n] X32: PR target/49798: Zero-extend symbol address to 64bit if needed

2011-07-21 Thread Uros Bizjak

On Thu, Jul 21, 2011 at 7:24 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Jul 21, 2011 at 10:04 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Thu, Jul 21, 2011 at 6:28 PM, H.J. Lu hjl.to...@gmail.com wrote:

 .quad  symbol isn't really valid for 32bit.

 Why not?  We certainly know what value to put there.


 x32 doesn't support 64bit relocation, like R_X86_64_64.
 In many cases, generating

 .long symbol
 .long 0

 for .quad symbol is wrong. Please see:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47446

 for some examples.

 Please also see:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49798#c12

 on why I think this is middle-end/tree-optimization issue.


 I still think it is a backend issue.

/* Represents viewing something of one type as being of a second type.
   This corresponds to an Unchecked Conversion in Ada and roughly to
   the idiom *(type2 *)X in C.  The only operand is the value to be
   viewed as being of another type.  **It is undefined if the type of the
   input and of the expression have different sizes.**

   ...
DEFTREECODE (VIEW_CONVERT_EXPR, view_convert_expr, tcc_reference, 1)

We have:

bb 2:
  D.2709_8 = VIEW_CONVERT_EXPRdouble();
  D.2702_1 = u.d;
  D.2704_3 = D.2702_1 == D.2709_8;
  D.2701_4 = (int) D.2704_3;
  return D.2701_4;

Where

sizeof (double) = 64
sizeof (ptr_type) = 32.

Uros.

Re: PATCH [9/n] X32: PR target/49798: Zero-extend symbol address to 64bit if needed

2011-07-21 Thread Uros Bizjak

On Thu, Jul 21, 2011 at 6:42 PM, Richard Henderson r...@redhat.com wrote:
 On 07/21/2011 09:28 AM, H.J. Lu wrote:
 On Thu, Jul 21, 2011 at 9:23 AM, Richard Henderson r...@redhat.com wrote:
 On 07/21/2011 09:20 AM, H.J. Lu wrote:
 .quad  symbol isn't really valid for 32bit.

 Why not?  We certainly know what value to put there.


 x32 doesn't support 64bit relocation, like R_X86_64_64.

 This being a self-fulfilling assertion, because you decided
 to disable that relocation.  It *could* be supported.  Easily.

IMO, it is OK to disable 64bit relocations, and the compiler is at
fault here. Consider that something gets written to the d field (see
the example in PR49798). Reading a pointer from the *m field in DImode,
we will get non-zero bits in the high 32 bits of the pointer. We have
to access the pointer in SImode.
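
A small stand-alone sketch of the problem, outside of GCC: an 8-byte slot that last held a double is read back once as a 64-bit integer and once as a 32-bit integer that is then zero-extended. The function names are hypothetical, and the sketch assumes an 8-byte IEEE-754 double; memcpy is used so the type punning is well defined.

```c
#include <stdint.h>
#include <string.h>

/* Return the upper 32 bits seen when a slot that last held a double
   is read back as one 64-bit integer (the DImode-style access).  */
uint32_t
toy_high_bits_after_double (double d)
{
  unsigned char slot[8];
  uint64_t wide;

  memcpy (slot, &d, sizeof d);        /* Store through the d field.  */
  memcpy (&wide, slot, sizeof wide);  /* Read all 8 bytes back.  */
  return (uint32_t) (wide >> 32);
}

/* Read only 4 bytes of the slot and zero-extend to 64 bits (the
   SImode-style access followed by zero_extend).  */
uint64_t
toy_read_si_zero_extend (const unsigned char *slot)
{
  uint32_t narrow;

  memcpy (&narrow, slot, sizeof narrow);
  return (uint64_t) narrow;
}
```

For d = 1.0 the wide read sees non-zero upper bits, which would be garbage in a 32-bit pointer that was widened to 64 bits. The narrow read followed by zero-extension always has a clear upper half.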

Uros.

Re: PATCH [9/n] X32: PR target/49798: Zero-extend symbol address to 64bit if needed

2011-07-21 Thread Uros Bizjak

On Thu, Jul 21, 2011 at 10:00 PM, H.J. Lu hjl.to...@gmail.com wrote:

 /* Represents viewing something of one type as being of a second type.
   This corresponds to an Unchecked Conversion in Ada and roughly to
   the idiom *(type2 *)X in C.  The only operand is the value to be
   viewed as being of another type.  **It is undefined if the type of the
   input and of the expression have different sizes.**

   ...
 DEFTREECODE (VIEW_CONVERT_EXPR, view_convert_expr, tcc_reference, 1)

 We have:

 bb 2:
  D.2709_8 = VIEW_CONVERT_EXPRdouble();
  D.2702_1 = u.d;
  D.2704_3 = D.2702_1 == D.2709_8;
  D.2701_4 = (int) D.2704_3;
  return D.2701_4;

 Where

 sizeof (double) = 64
 sizeof (ptr_type) = 32.


 Are you sure that you used -mx32?  I couldn't reproduce it.
 It looks like an x86 backend bug to me.

Hm, can't reproduce it anymore... x32 -O2 looks OK:

bb 2:
  v = {};
  v.m = ;
  D.2702_1 = u.d;
  D.2703_2 = v.d;
  D.2704_3 = D.2702_1 == D.2703_2;
  D.2701_4 = (int) D.2704_3;
  return D.2701_4;

}

Expand generates:

(insn 8 6 9 (set (reg:SI 68)
(symbol_ref:SI () [flags 0x40]  var_decl 0x7fccc360b140 )) p
r49798.c:12 -1
 (nil))

(insn 9 8 10 (set (reg:DI 67)
(zero_extend:DI (reg:SI 68))) pr49798.c:12 -1
 (nil))

I don't know if this is OK to be transformed to DImode load.

Uros.

Re: PATCH [9/n] X32: PR target/49798: Zero-extend symbol address to 64bit if needed

2011-07-21 Thread Uros Bizjak

On Thu, Jul 21, 2011 at 10:22 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Expand generates:

 (insn 8 6 9 (set (reg:SI 68)
        (symbol_ref:SI () [flags 0x40]  var_decl 0x7fccc360b140 
 )) p
 r49798.c:12 -1
     (nil))

 (insn 9 8 10 (set (reg:DI 67)
        (zero_extend:DI (reg:SI 68))) pr49798.c:12 -1
     (nil))

 I don't know if this is OK to be transformed to DImode load.


 I believe it is valid.

How is this situation handled in other targets? I don't see any of
the other ptr_mode != Pmode targets defining TARGET_ASM_INTEGER in the
way you propose.

Uros.

[PATCH, testsuite]: Fix detection of ifunc support

2011-07-21 Thread Uros Bizjak

Hello!

Revision 164725 [1] broke detection of ifunc support in the testsuite
[2] due to extra #endif without if in the test function. Attached
patch fixes this up.

2011-07-21  Uros Bizjak  ubiz...@gmail.com

* lib/target-supports.exp (check_ifunc_available): Fix test function.

The patch is tested on x86_64-pc-linux-gnu, but my toolchain does not
support ifunc attribute. Can somebody please test it with ifunc
support?

OTOH, the patch is kind of obvious, so OK for mainline?

[1] http://gcc.gnu.org/viewcvs?view=revision&revision=164725
[2] 
http://gcc.gnu.org/viewcvs/trunk/gcc/testsuite/lib/target-supports.exp?r1=164725&r2=164724&pathrev=164725

Uros.

Index: lib/target-supports.exp
===
--- lib/target-supports.exp (revision 176584)
+++ lib/target-supports.exp (working copy)
@@ -381,10 +381,8 @@
set obj ifunc[pid].o
 verbose check_ifunc_available  compiling testfile $src 2
set f [open $src w]
-   puts $f #endif
puts $f #ifdef __cplusplus\nextern \C\\n#endif
-   puts $f void g() {}
-   puts $f void f() __attribute__((ifunc(\g\)));
+   puts $f void g() {} f() __attribute__((ifunc(\g\)));
close $f
set lines [${tool}_target_compile $src $obj object ]
file delete $src

Re: [PATCH, testsuite]: Fix detection of ifunc support

2011-07-21 Thread Uros Bizjak

On Thu, Jul 21, 2011 at 11:56 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Revision 164725 [1] broke detection of ifunc support in the testsuite
 [2] due to extra #endif without if in the test function. Attached
 patch fixes this up.

Actually, we can use existing testsuite infrastructure to simplify the
function substantially.

2011-07-21  Uros Bizjak  ubiz...@gmail.com

        * lib/target-supports.exp (check_ifunc_available): Rewrite.

The patch is tested on x86_64-pc-linux-gnu, but my toolchain does not
support ifunc attribute. Can somebody please test it with ifunc
support?

OK for mainline and 4.6 ?

Uros.
Index: lib/target-supports.exp
===
--- lib/target-supports.exp (revision 176584)
+++ lib/target-supports.exp (working copy)
@@ -361,45 +361,16 @@
 return $alias_available_saved
 }
 
-###
-# proc check_ifunc_available { }
-###
+# Returns 1 if the target supports ifunc, 0 otherwise.
 
-# Determine if the target toolchain supports the ifunc attribute.
-
-# Returns 1 if the target supports ifunc.  Returns 0 if the target
-# does not support ifunc.
-
 proc check_ifunc_available { } {
-global ifunc_available_saved
-global tool
-
-if [info exists ifunc_available_saved] {
-verbose check_ifunc_available  returning saved 
$ifunc_available_saved 2
-} else {
-   set src ifunc[pid].c
-   set obj ifunc[pid].o
-verbose check_ifunc_available  compiling testfile $src 2
-   set f [open $src w]
-   puts $f #endif
-   puts $f #ifdef __cplusplus\nextern \C\\n#endif
-   puts $f void g() {}
-   puts $f void f() __attribute__((ifunc(\g\)));
-   close $f
-   set lines [${tool}_target_compile $src $obj object ]
-   file delete $src
-   remote_file build delete $obj
-
-   if [string match  $lines] then {
-   set ifunc_available_saved 1
-   } else {
-   set ifunc_available_saved 0
-   }
-
-   verbose check_ifunc_available  returning $ifunc_available_saved 2
-}
-
-return $ifunc_available_saved
+return [check_no_compiler_messages ifunc_available object {
+   #ifdef __cplusplus
+   extern C
+   #endif
+   void g() {}
+   f() __attribute__((ifunc(g)));
+}]
 }
 
 # Returns true if --gc-sections is supported on the target.

[PATCH, build]: Enable default_gnu_indirect_function on x86_64--linux

2011-07-22 Thread Uros Bizjak

Hello!

Fixing ifunc test function in the testsuite uncovered a nasty screwup
in config.gcc that prohibited usage of GNU indirect functions on
x86_64-*-linux*. Fixed by mirroring i[34567]86-*-linux* setting.

2011-07-22  Uros Bizjak  ubiz...@gmail.com

* config.gcc (i[34567]86-*-linux*): Set
default_gnu_indirect_function to yes.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
Committed to mainline, will commit to 4.6 after regression tests
finish there.

Uros.
Index: config.gcc
===
--- config.gcc  (revision 176624)
+++ config.gcc  (working copy)
@@ -1327,8 +1327,10 @@
 i386/x86-64.h i386/gnu-user64.h
case ${target} in
x86_64-*-linux*)
- tm_file="${tm_file} linux.h i386/linux64.h"
- default_gnu_indirect_function=glibc-2011 ;;
+   tm_file="${tm_file} linux.h i386/linux64.h"
+   # Assume modern glibc
+   default_gnu_indirect_function=yes
+   ;;
x86_64-*-kfreebsd*-gnu) tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h" ;;
x86_64-*-knetbsd*-gnu) tm_file="${tm_file} knetbsd-gnu.h" ;;
esac

Re: [PATCH, build]: Enable default_gnu_indirect_function on x86_64--linux

2011-07-22 Thread Uros Bizjak

On Fri, Jul 22, 2011 at 5:27 PM, Uros Bizjak ubiz...@gmail.com wrote:

 Fixing ifunc test function in the testsuite uncovered a nasty screwup
 in config.gcc that prohibited usage of GNU indirect functions on
 x86_64-*-linux*. Fixed by mirroring i[34567]86-*-linux* setting.

 2011-07-22  Uros Bizjak  ubiz...@gmail.com

        * config.gcc (i[34567]86-*-linux*): Set
        default_gnu_indirect_function to yes.

(x86_64-*-linux*) in fact. Fixed typo in ChangeLog.

Uros.

Re: [PATCH, build]: Enable default_gnu_indirect_function on x86_64--linux

2011-07-22 Thread Uros Bizjak

On Fri, Jul 22, 2011 at 5:38 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Fixing ifunc test function in the testsuite uncovered a nasty screwup
 in config.gcc that prohibited usage of GNU indirect functions on
 x86_64-*-linux*. Fixed by mirroring i[34567]86-*-linux* setting.

 2011-07-22  Uros Bizjak  ubiz...@gmail.com

        * config.gcc (i[34567]86-*-linux*): Set
        default_gnu_indirect_function to yes.

 (x86_64-*-linux*) in fact. Fixed typo in ChangeLog.


 Can we also enable it for Linux/iX86?

It is already enabled for this target.

Uros.

[PATCH, libstdc++]: Backport PR libstdc++/49293 fix to 4.6 branch

2011-07-22 Thread Uros Bizjak

Hello!

This patch backports the fix to the testcase for newer glibcs to 4.6 branch.

2011-07-22  Uros Bizjak  ubiz...@gmail.com

Backport from mainline
2011-06-07  Paolo Carlini  paolo.carl...@oracle.com

PR libstdc++/49293
* testsuite/22_locale/time_get/get_weekday/char/38081-1.cc: Tweak
for glibc 2.14.
* testsuite/22_locale/time_get/get_weekday/char/38081-2.cc: Likewise.
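
A tweak "for glibc 2.14" of this kind is normally expressed as a version guard on glibc's __GLIBC__ and __GLIBC_MINOR__ macros. The comparison such a guard performs can be sketched as an ordinary C function; the name version_at_least and its parameters are made up for illustration and are not part of the patch.

```c
#include <stdbool.h>

/* True when version (major, minor) is at least (want_major, want_minor).
   This is the same comparison a preprocessor guard on __GLIBC__ and
   __GLIBC_MINOR__ would perform. */
bool version_at_least(int major, int minor, int want_major, int want_minor)
{
    return major > want_major
           || (major == want_major && minor >= want_minor);
}
```

With this helper, version_at_least(2, 13, 2, 14) is false while version_at_least(2, 14, 2, 14) and version_at_least(3, 0, 2, 14) are true, so the major-version test has to come first.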

Tested on x86_64-pc-linux-gnu on Fedora 15.

OK for 4.6 branch?

Uros.
Index: testsuite/22_locale/time_get/get_weekday/char/38081-1.cc
===
--- testsuite/22_locale/time_get/get_weekday/char/38081-1.cc (revision 176630)
+++ testsuite/22_locale/time_get/get_weekday/char/38081-1.cc (working copy)
@@ -1,6 +1,6 @@
 // { dg-require-namedlocale ru_RU.ISO-8859-5 }
 
-// Copyright (C) 2010 Free Software Foundation
+// Copyright (C) 2010, 2011 Free Software Foundation
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -49,7 +49,11 @@
   // get_weekday(iter_type, iter_type, ios_base, 
   // ios_base::iostate, tm*) const
 
+#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 14)
+  iss.str("\xbf\xdd\x2e");
+#else
   iss.str("\xbf\xdd\xd4");
+#endif
   iterator_type is_it01(iss);
   tm time01;
   memset(&time01, -1, sizeof(tm));
@@ -67,7 +71,11 @@
   VERIFY( time02.tm_wday == 1 );
   VERIFY( errorstate == ios_base::eofbit );
 
+#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 14)
+  iss.str("\xbf\xdd\x2e\xd5\xd4\xd5\xdb\xec\xdd\xd8\xda");
+#else
   iss.str("\xbf\xdd\xd4\xd5\xd4\xd5\xdb\xec\xdd\xd8\xda");
+#endif
   iterator_type is_it03(iss);
   tm time03;
   memset(&time03, -1, sizeof(tm));
Index: testsuite/22_locale/time_get/get_weekday/char/38081-2.cc
===
--- testsuite/22_locale/time_get/get_weekday/char/38081-2.cc (revision 176630)
+++ testsuite/22_locale/time_get/get_weekday/char/38081-2.cc (working copy)
@@ -2,7 +2,7 @@
 
 // 2010-01-05  Paolo Carlini  paolo.carl...@oracle.com
 
-// Copyright (C) 2010 Free Software Foundation
+// Copyright (C) 2010, 2011 Free Software Foundation
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -50,6 +50,15 @@
   // get_weekday(iter_type, iter_type, ios_base, 
   // ios_base::iostate, tm*) const
 
+#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 14)
+  const char* awdays[7] = { "\u0412\u0441\u002E",
+                            "\u041F\u043D\u002E",
+                            "\u0412\u0442\u002E",
+                            "\u0421\u0440\u002E",
+                            "\u0427\u0442\u002E",
+                            "\u041F\u0442\u002E",
+                            "\u0421\u0431\u002E" };
+#else
   const char* awdays[7] = { "\u0412\u0441\u043A",
                             "\u041F\u043D\u0434",
                             "\u0412\u0442\u0440",
@@ -57,6 +66,7 @@
                             "\u0427\u0442\u0432",
                             "\u041F\u0442\u043D",
                             "\u0421\u0431\u0442" };
+#endif
 
   for (int i = 0; i < 7; ++i)
 {
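
The byte strings in 38081-1.cc and the \u escapes in 38081-2.cc appear to spell the same weekday abbreviations in two encodings: ISO-8859-5 bytes in the first file, Unicode code points in the second. The correspondence can be checked with a small sketch. The function name iso8859_5_to_unicode is hypothetical, and the mapping deliberately covers only ASCII and the basic Cyrillic letters.

```c
#include <stdint.h>

/* Map one ISO-8859-5 byte to a Unicode code point.
   Covers ASCII and the basic Cyrillic letters (0xB0..0xEF) only;
   returns U+FFFD (replacement character) for anything else. */
uint32_t iso8859_5_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;                                 /* ASCII is unchanged */
    if (b >= 0xB0 && b <= 0xEF)
        return 0x0410u + (uint32_t)(b - 0xB0);    /* U+0410..U+044F */
    return 0xFFFDu;
}
```

Under this mapping the old string \xbf\xdd\xd4 decodes to U+041F U+043D U+0434, and the glibc 2.14 variant \xbf\xdd\x2e decodes to U+041F U+043D U+002E, that is, a two-letter abbreviation followed by a period.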

[PATCH, i386]: Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

2011-07-24 Thread Uros Bizjak

On Sat, Jul 23, 2011 at 3:57 PM, H.J. Lu hjl.to...@gmail.com wrote:

 This patch adds x32 LEA insn support.  The main issue is

 gen_lowpart (Pmode, operands[1]);

 doesn't work on symbol.  This patch avoids it.

 Also we shouldn't generate 32bit store with x32 PIC source.

 Any comments?

You are not fixing the core of the problem... this is why you need so
many hacks and kludges at various places (some w.r.t. -fPIC already
existed, see the patch). Above, you correctly identified the problem,
so let's avoid gen_lowpart on SImode operands by not calling it
anymore.

Attached patch effectively rewrites LEA handling. The trick is that
instead of using Pmode operations in addresses, we use either SImode
or DImode operations to calculate the address on 64bit targets. Up to
now, address calculations strictly used Pmode, so SImode on 32bit
targets and DImode on 64bit targets. Recent patches to
ix86_decompose_address and ix86_legitimate_address_p relaxed this
requirement.

Attached patch changes LEA patterns and LEA splitters to accept
addresses calculated with either SImode or DImode operations. This
means that on 64bit targets, we don't use gen_lowpart on SImode
operands anymore. Since symbol references on x32 are in SImode, this
solves the problem. The patch also avoids generating SImode subregs of
DImode addresses and DImode zero_extends of SImode addresses, since
LEA insn does this for us automatically.
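
To illustrate the last point, here is a rough C model of the difference between computing an address with 32-bit and with 64-bit arithmetic. It is not GCC code: address_si and address_di are hypothetical names, and the model only mirrors the x86-64 rule that writing a 32-bit register clears the upper 32 bits of the full register.

```c
#include <stdint.h>

/* Address computed with 32-bit (SImode-like) arithmetic: the sum wraps
   modulo 2^32 and the upper half of the 64-bit result is zero. */
uint64_t address_si(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    uint32_t sum = base + index * scale + (uint32_t)disp;
    return (uint64_t)sum;   /* implicit zero-extension to 64 bits */
}

/* Same computation with 64-bit (DImode-like) arithmetic: no wrap at 2^32. */
uint64_t address_di(uint64_t base, uint64_t index, uint64_t scale, int64_t disp)
{
    return base + index * scale + (uint64_t)disp;
}
```

For base 0xFFFFFFF0, index 4 and scale 8, the 32-bit form wraps to 0x10 while the 64-bit form gives 0x100000010. The zero-extended 32-bit result therefore needs no separate zero_extend or subreg step.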

Please also note the change to ix86_print_operand_address. To avoid
addr32 prefixes, we can force registers in DImode on 64bit targets
without any problems. On x32, we can investigate whether this change
avoids unnecessary LEAs (for PR 49781, patched gcc generates 6 vs. 8).
Also, we can investigate the effect of addr32 on benchmarks.

Patched gcc also fixes all testcases from PR 47381.

2011-07-24  Uros Bizjak  ubiz...@gmail.com

PR target/47381
* config/i386/i386.md (*lea_1): Use SWI48 mode iterator.
(*lea_1_zext): New insn pattern.
(add-lea splitter): Check operand modes in insn constraint.  Extend
operands less than SImode wide to SImode.
(add-lea zext splitter): Do not extend operands to DImode.
(*lea_general_1): Handle only QImode and HImode operands.
(*lea_general_2): Ditto.
(*lea_general_3): Ditto.
(*lea_general_1_zext): Remove.
(*lea_general_2_zext): Ditto.
(*lea_general_3_zext): Ditto.
(*lea_general_4): Check operand modes in insn constraint.  Extend
operands less than SImode wide to SImode.
(ashift-lea splitter): Ditto.
* config/i386/i386.c (ix86_print_operand_address): Print address
registers with 'q' modifier on 64bit targets.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32} with no regressions. H.J., can you please test it on x32?

BTW: -fPIC is not yet implemented on trunk and still fails there with
an (unrelated) error, I didn't check x32 branch.

Uros.
Index: i386.md
===
--- i386.md (revision 176713)
+++ i386.md (working copy)
@@ -5425,13 +5425,22 @@
   (set_attr "mode" "QI")])
 
 (define_insn "*lea_1"
-  [(set (match_operand:P 0 "register_operand" "=r")
-   (match_operand:P 1 "no_seg_address_operand" "p"))]
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (match_operand:SWI48 1 "no_seg_address_operand" "p"))]
   ""
   "lea{<imodesuffix>}\t{%a1, %0|%0, %a1}"
   [(set_attr "type" "lea")
    (set_attr "mode" "<MODE>")])
 
+(define_insn "*lea_1_zext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI
+ (match_operand:SI 1 "no_seg_address_operand" "p")))]
+  "TARGET_64BIT"
+  "lea{l}\t{%a1, %k0|%k0, %a1}"
+  [(set_attr "type" "lea")
+   (set_attr "mode" "SI")])
+
 (define_insn "*lea_2"
   [(set (match_operand:SI 0 "register_operand" "=r")
 (subreg:SI (match_operand:DI 1 "no_seg_address_operand" "p") 0))]
@@ -5794,39 +5803,36 @@
 (const_string "none")))
   (set_attr "mode" "QI")])
 
-;; Convert lea to the lea pattern to avoid flags dependency.
+;; Convert add to the lea pattern to avoid flags dependency.
 (define_split
   [(set (match_operand 0 "register_operand" "")
 (plus (match_operand 1 "register_operand" "")
   (match_operand 2 "nonmemory_operand" "")))
   (clobber (reg:CC FLAGS_REG))]
-  "reload_completed && ix86_lea_for_add_ok (insn, operands)"
+  "GET_MODE (operands[0]) == GET_MODE (operands[1])
+   && (GET_MODE (operands[0]) == GET_MODE (operands[2])
+       || GET_MODE (operands[2]) == VOIDmode)
+   && reload_completed && ix86_lea_for_add_ok (insn, operands)"
   [(const_int 0)]
 {
-  rtx pat;
   enum machine_mode mode = GET_MODE (operands[0]);
-
-  /* In -fPIC mode the constructs like (const (unspec [symbol_ref]))
- may confuse gen_lowpart.  */
-  if (mode != Pmode)
-{
-  operands[1] = gen_lowpart (Pmode, operands[1]);
-  operands[2] = gen_lowpart (Pmode, operands[2]);
-}
-
-  pat = gen_rtx_PLUS (Pmode, operands[1], operands[2]);
+  rtx pat

Re: [PATCH, i386]: Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

2011-07-25 Thread Uros Bizjak

On Mon, Jul 25, 2011 at 3:58 AM, H.J. Lu hjl.to...@gmail.com wrote:

 You are not fixing the core of the problem... this is why you need so
 many hacks and kludges at various places (some w.r.t. -fPIC already
 existed, see the patch). Above, you correctly identified the problem,
 so let's avoid gen_lowpart on SImode operands by not calling it
 anymore.

 Attached patch effectively rewrites LEA handling. The trick is that
 instead of using Pmode operations in addresses, we use either SImode
 or DImode operations to calculate the address on 64bit targets. Up to
 now, address calculations strictly used Pmode, so SImode on 32bit
 targets and DImode on 64bit targets. Recent patches to
 ix86_decompose_address and ix86_legitimate_address_p relaxed this
 requirement.

 Attached patch changes LEA patterns and LEA splitters to accept
 addresses calculated with either SImode or DImode operations. This
 means that on 64bit targets, we don't use gen_lowpart on SImode
 operands anymore. Since symbol references on x32 are in SImode, this
 solves the problem. The patch also avoids generating SImode subregs of
 DImode addresses and DImode zero_extends of SImode addresses, since
 LEA insn does this for us automatically.

 Please also note the change to ix86_print_operand_address. To avoid
 addr32 prefixes, we can force registers in DImode on 64bit targets
 without any problems. On x32, we can investigate whether this change
 avoids unnecessary LEAs (for PR 49781, patched gcc generates 6 vs. 8).

 The testcase won't compile since PIC doesn't work:

Well, I did say that -fPIC did not work.

 Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
 {,-m32} with no regressions. H.J., can you please test it on x32?

 On x32, it failed:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49832

 BTW: -fPIC is not yet implemented on trunk and still fails there with
 an (unrelated) error, I didn't check x32 branch.


 This could be:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49833

Attached patch implements -fpic handling for x32. In x32 mode, we now
use x86_64_general_operand and corresponding "e" constraints for adds
in SImode, since it looks like invalid addresses can only be generated
through adds. This avoids a whole bunch of new predicates and
constraints.

2011-07-25  Uros Bizjak  ubiz...@gmail.com

PR target/47381
PR target/49832
PR target/49833
* config/i386/i386.md (add_operand): New mode attribute.
(*movdi_internal_rex64): Remove mode from pic_32bit_operand check.
(*movsi_internal): Ditto.  Use e constraint in alternative 2.
(*lea_1): Use SWI48 mode iterator.
(*lea_1_zext): New insn pattern.
(add<mode>3): Use add_operand predicate for operand 2.
(*add<mode>_1): Use add_operand predicate for operand 2.  Use le
constraint for alternative 2.
(addsi_1_zext): Use addsi_operand predicate for operand 2.  Use le
constraint for alternative 2.
(add-lea splitter): Check operand modes in insn constraint.  Extend
operands less than SImode wide to SImode.
(add-lea zext splitter): Do not extend operands to DImode.
(*lea_general_1): Handle only QImode and HImode operands.
(*lea_general_2): Ditto.
(*lea_general_3): Ditto.
(*lea_general_1_zext): Remove.
(*lea_general_2_zext): Ditto.
(*lea_general_3_zext): Ditto.
(*lea_general_4): Check operand modes in insn constraint.  Extend
operands less than SImode wide to SImode.
(ashift-lea splitter): Ditto.
* config/i386/i386.c (ix86_print_operand_address): Print address
registers with 'q' modifier on 64bit targets.
* config/i386/predicates.md (pic_32bit_operand): Define as special
predicate.  Reject non-SI and non-DI modes.
(addsi_operand): New predicate.

Uros.
Index: i386.md
===
--- i386.md (revision 176733)
+++ i386.md (working copy)
@@ -901,6 +901,14 @@
 (SI "nonmemory_operand")
 (DI "x86_64_nonmemory_operand")])
 
+;; Operand predicate for adds.
+(define_mode_attr add_operand
+   [(QI "general_operand")
+(HI "general_operand")
+(SI "addsi_operand")
+(DI "x86_64_general_operand")
+(TI "x86_64_general_operand")])
+
 ;; Operand predicate for shifts.
 (define_mode_attr shift_operand
   [(QI "nonimmediate_operand")
@@ -2039,7 +2047,7 @@
  (const_string "ssemov")
(eq_attr "alternative" "16,17")
  (const_string "ssecvt")
-   (match_operand:DI 1 "pic_32bit_operand" "")
+   (match_operand 1 "pic_32bit_operand" "")
  (const_string "lea")
   ]
   (const_string "imov")))
@@ -2184,7 +2192,7 @@
   [(set (match_operand:SI 0 nonimmediate_operand
=r,m ,*y,*y,?rm,?*y,*x,*x,?r ,m ,?*Yi,*x)
(match_operand:SI 1 general_operand
-   g ,ri,C ,*y,*y ,rm ,C ,*x

Re: [PATCH, i386]: Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

2011-07-25 Thread Uros Bizjak

On Mon, Jul 25, 2011 at 3:30 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Attached patch implements -fpic handling for x32. In x32 mode, we now
 use x86_64_general_operand and corresponding e constraints for adds
 in SImode, since it looks that invalid addresses can only be generated
 through adds. This avoids a whole bunch of new predicates and
 constraints.

 X32 glibc is miscompiled:

 CPP='/export/build/gnu/gcc-x32/release/usr/gcc-4.7.0-x32/bin/gcc -mx32
  -E -x c-header'
 /export/build/gnu/glibc-x32/build-x86_64-linux/elf/ld-linux-x32.so.2
 --library-path 
 /export/build/gnu/glibc-x32/build-x86_64-linux:/export/build/gnu/glibc-x32/build-x86_64-linux/math:/export/build/gnu/glibc-x32/build-x86_64-linux/elf:/export/build/gnu/glibc-x32/build-x86_64-linux/dlfcn:/export/build/gnu/glibc-x32/build-x86_64-linux/nss:/export/build/gnu/glibc-x32/build-x86_64-linux/nis:/export/build/gnu/glibc-x32/build-x86_64-linux/rt:/export/build/gnu/glibc-x32/build-x86_64-linux/resolv:/export/build/gnu/glibc-x32/build-x86_64-linux/crypt:/export/build/gnu/glibc-x32/build-x86_64-linux/nptl
 /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcgen -Y
 ../scripts -h rpcsvc/yppasswd.x -o
 /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.T
 make[5]: *** 
 [/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.stmp]
 Segmentation fault (core dumped)

 Some LEA patterns are wrong for x32.  I will investigate.

What about x32 GCC testsuite?

Uros.

Re: [PATCH, i386, take 2]: Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

2011-07-25 Thread Uros Bizjak

On Mon, Jul 25, 2011 at 11:05 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Attached patch implements -fpic handling for x32. In x32 mode, we now
 use x86_64_general_operand and corresponding e constraints for adds
 in SImode, since it looks that invalid addresses can only be generated
 through adds. This avoids a whole bunch of new predicates and
 constraints.

 X32 glibc is miscompiled:

 CPP='/export/build/gnu/gcc-x32/release/usr/gcc-4.7.0-x32/bin/gcc -mx32
  -E -x c-header'
 /export/build/gnu/glibc-x32/build-x86_64-linux/elf/ld-linux-x32.so.2
 --library-path 
 /export/build/gnu/glibc-x32/build-x86_64-linux:/export/build/gnu/glibc-x32/build-x86_64-linux/math:/export/build/gnu/glibc-x32/build-x86_64-linux/elf:/export/build/gnu/glibc-x32/build-x86_64-linux/dlfcn:/export/build/gnu/glibc-x32/build-x86_64-linux/nss:/export/build/gnu/glibc-x32/build-x86_64-linux/nis:/export/build/gnu/glibc-x32/build-x86_64-linux/rt:/export/build/gnu/glibc-x32/build-x86_64-linux/resolv:/export/build/gnu/glibc-x32/build-x86_64-linux/crypt:/export/build/gnu/glibc-x32/build-x86_64-linux/nptl
 /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcgen -Y
 ../scripts -h rpcsvc/yppasswd.x -o
 /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.T
 make[5]: *** 
 [/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.stmp]
 Segmentation fault (core dumped)

 Some LEA patterns are wrong for x32.  I will investigate.

 We have to prevent symbols from entering general_operand predicated
 SImode operands. Fortunately, x86_64_general_operand works OK for
 x32, while both i686 and x86_64 are unaffected due to early bypass
 (i686) and due to the fact that all symbols are DImode (x86_64).

 2011-07-25  Uros Bizjak  ubiz...@gmail.com
            H.J. Lu  hongjiu...@intel.com

        PR target/47381
        PR target/49832
        PR target/49833
        * config/i386/i386.md (i): Change SImode attribute to e.
        (g): Change SImode attribute to rme.
        (di): Change SImode attribute to nF.
        (general_operand): Change SImode attribute to x86_64_general_operand.
        (general_szext_operand): Change SImode attribute to
        x86_64_szext_general_operand.
        (immediate_operand): Change SImode attribute to
        x86_64_immediate_operand.
        (*movdi_internal_rex64): Remove mode from pic_32bit_operand check.
        (*movsi_internal): Ditto.  Use e constraint in alternative 2.
        (*lea_1): Use SWI48 mode iterator.
        (*lea_1_zext): New insn pattern.
        (*addmode1): Use x86_64_general_operand predicate for operand 2.
        Update operand constraints.
        (addsi_1_zext): Ditto.
        (*addmode2): Ditto.
        (*addsi_3_zext): Ditto.
        (*subsi_1_zext): Ditto.
        (*subsi_2_zext): Ditto.
        (*subsi_3_zext): Ditto.
        (*addsi3_carry_zext): Ditto.
        (*plusminus_insnsi3_zext_cc_overflow): Ditto.
        (*mulsi3_1_zext): Ditto.
        (*andsi_1): Ditto.
        (*andsi_1_zext): Ditto.
        (*andsi_2_zext): Ditto.
        (*any_or:codesi_1_zext): Ditto.
        (*any_or:codesi_2_zext): Ditto.
        (*testmode_1): Use general_operand predicate for operand 1.
        (*andmode_2): Ditto.
        (add-lea splitter): Check operand modes in insn constraint.  Extend
        operands less than SImode wide to SImode.
        (add-lea zext splitter): Do not extend input operands to DImode.
        (*lea_general_1): Handle only QImode and HImode operands.
        (*lea_general_2): Ditto.
        (*lea_general_3): Ditto.
        (*lea_general_1_zext): Remove.
        (*lea_general_2_zext): Ditto.
        (*lea_general_3_zext): Ditto.
        (*lea_general_4): Check operand modes in insn constraint.  Extend
        operands less than SImode wide to SImode.
        (ashift-lea splitter): Ditto.
        * config/i386/i386.c (ix86_print_operand_address): Print address
        registers with 'q' modifier on 64bit targets.
        * config/i386/predicates.md (pic_32bit_operand): Define as special
        predicate.  Reject non-SI and non-DI modes.

 Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

 GCC and glibc testsuites are clean on x32.  Can you check it in?

I will do this tomorrow, in case anybody has comments on the patch.

Thanks,
Uros.

Re: PATCH: PR target/49853: [x32] PIC doesn't work with external symbol

2011-07-26 Thread Uros Bizjak

On Tue, Jul 26, 2011 at 4:59 PM, H.J. Lu hongjiu...@intel.com wrote:

 This patch fixes PIC with external symbol and updates
 x86_64_immediate_operand/x86_64_zext_immediate_operand/x86_64_movabs_operand
 for x32.

 2011-07-26  H.J. Lu  hongjiu...@intel.com

        PR target/49853
        * config/i386/i386.c (ix86_expand_move): Call convert_to_mode
        on legitimize_tls_address return if needed.  Allow ptr_mode for
        symbolic operand with PIC.

Eh... half of your patch is just an unnecessary rename of a temporary
variable. See attached patch for a cleaned-up version.
Also, please use explicit DImode and SImode checks to match what
ix86_legitimate_address_p does.

        * config/i386/predicates.md (x86_64_immediate_operand): Always
        allow the offsetted memory references for TARGET_X32.
        (x86_64_zext_immediate_operand): Likewise.
        (x86_64_movabs_operand): Don't allow nonmemory_operand for
        TARGET_X32.

Why? It is certainly not needed for -fPIC. Please provide a separate
patch and testcase for predicates.md change.

Uros.
Index: i386.c
===
--- i386.c  (revision 176794)
+++ i386.c  (working copy)
@@ -15028,11 +15028,14 @@ ix86_expand_move (enum machine_mode mode
 op0, 1, OPTAB_DIRECT);
  if (tmp == op0)
return;
+ if (GET_MODE (tmp) != mode)
+   op1 = convert_to_mode (mode, tmp, 1);
}
 }
 
   if ((flag_pic || MACHOPIC_INDIRECT)
-      && mode == Pmode && symbolic_operand (op1, Pmode))
+      && (mode == SImode || mode == DImode)
+      && symbolic_operand (op1, mode))
     {
       if (TARGET_MACHO && !TARGET_64BIT)
{
@@ -15073,13 +15076,15 @@ ix86_expand_move (enum machine_mode mode
   else
{
  if (MEM_P (op0))
-   op1 = force_reg (Pmode, op1);
- else if (!TARGET_64BIT || !x86_64_movabs_operand (op1, Pmode))
+   op1 = force_reg (mode, op1);
+ else if (!TARGET_64BIT || !x86_64_movabs_operand (op1, mode))
{
  rtx reg = can_create_pseudo_p () ? NULL_RTX : op0;
  op1 = legitimize_pic_address (op1, reg);
  if (op0 == op1)
return;
+ if (GET_MODE (op1) != mode)
+   op1 = convert_to_mode (mode, op1, 1);
}
}
 }

Re: PATCH: PR target/49853: [x32] PIC doesn't work with external symbol

2011-07-26 Thread Uros Bizjak

On Tue, Jul 26, 2011 at 7:31 PM, H.J. Lu hjl.to...@gmail.com wrote:

 This patch fixes PIC with external symbol and updates
 x86_64_immediate_operand/x86_64_zext_immediate_operand/x86_64_movabs_operand
 for x32.

 2011-07-26  H.J. Lu  hongjiu...@intel.com

        PR target/49853
        * config/i386/i386.c (ix86_expand_move): Call convert_to_mode
        on legitimize_tls_address return if needed.  Allow ptr_mode for
        symbolic operand with PIC.

 Eh... half of your patch is just an unnecessary rename of a temporary
 variable. See attached patch for a cleaned-up version.

 It looks good to me.  Can you check it in?

Please, can you test it on x32 first? I will commit it after
bootstrap/regtest finish.

Thanks,
Uros.

Re: PATCH: PR target/49853: [x32] PIC doesn't work with external symbol

2011-07-26 Thread Uros Bizjak

On Tue, Jul 26, 2011 at 7:50 PM, H.J. Lu hjl.to...@gmail.com wrote:

 This patch fixes PIC with external symbol and updates
 x86_64_immediate_operand/x86_64_zext_immediate_operand/x86_64_movabs_operand
 for x32.

 2011-07-26  H.J. Lu  hongjiu...@intel.com

        PR target/49853
        * config/i386/i386.c (ix86_expand_move): Call convert_to_mode
        on legitimize_tls_address return if needed.  Allow ptr_mode for
        symbolic operand with PIC.

 Eh... half of your patch is just an unnecessary rename of a temporary
 variable. See attached patch for a cleaned-up version.

 It looks good to me.  Can you check it in?

 Please, can you test it on x32 first? I will commit it after
 bootstrap/regtest finish.


 It may need other changes for TLS support.  I can update it
 after your change is checked in.

Committed with following ChangeLog:

2011-07-26  Uros Bizjak  ubiz...@gmail.com
H.J. Lu  hongjiu...@intel.com

PR target/47369
PR target/49853
* config/i386/i386.c (ix86_expand_move): Call convert_to_mode
if legitimize_tls_address returned operand in wrong mode. Allow
SImode and DImode symbolic operand for PIC.  Call convert_to_mode
if legitimize_pic_address returned operand in wrong mode.

Tested on x86_64-pc-linux-gnu {,-m32}.

Uros.

Re: PATCH: PR target/47372: [x32] internal compiler error: in simplify_subreg, at simplify-rtx.c:5222

2011-07-26 Thread Uros Bizjak

On Tue, Jul 26, 2011 at 10:12 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Jul 26, 2011 at 10:05:06PM +0200, Uros Bizjak wrote:
  2011-07-26  H.J. Lu  hongjiu...@intel.com
 
         PR target/47372
         * config/i386/i386.c (ix86_delegitimize_address): Call
         simplify_gen_subreg for PIC with ptr_mode only if modes of
         x and orig_x are different.
 
  diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
  index 429cd62..9c52aa3 100644
  --- a/gcc/config/i386/i386.c
  +++ b/gcc/config/i386/i386.c
  @@ -12967,9 +12982,10 @@ ix86_delegitimize_address (rtx x)
           || !MEM_P (orig_x))
         return ix86_delegitimize_tls_address (orig_x);
        x = XVECEXP (XEXP (x, 0), 0, 0);

 When x is no longer known to be Pmode

  -      if (GET_MODE (orig_x) != Pmode)
  +      if (GET_MODE (orig_x) != GET_MODE (x)
  +          && GET_MODE (orig_x) != ptr_mode)

 why not simply just
        if (GET_MODE (orig_x) != GET_MODE (x))

         {
  -         x = simplify_gen_subreg (GET_MODE (orig_x), x, Pmode, 0);
  +         x = simplify_gen_subreg (GET_MODE (orig_x), x, ptr_mode, 0);

 and using GET_MODE (x) instead of Pmode/ptr_mode here?  I mean,
 x is certainly not VOIDmode here, should be either SImode or DImode
 and thus simplify_gen_subreg ought to work for it.

This also works, we look at orig_x that looks like:

(mem/u/c:SI (const:DI (unspec:DI [
(symbol_ref:SI ("__sflush") [flags 0x41]
<function_decl 0x7f6f2eaad000 __sflush>)
] UNSPEC_GOTPCREL)) [2 S4 A8])

So, we look at SImode load, and compare it with SImode (actually
ptr_mode) symbol. Will your suggestion work with this RTX?

Thanks,
Uros.

Re: PATCH: PR target/47372: [x32] internal compiler error: in simplify_subreg, at simplify-rtx.c:5222

2011-07-26 Thread Uros Bizjak

On Tue, Jul 26, 2011 at 10:33 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Jul 26, 2011 at 1:29 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Jul 26, 2011 at 10:21:11PM +0200, Uros Bizjak wrote:
 This also works, we look at orig_x that looks like:

 (mem/u/c:SI (const:DI (unspec:DI [
                 (symbol_ref:SI ("__sflush") [flags 0x41]
 <function_decl 0x7f6f2eaad000 __sflush>)
             ] UNSPEC_GOTPCREL)) [2 S4 A8])

 So, we look at SImode load, and compare it with SImode (actually
 ptr_mode) symbol. Will your suggestion work with this RTX?

 Then
      if (GET_MODE (orig_x) != GET_MODE (x))
        {
          x = simplify_gen_subreg (GET_MODE (orig_x), x, GET_MODE (x), 0);
          if (x == NULL_RTX)
            return orig_x;
        }
 will work, orig_x is the above SImode MEM, x is (symbol_ref:SI ("__sflush")
 [flags 0x41] <function_decl 0x7f6f2eaad000 __sflush>)
 thus the modes are the same and no simplify_gen_subreg needs to be done, the
 mode is already right.


 This works for my testcase. I will do a full test.

Also OK for mainline, with suitable ChangeLog and bootstrap/regression test.

BTW: I'm thinking of removing this check from ix86_expand_move:

@@ -15034,7 +15034,6 @@ ix86_expand_move (enum machine_mode mode
 }

   if ((flag_pic || MACHOPIC_INDIRECT)
-      && (mode == SImode || mode == DImode)
       && symbolic_operand (op1, mode))
     {
       if (TARGET_MACHO && !TARGET_64BIT)

There is no way symbolic_operand would be in different mode than SImode/DImode.

Uros.

Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-07-27 Thread Uros Bizjak

On Wed, Jul 27, 2011 at 9:05 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Thanks for inputs! I'll do it today.

 Just one point.
 How is AVX connected to the LZCNT feature?
 AVX requires OS support since it has wider registers etc.
 LZCNT needs no support from the OS side, so from my point of view it is
 redundant to check in lzcnt-check.h for the presence of AVX support from
 the OS side.
 Or did I get you wrong?

Ah, I see. I got distracted by the wrong comment in your patch:

+# Return 1 if the target supports running AVX executables, 0 otherwise.
+
+proc check_effective_target_lzcnt_runtime { } {
+if { [check_effective_target_lzcnt]
+ && [check_lzcnt_hw_available] } {
+   return 1
+}
+return 0
+}

(I will add avx-os-support.h myself later today).
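
As background for the runtime check discussed here: a run test for LZCNT needs a reference result to compare the instruction against. A portable software version could look like the sketch below; the name lzcnt32_soft is hypothetical and is not taken from the patch under review.

```c
#include <stdint.h>

/* Count leading zero bits of a 32-bit value.
   Returns 32 for an input of 0, which is the result LZCNT defines
   for a zero operand of that size. */
unsigned lzcnt32_soft(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;

    /* Shift left until the top bit is set, counting the steps. */
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```

A test would run both the hardware instruction and this routine over a set of inputs and require identical results.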

Uros.

Re: PATCH: PR target/49860: [x32] Error: cannot represent relocation type BFD_RELOC_64 in x32 mode

2011-07-27 Thread Uros Bizjak

On Wed, Jul 27, 2011 at 6:31 AM, H.J. Lu hongjiu...@intel.com wrote:

 The offsetted memory references always work for x32.  OK for trunk?

No, this is the same issue as in [1]. Please fix the assembler to
zero-extend this relocation.

[1] http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01825.html

Uros.

[PATCH, i386]: Do not explicitly check symbol_operands in ix86_expand_move

2011-07-27 Thread Uros Bizjak

Hello!

There is no way symbolic_operand uses non-DI or non-SI modes on x86.

2011-07-27  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.c (ix86_expand_move): Do not explicitly check
the mode of symbolic_operand RTXes.

Tested on x86_64-pc-linux-gnu {,-m32}. Committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 176833)
+++ config/i386/i386.c  (working copy)
@@ -15032,7 +15032,6 @@
 }

   if ((flag_pic || MACHOPIC_INDIRECT)
-      && (mode == SImode || mode == DImode)
       && symbolic_operand (op1, mode))
     {
       if (TARGET_MACHO && !TARGET_64BIT)

Re: [PATCH, i386, testsuite] New BMI testcases

2011-07-28 Thread Uros Bizjak

On Wed, Jul 27, 2011 at 11:29 PM, Jakub Jelinek ja...@redhat.com wrote:

  Guys, with write approval, could you please commit that?
 

 I checked it in for you.

 Unfortunately many of the new tests fail with old assembler, because
 the builtin in check_effective_target_bmi is optimized away (ignored, as
 well as using constant arguments, two reasons to get rid of it).

 Fixed thusly, tested on i686-linux and x86_64-linux, both with old and new
 binutils.  Ok for trunk?

 2011-07-27  Jakub Jelinek  ja...@redhat.com

        * gcc.target/i386/i386.exp (check_effective_target_bmi): Make sure
        the builtin isn't optimized away.

OK.

Thanks,
Uros.
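
The committed fix is not shown in this message, so the following is only an illustration of the general technique of keeping a builtin call from being folded away at compile time. It uses __builtin_ctz, a GCC/Clang builtin, as a stand-in for a BMI builtin (which would need -mbmi); the function name is made up.

```c
/* Sketch only: __builtin_ctz is a GCC/Clang builtin used here as a
   stand-in for a BMI builtin, which would need -mbmi to compile. */
unsigned probe_count_trailing_zeros(void)
{
    /* volatile forces a real load, so the compiler cannot treat the
       argument as a compile-time constant and fold the call away. */
    volatile unsigned input = 8u;

    /* Returning the result keeps the call from being discarded as unused. */
    return (unsigned)__builtin_ctz(input);
}
```

Both ingredients matter for an effective-target probe: a non-constant argument and a result that is actually used. Otherwise the instruction under test may never reach the assembler.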

Re: PATCH: PR target/47364: [x32] internal compiler error: in emit_move_insn, at expr.c:3355

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 5:48 AM, H.J. Lu hongjiu...@intel.com wrote:

 We should only expand strlen to Pmode.  Otherwise, we got

 [hjl@gnu-6 ilp32-38]$ cat x.i
 char one[50] = "ijk";
 int
 main (void)
 {
  return __builtin_strlen (one) != 3;
 }
 [hjl@gnu-6 ilp32-38]$ /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc 
 -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -S -o x.s -mx32 -O2 x.i
 x.i: In function ‘main’:
 x.i:5:27: internal compiler error: in emit_move_insn, at expr.c:
 Please submit a full bug report,
 with preprocessed source if appropriate.
 See http://gcc.gnu.org/bugs.html for instructions.

 OK for trunk?

 2011-07-27  H.J. Lu  hongjiu...@intel.com

        PR target/47364
        * config/i386/i386.md (strlen<mode>): Replace SWI48x with P.

OK.

Thanks,
Uros.

Re: PATCH: PR target/47715: [x32] TLS doesn't work

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 4:55 AM, H.J. Lu hongjiu...@intel.com wrote:
 TLS on X32 is almost identical to TLS on x86-64.  The only difference is
 x32 address space is 32bit.  That means TLS symbols can be in either
 SImode or DImode with upper 32bit zero.  This patch updates
 tls_global_dynamic_64 to support x32.  OK for trunk?

 2011-07-27  H.J. Lu  hongjiu...@intel.com

        PR target/47715
        * config/i386/i386.md (PTR64): New.
        (*tls_global_dynamic_64): Rename to ...
        (*tls_global_dynamic_64_<mode>): This.  Put PTR64 on operand 1.
        (tls_global_dynamic_64): Rename to ...
        (tls_global_dynamic_64_<mode>): This.  Put PTR64 on operand 1.
        * config/i386/i386.c (legitimize_tls_address): Updated.

Just remove mode check, so:

(unspec:DI [(match_operand 1 "tls_symbolic_operand" "")]

at both sites.

-  fputs (ASM_BYTE "0x66\n", asm_out_file);
+  if (!TARGET_X32)
+    fputs (ASM_BYTE "0x66\n", asm_out_file);

Are you sure? There are some scary comments in binutils that these
sequences have to be written _exactly_ as shown to enable certain
linker relaxations w.r.t. TLS relocs.

Uros.

[PATCH, i386]: Fix i386.md:5807: warning: source missing a mode?

2011-07-28 Thread Uros Bizjak

Hello!

2011-07-28  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (add-lea splitter): Add SWI mode to PLUS RTX.

Tested on x86_64-pc-linux-gnu, committed to mainline.

Uros.

Index: i386.md
===
--- i386.md (revision 176858)
+++ i386.md (working copy)
@@ -5806,8 +5806,8 @@
 ;; Convert add to the lea pattern to avoid flags dependency.
 (define_split
   [(set (match_operand:SWI 0 "register_operand" "")
-       (plus (match_operand:SWI 1 "register_operand" "")
-             (match_operand:SWI 2 "nonmemory_operand" "")))
+       (plus:SWI (match_operand:SWI 1 "register_operand" "")
+                 (match_operand:SWI 2 "nonmemory_operand" "")))
    (clobber (reg:CC FLAGS_REG))]
   "reload_completed && ix86_lea_for_add_ok (insn, operands)"
   [(const_int 0)]

Re: PATCH: PR target/47715: [x32] TLS doesn't work

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 8:52 AM, Uros Bizjak ubiz...@gmail.com wrote:

 TLS on X32 is almost identical to TLS on x86-64.  The only difference is
 x32 address space is 32bit.  That means TLS symbols can be in either
 SImode or DImode with upper 32bit zero.  This patch updates
 tls_global_dynamic_64 to support x32.  OK for trunk?

Please also change 64bit GNU2_TLS patterns, so -mtls-dialect=gnu2 will
also work.  Please see attached patch.

Uros.
Index: i386.md
===
--- i386.md (revision 176860)
+++ i386.md (working copy)
@@ -12327,7 +12327,7 @@
        (call:DI
         (mem:QI (match_operand:DI 2 "constant_call_address_operand" "z"))
         (match_operand:DI 3 "" "")))
-   (unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")]
+   (unspec:DI [(match_operand 1 "tls_symbolic_operand" "")]
              UNSPEC_TLS_GD)]
   "TARGET_64BIT"
 {
@@ -12349,7 +12349,7 @@
          (call:DI
           (mem:QI (match_operand:DI 2 "constant_call_address_operand" ""))
           (const_int 0)))
-     (unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")]
+     (unspec:DI [(match_operand 1 "tls_symbolic_operand" "")]
                 UNSPEC_TLS_GD)])])
 
 (define_insn "*tls_local_dynamic_base_32_gnu"
@@ -12553,7 +12553,7 @@
 
 (define_expand "tls_dynamic_gnu2_64"
   [(set (match_dup 2)
-       (unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")]
+       (unspec:DI [(match_operand 1 "tls_symbolic_operand" "")]
                   UNSPEC_TLSDESC))
    (parallel
     [(set (match_operand:DI 0 "register_operand" "")
@@ -12568,7 +12568,7 @@
 
 (define_insn "*tls_dynamic_lea_64"
   [(set (match_operand:DI 0 "register_operand" "=r")
-       (unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")]
+       (unspec:DI [(match_operand 1 "tls_symbolic_operand" "")]
                   UNSPEC_TLSDESC))]
   "TARGET_64BIT && TARGET_GNU2_TLS"
   "lea{q}\t{%a1@TLSDESC(%%rip), %0|%0, %a1@TLSDESC[rip]}"
@@ -12579,7 +12579,7 @@
 
 (define_insn "*tls_dynamic_call_64"
   [(set (match_operand:DI 0 "register_operand" "=a")
-       (unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")
+       (unspec:DI [(match_operand 1 "tls_symbolic_operand" "")
                    (match_operand:DI 2 "register_operand" "0")
                    (reg:DI SP_REG)]
                   UNSPEC_TLSDESC))
@@ -12598,7 +12598,7 @@
                     (reg:DI SP_REG)]
                    UNSPEC_TLSDESC)
         (const:DI (unspec:DI
-                   [(match_operand:DI 1 "tls_symbolic_operand" "")]
+                   [(match_operand 1 "tls_symbolic_operand" "")]
                    UNSPEC_DTPOFF))))
    (clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && TARGET_GNU2_TLS"

Re: PATCH: PR target/47715: [x32] Use SImode for thread pointer

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 5:11 AM, H.J. Lu hongjiu...@intel.com wrote:

 In x32, thread pointer is 32bit and choice of segment register for the
 thread base ptr load should be based on TARGET_64BIT.  This patch
 implements it.  OK for trunk?

-ENOTESTCASE.

Uros.

Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-28 Thread Uros Bizjak

Hello!

 convert_memory_address_addr_space has a special PLUS/MULT case for
 POINTERS_EXTEND_UNSIGNED < 0.  It turns out that it is also needed
 for all Pmode != ptr_mode cases.  OK for trunk?

 2011-06-11  H.J. Lu  hongjiu...@intel.com

        PR middle-end/47727
        * explow.c (convert_memory_address_addr_space): Permute the
        conversion and addition if one operand is a constant.

Do we still need this patch? With recent target changes the testcase
from PR can be compiled without problems with a gcc from an unpatched
trunk.

Uros.

Re: PATCH: PR target/47364: [x32] internal compiler error: in emit_move_insn, at expr.c:3355

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 8:30 AM, Uros Bizjak ubiz...@gmail.com wrote:

 We should only expand strlen to Pmode.  Otherwise, we got

 [hjl@gnu-6 ilp32-38]$ cat x.i
 char one[50] = "ijk";
 int
 main (void)
 {
  return __builtin_strlen (one) != 3;
 }
 [hjl@gnu-6 ilp32-38]$ /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc 
 -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -S -o x.s -mx32 -O2 x.i
 x.i: In function ‘main’:
 x.i:5:27: internal compiler error: in emit_move_insn, at expr.c:
 Please submit a full bug report,
 with preprocessed source if appropriate.
 See http://gcc.gnu.org/bugs.html for instructions.

 OK for trunk?

 2011-07-27  H.J. Lu  hongjiu...@intel.com

        PR target/47364
        * config/i386/i386.md (strlen<mode>): Replace SWI48x with P.

 OK.

Please also backport this fix to release branches.

Thanks,
Uros.

Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 7:59 PM, H.J. Lu hjl.to...@gmail.com wrote:

   convert_memory_address_addr_space has a special PLUS/MULT case for
   POINTERS_EXTEND_UNSIGNED < 0.  It turns out that it is also needed
   for all Pmode != ptr_mode cases.  OK for trunk?
   2011-06-11  H.J. Lu  hongjiu...@intel.com
 
          PR middle-end/47727
          * explow.c (convert_memory_address_addr_space): Permute the
          conversion and addition if one operand is a constant.

 Do we still need this patch? With recent target changes the testcase
 from PR can be compiled without problems with a gcc from an unpatched
 trunk.

 Given the communication difficulties, I hope not...

 Paolo


 Here is the updated patch.  OK for trunk?

Did you see the question two levels up the thread you are replying to?

Uros.

Re: PATCH: PR target/47715: [x32] Use SImode for thread pointer

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 8:03 PM, H.J. Lu hjl.to...@gmail.com wrote:

 So, instead of huge complications with new mode iterator, just
 introduce two new patterns that will shadow existing ones for
 TARGET_X32.

 Like in attached (untested) patch.


 I tried the following patch with typos fixed.  It almost worked,
 except for this failure in glibc testsuite:

 gen-locale.sh: line 27: 14755 Aborted                 (core dumped)
 I18NPATH=. GCONV_PATH=${common_objpfx}iconvdata ${localedef} --quiet
 -c -f $charmap -i $input ${common_objpfx}localedata/$out
 Charmap: ISO-8859-1 Inputfile: nb_NO Outputdir: nb_NO.ISO-8859-1 
 failed
 make[4]: *** 
 [/export/build/gnu/glibc-x32/build-x86_64-linux/localedata/nb_NO.ISO-8859-1/LC_CTYPE]
 Error 1

 I will add:

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 8723dc5..d32d64d 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -12120,7 +12120,9 @@ get_thread_pointer (bool to_reg)
  {
   rtx tp, reg, insn;

 -  tp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 +  tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 +  if (ptr_mode != Pmode)
 +    tp = convert_to_mode (Pmode, tp, 1);
   if (!to_reg)
     return tp;

 since TP must be 32bit.

 No, this won't have the desired effect. It will change the UNSPEC, so
 it won't match patterns in i386.md.

 Can you debug the failure a bit more? With my patterns, add{l} and
 mov{l} should clear top 32bits.
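
The claim above rests on the x86-64 rule that a write to a 32-bit register
zero-extends into the full 64-bit register. A minimal C model of that
behaviour follows; the helper names model_movl and model_addl are
hypothetical and are not part of GCC.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit register write on x86-64: the value
   is zero-extended and the old upper 32 bits are discarded.  */
static uint64_t
model_movl (uint64_t old_reg, uint32_t value)
{
  (void) old_reg;               /* Previous contents do not survive.  */
  return (uint64_t) value;
}

/* Hypothetical model of add{l}: a wrapping 32-bit add on the low half,
   followed by zero extension of the 32-bit result.  */
static uint64_t
model_addl (uint64_t reg, uint32_t addend)
{
  uint32_t low = (uint32_t) reg;
  return (uint64_t) (uint32_t) (low + addend);
}
```

Under this model, a destination register that held garbage in its upper
half ends up with those bits cleared after either operation, and a sum that
overflows 32 bits wraps instead of carrying into bit 32.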


 TP is 32bit in x32.  For load_tp_x32, we load SImode value and
 zero-extend to DImode. For add_tp_x32, we are adding SImode
 value.  We can't pretend TP is 64bit.  load_tp_x32 and add_tp_x32
 must take SImode TP.


 I will see what I can do.


 Here is the updated patch to use 32bit TP for x32.

Why??

This part makes no sense:

-  tp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
+  tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
+  if (ptr_mode != Pmode)
+    tp = convert_to_mode (Pmode, tp, 1);

You will create zero_extend (unspec ...), that won't be matched by any pattern.

Can you please explain, how is this pattern different than DImode
pattern, proposed in my patch?

+(define_insn "*load_tp_x32"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+       (unspec:SI [(const_int 0)] UNSPEC_TP))]
+  "TARGET_X32"
+  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
+  [(set_attr "type" "imov")
+   (set_attr "modrm" "0")
+   (set_attr "length" "7")
+   (set_attr "memory" "load")
+   (set_attr "imm_disp" "false")])

vs:

+(define_insn "*load_tp_x32"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+       (unspec:DI [(const_int 0)] UNSPEC_TP))]
+  "TARGET_X32"
+  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
+  [(set_attr "type" "imov")
+   (set_attr "modrm" "0")
+   (set_attr "length" "7")
+   (set_attr "memory" "load")
+   (set_attr "imm_disp" "false")])

Uros.

Re: PATCH: PR middle-end/49721: convert_memory_address_addr_space may generate invalid new insns

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 8:09 PM, H.J. Lu hjl.to...@gmail.com wrote:

   convert_memory_address_addr_space has a special PLUS/MULT case for
   POINTERS_EXTEND_UNSIGNED < 0.  It turns out that it is also needed
   for all Pmode != ptr_mode cases.  OK for trunk?
   2011-06-11  H.J. Lu  hongjiu...@intel.com
 
          PR middle-end/47727
          * explow.c (convert_memory_address_addr_space): Permute the
          conversion and addition if one operand is a constant.

 Do we still need this patch? With recent target changes the testcase
 from PR can be compiled without problems with a gcc from an unpatched
 trunk.

 Given the communication difficulties, I hope not...

 Paolo


 Here is the updated patch.  OK for trunk?

 Did you see the question two levels up the thread you are replying to?


 The patch is for

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49721

 I changed the thread subject.

Please add testcase to see the patch in action.

Uros.

Re: PATCH: PR target/47715: [x32] Use SImode for thread pointer

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 8:30 PM, H.J. Lu hjl.to...@gmail.com wrote:

 TP is 32bit in x32.  For load_tp_x32, we load SImode value and
 zero-extend to DImode. For add_tp_x32, we are adding SImode
 value.  We can't pretend TP is 64bit.  load_tp_x32 and add_tp_x32
 must take SImode TP.


 I will see what I can do.


 Here is the updated patch to use 32bit TP for x32.

 Why??

 This part makes no sense:

 -  tp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 +  tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 +  if (ptr_mode != Pmode)
 +    tp = convert_to_mode (Pmode, tp, 1);

 You will create zero_extend (unspec ...), that won't be matched by any 
 pattern.

 No.  I created zero_extend from (reg:SI) to (reg:DI).

 Can you please explain, how is this pattern different than DImode
 pattern, proposed in my patch?

 +(define_insn "*load_tp_x32"
 +  [(set (match_operand:SI 0 "register_operand" "=r")
 +       (unspec:SI [(const_int 0)] UNSPEC_TP))]
 +  "TARGET_X32"
 +  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
 +  [(set_attr "type" "imov")
 +   (set_attr "modrm" "0")
 +   (set_attr "length" "7")
 +   (set_attr "memory" "load")
 +   (set_attr "imm_disp" "false")])

 vs:

 +(define_insn "*load_tp_x32"
 +  [(set (match_operand:DI 0 "register_operand" "=r")
 +       (unspec:DI [(const_int 0)] UNSPEC_TP))]

 That is wrong since source (TP)  is 32bit.  This pattern tells compiler
 source is 64bit.

Where?

Uros.

Re: PATCH: PR middle-end/49721: convert_memory_address_addr_space may generate invalid new insns

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 8:32 PM, H.J. Lu hjl.to...@gmail.com wrote:

   convert_memory_address_addr_space has a special PLUS/MULT case for
   POINTERS_EXTEND_UNSIGNED < 0.  It turns out that it is also needed
   for all Pmode != ptr_mode cases.  OK for trunk?
   2011-06-11  H.J. Lu  hongjiu...@intel.com
 
          PR middle-end/47727
          * explow.c (convert_memory_address_addr_space): Permute the
          conversion and addition if one operand is a constant.

 Do we still need this patch? With recent target changes the testcase
 from PR can be compiled without problems with a gcc from an unpatched
 trunk.

 Given the communication difficulties, I hope not...

 Paolo


 Here is the updated patch.  OK for trunk?

 Did you see the question two levels up the thread you are replying to?


 The patch is for

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49721

 I changed the thread subject.

 Please add testcase to see the patch in action.


 I haven't found a testcase yet.  The problem was discovered in
 this thread:

 http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01065.html

This was before x32 could handle SImode addresses. With recent x86
target work, this is no longer true, and SImode and DImode addresses
are first-class citizens as far as the x32 backend is concerned. Please
note that the original testcase (that this whole patch is all about) now
compiles without problems. Also, the middle end is shared with at least
two ptr_mode != Pmode targets, and they all work well. So, to see what
makes x32 special, we need a testcase that breaks _WITHOUT_ your
proposed patch. Without a testcase, nobody can analyze your approach and
tell whether it is the right one, and whether this is in fact a target
problem or indeed a middle-end problem.

And there is no point in flooding the mailing list with patches.

Uros.

Re: PATCH: PR target/47766: [x32] -fstack-protector doesn't work

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 8:13 PM, H.J. Lu hongjiu...@intel.com wrote:

 This patch adds x32 support to UNSPEC_SP_XXX patterns.  OK for trunk?

http://gcc.gnu.org/contribute.html#patches

Uros.

Re: PATCH: PR target/47766: [x32] -fstack-protector doesn't work

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 9:03 PM, H.J. Lu hjl.to...@gmail.com wrote:

 This patch adds x32 support to UNSPEC_SP_XXX patterns.  OK for trunk?

 http://gcc.gnu.org/contribute.html#patches


 Sorry. I should have mentioned testcase in:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47766

 Actually, they are in gcc testsuite.  I noticed them when
 I run gcc testsuite on x32.

This looks like a middle-end problem to me.

According to the documentation:

--quote--
`stack_protect_set'
 This pattern, if defined, moves a `Pmode' value from the memory in
 operand 1 to the memory in operand 0 without leaving the value in
 a register afterward.  This is to avoid leaking the value some
 place that an attacker might use to rewrite the stack guard slot
 after having clobbered it.

 If this pattern is not defined, then a plain move pattern is
 generated.

`stack_protect_test'
 This pattern, if defined, compares a `Pmode' value from the memory
 in operand 1 with the memory in operand 0 without leaving the
 value in a register afterward and branches to operand 2 if the
 values weren't equal.

 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
--quote--

According to the documentation, x86 patterns are correct. However,
middle-end fails to extend ptr_mode value to Pmode, and in function.c,
stack_protect_prologue/stack_protect_epilogue, we already have
ptr_mode (SImode) operand:

(mem/v/f/c/i:SI (plus:DI (reg/f:DI 54 virtual-stack-vars)
        (const_int -4 [0xfffffffffffffffc])) [2 D.2704+0 S4 A32])

(mem/v/f/c/i:SI (symbol_ref:DI ("__stack_chk_guard") [flags 0x40]
<var_decl 0x7ffc35aa0be0 __stack_chk_guard>) [2 __stack_chk_guard+0 S4
A32])

The opinion of an RTL maintainer (CC'd) is needed here. The target
definition is OK in its current form.

Uros.

Re: PATCH: PR target/47715: [x32] Use SImode for thread pointer

2011-07-28 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 10:15 PM, H.J. Lu hjl.to...@gmail.com wrote:

 TP is 32bit in x32.  For load_tp_x32, we load SImode value and
 zero-extend to DImode. For add_tp_x32, we are adding SImode
 value.  We can't pretend TP is 64bit.  load_tp_x32 and add_tp_x32
 must take SImode TP.

 Here is the revised patch.  The difference is I changed *add_tp_x32 to SImode.
 For

 ---
 extern __thread int __libc_errno __attribute__ ((tls_model ("initial-exec")));

 int *
 __errno_location (void)
 {
  return &__libc_errno;
 }
 ---

 compiled with -mx32 -O2 -fPIC  DImode *add_tp_x32 generates:

        movq    __libc_errno@gottpoff(%rip), %rax
        addl    %fs:0, %eax
        mov     %eax, %eax
        ret

 SImode *add_tp_x32 generates:

        movl    %fs:0, %eax
        addl    __libc_errno@gottpoff(%rip), %eax
        ret

This happens because combine can't combine DImode load and SImode plus
RTXes. These RTXes have to be in Pmode, see the intention in
legitimize_tls_address, also for TARGET_GNU2_TLS.

Can you please debug what goes wrong with tp_add_x32 in DImode?

Uros.

Re: PATCH: PR target/47715: [x32] Use SImode for thread pointer

2011-07-28 Thread Uros Bizjak

On Fri, Jul 29, 2011 at 12:28 AM, H.J. Lu hjl.to...@gmail.com wrote:

 TP is 32bit in x32.  For load_tp_x32, we load SImode value and
 zero-extend to DImode. For add_tp_x32, we are adding SImode
 value.  We can't pretend TP is 64bit.  load_tp_x32 and add_tp_x32
 must take SImode TP.

 Here is the revised patch.  The difference is I changed *add_tp_x32 to 
 SImode.
 For

 ---
 extern __thread int __libc_errno __attribute__ ((tls_model 
 ("initial-exec")));

 int *
 __errno_location (void)
 {
  return &__libc_errno;
 }
 ---

 compiled with -mx32 -O2 -fPIC  DImode *add_tp_x32 generates:

        movq    __libc_errno@gottpoff(%rip), %rax
        addl    %fs:0, %eax
        mov     %eax, %eax
        ret

 SImode *add_tp_x32 generates:

        movl    %fs:0, %eax
        addl    __libc_errno@gottpoff(%rip), %eax
        ret

 This happens because combine can't combine DImode load and SImode plus
 RTXes. These RTXes have to be in Pmode, see the intention in
 legitimize_tls_address, also for TARGET_GNU2_TLS.

 Can you please debug what goes wrong with tp_add_x32 in DImode?


 We start with

Uh, we didn't understand each other... can you please debug what goes
wrong with glibc runtime test?

Thanks,
Uros.

Re: PATCH: PR target/47715: [x32] Use SImode for thread pointer

2011-07-29 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 3:24 PM, H.J. Lu hjl.to...@gmail.com wrote:

 In x32, thread pointer is 32bit and choice of segment register for the
 thread base ptr load should be based on TARGET_64BIT.  This patch
 implements it.  OK for trunk?

 -ENOTESTCASE.


 There is no standalone testcase.  The symptom is in glibc build, I
 got

 CPP='/export/build/gnu/gcc-x32/release/usr/gcc-4.7.0-x32/bin/gcc -mx32
  -E -x c-header'
 /export/build/gnu/glibc-x32/build-x86_64-linux/elf/ld-linux-x32.so.2
 --library-path 
 /export/build/gnu/glibc-x32/build-x86_64-linux:/export/build/gnu/glibc-x32/build-x86_64-linux/math:/export/build/gnu/glibc-x32/build-x86_64-linux/elf:/export/build/gnu/glibc-x32/build-x86_64-linux/dlfcn:/export/build/gnu/glibc-x32/build-x86_64-linux/nss:/export/build/gnu/glibc-x32/build-x86_64-linux/nis:/export/build/gnu/glibc-x32/build-x86_64-linux/rt:/export/build/gnu/glibc-x32/build-x86_64-linux/resolv:/export/build/gnu/glibc-x32/build-x86_64-linux/crypt:/export/build/gnu/glibc-x32/build-x86_64-linux/nptl
 /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcgen -Y
 ../scripts -h rpcsvc/yppasswd.x -o
 /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.T
 make[5]: *** 
 [/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/xbootparam_prot.stmp]
 Segmentation fault
 make[5]: *** Waiting for unfinished jobs
 make[5]: *** 
 [/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/xrstat.stmp]
 Segmentation fault
 make[5]: *** 
 [/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/xyppasswd.stmp]
 Segmentation fault

 since thread pointer is 32bit in x32.


 If we load thread pointer (fs segment register) in x32 with 64bit
 load, the upper 32bits are garbage.
 We must load 32bit
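
The point quoted above, that a 64-bit load of a 32-bit thread pointer picks
up garbage in the upper half, can be illustrated with a small C sketch. It
only models reading a 4-byte slot that is followed by unrelated bytes; the
helper names load32_zext and load64 are hypothetical, and the sketch does
not touch %fs at all.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read exactly 32 bits and zero-extend them,
   which is what a mov{l} into a 64-bit register amounts to.  */
static uint64_t
load32_zext (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return (uint64_t) v;
}

/* Hypothetical helper: read 64 bits from the same address, which also
   pulls in the four bytes that follow the 32-bit slot.  */
static uint64_t
load64 (const unsigned char *p)
{
  uint64_t v;
  memcpy (&v, p, sizeof v);
  return v;
}
```

With a slot that holds a 32-bit value followed by four unrelated bytes,
load32_zext returns the stored value unchanged, while load64 returns a
value that also depends on the neighbouring bytes.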

So, instead of huge complications with new mode iterator, just
introduce two new patterns that will shadow existing ones for
TARGET_X32.

Like in attached (untested) patch.

Uros.
Index: i386.md
===
--- i386.md (revision 176860)
+++ i386.md (working copy)
@@ -12442,6 +12442,17 @@
 (define_mode_attr tp_seg [(SI "gs") (DI "fs")])
 
 ;; Load and add the thread base pointer from %<tp_seg>:0.
+(define_insn "*load_tp_x32"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+       (unspec:DI [(const_int 0)] UNSPEC_TP))]
+  "TARGET_X32"
+  "mov{l}\t{%%<tp_seg>:0, %k0|%k0, DWORD PTR <tp_seg>:0}"
+  [(set_attr "type" "imov")
+   (set_attr "modrm" "0")
+   (set_attr "length" "7")
+   (set_attr "memory" "load")
+   (set_attr "imm_disp" "false")])
+
 (define_insn "*load_tp_<mode>"
   [(set (match_operand:P 0 "register_operand" "=r")
        (unspec:P [(const_int 0)] UNSPEC_TP))]
@@ -12453,6 +12464,19 @@
    (set_attr "memory" "load")
    (set_attr "imm_disp" "false")])
 
+(define_insn "*add_tp_x32"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+       (plus:DI (unspec:DI [(const_int 0)] UNSPEC_TP)
+                (match_operand:DI 1 "register_operand" "0")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+  "add{l}\t{%%<tp_seg>:0, %k0|%k0, DWORD PTR <tp_seg>:0}"
+  [(set_attr "type" "alu")
+   (set_attr "modrm" "0")
+   (set_attr "length" "7")
+   (set_attr "memory" "load")
+   (set_attr "imm_disp" "false")])
+
 (define_insn "*add_tp_<mode>"
   [(set (match_operand:P 0 "register_operand" "=r")
        (plus:P (unspec:P [(const_int 0)] UNSPEC_TP)

PATCH, v2: PR target/47715: [x32] Use SImode for thread pointer

2011-07-29 Thread Uros Bizjak

Hello!

ABI specifies that TP is loaded in ptr_mode. Attached patch implements
this requirement.
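
The control flow of the reworked get_thread_pointer in the patch below can
be sketched as a toy C model. This is only an illustration under stated
assumptions: the enum, the struct and the function name are hypothetical
stand-ins, no real RTL is built, and the zero extension stands for the
convert_to_mode call with its unsigned argument set to 1.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two machine modes involved.  */
enum machine_mode_model { MODEL_SImode, MODEL_DImode };

/* Hypothetical summary of the expression the function hands back.  */
struct tp_model
{
  enum machine_mode_model mode; /* Mode of the resulting expression.  */
  bool zero_extended;           /* Wrapped in a zero extension?  */
  bool in_register;             /* Copied into a fresh register?  */
};

static struct tp_model
model_get_thread_pointer (enum machine_mode_model ptr_mode,
                          enum machine_mode_model pmode,
                          bool to_reg)
{
  /* Start with the thread pointer expression in ptr_mode.  */
  struct tp_model tp = { ptr_mode, false, false };

  /* Widen to Pmode only when the two modes differ (the x32 case).  */
  if (tp.mode != pmode)
    {
      tp.mode = pmode;
      tp.zero_extended = true;
    }

  /* Optionally force the result into a register.  */
  if (to_reg)
    tp.in_register = true;

  return tp;
}
```

In this model a target with ptr_mode == Pmode never sees the extension, so
only the x32 configuration needs the new zero-extending patterns.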

2011-07-29  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (*load_tp_x32): New.
(*load_tp_x32_zext): Ditto.
(*add_tp_x32): Ditto.
(*add_tp_x32_zext): Ditto.
(*load_tp_<mode>): Disable for TARGET_X32 targets.
(*add_tp_<mode>): Ditto.
* config/i386/i386.c (get_thread_pointer): Load thread pointer in
ptr_mode and convert to Pmode if needed.

Testing on x86_64-pc-linux-gnu in progress. H.J., please test this
version on x32.

Uros.
Index: i386.md
===
--- i386.md (revision 176915)
+++ i386.md (working copy)
@@ -12444,10 +12444,32 @@
 (define_mode_attr tp_seg [(SI "gs") (DI "fs")])
 
 ;; Load and add the thread base pointer from %<tp_seg>:0.
+(define_insn "*load_tp_x32"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+       (unspec:SI [(const_int 0)] UNSPEC_TP))]
+  "TARGET_X32"
+  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
+  [(set_attr "type" "imov")
+   (set_attr "modrm" "0")
+   (set_attr "length" "7")
+   (set_attr "memory" "load")
+   (set_attr "imm_disp" "false")])
+
+(define_insn "*load_tp_x32_zext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+       (zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
+  "TARGET_X32"
+  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
+  [(set_attr "type" "imov")
+   (set_attr "modrm" "0")
+   (set_attr "length" "7")
+   (set_attr "memory" "load")
+   (set_attr "imm_disp" "false")])
+
 (define_insn "*load_tp_<mode>"
   [(set (match_operand:P 0 "register_operand" "=r")
        (unspec:P [(const_int 0)] UNSPEC_TP))]
-  ""
+  "!TARGET_X32"
   "mov{<imodesuffix>}\t{%%<tp_seg>:0, %0|%0, <iptrsize> PTR <tp_seg>:0}"
   [(set_attr "type" "imov")
    (set_attr "modrm" "0")
@@ -12455,12 +12477,39 @@
    (set_attr "memory" "load")
    (set_attr "imm_disp" "false")])
 
+(define_insn "*add_tp_x32"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+       (plus:SI (unspec:SI [(const_int 0)] UNSPEC_TP)
+                (match_operand:SI 1 "register_operand" "0")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+  "add{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
+  [(set_attr "type" "alu")
+   (set_attr "modrm" "0")
+   (set_attr "length" "7")
+   (set_attr "memory" "load")
+   (set_attr "imm_disp" "false")])
+
+(define_insn "*add_tp_x32_zext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+       (zero_extend:DI
+         (plus:SI (unspec:SI [(const_int 0)] UNSPEC_TP)
+                  (match_operand:SI 1 "register_operand" "0"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+  "add{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
+  [(set_attr "type" "alu")
+   (set_attr "modrm" "0")
+   (set_attr "length" "7")
+   (set_attr "memory" "load")
+   (set_attr "imm_disp" "false")])
+
 (define_insn "*add_tp_<mode>"
   [(set (match_operand:P 0 "register_operand" "=r")
        (plus:P (unspec:P [(const_int 0)] UNSPEC_TP)
                (match_operand:P 1 "register_operand" "0")))
    (clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_X32"
   "add{<imodesuffix>}\t{%%<tp_seg>:0, %0|%0, <iptrsize> PTR <tp_seg>:0}"
   [(set_attr "type" "alu")
    (set_attr "modrm" "0")
Index: i386.c
===
--- i386.c  (revision 176915)
+++ i386.c  (working copy)
@@ -12118,17 +12118,15 @@ legitimize_pic_address (rtx orig, rtx re
 static rtx
 get_thread_pointer (bool to_reg)
 {
-  rtx tp, reg, insn;
+  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 
-  tp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
-  if (!to_reg)
-    return tp;
+  if (GET_MODE (tp) != Pmode)
+    tp = convert_to_mode (Pmode, tp, 1);
 
-  reg = gen_reg_rtx (Pmode);
-  insn = gen_rtx_SET (VOIDmode, reg, tp);
-  insn = emit_insn (insn);
+  if (to_reg)
+    tp = copy_addr_to_reg (tp);
 
-  return reg;
+  return tp;
 }
 
 /* Construct the SYMBOL_REF for the tls_get_addr function.  */

PATCH, v2: PR target/47715: [x32] Use SImode for thread pointer

2011-07-29 Thread Uros Bizjak

[ For some reason this post didn't reach gcc-patches@ ML archives... ]

Hello!

ABI specifies that TP is loaded in ptr_mode. Attached patch implements
this requirement.

2011-07-29  Uros Bizjak  ubiz...@gmail.com

       * config/i386/i386.md (*load_tp_x32): New.
       (*load_tp_x32_zext): Ditto.
       (*add_tp_x32): Ditto.
       (*add_tp_x32_zext): Ditto.
       (*load_tp_<mode>): Disable for TARGET_X32 targets.
       (*add_tp_<mode>): Ditto.
       * config/i386/i386.c (get_thread_pointer): Load thread pointer in
       ptr_mode and convert to Pmode if needed.

Testing on x86_64-pc-linux-gnu in progress. H.J., please test this
version on x32.

Uros.

Re: PATCH: PR target/47715: [x32] TLS doesn't work

2011-07-29 Thread Uros Bizjak

On Thu, Jul 28, 2011 at 3:47 PM, H.J. Lu hjl.to...@gmail.com wrote:

 TLS on X32 is almost identical to TLS on x86-64.  The only difference is
 x32 address space is 32bit.  That means TLS symbols can be in either
 SImode or DImode with upper 32bit zero.  This patch updates
 tls_global_dynamic_64 to support x32.  OK for trunk?

 Please also change 64bit GNU2_TLS patterns, so -mtls-dialect=gnu2 will
 also work.  Please see attached patch.


 Yes, it works.  Can you apply it?

This is what I have committed:

2011-07-28  Uros Bizjak  ubiz...@gmail.com

PR target/47715
* config/i386/i386.md (*tls_global_dynamic_64): Remove mode from
tls_symbolic_operand check.  Update code sequence for TARGET_X32.
(tls_global_dynamic_64): Remove mode from tls_symbolic_operand check.
(tls_dynamic_gnu2_64): Ditto.
(*tls_dynamic_gnu2_lea_64): Ditto.
(*tls_dynamic_gnu2_call_64): Ditto.
(*tls_dynamic_gnu2_combine_64): Ditto.

Uros.
Index: i386.md
===
--- i386.md (revision 176870)
+++ i386.md (working copy)
@@ -12327,11 +12327,12 @@
(call:DI
 (mem:QI (match_operand:DI 2 constant_call_address_operand z))
 (match_operand:DI 3  )))
-   (unspec:DI [(match_operand:DI 1 tls_symbolic_operand )]
+   (unspec:DI [(match_operand 1 tls_symbolic_operand )]
  UNSPEC_TLS_GD)]
   TARGET_64BIT
 {
-  fputs (ASM_BYTE 0x66\n, asm_out_file);
+  if (!TARGET_X32)
+fputs (ASM_BYTE 0x66\n, asm_out_file);
   output_asm_insn
 (lea{q}\t{%a1@tlsgd(%%rip), %%rdi|rdi, %a1@tlsgd[rip]}, operands);
   fputs (ASM_SHORT 0x\n, asm_out_file);
@@ -12349,7 +12350,7 @@
  (call:DI
   (mem:QI (match_operand:DI 2 constant_call_address_operand ))
   (const_int 0)))
- (unspec:DI [(match_operand:DI 1 tls_symbolic_operand )]
+ (unspec:DI [(match_operand 1 tls_symbolic_operand )]
UNSPEC_TLS_GD)])])
 
 (define_insn *tls_local_dynamic_base_32_gnu
@@ -12553,7 +12554,7 @@
 
 (define_expand tls_dynamic_gnu2_64
   [(set (match_dup 2)
-   (unspec:DI [(match_operand:DI 1 tls_symbolic_operand )]
+   (unspec:DI [(match_operand 1 tls_symbolic_operand )]
   UNSPEC_TLSDESC))
(parallel
 [(set (match_operand:DI 0 register_operand )
@@ -12568,7 +12569,7 @@
 
 (define_insn *tls_dynamic_lea_64
   [(set (match_operand:DI 0 register_operand =r)
-   (unspec:DI [(match_operand:DI 1 tls_symbolic_operand )]
+   (unspec:DI [(match_operand 1 tls_symbolic_operand )]
   UNSPEC_TLSDESC))]
   TARGET_64BIT && TARGET_GNU2_TLS
   lea{q}\t{%a1@TLSDESC(%%rip), %0|%0, %a1@TLSDESC[rip]}
@@ -12579,7 +12580,7 @@
 
 (define_insn *tls_dynamic_call_64
   [(set (match_operand:DI 0 register_operand =a)
-   (unspec:DI [(match_operand:DI 1 tls_symbolic_operand )
+   (unspec:DI [(match_operand 1 tls_symbolic_operand )
(match_operand:DI 2 register_operand 0)
(reg:DI SP_REG)]
   UNSPEC_TLSDESC))
@@ -12598,7 +12599,7 @@
 (reg:DI SP_REG)]
UNSPEC_TLSDESC)
 (const:DI (unspec:DI
-   [(match_operand:DI 1 tls_symbolic_operand )]
+   [(match_operand 1 tls_symbolic_operand )]
UNSPEC_DTPOFF
(clobber (reg:CC FLAGS_REG))]
   TARGET_64BIT && TARGET_GNU2_TLS

[PATCH, i386]: Re-define pic_32bit_operand back to define_predicate

2011-07-29 Thread Uros Bizjak

Hello!

With recent developments, pic_32bit_operand no longer needs to be
defined as a special predicate with explicit mode checks.
The implicit mode checks of normal predicates (including the VOIDmode
bypass) work OK now.
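To illustrate what "implicit mode checks" means here, the following is a
rough, self-contained model of the test that the GCC internals manual
describes for predicates written with define_predicate: the requested mode
is VOIDmode, or the operand has the requested mode, or the operand is a
constant integer. The enums, the struct and the function name are
hypothetical stand-ins, not GCC's real definitions.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-ins for GCC's enums.  */
enum machine_mode { VOIDmode, SImode, DImode };
enum rtx_code { CONST_INT, SYMBOL_REF, REG };

/* Hypothetical operand: just a code and a mode.  */
struct operand
{
  enum rtx_code code;
  enum machine_mode mode;
};

/* Simplified model of the implicit mode test of a normal predicate.
   A match_operand without a mode passes VOIDmode, so the first
   alternative accepts the operand whatever its own mode is.  */
static bool
implicit_mode_ok (const struct operand *op, enum machine_mode mode)
{
  return mode == VOIDmode
         || op->mode == mode
         || op->code == CONST_INT;
}
```

Under this model, a SYMBOL_REF in SImode is accepted when the pattern asks
for VOIDmode or SImode and rejected when it asks for DImode, which is why
dropping the mode from match_operand relaxes the check.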

2011-07-28  Uros Bizjak  ubiz...@gmail.com

* config/i386/predicates.md (pic_32bit_operand): Do not define as
special predicate.  Remove explicit mode checks.

Tested on x86_64-pc-linux-gnu {,-m32}. There is a remote chance that this
patch breaks x32, so let's alert H.J.

Committed to mainline SVN.

Uros.
Index: predicates.md
===
--- predicates.md   (revision 176870)
+++ predicates.md   (working copy)
@@ -366,15 +366,12 @@
 
 ;; Return true when operand is PIC expression that can be computed by lea
 ;; operation.
-(define_special_predicate pic_32bit_operand
+(define_predicate pic_32bit_operand
   (match_code const,symbol_ref,label_ref)
 {
-  if (GET_MODE (op) != SImode
-      && GET_MODE (op) != DImode)
-return false;
-
   if (!flag_pic)
 return false;
+
   /* Rule out relocations that translate into 64bit constants.  */
   if (TARGET_64BIT && GET_CODE (op) == CONST)
 {
@@ -386,6 +383,7 @@
  || XINT (op, 1) == UNSPEC_GOT))
return false;
 }
+
   return symbolic_operand (op, mode);
 })

[PATCH, i386]: Remove tp_or_register_operand predicate

2011-07-29 Thread Uros Bizjak

Hello!

tp_or_register_operand predicate is not used.

2011-07-29  Uros Bizjak  ubiz...@gmail.com

* config/i386/predicates.md (tp_or_register_operand): Remove predicate.

Tested on x86_64-pc-linux-gnu, committed to mainline SVN.

Uros.

Index: predicates.md
===
--- predicates.md   (revision 176924)
+++ predicates.md   (working copy)
@@ -490,11 +490,6 @@
   (and (match_code symbol_ref)
(match_test op == ix86_tls_module_base (

-(define_predicate tp_or_register_operand
-  (ior (match_operand 0 register_operand)
-   (and (match_code unspec)
-   (match_test XINT (op, 1) == UNSPEC_TP
-
 ;; Test for a pc-relative call operand
 (define_predicate constant_call_address_operand
   (match_code symbol_ref)

Re: PATCH: [x32]: Check TARGET_LP64 for SIZE_TYPE/PTRDIFF_TYPE

2011-07-30 Thread Uros Bizjak

On Sat, Jul 30, 2011 at 12:41 AM, H.J. Lu hongjiu...@intel.com wrote:

 X32 is 32bit.  This patch checks TARGET_LP64 for SIZE_TYPE/PTRDIFF_TYPE.
 OK for trunk?

OK, if tested on x32. You didn't say how the patch was tested.

Thanks,
Uros.

[PATCH, testsuite]: Remove .exe.???t.* and .exe.ltrans0.???t.* files from testsuite dir

2011-07-31 Thread Uros Bizjak

Hello!

2011-07-31  Uros Bizjak  ubiz...@gmail.com

* lib/gcc-dg.exp (cleanup-dump): Also remove .exe. and .exe.ltrans0.
dump files.

Tested on x86_64-pc-linux-gnu. OK for mainline?

Uros.
Index: lib/gcc-dg.exp
===
--- lib/gcc-dg.exp  (revision 176960)
+++ lib/gcc-dg.exp  (working copy)
@@ -487,6 +487,8 @@
 # The name might include a list of options; extract the file name.
 set src [file tail [lindex $testcase 0]]
 remove-build-file [file tail $src].$suffix
+remove-build-file [file rootname [file tail $src]].exe.$suffix
+remove-build-file [file rootname [file tail $src]].exe.ltrans0.$suffix
 # -fcompare-debug dumps
 remove-build-file [file tail $src].gk.$suffix
 
@@ -494,6 +496,8 @@
 if [info exists additional_sources] {
foreach srcfile $additional_sources {
remove-build-file [file tail $srcfile].$suffix
+   remove-build-file [file rootname [file tail $srcfile]].exe.$suffix
+   remove-build-file [file rootname [file tail $srcfile]].exe.ltrans0.$suffix
# -fcompare-debug dumps
remove-build-file [file tail $srcfile].gk.$suffix
}

Re: [PATCH, testsuite]: Remove .exe.???t.* and .exe.ltrans0.???t.* files from testsuite dir

2011-07-31 Thread Uros Bizjak

On Sun, Jul 31, 2011 at 11:39 AM, Richard Guenther
richard.guent...@gmail.com wrote:

 2011-07-31  Uros Bizjak  ubiz...@gmail.com

        * lib/gcc-dg.exp (cleanup-dump): Also remove .exe. and .exe.ltrans0.
        dump files.

 Tested on x64-pc-linux-gnu. OK for mainline?

 I think you need to remove all .exe.ltrans[0-9]*. files instead.

Thanks, attached is what I have committed.

2011-07-31  Uros Bizjak  ubiz...@gmail.com

* lib/gcc-dg.exp (cleanup-dump): Also remove .exe. and
.exe.ltrans[0-9]*. dump files.
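The difference between the ltrans0 pattern and the ltrans[0-9]* pattern can
be checked with a small sketch. It assumes the pattern is eventually
expanded with shell-style globbing and uses POSIX fnmatch as a stand-in for
whatever matching the testsuite actually performs; the helper name and the
dump file names used below are made up for illustration.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Return true if NAME matches the shell-style glob PATTERN.
   POSIX fnmatch is used as a stand-in for the testsuite's own
   glob expansion, which is an assumption of this sketch.  */
static bool
glob_match_p (const char *pattern, const char *name)
{
  return fnmatch (pattern, name, 0) == 0;
}
```

With a hypothetical dump file t.exe.ltrans12.123t.optimized, the pattern
"t.exe.ltrans0.*" does not match, while "t.exe.ltrans[0-9]*.*" does. Note
that in a glob, [0-9]* means one digit followed by any string, not a run of
digits as it would in a regular expression.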

Uros.
Index: lib/gcc-dg.exp
===
--- lib/gcc-dg.exp  (revision 176960)
+++ lib/gcc-dg.exp  (working copy)
@@ -487,6 +487,8 @@
 # The name might include a list of options; extract the file name.
 set src [file tail [lindex $testcase 0]]
 remove-build-file [file tail $src].$suffix
+remove-build-file [file rootname [file tail $src]].exe.$suffix
+remove-build-file [file rootname [file tail $src]].exe.ltrans\[0-9\]*.$suffix
 # -fcompare-debug dumps
 remove-build-file [file tail $src].gk.$suffix
 
@@ -494,6 +496,8 @@
 if [info exists additional_sources] {
foreach srcfile $additional_sources {
remove-build-file [file tail $srcfile].$suffix
+   remove-build-file [file rootname [file tail $srcfile]].exe.$suffix
+   remove-build-file [file rootname [file tail $srcfile]].exe.ltrans\[0-9\]*.$suffix
# -fcompare-debug dumps
remove-build-file [file tail $srcfile].gk.$suffix
}

[PATCH, testsuite]: Prevent stale dump files in testsuite directory

2011-07-31 Thread Uros Bizjak

Hello!

2011-07-31  Uros Bizjak  ubiz...@gmail.com

* gcc.dg/tree-ssa/20050314-1.c: Dump and cleanup lim1 pass only.
* gcc.dg/tree-ssa/pr23109.c: Ditto.
* gcc.dg/tree-ssa/loop-7.c: Ditto.
* gcc.dg/tree-ssa/loop-32.c: Ditto.
* gcc.dg/tree-ssa/loop-33.c: Ditto.
* gcc.dg/tree-ssa/loop-34.c: Ditto.
* gcc.dg/tree-ssa/loop-35.c: Ditto.
* gcc.dg/tree-ssa/restrict-3.c: Ditto.
* gcc.dg/tree-ssa/ssa-lim-2.c: Ditto.
* gcc.dg/tree-ssa/ssa-lim-1.c: Ditto.
* gcc.dg/tree-ssa/ssa-lim-3.c: Ditto.
* gcc.dg/tree-ssa/ssa-lim-6.c: Ditto.
* gcc.dg/tree-ssa/structopt-1.c: Ditto.
* g++.dg/tree-ssa/pr33615.C: Ditto.
* g++.dg/tree-ssa/restrict1.C: Ditto.
* c-c++-common/restrict-2.c: Ditto.
* gfortran.dg/pr32921.f: Ditto.
* gcc.dg/tree-ssa/ssa-dse-10.c: Dump and cleanup dse1 pass only.
* gcc.dg/fold-compare-2.c: Dump and cleanup vrp1 pass only.
* gcc.dg/tree-ssa/vrp47.c: Ditto.
* gcc.dg/tree-ssa/pr25501.c: Dump and cleanup mergephi1 pass only.
* gcc.dg/tree-ssa/pr15349.c: Dump and cleanup mergephi2 pass only.
* gcc.dg/tree-ssa/tailrecursion-1.c: Dump and cleanup tailr1 pass only.
* gcc.dg/tree-ssa/tailrecursion-2.c: Ditto.
* gcc.dg/tree-ssa/tailrecursion-3.c: Ditto.
* gcc.dg/tree-ssa/tailrecursion-4.c: Ditto.
* gcc.dg/tree-ssa/tailrecursion-6.c: Ditto.

Tested on x86_64-pc-linux-gnu, committed to mainline SVN.

Uros.
Index: gfortran.dg/pr32921.f
===
--- gfortran.dg/pr32921.f   (revision 176960)
+++ gfortran.dg/pr32921.f   (working copy)
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options -O2 -fdump-tree-lim }
+! { dg-options -O2 -fdump-tree-lim1 }
 ! gfortran -c -m32 -O2 -S junk.f
 !
   MODULE LES3D_DATA
@@ -46,5 +46,5 @@
   RETURN
   END
 ! { dg-final { scan-tree-dump-times stride 4 lim1 } }
-! { dg-final { cleanup-tree-dump lim\[1-2\] } }
+! { dg-final { cleanup-tree-dump lim1 } }
 ! { dg-final { cleanup-modules LES3D_DATA } }
Index: gcc.dg/fold-compare-2.c
===
--- gcc.dg/fold-compare-2.c (revision 176960)
+++ gcc.dg/fold-compare-2.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options -O2 -fdump-tree-vrp } */
+/* { dg-options -O2 -fdump-tree-vrp1 } */
 
 extern void abort (void);
 
@@ -16,5 +16,5 @@
 }
 
 /* { dg-final { scan-tree-dump-times Removing basic block 2 vrp1 } } */
-/* { dg-final { cleanup-tree-dump vrp\[1-2\] } } */
+/* { dg-final { cleanup-tree-dump vrp1 } } */
 
Index: gcc.dg/tree-ssa/vrp47.c
===
--- gcc.dg/tree-ssa/vrp47.c (revision 176960)
+++ gcc.dg/tree-ssa/vrp47.c (working copy)
@@ -4,8 +4,8 @@
jumps when evaluating an && condition.  VRP is not able to optimize
this.  */
 /* { dg-do compile { target { ! mips*-*-* s390*-*-*  avr-*-* mn10300-*-* } } } */
-/* { dg-options -O2 -fdump-tree-vrp -fdump-tree-dom } */
-/* { dg-options -O2 -fdump-tree-vrp -fdump-tree-dom -march=i586 { target { i?86-*-* && ilp32 } } } */
+/* { dg-options -O2 -fdump-tree-vrp1 -fdump-tree-dom1 } */
+/* { dg-options -O2 -fdump-tree-vrp1 -fdump-tree-dom1 -march=i586 { target { i?86-*-* && ilp32 } } } */
 
 int h(int x, int y)
 {
@@ -44,5 +44,5 @@
 /* { dg-final { scan-tree-dump-times x\[^ \]* \[|\] y 1 vrp1 } } */
 /* { dg-final { scan-tree-dump-times x\[^ \]* \\^ 1 1 vrp1 } } */
 
-/* { dg-final { cleanup-tree-dump vrp\[0-9\] } } */
-/* { dg-final { cleanup-tree-dump dom\[0-9\] } } */
+/* { dg-final { cleanup-tree-dump vrp1 } } */
+/* { dg-final { cleanup-tree-dump dom1 } } */
Index: gcc.dg/tree-ssa/pr15349.c
===
--- gcc.dg/tree-ssa/pr15349.c   (revision 176960)
+++ gcc.dg/tree-ssa/pr15349.c   (working copy)
@@ -1,6 +1,6 @@
 /* PR 15349.  Merge two PHI nodes.  */
 /* { dg-do compile } */
-/* { dg-options -O1 -fdump-tree-mergephi } */
+/* { dg-options -O1 -fdump-tree-mergephi2 } */
 
 int
 foo (int a, int b)
@@ -23,4 +23,4 @@
 }
 
 /* { dg-final { scan-tree-dump-times PHI 1 mergephi2} } */
-/* { dg-final { cleanup-tree-dump mergephi\[1-2\] } } */
+/* { dg-final { cleanup-tree-dump mergephi2 } } */
Index: gcc.dg/tree-ssa/loop-32.c
===
--- gcc.dg/tree-ssa/loop-32.c   (revision 176960)
+++ gcc.dg/tree-ssa/loop-32.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options -O2 -fdump-tree-lim-details } */
+/* { dg-options -O2 -fdump-tree-lim1-details } */
 
 int x;
 int a[100];
@@ -43,4 +43,4 @@
 }
 
 /* { dg-final { scan-tree-dump-times Executing store motion of 3 lim1 } } */
-/* { dg-final { cleanup-tree-dump lim\[1-2\] } } */
+/* { dg-final { cleanup-tree-dump lim1 } } */
Index: gcc.dg/tree-ssa/ssa-lim-1.c

[PATCH, i386]: Fix PR49920, unable to find a register to spill in class ‘DIREG’

2011-07-31 Thread Uros Bizjak

Hello!

The problem is similar to PR11001: we should not expand to a
special x86 stringop insn when one of the necessary registers is marked
fixed.

In this particular PR, combine synthesized an instruction that exactly
matched a stringop insn. However, the special registers it needs were
also marked fixed, so reload (obviously) didn't manage to get one.

The attached patch disables stringop patterns when one of the needed
registers is marked fixed, which prevents combine from synthesizing a
stringop insn. Since nothing prevents combine from synthesizing other
stringop patterns, the patch conditionally disables these as well.
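As a rough illustration of the insn conditions added by the patch, here is a
self-contained C model of the fixed-register checks. The enum, the local
fixed_regs array and the helper names are hypothetical stand-ins; in GCC the
register numbers and fixed_regs come from the backend, and a register
typically becomes fixed through -ffixed-REG or a global register variable.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the i386 hard register numbers.  */
enum hard_reg { AX_REG, DX_REG, CX_REG, BX_REG, SI_REG, DI_REG, NUM_REGS };

/* Model of GCC's fixed_regs[]: nonzero means the register allocator
   must not use that register.  */
static char fixed_regs[NUM_REGS];

/* movs reads through %esi and writes through %edi.  */
static bool
movs_available_p (void)
{
  return !(fixed_regs[SI_REG] || fixed_regs[DI_REG]);
}

/* stos stores %eax through %edi.  */
static bool
stos_available_p (void)
{
  return !(fixed_regs[AX_REG] || fixed_regs[DI_REG]);
}

/* rep movs additionally uses %ecx as the counter.  */
static bool
rep_movs_available_p (void)
{
  return !(fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG]);
}
```

In this model, fixing only %ecx leaves plain movs usable but disables
rep movs, and fixing %edi disables all three. That mirrors why each pattern
in the patch tests exactly the registers it needs.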

2011-07-31  Uros Bizjak  ubiz...@gmail.com

PR target/49920
* config/i386/i386.md (strset): Do not expand strset_singleop
when %eax or %edi are fixed.
(*strsetdi_rex_1): Disable when %eax or %edi are fixed.
(*strsetsi_1): Ditto.
(*strsethi_1): Ditto.
(*strsetqi_1): Ditto.
(*rep_stosdi_rex64): Disable when %eax, %ecx or %edi are fixed.
(*rep_stossi): Ditto.
(*rep_stosqi): Ditto.
(cmpstrnsi): Also fail when %ecx is fixed.
(*cmpstrnqi_nz_1): Disable when %ecx, %esi or %edi are fixed.
(*cmpstrnqi_1): Ditto.
(*strlenqi_1): Ditto.
(*strmovdi_rex_1): Disable when %esi or %edi are fixed.
(*strmovsi_1): Ditto.
(*strmovhi_1): Ditto.
(*strmovqi_1): Ditto.
(*rep_movdi_rex64): Disable when %ecx, %esi or %edi are fixed.
(*rep_movsi): Ditto.
(*rep_movqi): Ditto.

testsuite/ChangeLog:

2011-07-31  Uros Bizjak  ubiz...@gmail.com

PR target/49920
* gcc.target/i386/pr49920.c: New test.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32}.  Patch was committed to mainline SVN and will be backported
to release branches.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 176960)
+++ config/i386/i386.md (working copy)
@@ -15421,7 +15421,8 @@
(set (match_operand:DI 1 register_operand =S)
(plus:DI (match_dup 3)
 (const_int 8)))]
-  TARGET_64BIT
+  TARGET_64BIT
+   && !(fixed_regs[SI_REG] || fixed_regs[DI_REG])
   movsq
   [(set_attr type str)
(set_attr memory both)
@@ -15436,7 +15437,7 @@
(set (match_operand:P 1 register_operand =S)
(plus:P (match_dup 3)
(const_int 4)))]
-  
+  !(fixed_regs[SI_REG] || fixed_regs[DI_REG])
   movs{l|d}
   [(set_attr type str)
(set_attr memory both)
@@ -15451,7 +15452,7 @@
(set (match_operand:P 1 register_operand =S)
(plus:P (match_dup 3)
(const_int 2)))]
-  
+  !(fixed_regs[SI_REG] || fixed_regs[DI_REG])
   movsw
   [(set_attr type str)
(set_attr memory both)
@@ -15466,7 +15467,7 @@
(set (match_operand:P 1 register_operand =S)
(plus:P (match_dup 3)
(const_int 1)))]
-  
+  !(fixed_regs[SI_REG] || fixed_regs[DI_REG])
   movsb
   [(set_attr type str)
(set_attr memory both)
@@ -15501,7 +15502,8 @@
(set (mem:BLK (match_dup 3))
(mem:BLK (match_dup 4)))
(use (match_dup 5))]
-  TARGET_64BIT
+  TARGET_64BIT
+   && !(fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG])
   rep{%;} movsq
   [(set_attr type str)
(set_attr prefix_rep 1)
@@ -15520,7 +15522,7 @@
(set (mem:BLK (match_dup 3))
(mem:BLK (match_dup 4)))
(use (match_dup 5))]
-  
+  !(fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG])
   rep{%;} movs{l|d}
   [(set_attr type str)
(set_attr prefix_rep 1)
@@ -15537,7 +15539,7 @@
(set (mem:BLK (match_dup 3))
(mem:BLK (match_dup 4)))
(use (match_dup 5))]
-  
+  !(fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG])
   rep{%;} movsb
   [(set_attr type str)
(set_attr prefix_rep 1)
@@ -15580,7 +15582,9 @@
   operands[3] = gen_rtx_PLUS (Pmode, operands[0],
  GEN_INT (GET_MODE_SIZE (GET_MODE
  (operands[2]))));
-  if (TARGET_SINGLE_STRINGOP || optimize_insn_for_size_p ())
+  /* Can't use this if the user has appropriated eax or edi.  */
+  if ((TARGET_SINGLE_STRINGOP || optimize_insn_for_size_p ())
+      && !(fixed_regs[AX_REG] || fixed_regs[DI_REG]))
 {
   emit_insn (gen_strset_singleop (operands[0], operands[1], operands[2],
  operands[3]));
@@ -15602,7 +15606,8 @@
(set (match_operand:DI 0 register_operand =D)
(plus:DI (match_dup 1)
 (const_int 8)))]
-  TARGET_64BIT
+  TARGET_64BIT
+   && !(fixed_regs[AX_REG] || fixed_regs[DI_REG])
   stosq
   [(set_attr type str)
(set_attr memory store)
@@ -15614,7 +15619,7 @@
(set (match_operand:P 0 register_operand =D)
(plus:P (match_dup 1)
(const_int 4)))]
-  
+  !(fixed_regs[AX_REG] || fixed_regs[DI_REG])
   stos{l|d}
   [(set_attr type str)
(set_attr memory store

Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-08-01 Thread Uros Bizjak

On Mon, Aug 1, 2011 at 10:21 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 Okay, then here is an updated patch

 updated ChangeLog entry:
 2011-07-26  Kirill Yukhin  kirill.yuk...@intel.com

        PR target/49547
        * config.gcc (i[34567]86-*-*): Replace abmintrin.h with
        lzcntintrin.h.
        (x86_64-*-*): Likewise.
        * config/i386/i386.opt (mlzcnt): New.
        * config/i386/abmintrin.h: File removed.
        (__lzcnt_u16, __lzcnt, __lzcnt_u64): Moved to ...
        * config/i386/lzcntintrin.h: ... here. New file.
        (__lzcnt): Rename to ...
        (__lzcnt32): ... this.
        * config/i386/bmiintrin.h (head): Update copyright year.
        (__lzcnt_u16): Removed.
        (__lzcnt_u32): Likewise.
        (__lzcnt_u64): Likewise.
        * config/i386/x86intrin.h: Include lzcntintrin.h when __LZCNT__
        is defined, remove abmintrin.h.
        * config/i386/cpuid.h: New define.
        * config/i386/driver-i386.c (host_detect_local_cpu): Detect
        LZCNT feature.
        * config/i386/i386-c.c (ix86_target_macros_internal): Define
        __LZCNT__ if needed.
        * config/i386/i386.c (ix86_target_string): New option -mlzcnt.
        (ix86_option_override_internal): Handle LZCNT option.
        (ix86_valid_target_attribute_inner_p): Likewise.
        (struct builtin_description bdesc_args) IX86_BUILTIN_CLZS: Update.
        * config/i386/i386.h (TARGET_LZCNT): New.
        (CLZ_DEFINED_VALUE_AT_ZERO): Update.
        * config/i386/i386.md (clzmode2): Update insn constraint.
        (clzmode2_lzcnt): Likewise.
        * doc/invoke.texi: Mention -mlzcnt option.
        * doc/extend.texi: Likewise.

 Bootstrapped successfully.

OK for mainline.

Uros.

[PATCH, i386]: Fix PR49927, ice in spill_failure, at reload1.c:2120

2011-08-01 Thread Uros Bizjak

Hello!

On register-starved i686, the relaxation that allows DImode values
in addresses can lead to register shortages and spill failures.

The attached patch puts back the requirement that we allow only subregs
up to and including WORD_MODE width, nicely packed in a new function.
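The width rule can be sketched in isolation as follows. This is a simplified
model, not the function from the patch: it keeps only the mode-class and
size tests and leaves out the REG_P and register_no_elim_operand checks.
The struct, the mode table and the function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, reduced description of a machine mode.  */
enum mode_class { MODE_INT, MODE_FLOAT };

struct mode_info
{
  enum mode_class cls;
  unsigned size;		/* Size in bytes.  */
};

static const struct mode_info SImode_info = { MODE_INT, 4 };
static const struct mode_info DImode_info = { MODE_INT, 8 };
static const struct mode_info DFmode_info = { MODE_FLOAT, 8 };

/* Accept the inner register of an address SUBREG only if it is an
   integer mode no wider than one word.  UNITS_PER_WORD is passed in
   so that both the 32-bit and the 64-bit case can be tried.  */
static bool
address_subreg_ok (const struct mode_info *inner, unsigned units_per_word)
{
  if (inner->cls != MODE_INT)
    return false;

  /* A wider inner register would occupy more than one hard register.  */
  if (inner->size > units_per_word)
    return false;

  return true;
}
```

With a 4-byte word, the model accepts an SImode inner register and rejects a
DImode one; with an 8-byte word, DImode is accepted. A DFmode inner register
is rejected in both cases because it is not an integer mode.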

2011-08-01  Uros Bizjak  ubiz...@gmail.com

PR target/49927
* config/i386/i386.c (ix86_address_subreg_operand): New.
(ix86_decompose_address): Use ix86_address_subreg_operand.
(ix86_legitimate_address_p): Do not assert that subregs satisfy
register_no_elim_operand in DImode.

testsuite/ChangeLog:

2011-08-01  Uros Bizjak  ubiz...@gmail.com

PR target/49927
* gcc.target/i386/pr49927.c: New test.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 177036)
+++ config/i386/i386.c  (working copy)
@@ -11096,6 +11096,30 @@ ix86_live_on_entry (bitmap regs)
 }
 }
 
+/* Determine if op is suitable SUBREG RTX for address.  */
+
+static bool
+ix86_address_subreg_operand (rtx op)
+{
+  enum machine_mode mode;
+
+  if (!REG_P (op))
+return false;
+
+  mode = GET_MODE (op);
+
+  if (GET_MODE_CLASS (mode) != MODE_INT)
+return false;
+
+  /* Don't allow SUBREGs that span more than a word.  It can lead to spill
+ failures when the register is one word out of a two word structure.  */
+  if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+return false;
+
+  /* Allow only SUBREGs of non-eliminable hard registers.  */
+  return register_no_elim_operand (op, mode);
+}
+
 /* Extract the parts of an RTL expression that is a valid memory address
for an instruction.  Return 0 if the structure of the address is
grossly off.  Return -1 if the address contains ASHIFT, so it is not
@@ -11116,8 +11140,7 @@ ix86_decompose_address (rtx addr, struct
 base = addr;
   else if (GET_CODE (addr) == SUBREG)
 {
-  /* Allow only subregs of DImode hard regs.  */
-  if (register_no_elim_operand (SUBREG_REG (addr), DImode))
+  if (ix86_address_subreg_operand (SUBREG_REG (addr)))
base = addr;
   else
return 0;
@@ -11175,8 +11198,7 @@ ix86_decompose_address (rtx addr, struct
  break;
 
case SUBREG:
- /* Allow only subregs of DImode hard regs in PLUS chains.  */
- if (!register_no_elim_operand (SUBREG_REG (op), DImode))
+ if (!ix86_address_subreg_operand (SUBREG_REG (op)))
return 0;
  /* FALLTHRU */
 
@@ -11228,9 +11250,8 @@ ix86_decompose_address (rtx addr, struct
 {
   if (REG_P (index))
;
-  /* Allow only subregs of DImode hard regs.  */
   else if (GET_CODE (index) == SUBREG
-   && register_no_elim_operand (SUBREG_REG (index), DImode))
+   && ix86_address_subreg_operand (SUBREG_REG (index)))
;
   else
return 0;
@@ -11677,10 +11698,7 @@ ix86_legitimate_address_p (enum machine_
   if (REG_P (base))
reg = base;
   else if (GET_CODE (base) == SUBREG && REG_P (SUBREG_REG (base)))
-   {
- reg = SUBREG_REG (base);
- gcc_assert (register_no_elim_operand (reg, DImode));
-   }
+   reg = SUBREG_REG (base);
   else
/* Base is not a register.  */
return false;
@@ -11702,10 +11720,7 @@ ix86_legitimate_address_p (enum machine_
   if (REG_P (index))
reg = index;
   else if (GET_CODE (index) == SUBREG && REG_P (SUBREG_REG (index)))
-   {
- reg = SUBREG_REG (index);
- gcc_assert (register_no_elim_operand (reg, DImode));
-   }
+   reg = SUBREG_REG (index);
   else
/* Index is not a register.  */
return false;
Index: testsuite/gcc.target/i386/pr49927.c
===
--- testsuite/gcc.target/i386/pr49927.c (revision 0)
+++ testsuite/gcc.target/i386/pr49927.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options -O0 } */
+
+char a[1][1];
+long long b;
+
+void
+foo (void)
+{
+  --a[b][b];
+}
