Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.

2012-06-19 Uros Bizjak
On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote:
> On 2012-06-18 13:19, Uros Bizjak wrote:
>>        /* ??? The builtin doesn't understand that the PCMPESTRI read from
>>        memory need not be aligned.  */
>> -      __asm ("%vpcmpestri $0, (%1), %2"
>> -          : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
>> +      sv = __builtin_ia32_loaddqu ((const char *) s);
>> +      index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
>> +
>
> Surely the comment can be removed too then?

I'm not sure there. The builtin, as defined, expects V16QI operand
with "xm" constraint. Using:

int test (const char *s1)
{
  const v16qi *p = (const v16qi *)(unsigned long) s1;
  return __builtin_ia32_pcmpistri128 (*p, ...);
}

will generate movdqa before pcmpistri.

With the x86 pcmp[ie]str patch, we trick gcc into passing unaligned
memory to the pcmp[ie]str RTX, but we still need __builtin_ia32_loaddqu
in front of __builtin_ia32_pcmpestri128.
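
The alignment point can be illustrated with a small, portable sketch. It is
not code from lex.c: load_unaligned_16 and lane are hypothetical helpers, and
memcpy stands in for __builtin_ia32_loaddqu so that the example does not
depend on x86 builtins. It assumes a compiler that supports the GNU
vector_size attribute. Dereferencing a `const v16qi *` tells the compiler the
address is 16-byte aligned, which is why an aligned load may be emitted; an
explicit unaligned load makes no such promise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same vector type as the one declared in search_line_sse42.  */
typedef char v16qi __attribute__ ((__vector_size__ (16)));

/* Hypothetical helper: copy 16 bytes from an arbitrarily aligned
   address into a vector.  memcpy makes no alignment assumption about
   its source, which is the property an unaligned load provides.  */
static v16qi
load_unaligned_16 (const char *s)
{
  v16qi v;
  assert (s != NULL);
  memcpy (&v, s, sizeof v);
  return v;
}

/* Hypothetical helper: read byte I (0..15) back out of the vector.  */
static char
lane (v16qi v, int i)
{
  char bytes[16];
  assert (i >= 0 && i < 16);
  memcpy (bytes, &v, sizeof bytes);
  return bytes[i];
}
```

Calling load_unaligned_16 (buf + 1) on a char buffer is well defined whatever
the alignment of buf, whereas `*(const v16qi *) (buf + 1)` is not.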

Uros.


Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.

2012-06-19 Uros Bizjak
On Tue, Jun 19, 2012 at 8:38 AM, Uros Bizjak ubiz...@gmail.com wrote:
> On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote:
>> On 2012-06-18 13:19, Uros Bizjak wrote:
>>>        /* ??? The builtin doesn't understand that the PCMPESTRI read from
>>>        memory need not be aligned.  */
>>> -      __asm ("%vpcmpestri $0, (%1), %2"
>>> -          : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
>>> +      sv = __builtin_ia32_loaddqu ((const char *) s);
>>> +      index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
>>> +
>>
>> Surely the comment can be removed too then?
>
> I'm not sure there. The builtin, as defined, expects V16QI operand
> with "xm" constraint. Using:
>
> int test (const char *s1)
> {
>   const v16qi *p = (const v16qi *)(unsigned long) s1;
>   return __builtin_ia32_pcmpistri128 (*p, ...);
> }
>
> will generate movdqa before pcmpistri.

Pedantic correction: __builtin_ia32_pcmpistri128 (v16qi_arg, *p, N);

movdqa in front of this builtin will be generated with -O0.

Uros.


Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.

2012-06-19 Richard Henderson
On 2012-06-18 23:38, Uros Bizjak wrote:
> On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote:
>> On 2012-06-18 13:19, Uros Bizjak wrote:
>>> /* ??? The builtin doesn't understand that the PCMPESTRI read from
>>> memory need not be aligned.  */
>>> -  __asm ("%vpcmpestri $0, (%1), %2"
>>> -  : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
>>> +  sv = __builtin_ia32_loaddqu ((const char *) s);
>>> +  index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
>>> +
>>
>> Surely the comment can be removed too then?
>
> I'm not sure there. The builtin, as defined, expects V16QI operand
> with "xm" constraint.

Fair enough.  I'm ok with the patch as-is.


r~




[PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.

2012-06-18 Uros Bizjak
Hello!

Following the patch that allows unaligned operands in pcmpestri [1],
we can substitute x86 asm in lex.c with equivalent builtin functions.

2012-06-18  Uros Bizjak  ubiz...@gmail.com

* lex.c (search_line_sse42): Use __builtin_ia32_loaddqu and
__builtin_ia32_pcmpestri128 instead of asm.

Bootstrapped and regression tested on x86_64-pc-linux-gnu SSE4.2
target. Also, I have checked that the same code is generated for
the changed function.

OK for mainline?

[1] http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01189.html

Uros.
Index: lex.c
===================================================================
--- lex.c   (revision 188750)
+++ lex.c   (working copy)
@@ -420,6 +420,7 @@ search_line_sse42 (const uchar *s, const uchar *en
 {
   typedef char v16qi __attribute__ ((__vector_size__ (16)));
   static const v16qi search = { '\n', '\r', '?', '\\' };
+  v16qi sv;
 
   uintptr_t si = (uintptr_t)s;
   uintptr_t index;
@@ -439,8 +440,9 @@ search_line_sse42 (const uchar *s, const uchar *en
 
   /* ??? The builtin doesn't understand that the PCMPESTRI read from
 memory need not be aligned.  */
-  __asm ("%vpcmpestri $0, (%1), %2"
-         : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
+  sv = __builtin_ia32_loaddqu ((const char *) s);
+  index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
+
   if (__builtin_expect (index < 16, 0))
goto found;
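
For readers unfamiliar with the instruction, the following scalar model
sketches what the __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0) call in
the hunk above computes. It is a hypothetical reference function, not part of
lex.c, and it assumes the imm8 == 0 mode used here (unsigned bytes, "equal
any", least significant index) with both lengths already in the range 0..16.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model, not part of lex.c.  Returns the index
   of the first byte of STR (STR_LEN valid bytes) that equals any of
   the first SET_LEN bytes of SET, or 16 if there is no such byte.
   This mirrors the value PCMPESTRI leaves in ECX when imm8 is 0.  */
static int
pcmpestri_equal_any (const char *set, int set_len,
                     const char *str, int str_len)
{
  /* Assumption of this sketch: callers pass lengths within 0..16.  */
  assert (set != NULL && str != NULL);
  assert (set_len >= 0 && set_len <= 16);
  assert (str_len >= 0 && str_len <= 16);

  for (int j = 0; j < str_len; j++)
    for (int i = 0; i < set_len; i++)
      if (str[j] == set[i])
        return j;       /* first matching byte of the string operand */

  return 16;            /* no match: index equals the element count */
}
```

The explicit set length of 4 matters: the search vector holds four meaningful
bytes followed by zero padding, and limiting the set to four elements keeps
NUL bytes in the input from being reported as matches. The `index < 16` test
after the call corresponds to the "match found" case of this model.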
 


Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.

2012-06-18 Richard Henderson
On 2012-06-18 13:19, Uros Bizjak wrote:
> /* ??? The builtin doesn't understand that the PCMPESTRI read from
> memory need not be aligned.  */
> -  __asm ("%vpcmpestri $0, (%1), %2"
> -  : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
> +  sv = __builtin_ia32_loaddqu ((const char *) s);
> +  index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
> +


Surely the comment can be removed too then?


r~