Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.
On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote: On 2012-06-18 13:19, Uros Bizjak wrote: /* ??? The builtin doesn't understand that the PCMPESTRI read from memory need not be aligned. */ - __asm (%vpcmpestri $0, (%1), %2 - : =c(index) : r(s), x(search), a(4), d(16)); + sv = __builtin_ia32_loaddqu ((const char *) s); + index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0); + Surely the comment can be removed too then? I'm not sure there. The builtin, as defined, expects V16QI operand with xm constraint. Using: int test (const char *s1) { const v16qi *p = (const v16qi *)(unsigned long) s1; return __builtin_ia32_pcmpistri128 (*p, ...); } will generate movdqa before pcmpistri. With x86 pcmp[ie]str patch, we trick gcc to pass unaligned memory to the pcmp[ie]str RTX, but we still need __builtin_ia32_loaddqu in front of __builtin_ia32_pcmpestri128. Uros.
Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.
On Tue, Jun 19, 2012 at 8:38 AM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote: On 2012-06-18 13:19, Uros Bizjak wrote: /* ??? The builtin doesn't understand that the PCMPESTRI read from memory need not be aligned. */ - __asm (%vpcmpestri $0, (%1), %2 - : =c(index) : r(s), x(search), a(4), d(16)); + sv = __builtin_ia32_loaddqu ((const char *) s); + index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0); + Surely the comment can be removed too then? I'm not sure there. The builtin, as defined, expects V16QI operand with xm constraint. Using: int test (const char *s1) { const v16qi *p = (const v16qi *)(unsigned long) s1; return __builtin_ia32_pcmpistri128 (*p, ...); } will generate movdqa before pcmpistri. Pedantic correction: __builtin_ia32_pcmpistri128 (v16qi_arg, *p, N); movdqa in front of this builtin will be generated with -O0. Uros.
Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.
On 2012-06-18 23:38, Uros Bizjak wrote: On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote: On 2012-06-18 13:19, Uros Bizjak wrote: /* ??? The builtin doesn't understand that the PCMPESTRI read from memory need not be aligned. */ - __asm (%vpcmpestri $0, (%1), %2 - : =c(index) : r(s), x(search), a(4), d(16)); + sv = __builtin_ia32_loaddqu ((const char *) s); + index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0); + Surely the comment can be removed too then? I'm not sure there. The builtin, as defined, expects V16QI operand with xm constraint. Fair enough. I'm ok with the patch as-is. r~
[PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.
Hello! Following the patch that allows unaligned operands in pcmpestri [1], we can substitute x86 asm in lex.c with equivalent builtin functions. 2011-06-18 Uros Bizjak ubiz...@gmail.com * lex.c (search_line_sse42): Use __builtin_ia32_loaddqu and __builtin_ia32_pcmpestri128 instead of asm. Bootstrapped and regression tested on x86_64-pc-linux-gnu SSE4.2 target. Also, I have checked that the same code is generated for changed function. OK for mainline? [1] http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01189.html Uros. Index: lex.c === --- lex.c (revision 188750) +++ lex.c (working copy) @@ -420,6 +420,7 @@ search_line_sse42 (const uchar *s, const uchar *en { typedef char v16qi __attribute__ ((__vector_size__ (16))); static const v16qi search = { '\n', '\r', '?', '\\' }; + v16qi sv; uintptr_t si = (uintptr_t)s; uintptr_t index; @@ -439,8 +440,9 @@ search_line_sse42 (const uchar *s, const uchar *en /* ??? The builtin doesn't understand that the PCMPESTRI read from memory need not be aligned. */ - __asm (%vpcmpestri $0, (%1), %2 -: =c(index) : r(s), x(search), a(4), d(16)); + sv = __builtin_ia32_loaddqu ((const char *) s); + index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0); + if (__builtin_expect (index 16, 0)) goto found;
Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.
On 2012-06-18 13:19, Uros Bizjak wrote: /* ??? The builtin doesn't understand that the PCMPESTRI read from memory need not be aligned. */ - __asm (%vpcmpestri $0, (%1), %2 - : =c(index) : r(s), x(search), a(4), d(16)); + sv = __builtin_ia32_loaddqu ((const char *) s); + index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0); + Surely the comment can be removed too then? r~