Re: k-byte memset/memcpy/strlen builtins

2017-01-12 Thread Martin Sebor
On 01/11/2017 09:16 AM, Robin Dapp wrote:
> Hi,
>
> When examining the performance of some test cases on s390 I realized that we could do better for constructs like 2-byte memcpys or 2-byte/4-byte memsets. Due to some s390-specific architectural properties, we could be faster by e.g. avoiding excessiv

Re: k-byte memset/memcpy/strlen builtins

2017-01-12 Thread Richard Biener
On Thu, Jan 12, 2017 at 9:26 AM, Robin Dapp wrote:
>> Yes, for memset with larger element we could add an optab plus
>> internal function combination and use that when the target wants. Or
>> always use such IFN and fall back to loopy expansion.
>
> So, adding additional patterns in tree-loop-dis
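To make the "fall back to loopy expansion" option concrete, here is a minimal C sketch of what a memset with a larger element means semantically: a k-byte pattern stored repeatedly. The name `memset_k_fallback` and its parameters are hypothetical; this is not GCC's internal function or optab, only the behavior a generic fallback would have to reproduce when the target provides no dedicated pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback for a k-byte memset: store the K-byte PATTERN
   COUNT times, back to back, starting at DST.  Illustrates the
   semantics only; it is not GCC's IFN or its real expansion. */
void memset_k_fallback(void *dst, const void *pattern,
                       size_t k, size_t count)
{
    unsigned char *d = dst;

    for (size_t i = 0; i < count; i++)
        memcpy(d + i * k, pattern, k);  /* one element per iteration */
}
```

For example, `memset_k_fallback(buf, &v, sizeof v, 4)` with an `unsigned short v = 0x1234` leaves every element of a four-element `unsigned short` array equal to `0x1234`, which no single byte-wise `memset` call can produce.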

Re: k-byte memset/memcpy/strlen builtins

2017-01-12 Thread Robin Dapp
> Yes, for memset with larger element we could add an optab plus
> internal function combination and use that when the target wants. Or
> always use such IFN and fall back to loopy expansion.

So, adding additional patterns in tree-loop-distribute.c (and mapping them to dedicated optabs) is fine?
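As a rough illustration of why loop distribution needs an extra pattern here, consider a loop that stores the same 2-byte constant into every element. It can only be rewritten as a plain `memset` when all bytes of the constant are identical; otherwise something like the proposed k-byte memset optab/IFN would be required. The helpers below (`fill_u16`, `is_byte_splat`, `fill_u16_distributed`) are hypothetical and model at run time a decision the compiler would make at compile time for a constant value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A store loop of the kind loop distribution looks for: every element
   of A receives the same 2-byte value C. */
void fill_u16(uint16_t *a, size_t n, uint16_t c)
{
    for (size_t i = 0; i < n; i++)
        a[i] = c;
}

/* Hypothetical helper: true if the K low-order bytes of VALUE are all
   equal (K <= 8).  Only then is a K-byte store loop equivalent to a
   byte-wise memset. */
bool is_byte_splat(uint64_t value, size_t k)
{
    unsigned char first = value & 0xff;

    for (size_t i = 1; i < k; i++)
        if (((value >> (8 * i)) & 0xff) != first)
            return false;
    return true;
}

/* Sketch of the rewrite: use memset when the pattern is a byte splat,
   otherwise keep the loop (where a k-byte memset could be used). */
void fill_u16_distributed(uint16_t *a, size_t n, uint16_t c)
{
    if (is_byte_splat(c, sizeof c))
        memset(a, c & 0xff, n * sizeof *a);
    else
        fill_u16(a, n, c);
}
```

With `c = 0xABAB` the memset path is taken; with `c = 0x1234` it is not, which is exactly the case a dedicated 2-byte memset pattern would cover.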

Re: k-byte memset/memcpy/strlen builtins

2017-01-11 Thread Aaron Sawdey
On Wed, 2017-01-11 at 17:16 +0100, Robin Dapp wrote:
> Hi,

Hi Robin,

I thought I'd share some of what I've run into while doing similar things for the rs6000 target. First off, be aware that glibc does some macro expansion things to try to handle 1/2/3 byte string operations in some cases. Sec
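The general idea of special-casing 1/2/3-byte string operations can be sketched as a wrapper that turns tiny copies into direct byte or halfword accesses. This is a hypothetical illustration, not glibc's actual macros: `small_memcpy` is an invented name, and the claim that the fixed-size `memcpy` calls become single loads/stores describes typical optimizing-compiler behavior rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only (not glibc's code): handle copy sizes
   1, 2 and 3 with direct accesses and defer everything else to memcpy.
   When N is a compile-time constant the switch folds away. */
void *small_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint16_t h;

    switch (n) {
    case 1:
        d[0] = s[0];
        return dst;
    case 2:
        memcpy(&h, s, 2);  /* typically a single 2-byte load */
        memcpy(d, &h, 2);  /* and a single 2-byte store */
        return dst;
    case 3:
        memcpy(&h, s, 2);  /* 2-byte access for the first two bytes */
        memcpy(d, &h, 2);
        d[2] = s[2];       /* plus one byte for the tail */
        return dst;
    default:
        return memcpy(dst, src, n);
    }
}
```

Going through a local `uint16_t` with `memcpy` avoids unaligned or aliasing-unsafe pointer casts while still letting the compiler emit a single halfword access.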

Re: k-byte memset/memcpy/strlen builtins

2017-01-11 Thread Richard Biener
On January 11, 2017 5:16:43 PM GMT+01:00, Robin Dapp wrote:
>Hi,
>
>When examining the performance of some test cases on s390 I realized
>that we could do better for constructs like 2-byte memcpys or
>2-byte/4-byte memsets. Due to some s390-specific architectural
>properties, we could be faster b

k-byte memset/memcpy/strlen builtins

2017-01-11 Thread Robin Dapp
Hi,

When examining the performance of some test cases on s390 I realized that we could do better for constructs like 2-byte memcpys or 2-byte/4-byte memsets. Due to some s390-specific architectural properties, we could be faster by e.g. avoiding excessive unrolling and using dedicated memory instr
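For reference, one reading of "2-byte memcpys or 2-byte/4-byte memsets" is constant-length calls such as the ones below (the other reading, a memset whose element is wider than a byte, is the one taken up later in the thread). The function names are hypothetical; the sketch only shows the source-level constructs and the fact that `memset` writes the byte `(unsigned char)c` into each position.

```c
#include <string.h>

/* A 2-byte memcpy with a constant length. */
void copy2(void *dst, const void *src)
{
    memcpy(dst, src, 2);
}

/* 2-byte and 4-byte memsets with constant lengths.  memset converts C
   to unsigned char, so each byte gets the same value. */
void set2(void *dst, int c)
{
    memset(dst, c, 2);
}

void set4(void *dst, int c)
{
    memset(dst, c, 4);
}
```

With a constant length, compilers typically expand such calls inline instead of calling the library, which is where the target-specific choices about unrolling and dedicated instructions come into play.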