On Fri, Jul 10, 2015 at 02:06:26PM -0600, Jeff Law wrote:
> On 07/10/2015 07:25 AM, Ondřej Bílka wrote:
> >On Fri, Jul 10, 2015 at 12:43:48PM +0200, Jakub Jelinek wrote:
> >>On Fri, Jul 10, 2015 at 11:37:18AM +0200, Uros Bizjak wrote:
> >>>Have you tried new SSE4.2 implementation (the one with asm flags) with
> >>>unrolled loop?
On Fri, Jul 10, 2015 at 12:43:48PM +0200, Jakub Jelinek wrote:
> On Fri, Jul 10, 2015 at 11:37:18AM +0200, Uros Bizjak wrote:
> > Have you tried new SSE4.2 implementation (the one with asm flags) with
> > unrolled loop?
>
> Also, the SSE4.2 implementation looks shorter, so more I-cache friendly,
>