-Original Message-
From: dan.j.willi...@gmail.com [mailto:dan.j.willi...@gmail.com] On
Behalf Of Dan Williams
Sent: Wednesday, August 15, 2012 4:02 AM
To: Liu Qiang-B32616
Cc: dan.j.willi...@intel.com; vinod.k...@intel.com; a...@arndb.de;
herb...@gondor.apana.org.au;
Quoting Johannes Goetzfried
<johannes.goetzfr...@informatik.stud.uni-erlangen.de>:
This patch adds an x86_64/AVX assembler implementation of the Twofish block
cipher. The implementation processes eight blocks in parallel (two 4-block
chunk AVX operations). The table lookups are done in
On Wed, Aug 15, 2012 at 11:42:16AM +0300, Jussi Kivilinna wrote:
I started thinking about the performance on AMD Bulldozer.
vmovq/vmovd/vpextr*/vpinsr* between FPU and general-purpose registers
on AMD CPUs are a lot slower (latencies from 8 to 12 cycles) than on
Intel Sandy Bridge (where
Quoting Borislav Petkov <b...@alien8.de>:
On Wed, Aug 15, 2012 at 11:42:16AM +0300, Jussi Kivilinna wrote:
I started thinking about the performance on AMD Bulldozer.
vmovq/vmovd/vpextr*/vpinsr* between FPU and general-purpose registers
on AMD CPUs are a lot slower (latencies from 8 to 12 cycles)
Ok, here we go. Raw data below.
On Wed, Aug 15, 2012 at 02:00:16PM +0300, Jussi Kivilinna wrote:
And if you tell me exactly how to run the tests and on what kernel,
I'll try to do so.
Ok, the box is a single-socket Bulldozer: AMD FX(tm)-8100 Eight-Core
Processor stepping 02; kernel is
Quoting Borislav Petkov <b...@alien8.de>:
Ok, here we go. Raw data below.
Thanks a lot!
Twofish-avx appears somewhat slower than 3way: from ~9% slower with
256-byte blocks to ~3% slower with 8 KB blocks.
snip
Let me know if you need more tests.
I posted a patch that optimizes twofish-avx a few
On Wed, Aug 15, 2012 at 04:48:54PM +0300, Jussi Kivilinna wrote:
I posted a patch that optimizes twofish-avx a few weeks ago:
http://marc.info/?l=linux-crypto-vger&m=134364845024825&w=2
I'd be interested to know if this patch helps on Bulldozer.
Sure, can you inline it here too please. The
On Wed, Aug 15, 2012 at 04:48:54PM +0300, Jussi Kivilinna wrote:
I posted a patch that optimizes twofish-avx a few weeks ago:
http://marc.info/?l=linux-crypto-vger&m=134364845024825&w=2
I'd be interested to know if this patch helps on Bulldozer.
Sure, can you inline it here too please.
On Wed, Aug 15, 2012 at 05:22:03PM +0300, Jussi Kivilinna wrote:
The patch replaces 'movb' instructions with 'movzbl' to break false
register dependencies and interleaves instructions better for
out-of-order scheduling.
It also moves the common round code to a separate function to reduce
object size.
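
For readers following along, here is a rough C sketch of the idea behind the 'movb' to 'movzbl' change. It is not the patch's assembly. A 'movb' into a byte register leaves the upper bits of the full register untouched, so the result depends on the register's old value; a zero-extending load writes the whole register and has no such dependency. The table contents and the names init_table(), byte_of() and lookup4() are made up for illustration and are not the Twofish tables or kernel functions; which instruction a compiler actually emits for byte_of() is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-entry table with arbitrary contents; it only stands in
 * for a cipher lookup table and is NOT the real Twofish table. */
static uint32_t table[256];

static void init_table(void)
{
    for (uint32_t i = 0; i < 256; i++)
        table[i] = i * 0x01010101u;  /* byte i repeated four times */
}

/* Return byte n (0 = least significant) of x as a zero-extended 32-bit
 * value. The full-width result is what a movzbl-style load produces:
 * every bit of the destination is defined by this operation alone. */
static inline uint32_t byte_of(uint32_t x, unsigned n)
{
    assert(n < 4);
    return (uint32_t)(uint8_t)(x >> (8 * n));
}

/* XOR four table lookups, one per byte of x. The four index extractions
 * are independent of each other, so they can be scheduled out of order. */
static uint32_t lookup4(uint32_t x)
{
    return table[byte_of(x, 0)] ^ table[byte_of(x, 1)] ^
           table[byte_of(x, 2)] ^ table[byte_of(x, 3)];
}
```

Usage: call init_table() once, then lookup4(0x01020304u) XORs the entries at indices 4, 3, 2 and 1.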
Quoting Borislav Petkov <b...@alien8.de>:
On Wed, Aug 15, 2012 at 05:22:03PM +0300, Jussi Kivilinna wrote:
Patch replaces 'movb' instructions with 'movzbl' to break false
register dependencies and interleaves instructions better for
out-of-order scheduling.
Also move common round code to