Re: [PATCH v3 0/4] Patchset to use PCLMULQDQ to accelerate CRC-T10DIF checksum computation

2013-05-13 Thread Herbert Xu
On Wed, May 01, 2013 at 12:52:47PM -0700, Tim Chen wrote:
> Currently the CRC-T10DIF checksum is computed using a generic table lookup
> algorithm.  By switching the checksum to PCLMULQDQ based computation,
> we can speed up the computation by 8x for checksumming 512 bytes and
> even more for larger buffer sizes.  This will improve performance of SCSI
> drivers turning on the CRC-T10DIF checksum.  In our SSD based experiments,
> we have seen disk throughput increase by 3.5x with T10DIF for 512 byte
> block size.
>
> This patch set provides the x86_64 routine using the PCLMULQDQ instruction
> and switches the crc_t10dif library function to use the faster PCLMULQDQ
> based routine when available.
>
> Tim
>
> v3
> 1. Update the crct10dif crypto transform used in the crct10dif library in a
> safe way.
> 2. Load the accelerated t10dif transform for the x86_64 cpus that support it.
> 3. Added generic crct10dif crypto transform.

All applied.  Thanks Tim.
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/4] Patchset to use PCLMULQDQ to accelerate CRC-T10DIF checksum computation

2013-05-06 Thread Tim Chen
On Wed, 2013-05-01 at 12:52 -0700, Tim Chen wrote:
> Currently the CRC-T10DIF checksum is computed using a generic table lookup
> algorithm.  By switching the checksum to PCLMULQDQ based computation,
> we can speed up the computation by 8x for checksumming 512 bytes and
> even more for larger buffer sizes.  This will improve performance of SCSI
> drivers turning on the CRC-T10DIF checksum.  In our SSD based experiments,
> we have seen disk throughput increase by 3.5x with T10DIF for 512 byte
> block size.
>
> This patch set provides the x86_64 routine using the PCLMULQDQ instruction
> and switches the crc_t10dif library function to use the faster PCLMULQDQ
> based routine when available.
>
> Tim
>
> v3
> 1. Update the crct10dif crypto transform used in the crct10dif library in a
> safe way.
> 2. Load the accelerated t10dif transform for the x86_64 cpus that support it.
> 3. Added generic crct10dif crypto transform.

Herbert,

Any feedback on this updated patchset?  Thanks.

Tim



[PATCH v3 0/4] Patchset to use PCLMULQDQ to accelerate CRC-T10DIF checksum computation

2013-05-01 Thread Tim Chen
Currently the CRC-T10DIF checksum is computed using a generic table lookup
algorithm.  By switching the checksum to PCLMULQDQ based computation,
we can speed up the computation by 8x for checksumming 512 bytes and
even more for larger buffer sizes.  This will improve performance of SCSI
drivers turning on the CRC-T10DIF checksum.  In our SSD based experiments,
we have seen disk throughput increase by 3.5x with T10DIF for 512 byte
block size.
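
For readers unfamiliar with the generic table lookup approach mentioned
above, here is a minimal user-space sketch of it.  It assumes the standard
CRC-16/T10-DIF parameters (polynomial 0x8BB7, initial value 0, no bit
reflection, no final XOR).  The names t10dif_init_table and t10dif_generic
are made up for this illustration and are not the kernel's symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1 */
#define T10DIF_POLY 0x8BB7u

/* Illustrative table; filled once by t10dif_init_table(). */
static uint16_t t10dif_table[256];

/* Entry i holds the CRC of the single byte i, computed bit by bit. */
static void t10dif_init_table(void)
{
    for (unsigned int i = 0; i < 256; i++) {
        uint16_t crc = (uint16_t)(i << 8);

        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ T10DIF_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
        t10dif_table[i] = crc;
    }
}

/* One table lookup per input byte, most significant bit first. */
static uint16_t t10dif_generic(uint16_t crc, const unsigned char *buf,
                               size_t len)
{
    for (size_t i = 0; i < len; i++)
        crc = (uint16_t)((crc << 8) ^
                         t10dif_table[((crc >> 8) ^ buf[i]) & 0xFF]);
    return crc;
}
```

Call t10dif_init_table() once, then t10dif_generic(0, buf, len).  The
per-byte dependency on the previous crc value is what limits this approach;
a carry-less multiply routine can instead fold many bytes per step.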

This patch set provides the x86_64 routine using the PCLMULQDQ instruction
and switches the crc_t10dif library function to use the faster PCLMULQDQ
based routine when available.
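
The idea of using a faster routine "when available" can be sketched in
plain C as a function pointer with a portable fallback.  This is only an
illustration of the general pattern: the patches themselves go through the
crypto transform framework, and crc_fn, crc_t10dif_register and
crc_t10dif_dispatch are hypothetical names that do not appear in them.

```c
#include <stddef.h>
#include <stdint.h>

/* Common signature shared by every back end (hypothetical). */
typedef uint16_t (*crc_fn)(uint16_t crc, const unsigned char *buf,
                           size_t len);

/* Portable bit-at-a-time fallback, polynomial 0x8BB7, MSB first. */
static uint16_t crc_t10dif_bitwise(uint16_t crc, const unsigned char *buf,
                                   size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Currently selected back end; starts out as the fallback. */
static crc_fn crc_impl = crc_t10dif_bitwise;

/*
 * Hypothetical hook: an accelerated back end would call this after it
 * has confirmed CPU support.  Passing NULL restores the fallback.
 */
static void crc_t10dif_register(crc_fn fast)
{
    crc_impl = fast ? fast : crc_t10dif_bitwise;
}

/* Callers always use this entry point and never pick a back end. */
static uint16_t crc_t10dif_dispatch(const unsigned char *buf, size_t len)
{
    return crc_impl(0, buf, len);
}
```

Callers stay unchanged whichever back end is active.  The sketch ignores
concurrency; swapping the pointer while other CPUs are computing checksums
needs care, which is presumably what "in a safe way" in the v3 notes refers
to.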

Tim

v3
1. Update the crct10dif crypto transform used in the crct10dif library in a 
safe way.
2. Load the accelerated t10dif transform for the x86_64 cpus that support it.
3. Added generic crct10dif crypto transform.

v2
1. Get rid of unnecessary xmm registers save and restore and fix ENDPROC
position in PCLMULQDQ version of crc t10dif computation.
2. Fix URL to paper reference of CRC computation with PCLMULQDQ.
3. Add one additional tcrypt test case to exercise more code paths through
crc t10dif computation.
4. Fix config dependencies of CRYPTO_CRCT10DIF.

Thanks to Herbert Xu, Matthew Wilcox and Jussi Kivilinna, who reviewed the
patches, and to Keith Busch for testing version 1 of the patch set.


Tim Chen (4):
  Wrap crc_t10dif function call to use crypto transform framework
  Accelerated CRC T10 DIF computation with PCLMULQDQ instruction
  Glue code to cast accelerated CRCT10DIF assembly as a crypto
    transform
  Simple correctness and speed test for CRCT10DIF hash

 arch/x86/crypto/Makefile                |   2
 arch/x86/crypto/crct10dif-pcl-asm_64.S  | 643
 arch/x86/crypto/crct10dif-pclmul_glue.c | 157
 crypto/Kconfig                          |  20
 crypto/Makefile                         |   1
 crypto/crct10dif.c                      | 126
 crypto/tcrypt.c                         |   8
 crypto/testmgr.c                        |  10
 crypto/testmgr.h                        |  33
 include/linux/crc-t10dif.h              |   5
 lib/crc-t10dif.c                        |  95
 11 files changed, 1098 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/crypto/crct10dif-pcl-asm_64.S
 create mode 100644 arch/x86/crypto/crct10dif-pclmul_glue.c
 create mode 100644 crypto/crct10dif.c

-- 
1.7.11.7
