Hi all,

This patch uses a parallel-computation algorithm to improve crc32c
performance on ARM. The algorithm comes from the Intel whitepaper
crc-iscsi-polynomial-crc32-instruction-paper. The input data is divided into
three equal-sized blocks that are processed in parallel (crc0, crc1, crc2)
for each 1024 bytes. One block is 42 (BLK_LENGTH) * 8 (step length:
crc32c_u64) bytes.
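
To illustrate the three-block split described above, here is a minimal
portable sketch. It is not the code from the patch: the function names
(crc32c_raw, crc32c_shift, crc32c_3way) are made up for this example, the Arm
CRC32C instruction is replaced by a bitwise software loop, and the merge step
simply feeds zero bytes through the CRC register, where a real implementation
would multiply by precomputed constants with a carry-less multiply. It only
shows why three independently computed CRCs can be recombined into the CRC of
the whole buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C register update (reflected polynomial 0x82F63B78),
 * without the initial/final inversion. Stands in for the hardware
 * instruction in this sketch. */
static uint32_t crc32c_raw(uint32_t crc, const uint8_t *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Advance a CRC register over n zero bytes. This is the slow, obvious
 * form of the "shift" that an optimized version would do with a
 * carry-less multiply by a constant depending only on n. */
static uint32_t crc32c_shift(uint32_t crc, size_t n)
{
    while (n--)
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* CRC register after 3 * blk bytes, computed as three independent
 * streams. Only crc0 carries the incoming register value; crc1 and
 * crc2 start from zero so they can be merged by XOR after shifting. */
static uint32_t crc32c_3way(uint32_t crc, const uint8_t *p, size_t blk)
{
    uint32_t crc0 = crc32c_raw(crc, p, blk);
    uint32_t crc1 = crc32c_raw(0, p + blk, blk);
    uint32_t crc2 = crc32c_raw(0, p + 2 * blk, blk);

    crc0 = crc32c_shift(crc0, blk) ^ crc1;   /* register after blocks 0 and 1 */
    return crc32c_shift(crc0, blk) ^ crc2;   /* register after all three blocks */
}
```

With blk = 42 * 8 = 336 this covers 1008 bytes per call. The three
crc32c_raw calls have no data dependency on each other, which is what lets
the CPU overlap them when they are interleaved in a single loop.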

Crc32c unit test: https://gist.github.com/gaoxyt/138fd53ca1eead8102eeb9204067f7e4
Crc32c benchmark: https://gist.github.com/gaoxyt/4506c10fc06b3501445e32c4257113e9
It gets a ~2x speedup compared to the linear Arm crc32c instructions.

I'll create a CommitFest entry for this submission.
Any comments or feedback are welcome.


Attachment: 0001-crc32c-parallel-computation-optimization-on-arm.patch