Hi,

#1. It's prohibitively painful to maintain [and develop] code without access to relevant hardware. Is there a way to arrange access to a Nano-based system [remote is preferred]?
#2. Please do provide up-to-date documentation. Sample code doesn't provide the required details.

> This is a patch set which updates PadLock engine for VIA C7
> and Nano CPUs. It refers to AES with ECB/CBC/CFB/OFB, SHA1/224/256,
> RSA sign/verify and RNG, and all of them are accelerated by PadLock
> hardware of VIA C7 and Nano CPUs. Some parts of this patch set are
> based on the codes originally written by Michal Ludvig and Timo Teras.
> The patch set is available for both 32-bit/64-bit GNU compilers and
> MS compilers, and it is produced from OpenSSL-1.0.0-stable branch
> on OpenSSL CVS server. If other versions of this patch is needed, such
> as OpenSSL-0.9.8-stable branch, please tell me, I will send it soon.

The protocol is to develop in the development HEAD branch and then back-port if[!] possible/applicable. As for 1.0.0 and earlier versions, a principal decision was taken to effectively freeze them and accept only genuine bug fixes. This means that the suggested patch won't be committed to CVS. The alternative is to provide something similar to "Intel Acceleration Engine", see http://www.mail-archive.com/openssl-dev@openssl.org/msg29626.html, naturally provided that the new code works in the development branch first.

I've just committed an overhauled version that makes the Padlock engine independent of inline assembler (and therefore independent of the compiler), see http://cvs.openssl.org/chngview?cn=21360. The assembler modules already include SHA primitives (not tested though), but no PMM yet (planned). The x86_64 module was not actually tested, so feedback is appreciated. Either way, this is the new baseline to build upon further.

As for the submission per se. How come bn_mul_mont_padlock doesn't check if the result is larger than the modulus? For reference, crypto/bn/asm/via-mont.pl does it.

crypto/bn/asm/via-mont.pl effectively says that the surrounding C code is critical for performance. Yet we see malloc in the suggested code. But don't rush to replace it with e.g.
alloca, because the "right thing to do" is to allocate buffers in the modular exponentiation subroutine and thus take buffer handling out of the Montgomery multiplication subroutine [as well as out of the loop calling it]. Moreover, in the modular exponentiation context there is no need to have BN_mod_mul_montgomery_padlock, one that handles all possible lengths. See the latest bn_exp.c in the development branch for ideas (for reference, there will be further optimizations). To summarize: the modular exponentiation subroutine should allocate all the buffers for Montgomery multiplication (on the stack or with malloc, depending on how much data is required) and call the latter directly with minimal overhead.

The suggested modular exponentiation follows an implementation path that was shown to be prone to side-channel attacks. Yes, the attack is applicable on multi-core designs, but is there any indication that there won't be multi-core Padlock-capable CPUs in the future? I'd argue that there is no reason for not taking the "constant time" approach.

As for SHA. It was shown that there is a way to use SHA even on pre-Nano, see http://www.mail-archive.com/openssl-dev@openssl.org/msg21787.html. The challenge is to make it multi-thread safe. It would take allocating a dynamic lock and serializing access to the "crash page" allocated at engine load.
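
To make the Montgomery remarks above more concrete, here is a minimal portable C sketch of the shape I have in mind. It is not PadLock code and not what bn_exp.c does: every name in it (cond_sub, mont_n0, mont_mul, mod_exp, MAX_LIMBS) is hypothetical, it assumes 32-bit limbs in little-endian order, an odd modulus, base < modulus, and a caller-supplied rr = R^2 mod m with R = 2^(32*n). It shows three things: the final "is the result larger than the modulus" reduction done without a branch, scratch buffers owned by the exponentiation routine rather than by the multiplication, and a multiply-always loop as a stand-in for a "constant time" exponent scan. Treat the last part as an illustration of a regular operation sequence only, not as a claim of side-channel resistance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LIMBS 64 /* hypothetical cap: 2048-bit operands */

/* r = (top:t >= m) ? t - m : t, with no data-dependent branch; r may alias t. */
static void cond_sub(uint32_t *r, const uint32_t *t, uint32_t top,
                     const uint32_t *m, size_t n)
{
    uint64_t borrow = 0;
    size_t i;
    for (i = 0; i < n; i++) /* pass 1: only the final borrow of t - m */
        borrow = ((uint64_t)t[i] - m[i] - borrow) >> 63;
    /* subtract if there is a top carry or if t - m did not borrow */
    uint32_t mask = (uint32_t)0 - (uint32_t)((top | (borrow ^ 1)) & 1);
    borrow = 0;
    for (i = 0; i < n; i++) { /* pass 2: subtract m or 0 */
        uint64_t d = (uint64_t)t[i] - (m[i] & mask) - borrow;
        r[i] = (uint32_t)d;
        borrow = d >> 63;
    }
}

/* n0 = -m0^(-1) mod 2^32 for odd m0, by Newton iteration. */
static uint32_t mont_n0(uint32_t m0)
{
    uint32_t x = m0; /* already correct to 3 bits */
    for (int i = 0; i < 5; i++)
        x *= 2 - m0 * x; /* each step doubles the number of correct bits */
    return (uint32_t)0 - x;
}

/* r = a * b * R^(-1) mod m (CIOS). t is caller-owned scratch of n + 2 limbs. */
static void mont_mul(uint32_t *r, const uint32_t *a, const uint32_t *b,
                     const uint32_t *m, uint32_t n0, size_t n, uint32_t *t)
{
    size_t i, j;
    for (i = 0; i < n + 2; i++)
        t[i] = 0;
    for (i = 0; i < n; i++) {
        uint64_t c = 0, s;
        for (j = 0; j < n; j++) { /* t += a * b[i] */
            s = (uint64_t)a[j] * b[i] + t[j] + c;
            t[j] = (uint32_t)s;
            c = s >> 32;
        }
        s = (uint64_t)t[n] + c;
        t[n] = (uint32_t)s;
        t[n + 1] = (uint32_t)(s >> 32);
        uint32_t q = (uint32_t)(t[0] * n0); /* makes t + q*m divisible by 2^32 */
        c = ((uint64_t)q * m[0] + t[0]) >> 32;
        for (j = 1; j < n; j++) { /* t = (t + q*m) / 2^32 */
            s = (uint64_t)q * m[j] + t[j] + c;
            t[j - 1] = (uint32_t)s;
            c = s >> 32;
        }
        s = (uint64_t)t[n] + c;
        t[n - 1] = (uint32_t)s;
        t[n] = t[n + 1] + (uint32_t)(s >> 32);
    }
    cond_sub(r, t, t[n], m, n); /* the final reduction the patch omits */
}

/* r = base^e mod m. All buffers live here, none inside mont_mul or the loop. */
static void mod_exp(uint32_t *r, const uint32_t *base, const uint32_t *e,
                    size_t en, const uint32_t *m, const uint32_t *rr, size_t n)
{
    uint32_t t[MAX_LIMBS + 2], acc[MAX_LIMBS], bm[MAX_LIMBS], tmp[MAX_LIMBS];
    uint32_t one[MAX_LIMBS] = {1};
    assert(n >= 1 && n <= MAX_LIMBS && (m[0] & 1));
    uint32_t n0 = mont_n0(m[0]);
    mont_mul(bm, base, rr, m, n0, n, t); /* base in Montgomery form */
    mont_mul(acc, one, rr, m, n0, n, t); /* 1 in Montgomery form */
    for (size_t i = en * 32; i-- > 0;) {
        uint32_t mask = (uint32_t)0 - ((e[i / 32] >> (i % 32)) & 1);
        mont_mul(acc, acc, acc, m, n0, n, t); /* always square */
        mont_mul(tmp, acc, bm, m, n0, n, t);  /* always multiply */
        for (size_t j = 0; j < n; j++)        /* keep the product only if bit set */
            acc[j] = (tmp[j] & mask) | (acc[j] & ~mask);
    }
    mont_mul(r, acc, one, m, n0, n, t); /* leave Montgomery form */
}
```

With a fixed MAX_LIMBS the scratch fits on the stack; for larger operands the same routine would malloc once up front and free once at the end, which still keeps allocation out of the multiplication and out of the loop. A real implementation would replace the bit-by-bit scan with a fixed window and a cache-neutral table lookup.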