On Tue, 2008-12-16 at 19:12 +0800, Andy Polyakov wrote:
> >>> The cipher and digest support is at the granularity of "nid"s, and
> >>> these combine algorithm, key-length, and mode. So if you implement
> >>> support for those cipher,length,mode combinations that can be
> >>> accelerated by AES-NI, your engine will only be invoked for those
> >>> combinations. You're not obliged to implement anything else, and
> >>> indeed there is nothing to be gained by doing so.
> >> The situation is:
> >>
> >> - We implement cbc and ecb mode in engine
> >> - If we implement cfb and ofb in engine too, we will duplicate code of
> >> cfb and ofb mode itself.
>
> The plan is to consolidate mode implementations, so it doesn't have to
> be the case [anymore], see http://cvs.openssl.org/chngview?cn=17692.

Good! Hope that can be merged quickly.

> >> - If we do not implement cfb and ofb in engine, no code duplication,
> >> BUT we can NOT get AES-NI acceleration for AES core block algorithm
> >> (which benefit cfb and ofb too) until we have a "branch" version.
> >
> > OK, I (mis)understood from your original mail that you could only
> > accelerate a subset of modes.
>
> Just to clarify CBC situation. While it's absolutely correct that
> *de*cryption is the one that can take full advantage of pipe-lining,
> dedicated *en*cryption procedure should also be implemented in
> assembler. Why? It doesn't come as surprise that CBC timing is sum of
> time spent in block procedure and time spent performing the block
> chaining. The latter can be underestimated and as block procedure gets
> faster it actually becomes underestimated. I reckon that with 4x faster
> block procedure, C timing for block chaining would be comparable with
> block procedure. This in turn means that overall performance would be
> almost twice as low as if chaining was implemented in assembler. This
> applies to x86_64, on x86 performance loss would be even higher...

OK, I will implement CBC encryption with ASM too.

> > If you can accelerate them all, then
> > please do so by implementing an intel/aes-ni engine. But not by
> > branching in the vanilla implementation.
> >
> >> So my suggestion is:
> >>
> >> - Accelerate AES core block algorithm with "branch" version. Which is
> >> used by cbc, cfb and ofb too.
> >>
> >> - Accelerate AES ecb and ctr? with "engine" version.
> >
> > And my suggestion is:
> >
> > - write an engine for your hardware.
>
> I second it. And additional note. As padlock engine was mentioned, I can
> imagine that the idea of using inline assembler will pop up in the head.
> Please don't! As already mentioned we support other compilers as well
> and it's favorable if gcc-isms can be avoided.
> Well, in 32-bit case it
> might be acceptable (both GNU and Microsoft compilers support inline
> assembler), but not in 64-bit case (GNU is the only one supporting
> inline assembler).

OK. I will use the same format as aes-x86_64.pl.

> As for FIPS. Given current precedent it should be noted that if "branch"
> version is certified, then the branch becomes bound to be taken. In
> other words "branched" version would be prohibited to reach certified
> mode of operation on CPU that does not support the instruction set
> extension in question. Then why does it have to be branched? Having this
> in mind wouldn't it make as much sense to implement module that can be
> used as *drop-in replacement* for aes-[586|x86_64].pl? So that those who
> are willing to pursue certification for given hardware can do so with
> not so much hassle(*)? Would it be effort duplication? Does not have to
> be! Because the code can be used in engine context just as well...
>
> Now to practicalities. What I can do to help. I can put together perl
> scripts for x86_64 and x86, which can be used as drop-in replacement for
> aes-[586|x86_64].pl as well as in engine context. Note that "drop-in
> replacement" implies presence of CBC procedure, though I'd be reluctant
> to implement pipe-lined version. At least not without further
> consideration, because it might turn out that pipe-lined version doesn't
> have to be monolithic. Most notably one can break decryption into
> multi-block ECB and separate multi-block chaining to minimize developing
> effort.
>
> A.

Thank you very much. I can change the format to perl format, but I need
your help to test it on 64-bit Windows and to fix some issues, such as SSE
operands. I think the AES-NI based pipelined implementation can be a
starting point for a general version.

Best Regards,
Huang Ying
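
For reference, the CBC structure discussed in this thread can be sketched in C roughly as below. This is only an illustration under stated assumptions, not code from OpenSSL or from the proposed engine: the block functions are a toy invertible transform standing in for the AES block procedure, and all function names are hypothetical. It shows why encryption is serial (each block needs the previous ciphertext block) and how decryption can be split into a multi-block ECB pass followed by a separate chaining pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16 /* AES block size in bytes */

/* Toy invertible block transform, a stand-in for the AES block
 * procedure. NOT secure; it only keeps the sketch self-contained.
 * in and out must not overlap. */
void toy_encrypt_block(const uint8_t *in, uint8_t *out, const uint8_t *key)
{
    for (size_t i = 0; i < BLK; i++)
        out[(i + 1) % BLK] = (uint8_t)((in[i] ^ key[i]) + 0x5a);
}

/* Inverse of toy_encrypt_block. in and out must not overlap. */
void toy_decrypt_block(const uint8_t *in, uint8_t *out, const uint8_t *key)
{
    for (size_t i = 0; i < BLK; i++) {
        uint8_t t = (uint8_t)(in[(i + 1) % BLK] - 0x5a);
        out[i] = (uint8_t)(t ^ key[i]);
    }
}

/* CBC encryption: block n depends on ciphertext block n-1, so the loop
 * is inherently serial and the chaining XOR is on the critical path. */
void cbc_encrypt(const uint8_t *in, uint8_t *out, size_t nblocks,
                 const uint8_t *key, const uint8_t *iv)
{
    const uint8_t *prev = iv;
    uint8_t tmp[BLK];

    assert(in != out); /* this sketch is out-of-place only */
    for (size_t n = 0; n < nblocks; n++) {
        for (size_t i = 0; i < BLK; i++)
            tmp[i] = (uint8_t)(in[n * BLK + i] ^ prev[i]);
        toy_encrypt_block(tmp, out + n * BLK, key);
        prev = out + n * BLK;
    }
}

/* CBC decryption split into two passes, as suggested in the thread. */
void cbc_decrypt_split(const uint8_t *in, uint8_t *out, size_t nblocks,
                       const uint8_t *key, const uint8_t *iv)
{
    assert(in != out); /* pass 2 still needs the untouched ciphertext */

    /* Pass 1: multi-block ECB decryption. Blocks are independent, so
     * this is the part a pipelined implementation could overlap. */
    for (size_t n = 0; n < nblocks; n++)
        toy_decrypt_block(in + n * BLK, out + n * BLK, key);

    /* Pass 2: chaining. XOR each block with the previous ciphertext
     * block, or with the IV for the first block. */
    for (size_t n = 0; n < nblocks; n++) {
        const uint8_t *prev = n ? in + (n - 1) * BLK : iv;
        for (size_t i = 0; i < BLK; i++)
            out[n * BLK + i] ^= prev[i];
    }
}
```

Calling cbc_encrypt() and then cbc_decrypt_split() with the same key and IV on separate buffers should return the original plaintext. In a real implementation the two decryption passes would operate on bounded chunks so that the data stays in cache between them.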