On Tue, 2008-12-16 at 19:12 +0800, Andy Polyakov wrote:
> >>> The cipher and digest support is at the granularity of "nid"s, and
> >>> these combine algorithm, key-length, and mode. So if you implement
> >>> support for those cipher,length,mode combinations that can be
> >>> accelerated by AES-NI, your engine will only be invoked for those
> >>> combinations. You're not obliged to implement anything else, and
> >>> indeed there is nothing to be gained by doing so.
> >> The situation is:
> >>
> >> - We implement cbc and ecb mode in engine
> >> - If we implement cfb and ofb in engine too, we will duplicate code of
> >> cfb and ofb mode itself.
> 
> The plan is to consolidate mode implementations, so it doesn't have to 
> be the case [anymore], see http://cvs.openssl.org/chngview?cn=17692.

Good! Hope that can be merged quickly.
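
To make the nid granularity concrete for myself, here is a minimal, self-contained sketch of a per-nid selector. It only mirrors the shape of the engine cipher callback (a NULL cipher pointer asks for the list of supported nids, otherwise one nid is looked up). The `X_AES_*` constants, `toy_cipher` and `aesni_ciphers` are hypothetical stand-ins, not OpenSSL definitions, and the real callback also takes an `ENGINE *` argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for OpenSSL's NID_* constants; values are
 * illustrative only and do not match the real numbering. */
enum { X_AES_128_ECB = 1, X_AES_128_CBC, X_AES_128_CFB, X_AES_128_OFB };

/* Hypothetical stand-in for EVP_CIPHER. */
typedef struct {
    int nid;
    const char *name;
} toy_cipher;

static const toy_cipher aesni_ecb = { X_AES_128_ECB, "aes-128-ecb" };
static const toy_cipher aesni_cbc = { X_AES_128_CBC, "aes-128-cbc" };

/* Only the combinations the engine accelerates are advertised. */
static const int aesni_nids[] = { X_AES_128_ECB, X_AES_128_CBC };

/* With cipher == NULL: report the supported nids and return their count.
 * Otherwise: look up one nid, return 1 if handled, 0 if not. */
static int aesni_ciphers(const toy_cipher **cipher, const int **nids, int nid)
{
    if (cipher == NULL) {
        assert(nids != NULL);
        *nids = aesni_nids;
        return (int)(sizeof(aesni_nids) / sizeof(aesni_nids[0]));
    }
    switch (nid) {
    case X_AES_128_ECB:
        *cipher = &aesni_ecb;
        return 1;
    case X_AES_128_CBC:
        *cipher = &aesni_cbc;
        return 1;
    default:
        /* cfb/ofb are not advertised, so the caller falls back
         * to the default implementation for them. */
        *cipher = NULL;
        return 0;
    }
}
```

With this shape, a request for a cfb or ofb nid is simply declined, which is the "engine only for cbc and ecb" situation described above.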

> >> - If we do not implement cfb and ofb in engine, no code duplication,
> >> BUT we can NOT get AES-NI acceleration for AES core block algorithm
> >> (which benefit cfb and ofb too) until we have a "branch" version.
> > 
> > OK, I (mis)understood from your original mail that you could only 
> > accelerate a subset of modes.
> 
> Just to clarify CBC situation. While it's absolutely correct that 
> *de*cryption is the one that can take full advantage of pipe-lining, 
> dedicated *en*cryption procedure should also be implemented in 
> assembler. Why? It doesn't come as surprise that CBC timing is sum of 
> time spent in block procedure and time spent performing the block 
> chaining. The latter is easily underestimated, and as the block procedure
> gets faster it becomes a larger share of the total. I reckon that with 4x
> faster block procedure, C timing for block chaining would be comparable with
> block procedure. This in turn means that overall performance would be 
> almost twice as low as if chaining was implemented in assembler. This 
> applies to x86_64, on x86 performance loss would be even higher...

OK, I will implement CBC encryption with ASM too.
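
For reference, a rough C sketch of the loop in question, to show where the chaining cost sits relative to the block procedure. This is only an illustration under assumptions: `block_fn`, `toy_encrypt` and `cbc_encrypt` are hypothetical names, and `toy_encrypt` is a trivial placeholder for the AES block procedure, not a cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 16

/* Hypothetical block primitive: encrypts one 16-byte block in place. */
typedef void (*block_fn)(unsigned char blk[BLK], const void *key);

/* Toy placeholder for the AES block procedure (NOT a cipher):
 * adds one key byte to every byte of the block. */
static void toy_encrypt(unsigned char blk[BLK], const void *key)
{
    unsigned char k = *(const unsigned char *)key;
    for (int i = 0; i < BLK; i++)
        blk[i] = (unsigned char)(blk[i] + k);
}

/* CBC encryption: each block depends on the previous ciphertext block,
 * so blocks cannot be processed in parallel. Per block, the work is the
 * chaining (XOR plus IV update) and one call to the block procedure. */
static void cbc_encrypt(const unsigned char *in, unsigned char *out,
                        size_t len, const void *key,
                        unsigned char iv[BLK], block_fn enc)
{
    assert(len % BLK == 0);
    for (; len >= BLK; len -= BLK, in += BLK, out += BLK) {
        for (int i = 0; i < BLK; i++)
            out[i] = in[i] ^ iv[i];   /* block chaining */
        enc(out, key);                /* block procedure */
        memcpy(iv, out, BLK);         /* ciphertext becomes next IV */
    }
}
```

As the block procedure gets faster, the XOR and copy in this loop take a larger share of the total time, which is the reason to move the whole loop into assembler.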

> > If you can accelerate them all, then 
> > please do so by implementing an intel/aes-ni engine. But not by 
> > branching in the vanilla implementation.
> > 
> >> So my suggestion is:
> >>
> >> - Accelerate AES core block algorithm with "branch" version. Which is
> >> used by cbc, cfb and ofb too.
> >>
> >> - Accelerate AES ecb and ctr? with "engine" version.
> > 
> > And my suggestion is:
> > 
> > - write an engine for your hardware.
> 
> I second it. And additional note. As padlock engine was mentioned, I can 
> imagine that the idea of using inline assembler will pop up in the head. 
> Please don't! As already mentioned we support other compilers as well 
> and it's favorable if gcc-isms can be avoided. Well, in 32-bit case it 
> might be acceptable (both GNU and Microsoft compilers support inline 
> assembler), but not in 64-bit case (GNU is the only one supporting 
> inline assembler).

OK. I will use the same format as aes-x86_64.pl.

> As for FIPS. Given current precedent it should be noted that if "branch" 
> version is certified, then the branch becomes bound to be taken. In 
> other words "branched" version would be prohibited to reach certified 
> mode of operation on CPU that does not support the instruction set 
> extension in question. Then why does it have to be branched? Having this 
> in mind wouldn't it make as much sense to implement module that can be 
> used as *drop-in replacement* for aes-[586|x86_64].pl? So that those who 
> are willing to pursue certification for given hardware can do so with 
> not so much hassle(*)? Would it be effort duplication? Does not have to 
> be! Because the code can be used in engine context just as well...
> 
> Now to practicalities. What I can do to help. I can put together perl 
> scripts for x86_64 and x86, which can be used as drop-in replacement for 
> aes-[586|x86_64].pl as well as in engine context. Note that "drop-in 
> replacement" implies presence of CBC procedure, though I'd be reluctant 
> to implement pipe-lined version. At least not without further 
> consideration, because it might turn out that pipe-lined version doesn't 
> have to be monolithic. Most notably one can break decryption into 
> multi-block ECB and separate multi-block chaining to minimize development 
> effort. A.

Thank you very much. I can convert the code to the perl format, but I need
your help to test it on 64-bit Windows and to fix some issues such as SSE
operands.

I think the AES-NI based pipelined implementation can be a starting point
for a general version.
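
A minimal sketch of the non-monolithic split mentioned above, as I understand it: CBC decryption done as one multi-block ECB pass followed by a separate chaining pass. All names here are hypothetical, `toy_ecb_decrypt` is a trivial placeholder for a pipelined multi-block routine (not a cipher), and the sketch assumes out-of-place operation only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 16

/* Toy placeholder for a pipelined multi-block ECB decrypt (NOT a cipher):
 * subtracts the key byte from every byte. Blocks are independent here,
 * which is what allows a real implementation to pipeline them. */
static void toy_ecb_decrypt(const unsigned char *in, unsigned char *out,
                            size_t nblocks, unsigned char key)
{
    for (size_t i = 0; i < nblocks * BLK; i++)
        out[i] = (unsigned char)(in[i] - key);
}

/* Separate chaining pass: XOR block 0 with the IV and block n with
 * ciphertext block n-1. */
static void cbc_unchain(unsigned char *out, const unsigned char *ct,
                        size_t nblocks, const unsigned char iv[BLK])
{
    if (nblocks == 0)
        return;
    for (size_t i = 0; i < BLK; i++)
        out[i] ^= iv[i];
    for (size_t i = BLK; i < nblocks * BLK; i++)
        out[i] ^= ct[i - BLK];
}

/* CBC decryption as two passes. Out-of-place only, because the chaining
 * pass still needs the original ciphertext. */
static void cbc_decrypt(const unsigned char *ct, unsigned char *out,
                        size_t len, unsigned char key, unsigned char iv[BLK])
{
    assert(len % BLK == 0 && ct != out);
    if (len == 0)
        return;
    toy_ecb_decrypt(ct, out, len / BLK, key);
    cbc_unchain(out, ct, len / BLK, iv);
    memcpy(iv, ct + len - BLK, BLK);  /* last ciphertext block is next IV */
}
```

An in-place variant would have to save the ciphertext blocks before overwriting them, so the real routine may look quite different.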

Best Regards,
Huang Ying
