Hello,

I'm developing an OpenSSL engine to take advantage of various encryption algorithms that can be loaded into an FPGA residing in the host computer system (an SGI Altix, FWIW).
These algorithms see an approximately 2x speedup if the FPGA hardware can directly access the system memory which contains the input and output buffers (i.e. the FPGA device becomes a DMA master), rather than having the CPU push the data into the FPGA.

To achieve this, the system memory of course needs to be locked down (i.e. non-swappable), be contiguous over the length of the entire buffer (possibly hundreds of megabytes), and obey certain alignment restrictions (128 bytes in our particular case). This can all be achieved fairly readily, as long as the application programmer is aware of it, by allocating suitable memory, passing it as the input and output buffers to EVP_EncryptUpdate(), and ensuring that only whole multiples of the encryption block length are passed down to EVP_EncryptUpdate(). One suitable way to do this on Linux is to perform the allocations through the hugetlb filesystem.

However, there is a problem with the tail data processing that may occur in EVP_EncryptFinal_ex(), or in EVP_EncryptUpdate() when the provided data doesn't fill an encryption block. In these functions, ctx->cipher->do_cipher() is called using ctx->buf as the input data source. This buffer is not allocated in any special manner that would ensure the size, alignment, and page locking we require.

I'm not sure who the authority on this would be, but in general do you think it would be accepted for inclusion in mainline OpenSSL if I provided an extension to the encryption engine interface that allocates this buffer via an engine entry point (i.e. alongside do_cipher, init, finish, and ctrl)? I'd also like to expose this as a top-level EVP_CIPHER_CTX_malloc() function, so that applications could perform engine-optimized allocations without needing to be aware of the specifics of how that allocation should occur, leaving those specifics to the engine code.
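
To make the shape of such an entry point concrete, here is a minimal sketch only, not OpenSSL code. Everything in it is hypothetical: sketch_cipher stands in for the cipher method table, alloc_buf/free_buf stand in for the proposed entry point, and sketch_ctx_malloc() stands in for the proposed EVP_CIPHER_CTX_malloc(). The example engine allocator uses posix_memalign() and mlock() to get 128-byte alignment and page locking; it does not provide physical contiguity, which a real engine would need to obtain some other way (for instance from hugetlb-backed memory).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define DMA_ALIGN 128   /* alignment requirement quoted in the message */

/* Hypothetical stand-in for a cipher method table; NOT OpenSSL's EVP_CIPHER. */
typedef struct sketch_cipher {
    size_t block_size;
    void *(*alloc_buf)(size_t len);          /* hypothetical new entry point */
    void  (*free_buf)(void *p, size_t len);  /* matching release hook */
} sketch_cipher;

/* Round len up to the next multiple of align. */
static size_t round_up(size_t len, size_t align)
{
    assert(align != 0);
    return (len + align - 1) / align * align;
}

/* Largest whole multiple of the cipher block size that fits in len. */
static size_t whole_blocks(size_t len, size_t block)
{
    assert(block != 0);
    return len - len % block;
}

/* True if p satisfies the DMA alignment requirement. */
static int is_dma_aligned(const void *p)
{
    return ((uintptr_t)p % DMA_ALIGN) == 0;
}

/* Example engine allocator: aligned and, if permitted, locked in RAM.
 * Assumption: physical contiguity is NOT provided by this call. */
static void *fpga_alloc(size_t len)
{
    void *p = NULL;
    size_t padded = round_up(len, DMA_ALIGN);

    if (padded == 0 || posix_memalign(&p, DMA_ALIGN, padded) != 0)
        return NULL;
    /* Locking can fail under RLIMIT_MEMLOCK; this sketch only reports it. */
    if (mlock(p, padded) != 0)
        perror("mlock (continuing with unlocked buffer)");
    return p;
}

static void fpga_free(void *p, size_t len)
{
    if (p != NULL) {
        munlock(p, round_up(len, DMA_ALIGN));
        free(p);
    }
}

/* Hypothetical top-level wrapper: use the engine hook when present,
 * otherwise fall back to plain malloc()/free(). */
static void *sketch_ctx_malloc(const sketch_cipher *c, size_t len)
{
    return c->alloc_buf != NULL ? c->alloc_buf(len) : malloc(len);
}

static void sketch_ctx_free(const sketch_cipher *c, void *p, size_t len)
{
    if (c->free_buf != NULL)
        c->free_buf(p, len);
    else
        free(p);
}
```

An application would call sketch_ctx_malloc() for its input and output buffers and pass only whole_blocks(len, block_size) bytes per update call, while the library could use the same hook for its internal tail buffer. Ciphers with no hook set keep the ordinary malloc() behaviour.
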

If there were another way around this I would certainly take it, but as this is a limitation of the DMA hardware, and DMA gives us such a substantial boost in performance (again, approximately 2x over PIO transfers from the CPU), it seems a reasonable thing to do.

Thanks,
Brent Casavant

--
Brent Casavant                  All music is folk music.  I ain't
[EMAIL PROTECTED]               never heard a horse sing a song.
Silicon Graphics, Inc.              -- Louis Armstrong
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                             [EMAIL PROTECTED]
Automated List Manager                               [EMAIL PROTECTED]
