Hello,
I've started looking into adding SSL hardware acceleration for our upcoming
Luna 340 based NSP board. There seems to be no well-defined way of adding hw
acceleration or of handling keys which are not stored in host RAM. We are aiming
at accelerating the handshake phase as well as the actual payload
encryption/decryption, and at releasing the result back to the OpenSSL project.
I'd like to run a couple of things past you to see if what I've come up with
so far is not blatantly wrong :)
The accelerator offers a set of fairly high-level functions that let you
generate pub/priv key pairs, derive keys, speed up the DH/RSA key
exchange, do packet crypto, etc. The main idea here is to accelerate session
setup/renegotiation and any subsequent payload processing.
The additions I have in mind require some sort of mechanism to store
key handles (e.g. a 32-bit value) representing actual keys stored on the chip.
So far I've found three ways of doing it. First, by using the ex_data struct
in the X509 and EVP_PKEY structures. Second, by using the key buffers
themselves. Third, by adding a new field to X509/EVP_PKEY representing the
on-chip key handle. I'm leaning towards the ex_data approach as it seems to
disturb the least amount of existing code. Is there a better way of
approaching it?
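To show what I mean by the ex_data approach, here is a minimal standalone
sketch. It does not use the real OpenSSL structures: ex_data_t, key_obj_t,
nsp_key_handle_t and all the function names are made up for illustration, and
the slot array only mimics the idea of an application-owned pointer hanging
off a key object.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-object ex_data: a few application-owned
 * pointers, indexed by a slot number reserved at startup. */
#define EX_SLOTS 4

typedef struct {
    void *slot[EX_SLOTS];
} ex_data_t;

/* Hypothetical stand-in for a key object; not an OpenSSL type. */
typedef struct {
    unsigned char *key_bytes;   /* NULL when the key lives on the chip */
    ex_data_t ex;
} key_obj_t;

/* The on-chip key is referred to only by an opaque 32-bit handle. */
typedef uint32_t nsp_key_handle_t;

static int next_index = 0;

/* Reserve a slot index once; returns -1 when no slots are left. */
int ex_new_index(void)
{
    return next_index < EX_SLOTS ? next_index++ : -1;
}

/* Attach (or overwrite) the handle on a key object; 1 on success. */
int key_set_handle(key_obj_t *k, int idx, nsp_key_handle_t h)
{
    assert(k != NULL);
    if (idx < 0 || idx >= EX_SLOTS)
        return 0;
    nsp_key_handle_t *p = k->ex.slot[idx];
    if (p == NULL) {
        p = malloc(sizeof *p);
        if (p == NULL)
            return 0;
        k->ex.slot[idx] = p;
    }
    *p = h;
    return 1;
}

/* Fetch the handle; returns 0 if this key has none (software key). */
int key_get_handle(const key_obj_t *k, int idx, nsp_key_handle_t *out)
{
    assert(k != NULL && out != NULL);
    if (idx < 0 || idx >= EX_SLOTS || k->ex.slot[idx] == NULL)
        return 0;
    *out = *(const nsp_key_handle_t *)k->ex.slot[idx];
    return 1;
}

/* Drop the handle when the key object is torn down. */
void key_clear_handle(key_obj_t *k, int idx)
{
    assert(k != NULL);
    if (idx < 0 || idx >= EX_SLOTS)
        return;
    free(k->ex.slot[idx]);
    k->ex.slot[idx] = NULL;
}
```

The point is that a key with no handle attached simply falls through to the
existing software path, so none of the existing structures need to change.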
The second thing that needs to be done is accelerating the server/client key
exchange, i.e. the RSA & DH operations in ssl3_send_server_key_exchange
(DH_generate_key and RSA_sign in particular) as well as in
ssl3_get_client_key_exchange (RSA_private_decrypt & DH_compute_key followed
by generate_master_secret(), which the NSP can perform in one step).
Once the master secret is generated (the NSP stores it onboard), all subsequent
payload processing happens onboard (i.e. the API is called with the SSL session
id, master key handle and payload). The NSP computes the MAC/encryption and
returns the encrypted/decrypted data.
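Roughly, the calling pattern looks like the sketch below. This is NOT the real
NSP API: nsp_unwrap_master and nsp_process_record are hypothetical names, the
"device" is just a static table, and the XOR is a placeholder so the example
can run, not real crypto. What it is meant to show is that the host only ever
holds handles, never the master secret itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t nsp_handle_t;   /* 0 is reserved to mean "no key" */

/* ---- mock device side: secrets exist only in this table ---- */
#define MOCK_MAX_KEYS 8
static uint8_t mock_secret[MOCK_MAX_KEYS];
static uint32_t mock_used = 0;

/* Hypothetical one-step call: unwrap the client's key exchange blob with
 * an on-chip private key and derive the master secret on the device.
 * Only a handle comes back to the host. Returns 0 on failure. */
nsp_handle_t nsp_unwrap_master(nsp_handle_t priv_key,
                               const uint8_t *blob, size_t len)
{
    assert(blob != NULL);
    (void)priv_key;              /* the mock has no real private keys */
    if (mock_used >= MOCK_MAX_KEYS || len == 0)
        return 0;
    uint8_t s = 0;
    for (size_t i = 0; i < len; i++)
        s = (uint8_t)(s * 31u + blob[i]);   /* placeholder derivation */
    mock_secret[mock_used] = s;
    return ++mock_used;          /* handles are 1-based */
}

/* Hypothetical record call: the host passes session id, master handle and
 * payload; the device returns the processed bytes. Returns the number of
 * bytes written, or -1 on a bad handle or a too-small output buffer. */
int nsp_process_record(uint32_t session_id, nsp_handle_t master,
                       const uint8_t *in, size_t in_len,
                       uint8_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    (void)session_id;            /* unused by the mock */
    if (master == 0 || master > mock_used || out_cap < in_len)
        return -1;
    uint8_t k = mock_secret[master - 1];
    for (size_t i = 0; i < in_len; i++)
        out[i] = in[i] ^ k;      /* placeholder, NOT real encryption */
    return (int)in_len;
}
```

On the host side the SSL session would then only need to carry the master
handle next to the session id.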
The obvious solution would be to 'override' ssl3_accept so that it
calls a specialized ssl3_send_server_key_exchange & ssl3_get_client_key_exchange
for the handshake phase, and then to call a different ssl3_get_record and
do_ssl3_write to perform payload crypto onboard. What I don't like about this
approach is that it doesn't really provide any framework for adding different
acceleration boards. You'd simply end up with a lot of slightly changed
ssl3_accept FSMs if some crypto board can accelerate only a part of the
handshake in a way that requires you to fiddle with that fn. The alternative
is a lot of ifdefs in the original ssl3_accept, which is just as ugly imo.
Basically, does anyone see a better way of adding a hardware acceleration
layer generic enough to handle accel boards from different vendors?
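To make the question concrete, one shape such a layer might take is a table of
optional hooks per board, with the stock software routine used wherever a hook
is missing. This is only a sketch under my own assumptions: accel_method,
ssl_sketch and every function below are hypothetical names, nothing here exists
in OpenSSL, and the routines just bump counters instead of doing any crypto.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ssl_sketch ssl_sketch;

/* Hypothetical per-board method table. A NULL hook means "this board
 * cannot do that step", so the software path is used instead. */
typedef struct {
    const char *name;
    int (*server_key_exchange)(ssl_sketch *s);
    int (*client_key_exchange)(ssl_sketch *s);
    int (*record_crypt)(ssl_sketch *s, int encrypt);
} accel_method;

/* Hypothetical stand-in for the SSL state; counters are for the demo. */
struct ssl_sketch {
    const accel_method *accel;   /* NULL means pure software */
    int hw_calls;
    int sw_calls;
};

/* Stand-ins for the existing software routines. */
static int sw_server_kx(ssl_sketch *s) { s->sw_calls++; return 1; }
static int sw_client_kx(ssl_sketch *s) { s->sw_calls++; return 1; }
static int sw_record(ssl_sketch *s, int encrypt)
{
    (void)encrypt;
    s->sw_calls++;
    return 1;
}

/* The state machine calls these wrappers and never needs to know which
 * board, if any, is installed. */
int do_server_key_exchange(ssl_sketch *s)
{
    assert(s != NULL);
    if (s->accel != NULL && s->accel->server_key_exchange != NULL)
        return s->accel->server_key_exchange(s);
    return sw_server_kx(s);
}

int do_client_key_exchange(ssl_sketch *s)
{
    assert(s != NULL);
    if (s->accel != NULL && s->accel->client_key_exchange != NULL)
        return s->accel->client_key_exchange(s);
    return sw_client_kx(s);
}

int do_record_crypt(ssl_sketch *s, int encrypt)
{
    assert(s != NULL);
    if (s->accel != NULL && s->accel->record_crypt != NULL)
        return s->accel->record_crypt(s, encrypt);
    return sw_record(s, encrypt);
}

/* Example board that accelerates only the server key exchange. */
static int hw_server_kx(ssl_sketch *s) { s->hw_calls++; return 1; }

const accel_method partial_board = {
    "partial-board", hw_server_kx, NULL, NULL
};
```

A board that handles only part of the handshake fills in only the hooks it
supports, so there is a single ssl3_accept-style state machine and no ifdefs.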
---------------------------------------------
Maciek Klimkowski
Software Developer - Chrysalis-ITS
1688 Woodward Drive - Ottawa, Ontario K2C 3R7
Tel. (613) 723-5076 x431
email - [EMAIL PROTECTED]
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [EMAIL PROTECTED]
Automated List Manager [EMAIL PROTECTED]