Re: X509 reference counts
Geoff Thorpe [EMAIL PROTECTED]: The idea is that you hand those BIOs over to the SSL library, you usually don't keep pointers of your own. SSL_free(ssl) will call BIO_free for each of them, but just once if bio_read == bio_write, so usually everything works as intended. Obviously this is not the cleanest interface conceivable ... Well as I mentioned repeatedly on the SSL_SESSION discussion a while back, "the idea is that ..." seems to be common justification for deliberate and lazy inconsistency, both from the point of view of the library design, and from the point of view of library-usage. No, there are also inconsistencies that cannot be justified this way :-) (SSL_CTX_set_tmp_dh does not increase a reference count or copy the structure, while SSL_CTX_set_tmp_rsa does. There is an explanation for this, but I would not call it a "justification".) In this case I think the programmer(s) just wanted to "throw the BIOs into the SSL and forget about them" ... ie. why free your reference when you can just pass it over to another structure and forget about your references? Whilst this "works" it makes a mockery of the so-called "reference counting mechanism" and leaves programmers constantly in a quandry as how to correctly manage references - and occasionally how to avoid race-conditions too! Which is why, one day, we should provide variants of all those "set"s and "get"s that somehow encode in there names whether reference counts are increased or not. [...] "handing over" references and such cutting of corners leads to these inflexibilities in the way these objects/structures can be used. Agreed. [...] If you keep only the bio_ssl pointer, BIO_free_all will free everything (bio_read will be freed twice because it's also at the next_bio pointer, so its reference count goes down to zero). But I don't want to "keep only the bio_ssl pointer" ... for one thing, once that handshake is complete I want to examine the session negotiated, grab the peer certificate (both of which reveal further reference counting anomalies but that's another soap-box), and possibly other SSL things that have nothing to do with the BIO which is simply an IO device. If you want to do those SSL things, then the easiest thing is not to use a bio_ssl in the first place. Many programs work directly with SSL pointers without adding the extra BIO layer; probably this is why the reference-counting problems inherent in bio_ssl's don't seem to disturb too many people. (Or, they avoid bio_ssl's *because* they look strange.) Similarly, using memory BIOs like ssl_test.s does always looked like an ugly kludge to me, presumably because it is -- thus usually the read and write BIOs are the same and the reference counting problems in this area are not visible to the library user. [...] That's why the comment in bio.h says "WARNING WARNING ... make sure you are doing a BIO_free_all()" since SSLeay 0.9.0b. BIO_free_all follows the next_bio chain, BIO_free does not; I'm not sure if BIO_free has any use at all except when you are sure that that next_bio == NULL in which case both functions do the same anyway. Hardly bodes well when one has to sift through the .h file to discover API design comments that themselves admit to being a tad obscure and, even after finding and reading, leaves us in some doubt as to how things work and what the point is. Finding out how things work is usually possible given that the source is available; I'm not always sure what the point is though. __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: X509 reference counts
Ramsay, Ron wrote: When you speak of breaking existing applications, I guess you mean applications based on the s_server code (or those that use sockets). As someone who has written code based on BIOs and not sockets, I am somewhat concerned that BIO reference counts are broken. Apparently I cannot make my code robust! We're still not at release 1.0. Please, can something be done? Well the problem is this it doesn't just apply to BIOs and SSL it applies to lots of other things as well. There are two types of function that obtain and set one structure in another. There is the XXX_get_YYY() and XXX_set_YYY(). The problem in this case is with the XXX_set_YYY(). Some of these up the reference count of the passed object and some do not. The general logic is that those that do are the ones where it makes sense to use the passed object later and those that do not are where the object is effectively "taken over". It could be argued that once you've set a BIO to be used for SSL you aren't going to do anything with it later. Unfortunately the logic isn't consistent and isn't documented on a function by function basis. Anyway the problem for existing code is this. If you have a function that "takes over" an object then you'll have something like this: x = X_new(); XXX_set_YYY(y); ... do something with x ... X_free(x); Whereas a reference count function will do this: x = X_new(); XXX_set_YYY(y); Y_free(y); ... do something with x ... X_free(x); If you change the logic then any code using a "take over" function will need to be converted or it will have a memory leak. For a long running mutlithreaded SSL server if this is not done it will grow until it eats all available memory. A "take over" function can't use the "reference count" version because it will end up using freed memory and crash... I think the best long term solution is to include a log of various "top level" functions that keeps account of which function allocates what in which thread. Then a thread can call thread_cleanup() or whatever and automatically have its unfreed memory released or at least have it log things in a more readable way. That way the logic can be changed and the cleanup function can handle the mess. Steve. -- Dr Stephen N. Henson. http://www.drh-consultancy.demon.co.uk/ Personal Email: [EMAIL PROTECTED] Senior crypto engineer, Celo Communications: http://www.celocom.com/ Core developer of the OpenSSL project: http://www.openssl.org/ Business Email: [EMAIL PROTECTED] PGP key: via homepage. __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: X509 reference counts
Hi, On Tue, 9 Nov 1999, Dr Stephen Henson wrote: Well I don't know the reason myself but from looking at the code I'd guess that those functions that return an object with an upped reference count tend to be those where its likely the application will want to keep the object around. This things like X509_get_version() will return an internal pointer because you are more likely to just check the value and do nothing else whereas with other examples you might want to keep the object afterwards. I'd reached a similar guess ... perhaps this could be made at least mildly sane by restricting this behaviour to "internal" functions (ones beginning with a lower case name, eg. ssl_get_whatever rather than any SSL_get_zombied_pointer) but it would still be a bit ugly. Another way to perhaps legitimise when reference counting can be safely ignored would be as follows; if it is contained within a function operating on an object which already has a reference to the thing being "get"'d. Eg. Inside a function operating on an SSL* object, an internal function may examine the peer certificate without incrementing the reference count - the SSL object has already obtained (upped) a reference in the X509 object so the function could temporarily "get" and use an X509* pointer without upping the reference count and all would be ok. However, this really shouldn't be allowed to happen across function boundaries, and should be restricted to those small internal managings where it's provably OK not to grab another reference and would otherwise drag down concurrency by having excess locking going on. Anything less than that however, and the reference counting can't be trusted by the programmer and all code will have to carefully hack around what OpenSSL does (or doesn't do) in each particular case. Of course having said that this isn't consistent and I changed the X509_get_pubkey() behaviour myself to up the public key reference count. Tracing and fixing all the memory leaks this caused was not an easy task :-( I've had to do exactly the same thing with what I believe is the worst culprit I've found yet; SSL_get_session(). It returns an SSL_SESSION* pointer that is, from the moment you receive it, unsafe. Even if you immediately make a manual call to up its reference count you could still get caught out by a context-switch race. If you don't even try this, and instead just hold onto the pointer for any length of time then you really invite trouble. I was using this for a client utility where I didn't want a cache, just a record of the last successfully used session ... so upon SSL_is_init_finished(), I would call SSL_get_session and store whatever was retrieved. Then next time I start a handshake, I'd check that the session was still within reasonable timeout limits before applying it to a new client SSL. The problem is that if the session had expired and been destroyed during that time, the code that stored the SSL_SESSION* pointer will never know until it segfaults trying to do something with the SSL_SESSION* pointer. I'm using the following patch, anyone have any objections to making this permanent? --- clip here - --- openssl-0-9-5-dev/ssl/ssl_sess.cThu Nov 4 21:21:02 1999 +++ openssl-0-9-5-geoff/ssl/ssl_sess.c Wed Nov 10 15:39:34 1999 @@ -69,7 +69,16 @@ SSL_SESSION *SSL_get_session(SSL *ssl) { - return(ssl-session); + SSL_SESSION *sess; + /* Need to lock this all up rather than just use CRYPTO_add so that +* somebody doesn't free ssl-session between when we check it's +* non-null and when we up the reference count. */ + CRYPTO_r_lock(CRYPTO_LOCK_SSL_SESSION); + sess = ssl-session; + if(sess) + sess-references++; + CRYPTO_r_unlock(CRYPTO_LOCK_SSL_SESSION); + return(sess); } int SSL_SESSION_get_ex_new_index(long argl, char *argp, int (*new_func)(), --- clip here - Changing the logic is problematical not just because of the effort involved. Many programs will expect the old behaviour. In the example of a multithreaded server its memory requirements will grow steadily if they don't free things up: this is one case where a multithreaded server is at a disadvantage compared to a fork() + exit() kind. Which is probably why secure web-servers haven't blown the lid off OpenSSL yet. (a) the session caching is generally overriden by a more carefully managed shared-memory cache, and (b) the child processes can be allowed to die off after n requests and thus bury any memory leaks that might have amassed during that time. I think the best way to attempt to keep the memory under control would be to get the reference counting consistent and understandable, then fix the kludged code that used it and is thus temporarily broken, and then we'd have some hope of being able to keep OpenSSL's memory management stable and usable without fork()s and exit()s (or ugly code).
Re: X509 reference counts
Hi, On Tue, 9 Nov 1999, Dr Stephen Henson wrote: Yes horrible isn't it? I've also noticed wildly inconsistent behaviour. When you call something like XXX_get_YYY() you might get something which will last only as long as the parent and it shouldn't be freed at all or something which will persist by virtue of the upped reference count. sure is horrible ... I played around with this quite a bit yesterday and the further in I looked the weirder it got. I'd ask why it is this way but I'm not sure I want to know. :-) Also the way reference counts are upped is not friendly. I suppose we should have functions like _up_reference_count(x); for use at an application level rather than the current stuff which messes round with structure internals. I'm currently looking into rewriting the verify code. I'm hoping to retain compatability with the old behavior except where it does things that are really silly. More consistency with the reference counts is certainly one issue... However if it works the way I hope it will then it will support things like proper chain verify and certificate trust settings so hopefully no one would *want* to do things the old way. I suppose the problem is that the more code that is built on top of (and around the skirting of) the existing code - the more there is that will break if the underlying machinery is fixed up. Adding code that *does* work logically will be (almost by definition) inconsistent with the existing API. I find reference counts extremely useful, but they do more harm than good when implemented inconsistently. If I have an X509 object, and a call to "add it to something" returns successfully, I expect that I can hang on to my reference for as long or short a period of time as I choose before freeing it and things will "just work". In my case I wanted to maintain my pointers to ssl, bio_ssl, bio_read, and bio_write and free them all when I was done with them (not at the same time I might add). One look at the reference counts and I knew it was not going to be that straightforward. The most obvious case is that by setting the read and write BIOs on an SSL, one should be able to free the BIOs afterward and the SSL will still have references to them. That is not the case - freeing the BIOs causes the SSL to go bang (ssl/ssl_lib.c:SSL_set_bio). I also tracked down that weirdness about connecting a BIO_SSL with an SSL, namely it only grabs a reference to the SSL's read bio (ssl/bio_ssl.c:ssl_ctrl with BIO_C_SET_SSL). It's described in a comment in crypto/bio/bio.h about line 415. I can't understand why it's done this way and I know if I carelessly wade in there trying to change it I will unleash the dominos. The fact that this jumps around between ssl/ and crypto/bio so much is enough of a deterrent in itself ... Cheers, Geoff -- Geoff ThorpeEmail: [EMAIL PROTECTED] Cryptographic Software Engineer, C2Net Europehttp://www.int.c2.net -- May I just take this opportunity to say that of all the people I have EVER emailed, you are definitely one of them. __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: X509 reference counts
Geoff Thorpe wrote: Hi, On Tue, 9 Nov 1999, Dr Stephen Henson wrote: Yes horrible isn't it? I've also noticed wildly inconsistent behaviour. When you call something like XXX_get_YYY() you might get something which will last only as long as the parent and it shouldn't be freed at all or something which will persist by virtue of the upped reference count. sure is horrible ... I played around with this quite a bit yesterday and the further in I looked the weirder it got. I'd ask why it is this way but I'm not sure I want to know. :-) Well I don't know the reason myself but from looking at the code I'd guess that those functions that return an object with an upped reference count tend to be those where its likely the application will want to keep the object around. This things like X509_get_version() will return an internal pointer because you are more likely to just check the value and do nothing else whereas with other examples you might want to keep the object afterwards. Of course having said that this isn't consistent and I changed the X509_get_pubkey() behaviour myself to up the public key reference count. Tracing and fixing all the memory leaks this caused was not an easy task :-( Changing the logic is problematical not just because of the effort involved. Many programs will expect the old behaviour. In the example of a multithreaded server its memory requirements will grow steadily if they don't free things up: this is one case where a multithreaded server is at a disadvantage compared to a fork() + exit() kind. Steve. -- Dr Stephen N. Henson. http://www.drh-consultancy.demon.co.uk/ Personal Email: [EMAIL PROTECTED] Senior crypto engineer, Celo Communications: http://www.celocom.com/ Core developer of the OpenSSL project: http://www.openssl.org/ Business Email: [EMAIL PROTECTED] PGP key: via homepage. __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
RE: X509 reference counts
When you speak of breaking existing applications, I guess you mean applications based on the s_server code (or those that use sockets). As someone who has written code based on BIOs and not sockets, I am somewhat concerned that BIO reference counts are broken. Apparently I cannot make my code robust! We're still not at release 1.0. Please, can something be done? -Original Message- From: Dr Stephen Henson [SMTP:[EMAIL PROTECTED]] Sent: Wednesday, 10 November 1999 8:19 To: [EMAIL PROTECTED] Subject: Re: X509 reference counts Geoff Thorpe wrote: Hi, On Tue, 9 Nov 1999, Dr Stephen Henson wrote: Yes horrible isn't it? I've also noticed wildly inconsistent behaviour. When you call something like XXX_get_YYY() you might get something which will last only as long as the parent and it shouldn't be freed at all or something which will persist by virtue of the upped reference count. sure is horrible ... I played around with this quite a bit yesterday and the further in I looked the weirder it got. I'd ask why it is this way but I'm not sure I want to know. :-) Well I don't know the reason myself but from looking at the code I'd guess that those functions that return an object with an upped reference count tend to be those where its likely the application will want to keep the object around. This things like X509_get_version() will return an internal pointer because you are more likely to just check the value and do nothing else whereas with other examples you might want to keep the object afterwards. Of course having said that this isn't consistent and I changed the X509_get_pubkey() behaviour myself to up the public key reference count. Tracing and fixing all the memory leaks this caused was not an easy task :-( Changing the logic is problematical not just because of the effort involved. Many programs will expect the old behaviour. In the example of a multithreaded server its memory requirements will grow steadily if they don't free things up: this is one case where a multithreaded server is at a disadvantage compared to a fork() + exit() kind. Steve. -- Dr Stephen N. Henson. http://www.drh-consultancy.demon.co.uk/ Personal Email: [EMAIL PROTECTED] Senior crypto engineer, Celo Communications: http://www.celocom.com/ Core developer of the OpenSSL project: http://www.openssl.org/ Business Email: [EMAIL PROTECTED] PGP key: via homepage. __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED] __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
X509 reference counts
Hi there, I've been checking up on some memory management issues and have a couple of questions. If anyone has any thoughts or views I'd very much appreciate them. X509_STORE_CTX_get_current_cert() and a few others do not up the reference count on the X509 object returned. I was carefully X509_free()ing everything I used and found this anomaly courtesy of some seg-faults. Some of the functions in x509_vfy.c seem to modify the reference count so I'm not sure if this is a bug or intentional. It's easy to workaround just by not freeing, but I'd prefer to know if this should be changed rather than missing it and having my programs develop leaks. I'm getting extremely weird reference counts on SSLs and BIOs as below - this is happening with 0_9_4 and with a recent snapshot of 0_9_5. (1) I create two BIOs, a read (bio_read) and write (bio_write) that are both BIO_s_mem()s, an SSL bio (bio_ssl), and an SSL object (ssl). At this point they all have reference count 1. Ie. in crude ASCII-art form; +-+ --- bio_read bio_ssl --- | ssl | +-+ --- bio_write (2) I add set the read and write BIOs to "ssl" with SSL_set_bio(ssl, bio_read, bio_write). This does not change the reference count on anything (everything remains at 1). I'd have expected this to up the reference count on bio_read and bio_write. (3) When I call SSL_free(ssl) later, it causes the reference counts for all of ssl, bio_read and bio_write to be decremented. This is inconsistent with (2) but seems the right way to operate. (4) When I call BIO_set_ssl(bio_ssl, ssl, ...) it cause an increment in ONLY the reference count of bio_read - this is the weirdest of all. I would expect this to cause ssl, bio_read AND bio_write to all get incremented but it only increments bio_read. (5) Calling BIO_free(bio_ssl) decrements the reference counts for all of ssl, bio_read, and bio_write. This is reasonable but inconsistent with (4) and pretty much everything else. I'm not quite sure what I should be doing with these and what the intended behaviour of the reference mechanisms are - it would certainly appear that they don't function correctly whatever the intention. Any thoughts? Thanks in advance, Geoff -- Geoff ThorpeEmail: [EMAIL PROTECTED] Cryptographic Software Engineer, C2Net Europehttp://www.int.c2.net -- May I just take this opportunity to say that of all the people I have EVER emailed, you are definitely one of them. __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]
Re: X509 reference counts
Geoff Thorpe wrote: Hi there, I've been checking up on some memory management issues and have a couple of questions. If anyone has any thoughts or views I'd very much appreciate them. X509_STORE_CTX_get_current_cert() and a few others do not up the reference count on the X509 object returned. I was carefully X509_free()ing everything I used and found this anomaly courtesy of some seg-faults. Some of the functions in x509_vfy.c seem to modify the reference count so I'm not sure if this is a bug or intentional. It's easy to workaround just by not freeing, but I'd prefer to know if this should be changed rather than missing it and having my programs develop leaks. [ other examples deleted ] Yes horrible isn't it? I've also noticed wildly inconsistent behaviour. When you call something like XXX_get_YYY() you might get something which will last only as long as the parent and it shouldn't be freed at all or something which will persist by virtue of the upped reference count. Also the way reference counts are upped is not friendly. I suppose we should have functions like _up_reference_count(x); for use at an application level rather than the current stuff which messes round with structure internals. I'm currently looking into rewriting the verify code. I'm hoping to retain compatability with the old behavior except where it does things that are really silly. More consistency with the reference counts is certainly one issue... However if it works the way I hope it will then it will support things like proper chain verify and certificate trust settings so hopefully no one would *want* to do things the old way. Steve. -- Dr Stephen N. Henson. http://www.drh-consultancy.demon.co.uk/ Personal Email: [EMAIL PROTECTED] Senior crypto engineer, Celo Communications: http://www.celocom.com/ Core developer of the OpenSSL project: http://www.openssl.org/ Business Email: [EMAIL PROTECTED] PGP key: via homepage. __ OpenSSL Project http://www.openssl.org Development Mailing List [EMAIL PROTECTED] Automated List Manager [EMAIL PROTECTED]