Re: [squid-dev] Broken trunk after r14735, r14726

Christos Tsantilas Wed, 20 Jul 2016 09:04:35 -0700

On 07/19/2016 09:52 AM, Amos Jeffries wrote:

On 18/07/2016 11:12 p.m., Christos Tsantilas wrote:

On 07/16/2016 03:56 PM, Amos Jeffries wrote:

On 16/07/2016 7:02 a.m., Alex Rousskov wrote:

Hello,


      There are two more recent changes that broke trunk:

* After r14735 (Replaced TidyPointer with std::unique_ptr), Squid cannot
start due to an "std::bad_function_call" exception.

* After r14726 (GnuTLS: support for TLS session resume): Squid segfaults
when attempting to connect to a Secure ICAP service. Official Squid
v4.0.12 suffers from this bug.

Stack traces from both crashes are quoted at the end of this email.

Please fix these regressions or undo the changes that created or exposed
them.


<snip>

* segfault when attempting to connect to a Secure ICAP REQMOD service
(tested with r14726, r14734):


Does this patch fix the session issue ?

=== modified file 'src/security/Session.cc'
--- src/security/Session.cc     2016-07-07 19:03:02 +0000
+++ src/security/Session.cc     2016-07-16 12:43:38 +0000
@@ -53,7 +53,7 @@
   void
   Security::SetSessionResumeData(const Security::SessionPtr &s, const
Security::SessionStatePointer &data)
   {
-    if (s) {
+    if (data) {
   #if USE_OPENSSL
           (void)SSL_set_session(s, data.get());
   #elif USE_GNUTLS


I'm a little worried about the code calling SetSessionResumeData.
OpenSSL documentation states:
    "If there is already a session set inside ssl (because it was set with
SSL_set_session() before or because the same ssl was already used for a
connection), SSL_SESSION_free() will be called for that session."

But our SetSessionResumeData() is called after setting up the sessions
host data, etc. So I'm thinking all that setup in
Ssl::BlindPeerConnector::initializeTls() may be thrown away by the
resume action being called.


Squid crashes at the first TLS connection trying to establish to the
ICAP server.  There is not any session to resume so the SSL session
related methods should not called at all.

It is a strange crash. Is it a corrupted SSL object?

I must say that I am worrying a lot for all of these changes.
It is very difficult for me to follow them, and already I have
difficulties to read and debug squid openSSL relate code.

We are using our own naming scheme for openSSL structures, eg
"Security::SessionPtr" instead of "SSL *" or
"Security::SessionStatePointer" instead of "SSL_SESSION *" and this is
make it very difficult to follow Squid/openSSL code.

It is difficult to read an example openSSL code and trying convert it to
squid, or reading a reference implementation and trying check against
squid code. Someone is obligated to search for definitions and
equivalents types all the time.

I have not ever study for a good way to support both GnuTLS and openSSL
for squid, to know the problems,  but probably I would start to
implement openSSLAcceptor/gnuTlsAcceptor and
openSSLPeerConnector/gnuTLSPeerConnector classes.
Internally gnuTLS and openSSL should use their own types. This is will
help current and future squid developers.


That is indeed where I started, and what I started doing. It turns out
we then have to maintain feature compatibility in those two relatively
high-level classes. Which are internally designed and implemented in
different ways. Thats a bit tricky to start with, but seemed reasonable.
   Then there is the killer fact that to do ssl-bump those classes are
all using fd_table[].ssl things shared between Server and Client and
Icap and Ecap 'sides' (head => meet brick wall, doh!).


I am just expressing my worries here, I do not want to impose anything
and if there was a related discussion in squid-dev, sorry that I do not
participate and I did not express my concerns earlier.


As Alex noted, there has not been any detailed discussion. Just partial
coverage of some details during the occasional patch reviews. And it is
good to have opinions voiced/written.


As the one most recently looking at the ways to do combined API. I'm
just as frustrated as you. For the same reasons. Though in the reverse
direction. It has been quite hard to look at nice 3-4 line GnuTLS
examples and then trying to figure out a) what 10-20 OpenSSL calls are
used to do the equivalent, and even more annoyingly b) where those
OpenSSL calls are spinkled around the Squid codebase.

Regarding (b); TBH, I'm quite impressed you can translate from the
OpenSSL examples to the Squid use of OpenSSL.

It is already difficult, squid openSSL related code is not trivial, weshould not make it worst.



The choices we have for combining basically come down to four ways:

1) implementing everything twice. The way OpenSSL is spreading its calls
all over the place that means the entire SSL-Bump, HTTPS receiving,
sending, BIO, and Comm I/O layers and FD socket handling.

About SSL-Bump, already there is a try to move out of openSSL hugeparts. An example is the SSL messages parser.

You need to implement HTTPS receiving.sending, BIO, Comm I/O layers andfd socket twice. You can not avoid it.


A *huge* amount of duplication. Almost everything that stores OpenSSL
raw pointers today would need duplicating to work differently for
GnuTLS. Just because the pointer usage is different. Sometimes only in
the meaning of 0 return values - but thats enough.


Exactly.

2) picking one library API and writing our own compatibility layer for
other libraries using that API.

Unfortunately, there is a history of people trying this. The GnuTLS
compatibility openssl.h header is lacking many OpenSSL API symbols for
good reasons which boil down to intentional major design differences in
the libraries:
  a) OpenSSL exposes these annoying but useful locking references. GnuTLS
hides any locking from the callers - leaving us unsure if something is
referenced or copied, but safe to use any pointers we're given.
  b) OpenSSL uses N more API calls to explicitly do all sorts of things,
with ordering dependencies IIRC - where GnuTLS uses far fewer calls.

Sadly the Squid logics are sometimes heavily leveraging the locking
behaviour of OpenSSL to pass info from one context to another, by
updating shared structures.

This is has to do with internal OpenSSL operation. This is how this APIworks.

All of the above problems you are describing show me how difficult andmaybe wrong is to try implement GnuTLS having in your mind the existingOpenSSL implementation.

The two SDKs although have similarities are different in philosophy.

Why do we need common types for both SDKs?

The only type needed by squid for openSSL is the "SSL *" which is storedinside fde class. And the gnutls_session_int for gnutls. These areshould be enough

(maybe with some other like the certificate errors.).

Squid should use SSL or gnutls_session_int and pass it to openSSL orgnuTLS related classes and should not need to know about other internalOpenSSL or GnuTLS structures.

Recently we split the PeerConnector classes from FwdState class, theFwdState now does not have strong dependencies to openSSL.


We can do the same for ConnStateData class.

If you decide to implement NEW GnuTls PeerConnector class, withoutsslbump for the beggining, it is easy. The PeerConnector class is about500 lines of code and the BlindPeerConnector other 100 lines.

And after this is ready the common code like the certificates validation(150 lines of code) can be moved into a parent class.

But trying to fit GnuTLS and openSSL into the same PeerConnector classwill fail and will make the code complex.


The SSLBump also is still under heavy development.

Again here the PeekingPeerConnector is about 400 lines of code. Is thiscode we are worrying that it will be duplicated?Because most of the other SSL related code (~6000 lines of code) MUSTreimplemented for gnutls.

Recently we implemented our own Handshake messages parser, we are notrelying on openSSL for detecting SSL features and basic errors. The *Bioclasses are very simple now, they just do buffering.

This is will help a future GnuTLS implementation too.



3) sprinkling #if conditions like candy through the whole codebase.

(Er, yuck). We have too much of that already for OpenSSL.


4) designing our code to use an abstraction API that renames all the
library structures and functions to some thing we understand easier **.

Still a lot of code re-writing, and even more re-arranging so the
library call(s) themselves happen away from the huge number of
components mentioned in (1). But doable within a decade (maybe).


So, I've gone with this (4) and taking Alex suggestions on how to
abstract better as I go.


I believe that that the PeerConnector is a part of this API.
It is an AsyncJob which when finishes calls a callback function.


If you think its a good way to go, we might be able to use the reverse
compatibility direction for (3). ie. Using GnuTLS symbol names and
implementing them for OpenSSL API. But that wont help much when it comes
to supporting NSS for Fedora/RHEL/CentOS, PolarSSL (for SuSE? I forget),
or Crypto for MacOS.


On server side you do not need to use a lot of "#if".

This is required for ConnStateData. But for ConnStateData we should golike we did for FwdState. We should implement a new class and put herethe SSL related code.

I do not know the form of this class. Maybe we need an TlsConnStateDatakid of ConnStateData, or maybe we need to implement TlsAcceptor classes.


Then this class should be implemented for openSSL and for gnuTls.

I know it is not easy.



** YMMV on easier. I fully sympathize with the examples-vs-abstraction
problem. It might help if we communicated a bit better on that so it
turned out to be something you understand as well, not just my ideas.
  I had the idea that you were flooded with SSL-Bump problems? has that
situation eased?

The new naming scheme for SSL structures disrupted me, looks to medifficult to use it and adds an overhead.

I am just worry.



Amos

_______________________________________________
squid-dev mailing list
[email protected]
http://lists.squid-cache.org/listinfo/squid-dev

Re: [squid-dev] Broken trunk after r14735, r14726

Reply via email to