On Wed, May 11, 2011 at 12:58 AM, Michael Gale <[email protected]>wrote:

>
> Hey,
>
>     First, thanks for the reply and feedback.
>
> The section of the python code that invokes the function call is below, I
> highlighted line 138 which is where the exception is occurring according to
> the stack trace.
>
> --snip-- *Python-2.7/lib/python2.7/ssl.py*
> 132     def read(self, len=1024):
> 133
> 134         """Read up to LEN bytes and return them.
> 135         Return zero-length string on EOF."""
> 136
> 137         try:
> *138             return self._sslobj.read(len)*
> 139         except SSLError, x:
> 140             if x.args[0] == SSL_ERROR_EOF and
> self.suppress_ragged_eofs:
> 141                 return ''
> 142             else:
> 143                 raise
> --snip--
>
> I traced this to *Python-2.7/Modules/_ssl.c*, then to *
> openssl-0.9.8b/ssl/ssl_lib.c* where it calls SSL_read. So it does not look
> like it is calling the internal method.
>

Nothing glaringly obvious to me in any of the code snippets. :-(   (And,
Dave, thanks for catching f.u. where I was missing the __LINE__ in my code!)


It's past 0200 hours here so I'd better get some shut-eye, but here's a few
thoughts to ponder:


- you mentioned 'satellite link'. Given the wickedness of the issue, a few
baseline checks to get our assumptions straight: does your QA rig simulate
the 'long fat pipe' that's usual for satellite comms?  (or does it not?)
(long fat pipe ~ long round trip delay, high RX/TX rate, but I've also seen
extremely high RX/TX ratios (way back when: both sides had different uplink
bandwidth then; kinda like ADSL but much worse). All this /should/'ve been
handled at the TCP level and not affect any upper layers (like SSL/sockets),
but better ask now then ride on an incorrect assumption.

- the above implies the assumption that you're using SSL over TCP, not DTLS
(which, IIRC, is UDP-based. Correct me if I'm wrong on that one.)

- given a grep through the code, all I can find as 'probable culprits' are
internal rbuf (read buffer) related and nothing that can be more or less
directly be tickled by SSL_write/read calls. Mix that with my own experience
of no trouble whatsoever in the RX/TX buffer dept. for years plus an error
which hints at an impending buffer overrun happening at yours (and then,
only in production. Nasty bugger to tackle) my next bet it's something
related to the 'satellite comms behaviour' as visible at the socket I/F
level. (I was also thinking about packet size and thus ever-so-slightly
altered recv/send behaviour, but I've been using OpenSSL over PPP, raw
serial, Ethernets and a few network oddities and never encountered this. :-S
)

- You said you had to 'rollback': is that to a state where you use an older
OpenSSL stack, a different SSL stack or no SSL at all? (Just another
baseline check here.)


- the only thing that I can see which /might/ have impact (and this is mere
conjecture! But we're apparently looking at some sort of obscure edge case,
so options are open in my mind) is when both ends differ in their OpenSSL
builds, where one has OpenSSL compiled _with_ compression enabled, and the
other has not (OPENSSL_NO_COMP).
Line of thought goes like this: the thing that possibly maybe might screw
you up is when the other side sends packets which don't somehow fit in the
RX buffer, the SSL stack keeps on asking ssl3_read_n() to fetch ever more,
and the SSL packet is not caught as overlarge in other parts of the code
(which would be odd by itself, but let's continue). So the question is then:
how is the RX buffer dimensioned? It's set up in either ssl3_setup_buffers()
or ssl3_setup_read_buffer() and space depends on the SSL3_RT_MAX_PACKET_SIZE
define. Which, when followed through, is only really dependent on either the
OPENSSL_NO_COMP setting or someone seriously screwing around in the OpenSSL
code internals. So I opt for the 'no_comp' setting being a 'possibly maybe',
however improbable it may be. (because I've used that mix before and no
trouble for me; besides, one can expect both flavors to exist in the wild so
why is this a first then?)


Another thing that affects packet size in OpenSSL is the
SSL_OP_MICROSOFT_BIG_SSLV3_BUFFER option, which you can set, for example,
using the SSL_ctrl(s, SSL_CTRL_OPTIONS, SSL_OP_MICROSOFT_BIG_SSLV3_BUFFER,
NULL) call (or the eqv. for the SSL_CTX: SSL->options are copied from the
SSL_CTX). When one side has the option and the other has not, it just
might..... (again, with low probability, but these are the ways I can see
that things /could/ go wrong as you described. (Without a system where this
can be reproduced, we're down to 'intelligent' guesswork and a couple of
straws.)


Cheers and good night!


-- 
Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   [email protected]
mobile: +31-6-11 120 978
--------------------------------------------------

Reply via email to