More information on our issue: the failure appears to occur in an SSL_read
that immediately follows an SSL_write on the same SSL instance/socket.  In
some instances both calls happen within the same thread time slice; in others
there is a context switch or two between them (for whatever that's worth).
Is there any need for some kind of "flush" when transitioning between reading
and writing?  I'm assuming not, but just fishing for leads/ideas...

Thanks, Mark.


________________________________
 From: Mark Pietras <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Wednesday, August 21, 2013 12:38 PM
Subject: Re: Fw: 1.0.0e decryption failed or bad record mac
 


We rebuilt OpenSSL 1.0.1e with the changes from commit 9ab3ce124616 and it does 
not impact the problem we're seeing.

For what it's worth, we're seeing this "decryption failed or bad record mac" 
pretty regularly.  We see it dozens of times per hour on a server with 
thousands of simultaneous active users (fairly light CPU but moderate I/O 
load), but since it's a real-time communications channel going down, the effect 
is immediate and obvious to users.

It's difficult to "run tests" and make changes since it's a mission-critical 
24x7 service (otherwise we'd just step back through versions until it stopped 
breaking, to help diagnose), but we're willing to try out other suggestions 
or possibly-related patches.  It does appear to be some kind of timing issue, 
but we've been unable to isolate it into a unit test failure scenario... it 
only happens in production (of course)...

Thanks for any help, Mark.



________________________________
 From: Dr. Stephen Henson <[email protected]>
To: [email protected] 
Sent: Friday, August 16, 2013 3:15 PM
Subject: Re: Fw: 1.0.0e decryption failed or bad record mac
 

On Fri, Aug 16, 2013, Mark Pietras wrote:

> Posted something similar to -users but I thought it might make more sense
> here on -dev, I apologize if that's not the case:
> 
> 
> Recently (within last month or so), we started randomly getting this error
> in the middle of active long-duration connections (connection having been
> open minutes to hours with application traffic minimally every 60s):
> 
> error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac
> 
> It seems to occur during bursty traffic periods. The only recent change to
> our application in a way that changes the utilization of OpenSSL (other than
> perhaps timing differences) was to set cipher preferences to server instead
> of client via:
> 
> SSL_CTX_set_options( ssl_ctx_server, SSL_OP_CIPHER_SERVER_PREFERENCE );
> 
> We did some searching and see a lot of discussion regarding this "decryption
> failed" error.  Some search results indicate issues with utilizing AES
> (which is certainly a possibility given our cipher preference change).
> 
> Some recent (2013) search results indicate a seemingly related issue fixed
> in 1.0.0e, however that's the version we're on.
> 
> Some other results indicate this patch is
> related: http://git.openssl.org/gitweb/?p=openssl.git;a=commitdiff;h=32cc247
> but the patch seems to be (just) prior to 1.0.0e, it's not clear.
> 
> Anyone have any insight on this based on this admittedly small level of
> information?  Thanks... Mark.
> 
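
As an aside on the error string quoted above: the hex code 1408F119 packs
three fields (library, function, reason). The sketch below unpacks them by
hand, assuming the 1.0.x-era layout used by the ERR_GET_LIB, ERR_GET_FUNC and
ERR_GET_REASON macros (8 bits of library, 12 bits of function, 12 bits of
reason). The helper names err_lib, err_func, err_reason and print_err_fields
are hypothetical and only for illustration; later OpenSSL releases changed
this layout, so this should not be read as applying to every version.

```c
#include <stdio.h>

/*
 * Hypothetical helpers (not part of OpenSSL) that mirror the assumed
 * 1.0.x-era packing of an error code:
 *   bits 31..24 = library, bits 23..12 = function, bits 11..0 = reason.
 */
static int err_lib(unsigned long e)
{
    return (int)((e >> 24) & 0xffUL);
}

static int err_func(unsigned long e)
{
    return (int)((e >> 12) & 0xfffUL);
}

static int err_reason(unsigned long e)
{
    return (int)(e & 0xfffUL);
}

/* Print the three fields of a packed error code on one line. */
static void print_err_fields(unsigned long e)
{
    printf("lib=%d func=%d reason=%d\n",
           err_lib(e), err_func(e), err_reason(e));
}
```

Calling print_err_fields(0x1408F119UL) splits the code into library 20,
function 143 and reason 281, which the error string renders as "SSL routines",
"SSL3_GET_RECORD" and "decryption failed or bad record mac" respectively.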

Do you mean 1.0.1e? If so see if commit 9ab3ce124616 helps.

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org/
______________________________________________________________________
OpenSSL Project                                http://www.openssl.org
Development Mailing List                      [email protected]
Automated List Manager                          [email protected]
