This is related to this issue:
http://rt.openssl.org/Ticket/Display.html?id=2100

The issue was recently closed as resolved, saying time limiting was
implemented. However, the closing message didn't respond to the previous
message from Andre Heinecke, which mentions that the time limiting isn't
applied to the inner loop in all code branches.

The one for Visual Studio builds has the time limiter on the inner loop
when calling Heap32Next, but for all other compilers the time limiter is
only applied to the outer Heap32NextList loop, not the inner one.

We are affected by this, and it has a severe performance impact on our
application.

Some background on our application: I work for Trimble SketchUp, and we
embed the Ruby interpreter into our application in order to provide a Ruby
scripting API. We release for Windows and OSX.
We use the Ruby builds from Ruby Windows Installer, which uses mingw -
meaning the binaries we have don't have the inner loop time limiter.

Upon a fresh startup of our application, when the process uses about 70MB,
we experience a 3-5 second lag the first time
OpenSSL::Random::random_bytes(0) is called. This function is called from
several places in the Ruby libraries, so it has a high surface area for us.

As we load 3D models the time increases. The worst case we've seen so far
was ~15 minutes, with a model open and our process consuming over 4GB of
RAM.

The profiler flagged Heap32Next as red hot, being called from OpenSSL. From
that we investigated and found the issue linked at the top of this message.

We even tried to call random_bytes from a second thread, but for some
reason it appears to block the whole process.

We'd like to ask for the issue to be reopened and looked at again for all
compilers.


I guess the short term quick fix is to add the time limiter to the inner
loop of the second code path. (rand_win.c line 559 - 1.0.1j)
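
To illustrate why the placement of the check matters, here is a minimal,
portable sketch. It is not the rand_win.c code: fake_clock, walk_heaps and
MAXDELAY_MS are hypothetical names, the clock is simulated instead of using
a real tick counter, and the one second budget is an assumption. Each
simulated Heap32Next call costs a fixed number of milliseconds.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed time budget for the whole walk, in milliseconds. */
#define MAXDELAY_MS 1000UL

/* Hypothetical stand-in for a tick counter: a clock we advance by hand. */
typedef struct {
    unsigned long now_ms;
} fake_clock;

/*
 * Simulate walking `lists` heap lists with `entries` entries each, where
 * every inner-loop step costs `cost_ms`. When `limit_inner` is non-zero the
 * budget is also checked inside the inner loop; otherwise it is only
 * checked after a whole list has been walked. Returns the number of
 * entries visited.
 */
static size_t walk_heaps(fake_clock *clk, size_t lists, size_t entries,
                         unsigned long cost_ms, int limit_inner)
{
    unsigned long start;
    size_t visited = 0;

    assert(clk != NULL);
    start = clk->now_ms;

    for (size_t i = 0; i < lists; i++) {          /* outer: per heap list */
        for (size_t j = 0; j < entries; j++) {    /* inner: per heap entry */
            clk->now_ms += cost_ms;               /* cost of one step */
            visited++;
            if (limit_inner && clk->now_ms - start >= MAXDELAY_MS)
                break;
        }
        if (clk->now_ms - start >= MAXDELAY_MS)   /* outer limiter */
            break;
    }
    return visited;
}
```

With 10000 entries per list at 1 ms each, the outer-only variant cannot
stop until the first list is finished (10 simulated seconds), while the
variant with the inner check stops once the budget is used up.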


But even then, we would still always get a one second lag on first use.
Under OSX there is no such lag, and it would be nice to have matching
performance. A one second lag, even under low memory consumption, isn't
ideal.

As part of researching the Heap32Next performance issues we came across
this article by Raymond Chen:

*Why is the Heap32Next function incredibly slow on Windows 7?*
http://blogs.msdn.com/b/oldnewthing/archive/2012/03/23/10286665.aspx

There he describes the history of the function and how its performance has
degraded over time in order to prevent misuse and memory leaks. But he also
mentions:


*But since the toolhelp library was intended for diagnostic purposes anyway
(I mean, it's right there in the name: tool help), these weren't considered
serious problems. Your debugging plug-in might use it to walk the heap
looking for memory leaks, but you wouldn't deploy it in production, right?*

And if you look at the MSDN documentation for the Heap32* functions:

*The functions provided by the tool help library make it easier for you to
obtain information about currently executing applications. These functions
are designed to streamline the creation of tools, specifically debuggers.*


The indications are strong that these functions are not meant to be used in
release products such as OpenSSL.

At the end of the article, Raymond suggests using the newer HeapWalk
function:




*By the way, the recommended way to walk the contents of the heap is to use
the HeapWalk function. The HeapWalk function does not suffer from this
problem; enumerating the entire heap via repeated calls to HeapWalk has
total running time proportional to the number of heap blocks. Note that
HeapWalk can only enumerate heap blocks from the current process. If
you're doing cross-process heap walking for diagnostic purposes, then
you're stuck with Heap32First/Heap32Next, but since you're just doing it
for diagnostic purposes, correctness should be more important to you than
performance.*

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366710(v=vs.85).aspx

We did some quick tests using Ruby to call the Win32 functions: one code
snippet walking the heap like OpenSSL currently does, and another using
HeapWalk - the latter was an order of magnitude faster. Now, the struct it
traverses is somewhat different, but I assume you can still extract random
bytes from that?
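
For reference, a HeapWalk loop over the current process's default heap
could look roughly like the sketch below. This is not our Ruby test and not
a proposed patch: count_heap_blocks is a hypothetical helper that only
counts blocks, the max_blocks cap is an assumption added to bound the work,
and how the entries would be mixed into the entropy pool is left open. The
Win32 part is guarded so the file still compiles on other platforms, where
the function simply returns 0.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Count blocks in the default process heap, visiting at most max_blocks.
 * A real entropy gatherer would feed each PROCESS_HEAP_ENTRY into its
 * pool instead of only counting; that step is not shown here.
 */
static size_t count_heap_blocks(size_t max_blocks)
{
    size_t count = 0;

    assert(max_blocks > 0);
#ifdef _WIN32
    HANDLE heap = GetProcessHeap();
    PROCESS_HEAP_ENTRY entry;

    /* Lock the heap so other threads cannot modify it during the walk. */
    if (heap == NULL || !HeapLock(heap))
        return 0;

    entry.lpData = NULL;   /* NULL tells HeapWalk to start from the top */
    while (count < max_blocks && HeapWalk(heap, &entry))
        count++;

    HeapUnlock(heap);
#endif
    return count;
}
```

Unlike the Heap32First/Heap32Next pair, this only sees heaps of the calling
process, which is all that is needed here.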

Or perhaps there is some other way to generate the random bytes; we're not
really concerned about how, but rather about the performance.
I don't have the know-how to securely generate random bytes, so I didn't
attempt to make a patch for this issue. But I do plead that the
implementation be improved, and we are willing to test the performance of
new implementations to give them real world testing in an application that
stresses the computer's memory and CPU a lot.

-Thomas Thomassen
