This is related to this issue: http://rt.openssl.org/Ticket/Display.html?id=2100
The issue was recently closed as resolved, saying timer limiting was implemented. However, the closing message didn't respond to the previous message from Andre Heinecke, which mentions that the timer limiting isn't applied to the inner loop in all code branches. The branch for Visual Studio builds has the time limiter on the inner loop when calling Heap32Next, but for all other compilers the time limiter is only applied to the outer Heap32ListNext loop, not the inner one.

We are affected by this, and it has a severe performance impact on our application.

Some background on our application: I work for Trimble SketchUp, and we embed the Ruby interpreter in our application in order to provide a Ruby scripting API. We release for Windows and OSX. We use the Ruby builds from Ruby Windows Installer, which uses mingw, meaning the binaries we have don't have the inner loop time limiter.

On a fresh startup of our application, when the process takes about 70 MB, we experience a 3-5 second lag the first time OpenSSL::Random::random_bytes(0) is called. This function is called by several things in the Ruby libraries, so it has a high surface area for us. As we load 3D models the time increases. The worst case we've seen so far was ~15 minutes, when we had a model open and our process consumed over 4 GB of RAM.

The profiler flagged Heap32Next as red hot, being called from OpenSSL. From that we investigated and found the issue linked at the top of this thread. We even tried calling it from a second thread, but for some reason it appears to block the whole process.

We'd like to ask for the issue to be reopened and looked at again for all compilers.

I guess the short-term quick fix is to add the time limiter to the inner loop of the second code path (rand_win.c line 559 in 1.0.1j). But even then, it means we would always get a one second lag on first use. Under OSX there is no such lag, and it would be nice to have matching performance.
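To make the difference between the two code paths concrete, here is a minimal sketch in portable C. It is not the rand_win.c code: walk_heaps, the callback types, the limit_inner flag and the simulated clock are all hypothetical names made up for illustration, and the heap lists are replaced by plain counters. It only shows why a deadline checked in the outer loop alone cannot bound the total time.

```c
#include <stddef.h>

/* Hypothetical stand-ins: a millisecond clock and a per-entry work callback. */
typedef unsigned long (*now_ms_fn)(void);
typedef void (*visit_fn)(size_t list, size_t entry);

/*
 * Walk n_lists "heap lists" of n_entries entries each, with a time budget.
 * The outer loop always checks the deadline; the inner loop checks it only
 * when limit_inner is non-zero. Returns the number of entries visited.
 */
static size_t walk_heaps(size_t n_lists, size_t n_entries,
                         now_ms_fn now_ms, visit_fn visit,
                         unsigned long budget_ms, int limit_inner)
{
    const unsigned long start = now_ms();
    size_t visited = 0;

    for (size_t i = 0; i < n_lists; i++) {
        /* Outer check: only reached between lists. */
        if (now_ms() - start >= budget_ms)
            break;

        for (size_t j = 0; j < n_entries; j++) {
            /* Inner check: the one missing from the second code path. */
            if (limit_inner && now_ms() - start >= budget_ms)
                return visited;

            visit(i, j);
            visited++;
        }
    }
    return visited;
}

/* Simulated clock for a reproducible demo: each visited entry costs 1 ms. */
static unsigned long sim_ms = 0;

static unsigned long sim_now(void)
{
    return sim_ms;
}

static void sim_visit(size_t list, size_t entry)
{
    (void)list;
    (void)entry;
    sim_ms += 1;
}
```

With the simulated clock, 10 lists of 1000 entries and a 50 ms budget, walk_heaps(10, 1000, sim_now, sim_visit, 50, 1) stops after 50 entries. With limit_inner set to 0 it runs through the entire first list (1000 entries, 1000 simulated ms) before the outer check gets a chance to stop it, so a single large heap can overrun the budget by an arbitrary amount.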
A one second lag, even under low memory consumption, isn't always ideal.

As part of researching the Heap32Next performance issues we came across this article by Raymond Chen:

*Why is the Heap32Next function incredibly slow on Windows 7?*
http://blogs.msdn.com/b/oldnewthing/archive/2012/03/23/10286665.aspx

There he describes the history of the function and how its performance has degraded over time in order to prevent misuse and memory leaks. But he also mentions:

*But since the toolhelp library was intended for diagnostic purposes anyway (I mean, it's right there in the name: tool help), these weren't considered serious problems. Your debugging plug-in might use it to walk the heap looking for memory leaks, but you wouldn't deploy it in production, right?*

And if you look at the MSDN documentation for the Heap32* functions:

*The functions provided by the tool help library make it easier for you to obtain information about currently executing applications. These functions are designed to streamline the creation of tools, specifically debuggers.*

The indications are strong that these functions are not meant to be used in release products, as they are in OpenSSL.

At the end of the article Raymond suggests using the newer HeapWalk function:

*By the way, the recommended way to walk the contents of the heap is to use the HeapWalk function. The HeapWalk function does not suffer from this problem; enumerating the entire heap via repeated calls to HeapWalk has total running time proportional to the number of heap blocks. Note that HeapWalk can only enumerate heap blocks from the current process. If you're doing cross-process heap walking for diagnostic purposes, then you're stuck with Heap32First/Heap32Next, but since you're just doing it for diagnostic purposes, correctness should be more important to you than performance.*

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366710(v=vs.85).aspx

We did some quick tests using Ruby to call the Win32 functions: one code snippet walking the heap the way OpenSSL currently does, and another using HeapWalk. The latter was an order of magnitude faster.

Now, the struct it traverses is somewhat different, but I assume you can still extract random bytes from that? Or use some other way to generate the random bytes; we're not really concerned about how, only about the performance.

I don't have the know-how to securely generate random bytes, so I didn't attempt to make a patch for this issue. But I do plead for the implementation to be improved, and we are willing to test the performance of new implementations, to give them real-world testing in an application that puts heavy stress on the computer's memory and CPU.

-Thomas Thomassen
