I'm using Webware for a web application we're developing and I believe we may
have uncovered an obscure bug related to forwarding requests to other
servlets. While we were having a penetration test done on the application on
our live server, we noticed that the app server would stop responding after a
period of time and output an error message of "OSError: [Errno 24] Too many
open files". The web server itself still responded, so the problem is almost
certainly in Webware. We first theorized that it was caused by the volume of
traffic sent at the site during testing. However, the same web application
runs on another production server that also gets very high traffic, and that
server does not have this problem.

We eventually figured out that it only happened when errors were generated
repeatedly, as was the case during the penetration test, when the site was
being hammered with cross-site scripting attacks, SQL injection attempts, and
the like. We were able to reproduce the bug on our alpha system by using a
simple script to request the same page in a loop while making sure that an
exception was raised during servlet processing each time. We consistently saw
the same error in the output in fewer than 100 iterations.

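For anyone who wants to try reproducing this, the loop was along the lines of the sketch below. This is not our exact script: it is a modern Python 3 rewrite, and the names hammer() and fetch_status() are made up for illustration. The URL you pass in has to be a page that raises an exception on every request.

```python
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for one GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 500 from the failing servlet ends up here.
        return err.code
    except urllib.error.URLError:
        # Connection refused, timeout, etc.
        return None


def hammer(url, iterations=100, fetch=fetch_status):
    """Request `url` repeatedly and tally how often each status was seen."""
    counts = {}
    for _ in range(iterations):
        status = fetch(url)
        counts[status] = counts.get(status, 0) + 1
    return counts
```

Calling hammer() against the forwarding page and watching the app server output for Errno 24 is all the reproduction amounts to; the tally just makes it easy to confirm that every request really did fail.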
Upon further investigation, it became apparent that this only happens when
accessing a page that ends up forwarding the request to another servlet,
which then generates the offending exception. If we accessed a
non-forwarding page using this script, it would run seemingly indefinitely
(well over 500 iterations at least), even though it raised an exception and
triggered a 500 response each time.

I investigated the forwarding code in WebKit/Application.py and found the
following code in the includeURL() method in lines 671-683:

    try:
        servlet.runTransaction(trans)
    except EndResponse:
        pass
    self.returnServlet(servlet, trans)
    # Restore everything properly
    req.popParent()
    req.setURLPath(currentPath)
    req._serverSidePath = currentServerSidePath
    req._serverSideContextPath = currentServerSideContextPath
    req._contextName = currentContextName
    trans._servlet = currentServlet

None of the code after the try/except block is executed if an exception other
than EndResponse is raised during servlet processing. I tried catching the
exception with a bare except clause, running that cleanup code, and then
re-raising the exception up the chain, and this seemed to eliminate the error.
By commenting out the cleanup lines one at a time, I narrowed it down to a
single line that causes the error if it is not executed. The new code is as
follows:

    try:
        servlet.runTransaction(trans)
    except EndResponse:
        pass
    except:
        trans._servlet = currentServlet
        raise
    self.returnServlet(servlet, trans)
    # Restore everything properly
    req.popParent()
    req.setURLPath(currentPath)
    req._serverSidePath = currentServerSidePath
    req._serverSideContextPath = currentServerSideContextPath
    req._contextName = currentContextName
    trans._servlet = currentServlet

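The same restore-on-error idea can be illustrated outside Webware with try/finally, which runs the reassignment on both the success and the failure path. The sketch below uses made-up stand-in names (FakeTransaction, run_forwarded) and is not Webware's code; it only shows the pattern, and says nothing about whether the other cleanup lines should also run on error.

```python
class FakeTransaction:
    """Hypothetical stand-in for a transaction that tracks its current servlet."""

    def __init__(self, servlet):
        self._servlet = servlet


def run_forwarded(trans, forwarded_servlet, run):
    """Run `run(trans)` with the forwarded servlet, always restoring the original."""
    current_servlet = trans._servlet
    trans._servlet = forwarded_servlet
    try:
        run(trans)
    finally:
        # Executed whether run() returned normally or raised.
        trans._servlet = current_servlet
```

With this shape, an exception raised inside run() still propagates to the caller, but the transaction points at its original servlet again by the time it does.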
After adding the except clause to includeURL(), I hammered the forwarding page
over 900 times with my script without seeing the "Too many open files" error.
Failing to reassign the original servlet to the transaction after a forward
apparently causes something to be left open, which eventually generates this
error. Unfortunately, I don't know why. Our best guess is that the original
servlet gets stuck in "limbo" without being returned to the servlet cache,
which would in turn cause new servlet objects to be created for each new
request. I guess the process is running out of available file descriptors
before it runs out of memory?

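One way to check the "running out of files" guess is to watch the number of open descriptors in the app server process while the script runs. Errno 24 is the per-process descriptor limit, and sockets count against it as well as files. The helper below is only a sketch: the function names are made up, it assumes a Unix-like system that exposes /proc/self/fd or /dev/fd, and it would have to be called from inside the app server process (for example, logged once per request) to be of any use.

```python
import os
import resource


def open_fd_count():
    """Return how many descriptors this process has open, or None if unknown."""
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            # Listing the directory briefly opens one descriptor itself, so the
            # absolute number is slightly high, but changes over time are reliable.
            return len(os.listdir(fd_dir))
    return None


def fd_soft_limit():
    """Return the soft limit on open descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

If the count climbs steadily toward the soft limit with each failed forward and stays flat with the fix in place, that would at least confirm that descriptors are what is leaking.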
In any case, adding these 3 lines to the Application class seems to fix the
problem, but I'm concerned that this is simply a band-aid over a deeper-rooted
problem with how servlet objects are cached and returned to the pool. As near
as I can figure, it has something to do with when the die() method of the
transaction object eventually gets called, which dereferences all the
attributes associated with the object. If anyone
has more experience with the lower level functions of Webware, perhaps they
can shed some light on this issue. If this solution is indeed a sound one,
then we'll simply continue to use it (perhaps it can make its way into a
future version?) but I want to be sure we're not missing something important.
If this is indeed a symptom of a deeper problem, it could potentially show up
elsewhere. Thanks for your help!