GitHub user sachouche opened a pull request:
https://github.com/apache/drill/pull/1008
drill-5890: Fixed a file descriptor leak in Drill's test-suite
Problem Description
- The Drill test-suite uses two surefire processes to run tests
- This has the advantage of avoiding class reloading if the JVM exited
after running a test class
- The side effect of this approach, is that resource leaks could be
problematic
- When running the Drill's test-suite on MacOS (Sierra) my tests failed
with a max FD descriptors reached
- Had to increase the maximum number of open FDs for the whole machine and
per process
- The process is described in the following
[link](https://superuser.com/questions/302754/increase-the-maximum-number-of-open-file-descriptors-in-snow-leopard)
- Two limit files "limit.maxfiles.plist" and "limit.maxproc.plist" have to
be created under "/Library/LaunchDaemons"
- Originally, I had to set the maximum number of FDs per process to a large
value (100,000 and the system to 200,000) for the tests to succeed
FD Leak Cause
Debugging the Drill test suite, it was noticed
- A base class BaseTestQuery has a @BeforeClass and @AfterClass TestNG tags
- This means that each Drill test class extending from BaseTestQuery will
have a setup method called before any tests are executed and a cleanup method
invoked when all the tests are done (or a fatal error in between)
- The OpenClient() method was starting a DrillBit and creating a client
connection to it
- DrillBit's BootStrapContext class was initializing two Netty
EventLoopGroup objects which internally opened 20 FDs each
- It was noticed that one of them was not getting de-initialized
Fix
- Added logic within the BootStrapContext object to shutdown the
EventLoopGroup objects if they have been already shutdown (and are not in the
process of being shutdown)
- The fix tries to shut both objects because the container class should
ideally manage the lifecycle of its objects; at least, the code should clearly
articulate lifecycle management responsibilities to avoid leaks
- Used the "shutdownGracefully" method since it was a) already used by our
code and b) is advertised to have sensible timeout values
- The added shutdown calls are being invoked only when consumer objects
have been also shutdown
- Running the tests show that the number of FDs per surefire process
doesn't extend beyond few hundreds (majority created for JAR files loading)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sachouche/drill drill-5890
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/1008.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1008
commit 5f8bad865a78ab265015ca21184d6c59e22d1c95
Author: Salim Achouche
Date: 2017-10-24T17:12:19Z
drill-5890: Fixed a file descriptor leak in Drill's test-suite
---