Vyacheslav Pascarel created NUTCH-2407:
------------------------------------------
Summary: Memory leak causing Nutch Server to run out of memory
Key: NUTCH-2407
URL: https://issues.apache.org/jira/browse/NUTCH-2407
Project: Nutch
Issue Type: Bug
Components: nutch server
Affects Versions: 2.3.1
Environment: Ubuntu 16.04 64-bit
Oracle Java 8 64-bit
Nutch 2.3.1 (standalone deployment)
MongoDB 3.4
Reporter: Vyacheslav Pascarel


My application performs continuous crawling using the Nutch REST services. It injects a seed URL and then repeats the GENERATE/FETCH/PARSE/UPDATEDB sequence a requested number of times. Each step in the sequence is executed only upon successful completion of the previous step, and then the whole sequence is repeated.

Here is a brief description of the job:
* Number of GENERATE/FETCH/PARSE/UPDATEDB cycles per run: 50
* 'topN' parameter value of the GENERATE step in each cycle: 10
* Seed URL: http://www.cnn.com
* Regex URL filters for all jobs:
** *"-^.\{1000,\}$"* - exclude very long URLs
** *"+."* - include the rest

To monitor the Nutch server I use Java VisualVM, which comes with the Java SDK. After each run (50 cycles of GENERATE/FETCH/PARSE/UPDATEDB) I trigger garbage collection from VisualVM and check memory usage. My observation is that the Nutch server leaks ~25MB per run.

NOTES: I added custom HTTP DELETE services to clean the job history in NutchServerPoolExecutor and to remove all custom configurations from RAMConfManager after each run. The observed ~25MB leak is therefore what remains after job history and configuration cleanup.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
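
For readers who want to reproduce the workload, the crawl loop described in the report can be sketched roughly as follows. This is only an illustration of the control flow (inject once, then repeat the four steps, each gated on the success of the previous one). It is not the reporter's code. The `submit_job` callable is hypothetical: it stands in for whatever client code submits one job to the Nutch server and waits for it to finish, and the `"url"` argument key for the inject step is likewise an assumption. Only the step names, the cycle count, the seed URL and `topN` come from the report.

```python
from typing import Callable, Dict

# Step names as given in the report; one cycle runs them in this order.
STEPS = ["GENERATE", "FETCH", "PARSE", "UPDATEDB"]


def run_crawl(submit_job: Callable[[str, Dict], bool],
              seed_url: str = "http://www.cnn.com",
              cycles: int = 50,
              top_n: int = 10) -> int:
    """Inject the seed, then repeat the four-step sequence `cycles` times.

    `submit_job(job_type, args)` is a hypothetical callable that runs one
    job to completion and returns True on success. The function returns
    the number of fully completed cycles.
    """
    # Hypothetical argument key; the report does not say how the seed is passed.
    if not submit_job("INJECT", {"url": seed_url}):
        return 0

    completed = 0
    for _ in range(cycles):
        for step in STEPS:
            # Only GENERATE takes the topN parameter in the reported setup.
            args = {"topN": top_n} if step == "GENERATE" else {}
            if not submit_job(step, args):
                # A failed step stops the run; later steps are not attempted.
                return completed
        completed += 1
    return completed
```

With the reported settings, one run issues one inject job plus 50 x 4 = 200 step jobs, so whatever is retained per job is multiplied about 200 times before each memory measurement.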