Hello again,
I decided to strip things down to the bare minimum, and I have what I believe to be a test case that should reproduce this situation while leaving Tomcat etc. out of the equation. Below is a very simple class (TestNutch.java) that runs a search in a loop. If I run it with the following arguments:
    java -Dcom.sun.management.jmxremote \
         -Dcom.sun.management.jmxremote.port=8086 \
         -Dcom.sun.management.jmxremote.ssl=false \
         -Dcom.sun.management.jmxremote.authenticate=false \
         -classpath ../lib/nutch-1.0.jar:../lib/hadoop-0.19.1-core.jar:../lib/commons-logging-1.0.4.jar:../lib/lucene-core-2.4.0.jar:../lib/lucene-misc-2.4.0.jar:. \
         TestNutch
I can attach to it via JMX in VisualVM and watch what's going on. I can see a continuously growing number of threads being created, all of which are in the following state:
    "Thread-735" - Thread t...@752
       java.lang.Thread.State: TIMED_WAITING
            at java.lang.Thread.sleep(Native Method)
            at org.apache.nutch.searcher.FetchedSegments$SegmentUpdater.run(FetchedSegments.java:115)
       Locked ownable synchronizers:
            - None
I can also see my heap growing to the maximum size. As far as I am aware, this should not be happening: the call to nutchBean.close() should stop everything. This looks very much like the issue reported in NUTCH-738, but I have that patch installed, along with the related NUTCH-746. Can anyone else confirm this?
Thanks,
-Mark
---TestNutch.java class below this line
import java.io.*;
import org.apache.nutch.searcher.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.nutch.util.*;

public class TestNutch {
    public static void main(String[] args) {
        String searchTerm = "test";
        String configFile = "/var/lib/nutch/conf/nutch-site.xml";
        while (true) {
            try {
                // A fresh configuration and NutchBean on every iteration, as in the servlet
                Configuration nutchConf = NutchConfiguration.create();
                Path configPath = new Path(configFile);
                nutchConf.addResource(configPath);
                NutchBean nutchBean = new NutchBean(nutchConf);
                Query nutchQuery = Query.parse(searchTerm, nutchConf);
                Hits nutchHits = nutchBean.search(nutchQuery, 100);
                System.out.println("Searched for " + searchTerm + ", got "
                        + nutchHits.getLength());
                // Expected to release everything the bean opened, including background threads
                nutchBean.close();
            } catch (IOException e) {
                System.out.println(e.toString());
            }
        }
    }
}
---End of TestNutch.java
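If you want to confirm the growing thread count without attaching VisualVM, you can count the threads from inside the test loop itself, for example by printing the count every 100 iterations of TestNutch. This is a minimal sketch that uses only the JDK's `Thread.getAllStackTraces()`. The class and method names are hypothetical, and it assumes the leaked threads can be recognised by the `FetchedSegments$SegmentUpdater` frame shown in the dump.

```java
import java.util.Map;

/**
 * Hypothetical diagnostic helper: counts live threads whose current stack
 * contains a frame from a class whose name contains a given fragment.
 */
public class ThreadCounter {

    /** Returns how many live threads have at least one frame matching classFragment. */
    public static int countThreadsIn(String classFragment) {
        int count = 0;
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().contains(classFragment)) {
                    count++;
                    break; // count each thread only once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Nested classes appear with '$' in frame class names, as in the thread dump.
        String target = "org.apache.nutch.searcher.FetchedSegments$SegmentUpdater";
        System.out.println("SegmentUpdater threads alive: " + countThreadsIn(target));
    }
}
```

If the bean is being shut down properly, this count should stay flat from one iteration to the next rather than climbing.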
-----Original Message-----
From: Mark Round [mailto:[email protected]]
Sent: 20 August 2009 11:23
To: [email protected]
Subject: Possible memory leak in Nutch-1.0 ?
Hi all,
I am experiencing serious out-of-memory errors when querying Nutch, and would appreciate any pointers or advice. I have a Nutch index that I'm searching using a simple servlet. This servlet queries the index and returns the results as XML, so other systems in my network can use the index as a web service.
In a nutshell, the problem seems to be that after successive queries to this servlet, the Tenured Gen grows until I run out of heap space.
I am running Nutch-1.0 with the NUTCH-738 and NUTCH-746 patches applied (more about that below), Tomcat 6.0.20 and Sun's JVM 1.6.0_12-b04 on Debian Lenny 32-bit. I have also tested with OpenJDK and got the same results.
My servlet just does the following:
    Configuration nutchConf = NutchConfiguration.create();
    Path configPath = new Path(NUTCH_DIR + "/conf/" + site + "/nutch-site.xml");
    nutchConf.addResource(configPath);
    NutchBean nutchBean = new NutchBean(nutchConf);
    Query nutchQuery = Query.parse(nutchSearchString, nutchConf);
    Hits nutchHits = nutchBean.search(nutchQuery, maxResults);
    ...
    // format the results as XML and output them
    ...
    nutchBean.close();
After querying it a few hundred times, my Tenured Gen is up to 50 MB. After a few thousand requests, I end up with over 500 MB used. I could increase my heap size, but no matter what I set it to, it all gets consumed eventually, and the only option is to restart Tomcat.
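To log the Tenured Gen figures from inside the servlet or a test loop instead of through a profiler, you can read each memory pool through the standard `java.lang.management` API. This sketch (hypothetical class `OldGenMonitor`) sums the used bytes of heap pools whose names look like the old generation. The pool name depends on the garbage collector in use, so the name matching is an assumption rather than a guaranteed rule.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

/**
 * Hypothetical helper: reports how much of the old ("tenured") generation is in use.
 * Pool names depend on the collector, e.g. "Tenured Gen" (serial),
 * "PS Old Gen" (parallel), "CMS Old Gen" (CMS) or "G1 Old Gen" (G1).
 */
public class OldGenMonitor {

    /** Sum of used bytes over old-generation heap pools, or -1 if no such pool is found. */
    public static long oldGenUsedBytes() {
        boolean found = false;
        long used = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && isOldGen(pool.getName())) {
                found = true;
                used += pool.getUsage().getUsed();
            }
        }
        return found ? used : -1;
    }

    /** Name-based guess at whether a pool is the old generation (an assumption, see class comment). */
    public static boolean isOldGen(String poolName) {
        return poolName.contains("Old Gen") || poolName.contains("Tenured");
    }

    public static void main(String[] args) {
        // e.g. call this every 100 searches and watch whether the figure keeps rising
        long used = oldGenUsedBytes();
        if (used < 0) {
            System.out.println("No old-generation pool recognised for this collector");
        } else {
            System.out.println("Old gen used: " + used / (1024 * 1024) + " MB");
        }
    }
}
```

A single reading says little because the old generation only shrinks after a major collection. The useful signal is a baseline that keeps rising across full GCs.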
I have obtained a heap dump and run it through jhat, but to be honest I'm not really sure what I'm looking for. I've made the dump available at http://www.markround.com/static/tomcat.hprof in case it helps anyone investigate further.
For what it's worth, I didn't seem to get this issue with Nutch-0.9.
Regarding the two patches I have applied: I had to use them because otherwise I get a lot of threads in the TIMED_WAITING state which, according to Lambda Probe, are stuck here:
    java.lang.Thread.sleep ( native code )
    org.apache.nutch.searcher.FetchedSegments$SegmentUpdater.run ( FetchedSegments.java:115 )
With the two patches applied, I still get lots of these "stuck" threads, but they do seem to get cleaned up eventually. Could this have anything to do with the problem?
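For comparison, here is a generic sketch of how a periodically sleeping background thread is commonly made to stop on `close()`. The steps are: clear a `volatile` flag, interrupt the thread so it does not have to finish its current `Thread.sleep()`, and `join()` it. The class `PeriodicUpdater` and its methods are hypothetical, not Nutch's `FetchedSegments$SegmentUpdater`, and the sketch sticks to Java 6-era syntax. A thread that is only flagged and never interrupted stays in TIMED_WAITING until its sleep expires. That would be one possible reading of threads that "eventually get cleaned up", not a confirmed diagnosis.

```java
/**
 * Hypothetical sketch, NOT Nutch's FetchedSegments$SegmentUpdater: a background
 * thread that wakes up periodically, plus a close() that really stops it.
 */
public class PeriodicUpdater {

    private final long intervalMillis;
    private final Thread worker;
    // volatile so the worker thread sees the change made by close()
    private volatile boolean running = true;

    public PeriodicUpdater(long intervalMillis) {
        this.intervalMillis = intervalMillis;
        this.worker = new Thread(new Runnable() {
            public void run() {
                loop();
            }
        }, "PeriodicUpdater");
        // a forgotten updater will not keep the JVM from exiting
        this.worker.setDaemon(true);
    }

    /** Starts the background thread (kept out of the constructor on purpose). */
    public void start() {
        worker.start();
    }

    private void loop() {
        while (running) {
            try {
                // this is where such a thread sits in TIMED_WAITING
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // close() interrupted the sleep: restore the flag and exit
                Thread.currentThread().interrupt();
                return;
            }
            if (running) {
                refresh();
            }
        }
    }

    /** Placeholder for the periodic work (hypothetical). */
    protected void refresh() {
    }

    public boolean isAlive() {
        return worker.isAlive();
    }

    /** Stops the worker and waits for it to exit. */
    public void close() throws InterruptedException {
        running = false;
        worker.interrupt(); // wake it now instead of after the current sleep
        worker.join();      // only return once the thread is really gone
    }

    public static void main(String[] args) throws InterruptedException {
        // Mimic TestNutch: create and close many instances in a loop.
        for (int i = 0; i < 1000; i++) {
            PeriodicUpdater updater = new PeriodicUpdater(60000L);
            updater.start();
            try {
                // ... work that would use the updater ...
            } finally {
                updater.close();
            }
        }
        System.out.println("Live threads after loop: " + Thread.activeCount());
    }
}
```

Because `close()` both interrupts and joins, the loop in `main` should not accumulate sleeping threads the way the dumps in this thread show.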
Please let me know if there are any other diagnostics I can run, or
information I can provide.
Many thanks,
-Mark