Hello again,

I decided to strip things down to the bare minimum, and I have what I
believe to be a test case that reproduces this situation, leaving
Tomcat etc. out of the equation. I have attached below a very simple
class that loops over a search (TestNutch.java). If I run it with the
following arguments:

java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=8086 \
     -Dcom.sun.management.jmxremote.ssl=false \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -classpath ../lib/nutch-1.0.jar:../lib/hadoop-0.19.1-core.jar:../lib/commons-logging-1.0.4.jar:../lib/lucene-core-2.4.0.jar:../lib/lucene-misc-2.4.0.jar:. \
     TestNutch

I can attach to it via JMX in VisualVM and watch what's going on. I can
see a continuously growing number of threads being created, all of which
are in the following state:


"Thread-735" - Thread t...@752
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Thread.sleep(Native Method)
        at
org.apache.nutch.searcher.FetchedSegments$SegmentUpdater.run(FetchedSegm
ents.java:115)

   Locked ownable synchronizers:
        - None

And I can see my heap growing to its maximum size. As far as I am aware,
this should not be happening - the call to nutchBean.close() should stop
everything. This looks very much like the issue reported in NUTCH-738,
but I have that patch installed, along with the related NUTCH-746. Can
anyone else confirm this?

Thanks,

-Mark

---TestNutch.java class below this line

import java.io.*;

import org.apache.nutch.searcher.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.nutch.util.*;

public class TestNutch {

    public static void main(String[] args) {

        String searchTerm = "test";
        String configFile = "/var/lib/nutch/conf/nutch-site.xml";

        while (true) {
            try {
                Configuration nutchConf = NutchConfiguration.create();
                Path configPath = new Path(configFile);
                nutchConf.addResource(configPath);
                NutchBean nutchBean = new NutchBean(nutchConf);

                Query nutchQuery = Query.parse(searchTerm, nutchConf);
                Hits nutchHits = nutchBean.search(nutchQuery, 100);

                System.out.println("Searched for " + searchTerm
                        + ", got " + nutchHits.getLength());

                nutchBean.close();

            } catch (IOException e) {
                System.out.println(e.toString());
            }
        }
    }
}

---End of TestNutch.java
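
For anyone reproducing this without attaching VisualVM, the thread
growth can also be sampled in-process with the standard ThreadMXBean.
This monitor is my own addition (not part of the original test case),
and the sample count and interval are arbitrary:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountMonitor {

    // Periodically print the live thread count; run this in a background
    // thread alongside the search loop to watch threads accumulate.
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 5; i++) {
            System.out.println("live threads: " + threads.getThreadCount());
            Thread.sleep(2000);
        }
    }
}
```

If the count climbs steadily while TestNutch loops, that would confirm
the SegmentUpdater threads are not being reclaimed by nutchBean.close().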




-----Original Message-----
From: Mark Round [mailto:[email protected]] 
Sent: 20 August 2009 11:23
To: [email protected]
Subject: Possible memory leak in Nutch-1.0 ?

Hi all,

I am experiencing serious out of memory errors when querying Nutch, and
would appreciate any pointers or advice. I have a Nutch index that I'm
searching using a simple servlet. This servlet queries the index and
returns the results as XML, so other systems in my network can make use
of the index as a web service. 

In a nutshell, the problem seems to be that after successive queries to
this servlet, the Tenured Gen increases until I run out of heap space. 

I am running Nutch-1.0, with the NUTCH-738 and NUTCH-746 patches applied
(more about that below), Tomcat 6.0.20 and Sun's JVM, 1.6.0_12-b04 on
Debian Lenny 32-bit. I have also tested with OpenJDK, and got the same
results.

My servlet just does the following:

Configuration nutchConf = NutchConfiguration.create();
Path configPath = new Path(NUTCH_DIR + "/conf/" + site + "/nutch-site.xml");
nutchConf.addResource(configPath);
NutchBean nutchBean = new NutchBean(nutchConf);
Query nutchQuery = Query.parse(nutchSearchString, nutchConf);
Hits nutchHits = nutchBean.search(nutchQuery, maxResults);
...
... Format the results as XML and output them
...
nutchBean.close();

After querying it a few hundred times, my Tenured Gen is up to 50MB;
after a few thousand requests, over 500MB is used. I can of course
increase my heap size, but no matter what I set it to, it eventually
gets consumed entirely and the only option is to restart Tomcat.

I have obtained a heap dump and run it through jhat, but to be honest
I'm not really sure what I'm looking for. I've made the dump available
at http://www.markround.com/static/tomcat.hprof, in case that helps
anyone investigate further.

For what it's worth, I didn't seem to get this issue with Nutch-0.9. 

Regarding the two patches I have applied - I had to make use of them
because otherwise I get a lot of threads in the TIMED_WAITING state,
which according to Lambda Probe are stuck here:

java.lang.Thread.sleep(Native Method)
org.apache.nutch.searcher.FetchedSegments$SegmentUpdater.run(FetchedSegments.java:115)

With the two patches applied, I still get lots of these "stuck" threads,
but they do seem to get cleaned up eventually; I wonder whether this
could be related to the problem?

Please let me know if there are any other diagnostics I can run, or
information I can provide.

Many thanks,

-Mark
