Hi.
You said that you open and close the NutchBean on every request.

First:
This is very expensive. Create the NutchBean only once, store it in the application scope, and read it from there when needed.
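
A minimal sketch of the idea, not tied to the Nutch API: ExpensiveBean is a made-up stand-in for the NutchBean, and SharedBean is a hypothetical application-level holder (in a servlet you would probably keep the instance in the ServletContext instead of a static field).

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive object such as the NutchBean.
class ExpensiveBean implements Closeable {
  static final AtomicInteger CREATED = new AtomicInteger();
  private volatile boolean closed;

  ExpensiveBean() {
    CREATED.incrementAndGet(); // count how often the expensive setup runs
  }

  String search(String query) {
    if (closed) throw new IllegalStateException("bean is closed");
    return "results for " + query;
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Hypothetical application-scoped holder: create once, reuse, close at shutdown.
class SharedBean {
  private static ExpensiveBean instance;

  static synchronized ExpensiveBean get() {
    if (instance == null) instance = new ExpensiveBean();
    return instance;
  }

  static synchronized void shutdown() {
    if (instance != null) {
      instance.close();
      instance = null;
    }
  }
}

class SharedBeanDemo {
  public static void main(String[] args) {
    for (int i = 0; i < 100; i++) {
      SharedBean.get().search("foo"); // every request reuses the same bean
    }
    System.out.println("beans created: " + ExpensiveBean.CREATED.get());
    SharedBean.shutdown(); // close only when the application stops
  }
}
```

The point is only that close() is called once at shutdown, not once per request.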

Second:
I'm not sure, but it is possible that the PluginRepository has a memory leak. I think its cache (the WeakHashMap) keeps growing.

We noticed this when we ran Nutch as a server application (for example inside the GUI).

I have no test to demonstrate it, only these few lines:

System.out.println(CACHE.size());
for (int i = 0; i < 100; i++) {
  Configuration create = NutchConfiguration.create();
  NutchBean nutchBean = new NutchBean(create);
  Query.parse("foo", create);
  nutchBean.close();
  System.out.println(CACHE.size());
}
Thread.sleep(3000);
System.gc();
System.out.println("endsize:" + CACHE.size());

The end size is 100, but I think the cache size should be 0.
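
One thing that could explain this, and I have not checked whether the PluginRepository really does it: the WeakHashMap Javadoc warns that values are held by ordinary strong references, so if a cached value refers back to its own key, the entry is never removed. A small sketch with made-up Key and Value classes (not the Nutch classes) shows the effect:

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakCacheDemo {
  // Hypothetical key type, standing in for something like a Configuration.
  static class Key {}

  // Hypothetical cached value that keeps a strong reference to its own key.
  static class Value {
    final Key key;

    Value(Key key) {
      this.key = key;
    }
  }

  static int fillAndCollect(int n) {
    Map<Key, Value> cache = new WeakHashMap<>();
    for (int i = 0; i < n; i++) {
      Key k = new Key();
      // The map holds the value strongly and the value holds the key,
      // so the key never becomes weakly reachable.
      cache.put(k, new Value(k));
    }
    System.gc(); // only a request, but it cannot clear these entries anyway
    return cache.size();
  }

  public static void main(String[] args) {
    System.out.println("endsize:" + fillAndCollect(100));
  }
}
```

Here the size stays at 100 regardless of the GC, because every key is still strongly reachable through its value.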

What do you think?

marko




On Aug 20, 2009, at 4:45 PM, Kirby Bohling wrote:

Mark,

   I'm very interested in this problem.  I'm the author of those
patches.  I have access to YourKit.  I will set up your test case and
look into it hopefully in the next couple of days.  I know we have
done some stress testing, but clearly not enough if you are having
this problem.

  Anything else not boilerplate on your end?  How much memory are you
configuring it all with?  Have you tried forcing GC (if you connect
with VisualVM you can do that sort of thing easily)?  Have you seen what
happens in that case?
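
  If attaching a tool is awkward, something like the following could be dropped into the test to request a GC and print what is still in use afterwards. It only uses the standard java.lang.management API; the class name HeapAfterGc is made up for the sketch, and gc() is only a request to the JVM, so treat the numbers as a rough indication.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: request a GC and report what is still in use.
class HeapAfterGc {
  static long usedHeapAfterGc() {
    MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    memory.gc(); // effectively the same as System.gc()
    return memory.getHeapMemoryUsage().getUsed();
  }

  public static void main(String[] args) {
    System.out.println("heap used after GC: " + usedHeapAfterGc() + " bytes");
    // Per-pool figures (pool names depend on the JVM and collector in use).
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      MemoryUsage usage = pool.getUsage();
      if (usage != null) {
        System.out.println(pool.getName() + ": " + usage.getUsed() + " bytes");
      }
    }
  }
}
```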

  I have seen this behave badly in the face of Tomcat restarts.  I
can't identify the problem.  It runs out of permgen, but all my
profiling tools show that there is ample permgen space.

  Thanks,
        Kirby


On Thu, Aug 20, 2009 at 9:11 AM, Mark Round<[email protected]> wrote:
Hello again,

I decided to strip things down to the bare minimum, and I have what I
believe to be a test case that should reproduce this situation, leaving
Tomcat etc. out of the equation. I have attached below a very simple
class that loops over a search (TestNutch.java). If I run this with the
following arguments :

java -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=8086 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -classpath ../lib/nutch-1.0.jar:../lib/hadoop-0.19.1-core.jar:../lib/commons-logging-1.0.4.jar:../lib/lucene-core-2.4.0.jar:../lib/lucene-misc-2.4.0.jar:. \
  TestNutch

I can attach to it via JMX in VisualVM and watch what's going on. I can
see a continuously growing number of threads being created, all of which
are in the following state:


"Thread-735" - Thread t...@752
  java.lang.Thread.State: TIMED_WAITING
       at java.lang.Thread.sleep(Native Method)
       at org.apache.nutch.searcher.FetchedSegments$SegmentUpdater.run(FetchedSegments.java:115)

  Locked ownable synchronizers:
       - None

And I can see my heap growing to the maximum size. As far as I am aware,
this should not be happening - the call to nutchBean.close() should stop
everything. This looks very much like the issue being reported at
NUTCH-738, but I have this patch installed, along with the related
NUTCH-746. Can anyone else confirm this?
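
To be clear about what I would expect close() to do, here is a rough sketch with made-up names (PollingOwner is hypothetical and is not the FetchedSegments code): the owner interrupts its sleeping background thread and waits for it to exit, so repeated create/close cycles should not accumulate threads.

```java
import java.io.Closeable;

// Hypothetical owner of a background polling thread.
class PollingOwner implements Closeable {
  private final Thread updater;

  PollingOwner() {
    updater = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          Thread.sleep(60_000); // appears as TIMED_WAITING in a thread dump
        }
      } catch (InterruptedException e) {
        // Interrupted by close(): fall through and let the thread end.
      }
    }, "updater-sketch");
    updater.setDaemon(true);
    updater.start();
  }

  boolean updaterAlive() {
    return updater.isAlive();
  }

  @Override
  public void close() {
    updater.interrupt(); // wake the thread out of sleep()
    try {
      updater.join(5_000); // wait for it to actually finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}

class PollingOwnerDemo {
  public static void main(String[] args) {
    for (int i = 0; i < 100; i++) {
      PollingOwner owner = new PollingOwner();
      owner.close();
      if (owner.updaterAlive()) {
        throw new IllegalStateException("updater thread leaked");
      }
    }
    System.out.println("all updater threads stopped");
  }
}
```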

Thanks,

-Mark

---TestNutch.java class below this line

import java.io.*;

import org.apache.nutch.searcher.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.nutch.util.*;

public class TestNutch {

    public static void main(String[] args) {

        String searchTerm = "test";
        String configFile = "/var/lib/nutch/conf/nutch-site.xml";

        while (true) {
            try {

                Configuration nutchConf = NutchConfiguration.create();
                Path configPath = new Path(configFile);
                nutchConf.addResource(configPath);
                NutchBean nutchBean = new NutchBean(nutchConf);

                Query nutchQuery = Query.parse(searchTerm, nutchConf);
                Hits nutchHits = nutchBean.search(nutchQuery, 100);

                System.out.println("Searched for " + searchTerm + ", got "
                        + nutchHits.getLength());

                nutchBean.close();

            } catch (IOException e) {
                System.out.println(e.toString());
            }

        }

    }

}

---End of TestNutch.java




-----Original Message-----
From: Mark Round [mailto:[email protected]]
Sent: 20 August 2009 11:23
To: [email protected]
Subject: Possible memory leak in Nutch-1.0 ?

Hi all,

I am experiencing serious out of memory errors when querying Nutch, and would appreciate any pointers or advice. I have a Nutch index that I'm
searching using a simple servlet. This servlet queries the index and
returns the results as XML, so other systems in my network can make use
of the index as a web service.

In a nutshell, the problem seems to be that after successive queries to this servlet, the Tenured Gen increases until I run out of heap space.

I am running Nutch-1.0, with the NUTCH-738 and NUTCH-746 patches applied
(more about that below), Tomcat 6.0.20 and Sun's JVM, 1.6.0_12-b04 on
Debian Lenny 32-bit. I have also tested with OpenJDK, and got the same
results.

My servlet just does the following :

Configuration nutchConf = NutchConfiguration.create();
Path configPath = new Path(NUTCH_DIR + "/conf/" + site + "/nutch-site.xml");
nutchConf.addResource(configPath);
NutchBean nutchBean = new NutchBean(nutchConf);
Query nutchQuery = Query.parse(nutchSearchString, nutchConf);
Hits nutchHits = nutchBean.search(nutchQuery, maxResults);
...
... Format the results as XML and output them
...
nutchBean.close();

After querying it a few hundred times, my Tenured Gen is up to 50 MB;
after a few thousand requests, I end up with over 500 MB used. I can of
course increase my heap size, but no matter what I set it to, eventually
it all gets consumed and the only option is to restart Tomcat.

I have obtained a heap dump and run it through jhat, but to be honest
I'm not really sure what I'm looking for. I've made the dump available
at http://www.markround.com/static/tomcat.hprof, in case that helps
anyone investigate further.

For what it's worth, I didn't seem to get this issue with Nutch-0.9.

Regarding the two patches I have applied - I had to make use of them as
otherwise, I get a lot of threads in the TIMED_WAITING state, which
according to Lambda Probe are stuck here :

java.lang.Thread.sleep ( native code )
org.apache.nutch.searcher.FetchedSegments$SegmentUpdater.run (
FetchedSegments.java:115 )

With the 2 patches applied, I still get lots of these "stuck" threads,
but they do seem to eventually get cleaned up; I wonder if this could
have anything to do with the problem ?

Please let me know if there are any other diagnostics I can run, or
information I can provide.

Many thanks,

-Mark









