I've heard reports that the PDF parser can sometimes blow up in this way. Was a PDF document one of the last pages fetched? If so, please send the URL so we can construct a reproducible test case.

As a workaround, you could try putting something like the following in your conf/nutch-site.xml to disable the PDF parser:

<property>
  <name>plugin.excludes</name>
  <value>parse-pdf</value>
</property>

Doug

Jason Boss wrote:
Ok...back to .05 (or the .5 outside of the nutch-nightly directory) with
about 1.3 million pages.

Running Tomcat 1.4

When I try to search for "http" I get this: java.lang.OutOfMemoryError

[EMAIL PROTECTED] nutch-nightly]# free -m
             total       used       free     shared    buffers     cached
Mem:          2015        191       1824          0          6         82
-/+ buffers/cache:        103       1912
Swap:         1019          0       1019
[EMAIL PROTECTED] nutch-nightly]#

What am I doing wrong?

J



-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=5047&alloc_id=10808&op=click
_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general


