As a workaround, you could try putting something like the following in your conf/nutch-site.xml to disable the PDF parser:

<property>
  <name>plugin.excludes</name>
  <value>parse-pdf</value>
</property>

Doug
Jason Boss wrote:
Ok...back to .05 (or the .5 outside of the nutch-nightly directory) with about 1.3 million pages.
Running Tomcat 1.4
When I try to search for "http", I get this: java.lang.OutOfMemoryError
[EMAIL PROTECTED] nutch-nightly]# free -m
             total       used       free     shared    buffers     cached
Mem:          2015        191       1824          0          6         82
-/+ buffers/cache:        103       1912
Swap:         1019          0       1019
[EMAIL PROTECTED] nutch-nightly]#
What am I doing wrong?
J
------------------------------------------------------- This SF.Net email is sponsored by BEA Weblogic Workshop FREE Java Enterprise J2EE developer tools! Get your free copy of BEA WebLogic Workshop 8.1 today. http://ads.osdn.com/?ad_id=5047&alloc_id=10808&op=click _______________________________________________ Nutch-general mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/nutch-general
