I'm having the same problem on a Xeon machine with a gig of RAM running
Debian and Sun's Java. I'm using the intranet method with about 10,000
sites. After 12-24 hours of running, it really slows down (can be several
minutes between fetches). It finally dies with
SEVERE error writing output:java.lang.OutOfMemoryError
Thus, the next thing I'm trying is following what is said on the wiki:
http://www.nutch.org/cgi-bin/twiki/view/Main/ErrorMessages
(While none of those entries address fetching, increasing the memory size
seems blindingly obvious.)
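
Before and after changing the memory setting, it may help to confirm what
heap limit the JVM actually ends up with. Below is a minimal sketch using
only the standard java.lang.Runtime calls; the class name HeapCheck and its
helper methods are made up for illustration and are not part of Nutch. The
-Xmx value in the usage line is only an example, and how that option reaches
the JVM from the Nutch launch script is not something I have verified.

```java
// Hypothetical helper, not part of Nutch: reports the heap limits of the
// JVM it runs in, so a changed -Xmx setting can be verified.
public class HeapCheck {

    /** Converts a byte count to whole mebibytes (rounds down). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Fraction of the maximum heap currently in use. */
    public static double usedFraction(Runtime rt) {
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / (double) rt.maxMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the ceiling the heap may grow to (set by -Xmx);
        // totalMemory() is what the JVM has reserved so far.
        System.out.println("max heap (MiB):      " + toMiB(rt.maxMemory()));
        System.out.println("reserved heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("used fraction:       " + usedFraction(rt));
    }
}
```

For example, "java -Xmx512m HeapCheck" should report a max heap close to
512 MiB (the exact figure varies a little between JVMs). If the number does
not change with the setting, the option is not reaching the JVM.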
Also, if anybody wants to try my data and settings, I'd be more than happy
to supply them.
- Bill
ralph said:
> Hi Matthias,
>
> I downloaded the latest available nightly build (nutch-2004-11-10.tar.gz ).
> I followed the same steps as before and I am still having the same problem.
>
> I did a grep on the crawl log for the status printout, and I noticed that
> the initial download rate falls from ~40 pages/second to ~5 pages/second
> and then the crawl just hangs. As before, hitting control-c in the GNU
> screen window starts the crawl again at around ~20 pages/second, but it
> again falls quickly to ~5 pages/second and the crawl hangs.
>
> I have read on the mailing list that people have been able to run crawls
> for at least 5 days straight. I would like to be able to do this.
> In the meantime, is it reasonable for me to touch fetcher.done in all the
> segments and restart the crawling script?
>
> Any suggestions would be helpful.
>
> thanks,
> ralph
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by:
> Sybase ASE Linux Express Edition - download now for FREE
> LinuxWorld Reader's Choice Award Winner for best database on Linux.
> http://ads.osdn.com/?ad_id=5588&alloc_id=12065&op=click
> _______________________________________________
> Nutch-developers mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
--
*------------------------------------------------------*
| Bill Goffe [EMAIL PROTECTED] |
| Department of Economics voice: (315) 312-3444 |
| SUNY Oswego fax: (315) 312-5444 |
| 416 Mahar Hall <wuecon.wustl.edu/~goffe> |
| Oswego, NY 13126 |
*--------*------------------------------------------------------*-----------*
| "If I had tried to demand fees ... there would be no World Wide Web. |
| There would be lots of small webs." |
| -- Tim Berners-Lee, the inventor of the World Wide Web, on why he didn't |
| want any fees charged for the web. "Free Was the Key, Says Web |
| Founder," CNN, June 16, 2004. |
*---------------------------------------------------------------------------*