Hello everybody! Yesterday, I tried to run a crawl at depth 5 and topN 120000. In the middle of the 5th depth I got this error:
2014-03-19 19:16:11,608 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-716 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:11,608 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 (queue crawl delay=0ms)
2014-03-19 19:16:22,291 ERROR http.Http - Failed with the following error: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:24,677 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 (queue crawl delay=0ms)
2014-03-19 19:16:24,677 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:33,550 ERROR http.Http - Failed with the following error: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:35,568 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 (queue crawl delay=0ms)
2014-03-19 19:16:35,568 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:41,928 ERROR http.Http - Failed with the following error: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:43,535 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 (queue crawl delay=0ms)
2014-03-19 19:16:43,535 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:50,432 ERROR http.Http - Failed with the following error: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:50,888 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:51,580 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-235 (queue crawl delay=0ms)
2014-03-19 19:16:53,120 ERROR http.Http - Failed with the following error:
2014-03-19 19:16:53,711 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/recreation/outdoors/fishing/27891:21 (queue crawl delay=0ms)
2014-03-19 19:16:54,659 INFO fetcher.FetcherJob - -finishing thread FetcherThread20, activeThreads=46
2014-03-19 19:17:06,734 INFO fetcher.FetcherJob - -finishing thread FetcherThread48, activeThreads=44
2014-03-19 19:17:08,348 ERROR http.Http - Failed with the following error: java.lang.OutOfMemoryError: Java heap space

As you can see, the fetcher is running out of Java heap space. I ran this crawl using Nutch 2.2.1, Eclipse and MySQL. Any ideas on how to solve this? Recently, I changed the metadata field from blob to longblob and set http.content.limit to -1 (neither change has caused any trouble so far, though).
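In case it helps with diagnosis, here is a minimal sketch for checking how much heap the JVM actually gets when launched from Eclipse. The class name HeapCheck and its methods are hypothetical and not part of Nutch. It only reads the limit that the -Xmx VM argument in the run configuration controls, so it says nothing about where the memory goes.

```java
// Hypothetical helper, not part of Nutch: reports the heap limit of the running JVM.
class HeapCheck {
    // Converts a byte count to whole mebibytes (integer division).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Maximum amount of memory the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running this with the same VM arguments as the crawl should show whether the heap is still at the JVM default or at whatever -Xmx value was configured.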

