I used Nutch 0.6 to crawl a million websites. After it
fetched around 2.5 million web pages, it always throws an
out-of-memory error. I caught the exception and tried to
print out the stack trace, but somehow all it prints is:
Exception in thread "main" java.lang.OutOfMemoryError:
Java heap space
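One detail that may explain why my handler never fired: `OutOfMemoryError` extends `Error`, not `Exception`, so a `catch (Exception e)` block will not catch it and the JVM prints the default message at top level. A minimal sketch (the `fetchPages` name is just a stand-in for the actual fetch loop, not real Nutch code):

```java
public class FetchDriver {

    // Hypothetical stand-in for the real fetch loop; here it
    // simply simulates the failure I'm seeing.
    static void fetchPages() {
        throw new OutOfMemoryError("Java heap space");
    }

    public static void main(String[] args) {
        try {
            fetchPages();
        } catch (OutOfMemoryError e) {
            // Must catch OutOfMemoryError (or Throwable) explicitly:
            // catch (Exception e) would let it escape to the JVM,
            // which then prints
            //   Exception in thread "main" java.lang.OutOfMemoryError: ...
            e.printStackTrace();
        }
    }
}
```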
The strange thing here is that when I used Nutch 0.5,
it could fetch more than 7 million web pages. I don't
know what is happening here. Can anybody shed some
light on this?
Thanks in advance!
Regards,
smith.