If that doesn't help, try forcing the garbage collector to run more often. Everything slows down (unless you are running on multiple CPUs), but the memory savings are worth it.
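
One way to read "forcing the garbage collector to work more often" is to request a collection every so many processed items. The sketch below is only an illustration under that assumption: PeriodicGc is a hypothetical helper, not part of Nutch, and System.gc() is a hint that the JVM is free to ignore.

```java
// Hypothetical helper (not part of Nutch): requests a garbage collection
// every `interval` processed items and reports current heap usage.
public class PeriodicGc {
    private final int interval;
    private long processed = 0;
    private long gcRequests = 0;

    public PeriodicGc(int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    /** Call once per processed item; returns true when a GC was requested. */
    public boolean tick() {
        processed++;
        if (processed % interval == 0) {
            System.gc(); // only a request; the JVM may ignore it
            gcRequests++;
            return true;
        }
        return false;
    }

    public long gcRequests() {
        return gcRequests;
    }

    /** Heap currently in use: allocated heap minus free space within it. */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        PeriodicGc gc = new PeriodicGc(1000);
        for (int i = 0; i < 5000; i++) {
            byte[] page = new byte[4096]; // stand-in for a fetched page
            gc.tick();
        }
        System.out.println("GC requests: " + gc.gcRequests());
        System.out.println("Used heap (bytes): " + usedHeapBytes());
    }
}
```

Frequent explicit collections cost CPU time, which matches the slowdown mentioned above, so the interval would need tuning against the crawl rate.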

Byron Miller wrote:

If you use the default nutch script, I would set
NUTCH_HEAPSIZE to 2000. That generally works for me,
and I have over 100 million URLs in the db and generally
10 million URLs per segment/index.

-byron
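
To confirm that a heap setting actually reached the JVM, a small check like the one below can help. It is a minimal sketch, assuming that NUTCH_HEAPSIZE is turned into a -Xmx value in megabytes by the nutch script; HeapCheck is a hypothetical class name, not part of Nutch.

```java
// Hypothetical check (not part of Nutch): prints the maximum heap size
// the running JVM will attempt to use.
public class HeapCheck {
    /** Maximum heap in megabytes, as reported by the JVM. */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Running it as `java -Xmx2000m HeapCheck` should print a figure close to 2000, although the JVM may report slightly less than the requested value.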

--- smith learner <[EMAIL PROTECTED]> wrote:


Thanks for your reply. But I guess this solution
doesn't work for me. Actually, I didn't use this
parameter (I removed it from the nutch script).

BTW: My RAM is 4G. I use Red Hat; the kernel is
2.4.20-31.9bigmem.

Have you ever gotten an out of memory exception when
you used nutch to crawl millions of websites?

Regards,

Jack.


--- cao yuzhong <[EMAIL PROTECTED]> wrote:


Changing the JVM parameter -Xmx may help you.



From: smith learner <[EMAIL PROTECTED]>
Reply-To: [email protected]
To: [email protected]
Subject: out of memory exception.
Date: Fri, 22 Apr 2005 12:44:04 -0700 (PDT)

I used nutch 0.6 to crawl a million websites. When it
fetched around 2.5 million web pages, it always throws
an out of memory exception. I caught the exception and
tried to print out the stack trace, but somehow it just
prints out

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

The strange thing here is that when I use the 0.5 nutch
version, it can fetch more than 7 million web pages.
I don't know what happens there. Can anybody shed
light on this?

Thanks in advance!

Regards,

smith.









_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general


