Thank you for the info. The OOM exception in your previous email (java.lang.OutOfMemoryError: Java heap space) indicates that the JVM is running out of heap memory. Either too many objects are being instantiated, or there is a memory leak in the source code.
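To tell which of the two cases applies, it can help to watch how close the JVM gets to its heap limit. Below is a minimal sketch using only the standard java.lang.Runtime API. The class name HeapReport and its methods are hypothetical helpers made up for illustration; they are not part of Nutch or NekoHTML.

```java
// Hypothetical helper (not part of Nutch or NekoHTML): reports JVM heap usage
// so you can see how close a process is to its heap limit.
final class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Bytes of heap currently in use (allocated heap minus free space in it). */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Upper bound the heap may grow to; Long.MAX_VALUE if there is no limit. */
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** One-line summary suitable for a log message. */
    static String summary() {
        return String.format("heap used=%d MB, max=%d MB",
                usedBytes() / MB, maxBytes() / MB);
    }

    public static void main(String[] args) {
        System.out.println("before: " + summary());

        // Hold on to 16 blocks of 1 MB each so the used figure visibly grows.
        byte[][] chunks = new byte[16][];
        for (int i = 0; i < chunks.length; i++) {
            chunks[i] = new byte[(int) MB];
        }

        System.out.println("after allocating " + chunks.length + " MB: " + summary());
    }
}
```

If the used figure keeps climbing across many documents and never drops after garbage collection, a leak is the more likely cause. If it jumps on a single large page, the heap limit is probably just too small for that input.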
Hope this will help you! Cheers!!

Adam Shuy, President
ePacific Web Design & Hosting
Professional Web/Software developer
TEL: 408-272-6946
www.epacificweb.com

-----Original Message-----
From: Kai_testing Middleton [mailto:[EMAIL PROTECTED]
Sent: Monday, July 16, 2007 8:43 AM
To: [EMAIL PROTECTED]
Subject: Re: OOM error during parsing with nekohtml

You could try looking at these two discussions:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06571.html
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06571.html

--Kai

----- Original Message ----
From: Tsengtan A Shuy <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, July 16, 2007 3:45:59 AM
Subject: RE: OOM error during parsing with nekohtml

I successfully ran the whole-web crawl on my new Ubuntu OS, and I am ready to fix the bug. I need someone to guide me to the most up-to-date source code and the bug assignment.

Thank you in advance!!

Adam Shuy, President
ePacific Web Design & Hosting
Professional Web/Software developer
TEL: 408-272-6946
www.epacificweb.com

-----Original Message-----
From: Shailendra Mudgal [mailto:[EMAIL PROTECTED]
Sent: Monday, July 16, 2007 3:05 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: OOM error during parsing with nekohtml

Hi All,

We are getting an OOM exception while processing http://www.fotofinity.com/cgi-bin/homepages.cgi . We have also applied the Nutch-497 patch to our source code, but the error actually occurs in the parse method. Does anybody have any idea regarding this?
Here is the complete stack trace:

java.lang.OutOfMemoryError: Java heap space
    at java.lang.String.toUpperCase(String.java:2637)
    at java.lang.String.toUpperCase(String.java:2660)
    at org.cyberneko.html.filters.NamespaceBinder.bindNamespaces(NamespaceBinder.java:443)
    at org.cyberneko.html.filters.NamespaceBinder.startElement(NamespaceBinder.java:252)
    at org.cyberneko.html.HTMLTagBalancer.callStartElement(HTMLTagBalancer.java:1009)
    at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:639)
    at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:646)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2343)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1820)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:789)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
    at org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
    at org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:265)
    at org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:229)
    at org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:168)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:84)
    at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:75)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)

Regards,
Shailendra

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
