OOM error during parsing with nekohtml

2007-07-16 Thread Shailendra Mudgal
Hi All, We are getting an OOM exception while processing http://www.fotofinity.com/cgi-bin/homepages.cgi. We have also applied the NUTCH-497 patch to our source code, but the error actually occurs in the parse method. Does anybody have any idea about this? Here is the complete

RE: OOM error during parsing with nekohtml

2007-07-16 Thread Tsengtan A Shuy
I successfully ran the whole-web crawl on my new Ubuntu OS, and I am ready to fix the bug. I need someone to guide me to the most up-to-date source code and a bug assignment. Thank you in advance! Adam Shuy, President ePacific Web Design Hosting Professional Web/Software developer

[jira] Created: (NUTCH-515) Next fetch time is set incorrectly

2007-07-16 Thread JIRA
Next fetch time is set incorrectly -- Key: NUTCH-515 URL: https://issues.apache.org/jira/browse/NUTCH-515 Project: Nutch Issue Type: Bug Components: fetcher Affects Versions: 1.0.0

[jira] Commented: (NUTCH-439) Top Level Domains Indexing / Scoring

2007-07-16 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512930 ] Doğacan Güney commented on NUTCH-439: - A big +1 from me. Though, it may be useful to break this patch into

Re: OOM error during parsing with nekohtml

2007-07-16 Thread Kai_testing Middleton
You could try looking at these two discussions: http://www.mail-archive.com/nutch-dev@lucene.apache.org/msg06571.html http://www.mail-archive.com/nutch-dev@lucene.apache.org/msg06571.html --Kai - Original Message From: Tsengtan A Shuy [EMAIL PROTECTED] To: nutch-dev@lucene.apache.org;

[jira] Commented: (NUTCH-515) Next fetch time is set incorrectly

2007-07-16 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513019 ] Andrzej Bialecki commented on NUTCH-515: - +1 - sorry for the mess-up ... Next fetch time is set

[jira] Commented: (NUTCH-515) Next fetch time is set incorrectly

2007-07-16 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513040 ] Doğacan Güney commented on NUTCH-515: - With more than a hundred config options, and with the way we use Hadoop's

[jira] Commented: (NUTCH-506) Nutch should delegate compression to Hadoop

2007-07-16 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513044 ] Doğacan Güney commented on NUTCH-506: - If there are no objections, I am going to commit this one. Just to get

Re: OOM error during parsing with nekohtml

2007-07-16 Thread Shailendra Mudgal
Hi all, Thanks for your suggestions. I am running parse on a single URL (http://www.fotofinity.com/cgi-bin/homepages.cgi). For other URLs, parse works perfectly. We are getting this error because of the page's HTML: it contains many anchor tags that are not closed properly. Hence
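The diagnosis above blames anchor tags that are never closed. A quick way to check whether a saved copy of a page has this problem is to compare the number of opening `<a` tags with the number of closing `</a>` tags. The sketch below assumes the raw HTML is already available as a string; the class and method names (`AnchorBalance`, `unclosedAnchors`) are hypothetical, it uses a rough regex count rather than a real HTML parser (comments, scripts, and attribute values containing `<a` are not handled), and it does not reflect how NekoHTML itself processes the markup or prevent the OOM.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical diagnostic: estimates how many <a> elements are left unclosed
 * by counting opening and closing anchor tags with regular expressions.
 * This is a rough text count, not an HTML parser.
 */
public class AnchorBalance {

    // "<a" followed by whitespace or ">" (case-insensitive), so "<abbr>" is not counted.
    private static final Pattern OPEN = Pattern.compile("<a(?=[\\s>])", Pattern.CASE_INSENSITIVE);

    // "</a" followed by optional whitespace and ">" (case-insensitive).
    private static final Pattern CLOSE = Pattern.compile("</a\\s*>", Pattern.CASE_INSENSITIVE);

    // Counts non-overlapping matches of the pattern in the input.
    private static int count(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Returns opening anchor tags minus closing anchor tags, never negative. */
    public static int unclosedAnchors(String html) {
        return Math.max(0, count(OPEN, html) - count(CLOSE, html));
    }

    public static void main(String[] args) {
        // Three anchors are opened but only one is closed.
        String sample = "<p><a href=\"1.html\">one<a href=\"2.html\">two"
                + "<a href=\"3.html\">three</a></p>";
        System.out.println("unclosed <a> tags: " + unclosedAnchors(sample));
    }
}
```

Running `main` prints `unclosed <a> tags: 2`. A large count on the problem page, and a small one on pages that parse fine, would be consistent with the explanation given in the thread.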