[jira] Updated: (NUTCH-311) Page with tens of thousands of links OOME'd.

2006-06-22 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-311?page=all ] [EMAIL PROTECTED] updated NUTCH-311: Attachment: too-many-links.patch Adds configurable upper bound to link field in CrawlDatum. > Page with tens of thousands of links OOME'd. > ---
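The patch description above (a configurable upper bound on the link field in CrawlDatum) can be illustrated with a small sketch. This does not reproduce the patch: it works on a plain list of link strings rather than on CrawlDatum, and the property name `hypothetical.max.links.per.page` is invented for illustration. It shows only the idea of reading a limit from configuration and truncating an oversized link list before it is stored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Sketch of a configurable upper bound on links kept per page.
 * The property name below is hypothetical, not the one used by NUTCH-311.
 */
public class LinkCap {
    static final String MAX_LINKS_KEY = "hypothetical.max.links.per.page";

    /** Returns at most maxLinks entries; a negative value means "no limit". */
    static List<String> capLinks(List<String> links, int maxLinks) {
        if (maxLinks < 0 || links.size() <= maxLinks) {
            return links;
        }
        // Copy the prefix so the (possibly huge) original list can be garbage-collected.
        return new ArrayList<>(links.subList(0, maxLinks));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MAX_LINKS_KEY, "100");
        int maxLinks = Integer.parseInt(conf.getProperty(MAX_LINKS_KEY, "-1"));

        // Simulate a page with tens of thousands of outlinks.
        List<String> links = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            links.add("http://example.com/page" + i);
        }
        List<String> kept = capLinks(links, maxLinks);
        System.out.println(kept.size()); // 100
    }
}
```

Keeping the first N links is the simplest policy. A real implementation might prefer a different selection, but any fixed cap bounds the memory used per page.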

[jira] Created: (NUTCH-311) Page with tens of thousands of links OOME'd.

2006-06-22 Thread [EMAIL PROTECTED] (JIRA)
Page with tens of thousands of links OOME'd. Key: NUTCH-311 URL: http://issues.apache.org/jira/browse/NUTCH-311 Project: Nutch Type: Bug Versions: 0.8-dev Reporter: [EMAIL PROTECTED] Priority: Minor Attac

[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Thread KuroSaka TeruHiko (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417391 ] KuroSaka TeruHiko commented on NUTCH-266: - I'm sorry for adding so many comments. This will be the last one for today. As an experiment, I replaced hadoop-0.2-dev.jar that cam

[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Thread KuroSaka TeruHiko (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417387 ] KuroSaka TeruHiko commented on NUTCH-266: - Both Eugine's case and my case are failing in the call chain started at line 101 of LocalJobRunner.java, which reads:

Re: svn commit: r416346 [1/3] - in /lucene/nutch/trunk/src: java/org/apache/nutch/analysis/ java/org/apache/nutch/clustering/ java/org/apache/nutch/crawl/ java/org/apache/nutch/fetcher/ java/org/apach

2006-06-22 Thread Jérôme Charron
> I don't think guards should be added everywhere. That's right, Doug. It was a rough first pass at logging. The next, finer-grained pass will be done with NUTCH-310. > Rather, guards should only be added in performance-critical code, and then only for "Debug"-level output. "Info" and "Warn" levels are n

[jira] Commented: (NUTCH-303) logging improvements

2006-06-22 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-303?page=comments#action_12417346 ] Doug Cutting commented on NUTCH-303: Jerome: thanks very much for all of your great work improving Nutch's logging! > logging improvements > > >

RE: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Thread Teruhiko Kurosaka
Thank you for your reply, Sami. > >I do not intend to run Hadoop at all, so this > hadoop-site.xml is empty. ... > You should at least set values for 'mapred.system.dir' and 'mapred.local.dir' > and point them to a dir that has enough space available (I think they > default to under /tmp at leas

Re: svn commit: r416346 [1/3] - in /lucene/nutch/trunk/src: java/org/apache/nutch/analysis/ java/org/apache/nutch/clustering/ java/org/apache/nutch/crawl/ java/org/apache/nutch/fetcher/ java/org/apach

2006-06-22 Thread Doug Cutting
[EMAIL PROTECTED] wrote:
> NUTCH-309 : Added logging code guards
> [ ... ]
> +    if (LOG.isWarnEnabled()) {
> +      LOG.warn("Line does not contain a field name: " + line);
> +    }
> [ ... ]

-1

I don't think guards should be added everywhere. They make the code bigger and provide l
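The trade-off in this thread can be shown in a short sketch. A logging call's message argument is built before the logger decides whether to discard it, so a guard only saves work when building the message is costly and the level is normally disabled. The sketch uses java.util.logging (`isLoggable`) as a stand-in for Commons Logging's `isDebugEnabled()` / `isWarnEnabled()` so that it runs without extra jars. `expensiveMessage` is a hypothetical placeholder for costly message construction.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Counts how often an expensive log message is built with and without a guard.
 * java.util.logging stands in for Commons Logging here.
 */
public class GuardDemo {
    static int buildCount = 0;

    // Hypothetical stand-in for costly message construction (e.g. dumping a large structure).
    static String expensiveMessage() {
        buildCount++;
        return "state dump";
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("GuardDemo");
        log.setLevel(Level.INFO); // FINE (debug-like) output is disabled

        // Unguarded: the argument is evaluated, then the record is discarded.
        log.fine(expensiveMessage());

        // Guarded: isLoggable() returns false, so the message is never built.
        if (log.isLoggable(Level.FINE)) {
            log.fine(expensiveMessage());
        }

        System.out.println(buildCount); // 1
    }
}
```

For a cheap constant message at Warn level, which is normally enabled anyway, the guard saves nothing and only adds lines. That matches the argument for reserving guards for Debug output in hot code.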

Re: do not index

2006-06-22 Thread Jérôme Charron
> As far as I can see, Nutch's HTML parser only supports the noindex meta tag, but there is an unofficial HTML tag. http://www.webmasterworld.com/forum10003/2703.htm Hello Stefan, Here is a previous discussion about this: http://www.mail-archive.com/nutch-user@lucene.apache.org/msg04576.html

do not index

2006-06-22 Thread Stefan Groschupf
Hi, as far as I can see, Nutch's HTML parser only supports the noindex meta tag, but there is an unofficial HTML tag. http://www.webmasterworld.com/forum10003/2703.htm Maybe this would be another way to make Nutch more polite. Also, please remember my patch to support crawl-delay prope
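For reference, honoring the robots meta tag amounts to finding `<meta name="robots" content="...">` and checking its comma-separated directives for `noindex` (or `none`, which means noindex plus nofollow). The sketch below is a minimal illustration and is not Nutch's parser. It uses a regular expression for brevity, which is an assumption: it only handles `name` appearing before `content`, and a real HTML parser is more robust. It does not cover the unofficial tag from the linked thread.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Regex-based sketch of detecting <meta name="robots" content="...noindex...">.
 * Assumes the name attribute precedes content; not how Nutch parses HTML.
 */
public class RobotsMeta {
    private static final Pattern ROBOTS_META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*[\"']robots[\"']\\s+content\\s*=\\s*[\"']([^\"']*)[\"']",
        Pattern.CASE_INSENSITIVE);

    static boolean isNoIndex(String html) {
        Matcher m = ROBOTS_META.matcher(html);
        while (m.find()) {
            // content holds comma-separated directives, e.g. "noindex, follow"
            for (String directive : m.group(1).split(",")) {
                String d = directive.trim().toLowerCase(Locale.ROOT);
                if (d.equals("noindex") || d.equals("none")) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNoIndex("<head><meta name=\"robots\" content=\"noindex, follow\"></head>")); // true
        System.out.println(isNoIndex("<head><meta name=\"robots\" content=\"index,follow\"></head>"));   // false
    }
}
```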

[jira] Resolved: (NUTCH-309) Uses commons logging Code Guards

2006-06-22 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-309?page=all ] Jerome Charron resolved NUTCH-309: -- Resolution: Fixed Logging code guards added. http://svn.apache.org/viewvc?view=rev&revision=416346 > Uses commons logging Code Guards > --

Problem opening checksum file

2006-06-22 Thread anton
I create a file on DFS (for example, a file named "done"). Then I try to copy this file from DFS to the local filesystem. As a result, I get the file in the local filesystem, along with this error: Problem opening checksum file: /user/root/crawl/done. Ignoring with exception org.apache.hadoop.ipc.RemoteException: java.io.IOExc

[jira] Created: (NUTCH-310) Review Log Levels

2006-06-22 Thread Jerome Charron (JIRA)
Review Log Levels - Key: NUTCH-310 URL: http://issues.apache.org/jira/browse/NUTCH-310 Project: Nutch Type: Improvement Versions: 0.8-dev Reporter: Jerome Charron Assigned to: Jerome Charron Priority: Minor Fix For: 0.8-dev R

[jira] Created: (NUTCH-309) Uses commons logging Code Guards

2006-06-22 Thread Jerome Charron (JIRA)
Uses commons logging Code Guards Key: NUTCH-309 URL: http://issues.apache.org/jira/browse/NUTCH-309 Project: Nutch Type: Improvement Versions: 0.8-dev Reporter: Jerome Charron Assigned to: Jerome Charron Priority: M