[ http://issues.apache.org/jira/browse/NUTCH-311?page=all ]
[EMAIL PROTECTED] updated NUTCH-311:
Attachment: too-many-links.patch
Adds configurable upper bound to link field in CrawlDatum.
> Page with tens of thousands of links OOME'd.
> ---
Page with tens of thousands of links OOME'd.
Key: NUTCH-311
URL: http://issues.apache.org/jira/browse/NUTCH-311
Project: Nutch
Type: Bug
Versions: 0.8-dev
Reporter: [EMAIL PROTECTED]
Priority: Minor
Attac
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417391 ]
KuroSaka TeruHiko commented on NUTCH-266:
-
I'm sorry for adding so many comments. This will be the last for today.
As an experiment, I replaced hadoop-0.2-dev.jar that cam
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417387 ]
KuroSaka TeruHiko commented on NUTCH-266:
-
Both Eugine's case and my case are failing in the call chain started at line
101 of LocalJobRunner.java,
which reads:
> I don't think guards should be added everywhere.
That's right, Doug.
It was a rough first pass on logging.
The next pass (a finer one) will be done with NUTCH-310.
> Rather, guards should only be added
> in performance critical code, and then only for "Debug"-level output.
> "Info" and "Warn" levels are n
[ http://issues.apache.org/jira/browse/NUTCH-303?page=comments#action_12417346 ]
Doug Cutting commented on NUTCH-303:
Jerome: thanks very much for all of your great work improving Nutch's logging!
> logging improvements
>
>
>
Thank you for your reply, Sami.
> > I do not intend to run Hadoop at all, so this
> > hadoop-site.xml is empty.
...
> You should at least set values for 'mapred.system.dir' and
> 'mapred.local.dir'
> and point them to a dir that has enough space available (I think they
> default to under /tmp at leas
[EMAIL PROTECTED] wrote:
NUTCH-309 : Added logging code guards
[ ... ]
+ if (LOG.isWarnEnabled()) {
+   LOG.warn("Line does not contain a field name: " + line);
+ }
[ ... ]
-1
I don't think guards should be added everywhere. They make the code
bigger and provide l
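
To make the trade-off in this thread concrete, here is a minimal sketch of a logging code guard. It is an illustration only, not code from the NUTCH-309 patch: it uses java.util.logging (Level.FINE and Logger.isLoggable) as a self-contained stand-in for the commons-logging isDebugEnabled() check, and the class GuardSketch, the helper describe(), and the counter expensiveCalls are hypothetical names invented for the example.

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardSketch {
    private static final Logger LOG = Logger.getLogger(GuardSketch.class.getName());

    // Counts how often the costly message argument is actually built.
    static int expensiveCalls = 0;

    // Hypothetical helper: stands in for any costly string construction.
    static String describe(int[] data) {
        expensiveCalls++;
        return Arrays.toString(data);
    }

    // Switches the debug-like FINE level on or off for this logger.
    static void setDebug(boolean on) {
        LOG.setLevel(on ? Level.FINE : Level.INFO);
    }

    static void process(int[] data) {
        // Guarded: describe() runs only when FINE output is enabled, so a
        // hot loop does not pay for string building that would be discarded.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Processing " + describe(data));
        }
        // Unguarded: warnings are rare and cheap to format, so a guard
        // here would mostly add code.
        if (data.length == 0) {
            LOG.warning("Empty input");
        }
    }

    public static void main(String[] args) {
        setDebug(false);
        process(new int[] {1, 2, 3});
        System.out.println("expensive calls with debug off: " + expensiveCalls);
    }
}
```

The guard only pays off when building the message is costly and the level is usually disabled, which is the case the thread singles out for "Debug"-level output in performance critical code.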
As far as I can see, Nutch's HTML parser only supports the meta tag
noindex ( ), but there
is an unofficial HTML tag.
http://www.webmasterworld.com/forum10003/2703.htm
Hello Stefan,
Here is a previous discussion about this :
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg04576.html
Hi,
As far as I can see, Nutch's HTML parser only supports the meta tag
noindex ( ), but there
is an unofficial HTML tag.
http://www.webmasterworld.com/forum10003/2703.htm
Maybe this would be another way to make Nutch more polite.
Also please remember my patch to support crawl-delay prope
[ http://issues.apache.org/jira/browse/NUTCH-309?page=all ]
Jerome Charron resolved NUTCH-309:
--
Resolution: Fixed
Logging code guards added.
http://svn.apache.org/viewvc?view=rev&revision=416346
> Uses commons logging Code Guards
> --
I create a file on DFS (for example, with the filename "done"). Then I try to copy this
file from DFS to the local filesystem. As a result, I get the file in the local
filesystem and an error:
Problem opening checksum file: /user/root/crawl/done. Ignoring with
exception org.apache.hadoop.ipc.RemoteException: java.io.IOExc
Review Log Levels
-
Key: NUTCH-310
URL: http://issues.apache.org/jira/browse/NUTCH-310
Project: Nutch
Type: Improvement
Versions: 0.8-dev
Reporter: Jerome Charron
Assigned to: Jerome Charron
Priority: Minor
Fix For: 0.8-dev
R
Uses commons logging Code Guards
Key: NUTCH-309
URL: http://issues.apache.org/jira/browse/NUTCH-309
Project: Nutch
Type: Improvement
Versions: 0.8-dev
Reporter: Jerome Charron
Assigned to: Jerome Charron
Priority: M