[ https://issues.apache.org/jira/browse/NUTCH-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513040 ]
Doğacan Güney commented on NUTCH-515:
-------------------------------------
With more than a hundred config options, and with the way we use Hadoop's
configuration system (not that there is anything wrong with it, but we have to
specify a default value in most cases, and we generally repeat the value from
nutch-default.xml as that default), there are bound to be mistakes
somewhere, no matter how careful one is. I think this is my third "wrong
configuration option" fix, and I wonder how many more I am missing.
Perhaps we can add a ConfParams class that stores parameter names. That is, if
you need, say, the db.outlinks.max.per.page option, you get its key as
ConfParams.DB_OUTLINKS_MAX_PER_PAGE (so
conf.getInt(ConfParams.DB_OUTLINKS_MAX_PER_PAGE, 100)). Or we can add a
hierarchy to it: ConfParams.IndexParams.MAX_TOKENS. A class with dozens of static
final strings in it is not the most elegant thing, but IMHO it is better than
what we are currently doing.
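A minimal sketch of what such a class might look like, under a few assumptions. The nested IndexParams class shows the optional hierarchy. The key string behind MAX_TOKENS ("indexer.max.tokens") and its default of 10000 are illustrative guesses, since the comment does not name them. java.util.Properties stands in for Hadoop's Configuration so the sketch compiles on its own, and getInt is a hypothetical helper that mimics Configuration.getInt(name, defaultValue).

```java
import java.util.Properties;

/**
 * Central registry of configuration key names. Only the key strings
 * live here; callers still pass their own default values.
 */
final class ConfParams {
  private ConfParams() {} // constants only, never instantiated

  // Key quoted in the comment above.
  static final String DB_OUTLINKS_MAX_PER_PAGE = "db.outlinks.max.per.page";

  // Replacement for the deprecated db.default.fetch.interval (in seconds).
  static final String DB_FETCH_INTERVAL_DEFAULT = "db.fetch.interval.default";

  /** Optional hierarchy: keys grouped by subsystem. */
  static final class IndexParams {
    private IndexParams() {}
    // ASSUMPTION: illustrative key string, not taken from the comment.
    static final String MAX_TOKENS = "indexer.max.tokens";
  }
}

final class ConfParamsDemo {
  /** Hypothetical stand-in for Hadoop's Configuration.getInt(name, defaultValue). */
  static int getInt(Properties conf, String name, int defaultValue) {
    String value = conf.getProperty(name);
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("db.outlinks.max.per.page", "50");

    // Misspelling a ConfParams constant name is a compile error, and each key
    // string is written only once. A misspelled string literal, by contrast,
    // compiles fine and silently falls back to the default.
    int maxOutlinks = getInt(conf, ConfParams.DB_OUTLINKS_MAX_PER_PAGE, 100);
    int maxTokens = getInt(conf, ConfParams.IndexParams.MAX_TOKENS, 10000); // hypothetical default
    System.out.println(maxOutlinks + " " + maxTokens); // prints "50 10000"
  }
}
```

The constants are compile-time String constants, so using them costs nothing at runtime. What they buy is that a key has exactly one spelling in the code base.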
> Next fetch time is set incorrectly
> ----------------------------------
>
> Key: NUTCH-515
> URL: https://issues.apache.org/jira/browse/NUTCH-515
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.0.0
> Reporter: Doğacan Güney
> Priority: Blocker
> Fix For: 1.0.0
>
> Attachments: NUTCH-515.patch
>
>
> After NUTCH-61, the db.default.fetch.interval option is deprecated and
> superseded by db.fetch.interval.default. However, various parts of Nutch
> still use the old option. Since the old option is in days (default 30)
> and the new option is in seconds (default 2592000, i.e. 30 days), when Nutch
> fetches a URL, its next fetch time is set ***30 SECONDS*** later. This means
> that Nutch keeps refetching the same URLs over and over again.
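The unit mismatch can be illustrated with a small sketch. This is not Nutch's actual code. The getInt helper is a hypothetical stand-in for Hadoop's Configuration.getInt over java.util.Properties. The "fixed" variant, which prefers the new key and converts the deprecated day-based value to seconds, is only one possible remedy and not necessarily what NUTCH-515.patch does.

```java
import java.util.Properties;

final class FetchIntervalDemo {
  static final String OLD_KEY = "db.default.fetch.interval"; // deprecated, in days
  static final String NEW_KEY = "db.fetch.interval.default"; // in seconds
  static final int SECONDS_PER_DAY = 24 * 60 * 60;

  /** Hypothetical stand-in for Hadoop's Configuration.getInt(name, defaultValue). */
  static int getInt(Properties conf, String name, int defaultValue) {
    String value = conf.getProperty(name);
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }

  /** Buggy pattern: reads the day-based key but treats the value as seconds. */
  static int buggyIntervalSeconds(Properties conf) {
    return getInt(conf, OLD_KEY, 30);
  }

  /**
   * One possible fix (an assumption, not the committed patch): prefer the
   * new key in seconds; if only the deprecated key is set, convert days to seconds.
   */
  static int fixedIntervalSeconds(Properties conf) {
    String days = conf.getProperty(OLD_KEY);
    int fallback = (days == null)
        ? 30 * SECONDS_PER_DAY
        : Integer.parseInt(days.trim()) * SECONDS_PER_DAY;
    return getInt(conf, NEW_KEY, fallback);
  }

  public static void main(String[] args) {
    Properties conf = new Properties(); // nothing overridden: all defaults apply
    System.out.println(buggyIntervalSeconds(conf)); // 30 -> refetch after 30 s
    System.out.println(fixedIntervalSeconds(conf)); // 2592000 -> 30 days
  }
}
```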
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers