[ 
https://issues.apache.org/jira/browse/NUTCH-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513040
 ] 

Doğacan Güney commented on NUTCH-515:
-------------------------------------

With more than a hundred config options, there are bound to be mistakes 
somewhere no matter how careful one is. There is nothing wrong with hadoop's 
configuration system itself, but the way we use it means we have to specify a 
default value in most cases, and we generally repeat whatever is in 
nutch-default.xml as that default. I think this is my third "wrong 
configuration option" fix and I wonder how many I am missing.

Perhaps we can add a ConfParams class that stores parameter names. I mean, if 
you need, say, the db.outlinks.max.per.page option, you get its key as 
ConfParams.DB_OUTLINKS_MAX_PER_PAGE (so the call becomes 
conf.getInt(ConfParams.DB_OUTLINKS_MAX_PER_PAGE, 100)). Or we can add a 
hierarchy to it: ConfParams.IndexParams.MAX_TOKENS. A class with tens of static 
final strings in it is not the most elegant thing but, IMHO, it is better than 
what we are currently doing.
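
To make the idea concrete, here is a rough sketch of what such a class might 
look like. Nothing here is existing Nutch code: ConfParams, IndexParams, the 
*_DEFAULT constants and the getInt helper are hypothetical, the 
"indexer.max.tokens" key name is my assumption for MAX_TOKENS, and 
java.util.Properties only stands in for hadoop's Configuration so that the 
example runs on its own. Keeping the default next to the key is one possible 
extension, so that call sites cannot disagree with each other about it.

```java
import java.util.Properties;

public class ConfParamsSketch {

    /** Hypothetical holder for configuration keys (not part of Nutch). */
    public static final class ConfParams {
        private ConfParams() {}

        public static final String DB_OUTLINKS_MAX_PER_PAGE = "db.outlinks.max.per.page";
        // Assumed extension: keep the default beside the key.
        public static final int DB_OUTLINKS_MAX_PER_PAGE_DEFAULT = 100;

        public static final String DB_FETCH_INTERVAL_DEFAULT = "db.fetch.interval.default";
        // 30 days expressed in seconds, so the unit is visible in the code.
        public static final int DB_FETCH_INTERVAL_DEFAULT_SECONDS = 30 * 24 * 60 * 60;

        /** Optional hierarchy, as in ConfParams.IndexParams.MAX_TOKENS. */
        public static final class IndexParams {
            private IndexParams() {}

            // Key name is an assumption made for this sketch.
            public static final String MAX_TOKENS = "indexer.max.tokens";
        }
    }

    /** Stand-in for conf.getInt(key, default); Properties replaces Configuration here. */
    public static int getInt(Properties conf, String key, int defaultValue) {
        String raw = conf.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();

        // Key is unset, so the shared default is returned.
        System.out.println(getInt(conf, ConfParams.DB_OUTLINKS_MAX_PER_PAGE,
                ConfParams.DB_OUTLINKS_MAX_PER_PAGE_DEFAULT));

        // Key is set, so the configured value is returned.
        conf.setProperty(ConfParams.DB_OUTLINKS_MAX_PER_PAGE, "50");
        System.out.println(getInt(conf, ConfParams.DB_OUTLINKS_MAX_PER_PAGE,
                ConfParams.DB_OUTLINKS_MAX_PER_PAGE_DEFAULT));
    }
}
```

With something like this, a mistyped or outdated key becomes a compile error 
instead of a silently applied default.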



> Next fetch time is set incorrectly
> ----------------------------------
>
>                 Key: NUTCH-515
>                 URL: https://issues.apache.org/jira/browse/NUTCH-515
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Doğacan Güney
>            Priority: Blocker
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-515.patch
>
>
> After NUTCH-61, the db.default.fetch.interval option is deprecated and 
> superseded by db.fetch.interval.default. However, various parts of nutch 
> still use the old option. Since the old option is in days (default 30) and 
> the new option is in seconds (default 2592000, i.e. 30 days), when nutch 
> fetches a url, its next fetch time is set to ***30 SECONDS*** later. This 
> means that nutch keeps refetching the same urls over and over and over.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

