[ https://issues.apache.org/jira/browse/NUTCH-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797176#action_12797176 ]
Julien Nioche commented on NUTCH-655:
-------------------------------------

Good idea. I've made the modification and documented it in the javadoc:

The URL files contain one URL per line, optionally followed by custom metadata separated by tabs, with each metadata key separated from its value by '='.
Note that some metadata keys are reserved:
- <i>nutch.score</i>: sets a custom score for a specific URL <br>
- <i>nutch.fetchInterval</i>: sets a custom fetch interval for a specific URL <br>
e.g. http://www.nutch.org/ \t nutch.score=10 \t nutch.fetchInterval=2592000 \t userType=open_source

> Injecting Crawl metadata
> ------------------------
>
>                 Key: NUTCH-655
>                 URL: https://issues.apache.org/jira/browse/NUTCH-655
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: Injector.patch, NUTCH-655.v2
>
>
> The attached patch allows metadata to be injected into the crawlDB. The input file
> has to contain fields separated by tabs, with the URL in the first
> column. The metadata names and values are separated by '='. An input line
> might look like this:
> http://www.myurl.com \t categ=value1 \t categ2=value2
> This functionality can be useful to store external knowledge and index it
> with a custom plugin.
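
For readers who want to check their seed files against the format described above, here is a minimal sketch of how one line could be split into a URL, the two reserved keys, and the remaining custom metadata. It is written in Python purely for illustration and is not the Injector code from the patch; the names SeedEntry and parse_seed_line are hypothetical, and skipping fields that have no '=' is an assumption rather than documented Nutch behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Reserved metadata keys named in the javadoc quoted above.
SCORE_KEY = "nutch.score"
FETCH_INTERVAL_KEY = "nutch.fetchInterval"


@dataclass
class SeedEntry:
    """Hypothetical container for one parsed seed line (not a Nutch class)."""
    url: str
    score: Optional[float] = None
    fetch_interval: Optional[int] = None
    metadata: Dict[str, str] = field(default_factory=dict)


def parse_seed_line(line: str) -> SeedEntry:
    """Split a tab-separated seed line into URL, reserved keys and metadata."""
    fields = line.rstrip("\n").split("\t")
    entry = SeedEntry(url=fields[0].strip())
    for item in fields[1:]:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            # Assumption: silently skip fields that are not key=value pairs.
            continue
        if key == SCORE_KEY:
            entry.score = float(value)
        elif key == FETCH_INTERVAL_KEY:
            entry.fetch_interval = int(value)
        else:
            entry.metadata[key] = value
    return entry


# Usage: the example line from the comment, with real tab characters.
example = "http://www.nutch.org/\tnutch.score=10\tnutch.fetchInterval=2592000\tuserType=open_source"
print(parse_seed_line(example))
```

Keeping the reserved keys out of the metadata dictionary mirrors the distinction the javadoc draws between reserved and custom keys; whether the real Injector also stores them as metadata is not stated in this message.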