[jira] [Assigned] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2021-02-01 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel reassigned NUTCH-1403:
--

Assignee: (was: Sebastian Nagel)

> Add default ScoringFilter for manipulating metadata 
> 
>
> Key: NUTCH-1403
> URL: https://issues.apache.org/jira/browse/NUTCH-1403
> Project: Nutch
> Issue Type: Improvement
> Reporter: Julien Nioche
> Priority: Major
> Fix For: 1.19
>
>
> This is currently done by the urlmeta plugin, which has too vague a name and
> a redundant indexing filter now that we have the index-metadata plugin. This
> scoring filter would help define which metadata to pass from:
> - the crawl metadata to the content metadata
> - the content metadata to the parse metadata
> - the parse metadata to the crawldatum for the outlinks
> I'd make this scoring filter available by default, i.e. not in a separate
> plugin, as its functionality is commonly used.
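
As an illustration of the three hand-offs listed in the description, here is a minimal sketch in plain Java. It does not use the Nutch API: the class name `MetadataPassingSketch`, the helper `copyKeys`, the key names, and the use of `Map<String, String>` for each metadata container are all assumptions made for this example, and the filter proposed in the issue might be configured and structured quite differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed filter. Plain maps model the crawl,
// content, and parse metadata, and the metadata of an outlink's CrawlDatum.
public class MetadataPassingSketch {

  /** Copies only the listed keys from source to target; keys missing in source are skipped. */
  static void copyKeys(Map<String, String> source, Map<String, String> target,
      List<String> keys) {
    for (String key : keys) {
      String value = source.get(key);
      if (value != null) {
        target.put(key, value);
      }
    }
  }

  public static void main(String[] args) {
    // Hypothetical crawl metadata attached to a URL.
    Map<String, String> crawlMeta = new HashMap<>();
    crawlMeta.put("category", "news");
    crawlMeta.put("fetchNote", "not propagated");

    // Assumed configuration: one key list per hop named in the issue.
    List<String> crawlToContent = Arrays.asList("category");
    List<String> contentToParse = Arrays.asList("category");
    List<String> parseToOutlinks = Arrays.asList("category");

    Map<String, String> contentMeta = new HashMap<>();
    Map<String, String> parseMeta = new HashMap<>();
    Map<String, String> outlinkMeta = new HashMap<>();

    copyKeys(crawlMeta, contentMeta, crawlToContent);   // crawl -> content
    copyKeys(contentMeta, parseMeta, contentToParse);   // content -> parse
    copyKeys(parseMeta, outlinkMeta, parseToOutlinks);  // parse -> outlink crawldatum

    System.out.println(outlinkMeta);
  }
}
```

Keeping a separate key list per hop mirrors the three bullet points above: a key reaches the outlinks only if it is listed at every stage it has to cross.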



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (NUTCH-1403) Add default ScoringFilter for manipulating metadata

2019-09-27 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel reassigned NUTCH-1403:
--

Assignee: Sebastian Nagel

> Add default ScoringFilter for manipulating metadata 
> 
>
> Key: NUTCH-1403
> URL: https://issues.apache.org/jira/browse/NUTCH-1403
> Project: Nutch
> Issue Type: Improvement
> Reporter: Julien Nioche
> Assignee: Sebastian Nagel
> Priority: Major
> Fix For: 1.17
>
>
> This is currently done by the urlmeta plugin, which has too vague a name and
> a redundant indexing filter now that we have the index-metadata plugin. This
> scoring filter would help define which metadata to pass from:
> - the crawl metadata to the content metadata
> - the content metadata to the parse metadata
> - the parse metadata to the crawldatum for the outlinks
> I'd make this scoring filter available by default, i.e. not in a separate
> plugin, as its functionality is commonly used.


