[ 
https://issues.apache.org/jira/browse/NUTCH-734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-734.
----------------------------------------

    Resolution: Won't Fix

This is simply not required, and the request is dated. I also assume that "a"
refers to stop words. These are filtered during the IR process in (all?) modern
indexing servers.
                
> option to filter "a" tag text
> -----------------------------
>
>                 Key: NUTCH-734
>                 URL: https://issues.apache.org/jira/browse/NUTCH-734
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>            Reporter: ron
>
> Motivation:
> When fetching pages with "menu links", the menus (for example, search) appear
> on all pages of the site. Searching for the word "search" then returns all
> pages of the site instead of just the search page.
> Change request:
> Add options to filter texts of "a" tags, or more generally add filters to 
> avoid texts within specific tags.
> I have worked around this by changing DOMContentUtils.getTextHelper:
>      if (nodeType == Node.TEXT_NODE
>          && !(currentNode.getParentNode() != null
>               && "a".equalsIgnoreCase(currentNode.getParentNode().getNodeName())))
> - Ron
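
The workaround quoted above only checks the immediate parent of each text node, so text nested deeper inside a link (for example `<a><b>search</b></a>`) would presumably still be extracted. Below is a minimal, standalone sketch of the more general request (skipping text inside a configurable set of tags), written against the standard `org.w3c.dom` API only. The class name `TagTextFilter` and the methods `parse` and `extractText` are hypothetical names chosen for illustration; this is not Nutch code and not the actual `DOMContentUtils` implementation.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of Nutch.
class TagTextFilter {

    /** Parses a well-formed XML/XHTML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /**
     * Collects the text below root, ignoring every element whose lower-cased
     * name is in skippedTags, together with all of its descendants.
     */
    static String extractText(Node root, Set<String> skippedTags) {
        StringBuilder sb = new StringBuilder();
        collect(root, skippedTags, sb);
        // Collapse the separator spaces added between text nodes.
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    private static void collect(Node node, Set<String> skippedTags, StringBuilder sb) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && skippedTags.contains(node.getNodeName().toLowerCase(Locale.ROOT))) {
            return; // skip the whole subtree, not just direct text children
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            // Simplification: a space is appended after every text node.
            sb.append(node.getNodeValue()).append(' ');
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, skippedTags, sb);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<html><body><a href=\"/search\"><b>search</b></a>"
                + "<p>Hello <b>world</b></p></body></html>");
        System.out.println(extractText(doc, Set.of("a")));
        System.out.println(extractText(doc, Set.of()));
    }
}
```

Passing the tag names as a set keeps the filter configurable, which is closer to the "more generally" part of the request than hard-coding "a". Whether such a filter belongs in the parser or in a separate plugin is left open here.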

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
