[ https://issues.apache.org/jira/browse/NUTCH-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated NUTCH-750:
------------------------------------

    Fix Version/s:     (was: 1.1)

Pushing this out per http://bit.ly/c7tBv9

> HtmlParser plugin - page title extraction
> -----------------------------------------
>
>                 Key: NUTCH-750
>                 URL: https://issues.apache.org/jira/browse/NUTCH-750
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.0.0
>            Reporter: Alexey Torochkov
>            Priority: Minor
>         Attachments: SkipBody.patch
>
>
> A small improvement: try to extract the <title> tag from the body when it
> is missing from the head.
> In the current version, the getTitle() method in DOMContentUtils skips
> everything after <body>.
> The attached patch makes this behavior configurable (the default behavior
> is unchanged), so the parser can cope with webmasters' mistakes.
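The behavior described in the issue can be illustrated with a small DOM walk. This is a hypothetical sketch using only the standard org.w3c.dom API; the class name TitleExtractor and the searchBody flag are assumptions for illustration, not the code in DOMContentUtils or in SkipBody.patch. With searchBody set to false, the walk stops at <body>, as the issue describes the current getTitle(); with true, it keeps walking so that a <title> misplaced inside the body is still found.

    import java.util.ArrayDeque;
    import java.util.Deque;

    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    /** Hypothetical sketch; not the Nutch DOMContentUtils implementation. */
    public class TitleExtractor {

        /**
         * Returns the trimmed text of the first <title> element in document
         * order, or null if none is found. When searchBody is false (the
         * assumed default), the walk stops as soon as it reaches <body>.
         */
        public static String getTitle(Node root, boolean searchBody) {
            Deque<Node> stack = new ArrayDeque<>();
            stack.push(root);
            while (!stack.isEmpty()) {
                Node node = stack.pop();
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    String name = node.getNodeName();
                    if ("body".equalsIgnoreCase(name) && !searchBody) {
                        return null; // default: skip everything from <body> on
                    }
                    if ("title".equalsIgnoreCase(name)) {
                        return node.getTextContent().trim();
                    }
                }
                // Push children in reverse so they are popped in document order.
                NodeList children = node.getChildNodes();
                for (int i = children.getLength() - 1; i >= 0; i--) {
                    stack.push(children.item(i));
                }
            }
            return null;
        }
    }

Keeping the lenient search behind a flag mirrors the patch's stated goal: existing crawls see no change unless the option is enabled.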