Michael Wechner wrote:
Sami Siren wrote:
Michael Wechner wrote:
Hi
It seems to me that Nutch 0.8.x cannot extract the title from an XHTML
page, e.g.
Try changing the following in your parse-plugins.xml
<mimeType name="application/xhtml+xml">
  <plugin id="parse-html" />
</mimeType>
Michael Wechner wrote:
I have added a patch
https://issues.apache.org/jira/secure/ManageAttachments.jspa?id=12359202
sorry, I actually meant
https://issues.apache.org/jira/browse/NUTCH-418
Cheers
Michi
Thanks
Michi
Cheers
Michi
--
Sami Siren
--
Michael Wechner
Michael Wechner wrote:
Hi
It seems to me that Nutch 0.8.x cannot extract the title from an XHTML
page, e.g.
Try changing the following in your parse-plugins.xml
<mimeType name="application/xhtml+xml">
  <plugin id="parse-html" />
</mimeType>
This was changed in trunk
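
To check whether a given parse-plugins.xml already maps application/xhtml+xml to parse-html, something like the following sketch may help. It is not part of Nutch: the SAMPLE fragment, its root element name, and the helper plugins_for are assumptions made for illustration, and a real parse-plugins.xml contains many more entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for a parse-plugins.xml file.
SAMPLE = """<parse-plugins>
  <mimeType name="application/xhtml+xml">
    <plugin id="parse-html" />
  </mimeType>
  <mimeType name="text/html">
    <plugin id="parse-html" />
  </mimeType>
</parse-plugins>"""


def plugins_for(xml_text, mime_type):
    """Return the plugin ids listed for mime_type, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the root element name does not matter.
    for entry in root.iter("mimeType"):
        if entry.get("name") == mime_type:
            return [plugin.get("id") for plugin in entry.findall("plugin")]
    return []  # no mapping for this mime type


if __name__ == "__main__":
    print(plugins_for(SAMPLE, "application/xhtml+xml"))
```

To run it against a real file, read the file contents and pass them in place of SAMPLE. An empty list means there is no entry for that mime type.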