Re: Extracting title from XHTML pages

2006-12-21 Michael Wechner
Michael Wechner wrote:
> Sami Siren wrote:
>> Michael Wechner wrote:
>>> Hi
>>>
>>> It seems to me that Nutch 0.8.x cannot extract the title from an XHTML page, e.g.
>>
>> Try changing the following in your parse-plugins.xml:
>>
>> <mimeType name="application/xhtml+xml">
>>   <plugin id="parse-html" />
>> </mimeType>

2006-12-21 Michael Wechner
Michael Wechner wrote:
> I have added a patch
> https://issues.apache.org/jira/secure/ManageAttachments.jspa?id=12359202

Sorry, I actually meant
https://issues.apache.org/jira/browse/NUTCH-418

Cheers

Michi

2006-12-20 Sami Siren
Michael Wechner wrote:
> Hi
>
> It seems to me that Nutch 0.8.x cannot extract the title from an XHTML page, e.g.

Try changing the following in your parse-plugins.xml:

<mimeType name="application/xhtml+xml">
  <plugin id="parse-html" />
</mimeType>

This was changed in trunk