Michael Wechner wrote:
> Sami Siren wrote:
>
>> Michael Wechner wrote:
>>
>>> Hi
>>>
>>> It seems to me that Nutch 0.8.x cannot extract the title from an XHTML
>>> page, e.g.
>>>
>>
>> Try changing the following in your parse-plugins.xml
>>
>> <mimeType name="application/xhtml+xml">
>>   <plugin id="parse-html" />
>> </mimeType>
>>
>> This was changed in trunk and it _should_ fix that problem.
>>
>
> thanks :-) this seems to work.
>
> Shall I send a patch for nutch-0.8.x? Or is nutch 0.8.x unmaintained?

I have added a patch:

https://issues.apache.org/jira/secure/ManageAttachments.jspa?id=12359202

Thanks

Michi

>
> Cheers
>
> Michi
>
>> --
>> Sami Siren
>>
>

-- 
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
[EMAIL PROTECTED] [EMAIL PROTECTED]
+41 44 272 91 61

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers