Hi

It seems to me that Nutch 0.8.x cannot extract the title from an XHTML page, e.g. http://www.yulup.org/

2006-12-20 14:22:22,375 INFO fetcher.Fetcher - fetching http://www.yulup.org/
2006-12-20 14:22:22,684 WARN parse.ParserFactory - ParserFactory:Plugin: org.apache.nutch.parse.text.TextParser mapped to contentType application/xhtml+xml via parse-plugins.xml, but its plugin.xml file does not claim to support contentType: application/xhtml+xml

Can anyone confirm this, or shall I add a bug entry?

Thanks
Michi

--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
[EMAIL PROTECTED] [EMAIL PROTECTED]
+41 44 272 91 61

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers