On 11/8/05, Mike Reynols <[EMAIL PROTECTED]> wrote:
>
> Is there a plugin of some sort that I need in order to take a web site
> (which serves up a collection of xml documents) and crawl its non-html
> files?

Hi Mike,

First of all, which Nutch version are you using?

Concerning XML, there is currently no parse-xml plugin in Nutch. We are discussing such a plugin with two other Nutch developers, but it is still in the early stages.

> Now when I stripped out all the xml and left just raw text, I received the
> following error:

OK, you renamed your documents, but what MIME type does your server return? It seems to be application/xml, and no parse plugin handles that content type.

Regards
Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/
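
One quick way to see which MIME type the server actually reports is to look at the Content-Type header of a response. The following is only a minimal sketch in Python, not part of Nutch; the function names `media_type` and `served_media_type` are made up for illustration, and it assumes the server answers HEAD requests.

```python
from urllib.request import Request, urlopen


def media_type(content_type_header):
    """Return the bare media type from a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    return content_type_header.split(";", 1)[0].strip().lower()


def served_media_type(url, timeout=10):
    """Send a HEAD request and return the media type the server reports."""
    # Assumption: the server supports HEAD; otherwise a GET would be needed.
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return media_type(response.headers.get("Content-Type", ""))
```

Calling `served_media_type` with the URL of one of the renamed documents should show whether the server still labels it as application/xml regardless of the file extension.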
