On 11/8/05, Mike Reynols <[EMAIL PROTECTED]> wrote:
>
> Is there a plugin of some sort that I need in order to take a web site
> (which serves up a collection of XML documents) and crawl its non-HTML
> files?

Hi Mike,

First of all, which Nutch version are you using?
As for XML, there is currently no parse-xml plugin in Nutch.
I am discussing with two other Nutch developers how to provide such a
plugin, but it is still in the early stages.

> Now when I stripped out all the xml and left just raw text, I received the
> following error:

OK, you renamed your documents, but what MIME type does your server return
for them?
It seems to be application/xml, and there is no parse plugin that handles
that content type.
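
If it helps, here is a rough way to check what the server actually sends.
This is only a sketch in Python, not anything that ships with Nutch: the
function names media_type and fetch_content_type are made up for this
example, and it assumes the server answers HEAD requests. It reads the
Content-Type header and strips any parameters such as charset.

```python
from urllib.request import Request, urlopen


def media_type(header_value):
    """Return the bare media type from a Content-Type header value.

    Parameters such as "; charset=UTF-8" are dropped and the result is
    lower-cased. Returns None when the header is missing or empty.
    """
    if not header_value:
        return None
    return header_value.split(";", 1)[0].strip().lower()


def fetch_content_type(url):
    """Send a HEAD request (assumed to be supported by the server) and
    return the media type the server reports for the given URL."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return media_type(response.headers.get("Content-Type"))


if __name__ == "__main__":
    import sys

    # Usage: pass one or more document URLs on the command line.
    for url in sys.argv[1:]:
        print(url, "->", fetch_content_type(url))
```

If this prints application/xml for the renamed files, then the server is
still deciding the type itself, and renaming the files did not change what
the crawler sees.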
Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
