> One part of fixing this problem is correct mime type identification for
> document types, which I know that Jerome is working on an update to, and
> will soon have a new mime type registry committed to Nutch.

The future Mime Type Registry will be compatible with the FreeDesktop
Shared MIME Info specification:
http://standards.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-0.13.html
As you can see, this specification provides an XML recognition mechanism:
*root-XML* elements identify the precise MIME type of an XML document
based on the namespaceURI and/or the localName of its root element.
This part of the specification is not implemented yet (but it is planned),
so in the near future (I hope!) the Mime Type Registry should be able to
handle your use case.
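
To make the root-XML idea concrete, here is a minimal Java sketch of how
such a lookup could work. It is not the Nutch implementation (which does
not exist yet): the class name RootXmlSketch, the detect() method and the
hard-coded rule table are assumptions made for illustration only. It reads
the document with StAX up to the root element, then matches the root's
namespace URI and local name against the rules.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not part of Nutch.
final class RootXmlSketch {

    // Illustrative rule table: "namespaceURI|localName" -> MIME type.
    // A real registry would load these from root-XML elements instead.
    private static final Map<String, String> RULES = new HashMap<>();
    static {
        RULES.put("http://www.w3.org/2000/svg|svg", "image/svg+xml");
        RULES.put("http://www.w3.org/1999/xhtml|html", "application/xhtml+xml");
    }

    /**
     * Returns the MIME type matching the root element, "application/xml"
     * if no rule matches, or null if no root element could be parsed.
     */
    static String detect(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not load external DTDs just to sniff the root element.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader =
                factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // Stop at the first start tag: that is the root element.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String ns = reader.getNamespaceURI();
                    String key = (ns == null ? "" : ns) + "|" + reader.getLocalName();
                    return RULES.getOrDefault(key, "application/xml");
                }
            }
        } catch (XMLStreamException e) {
            // Not well-formed up to the root element: fall through.
        }
        return null;
    }
}
```

Only the prolog and the first start tag are read, so the cost does not
depend on the size of the document.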

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
