I filed TIKA-298 to capture this issue.

Unfortunately, I think the patch will need to wait until I get back from vacation (so early November).

BTW, is there any info on the "ongoing redesign of the mime type registry"? The only Jira issue I see is TIKA-89 (minor renaming).

Thanks,

-- Ken

On Sep 30, 2009, at 2:31am, Jukka Zitting wrote:

Hi,

On Tue, Sep 29, 2009 at 12:08 AM, Ken Krugler
<kkrugler_li...@transpac.com> wrote:
Just for grins, I set things up so that types with names ending in +xml
automatically get application/xml as their parent mimetype.
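As a rough illustration of the +xml rule described above, here is a minimal sketch, assuming a plain-string representation of mime types. The class name XmlSuffixSupertype and the method implicitSupertype() are hypothetical and are not part of Tika's API; the sketch only shows the suffix convention itself.

```java
import java.util.Locale;

/** Hypothetical helper (not Tika API): derives an implicit parent type from the "+xml" suffix convention. */
public final class XmlSuffixSupertype {

    private XmlSuffixSupertype() {}

    /**
     * Returns "application/xml" for types such as "application/xspf+xml",
     * or null when the suffix rule does not apply.
     */
    public static String implicitSupertype(String mimeType) {
        // Drop any parameters (e.g. "; charset=UTF-8") and normalize case before checking the suffix.
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (base.endsWith("+xml")) {
            return "application/xml";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(implicitSupertype("application/xspf+xml")); // application/xml
        System.out.println(implicitSupertype("text/plain"));          // null
    }
}
```

A real registry would presumably consult explicit supertype settings first and only apply a convention like this when none is configured.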

But when I used TikaCLI to process a test.xspf file, no content was
generated.

The issue is that CompositeParser.getParser() doesn't use supertypes when
falling back: if it can't get a parser for the exact mimetype, it goes
straight to the fallback parser.

It seems like it should try to use the mimetype hierarchy. If so, I can file
an issue and a patch.

Correct, that would be great.

Note that the MimeType.getSuperType() method already does some of
this, and we have related supertype settings stored in the
tika-mimetypes.xml configuration. The type registry could also be told
about the +xml convention and related implicit supertype settings, like
the ones encoded in the MediaType.isSpecializationOf() method.
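The lookup proposed in this thread (try the exact type, then walk up its supertypes, and only then use the fallback parser) could look roughly like the sketch below. It is a minimal, self-contained illustration and not Tika's actual CompositeParser code: the class HierarchicalParserLookup, its methods, and the parser names ("XmlParser", "FallbackParser") are hypothetical placeholders, with parsers represented as plain strings and explicit parent settings held in a map, loosely mirroring the supertype settings in a configuration file.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Tika API): a parser lookup that walks up the
 * type hierarchy before giving up and returning the fallback parser.
 */
public final class HierarchicalParserLookup {

    private final Map<String, String> parsersByType = new HashMap<>();
    private final Map<String, String> parentByType = new HashMap<>();
    private final String fallbackParser;

    public HierarchicalParserLookup(String fallbackParser) {
        this.fallbackParser = fallbackParser;
    }

    public void registerParser(String type, String parser) {
        parsersByType.put(type, parser);
    }

    public void registerParent(String type, String parent) {
        parentByType.put(type, parent);
    }

    /** Tries the exact type, then each ancestor in turn, then the fallback parser. */
    public String getParser(String type) {
        String current = type;
        int steps = 0;
        // The step limit guards against accidental cycles in the parent map.
        while (current != null && steps++ <= parentByType.size()) {
            String parser = parsersByType.get(current);
            if (parser != null) {
                return parser;
            }
            current = parentByType.get(current);
        }
        return fallbackParser;
    }

    public static void main(String[] args) {
        HierarchicalParserLookup lookup = new HierarchicalParserLookup("FallbackParser");
        lookup.registerParser("application/xml", "XmlParser");
        lookup.registerParent("application/xspf+xml", "application/xml");

        System.out.println(lookup.getParser("application/xspf+xml")); // XmlParser
        System.out.println(lookup.getParser("image/png"));            // FallbackParser
    }
}
```

With this ordering, a test.xspf file whose type has application/xml as a parent would reach an XML parser instead of going straight to the fallback.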

(Note that we currently have both MimeType and MediaType classes for
similar purposes. This is due to an ongoing redesign of the mime type
registry. For now it's probably best to work on the MimeType class
until the redesign is more complete.)

BR,

Jukka Zitting

--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378
