Hi all,
I was trying out my mbox parser via the TikaCLI command line tool.
My parser wasn't getting called - rather, the generic text parser was
used.
The problem is that in tika-mimetypes.xml, the application/mbox entry
didn't specify that it was a subtype of text/plain.
So even though the name detector code correctly generated
application/mbox as the type hint from the .mbox suffix, the hint was
ignored because application/mbox wasn't a subtype of text/plain, the
content-based type that had been derived earlier.
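
To make the behavior concrete, here is a simplified model of what I'm
seeing. This is only a sketch, not the actual Tika code: the class
NameHintSketch and its methods are made up for illustration, and the
map stands in for whatever parent information tika-mimetypes.xml
provides.

```java
import java.util.HashMap;
import java.util.Map;

public class NameHintSketch {
    // Child type -> declared parent type. Hypothetical stand-in for the
    // subtype information loaded from tika-mimetypes.xml.
    private final Map<String, String> parents = new HashMap<>();

    public void declareParent(String child, String parent) {
        parents.put(child, parent);
    }

    // Walks up the declared parent chain (assumes the chain has no cycles).
    public boolean isDescendantOf(String type, String ancestor) {
        String current = parents.get(type);
        while (current != null) {
            if (current.equals(ancestor)) {
                return true;
            }
            current = parents.get(current);
        }
        return false;
    }

    // The name-based hint only wins if it refines the content-based type.
    public String resolve(String contentType, String nameHint) {
        if (nameHint != null && isDescendantOf(nameHint, contentType)) {
            return nameHint;
        }
        return contentType;
    }

    public static void main(String[] args) {
        NameHintSketch sketch = new NameHintSketch();

        // No parent declared: the .mbox hint is dropped.
        System.out.println(sketch.resolve("text/plain", "application/mbox"));

        // With the subtype relationship declared, the hint is honored.
        sketch.declareParent("application/mbox", "text/plain");
        System.out.println(sketch.resolve("text/plain", "application/mbox"));
    }
}
```

The first call prints text/plain and the second prints
application/mbox, which matches what I saw before and after fixing the
entry.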
Easy enough to fix, but in looking through the tika-mimetypes.xml file
I wonder how many other types need similar treatment. For example:
  <mime-type type="application/xspf+xml">
    <glob pattern="*.xspf"/>
  </mime-type>
If I use the TikaCLI with a test.xspf file, the mime-type it derives
is application/xml, not application/xspf+xml as expected.
One partial fix here would be to extend the MimeTypes.forName
method to check for "+xml" at the end of the type name, similar to how
it checks for "text/" at the beginning, and auto-set the parent to
application/xml.
-- Ken
--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378