Content-Type can be more reliable and more specific than Tika#detect because,
for some file types, the parser updates the media type during the parse. For
example, the PDF parser updates application/pdf -> application/illustrator
(or similar?) if it determines that the file is a PDF-based Adobe Illustrator
file. The detector doesn't do a full parse, so it will only return
"application/pdf".
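
In case it helps, here is a rough sketch of comparing the two values from Java. It assumes Tika 2.x with the standard parsers on the classpath; the class and method names (ContentTypeCheck, detectedType, parsedType) are made up for illustration and are not part of Tika or Rika.

```java
import java.nio.file.Path;

import org.apache.tika.Tika;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class, for illustration only.
final class ContentTypeCheck {

    // Detection only: uses the file name and leading bytes, no full parse.
    static String detectedType(Path path) throws Exception {
        return new Tika().detect(path);
    }

    // Full parse: a parser may refine Content-Type while it works.
    static String parsedType(Path path) throws Exception {
        Metadata metadata = new Metadata();
        try (TikaInputStream in = TikaInputStream.get(path)) {
            // -1 disables the default write limit on extracted text.
            new AutoDetectParser().parse(
                    in, new BodyContentHandler(-1), metadata, new ParseContext());
        }
        return metadata.get(Metadata.CONTENT_TYPE);
    }

    public static void main(String[] args) throws Exception {
        Path path = Path.of(args[0]);
        System.out.println("detect: " + detectedType(path));
        System.out.println("parse:  " + parsedType(path));
    }
}
```

If you only need one pass over the stream, the parsedType half is the part to keep: AutoDetectParser runs detection itself before choosing a parser, so a separate detect call is not needed to get a Content-Type.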

On Wed, Aug 9, 2023 at 9:54 AM Keith Bennett <[email protected]>
wrote:

> Hello. I am updating Rika (https://github.com/keithrbennett/rika, JRuby
> wrapper for Tika) to work with current Tika versions and to add a command
> line executable.
>
> I noticed that Rika opens the document's input stream twice; once to call
> Tika#detect to get its media type, and again to do the parsing. Is this
> detect call unnecessary? I noticed a Content-Type in the parsed metadata,
> which has the same value as the value returned by Tika#detect. Is
> Content-Type at least as reliable as Tika#detect?
>
> Thanks for any help on this. Also, if you have any interest in rika, feel
> free to let me know. It would be great to talk to any current or
> prospective users of the gem.
>
> - Keith
