Content-Type may be more reliable and more specific, because for some file types the parser updates the media type during the parse. For example, the PDF parser changes application/pdf to application/illustrator (or similar?) if it determines that the file is a PDF-based Adobe Illustrator file. The detector doesn't do a full parse, so for such a file it will only return "application/pdf".
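
In case it helps, here is a minimal sketch of the "parse once, then read Content-Type" idea in Java. It is not Tika or Rika code: the Map is a hypothetical stand-in for the metadata a parse fills in, the Supplier is a hypothetical stand-in for a separate detect call, and the MediaTypeResolver name is made up for illustration. The stand-ins are there only so the sketch compiles without Tika on the classpath.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Illustrative sketch only (hypothetical class, not part of Tika or Rika).
 * Prefers the Content-Type recorded during the parse and runs a separate
 * detection step only when the parse did not record one.
 */
final class MediaTypeResolver {

    private MediaTypeResolver() {}

    /**
     * @param parsedMetadata stand-in for the metadata filled in by a parse
     * @param detectFallback stand-in for a detect call; invoked lazily, so the
     *                       document is not read a second time unless needed
     */
    static String resolve(Map<String, String> parsedMetadata, Supplier<String> detectFallback) {
        String fromParse = parsedMetadata.get("Content-Type");
        if (fromParse != null && !fromParse.trim().isEmpty()) {
            // The parser may have refined the type, so trust it first.
            return fromParse;
        }
        // Nothing recorded by the parse: fall back to detection.
        return detectFallback.get();
    }

    public static void main(String[] args) {
        // Assumed values for illustration, taken from the example in this thread.
        Map<String, String> parsed =
                Collections.singletonMap("Content-Type", "application/illustrator");
        System.out.println(resolve(parsed, () -> "application/pdf"));

        Map<String, String> empty = Collections.emptyMap();
        System.out.println(resolve(empty, () -> "application/pdf"));
    }
}
```

The fallback is a Supplier so that the second read of the document only happens if the parsed metadata has no Content-Type at all.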

On Wed, Aug 9, 2023 at 9:54 AM Keith Bennett <[email protected]> wrote:

> Hello. I am updating Rika (https://github.com/keithrbennett/rika, JRuby
> wrapper for Tika) to work with current Tika versions and to add a command
> line executable.
>
> I noticed that Rika opens the document's input stream twice; once to call
> Tika#detect to get its media type, and again to do the parsing. Is this
> detect call unnecessary? I noticed a Content-Type in the parsed metadata,
> which has the same value as the value returned by Tika#detect. Is
> Content-Type at least as reliable as Tika#detect?
>
> Thanks for any help on this. Also, if you have any interest in rika, feel
> free to let me know. It would be great to talk to any current or
> prospective users of the gem.
>
> - Keith
