Yes. Let us know if you find otherwise!

On Mon, Aug 14, 2023 at 11:16 AM Keith Bennett <[email protected]> wrote:
> Tim, thank you so much for responding. Can I rely on Content-Type to
> always be populated by a parse?
>
> - Keith
>
>
> On Mon, Aug 14, 2023 at 10:09 PM Tim Allison <[email protected]> wrote:
>
>> Content-Type may be more reliable/specific because for some file types,
>> the parser updates the file type during the parse. For example the PDF
>> parser updates application/pdf -> application/illustrator (or similar?) if
>> the parser determines that the file is a PDF-based Adobe Illustrator file.
>> The detector doesn't do a full parse so it will only return
>> "application/pdf".
>>
>> On Wed, Aug 9, 2023 at 9:54 AM Keith Bennett <[email protected]>
>> wrote:
>>
>>> Hello. I am updating Rika (https://github.com/keithrbennett/rika, JRuby
>>> wrapper for Tika) to work with current Tika versions and to add a command
>>> line executable.
>>>
>>> I noticed that Rika opens the document's input stream twice; once to
>>> call Tika#detect to get its media type, and again to do the parsing. Is
>>> this detect call unnecessary? I noticed a Content-Type in the parsed
>>> metadata, which has the same value as the value returned by Tika#detect. Is
>>> Content-Type at least as reliable as Tika#detect?
>>>
>>> Thanks for any help on this. Also, if you have any interest in rika,
>>> feel free to let me know. It would be great to talk to any current or
>>> prospective users of the gem.
>>>
>>> - Keith
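A minimal Java sketch of the approach discussed in this thread, assuming Tika 2.x (tika-core plus a parser package such as tika-parsers-standard-package) is on the classpath. It contrasts a detector-only call, which needs its own open of the file, with a single parse through AutoDetectParser whose Content-Type is read from the resulting Metadata afterwards. The class name ContentTypeSketch and its methods are hypothetical, chosen for illustration only; detectOnly exists just so the two answers can be compared side by side.

```java
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.Tika;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class for illustration; not part of Tika or Rika.
public class ContentTypeSketch {

    // Detector-only pass (file name plus leading bytes, no full parse).
    // This is the separate open of the file that the thread asks about.
    static String detectOnly(Path path) throws Exception {
        return new Tika().detect(path.toFile());
    }

    // One open, one parse. AutoDetectParser runs detection on the same
    // stream before choosing a parser and records the result in Metadata;
    // the chosen parser may then overwrite Content-Type with a more
    // specific value (per the thread, e.g. for PDF-based Illustrator files).
    static String contentTypeAfterParse(Path path) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1: no write limit
        try (InputStream in = TikaInputStream.get(path)) {
            parser.parse(in, handler, metadata, new ParseContext());
        }
        return metadata.get(Metadata.CONTENT_TYPE);
    }

    public static void main(String[] args) throws Exception {
        Path path = Paths.get(args[0]);
        System.out.println("Tika#detect:  " + detectOnly(path));
        System.out.println("Content-Type: " + contentTypeAfterParse(path));
    }
}
```

Running it against a few sample files shows where the two values agree and where the parse-time value is more specific. For text files the parse-time value may also carry parameters such as a charset, so a caller comparing it with Tika#detect output may want to compare only the base media type.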
