Yes.  Let us know if you find otherwise!

On Mon, Aug 14, 2023 at 11:16 AM Keith Bennett <[email protected]>
wrote:

> Tim, thank you so much for responding. Can I rely on Content-Type to
> always be populated by a parse?
>
> - Keith
>
>
> On Mon, Aug 14, 2023 at 10:09 PM Tim Allison <[email protected]> wrote:
>
>> Content-Type may be more reliable/specific because, for some file types,
>> the parser updates the file type during the parse.  For example, the PDF
>> parser updates application/pdf -> application/illustrator (or similar?) if
>> it determines that the file is a PDF-based Adobe Illustrator file.
>> The detector doesn't do a full parse, so it will only return
>> "application/pdf".
>>
>> On Wed, Aug 9, 2023 at 9:54 AM Keith Bennett <[email protected]>
>> wrote:
>>
>>> Hello. I am updating Rika (https://github.com/keithrbennett/rika, JRuby
>>> wrapper for Tika) to work with current Tika versions and to add a command
>>> line executable.
>>>
>>> I noticed that Rika opens the document's input stream twice: once to
>>> call Tika#detect to get its media type, and again to do the parsing. Is
>>> this detect call unnecessary? I noticed a Content-Type in the parsed
>>> metadata, which has the same value as the one returned by Tika#detect. Is
>>> Content-Type at least as reliable as Tika#detect?
>>>
>>> Thanks for any help on this. Also, if you have any interest in rika,
>>> feel free to let me know. It would be great to talk to any current or
>>> prospective users of the gem.
>>>
>>> - Keith
>>>
