Hello,

We’ve noticed that Tika incorrectly detects .cr3 files as 
video/quicktime, whereas other raw files are detected as image/tiff (including 
the .cr3’s predecessor, the .cr2). I’ve uploaded a sample file here: 
https://dropfiles.org/j8CS4Snr (taken from this review: 
https://www.photographyblog.com/reviews/canon_eos_r10_review#google_vignette).

When we add a custom-mimetypes.xml file with a mime-type entry like this:
<mime-type type="image/x-raw-canon">
  <_comment>Canon raw image</_comment>
  <sub-class-of type="image/tiff"/>
  <glob pattern="*.crw"/>
  <glob pattern="*.cr2"/>
  <glob pattern="*.cr3"/>
</mime-type>

The .cr3 file is still identified as video/quicktime. However, when we add the 
configuration below, Tika matches it to something close to what we want:
<mime-type type="image/x-raw-canon3">
  <_comment>Canon raw image</_comment>
  <sub-class-of type="video/quicktime"/>
  <glob pattern="*.cr3"/>
</mime-type>

This still doesn’t give us the output we want, though, as we’re hoping to group 
all Canon raw images under the same mime-type.
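
For context, here is a minimal sketch of a complete custom-mimetypes.xml 
containing the second entry above. The <mime-info> root element and the 
classpath location org/apache/tika/mime/custom-mimetypes.xml follow Tika's 
convention for custom types as far as we understand it; they are assumptions 
here, not a copy of our actual file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed classpath location: org/apache/tika/mime/custom-mimetypes.xml -->
<mime-info>
  <!-- Same entry as above: .cr3 declared as a child of video/quicktime -->
  <mime-type type="image/x-raw-canon3">
    <_comment>Canon raw image</_comment>
    <sub-class-of type="video/quicktime"/>
    <glob pattern="*.cr3"/>
  </mime-type>
</mime-info>
```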

Do you have any ideas how to get this working?

We’re using tika-core 2.7.0 in a Java 8 project.

Thank you,

Richard





