On Fri, 26 Apr 2024, Mauler, David wrote:
I'm in the process of troubleshooting an issue with certain mp4 video files and Tika. After a bunch of digging, it appears to be related to whatever ISO is set for the mp4 file. An mp4 with an ISO of 14496-12:2003 will be detected as video/quicktime, but an mp4 with an ISO of 14496-14 is detected as video/mp4, which is what I was expecting for both files.

It depends on where in the file the type box lives. At the moment, we only have mime-magic based detection for the QuickTime / MP4 family of formats. If the right box in the container is at the start we're OK; if it comes later, we can't tell with just a mime magic signature.

What we really need is a container-aware detector for the file format, similar to what we have for Zip files and for the Ogg family. That would process the file in a format-aware way, checking the contents to correctly identify the type.
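
To give a rough idea of what "format-aware" means here, below is a minimal sketch in Python (not Tika code, and not how any existing Tika detector is written). It walks the top-level boxes of the file, each of which starts with a 4-byte big-endian size and a 4-byte type, until it finds the 'ftyp' box, then reads the major brand from it. The function names and the small brand-to-type table are my own assumptions for illustration; a real detector would need a much fuller brand list and would probably look at the compatible brands too.

```python
import io
import struct

# Illustrative brand table only (an assumption, far from complete).
BRAND_TO_MIME = {
    b"qt  ": "video/quicktime",
    b"isom": "video/mp4",
    b"mp41": "video/mp4",
    b"mp42": "video/mp4",
}


def find_major_brand(stream, max_boxes=64):
    """Return the major brand of the first top-level 'ftyp' box, or None."""
    for _ in range(max_boxes):
        header = stream.read(8)
        if len(header) < 8:
            return None  # ran out of data before finding 'ftyp'
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            large = stream.read(8)
            if len(large) < 8:
                return None
            size = struct.unpack(">Q", large)[0]
            header_len = 16
        if box_type == b"ftyp":
            brand = stream.read(4)
            return brand if len(brand) == 4 else None
        if size == 0 or size < header_len:
            # Size 0 means the box runs to end of file; anything smaller
            # than the header is malformed. Either way, stop here.
            return None
        # Skip the payload of this box and move on to the next one.
        stream.seek(size - header_len, io.SEEK_CUR)
    return None


def detect(stream):
    """Map the major brand to a media type, with a generic fallback."""
    brand = find_major_brand(stream)
    return BRAND_TO_MIME.get(brand, "application/octet-stream")
```

Something like detect(open(path, "rb")) would then report a type based on the brand wherever the 'ftyp' box sits among the top-level boxes, rather than only when it happens to match a signature at a fixed offset.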

The long-standing issue is https://issues.apache.org/jira/browse/TIKA-2935
Do you have a few days of spare coding time you could put towards this, and/or a bit of budget to sponsor someone to?

Thanks
Nick
