Ah, that’s good news, I will look into that! So far I’ve only been using the official 2.2.1-full Tika Docker image with the default config; the only change was adding some more Tesseract languages for OCR.
Kind regards,
Willy T. Koch
[email protected]
Mob: +47 480 321 77

On Thu, 10 Feb 2022, at 22:40, Nick Burch wrote:

> On Thu, 10 Feb 2022, Willy T. Koch wrote:
> > As for content detection, today the content-type field with mime type is
> > returned. What we would need is a mime-type to file extension lookup and
> > it seems logical that this was also returned by Tika.
>
> How are you calling Tika? We already have APIs for this. Just ask the
> MimeTypes class (available via TikaConfig.getMimeRepository) about a type,
> and it'll return the details including the preferred extension and other
> possible well-known extensions
>
> Nick
>
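
For reference, the lookup Nick describes might look roughly like the sketch below when Tika is used as a Java library. It assumes tika-core is on the classpath and that the default TikaConfig is acceptable; the class name ExtensionLookup and the example type application/pdf are made up for illustration. It does not cover how to get the same information out of the server running in the Docker image.

```java
import org.apache.tika.config.TikaConfig;
import org.apache.tika.mime.MimeType;
import org.apache.tika.mime.MimeTypeException;
import org.apache.tika.mime.MimeTypes;

// Hypothetical class name, for illustration only.
public class ExtensionLookup {
    public static void main(String[] args) throws MimeTypeException {
        // The MIME type registry that ships with the default Tika configuration.
        MimeTypes repository = TikaConfig.getDefaultConfig().getMimeRepository();

        // Look up a type by name; forName throws MimeTypeException
        // if the name is not a syntactically valid media type.
        MimeType type = repository.forName("application/pdf");

        // Preferred extension (including the leading dot); empty if none is known.
        System.out.println(type.getExtension());

        // All extensions registered for this type.
        System.out.println(type.getExtensions());
    }
}
```

In practice the string passed to forName would be the content-type value returned by detection rather than a hard-coded name.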
