Sorry, actually WebP is supported by Tesseract directly via libwebp (which was the source of a recent RCE vulnerability in a lot of tools, including Tesseract itself...)

On Sat, Nov 4, 2023 at 00:23, Luís Filipe Nassif <[email protected]> wrote:

> A possible future improvement is to use ImageMagick, if available on PATH,
> to convert formats not supported by Java to PNG before OCR. We do this for
> HEIC, PSD, SVG, EMF, WMF, WebP (also supported by TwelveMonkeys) and a few
> others.
>
> Luis
>
> On Sun, Oct 29, 2023 at 17:40, Tilman Hausherr <[email protected]> wrote:
>
>> On 29.10.2023 18:00, Tyler Salwierz wrote:
>>
>> Is Apple using their own custom OCR scanner then? Because Spotlight does
>> OCR locally on HEIC.
>>
>> If they can display HEIC, then they can also convert it.
>>
>> They have an OCR:
>>
>> https://developer.apple.com/documentation/vision/recognizing_text_in_images
>>
>> Tilman
>>
>> On Oct 29, 2023, at 9:12 AM, Tilman Hausherr <[email protected]> wrote:
>>
>> On 29.10.2023 14:16, Tyler Salwierz wrote:
>>
>> I’m using fscrawler which uses Tika and it’s not generating OCR on HEIC
>> images. The actual image metadata is indexed but the content is empty.
>>
>> Is there any fix for this if it is a Tika bug?
>>
>> https://pastebin.com/raw/Jp5kBi5M
>>
>> HEIC isn't supported by Tesseract, thus it isn't a bug.
>>
>> https://github.com/tesseract-ocr/tesseract/issues/2930
>>
>> https://tesseract-ocr.github.io/tessdoc/InputFormats.html
>>
>> Tilman
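
For anyone wanting to try the ImageMagick idea quoted above outside of Tika, here is a minimal sketch of the "convert to PNG before OCR" step. It is not Tika or fscrawler code; the function names (`needs_conversion`, `find_imagemagick`, `png_for_ocr`) and the extension set are hypothetical, and it assumes an ImageMagick build with the relevant delegates (for example libheif for HEIC) is installed.

```python
import os
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Formats named in the thread as needing conversion before OCR (illustrative,
# not exhaustive). WebP is left out because the follow-up notes that Tesseract
# reads it directly.
CONVERT_EXTENSIONS = {".heic", ".psd", ".svg", ".emf", ".wmf"}


def needs_conversion(path: str) -> bool:
    """Return True if the file extension is in the convert-first set."""
    return Path(path).suffix.lower() in CONVERT_EXTENSIONS


def find_imagemagick() -> Optional[str]:
    """Locate ImageMagick on PATH: 'magick' (v7), else 'convert' (v6)."""
    found = shutil.which("magick")
    if found is None and os.name != "nt":
        # On Windows, 'convert' is an unrelated system tool, so skip it there.
        found = shutil.which("convert")
    return found


def png_for_ocr(src: str, out_dir: str) -> str:
    """Return a path to hand to OCR: a converted PNG, or src unchanged."""
    magick = find_imagemagick()
    if magick is None or not needs_conversion(src):
        return src  # nothing to convert, or no converter available
    dst = str(Path(out_dir) / (Path(src).stem + ".png"))
    # "[0]" selects the first frame/layer so exactly one PNG is written.
    subprocess.run([magick, src + "[0]", dst], check=True, timeout=120)
    return dst
```

The returned path can then be passed to whatever OCR step is in use. Files that are already in a supported format, or systems without ImageMagick on PATH, fall through unchanged, so the behaviour degrades to what happens today.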
