A possible future improvement would be to use ImageMagick, when it is
available on the PATH, to convert formats that Java does not support to PNG
before OCR. We do this for HEIC, PSD, SVG, EMF, WMF, WebP (which is also
supported by TwelveMonkeys) and a few others.
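
The pre-conversion step described above could look roughly like the
following Java sketch. It is only an illustration under stated assumptions:
it assumes Java 11+ and ImageMagick 7, whose command-line entry point is
`magick`, and the class and method names are hypothetical and not part of
Tika or any existing project. The 60-second timeout is an arbitrary choice.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: converts an image to PNG with ImageMagick before OCR.
class ImageMagickPreconverter {

    // Search a PATH-style string for the first executable with one of the given names.
    static Optional<Path> findOnPath(String pathEnv, String... names) {
        if (pathEnv == null || pathEnv.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathEnv.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : names) {
                try {
                    Path candidate = Paths.get(dir, name);
                    if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                        return Optional.of(candidate);
                    }
                } catch (InvalidPathException e) {
                    // Skip malformed PATH entries.
                }
            }
        }
        return Optional.empty();
    }

    // Build "magick <input>[0] <output.png>"; "[0]" selects the first frame or layer.
    static List<String> buildCommand(Path magick, Path input, Path outputPng) {
        return List.of(magick.toString(), input.toString() + "[0]", outputPng.toString());
    }

    // Returns the PNG path, or empty if ImageMagick is missing or the conversion fails
    // (for example, when the local build lacks the delegate for the input format).
    static Optional<Path> convertToPng(Path input) throws IOException, InterruptedException {
        Optional<Path> magick = findOnPath(System.getenv("PATH"), "magick", "magick.exe");
        if (magick.isEmpty()) {
            return Optional.empty();
        }
        Path output = Files.createTempFile("ocr-input-", ".png");
        Process process = new ProcessBuilder(buildCommand(magick.get(), input, output))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        boolean finished = process.waitFor(60, TimeUnit.SECONDS);
        if (!finished) {
            process.destroyForcibly();
        }
        if (!finished || process.exitValue() != 0 || Files.size(output) == 0) {
            Files.deleteIfExists(output);
            return Optional.empty();
        }
        return Optional.of(output);
    }
}
```

A caller would try `convertToPng(file)` only for formats the Java image
readers reject, pass the resulting PNG to the OCR engine, and delete the
temporary file afterwards. An empty result means falling back to the
current behaviour (metadata only, no OCR text).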

Luis

On Sun, Oct 29, 2023 at 17:40, Tilman Hausherr <[email protected]>
wrote:

> On 29.10.2023 18:00, Tyler Salwierz wrote:
>
> Is Apple using its own custom OCR engine, then? Spotlight does OCR
> locally on HEIC images.
>
> If they can display HEIC, then they can also convert it.
>
> They have an OCR:
>
> https://developer.apple.com/documentation/vision/recognizing_text_in_images
> Tilman
>
>
>
> On Oct 29, 2023, at 9:12 AM, Tilman Hausherr <[email protected]> wrote:
>
> 
> On 29.10.2023 14:16, Tyler Salwierz wrote:
>
> I’m using FSCrawler, which uses Tika, and it’s not generating OCR output
> for HEIC images. The image metadata is indexed, but the content is empty.
>
> Is there any fix for this if it is a Tika bug?
>
> https://pastebin.com/raw/Jp5kBi5M
>
> HEIC isn't supported by Tesseract, so this isn't a bug.
>
> https://github.com/tesseract-ocr/tesseract/issues/2930
>
> https://tesseract-ocr.github.io/tessdoc/InputFormats.html
>
> Tilman
>
>
>
>
