Sorry, WebP is actually supported by Tesseract directly via libwebp
(which recently caused an RCE vulnerability in a lot of tools, including
Tesseract itself...)

On Sat, Nov 4, 2023 at 00:23, Luís Filipe Nassif <[email protected]>
wrote:

> A possible future improvement is to use ImageMagick, if available on PATH,
> to convert formats not supported by Java to PNG before OCR. We do this for
> HEIC, PSD, SVG, EMF, WMF, WebP (also supported by TwelveMonkeys) and a few
> others.
>
> Luis
>
> On Sun, Oct 29, 2023 at 17:40, Tilman Hausherr <[email protected]>
> wrote:
>
>> On 29.10.2023 18:00, Tyler Salwierz wrote:
>>
>> Is Apple using its own custom OCR scanner, then? Spotlight does OCR
>> locally on HEIC images.
>>
>> If they can display HEIC, then they can also convert it.
>>
>> They have their own OCR engine:
>>
>>
>> https://developer.apple.com/documentation/vision/recognizing_text_in_images
>> Tilman
>>
>>
>>
>> On Oct 29, 2023, at 9:12 AM, Tilman Hausherr <[email protected]>
>> wrote:
>>
>>
>> On 29.10.2023 14:16, Tyler Salwierz wrote:
>>
>> I’m using FSCrawler, which uses Tika, and it’s not producing OCR text for
>> HEIC images. The image metadata is indexed, but the content is empty.
>>
>> Is there any fix for this if it is a Tika bug?
>>
>> https://pastebin.com/raw/Jp5kBi5M
>>
>> HEIC isn't supported by Tesseract, so it isn't a bug.
>>
>> https://github.com/tesseract-ocr/tesseract/issues/2930
>>
>> https://tesseract-ocr.github.io/tessdoc/InputFormats.html
>>
>> Tilman
>>
>>
>>
>>

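The ImageMagick pre-conversion idea suggested in the thread could look roughly like the minimal sketch below. It is an illustration under stated assumptions, not how Tika or FSCrawler actually implement it: the extension set and the function names (`needs_conversion`, `find_imagemagick`, `build_command`, `prepare_for_ocr`) are hypothetical, WebP is left out because Tesseract reads it directly via libwebp, and HEIC decoding assumes ImageMagick was built with the libheif delegate.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical set of extensions to pre-convert, taken from the formats named
# in this thread. WebP is omitted because Tesseract reads it via libwebp.
CONVERT_EXTENSIONS = {".heic", ".psd", ".svg", ".emf", ".wmf"}


def needs_conversion(path: str) -> bool:
    """Return True if the file's extension is in the pre-conversion set."""
    return Path(path).suffix.lower() in CONVERT_EXTENSIONS


def find_imagemagick() -> Optional[str]:
    """Locate ImageMagick on PATH: 'magick' (v7) first, then 'convert' (v6).

    Caveat: on Windows, 'convert' may resolve to an unrelated system tool.
    """
    return shutil.which("magick") or shutil.which("convert")


def build_command(tool: str, src: str, dst: str) -> List[str]:
    """Build the ImageMagick command; the output format follows dst's suffix.

    '[0]' selects the first frame so multi-image inputs (e.g. layered PSDs,
    where frame 0 is the merged composite) produce a single PNG.
    """
    return [tool, f"{src}[0]", dst]


def prepare_for_ocr(src: str) -> str:
    """Return a path Tesseract can read, converting to PNG when needed."""
    if not needs_conversion(src):
        return src
    tool = find_imagemagick()
    if tool is None:
        # No ImageMagick on PATH: fall back to the original file, in which
        # case OCR will likely yield empty content (as reported above).
        return src
    # Write the PNG into a fresh temporary directory, not next to the source.
    dst = os.path.join(tempfile.mkdtemp(), Path(src).stem + ".png")
    subprocess.run(build_command(tool, src, dst), check=True)
    return dst
```

Passing `check=True` makes a failed conversion raise `CalledProcessError` instead of silently handing an unreadable file to the OCR step.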