I found the parameters:
"C:\Program Files\Tesseract-OCR\tesseract.exe" "..\Lambregts0001 - cleaned.jpg" "Lambregts0001 - cleaned.txt" -c tessedit_char_whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 :@."
It is still not working; the result is still "uw BTW nummer:: NLOO7900000B01".
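
For reference, here is a minimal Python sketch of the same call. It passes the arguments as a list, so the whitelist (including its space) should reach tesseract as a single argument without any shell quoting. The names build_command and run_ocr are hypothetical helpers of my own, not part of Tesseract, and the psm argument is only there so the --psm 13 suggestion from the quoted reply can be tried as well.

```python
import subprocess

# Path and whitelist taken from the command above.
TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
WHITELIST = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789 :@."
)


def build_command(image, output_base, whitelist=WHITELIST, psm=None):
    """Return the tesseract argument list; nothing is executed here."""
    cmd = [TESSERACT, str(image), str(output_base)]
    if psm is not None:
        # Page segmentation mode, e.g. 13 for a single raw text line.
        cmd += ["--psm", str(psm)]
    # One list element per argument, so spaces in the whitelist are preserved.
    cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def run_ocr(image, output_base, **kwargs):
    """Run tesseract; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(image, output_base, **kwargs), check=True)
```

Calling run_ocr(r"..\Lambregts0001 - cleaned.jpg", "Lambregts0001 - cleaned", psm=13) would then correspond to the command above plus --psm 13. Note that tesseract appends .txt to the output base itself.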

Any other ideas?

On Thursday, 21 September 2023 at 22:25:12 UTC+2, elvi...@gmail.com wrote:

> Whitelist the digits so that the letter O will not confuse it. 
> You can also try --psm 13 if all of your texts are single lines.
>
> On Thu, Sep 21, 2023, 4:07 PM A Nederpelt <powe...@gmail.com> wrote:
>
>> Hi.
>> I am trying to use the Tesseract engine instead of the Nuance engine.
>> When I run tesseract.exe on the image, it returns a few strange 
>> characters.
>> Twice it returns OO instead of 00:
>>   "uw BTW nummer:: NLOO7900000B01"
>> instead of
>>   "uw BTW nummer:: NL007900000B01"
>> and
>> "Tel £01"
>> instead of
>> "Tel : 01"
>> but "Tel : 0168-452452" is recognized ok.
>>
>> I see no improvement from the tips in 
>> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md 
>> because these are already very clean documents.
>>
>> Am I missing some parameters? For example a second run, or a more 
>> accurate run, etc.
>> Or should I compile tesseract.exe myself with different, higher-quality 
>> parameters?
>>
>> Thanks,
>> Alwin
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/6f5f957e-4f33-419f-aba6-2e8a3f6f8d92n%40googlegroups.com.
>>
>

