Zdenko, thank you very much!

1. As far as I understand, eng.wordlist is just a plain text file with a
single word per line. Is that the correct format?

2. Is this file used *only* to generate synthetic text for training
Tesseract on a new language,
or
is this vocabulary *also* used by Tesseract to guess (in case of doubt)
during word recognition? Or are spell-checker dictionaries used for
this purpose instead of eng.wordlist?
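
To make question 1 concrete, here is a minimal Python sketch of what "a
single word per line" would mean if that assumption holds. The function
name check_wordlist and the sample data are hypothetical and not part of
Tesseract; the format itself is only my reading of eng.wordlist, not a
confirmed specification.

```python
import io


def check_wordlist(lines):
    """Return (line_number, text) pairs for lines that are not a single word.

    Assumption: a valid line holds exactly one token, with no surrounding
    or embedded whitespace. This mirrors the format guessed in question 1.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        # Drop only the line terminator so stray spaces are still detected.
        text = raw.rstrip("\r\n")
        if not text or text != text.strip() or len(text.split()) != 1:
            problems.append((number, text))
    return problems


# Hypothetical sample: three valid entries, one two-word line, one empty line.
sample = io.StringIO("the\nof\nand\ntwo words\n\n")
print(check_wordlist(sample))
```

An empty result would mean every line matches the assumed format; it says
nothing about whether Tesseract accepts or needs the file.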

Thank you!

On Sun, Jun 20, 2021 at 2:04 PM Zdenko Podobny <[email protected]> wrote:

> see https://github.com/tesseract-ocr/langdata/tree/master/eng
>
> Zdenko
>
>
> On Sun, Jun 20, 2021 at 7:33 AM Sim Tov <[email protected]> wrote:
>
>>
>> Hello,
>>
>> the "Creating Starter Traineddata" section of the documentation
>>
>>
>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#creating-starter-traineddata
>>
>> says that "optional word list files" can be supplied for training.
>>
>> 1. what is the proper format for this file?
>> 2. is there an example of such a file online?
>> 3. can a standard MySpell/HunSpell/etc. dictionary be used for this
>> purpose? If yes - what formats are supported?
>>
>> Thank you in advance!
>> ST
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/ffc64b9c-9020-4398-9d17-c15f832d6b38n%40googlegroups.com
>> .
>>
>
