Zdenko, thank you very much!

1. As far as I understand, eng.wordlist is just a plain text file with a single word per line. Am I correct regarding the formal format?
2. Is this file used *only* to generate synthetic texts to teach Tesseract a new language, or is this vocabulary *also* used by Tesseract to guess (in case of doubt) during word recognition? Or are spell checker dictionaries used for this purpose, and not eng.wordlist?

Thank you!

On Sun, Jun 20, 2021 at 2:04 PM Zdenko Podobny <[email protected]> wrote:

> see https://github.com/tesseract-ocr/langdata/tree/master/eng
>
> Zdenko
>
> On Sun, Jun 20, 2021 at 7:33 AM Sim Tov <[email protected]> wrote:
>
>> Hello,
>>
>> it is written in the documentation/Creating Starter Traineddata:
>>
>> https://tesseract-ocr.github.io/tessdoc/tess4/TrainingTesseract-4.00.html#creating-starter-traineddata
>>
>> that an "optional word list files" can be supplied for the training purpose.
>>
>> 1. what is the proper format for this file?
>> 2. is there an example of such a file online?
>> 3. can a standard MySpell/HunSpell/etc. dictionary be used for this purpose? If yes - what formats are supported?
>>
>> Thank you in advance!
>> ST
>>
>> --
>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/ffc64b9c-9020-4398-9d17-c15f832d6b38n%40googlegroups.com

