Hi all,

I would like to announce pytesstrain, a collection of Tesseract training 
tools, together with the library they are built on. The tools were created 
while training Tesseract to recognise the Akkadian language (stay tuned for 
more posts!), to solve the problems that emerged in the process.

You can install it with:

    pip install pytesstrain

The PyPI page for the package is https://pypi.org/project/pytesstrain/. The 
GitHub project page is https://github.com/wincentbalin/pytesstrain.

This package contains tools to create dictionary data (wordlist, bigram 
and unigram lists, etc.), rewrap lines in text files to a specified 
length, collect the most frequent recognition errors and dump them into a 
unicharambigs file, and compute recognition metrics (WER and CER). It 
also contains the run_test() function, which creates an image file from a 
given string and then performs OCR on it, as well as its parallelised 
version, run_tests(), which can be used in future tools.
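
For readers unfamiliar with the two metrics, here is a minimal sketch of how WER (word error rate) and CER (character error rate) are commonly computed from the Levenshtein edit distance. It is an illustration only, not the code used in pytesstrain; the function names edit_distance, cer and wer are made up for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr_output = "the quick brown box"
    print("WER:", wer(truth, ocr_output))
    print("CER:", cer(truth, ocr_output))
```

Both rates are normalised by the length of the reference, so they can exceed 1.0 when the OCR output contains many spurious insertions.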

Feedback, suggestions, etc. would be most welcome.

Yours truly,

Wincent
