Hi Shree, I am glad you already find the package useful :-)
As to your question: I did not use the ocr-evaluation tools, only the language_metrics utility. So, regrettably, I cannot help you here. But maybe you could try the same utility too?

By the way, I added a create_ground_truth utility to the package. It creates .gt.txt files as well as the associated .tif files for every specified font. I think it could be useful for anyone who does not have a ground truth collection yet.

Kind regards,
Wincent

On Wednesday, 29 January 2020 at 06:47:01 UTC+1, shree wrote:
>
> Hi Wincent,
>
> Thank you for sharing these tools. I find create-dictdata to be very
> useful.
>
> I wanted to know if you have modified any ocr-evaluation tools to handle
> the high unicode range such as for Akkadian language.
>
> I was trying to test regarding Modi script (*Range*: U+11600..U+1165F;
> (96 code points)) and found that `ocrevalutf8 accuracy` does not work
> well for it. Any suggestions ...
>
> Shree
>
> On Sunday, January 5, 2020 at 2:22:50 AM UTC+5:30, Wincent Balin wrote:
>>
>> Hi all,
>>
>> I would like to announce pytesstrain, a collection of Tesseract training
>> tools, as well as the underlying library. The tools were created while
>> training Tesseract to recognise Akkadian language (stay tuned for more
>> posts!), to solve the problems that emerged in the process.
>>
>> You can install it with pip install pytesstrain.
>>
>> The PyPI page for the package is https://pypi.org/project/pytesstrain/.
>> The GitHub project page is https://github.com/wincentbalin/pytesstrain.
>>
>> This package contains the tools to create dictionary data (wordlist, bi-
>> and unigram lists, etc.), rewrap lines in text files to the specified
>> length, collect most frequent recognition errors and dump them into
>> unicharambigs file, and to perform recognition metrics (WER and CER). It
>> also contains the run_test() function, which creates an image file from
>> the given string and performs OCR on it afterwards, as well as its
>> parallelised version, run_tests(), which can be used in future tools.
>>
>> Feedback, suggestions, etc would be most welcome.
>>
>> Yours truly,
>>
>> Wincent
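
For the question about scripts in the high Unicode range, the WER and CER metrics mentioned above can also be computed directly over code points, independently of any evaluation tool. The following is only a minimal sketch in plain Python 3: the function names `edit_distance`, `cer`, and `wer` are made up for illustration and are not the API of pytesstrain, language_metrics, or ocrevalutf8. It relies on the fact that Python 3 strings are sequences of code points, so characters beyond U+FFFF (such as the Modi block) each count as one character.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: code-point edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Three code points from the Modi block (U+11600..U+1165F), written as escapes.
reference = "\U00011600\U00011601\U00011602"
# Hypothetical OCR output with the last code point misrecognised.
hypothesis = "\U00011600\U00011601\U00011603"

# One substitution out of three code points.
print(cer(reference, hypothesis))
```

This counts code points, not grapheme clusters, so a base letter and a combining vowel sign are treated as two characters; whether that is the right unit depends on the script and on what the ground truth files contain.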

