I am new to Tesseract and am using it through Tess4J. I am trying to OCR faxes whose pages are stored as TIFF (CCITT T.6) images: 2509 x 3530 @ 300 dpi (1-bit, i.e. black and white).
I have two sets of questions.

*Speed*
On an Intel i7-4800MQ @ 2.7 GHz I am getting approximately 6 pages per minute (PPM) using 1 thread. I am looking for suggestions on how to speed up page processing. I use parallelStream to process each page in a separate thread.

*Training*
I am trying to learn about training Tesseract for improved accuracy. Given that the fonts / box files used to generate eng.traineddata are not available, can one specify the fonts used for English? Also, is there a description of the various training artifacts? I used "combine_tessdata -u" to unpack eng.traineddata and "dawg2wordlist" to extract the wordlist, but I am looking for documentation to better understand these artifacts.

Thanks
- viraf
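For anyone following the *Speed* question, here is a minimal sketch of the per-page parallelStream pattern mentioned above. It is not the actual code from this setup. The class name ParallelPageOcr and the method ocrPages are hypothetical, and the OCR engine is modelled as a plain Function so the example runs without Tess4J on the classpath. It also assumes that an engine instance should not be shared between threads, so each worker thread lazily creates its own.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical helper: OCRs pages in parallel with one engine per worker thread.
final class ParallelPageOcr {

    // P is the page type (for example a BufferedImage). The engine is modelled
    // as a Function<P, String>; this is an assumption made for illustration.
    static <P> List<String> ocrPages(List<P> pages,
                                     Supplier<Function<P, String>> engineFactory) {
        // Each thread that touches this ThreadLocal gets its own engine,
        // created the first time that thread needs one.
        ThreadLocal<Function<P, String>> engine = ThreadLocal.withInitial(engineFactory);

        return pages.parallelStream()
                .map(page -> engine.get().apply(page))
                // For a List source, the results keep the original page order.
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in engine that just echoes the page name; a real one would run OCR.
        List<String> text = ocrPages(
                List.of("page-1", "page-2", "page-3"),
                () -> page -> "text of " + page);
        text.forEach(System.out::println);
    }
}
```

With Tess4J, the factory would presumably build a new Tesseract instance and the function would wrap a call to doOCR, converting its checked TesseractException into an unchecked one. That wiring is an assumption and is left out here.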

