> I was wondering if OCRopus still uses Tesseract for line recognition?
> From what I gather in the release notes for 0.4 (and from what I have
> determined from putting print statements in the code to follow the
> execution path), OCRopus no longer uses Tesseract, but rather a new
> line recognizer created by you guys.
Correct.

> If this is the case, could you
> provide an overview of the changes required to have it again call
> Tesseract? I thought it would be a simple one-line change in
> ocr-commands.cc by including the tesseract header and, in the
> main_lines2fsts() method, changing:
>
>     linerec = glinerec::make_Linerec();
>
> to
>
>     linerec = make_TesseractRecognizeLine();

Unfortunately, interfacing with Tesseract isn't easy; that's why we don't have it in the default build anymore. There is a separate subproject for a Tesseract interface called ocrotess here:

http://iupr1.cs.uni-kl.de/cgi-bin/hgwebdir.cgi/ocrotess/

> Currently Tesseract is providing us better results for our images than
> OCRopus is, but we would like to see the results that OCRopus gives
> when it is using Tesseract.

It's pointless to carry out performance comparisons between OCRopus and Tesseract right now; the models shipping with the OCRopus recognizer have been trained on only a small number of characters and styles. They will perform well on some styles and poorly on others, depending on resolution and fonts.

Furthermore, for book recognition, you should use book-adaptive recognition with OCRopus, which results in substantial improvements in recognition rates.

Tom

You received this message because you are subscribed to the Google Groups "ocropus" group.
For more options, visit this group at http://groups.google.com/group/ocropus?hl=en
