You can unpack the traineddata file and take a look at the .config file in it.
E.g., in the case of hin.traineddata, the config file uses combined mode (Cube as well as Tesseract), which makes it very slow. I changed the config value to use Tesseract only (OEM_TESSERACT_ONLY), recombined the file, and that improved the speed. Please see http://tesseract-ocr.googlecode.com/svn/trunk/doc/combine_tessdata.1.html

Shree

Shree Devi Kumar
____________________________________________________________
Bhajan - Kirtan - Aarti (devotional songs) @ http://bhajans.ramparivar.com

On Mon, Jul 15, 2013 at 9:12 PM, bear <[email protected]> wrote:
> Thanks, Nick. After poking through the source, it seems that one of my
> assumptions was incorrect: Tesseract defaults to OEM_TESSERACT_ONLY mode,
> so by default it will not try to infer the best mode for individual
> languages.
>
> *tesseractclass.cpp:*
>
>   INT_INIT_MEMBER(tessedit_ocr_engine_mode, tesseract::OEM_TESSERACT_ONLY,
>                   "Which OCR engine(s) to run (Tesseract, Cube, both)."
>                   " Defaults to loading and running only Tesseract"
>                   " (no Cube,no combiner)."
>                   " Values from OcrEngineMode enum in tesseractclass.h)",
>                   this->params()),
>
> On Monday, July 15, 2013 10:38:00 AM UTC-4, Nick White wrote:
>>
>> Hi,
>>
>> > I never set the tessedit_ocr_engine_mode configuration for tesseract,
>> > so I assume that it is using the default mode which, from my reading,
>> > will infer the best engine mode for the particular language.
>>
>> You're right in your assumptions: it will use the default (non-Cube)
>> mode unless you tell it otherwise. You're also correct that the
>> default mode is likely the best for Spanish.
>>
>> > Finally, where can I set the tessedit_ocr_engine_mode? I cannot find
>> > this in any documentation online. Do I need to modify the source
>> > before compiling? Is there a configuration file that I can modify or add?
>>
>> It's a configuration variable, which you set the same way as any
>> other configuration variable.
>> That is documented a little here:
>> http://code.google.com/p/tesseract-ocr/wiki/ControlParams
>>
>> I'm afraid I can't help you with performance, as I have no knowledge
>> of Android stuff. You might find it useful to look at the code of
>> Renard's excellent-looking Text Fairy app for Android:
>> https://github.com/renard314/textfairy
>>
>> Nick
>>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

