I see, that was very helpful.  Thanks, Shree.  I unpacked Arabic and 
noticed that its .config file sets the engine mode:

tessedit_ocr_engine_mode 1

I unpacked Spanish, and its .config file did not contain an engine mode 
declaration.  Does that mean Spanish will default to Tesseract only (and 
not Cube), as defined in my tesseractclass.cpp?  Or does the absence of 
the variable from a language-specific .config file make it default to 
something else?
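
To make the question concrete, here is the behaviour I would expect, written 
as a small stand-alone C++ sketch.  It is only a model of my understanding, 
not Tesseract's actual code: ResolveEngineMode is a hypothetical helper I 
made up, and the enum values are what I believe the 3.0x headers define, so 
please check them against your own checkout.

```cpp
#include <sstream>
#include <string>

// Mirrors the OcrEngineMode values as I read them in the 3.0x headers
// (assumption: verify against your own source tree).
enum OcrEngineMode {
  OEM_TESSERACT_ONLY = 0,
  OEM_CUBE_ONLY = 1,
  OEM_TESSERACT_CUBE_COMBINED = 2
};

// Hypothetical helper, not part of Tesseract.  Given the text of a
// language's unpacked .config file, return the engine mode that language
// would end up with.  The compiled-in default is kept unless the config
// contains a "tessedit_ocr_engine_mode <int>" line.
int ResolveEngineMode(const std::string& config_text, int compiled_default) {
  int mode = compiled_default;
  std::istringstream lines(config_text);
  std::string line;
  while (std::getline(lines, line)) {
    // Each config line is modelled as "<name> <value>".
    std::istringstream fields(line);
    std::string name;
    int value = 0;
    if (fields >> name >> value && name == "tessedit_ocr_engine_mode") {
      mode = value;  // the language config overrides the default
    }
  }
  return mode;
}
```

With the Arabic line above, ResolveEngineMode("tessedit_ocr_engine_mode 1\n", 
OEM_TESSERACT_ONLY) gives 1, while a config without that line leaves the 
default of OEM_TESSERACT_ONLY in place.  If the real behaviour differs from 
this model, that is exactly what I would like to know.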

Thanks again.

On Monday, July 15, 2013 1:23:07 PM UTC-4, shree wrote:
>
> You can unpack the traineddata file and take a look at the .config file in 
> it.
>
> E.g., in the case of hin.traineddata, the config file uses the combined 
> mode (Cube as well as Tesseract), which makes it very slow. I changed the 
> config value to use Tesseract only and recombined the file, and that 
> improved the speed.
>
> Please see 
> http://tesseract-ocr.googlecode.com/svn/trunk/doc/combine_tessdata.1.html
>
> Shree
>
> Shree Devi Kumar
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>  
>
> On Mon, Jul 15, 2013 at 9:12 PM, bear <[email protected]> wrote:
>
>> Thanks, Nick.  After poking through the source, it seems that one of my 
>> assumptions was incorrect: tesseract defaults to the OEM_TESSERACT_ONLY 
>> mode, so by default it will not try to infer the best mode to use for 
>> individual languages.
>>
>> tesseractclass.cpp:
>>
>> INT_INIT_MEMBER(tessedit_ocr_engine_mode, tesseract::OEM_TESSERACT_ONLY,
>>                     "Which OCR engine(s) to run (Tesseract, Cube, both)."
>>                     " Defaults to loading and running only Tesseract"
>>                     " (no Cube,no combiner)."
>>                     " Values from OcrEngineMode enum in tesseractclass.h)",
>>                this->params()),
>>
>> On Monday, July 15, 2013 10:38:00 AM UTC-4, Nick White wrote:
>>>
>>> Hi, 
>>>
>>> > I never set the tessedit_ocr_engine_mode configuration for tesseract, 
>>> > so I assume that it is using the default mode which, from my reading, 
>>> > will infer the best mode to use from the engine for the particular 
>>> > language. 
>>>
>>> You're right in your assumptions, it will use the default (non-cube) 
>>> mode unless you tell it otherwise. You're also correct that the 
>>> default mode is likely the best for Spanish. 
>>>
>>> > Finally, where can I set the tessedit_ocr_engine_mode?  I cannot find 
>>> > this in any documentation online.  Do I need to modify the source 
>>> > before compiling?  Is there a configuration file that I can modify or 
>>> > add? 
>>>
>>> It's a configuration variable, which you set the same way as any 
>>> other configuration variable. That is documented a little here: 
>>> http://code.google.com/p/tesseract-ocr/wiki/ControlParams
>>>  
>>>
>>> I'm afraid I can't help you with performance, as I have no knowledge 
>>> of android stuff. You might find it useful to look at the code of 
>>> Renard's excellent looking Text Fairy app for android: 
>>> https://github.com/renard314/textfairy
>>>  
>>>
>>> Nick 
>>>
>>  -- 
>> -- 
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>  
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
>>
>
>
