0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
3 = Fully automatic page segmentation, but no OSD. (Default)


See whether using OSD to detect the script helps you choose the correct 
traineddata.
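
As a rough sketch of that idea, the snippet below parses OSD output and maps the detected script to a language code. It assumes the OSD result arrives as "Key: value" lines with "Script" and "Script confidence" fields, as newer Tesseract versions print for page segmentation mode 0; older versions may format this differently. The script-to-language table, the function names, and the confidence threshold are all hypothetical and only for illustration.

```python
# Hypothetical mapping from OSD script names to traineddata language codes;
# adjust it to the traineddata files you actually have installed.
SCRIPT_TO_LANG = {
    "Latin": "eng",
    "Cyrillic": "rus",
    "Arabic": "ara",
    "Devanagari": "hin",
}


def parse_osd(osd_text):
    """Parse 'Key: value' lines of OSD output into a dict of strings."""
    fields = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


def choose_lang(osd_text, default="eng", min_confidence=1.0):
    """Pick a language code from the detected script, else return default.

    min_confidence is an arbitrary illustrative threshold, not a value
    recommended by Tesseract.
    """
    fields = parse_osd(osd_text)
    try:
        confidence = float(fields.get("Script confidence", "0"))
    except ValueError:
        confidence = 0.0
    if confidence < min_confidence:
        return default
    return SCRIPT_TO_LANG.get(fields.get("Script", ""), default)


# Sample text in the assumed format; not real output from any document.
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 15.33
Script: Cyrillic
Script confidence: 4.20
"""

print(choose_lang(SAMPLE_OSD))
```

The returned code could then be passed to tesseract's -l option for the actual OCR run. Falling back to a default when the script confidence is low avoids picking a wrong traineddata on pages with very little text.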


On Wednesday, November 19, 2014 12:12:07 AM UTC+5:30, Ryan Dev wrote:
>
> Thanks again.
>
>> You may get better results using appropriate language data rather than 
>> just the ASCII range. Are the client documents sorted by language?
>>
>
> I'm not sure how they have them organised; I just know they want an 
> "automatic" solution...
>  
>
>>
>> I am attaching the files I used. I had just copied some tables of the ASCII 
>> range; you can delete symbols and add multiple copies of letters that are needed.
>>
>>>
>>>
> I'm still getting up and running with training (I'm doing it on Linux, as 
> there appear to be more tools available that way). But I saw this comment 
> from zdenop:
>
> https://groups.google.com/forum/#!searchin/tesseract-ocr/train$20hall$20of$20fame/tesseract-ocr/tq2aHxxndpM/u5ldKIwUANIJ
> and it leads me to believe that training new data on the common fonts 
> (Arial, Georgia, Segoe, Garamond) will not give results any better than 
> what is already available?
>
> I have complete control over the image data I send to tesseract, so I 
> don't care about skewing, exposure, etc, as my glyphs will always be 
> straight, clear, and separated.
>
> For instance, I want to train for the ligatures ff, ffi, and ffl, which 
> are not in the English or ASCII traineddata and are missing even from 
> common fonts like Arial, but which my client files may contain.
>
> Should I train new eng or asc traineddata, or just create a new one for a 
> smaller set of glyphs like these?
>
> Thanks again for your help.
>
