[tesseract-ocr] Re: Tess4J training for new symbols.

Tom Morris Sun, 14 Feb 2016 11:00:06 -0800

I don't know if the Tess4J wrapper supports multiple languages, or which 
language you're using as a base language, but you might consider training 
that symbol, and any others you need, as an entirely separate language and 
then OCRing using the Tess4J equivalent of -l eng+mylang (or whatever your 
base language is).

There is the equation "language" with code equ, but it apparently doesn't 
include that style of division sign, which I was a little surprised at.  You 
might try -l eng+equ to start, though, and see whether the "wavy division 
sign" or some other symbol is close enough that it gets detected reliably 
in place of the plus sign, before going to the trouble of doing your own 
training.
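
As a rough sketch of what the Tess4J side of -l eng+equ might look like: the 
helper below only builds the "eng+equ" language string, and the Tess4J calls 
are left in comments because I haven't verified them against your setup. The 
class name LangSpec, the tessdata path, and the image file name are all 
made up for illustration, and it is an assumption that setLanguage accepts 
the same "+" syntax as the command line.

```java
import java.util.ArrayList;
import java.util.List;

public class LangSpec {
    // Joins a base language and any extra languages into the "a+b" form
    // that the tesseract command line accepts after -l.
    public static String join(String base, String... extras) {
        List<String> parts = new ArrayList<>();
        parts.add(base);
        for (String extra : extras) {
            // Skip empty entries so we never emit "eng+" or "eng++equ".
            if (extra != null && !extra.isEmpty()) {
                parts.add(extra);
            }
        }
        return String.join("+", parts);
    }

    public static void main(String[] args) {
        String lang = join("eng", "equ");
        System.out.println(lang);

        // With Tess4J on the classpath, the string would be passed along
        // these lines (assumption: untested, paths are hypothetical):
        //
        //   net.sourceforge.tess4j.Tesseract t = new net.sourceforge.tess4j.Tesseract();
        //   t.setDatapath("/path/to/tessdata");   // must contain eng and equ .traineddata
        //   t.setLanguage(lang);
        //   String text = t.doOCR(new java.io.File("page.png"));
    }
}
```

If you end up training your own language, the same call would take 
"eng+mylang" instead, with mylang.traineddata placed in the same tessdata 
directory.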

I've attached the characters which are included in the equ training.

Tom

On Saturday, February 13, 2016 at 5:13:46 PM UTC-5, Quan Nguyen wrote:
>
> You'd train Tesseract and then use the resultant .traineddata file with 
> Tess4J.
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
>
> On Saturday, February 13, 2016 at 7:38:24 AM UTC-6, Alex wrote:
>>
>> How would I go about training for new Unicode symbols for Tess4J? I need 
>> Tesseract to detect a division symbol (÷), but it detects it as a plus 
>> sign.
>>
>> Thanks.
>>
>


equtmp.unicharset
Description: Binary data
