Training Japanese for 3.0

Stane Sat, 18 Sep 2010 17:50:01 -0700

Hi folks,

I am trying to make my own jpn.traineddata for Tesseract 3.0. To
begin with, I am using just 10 different characters (kanji), each
repeated a few times and separated by spaces to make sure they
get boxed.


With tesseract I create the box file, then edit it with
pytesseracttrainer until every box is correct.
Next I run tesseract in training mode to get a .tr file. So far so
good; everything seems to be correct.
But when I run unicharset_extractor, I get a unicharset which
looks like this:
"10
NULL 0 NULL 0
亜 0 NULL 0
..."

Well, this doesn't look healthy to me. Is it supposed to look like
this, or did I do something wrong? Do I have to create the unicharset
for Japanese manually?
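
For anyone who wants to inspect such a file, here is a minimal Python
sketch. It assumes that the first line is the entry count and that each
following line holds four whitespace-separated fields. The field names
(character, properties, script, id) and the hex reading of the
properties value are my assumptions, not taken from the documentation.
The sample is a shortened version of the output above, with the count
reduced to 2.

```python
# Shortened sample based on the output quoted above (count reduced to 2).
SAMPLE = """2
NULL 0 NULL 0
亜 0 NULL 0
"""


def parse_unicharset(text):
    """Return (declared_count, entries) parsed from unicharset text."""
    lines = [line for line in text.splitlines() if line.strip()]
    declared = int(lines[0])
    entries = []
    for line in lines[1:]:
        # Assumed layout: character, properties, script, numeric id.
        char, props, script, other_id = line.split()
        entries.append({
            "char": char,
            "properties": int(props, 16),  # assumption: hex bitmask
            "script": script,
            "id": int(other_id),
        })
    return declared, entries


def unset_entries(entries):
    """List characters whose properties are 0 and whose script is NULL."""
    return [e["char"] for e in entries
            if e["properties"] == 0 and e["script"] == "NULL"]


declared, entries = parse_unicharset(SAMPLE)
print("declared:", declared, "parsed:", len(entries))
print("no properties/script set:", unset_entries(entries))
```

This only reports which entries have no properties or script set. It
does not tell whether that is an error for Japanese.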

Thanks for any help :-)
