Hi folks, I am trying to make my own jpn.traineddata for Tesseract 3.0. To begin with, I am using just 10 different characters (kanji), each repeated a few times and separated by spaces to make sure they get boxed.
I create the box file with tesseract and edit it with pytesseracttrainer until everything is correct. Next I run tesseract in training mode to get a .tr file. So far so good, and everything seems to be correct. But when I run unicharset_extractor, I get a unicharset that looks like this: "10 NULL 0 NULL 0 亜 0 NULL 0 ..." This doesn't look healthy to me. Is it supposed to look like this, and if not, what did I do wrong? Do I have to create the unicharset for Japanese manually? Thanks for any help :-)
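
One way to sanity-check this might be to count the distinct characters in the edited box file and compare that number with the count on the first line of the unicharset. Below is a minimal Python sketch, assuming the box file is UTF-8 text with one glyph per line (the character first, then its coordinates). The function name and the sample data are made up for illustration and are not part of Tesseract.

```python
from collections import Counter


def count_box_characters(box_text):
    """Count how often each character appears in box-file text.

    Assumes each non-empty line starts with the character, followed by
    whitespace-separated coordinates (hypothetical sample layout below).
    """
    counts = Counter()
    for line in box_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts[fields[0]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample: two distinct kanji, one of them boxed twice.
    sample = "亜 10 20 40 50 0\n唖 50 20 80 50 0\n亜 90 20 120 50 0\n"
    counts = count_box_characters(sample)
    print(len(counts), "distinct characters")
    for char, n in counts.items():
        print(char, n)
```

If the number of distinct characters here does not match what unicharset_extractor reports, the box file itself (for example its encoding) would be the first thing I would look at.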

