Try running tesstrain.sh with Korean training text. The box files it generates will show you the format that is used.
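Once you have a generated box file, a small script can tell you whether its symbols are whole syllables (like 가) or separate jamo (like ㄱ and ㅏ). The following is only a rough sketch, not part of Tesseract: the helper names (parse_box_line, hangul_kind, summarize) are made up for illustration, and it assumes each line has the "symbol left bottom right top page" layout shown in the English example quoted below.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse one box line, assumed to be: symbol left bottom right top page."""
    # Split from the right so the symbol itself may be more than one character.
    glyph, *nums = line.rstrip("\n").rsplit(" ", 5)
    left, bottom, right, top, page = map(int, nums)
    return Box(glyph, left, bottom, right, top, page)


def hangul_kind(glyph: str) -> str:
    """Classify a symbol as 'syllable', 'jamo', 'other', or 'mixed'."""
    kinds = set()
    for ch in glyph:
        cp = ord(ch)
        if 0xAC00 <= cp <= 0xD7A3:  # Hangul Syllables block (precomposed, e.g. 가)
            kinds.add("syllable")
        elif 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
            # Hangul Jamo and Hangul Compatibility Jamo blocks (e.g. ㄱ, ㅏ)
            kinds.add("jamo")
        else:
            kinds.add("other")
    if not kinds:
        return "other"
    return kinds.pop() if len(kinds) == 1 else "mixed"


def summarize(lines: Iterable[str]) -> Counter:
    """Count how many box entries fall into each kind, skipping blank lines."""
    return Counter(
        hangul_kind(parse_box_line(line).glyph)
        for line in lines
        if line.strip()
    )


if __name__ == "__main__":
    # The two labelling styles from the question, used here as sample input.
    sample = ["가 10 20 30 40 0", "ㄱ 10 20 30 40 0", "ㅏ 10 20 30 40 0"]
    print(summarize(sample))
```

To use it on real output, pass the lines of a generated box file to summarize() instead of the sample list. If nearly every entry comes back as "syllable", the generated files label whole letters rather than consonants and vowels.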

On Thu, Jul 19, 2018 at 12:06 PM Soumik Ranjan Dasgupta <[email protected]> wrote:

> 2) To check which fonts were used to generate the traineddata for your
> language, you can look at training/language-specific.sh and
> langdata/font_properties under your respective language code.
>
> If I'm not wrong, the language code for Korean is "kor".
>
> Check out the langdata/kor directory.
>
> On Thu, Jul 19, 2018, 11:59 AM nampyo hong <[email protected]> wrote:
>
>> Hello,
>>
>> I have two questions about training Tesseract 4.0.
>>
>> 1.
>> For English, I can find box files and training instructions,
>> such as
>> T 112 4663 140 4696 0
>> e 140 4662 160 4686 0
>> s 163 4662 179 4686 0
>> s 182 4661 198 4686 0
>> e 200 4661 220 4685 0
>> r 221 4662 238 4685 0
>> a 239 4661 260 4685 0
>> c 261 4661 281 4685 0
>> t 281 4661 296 4691 0
>>
>> But for Korean I cannot find a training example, and I'm not sure
>> whether to label by consonant and vowel or by "one" letter:
>>
>> 1) 가 10 20 30 40 0
>>
>> 2) ㄱ 10 20 30 40 0
>> ㅏ 10 20 30 40 0
>>
>> Which is the right way?
>>
>> 2. Is there a way to find out which fonts were already trained from a
>> .traineddata file?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/442b6ab3-45f0-4d73-910a-380f0fbea34f%40googlegroups.com
>> For more options, visit https://groups.google.com/d/optout.

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

