Hi Timothy, I have the same question as Jul. Would you kindly share one 'textline' box file and the image file you used with it? My assumption is that if I have an image containing a single text line, "Thanks", its box file would contain the following:
Thanks 10 10 500 30 0

Here, does the rectangle 10 10 500 30 enclose the whole text "Thanks"?

But what if my text line contains a space? Does this still work? For example, if the image contains the single text line "Thank you", would its box file look like this?

Thank you 10 10 800 30 0

where the rectangle 10 10 800 30 encloses the whole text "Thank you"?

I'm not sure whether my understanding is correct, so I would really appreciate it if you could share some examples or your experience with us. Thank you very much!

Li-Chung

On Friday, January 25, 2019 at 10:47:47 PM UTC+8, Timothy Snyder wrote:
>
> I have successfully trained Tesseract 4.0 using boxes that cover an entire
> line. I was similarly confused by the mismatch between the docs and that
> example. I haven't tested training with character-bounding boxes but I can
> confirm that textline boxes work fine.
>
> On Fri, Jan 25, 2019 at 5:56 AM Jul ius <[email protected]>
> wrote:
>
>> Hi,
>>
>> I'm interested in training tesseract 4 with real data. As the
>> documentation seems very poor and only covers training with font files, I
>> have a general question.
>>
>> On:
>> https://github.com/tesseract-ocr/tesseract/wiki/Making-Box-Files---4.0
>>
>> It says that the boxes need to cover the whole line in tesseract 4.
>>
>> When looking inside the linked box file I can clearly see that every box
>> covers a single character.
>>
>> Can anyone verify which layout for the boxes is right?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/1ab1e0b0-a70a-456b-ab58-2f240a3b479f%40googlegroups.com
>> For more options, visit https://groups.google.com/d/optout.
>
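To make the "Thank you" question concrete, here is a minimal Python sketch of one line-level layout, assuming the "WordStr" box format that Tesseract's box reader understands. This is an assumption on my part, not something confirmed in this thread, and Timothy's boxes may have used a different layout. In this layout the whole line text follows a `#`, so a space inside the line needs no special handling. Each line is followed by a tab-prefixed end-of-line entry. Coordinates are left, bottom, right, top, page, measured from the bottom-left corner of the image. `LineBox` and `to_wordstr_box` are hypothetical helpers, and reusing the line's own box for the end-of-line entry is an illustrative choice.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LineBox:
    """Hypothetical helper: one text line and its bounding box in box-file
    coordinates (origin at the bottom-left corner of the image)."""
    text: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0


def to_wordstr_box(lines: Iterable[LineBox]) -> str:
    """Render line-level boxes in the assumed WordStr layout:
    'WordStr <left> <bottom> <right> <top> <page> #<line text>'
    followed by a tab-prefixed end-of-line entry."""
    out = []
    for ln in lines:
        coords = f"{ln.left} {ln.bottom} {ln.right} {ln.top} {ln.page}"
        # Everything after '#' is the label, so spaces in the text are kept as-is.
        out.append(f"WordStr {coords} #{ln.text}")
        # End-of-line marker; reusing the line's box here is an assumption.
        out.append(f"\t {coords}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(to_wordstr_box([LineBox("Thank you", 10, 10, 800, 30)]), end="")
```

For the "Thank you" example this writes `WordStr 10 10 800 30 0 #Thank you` followed by a line starting with a tab and ` 10 10 800 30 0`. Whether your Tesseract version accepts exactly this end-of-line entry is worth checking against the training documentation for that release.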

