Hi, that would also be my next question. Don't we need anything like a separator? Some examples would be great. There is very little information on the internet, as Tesseract 4 is new.
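To make the separator question concrete, here is a minimal sketch of how a line in the format Li-Chung describes below (text, then four coordinates, then a page number) could be read without any extra separator, even when the text contains spaces. This only illustrates the question. It does not confirm that Tesseract's training tools accept this layout. The names `LineBox` and `parse_line_box` are hypothetical, and the field order left, bottom, right, top, page is an assumption.

```python
from typing import NamedTuple


class LineBox(NamedTuple):
    """One hypothetical 'textline' box entry (field order is an assumption)."""
    text: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_line_box(line: str) -> LineBox:
    # Split from the right: the last five fields are numbers,
    # so everything before them is the text, spaces included.
    text, *numbers = line.rstrip("\n").rsplit(" ", 5)
    left, bottom, right, top, page = (int(n) for n in numbers)
    return LineBox(text, left, bottom, right, top, page)


print(parse_line_box("Thank you 10 10 800 30 0"))
```

Because the five numeric fields are taken from the end of the line, a space inside the text is not ambiguous in this sketch. Whether the real tools read the file this way is still the open question.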
On Sunday, 27 January 2019 at 18:20:06 UTC+1, Li-Chung Chou wrote:
>
> Hi Timothy,
>
> I have the same question as Jul. Would you kindly share one 'textline' box file and the corresponding image file that you used? I assume that if I have one image containing one 'textline' reading "Thanks", then its corresponding box file will have the contents below:
>
> Thanks 10 10 500 30 0 //the 10 10 500 30 rectangle contains the whole "Thanks" text?
>
> But I was wondering: if my 'textline' has a space character in it, does it still work? For example, if I have an image containing one 'textline' reading "Thank you", will its box file look like this?
>
> Thank you 10 10 800 30 0 //the 10 10 800 30 rectangle contains the whole "Thank you" text?
>
> I'm not sure whether my understanding is correct. It would be highly appreciated if you could share some examples or experience with us. Thank you very much!
>
> Li-Chung
>
> On Friday, 25 January 2019 at 10:47:47 PM UTC+8, Timothy Snyder wrote:
>>
>> I have successfully trained Tesseract 4.0 using boxes that cover an entire line. I was similarly confused by the mismatch between the docs and that example. I haven't tested training with character-bounding boxes, but I can confirm that textline boxes work fine.
>>
>> On Fri, Jan 25, 2019 at 5:56 AM Jul ius <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm interested in training Tesseract 4 with real data. As the documentation seems very poor and only covers training with font files, I have a general question.
>>>
>>> On:
>>> https://github.com/tesseract-ocr/tesseract/wiki/Making-Box-Files---4.0
>>>
>>> It says that the boxes need to cover the whole line in Tesseract 4.
>>>
>>> When looking inside the linked box file, I can clearly see that every box covers a single character.
>>>
>>> Can anyone verify which layout for the boxes is right?
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.

