Hi Timothy,

I have the same question as Jul. Would you kindly share one 'textline' 
box file and the corresponding image file that you used? I assume that if 
I have one image containing one 'textline' reading "Thanks", then its 
corresponding box file will have the following contents:

Thanks 10 10 500 30 0  //the 10 10 500 30 rectangle contains the whole 
"Thanks" text?

But I was wondering: if my 'textline' has a space character in it, does 
this still work? For example, if I have an image containing one 'textline' 
reading "Thank you", will its box file look like this?

Thank you 10 10 800 30 0 //the 10 10 800 30 rectangle contains the whole 
"Thank you" text?
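
To make the question about spaces concrete, here is a minimal Python sketch 
that reads a line in the layout guessed above. It is only an illustration of 
that guess, not a confirmed Tesseract box format: the function name 
parse_textline_box is hypothetical, and the field names (left, bottom, right, 
top, page) are an assumption about what the five numbers mean. Because the 
last five fields are numeric, splitting from the right leaves any spaces in 
the text untouched.

```python
def parse_textline_box(line):
    """Parse '<text> <left> <bottom> <right> <top> <page>' (assumed layout)."""
    # Drop a trailing '//' note like the ones in the examples above.
    # (This sketch therefore cannot handle text that itself contains '//'.)
    line = line.split("//", 1)[0].strip()
    # Split off the last five whitespace-separated fields; whatever is
    # left over is the text, including any spaces inside it.
    text, *numbers = line.rsplit(None, 5)
    left, bottom, right, top, page = (int(n) for n in numbers)
    return text, (left, bottom, right, top), page


if __name__ == "__main__":
    print(parse_textline_box("Thanks 10 10 500 30 0"))
    print(parse_textline_box("Thank you 10 10 800 30 0 //whole line"))
```

Under this guessed layout, a space inside the text would not make the line 
ambiguous to read; whether Tesseract's own training tools accept such a line 
is exactly what I am asking.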

I'm not sure whether my understanding is correct. It would be highly 
appreciated if you could share some examples or experience with us. Thank 
you very much!

Li-Chung

On Friday, January 25, 2019 at 10:47:47 PM UTC+8, Timothy Snyder wrote:
>
> I have successfully trained Tesseract 4.0 using boxes that cover an entire 
> line. I was similarly confused by the mismatch between the docs and that 
> example. I haven't tested training with character-bounding boxes but I can 
> confirm that textline boxes work fine.
>
> On Fri, Jan 25, 2019 at 5:56 AM Jul ius <[email protected]> wrote:
>
>> Hi,
>>
>> I'm interested in training Tesseract 4 with real data. As the 
>> documentation seems very poor and only covers training with font files, I 
>> have a general question.
>>
>> On: 
>> https://github.com/tesseract-ocr/tesseract/wiki/Making-Box-Files---4.0
>>
>> It says that the boxes need to cover the whole line in tesseract 4. 
>>
>> When looking inside the linked box file I can clearly see that every box 
>> covers a single character.
>>
>> Can anyone verify which layout for the boxes is right?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/1ab1e0b0-a70a-456b-ab58-2f240a3b479f%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
