Still interested in an example of box files for Tesseract 4...

Doesn't anyone have an example for us? It would be great to see how we have 
to handle spaces in textlines.
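
To make the question concrete, here is a minimal Python sketch of the layout I would guess at for a line that contains a space. It assumes one entry per character (the space included), each repeating the rectangle of the whole line in left/bottom/right/top order, followed by a tab entry that closes the line. That layout, the tab marker, and the helper name `line_box_entries` are all assumptions for illustration; nobody in this thread has confirmed them, so please check against a box file that is known to train correctly.

```python
def line_box_entries(text, left, bottom, right, top, page=0):
    """Return box-file entries for a single text line.

    Assumed layout (NOT confirmed in this thread): every character,
    spaces included, gets one entry that repeats the rectangle of the
    whole line; a final tab entry marks the end of the line.
    """
    # One entry per character, all sharing the line's bounding box.
    entries = [f"{ch} {left} {bottom} {right} {top} {page}" for ch in text]
    # Hypothetical end-of-line marker: a tab with a tiny box just past the line.
    entries.append(f"\t {right} {top} {right + 1} {top + 1} {page}")
    return entries


if __name__ == "__main__":
    # Coordinates taken from the "Thank you" example quoted below.
    for entry in line_box_entries("Thank you", 10, 10, 800, 30):
        print(entry)
```

If this guess is right, the space in "Thank you" would just be one more entry whose symbol is a space character, so no extra separator would be needed inside the line.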



On Monday, 28 January 2019 at 15:01:49 UTC+1, Jul ius wrote:
>
> Hi,
>
> that would also be my next question. Don't we need something like a 
> separator? Some examples would be great. There is very little information 
> on the internet, as Tesseract 4 is new.
>
> On Sunday, 27 January 2019 at 18:20:06 UTC+1, Li-Chung Chou wrote:
>>
>> Hi Timothy,
>>
>> I have the same question as Jul. Would you kindly share one 'textline' 
>> box file and the corresponding image file that you used? I assume that if 
>> I have one image containing the single 'textline' "Thanks", then its 
>> corresponding box file will have the contents below:
>>
>> Thanks 10 10 500 30 0  //the 10 10 500 30 rectangle contains whole 
>> "Thanks" text?
>>
>> But I was wondering: if my 'textline' has a space character in it, does it 
>> still work? For example, if I have an image containing the single 'textline' 
>> "Thank you", will its box file look like this?
>>
>> Thank you 10 10 800 30 0 //the 10 10 800 30 rectangle contains whole 
>> "Thank you" text?
>>
>> I'm not sure whether my understanding is correct. It would be highly 
>> appreciated if you could share some examples or experience with us. Thank 
>> you very much!
>>
>> Li-Chung
>>
>> On Friday, 25 January 2019 at 22:47:47 UTC+8, Timothy Snyder wrote:
>>>
>>> I have successfully trained Tesseract 4.0 using boxes that cover an 
>>> entire line. I was similarly confused by the mismatch between the docs and 
>>> that example. I haven't tested training with character-bounding boxes, but I 
>>> can confirm that textline boxes work fine.
>>>
>>> On Fri, Jan 25, 2019 at 5:56 AM Jul ius <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm interested in training Tesseract 4 with real data. As the 
>>>> documentation seems very sparse and only covers training with font files, 
>>>> I have a general question.
>>>>
>>>> On: 
>>>> https://github.com/tesseract-ocr/tesseract/wiki/Making-Box-Files---4.0
>>>>
>>>> It says that the boxes need to cover the whole line in tesseract 4. 
>>>>
>>>> When looking inside the linked box file I can clearly see that every 
>>>> box covers a single character.
>>>>
>>>> Can anyone verify which layout for the boxes is right?
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/1ab1e0b0-a70a-456b-ab58-2f240a3b479f%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/1ab1e0b0-a70a-456b-ab58-2f240a3b479f%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>

