Correct me if I am wrong, but shouldn't each character be bounded by its own 
box? Try opening the box file in jTessBoxEditor 
(http://vietocr.sourceforge.net/training.html).
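
To see which characters share a box without opening an editor, here is a minimal Python sketch. It assumes the six-field layout shown in the quoted message (symbol, left, bottom, right, top, page) and that the symbol itself may be a space or a tab. The function names are made up for illustration and are not part of Tesseract or jTessBoxEditor.

```python
from itertools import groupby


def parse_box_line(line):
    """Split one box line into (symbol, (left, bottom, right, top), page).

    Splitting from the right keeps a symbol that is itself a space or tab.
    """
    symbol, *nums = line.rstrip("\r\n ").rsplit(" ", 5)
    left, bottom, right, top, page = map(int, nums)
    return symbol, (left, bottom, right, top), page


def shared_boxes(lines):
    """Return (text, box) for each run of 2+ consecutive symbols with the same box."""
    entries = [parse_box_line(line) for line in lines if line.strip()]
    runs = []
    for box, group in groupby(entries, key=lambda entry: entry[1]):
        symbols = [entry[0] for entry in group]
        if len(symbols) > 1:
            runs.append(("".join(symbols), box))
    return runs


# A few lines copied from the quoted box file, plus the tab line-break marker.
sample = [
    "D 1107 191 1167 209 0 ",
    "a 1107 191 1167 209 0 ",
    "t 1107 191 1167 209 0 ",
    "e 1107 191 1167 209 0 ",
    ": 1107 191 1167 209 0 ",
    "  1107 191 1167 209 0 ",
    "2 1202 192 1294 209 0 ",
    "0 1202 192 1294 209 0 ",
    "\t 1294 209 1295 210 0",
]

for text, box in shared_boxes(sample):
    print(repr(text), box)
```

Each reported run is a group of characters that all carry the same coordinates, which is the pattern in question above.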

On Thursday, August 23, 2018 at 12:33:07 PM UTC+1, [email protected] 
wrote:
>
> I want to train Tesseract 4 using images and ground-truth text. I have 
> generated the box file for a page in the format below.
>
>
> D 1107 191 1167 209 0 
> a 1107 191 1167 209 0 
> t 1107 191 1167 209 0 
> e 1107 191 1167 209 0 
> : 1107 191 1167 209 0 
>   1107 191 1167 209 0 
> 2 1202 192 1294 209 0 
> 0 1202 192 1294 209 0 
> 1 1202 192 1294 209 0 
> 8 1202 192 1294 209 0 
> - 1202 192 1294 209 0 
> 1 1202 192 1294 209 0 
> - 1202 192 1294 209 0 
> 3 1202 192 1294 209 0 
>  1294 209 1295 210 0 
> W 157 237 313 323 0 
> a 157 237 313 323 0 
> l 157 237 313 323 0 
>   157 237 313 323 0 
> m 321 256 402 322 0 
>   321 256 402 322 0 
> a 406 256 454 323 0 
>   406 256 454 323 0 
> r 460 237 525 323 0 
> t 460 237 525 323 0 
>   460 237 525 323 0 
> e 967 261 1041 280 0 
> - 967 261 1041 280 0 
> S 967 261 1041 280 0 
> D 967 261 1041 280 0 
> R 967 261 1041 280 0 
>   967 261 1041 280 0 
> s 1049 261 1113 281 0 
> e 1049 261 1113 281 0 
> r 1049 261 1113 281 0 
> i 1049 261 1113 281 0 
> a 1049 261 1113 281 0 
> l 1049 261 1113 281 0 
>   1049 261 1113 281 0 
> n 1123 267 1167 281 0 
> o 1123 267 1167 281 0 
> . 1123 267 1167 281 0 
> : 1123 267 1167 281 0 
>   1123 267 1167 281 0 
>   1203 263 1372 281 0 
> C 1203 263 1372 281 0 
> A 1203 263 1372 281 0 
> 1 1203 263 1372 281 0 
> 8 1203 263 1372 281 0 
> 0 1203 263 1372 281 0 
> 1 1203 263 1372 281 0 
> 0 1203 263 1372 281 0 
> 3 1203 263 1372 281 0 
> 0 1203 263 1372 281 0 
> 6 1203 263 1372 281 0 
> 2 1203 263 1372 281 0 
> 2 1203 263 1372 281 0 
> 3 1203 263 1372 281 0 
>  1372 281 1373 282 0
>
>
> where I gave every letter the coordinates of its whole word (as for 
> "Date:") and marked each line break with a tab character (*\t*).
>
> Here is an example of the tif and box files. The problem is that I get a 
> CTC compute failure, and I have the same issue when I generate the box 
> file with Tesseract itself.
>
>
> How do I make a valid box file for a page?
