I want to train Tesseract 4 using images and ground-truth text. I have 
generated the box file for a page in the format below.


D 1107 191 1167 209 0 
a 1107 191 1167 209 0 
t 1107 191 1167 209 0 
e 1107 191 1167 209 0 
: 1107 191 1167 209 0 
  1107 191 1167 209 0 
2 1202 192 1294 209 0 
0 1202 192 1294 209 0 
1 1202 192 1294 209 0 
8 1202 192 1294 209 0 
- 1202 192 1294 209 0 
1 1202 192 1294 209 0 
- 1202 192 1294 209 0 
3 1202 192 1294 209 0 
 1294 209 1295 210 0 
W 157 237 313 323 0 
a 157 237 313 323 0 
l 157 237 313 323 0 
  157 237 313 323 0 
m 321 256 402 322 0 
  321 256 402 322 0 
a 406 256 454 323 0 
  406 256 454 323 0 
r 460 237 525 323 0 
t 460 237 525 323 0 
  460 237 525 323 0 
e 967 261 1041 280 0 
- 967 261 1041 280 0 
S 967 261 1041 280 0 
D 967 261 1041 280 0 
R 967 261 1041 280 0 
  967 261 1041 280 0 
s 1049 261 1113 281 0 
e 1049 261 1113 281 0 
r 1049 261 1113 281 0 
i 1049 261 1113 281 0 
a 1049 261 1113 281 0 
l 1049 261 1113 281 0 
  1049 261 1113 281 0 
n 1123 267 1167 281 0 
o 1123 267 1167 281 0 
. 1123 267 1167 281 0 
: 1123 267 1167 281 0 
  1123 267 1167 281 0 
  1203 263 1372 281 0 
C 1203 263 1372 281 0 
A 1203 263 1372 281 0 
1 1203 263 1372 281 0 
8 1203 263 1372 281 0 
0 1203 263 1372 281 0 
1 1203 263 1372 281 0 
0 1203 263 1372 281 0 
3 1203 263 1372 281 0 
0 1203 263 1372 281 0 
6 1203 263 1372 281 0 
2 1203 263 1372 281 0 
2 1203 263 1372 281 0 
3 1203 263 1372 281 0 
 1372 281 1373 282 0


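Here I gave every letter the coordinates of the word it belongs to (for 
example, all the letters of "Date:" share one box), and I marked the end of 
each text line with a tab character (*\t*).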
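
To make the layout above concrete, here is a minimal Python sketch of how such lines could be produced. The function names and the input tuple layout are only illustrative (they are not part of Tesseract), and I am assuming the usual box-file convention of `left bottom right top page` with the origin at the bottom-left of the image. The space between words reuses the preceding word's box, and the tab line is placed one pixel past the last word's top-right corner, as in the sample above.

```python
def word_to_box_lines(text, left, bottom, right, top, page=0):
    """One box line per character; every character shares the word's box."""
    return [f"{ch} {left} {bottom} {right} {top} {page}" for ch in text]


def line_to_box_lines(words, page=0):
    """Build box lines for one text line.

    words: list of (text, left, bottom, right, top) tuples, in reading order.
    This input layout is an assumption made for the sketch.
    """
    if not words:
        return []
    out = []
    for i, (text, left, bottom, right, top) in enumerate(words):
        out.extend(word_to_box_lines(text, left, bottom, right, top, page))
        if i < len(words) - 1:
            # Space between words: the space character itself, then the
            # preceding word's coordinates.
            out.append(f"  {left} {bottom} {right} {top} {page}")
    # End-of-line marker: a tab character with a 1x1 box at the top-right
    # corner of the last word.
    _, _, _, last_right, last_top = words[-1]
    out.append(f"\t {last_right} {last_top} {last_right + 1} {last_top + 1} {page}")
    return out


if __name__ == "__main__":
    first_line = [
        ("Date:", 1107, 191, 1167, 209),
        ("2018-1-3", 1202, 192, 1294, 209),
    ]
    print("\n".join(line_to_box_lines(first_line)))
```

Run on the first text line of my page, this reproduces the first 15 lines of the box file shown above.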

Attached is an example TIFF image with its box file. The problem is that 
training fails with a CTC compute failure, and I get the same issue when I 
generate the box file with Tesseract itself.


How do I make a valid box file for a page?



 


<<attachment: train_sample.zip>>
