I want to train Tesseract 4 using images and ground-truth text. I have generated the box file for a page in the format below.
D 1107 191 1167 209 0
a 1107 191 1167 209 0
t 1107 191 1167 209 0
e 1107 191 1167 209 0
: 1107 191 1167 209 0
  1107 191 1167 209 0
2 1202 192 1294 209 0
0 1202 192 1294 209 0
1 1202 192 1294 209 0
8 1202 192 1294 209 0
- 1202 192 1294 209 0
1 1202 192 1294 209 0
- 1202 192 1294 209 0
3 1202 192 1294 209 0
  1294 209 1295 210 0
W 157 237 313 323 0
a 157 237 313 323 0
l 157 237 313 323 0
  157 237 313 323 0
m 321 256 402 322 0
  321 256 402 322 0
a 406 256 454 323 0
  406 256 454 323 0
r 460 237 525 323 0
t 460 237 525 323 0
  460 237 525 323 0
e 967 261 1041 280 0
- 967 261 1041 280 0
S 967 261 1041 280 0
D 967 261 1041 280 0
R 967 261 1041 280 0
  967 261 1041 280 0
s 1049 261 1113 281 0
e 1049 261 1113 281 0
r 1049 261 1113 281 0
i 1049 261 1113 281 0
a 1049 261 1113 281 0
l 1049 261 1113 281 0
  1049 261 1113 281 0
n 1123 267 1167 281 0
o 1123 267 1167 281 0
. 1123 267 1167 281 0
: 1123 267 1167 281 0
  1123 267 1167 281 0
  1203 263 1372 281 0
C 1203 263 1372 281 0
A 1203 263 1372 281 0
1 1203 263 1372 281 0
8 1203 263 1372 281 0
0 1203 263 1372 281 0
1 1203 263 1372 281 0
0 1203 263 1372 281 0
3 1203 263 1372 281 0
0 1203 263 1372 281 0
6 1203 263 1372 281 0
2 1203 263 1372 281 0
2 1203 263 1372 281 0
3 1203 263 1372 281 0
  1372 281 1373 282 0

Here I gave every letter the coordinates of the word it belongs to (for example, every letter of "Date:" carries the same box), and I break each text line with a *\t* entry. The entries with no visible character are the space entries, and the last entry of each text line is the *\t* entry.

Here is an example of a tif and box file.

The problem is that I get a CTC compute failure, and I have the same issue when I generate the box file with Tesseract itself. How do I make a valid box file for a whole page?
<<attachment: train_sample.zip>>
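
To make the format described above concrete, here is a minimal Python sketch of how such entries could be generated from word boxes. It is only an illustration, not part of Tesseract and not a known fix for the CTC failure. The function name `line_to_box_entries` and the `(text, left, bottom, right, top)` tuple layout are hypothetical. It assumes that space and tab entries are written as the character itself followed by a single separating space, and that the end-of-line entry is a 1x1 box at the right/top corner of the last word, as in the sample.

```python
from typing import List, Tuple

# Hypothetical input layout: (text, left, bottom, right, top) for one word.
Word = Tuple[str, int, int, int, int]


def line_to_box_entries(words: List[Word], page: int = 0) -> List[str]:
    """Build box entries for one text line, repeating each word's box per character."""
    if not words:
        return []
    entries = []
    for index, (text, left, bottom, right, top) in enumerate(words):
        coords = f"{left} {bottom} {right} {top} {page}"
        for char in text:
            entries.append(f"{char} {coords}")
        if index < len(words) - 1:
            # Space entry between words, reusing the preceding word's box.
            # Assumption: the glyph is a literal space, then a separating space.
            entries.append(f"  {coords}")
    # End-of-line entry: a tab glyph with a 1x1 box at the last word's
    # right/top corner (assumption based on the sample in this post).
    _, _, _, right, top = words[-1]
    entries.append(f"\t {right} {top} {right + 1} {top + 1} {page}")
    return entries


# First text line of the sample page.
sample = [
    ("Date:", 1107, 191, 1167, 209),
    ("2018-1-3", 1202, 192, 1294, 209),
]
entries = line_to_box_entries(sample)
print("\n".join(entries))
```

Joining the entries of all text lines with newlines would give the contents of one page's box file under these assumptions.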

