Dear all,

I've been training Tesseract with a multipage TIFF file of 5 pages and
approximately 12000 boxes.

I have now increased the number of samples in the TIFF file: it has 12
pages and 29241 boxes.

My concern is that the previous traineddata file is 321817 bytes and the
new one is 318022 bytes, so it got slightly smaller despite having more
than twice as many boxes. I don't know whether it should be bigger, as I
have no idea about the file format, but I downloaded one version of
eng.traineddata from the Tesseract repository and its size is 21876572
bytes. Could it be that it is only using the results of the first page?
I can see in the log that, at least at the beginning of the process, it
goes through all the pages.
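
One way to check whether every page contributed is to tally the per-page
APPLY_BOXES summaries that the box.train step prints. The following is a
minimal sketch in Python, not part of my training setup: the function name
tally_apply_boxes and the SAMPLE excerpt (pages 2 and 3 of the log pasted
further down) are only for illustration, and it assumes the log has the
same layout as mine.

```python
import re

# Patterns for the per-page summary printed by "tesseract ... box.train.stderr".
PAGE_RE = re.compile(r"^Page (\d+) of (\d+)", re.MULTILINE)
READ_RE = re.compile(r"Boxes read from boxfile:\s+(\d+)")
FAIL_RE = re.compile(r"Boxes failed resegmentation:\s+(\d+)")
GOOD_RE = re.compile(r"Found (\d+) good blobs")


def tally_apply_boxes(log_text):
    """Return a list of (page, boxes_read, boxes_failed, good_blobs) tuples."""
    rows = []
    headers = list(PAGE_RE.finditer(log_text))
    for i, header in enumerate(headers):
        # A page's chunk runs from its header to the next header (or the end).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        chunk = log_text[header.end():end]
        read = READ_RE.search(chunk)
        fail = FAIL_RE.search(chunk)
        good = GOOD_RE.search(chunk)
        rows.append((
            int(header.group(1)),
            int(read.group(1)) if read else 0,
            int(fail.group(1)) if fail else 0,  # line is absent when nothing failed
            int(good.group(1)) if good else 0,
        ))
    return rows


# Excerpt of the log (pages 2 and 3 only), used here as sample input.
SAMPLE = """\
Page 2 of 12
APPLY_BOXES:
   Boxes read from boxfile:    2003
   Found 2003 good blobs.
Generated training data for 58 words
Page 3 of 12
FAIL!
APPLY_BOXES: boxfile line 358/0 ((383,4901),(428,4980)): FAILURE!
Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    2128
   Boxes failed resegmentation:       2
   Found 2126 good blobs.
Generated training data for 60 words
"""

rows = tally_apply_boxes(SAMPLE)
for page, read, failed, good in rows:
    print(f"page {page}: read={read} failed={failed} good={good}")
print("total read:", sum(r[1] for r in rows))
```

Adding up the twelve "Boxes read from boxfile" counts in the full log by
hand gives 29241, which matches the number of boxes in the box file, with
87 boxes failing resegmentation (76 of them on page 12).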

I am using Tesseract 3.02 on Windows.

I am pasting my log below, followed by the batch file that I use for
training.

Log:

A:\training>tesseract.exe patentesar.normal.exp0.tif patentesar.normal.exp0 nobatch box.train.stderr
Tesseract Open Source OCR Engine v3.02 with Leptonica
Page 1 of 12
row xheight=88.6667, but median xheight = 59.6
row xheight=81.8333, but median xheight = 59.6
row xheight=75, but median xheight = 59.6
row xheight=71.1875, but median xheight = 59.6
row xheight=71.1875, but median xheight = 59.6
row xheight=71.1875, but median xheight = 59.6
row xheight=68.5333, but median xheight = 59.6
row xheight=67.3333, but median xheight = 59.6
APPLY_BOXES:
   Boxes read from boxfile:    1671
   Found 1671 good blobs.
TRAINING ... Font name = normal
Generated training data for 52 words
Page 2 of 12
APPLY_BOXES:
   Boxes read from boxfile:    2003
   Found 2003 good blobs.
Generated training data for 58 words
Page 3 of 12
FAIL!
APPLY_BOXES: boxfile line 358/0 ((383,4901),(428,4980)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 529/D ((146,4401),(187,4480)): FAILURE!
Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    2128
   Boxes failed resegmentation:       2
   Found 2126 good blobs.
Generated training data for 60 words
Page 4 of 12
APPLY_BOXES:
   Boxes read from boxfile:    2257
   Found 2257 good blobs.
Generated training data for 62 words
Page 5 of 12
APPLY_BOXES:
   Boxes read from boxfile:    2381
   Found 2381 good blobs.
Generated training data for 64 words
Page 6 of 12
FAIL!
APPLY_BOXES: boxfile line 2070/D ((2141,967),(2182,1037)): FAILURE!
Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    2460
   Boxes failed resegmentation:       1
   Found 2459 good blobs.
Generated training data for 65 words
Page 7 of 12
FAIL!
APPLY_BOXES: boxfile line 2082/B ((867,1084),(910,1151)): FAILURE!
Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    2568
   Boxes failed resegmentation:       1
   Found 2567 good blobs.
Generated training data for 67 words
Page 8 of 12
APPLY_BOXES:
   Boxes read from boxfile:    2680
   Found 2680 good blobs.
Generated training data for 68 words
Page 9 of 12
FAIL!
APPLY_BOXES: boxfile line 2391/D ((1184,910),(1220,973)): FAILURE!
Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    2818
   Boxes failed resegmentation:       1
   Found 2817 good blobs.
Generated training data for 70 words
Page 10 of 12
FAIL!
APPLY_BOXES: boxfile line 1248/0 ((1468,3440),(1502,3501)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 2211/0 ((342,1491),(382,1550)): FAILURE!
Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    3000
   Boxes failed resegmentation:       2
   Found 2998 good blobs.
Generated training data for 73 words
Page 11 of 12
FAIL!
APPLY_BOXES: boxfile line 1280/6 ((2054,3645),(2087,3702)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 2750/0 ((496,1051),(528,1105)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 3098/D ((2229,530),(2254,583)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 3347/Q ((1167,90),(1197,142)): FAILURE!
Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    3370
   Boxes failed resegmentation:       4
   Found 3366 good blobs.
Generated training data for 77 words
Page 12 of 12
row xheight=28.6667, but median xheight = 33.5161
row xheight=28.0889, but median xheight = 33.5161
row xheight=27.1, but median xheight = 33.5161
row xheight=29, but median xheight = 33.5161
row xheight=29, but median xheight = 33.5161
row xheight=29, but median xheight = 33.5161
FAIL!
APPLY_BOXES: boxfile line 0/P ((20,5928),(52,5980)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1/7 ((73,5928),(89,5980)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 2/4 ((110,5928),(141,5980)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 3/1 ((162,5928),(189,5980)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 44/M ((20,5855),(48,5907)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 45/M ((69,5855),(96,5907)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 46/B ((117,5855),(148,5907)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 47/O ((169,5855),(198,5907)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 90/D ((20,5783),(50,5834)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 91/P ((71,5783),(102,5834)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 92/O ((123,5783),(148,5834)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 93/N ((169,5783),(202,5834)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 136/6 ((20,5711),(46,5762)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 137/P ((67,5711),(103,5762)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 138/X ((124,5711),(146,5762)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 139/M ((167,5711),(190,5762)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 183/M ((20,5639),(51,5690)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 184/1 ((72,5639),(92,5690)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 185/G ((113,5639),(144,5690)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 186/6 ((165,5639),(189,5690)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 229/1 ((20,5567),(44,5618)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 230/T ((65,5567),(89,5618)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 231/N ((110,5567),(141,5618)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 232/O ((162,5567),(196,5618)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 276/T ((20,5496),(44,5546)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 277/F ((65,5496),(91,5546)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 278/G ((112,5496),(140,5546)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 279/5 ((161,5496),(191,5546)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 322/8 ((20,5425),(45,5475)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 323/W ((66,5425),(94,5475)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 324/R ((115,5425),(145,5475)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 325/G ((166,5425),(192,5475)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 370/W ((20,5354),(52,5404)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 371/0 ((73,5354),(102,5404)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 372/G ((123,5354),(155,5404)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 373/H ((176,5354),(201,5404)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 416/2 ((20,5283),(43,5333)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 417/I ((64,5283),(89,5333)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 418/1 ((110,5283),(137,5333)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 419/D ((158,5283),(186,5333)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 463/I ((20,5212),(45,5262)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 464/Q ((66,5212),(92,5262)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 465/K ((113,5212),(144,5262)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 466/E ((165,5212),(186,5262)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 511/G ((20,5142),(48,5191)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 512/Q ((69,5142),(97,5191)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 513/T ((118,5142),(140,5191)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 514/D ((161,5142),(189,5191)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 517/D ((305,5142),(328,5191)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 558/M ((20,5072),(45,5121)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 559/E ((66,5072),(95,5121)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 560/E ((116,5072),(140,5121)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 561/H ((161,5072),(191,5121)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 606/5 ((20,5002),(51,5051)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 607/I ((72,5002),(102,5051)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 608/M ((123,5002),(149,5051)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 609/I ((170,5002),(192,5051)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 653/0 ((20,4932),(50,4981)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 654/0 ((71,4932),(102,4981)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 655/O ((123,4932),(151,4981)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 656/8 ((172,4932),(199,4981)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 700/0 ((20,4862),(49,4911)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 701/W ((70,4862),(93,4911)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 702/0 ((114,4862),(144,4911)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 703/G ((165,4862),(193,4911)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 747/M ((20,4793),(51,4841)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 748/T ((72,4793),(94,4841)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 749/0 ((115,4793),(150,4841)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 750/R ((171,4793),(198,4841)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 795/C ((20,4724),(46,4772)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 796/7 ((67,4724),(96,4772)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 797/1 ((117,4724),(147,4772)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 843/H ((20,4655),(47,4703)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 844/8 ((68,4655),(95,4703)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1903/0 ((1824,3398),(1823,3397)): FAILURE!
Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 1904/0 ((1844,3398),(1843,3397)): FAILURE!
Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:    1905
   Boxes failed resegmentation:      76
   Found 1829 good blobs.
Generated training data for 48 words

A:\training>unicharset_extractor patentesar.normal.exp0.box
Extracting unicharset from patentesar.normal.exp0.box
Wrote unicharset file ./unicharset.
Press any key to continue . . .

A:\training>mftraining -F font_properties -U unicharset patentesar.normal.exp0.tr
Read shape table shapetable of 36 shapes
Reading patentesar.normal.exp0.tr ...
Warning: no protos/configs for g in CreateIntTemplates()
Done!

A:\training>mftraining -F font_properties -U unicharset -O patentesar.normal.exp0.unicharset patentesar.normal.exp0.tr
Read shape table shapetable of 36 shapes
Reading patentesar.normal.exp0.tr ...
Warning: no protos/configs for g in CreateIntTemplates()
Done!
Press any key to continue . . .

A:\training>cntraining patentesar.normal.exp0.tr
Reading patentesar.normal.exp0.tr ...
Clustering ...

Writing normproto ...
Press any key to continue . . .

A:\training>wordlist2dawg frequent_words_list patentesar.freq-dawg unicharset
Loading unicharset from 'unicharset'
Reading word list from 'frequent_words_list'
Reducing Trie to SquishedDawg
Writing squished DAWG to 'patentesar.freq-dawg'
Press any key to continue . . .

A:\training>wordlist2dawg words_list patentesar.word-dawg unicharset
Loading unicharset from 'unicharset'
Reading word list from 'words_list'
Reducing Trie to SquishedDawg
Writing squished DAWG to 'patentesar.word-dawg'
Press any key to continue . . .

A:\training>copy /Y normproto patentesar.normal.exp0.normproto
        1 file(s) copied.

A:\training>copy /Y inttemp patentesar.normal.exp0.inttemp
        1 file(s) copied.

A:\training>copy /Y pffmtable patentesar.normal.exp0.pffmtable
        1 file(s) copied.

A:\training>copy /Y Microfeat patentesar.normal.exp0.Microfeat
The system cannot find the file specified.

A:\training>copy /Y shapetable patentesar.normal.exp0.shapetable
        1 file(s) copied.

A:\training>copy /Y unicharset patentesar.normal.exp0
        1 file(s) copied.

A:\training>copy /Y patentesar.normal.exp0.unicharset patentesar.normal.exp0
        1 file(s) copied.

A:\training>move /Y patentesar.normal.exp0.normproto tessdata
        1 file(s) moved.

A:\training>move /Y patentesar.normal.exp0.inttemp tessdata
        1 file(s) moved.

A:\training>move /Y patentesar.normal.exp0.pffmtable tessdata
        1 file(s) moved.

A:\training>move /Y patentesar.normal.exp0.Microfeat tessdata
The system cannot find the file specified.

A:\training>move /Y patentesar.normal.exp0.shapetable tessdata
        1 file(s) moved.

A:\training>move /Y unicharset tessdata
        1 file(s) moved.

A:\training>move /Y patentesar.normal.exp0.unicharset tessdata
        1 file(s) moved.
Press any key to continue . . .

A:\training>combine_tessdata tessdata/patentesar.normal.exp0.
Combining tessdata files
TessdataManager combined tesseract data files.
Offset for type 0 is -1
Offset for type 1 is 140
Offset for type 2 is -1
Offset for type 3 is 2559
Offset for type 4 is 309717
Offset for type 5 is 309988
Offset for type 6 is -1
Offset for type 7 is -1
Offset for type 8 is -1
Offset for type 9 is -1
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
Offset for type 13 is 317370
Offset for type 14 is -1
Offset for type 15 is -1
Offset for type 16 is -1
Press any key to continue . . .
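
The offsets printed by combine_tessdata give a rough idea of where the
bytes in the combined file go. The following is a minimal sketch in
Python under two assumptions that the log does not confirm: that the
entries are stored back to back in offset order, with the last one
running to the end of the file, and that this run produced the
318022-byte file mentioned at the top. The function name component_sizes
is only for illustration, and the entries are identified by their type
index because the log does not name them.

```python
def component_sizes(offsets, file_size):
    """Map each present entry index to its size in bytes.

    Assumes entries are laid out back to back in offset order and that
    the last one runs to the end of the file.
    """
    present = sorted((off, idx) for idx, off in offsets.items() if off >= 0)
    sizes = {}
    for pos, (off, idx) in enumerate(present):
        next_off = present[pos + 1][0] if pos + 1 < len(present) else file_size
        sizes[idx] = next_off - off
    return sizes


# Offsets copied from the combine_tessdata output; -1 means "not present".
OFFSETS = {i: -1 for i in range(17)}
OFFSETS.update({1: 140, 3: 2559, 4: 309717, 5: 309988, 13: 317370})

# Assumption: this run produced the 318022-byte traineddata file.
FILE_SIZE = 318022

sizes = component_sizes(OFFSETS, FILE_SIZE)
for idx, size in sizes.items():
    print(f"type {idx:2d}: {size:7d} bytes ({100 * size / FILE_SIZE:.1f}%)")
```

Under those assumptions, the type 3 entry accounts for 307158 of the
318022 bytes, and only types 1, 3, 4, 5 and 13 are present in the file.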

Batch file:

@rem #############################
@call set_environment.cmd
@SET PATH="%TESSDATA_PREFIX%";%PATH%

tesseract.exe patentesar.normal.exp0.tif patentesar.normal.exp0 nobatch box.train.stderr
@pause

unicharset_extractor patentesar.normal.exp0.box
@pause

mftraining -F font_properties -U unicharset patentesar.normal.exp0.tr
mftraining -F font_properties -U unicharset -O patentesar.normal.exp0.unicharset patentesar.normal.exp0.tr
@pause

cntraining patentesar.normal.exp0.tr
@pause

wordlist2dawg frequent_words_list patentesar.freq-dawg unicharset
@pause

wordlist2dawg words_list patentesar.word-dawg unicharset
@pause

copy /Y normproto patentesar.normal.exp0.normproto
copy /Y inttemp patentesar.normal.exp0.inttemp
copy /Y pffmtable patentesar.normal.exp0.pffmtable
copy /Y Microfeat patentesar.normal.exp0.Microfeat
copy /Y shapetable patentesar.normal.exp0.shapetable

copy /Y unicharset patentesar.normal.exp0
copy /Y patentesar.normal.exp0.unicharset patentesar.normal.exp0

move /Y patentesar.normal.exp0.normproto tessdata
move /Y patentesar.normal.exp0.inttemp tessdata
move /Y patentesar.normal.exp0.pffmtable tessdata
move /Y patentesar.normal.exp0.Microfeat tessdata
move /Y patentesar.normal.exp0.shapetable tessdata

move /Y unicharset tessdata
move /Y patentesar.normal.exp0.unicharset tessdata

@pause
combine_tessdata tessdata/patentesar.normal.exp0.
@pause
copy tessdata\patentesar.normal.exp0.traineddata "%TESSDATA_PREFIX%\tessdata"
@pause
tesseract patentesar.normal.exp0.tif output -l patentesar.normal.exp0
type output.txt



Best regards and thank you,

Andres
