I'm trying to train Tesseract 3.02 on the attached files, following the instructions at http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 . Although I can complete the training process without errors, I can't get Tesseract to work with the resulting traineddata file. I always receive this error:
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file adaptmatch.cpp, line 555

I have attached the .box, .tif, and font_properties files I used for training. (Although the training instructions say to add .exp? after the font name in the font_properties file, when I use ocr.exp0 as the font name in that file the shape clustering then fails.)

The following is the process I use to produce the training file:

./tesseract eng.icr.exp0.tif eng.icr.exp0 nobatch box.train.stderr
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
APPLY_BOXES:
Boxes read from boxfile: 315
Found 315 good blobs.
Leaving 26 unlabelled blobs in 0 words.
TRAINING ... Font name = icr
Generated training data for 18 words

./unicharset_extractor eng.icr.exp0.box

./shapeclustering -F font_properties -U unicharset eng.icr.exp0.tr
Reading eng.icr.exp0.tr ...
Building master shape table
Computing shape distances...
Stopped with 0 merged, min dist 999.000000
Computing shape distances... 0
Stopped with 0 merged, min dist 999.000000
[the same "Computing shape distances... / Stopped with 0 merged, min dist 999.000000" output repeats many more times]
Computing shape distances... 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
Distance = 0.007463: Stopped with 1 merged, min dist 0.101266
Master shape_table:Number of shapes = 36 max unichars = 2 number with multiple unichars = 1

./cntraining eng.icr.exp0.tr
Reading eng.icr.exp0.tr ...
Clustering ...
Writing normproto ...

mv unichartset icr.unicharset
mv shapetable icr.shapetable
mv normproto icr.normproto
mv pffmtable icr.pffmtable
mv inttemp icr.inttemp

./combine_tessdata icr.
TessdataManager combined tesseract data files.
Offset for type 0 is -1
Offset for type 1 is 140
Offset for type 2 is -1
Offset for type 3 is -1
Offset for type 4 is -1
Offset for type 5 is 2528
Offset for type 6 is -1
Offset for type 7 is -1
Offset for type 8 is -1
Offset for type 9 is -1
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
Offset for type 13 is 7841
Offset for type 14 is -1
Offset for type 15 is -1
Offset for type 16 is -1
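
To make the last two steps easier to check, here is a minimal Python sketch (my own helper, not part of the Tesseract tools). It looks for the five files named in the mv commands and lists which type numbers in the combine_tessdata output have a real offset. Two assumptions are built in: an offset of -1 is taken to mean that nothing was packed for that type, and no attempt is made to map type numbers to component names. The function names are hypothetical.

```python
import re
from pathlib import Path

# Suffixes taken from the mv commands in the post; "icr" is the prefix used there.
COMPONENT_SUFFIXES = ["unicharset", "shapetable", "normproto", "pffmtable", "inttemp"]

# Matches lines such as "Offset for type 13 is 7841" or "Offset for type 3 is -1".
OFFSET_LINE = re.compile(r"Offset for type\s+(\d+)\s+is\s+(-?\d+)")


def missing_components(prefix, directory="."):
    """Return the suffixes for which <prefix>.<suffix> is not a file in directory."""
    base = Path(directory)
    return [s for s in COMPONENT_SUFFIXES if not (base / f"{prefix}.{s}").is_file()]


def parse_offsets(output):
    """Map each type number to the offset printed for it."""
    return {int(t): int(o) for t, o in OFFSET_LINE.findall(output)}


def present_types(offsets):
    """Return the type numbers whose offset is not -1.

    Assumption: -1 means no data was packed for that type.
    """
    return sorted(t for t, o in offsets.items() if o != -1)


if __name__ == "__main__":
    sample = (
        "Offset for type 0 is -1\n"
        "Offset for type 1 is 140\n"
        "Offset for type 5 is 2528\n"
        "Offset for type 13 is 7841\n"
    )
    print("types with data:", present_types(parse_offsets(sample)))
    print("files not found:", missing_components("icr"))
```

Run from the training directory just before ./combine_tessdata icr. to see which of the five renamed files are actually there, then paste the combine_tessdata output into parse_offsets to compare.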

