Zdenko,

Please download the zip file from https://docs.google.com/file/d/0BwCwgbxF9x6pYm9oUnkyaHMyODA/edit

It contains the separate tr files as well as the combined tr file. I have included fewer files than in the earlier test, but I got the same error with these.
Let me know if you need the Box/Tif pairs also.

Thanks!

On Thursday, April 18, 2013 11:46:07 PM UTC+5:30, zdenop wrote:
> post somewhere your files, so we can test it on linux...
>
> Zdenko
>
> On Thu, Apr 18, 2013 at 6:15 AM, Shree Devi Kumar <[email protected]> wrote:
>
>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 says:
>>
>>> An alternative to multi-page tiffs is to create many single-page tiffs
>>> for a single font, and then you must cat together the tr files for each
>>> font into several single-font tr files. In any case, the input tr files to
>>> mftraining must each contain a single font.
>>
>> I tried to concatenate the multiple tr files for multiple images, all in
>> the same font, to create a single tr file for one font. This is on Windows
>> 7 and I used the copy command as follows:
>>
>>> copy san.sanskrit2003.exp0001.tr + san.sanskrit2003.exp007.tr +
>>> san.sanskrit2003.exp012.tr + san.sanskrit2003.exp000.tr +
>>> san.sanskrit2003.exp001.tr + san.sanskrit2003.exp002.tr +
>>> san.sanskrit2003.exp003.tr + san.sanskrit2003.exp004.tr +
>>> san.sanskrit2003.exp005.tr + san.sanskrit2003.exp006.tr +
>>> san.sanskrit2003.exp008.tr + san.sanskrit2003.exp009.tr +
>>> san.sanskrit2003.exp010.tr + san.sanskrit2003.exp011.tr +
>>> san.sanskrit2003.exp013.tr + san.sanskrit2003.exp014.tr +
>>> san.sanskrit2003.exp015.tr + san.sanskrit2003.exp016.tr +
>>> san.sanskrit2003.exp017.tr san.sanskrit2003.tr
>>
>>> copy san.sanskrit2003b.exp020.tr + san.sanskrit2003b.exp021.tr +
>>> san.sanskrit2003b.exp022.tr + san.sanskrit2003b.exp023.tr
>>> san.sanskrit2003b.tr
>>
>>> copy san.unknown.exp00000001.tr san.unknown.tr
>>
>> This created 3 tr files and I ran shapeclustering with the same, but got
>> the following error:
>>
>>> shapeclustering -F san.font_properties -U unicharset san.sanskrit2003.tr
>>> san.sanskrit2003b.tr san.unknown.tr
>>
>>> Reading san.sanskrit2003.tr ...
>>> Bad format in tr file, reading fontname, unichar
>>> Reading san.sanskrit2003b.tr ...
>>> Bad format in tr file, reading fontname, unichar
>>> Reading san.unknown.tr ...
>>> Testing feature weight 1:(40,56):32
>>> Total miss
>>> Testing feature weight 1:(40,56):32
>>> Total miss
>>
>> Is this feature supported in 3.02? I am using the windows version on Win7.
>>
>> --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
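A note for anyone reproducing this: the quoted `copy a + b dest` form joins files in text (ASCII) mode by default, which as far as I know can append a DOS end-of-file character (0x1A) to the result. Whether that is what triggers "Bad format in tr file" here is an assumption and is not confirmed anywhere in this thread. One way to rule it out is to join the per-page tr files byte-for-byte. Below is a minimal sketch in Python; `concat_tr_files` is a hypothetical helper name, not part of Tesseract, and the file names in the usage comment are taken from the commands quoted above.

```python
from pathlib import Path

# DOS end-of-file marker that a text-mode copy may leave at the end of a file.
CTRL_Z = b"\x1a"


def concat_tr_files(inputs, output):
    """Join tr files byte-for-byte into `output` and return the bytes written.

    Trailing 0x1A bytes are dropped from each input, and a newline is added
    when an input does not already end with one, so that the last line of one
    file cannot run into the first line of the next.
    """
    written = 0
    with open(output, "wb") as out:
        for name in inputs:
            data = Path(name).read_bytes().rstrip(CTRL_Z)
            if data and not data.endswith(b"\n"):
                data += b"\n"
            out.write(data)
            written += len(data)
    return written


# Example call (assumes the per-page files are in the current directory):
# pages = sorted(str(p) for p in Path(".").glob("san.sanskrit2003.exp*.tr"))
# concat_tr_files(pages, "san.sanskrit2003.tr")
```

If the combined file produced this way still gives the same error, the cause is somewhere other than the way the files were joined. On the command line, `copy /b` is the binary-mode form of the same operation.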

