I don't have a lot of time, so I just ran some simple tests on Linux. Here are the results:
1. Fix the file name: san.sanskrit2003.tr is not a correct filename. It should be something like san.sanskrit2003.exp1000.tr.
2. I tried Linux cat instead of Windows copy (cat san.sanskrit2003.exp0*.tr > san.sanskrit2003.exp2000.tr). When I compared the results (san.sanskrit2003.tr and san.sanskrit2003.exp2000.tr), the difference was that copy put something at the end of the file (a Windows end-of-line char?). After removing this from the end of the file, the error message "Bad format in tr file, reading fontname, unichar" disappeared...
3. 'shapeclustering -F font_properties -U unicharset san.sanskrit2003.tr' created output: the file shapetable.
4. When I compared the output of 'shapeclustering -F font_properties -U unicharset san.sanskrit2003.tr' and 'shapeclustering -F font_properties -U unicharset san.sanskrit2003.exp2000.tr', I got binary-identical output. So the error message "Bad format in tr file, reading fontname, unichar" had no effect in this case...

Zdenko

On Sun, Apr 21, 2013 at 10:39 AM, sdk <[email protected]> wrote:
> Zdenko,
>
> Please download the zip file from
> https://docs.google.com/file/d/0BwCwgbxF9x6pYm9oUnkyaHMyODA/edit
> It has the separate tr files as well as the combined tr file. I have
> included fewer files than earlier test, I got the same error with these.
>
> Let me know if you need the Box/Tif pairs also.
>
> Thanks!
>
>
> On Thursday, April 18, 2013 11:46:07 PM UTC+5:30, zdenop wrote:
>
>> post somewhere your files, so we can test it on linux...
>>
>> Zdenko
>>
>>
>> On Thu, Apr 18, 2013 at 6:15 AM, Shree Devi Kumar <[email protected]> wrote:
>>
>>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>> says:
>>>
>>>> An alternative to multi-page tiffs is to create many single-page tiffs
>>>> for a single font, and then you must cat together the tr files for each
>>>> font into several single-font tr files. In any case, the input tr files to
>>>> mftraining must each contain a single font.
>>>
>>> I tried to concatenate the multiple tr files for multiple images, all in
>>> the same font, to create a single tr file for one font. This is on Windows
>>> 7 and I used the copy command as follows:
>>>
>>>> copy san.sanskrit2003.exp0001.tr + san.sanskrit2003.exp007.tr +
>>>> san.sanskrit2003.exp012.tr + san.sanskrit2003.exp000.tr +
>>>> san.sanskrit2003.exp001.tr + san.sanskrit2003.exp002.tr +
>>>> san.sanskrit2003.exp003.tr + san.sanskrit2003.exp004.tr +
>>>> san.sanskrit2003.exp005.tr + san.sanskrit2003.exp006.tr +
>>>> san.sanskrit2003.exp008.tr + san.sanskrit2003.exp009.tr +
>>>> san.sanskrit2003.exp010.tr + san.sanskrit2003.exp011.tr +
>>>> san.sanskrit2003.exp013.tr + san.sanskrit2003.exp014.tr +
>>>> san.sanskrit2003.exp015.tr + san.sanskrit2003.exp016.tr +
>>>> san.sanskrit2003.exp017.tr san.sanskrit2003.tr
>>>
>>>> copy san.sanskrit2003b.exp020.tr + san.sanskrit2003b.exp021.tr +
>>>> san.sanskrit2003b.exp022.tr + san.sanskrit2003b.exp023.tr
>>>> san.sanskrit2003b.tr
>>>
>>>> copy san.unknown.exp00000001.tr san.unknown.tr
>>>
>>> This created 3 tr files and I ran shapeclustering with the same, but got
>>> the following error:
>>>
>>>> shapeclustering -F san.font_properties -U unicharset
>>>> san.sanskrit2003.tr san.sanskrit2003b.tr san.unknown.tr
>>>
>>>> Reading san.sanskrit2003.tr ...
>>>> Bad format in tr file, reading fontname, unichar
>>>> Reading san.sanskrit2003b.tr ...
>>>> Bad format in tr file, reading fontname, unichar
>>>> Reading san.unknown.tr ...
>>>> Testing feature weight 1:(40,56):32
>>>> Total miss
>>>> Testing feature weight 1:(40,56):32
>>>> Total miss
>>>
>>> Is this feature supported in 3.02? I am using the windows version on Win7.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> tesseract-oc...@googlegroups.com
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> For more options, visit
>>> https://groups.google.com/groups/opt_out.
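
For anyone hitting the same thing on Windows, here is a minimal sketch of the cat approach from point 2 that does not depend on the shell. It joins the tr files byte for byte and can also clean an already combined file. It assumes the extra data at the end of the file is the Ctrl-Z (0x1A) end-of-file marker that copy can append when it concatenates in text mode; that is a guess, it was not confirmed in this thread. The function names concat_tr_files and strip_trailing_ctrl_z are made up for this example and are not part of Tesseract.

```python
import glob
from pathlib import Path

# Assumption: the trailing junk is the DOS end-of-file marker (Ctrl-Z, 0x1A).
CTRL_Z = b"\x1a"


def concat_tr_files(parts, dest):
    """Concatenate tr files byte for byte, like `cat parts... > dest`."""
    with open(dest, "wb") as out:
        for part in parts:
            out.write(Path(part).read_bytes())


def strip_trailing_ctrl_z(path):
    """Remove trailing 0x1A bytes from a file; return how many were removed."""
    data = Path(path).read_bytes()
    cleaned = data.rstrip(CTRL_Z)
    if cleaned != data:
        Path(path).write_bytes(cleaned)
    return len(data) - len(cleaned)


if __name__ == "__main__":
    # Same file pattern and output name as the cat command in point 2.
    parts = sorted(glob.glob("san.sanskrit2003.exp0*.tr"))
    if parts:
        concat_tr_files(parts, "san.sanskrit2003.exp2000.tr")
```

Staying with the copy command, 'copy /b' (binary mode) should avoid the appended character as well, but I have not tested that with these files.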

