Thanks, Zdenko. I'll change the filename and try using the /b switch with copy as suggested by Quan.
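As a point of reference for the /b suggestion, here is a minimal Python sketch of a binary-safe concatenation, roughly what `cat` or `copy /b` would do. It is an illustration under assumptions, not a tested part of the Tesseract training workflow: the helper name `concat_tr_files` is hypothetical, and it assumes the stray character at the end of the combined file is the Ctrl-Z (0x1A) end-of-file marker that `copy` can append when it joins files in text mode.

```python
from pathlib import Path

# Assumption: the unwanted trailing byte is the DOS end-of-file marker (Ctrl-Z).
CTRL_Z = b"\x1a"


def concat_tr_files(parts, output):
    """Join .tr files in binary mode and return the number of bytes written.

    Each input is copied as-is, except that trailing Ctrl-Z bytes are
    dropped so they cannot end up in the middle or at the end of the
    combined file.
    """
    written = 0
    with open(output, "wb") as out:
        for part in parts:
            # read_bytes() avoids any newline or text-mode translation.
            data = Path(part).read_bytes().rstrip(CTRL_Z)
            out.write(data)
            written += len(data)
    return written
```

For example, `concat_tr_files(sorted(Path(".").glob("san.sanskrit2003.exp0*.tr")), "san.sanskrit2003.exp2000.tr")` would mirror the `cat` command from the Linux test quoted below, with the output name taken from that test.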
I was trying to concatenate the files because http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 says:

> An alternative to multi-page tiffs is to create many single-page tiffs for
> a single font, and then you must cat together the tr files for each font
> into several single-font tr files. In any case, the input tr files to
> mftraining must each contain a single font.

Is it a requirement to have only one .tr file per font? Currently I have fewer than 32 .tr files, all of the same font, and tesseract seems to be working. Maybe the errors will appear if I try to use more than one font or if I go over the 32-file limit.

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, Apr 23, 2013 at 1:48 AM, zdenko podobny <[email protected]> wrote:
> I don't have a lot of time, so I just ran some simple tests on linux and
> here are the results:
>
> 1. Fix the name of the file: san.sanskrit2003.tr is not a correct filename.
>    It should be something like san.sanskrit2003.exp1000.tr
> 2. I tried to use linux cat instead of windows copy
>    (cat san.sanskrit2003.exp0*.tr > san.sanskrit2003.exp2000.tr). When I
>    compared the results (san.sanskrit2003.tr and san.sanskrit2003.exp2000.tr),
>    the difference was that copy put something at the end of the file (windows
>    end of line char?). After removing this from the end of the file, the error
>    message "Bad format in tr file, reading fontname, unichar" disappeared...
> 3. 'shapeclustering -F font_properties -U unicharset san.sanskrit2003.tr'
>    created output - file shapetable.
> 4. When I compared the output of 'shapeclustering -F font_properties -U
>    unicharset san.sanskrit2003.tr' and 'shapeclustering -F font_properties
>    -U unicharset san.sanskrit2003.exp2000.tr' I got binary identical
>    output. So the error message "Bad format in tr file, reading fontname,
>    unichar" had no effect in this case...
>
> Zdenko
>
> On Sun, Apr 21, 2013 at 10:39 AM, sdk <[email protected]> wrote:
>> Zdenko,
>>
>> Please download the zip file from
>> https://docs.google.com/file/d/0BwCwgbxF9x6pYm9oUnkyaHMyODA/edit
>> It has the separate tr files as well as the combined tr file. I have
>> included fewer files than earlier test, I got the same error with these.
>>
>> Let me know if you need the Box/Tif pairs also.
>>
>> Thanks!
>>
>> On Thursday, April 18, 2013 11:46:07 PM UTC+5:30, zdenop wrote:
>>> post somewhere your files, so we can test it on linux...
>>>
>>> Zdenko
>>>
>>> On Thu, Apr 18, 2013 at 6:15 AM, Shree Devi Kumar <[email protected]> wrote:
>>>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 says:
>>>>
>>>>> An alternative to multi-page tiffs is to create many single-page tiffs
>>>>> for a single font, and then you must cat together the tr files for each
>>>>> font into several single-font tr files. In any case, the input tr files to
>>>>> mftraining must each contain a single font.
>>>>
>>>> I tried to concatenate the multiple tr files for multiple images, all
>>>> in the same font, to create a single tr file for one font. This is on
>>>> Windows 7 and I used the copy command as follows:
>>>>
>>>>> copy san.sanskrit2003.exp0001.tr + san.sanskrit2003.exp007.tr +
>>>>> san.sanskrit2003.exp012.tr + san.sanskrit2003.exp000.tr +
>>>>> san.sanskrit2003.exp001.tr + san.sanskrit2003.exp002.tr +
>>>>> san.sanskrit2003.exp003.tr + san.sanskrit2003.exp004.tr +
>>>>> san.sanskrit2003.exp005.tr + san.sanskrit2003.exp006.tr +
>>>>> san.sanskrit2003.exp008.tr + san.sanskrit2003.exp009.tr +
>>>>> san.sanskrit2003.exp010.tr + san.sanskrit2003.exp011.tr +
>>>>> san.sanskrit2003.exp013.tr + san.sanskrit2003.exp014.tr +
>>>>> san.sanskrit2003.exp015.tr + san.sanskrit2003.exp016.tr +
>>>>> san.sanskrit2003.exp017.tr san.sanskrit2003.tr
>>>>
>>>>> copy san.sanskrit2003b.exp020.tr + san.sanskrit2003b.exp021.tr +
>>>>> san.sanskrit2003b.exp022.tr + san.sanskrit2003b.exp023.tr
>>>>> san.sanskrit2003b.tr
>>>>
>>>>> copy san.unknown.exp00000001.tr san.unknown.tr
>>>>
>>>> This created 3 tr files and I ran shapeclustering with the same, but
>>>> got the following error:
>>>>
>>>>> shapeclustering -F san.font_properties -U unicharset
>>>>> san.sanskrit2003.tr san.sanskrit2003b.tr san.unknown.tr
>>>>
>>>>> Reading san.sanskrit2003.tr ...
>>>>> Bad format in tr file, reading fontname, unichar
>>>>> Reading san.sanskrit2003b.tr ...
>>>>> Bad format in tr file, reading fontname, unichar
>>>>> Reading san.unknown.tr ...
>>>>> Testing feature weight 1:(40,56):32
>>>>> Total miss
>>>>> Testing feature weight 1:(40,56):32
>>>>> Total miss
>>>>
>>>> Is this feature supported in 3.02? I am using the windows version on
>>>> Win7.
>>>>
>>>> --
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>> To unsubscribe from this group, send email to
>>>> tesseract-oc...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.

