Thanks, Zdenko.

I'll change the filename and try using the /b switch with copy as suggested
by Quan.
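
For reference, the same concatenation can be sketched in Python, which behaves the same on Windows and Linux. This is only a sketch under assumptions: the function name concat_tr_files is made up for illustration, and the stray character at the end of the combined file is assumed to be the Ctrl-Z (0x1A) end-of-file marker that copy appends when it joins files in text mode.

```python
from pathlib import Path

# Assumption: the extra trailing byte is the DOS end-of-file marker (Ctrl-Z).
CTRL_Z = b"\x1a"


def concat_tr_files(parts, dest):
    """Concatenate .tr files byte for byte into dest.

    Files are read and written in binary mode, so no end-of-file marker
    is added. Any Ctrl-Z bytes already sitting at the end of an input
    file (for example from an earlier text-mode copy) are dropped.
    """
    with open(dest, "wb") as out:
        for part in parts:
            data = Path(part).read_bytes()
            out.write(data.rstrip(CTRL_Z))


# Hypothetical usage, mirroring the cat command quoted below:
#   parts = sorted(Path(".").glob("san.sanskrit2003.exp0*.tr"))
#   concat_tr_files(parts, "san.sanskrit2003.exp2000.tr")
```

Opening the files in binary mode is the Python counterpart of copy /b: the bytes are passed through untouched.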

I was trying to concatenate the files because
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 says:

> An alternative to multi-page tiffs is to create many single-page tiffs for
> a single font, and then you must cat together the tr files for each font
> into several single-font tr files. In any case, the input tr files to
> mftraining must each contain a single font.


Is it a requirement to have only one .tr file per font?

Currently I have fewer than 32 .tr files, all of the same font, and Tesseract
seems to be working. Maybe the errors will appear if I try to use more than
one font or if I go over the 32-file limit.
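
To see which .tr files would end up together under the "one font per file" reading of the wiki, the files can be grouped by the font name in their filenames. The sketch below assumes the lang.fontname.expN.tr naming pattern mentioned in this thread; the function name group_tr_files_by_font is hypothetical, and the sketch does not settle whether a single .tr file per font is actually required.

```python
from collections import defaultdict
from pathlib import Path


def group_tr_files_by_font(filenames):
    """Group .tr filenames by their 'lang.fontname' prefix.

    Assumes names of the form lang.fontname.expN.tr and raises
    ValueError for anything else (for example san.sanskrit2003.tr).
    """
    groups = defaultdict(list)
    for name in filenames:
        parts = Path(name).name.split(".")
        # Expect exactly four pieces: lang, fontname, expN, "tr".
        if len(parts) != 4 or parts[3] != "tr" or not parts[2].startswith("exp"):
            raise ValueError(f"unexpected tr filename: {name}")
        groups[f"{parts[0]}.{parts[1]}"].append(name)
    return dict(groups)
```

Each resulting group lists the files that would be concatenated into one single-font .tr file.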

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Tue, Apr 23, 2013 at 1:48 AM, zdenko podobny <[email protected]> wrote:

> I don't have a lot of time, so I just ran some simple tests on Linux; here
> are the results:
>
>    1. Fix the file name: san.sanskrit2003.tr is not a correct filename. It
>    should be something like san.sanskrit2003.exp1000.tr
>    2. I tried Linux cat instead of Windows copy
>    (cat  san.sanskrit2003.exp0*.tr > san.sanskrit2003.exp2000.tr). When I
>    compared the results (san.sanskrit2003.tr and san.sanskrit2003.exp2000.tr),
>    the difference was that copy put something at the end of the file (a
>    Windows end-of-line char?). After I removed it from the end of the file,
>    the error message "Bad format in tr file, reading fontname, unichar"
>    disappeared...
>    3. 'shapeclustering -F font_properties -U unicharset
>    san.sanskrit2003.tr' created output: the file shapetable.
>    4. When I compared the output of 'shapeclustering -F font_properties -U
>    unicharset san.sanskrit2003.tr' and 'shapeclustering -F
>    font_properties -U unicharset san.sanskrit2003.exp2000.tr' I got
>    binary-identical output. So the error message "Bad format in tr file,
>    reading fontname, unichar" had no effect in this case...
>
>
> Zdenko
>
>
> On Sun, Apr 21, 2013 at 10:39 AM, sdk <[email protected]> wrote:
>
>> Zdenko,
>>
>> Please download the zip file from
>> https://docs.google.com/file/d/0BwCwgbxF9x6pYm9oUnkyaHMyODA/edit
>> It has the separate tr files as well as the combined tr file. I have
>> included fewer files than in the earlier test; I got the same error with
>> these.
>>
>> Let me know if you need the Box/Tif pairs also.
>>
>> Thanks!
>>
>>
>> On Thursday, April 18, 2013 11:46:07 PM UTC+5:30, zdenop wrote:
>>
>>> Post your files somewhere so we can test them on Linux...
>>>
>>> Zdenko
>>>
>>>
>>> On Thu, Apr 18, 2013 at 6:15 AM, Shree Devi Kumar <[email protected]> wrote:
>>>
>>>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>> says:
>>>>
>>>>> An alternative to multi-page tiffs is to create many single-page tiffs
>>>>> for a single font, and then you must cat together the tr files for each
>>>>> font into several single-font tr files. In any case, the input tr files to
>>>>> mftraining must each contain a single font.
>>>>
>>>>
>>>> I tried to concatenate the multiple tr files for multiple images, all
>>>> in the same font, to create a single tr file for one font. This is on
>>>> Windows 7 and I used the copy command as follows:
>>>>
>>>>
>>>>> copy san.sanskrit2003.exp0001.tr + san.sanskrit2003.exp007.tr +
>>>>> san.sanskrit2003.exp012.tr + san.sanskrit2003.exp000.tr +
>>>>> san.sanskrit2003.exp001.tr + san.sanskrit2003.exp002.tr +
>>>>> san.sanskrit2003.exp003.tr + san.sanskrit2003.exp004.tr +
>>>>> san.sanskrit2003.exp005.tr + san.sanskrit2003.exp006.tr +
>>>>> san.sanskrit2003.exp008.tr + san.sanskrit2003.exp009.tr +
>>>>> san.sanskrit2003.exp010.tr + san.sanskrit2003.exp011.tr +
>>>>> san.sanskrit2003.exp013.tr + san.sanskrit2003.exp014.tr +
>>>>> san.sanskrit2003.exp015.tr + san.sanskrit2003.exp016.tr +
>>>>> san.sanskrit2003.exp017.tr san.sanskrit2003.tr
>>>>>
>>>>
>>>>
>>>>> copy san.sanskrit2003b.exp020.tr + san.sanskrit2003b.exp021.tr +
>>>>> san.sanskrit2003b.exp022.tr + san.sanskrit2003b.exp023.tr
>>>>> san.sanskrit2003b.tr
>>>>>
>>>>
>>>>
>>>>> copy san.unknown.exp00000001.tr san.unknown.tr
>>>>
>>>>
>>>> This created 3 tr files and I ran shapeclustering with the same, but
>>>> got the following error:
>>>>
>>>>
>>>>> shapeclustering -F san.font_properties -U unicharset
>>>>> san.sanskrit2003.tr san.sanskrit2003b.tr san.unknown.tr
>>>>>
>>>>
>>>>
>>>>> Reading san.sanskrit2003.tr ...
>>>>> Bad format in tr file, reading fontname, unichar
>>>>> Reading san.sanskrit2003b.tr ...
>>>>> Bad format in tr file, reading fontname, unichar
>>>>> Reading san.unknown.tr ...
>>>>> Testing feature weight 1:(40,56):32
>>>>> Total miss
>>>>> Testing feature weight 1:(40,56):32
>>>>> Total miss
>>>>
>>>>
>>>> Is this feature supported in 3.02? I am using the Windows version on
>>>> Win7.
>>>>
>>>>  --
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected]
>>>>
>>>> To unsubscribe from this group, send email to
>>>> tesseract-oc...@googlegroups.com
>>>>
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>>
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>>
>>>>
>>>
>
