I don't have a lot of time, so I just ran some simple tests on Linux.
Here are the results:

   1. Fix the file name: san.sanskrit2003.tr is not a correct filename. It
   should be something like san.sanskrit2003.exp1000.tr
   2. I tried Linux cat instead of Windows copy
   (cat  san.sanskrit2003.exp0*.tr > san.sanskrit2003.exp2000.tr). When I
   compared the results (san.sanskrit2003.tr and san.sanskrit2003.exp2000.tr),
   the difference was that copy put something at the end of the file (Windows
   end of line char?). After I removed it, the error message "Bad format in
   tr file, reading fontname, unichar" disappeared...
   3. 'shapeclustering -F font_properties -U unicharset san.sanskrit2003.tr'
   created an output file, shapetable.
   4. When I compared the output of 'shapeclustering -F font_properties -U
   unicharset san.sanskrit2003.tr' and 'shapeclustering -F font_properties
   -U unicharset san.sanskrit2003.exp2000.tr', I got binary identical
   output. So the error message "Bad format in tr file, reading fontname,
   unichar" had no effect in this case...
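
If cat is not available on Windows, the same concatenation and cleanup can
be sketched in Python. This is only a rough sketch under one assumption: that
the extra byte copy left at the end of the file is the DOS end-of-file marker
Ctrl-Z (0x1A), which copy can append when it joins files in text mode. I did
not verify that this is the byte in your file. The function names and the
file names in the demo are made up for illustration.

```python
import os
import tempfile

# Assumption: the stray byte at the end of the combined file is Ctrl-Z (0x1A).
CTRL_Z = b"\x1a"


def concat_tr_files(paths, out_path):
    """Join tr files byte for byte, like `cat a.tr b.tr > out.tr`."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                out.write(src.read())


def strip_trailing_ctrl_z(path):
    """Remove trailing Ctrl-Z bytes from a file; return how many were removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = data.rstrip(CTRL_Z)
    if cleaned != data:
        with open(path, "wb") as f:
            f.write(cleaned)
    return len(data) - len(cleaned)


if __name__ == "__main__":
    # Demo with throwaway files; the names and contents are hypothetical.
    workdir = tempfile.mkdtemp()
    first = os.path.join(workdir, "demo.exp000.tr")
    second = os.path.join(workdir, "demo.exp001.tr")
    combined = os.path.join(workdir, "demo.exp2000.tr")
    with open(first, "wb") as f:
        f.write(b"first file\n")
    with open(second, "wb") as f:
        f.write(b"second file\n")
    concat_tr_files([first, second], combined)
    # Simulate the stray byte, then clean it up.
    with open(combined, "ab") as f:
        f.write(CTRL_Z)
    print("removed bytes:", strip_trailing_ctrl_z(combined))
```

Both functions work in binary mode so that no byte is added or translated.
On Windows, 'copy /b' with the same file list should also join the files
without appending anything.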


Zdenko


On Sun, Apr 21, 2013 at 10:39 AM, sdk <[email protected]> wrote:

> Zdenko,
>
> Please download the zip file from
> https://docs.google.com/file/d/0BwCwgbxF9x6pYm9oUnkyaHMyODA/edit
> It has the separate tr files as well as the combined tr file. I have
> included fewer files than in the earlier test; I got the same error with
> these.
>
> Let me know if you need the Box/Tif pairs also.
>
> Thanks!
>
>
> On Thursday, April 18, 2013 11:46:07 PM UTC+5:30, zdenop wrote:
>
>> Please post your files somewhere so we can test it on Linux...
>>
>> Zdenko
>>
>>
>> On Thu, Apr 18, 2013 at 6:15 AM, Shree Devi Kumar <[email protected]>wrote:
>>
>>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>> says:
>>>
>>>> An alternative to multi-page tiffs is to create many single-page tiffs
>>>> for a single font, and then you must cat together the tr files for each
>>>> font into several single-font tr files. In any case, the input tr files to
>>>> mftraining must each contain a single font.
>>>
>>>
>>> I tried to concatenate the multiple tr files for multiple images, all in
>>> the same font, to create a single tr file for one font. This is on Windows
>>> 7 and I used the copy command as follows:
>>>
>>>
>>>> copy san.sanskrit2003.exp0001.tr + san.sanskrit2003.exp007.tr +
>>>> san.sanskrit2003.exp012.tr + san.sanskrit2003.exp000.tr +
>>>> san.sanskrit2003.exp001.tr + san.sanskrit2003.exp002.tr +
>>>> san.sanskrit2003.exp003.tr + san.sanskrit2003.exp004.tr +
>>>> san.sanskrit2003.exp005.tr + san.sanskrit2003.exp006.tr +
>>>> san.sanskrit2003.exp008.tr + san.sanskrit2003.exp009.tr +
>>>> san.sanskrit2003.exp010.tr + san.sanskrit2003.exp011.tr +
>>>> san.sanskrit2003.exp013.tr + san.sanskrit2003.exp014.tr +
>>>> san.sanskrit2003.exp015.tr + san.sanskrit2003.exp016.tr +
>>>> san.sanskrit2003.exp017.tr san.sanskrit2003.tr
>>>>
>>>
>>>
>>>> copy san.sanskrit2003b.exp020.tr + san.sanskrit2003b.exp021.tr +
>>>> san.sanskrit2003b.exp022.tr + san.sanskrit2003b.exp023.tr
>>>> san.sanskrit2003b.tr
>>>>
>>>
>>>
>>>> copy san.unknown.exp00000001.tr san.unknown.tr
>>>
>>>
>>> This created 3 tr files and I ran shapeclustering with the same, but got
>>> the following error:
>>>
>>>
>>>> shapeclustering -F san.font_properties -U unicharset
>>>> san.sanskrit2003.tr san.sanskrit2003b.tr san.unknown.tr
>>>>
>>>
>>>
>>>> Reading san.sanskrit2003.tr ...
>>>> Bad format in tr file, reading fontname, unichar
>>>> Reading san.sanskrit2003b.tr ...
>>>> Bad format in tr file, reading fontname, unichar
>>>> Reading san.unknown.tr ...
>>>> Testing feature weight 1:(40,56):32
>>>> Total miss
>>>> Testing feature weight 1:(40,56):32
>>>> Total miss
>>>
>>>
>>> Is this feature supported in 3.02? I am using the Windows version on Win7.
>>>
>>>  --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To post to this group, send email to [email protected]
>>>
>>> To unsubscribe from this group, send email to
>>> tesseract-oc...@googlegroups.com
>>>
>>> For more options, visit this group at
>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>>
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
>>
