I am using tesseract-ocr-w64-setup-v4.0.0.20181030 and jTessBoxEditor-2.2.0 on
Windows 10. I used 3 images for testing; you can find them in the attached file
sample.zip.

1. I used jTessBoxEditor to merge the 3 images.
    The merged file is named "langyp.fontyp.exp0.tif".
2. Generate the box file:
    tesseract langyp.fontyp.exp0.tif langyp.fontyp.exp0 -l eng --psm 7 --oem 3 batch.nochop makebox
    This generates the langyp.fontyp.exp0.box file.
3. Open jTessBoxEditor -> Box Editor -> open langyp.fontyp.exp0.tif ->
    correct the mistakes.
[image: image.png]


4. Generate font_properties:
    echo "fontyp 0 0 0 0 0" > font_properties
5. Generate the training file:
    tesseract langyp.fontyp.exp0.tif langyp.fontyp.exp0 -l eng --psm 7 --oem 3 nobatch box.train
    This generates the langyp.fontyp.exp0.tr file.
6. Generate the unicharset file:
    unicharset_extractor langyp.fontyp.exp0.box
    This generates the unicharset file.
7. Generate the shape table:
    shapeclustering -F font_properties -U unicharset -O langyp.unicharset langyp.fontyp.exp0.tr
8. Run mftraining:
    mftraining -F font_properties -U unicharset -O langyp.unicharset langyp.fontyp.exp0.tr
9. Run cntraining:
    cntraining langyp.fontyp.exp0.tr
10. Rename the output files:
    rename normproto fontyp.normproto
    rename inttemp fontyp.inttemp
    rename pffmtable fontyp.pffmtable
    rename unicharset fontyp.unicharset
    rename shapetable fontyp.shapetable
11. Combine the files:
    combine_tessdata fontyp.
12. You should then get the fontyp.traineddata file.
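Steps 4 and 5 have to agree with each other: assuming, as the Tesseract 3 legacy training notes describe, that the training tools look up the font name taken from each [lang].[fontname].exp[N].tr file name in font_properties (whose lines read "<fontname> <italic> <bold> <fixed> <serif> <fraktur>"), a quick consistency check may be worth running. The Python sketch below is only an illustration; parse_font_properties, font_of_tr and unmatched_tr_files are hypothetical helpers, not part of Tesseract. It also flags quote characters, because Windows cmd's echo writes the surrounding double quotes into the file, so step 4 typed in cmd would store "fontyp 0 0 0 0 0" with the quotes included.

```python
import re
from pathlib import Path

# Legacy training file names follow [lang].[fontname].exp[N].tr,
# e.g. langyp.fontyp.exp0.tr -> lang "langyp", font "fontyp".
TR_NAME = re.compile(r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.tr$")


def parse_font_properties(text):
    """Parse '<fontname> <italic> <bold> <fixed> <serif> <fraktur>' lines."""
    fonts = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        if '"' in line:
            # cmd.exe's echo keeps the quotes, so the font name would be '"fontyp'
            raise ValueError(f"line {lineno}: contains a quote character: {line!r}")
        fields = line.split()
        if len(fields) != 6 or any(f not in ("0", "1") for f in fields[1:]):
            raise ValueError(f"line {lineno}: expected a name and five 0/1 flags: {line!r}")
        fonts[fields[0]] = tuple(int(f) for f in fields[1:])
    return fonts


def font_of_tr(filename):
    """Return the font name embedded in a [lang].[fontname].exp[N].tr file name."""
    match = TR_NAME.match(Path(filename).name)
    if match is None:
        raise ValueError(f"{filename!r} does not look like [lang].[fontname].exp[N].tr")
    return match.group("font")


def unmatched_tr_files(font_properties_text, tr_files):
    """Return the .tr files whose font name has no entry in font_properties."""
    fonts = parse_font_properties(font_properties_text)
    return [tr for tr in tr_files if font_of_tr(tr) not in fonts]
```

For example, unmatched_tr_files(Path("font_properties").read_text(), ["langyp.fontyp.exp0.tr"]) returns an empty list when the two files are consistent, and raises ValueError if the font_properties line still carries the quotes.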
But when I follow these steps, at step 7, after typing the command
"shapeclustering -F font_properties -U unicharset -O langyp.unicharset langyp.fontyp.exp0.tr",
the terminal shows no output at all, even after waiting for more than 20
minutes.
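Since the symptom is a process that neither finishes nor prints anything, it can help to run the step under a time limit and keep whatever it did print. A minimal sketch using Python's standard subprocess module, assuming the Tesseract training binaries are on PATH; run_step and SHAPECLUSTERING_CMD are hypothetical names for illustration only:

```python
import subprocess

# The step 7 command exactly as typed in the message.
SHAPECLUSTERING_CMD = [
    "shapeclustering", "-F", "font_properties", "-U", "unicharset",
    "-O", "langyp.unicharset", "langyp.fontyp.exp0.tr",
]


def run_step(cmd, timeout_s):
    """Run one training command, giving up after timeout_s seconds.

    Returns (status, output) where status is "ok", "failed" or "timeout"
    and output is the combined stdout/stderr captured so far.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() has already killed the child at this point.
        partial = exc.output or b""
        if isinstance(partial, bytes):  # may be bytes even with text=True
            partial = partial.decode(errors="replace")
        return "timeout", partial
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout
```

For example, run_step(SHAPECLUSTERING_CMD, timeout_s=20 * 60) returns ("timeout", ...) if shapeclustering is still running after 20 minutes, with anything it printed before being killed in the second element.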

If I skip step 7 and go on to step 8, after typing the command
"mftraining -F font_properties -U unicharset -O langyp.unicharset langyp.fontyp.exp0.tr",
the only output is one warning: "No shape table file present: shapetable".

After that, the terminal shows no further output, even after waiting for a long
time.
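The warning names shapetable, the file that step 7 (shapeclustering) is expected to write, and step 10 renames five files produced by steps 6 to 9. Checking which of them exist before running steps 10 and 11 shows which stage never finished. A Python sketch; missing_outputs and EXPECTED are hypothetical names, and the file-to-step mapping simply follows the steps listed above:

```python
from pathlib import Path

# Files renamed in step 10, and the step expected to produce each one.
EXPECTED = {
    "unicharset": "step 6 (unicharset_extractor)",
    "shapetable": "step 7 (shapeclustering)",
    "inttemp": "step 8 (mftraining)",
    "pffmtable": "step 8 (mftraining)",
    "normproto": "step 9 (cntraining)",
}


def missing_outputs(workdir="."):
    """Return {filename: producing step} for expected files absent from workdir."""
    base = Path(workdir)
    return {name: step for name, step in EXPECTED.items()
            if not (base / name).is_file()}
```

Running print(missing_outputs()) in the training directory prints a dict mapping each missing file to the step that should have produced it; an empty dict means all five files are present.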




On Sat, Dec 29, 2018 at 5:02 PM Zdenko Podobny <[email protected]> wrote:

> Please provide real information and data, not a "meta" description of your
> process.
>
> Zdenko
>
>
> On Sat, Dec 29, 2018 at 9:37 <[email protected]> wrote:
>
>> I also encountered this problem. I tried Tesseract 3.5 and Tesseract 4.0,
>> and the result is the same.
>>
>> On Tuesday, July 17, 2018 at 5:16:26 PM UTC+8, [email protected] wrote:
>>>
>>> Hi all,
>>>
>>> I'm trying to train Tesseract. I've gone through the first few steps,
>>> including:
>>> 1. getting TIF's
>>> 2. creating the box files
>>> 3. correcting the box files
>>> 4. training (tesseract [language].[fontname].exp[samplenumber].tif
>>> [language].[fontname].exp[samplenumber] box.train)
>>> 5. creating the unicharset file
>>> 6. creating the font_properties file,
>>> so I now have the .tif, .box, .tr, font_properties and unicharset files.
>>> All the steps before shapeclustering completed successfully, with no
>>> errors.
>>> But when I ran "shapeclustering -F font_properties -U unicharset -O
>>> [language].unicharset [language].[fontname].exp0.tr", the command
>>> prompt stopped responding: the command never finishes, but there is no output.
>>> Can anyone tell me why this happens and how to solve it? Thanks in advance.
>>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/f75115f5-7613-4a6a-a95a-a0b933b2c88a%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/f75115f5-7613-4a6a-a95a-a0b933b2c88a%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>


<<attachment: sample.zip>>
