Re: Tesseract Training

Dmitry Silaev Tue, 15 Feb 2011 22:28:28 -0800

Guys,

If you have more than one box/tiff pair, you can train (i.e. generate a .tr
file) for each of these pairs separately.
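
A minimal sketch of the per-pair step as a POSIX shell function. It assumes the Tesseract 3.0x syntax quoted later in this thread ("nobatch box.train") and that every NAME.tif has a hand-corrected NAME.box next to it. The function name train_pairs is hypothetical.

```shell
#!/bin/sh
# Train every box/tiff pair separately: one NAME.tr per NAME.tif.
# Assumes Tesseract 3.0x syntax ("nobatch box.train") and that each
# NAME.tif has a hand-corrected NAME.box beside it.
train_pairs() {
    for tif in "$@"; do
        base=${tif%.tif}              # drop the .tif extension
        if [ ! -f "$base.box" ]; then
            echo "missing $base.box, skipping $tif" >&2
            continue
        fi
        # Writes $base.tr; stop at the first failure so errors are not lost.
        tesseract "$tif" "$base" nobatch box.train || return 1
    done
}
```

For example: train_pairs khm.limons1.1.tif khm.limons1.2.tif khm.limons1.3.tif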


Then you can concatenate (simply "cat" or "copy") all the resulting .tr files
into one and run all the training tools on that single combined .tr file. This
frees you from the 32-file limit.
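
The concatenation step, sketched as a small shell function. The name concat_tr and the output name khm.combined.tr are hypothetical. The function refuses to read its own output file, which can otherwise happen when a script is rerun with a glob such as *.tr. The mftraining options in the comments are the ones quoted later in this thread; other Tesseract releases may need additional ones.

```shell
#!/bin/sh
# concat_tr OUT IN...: join several .tr files into one so that the
# clustering tools get a single input file.
concat_tr() {
    out=$1
    shift
    for tr in "$@"; do
        if [ "$tr" = "$out" ]; then
            # "cat a.tr b.tr > a.tr" would truncate a.tr before reading it.
            echo "output $out is also listed as an input" >&2
            return 1
        fi
        [ -f "$tr" ] || { echo "no such file: $tr" >&2; return 1; }
    done
    cat "$@" > "$out"
}

# Example run (input names from this thread, output name hypothetical):
#   concat_tr khm.combined.tr khm.limons1.1.tr khm.limons1.2.tr khm.limons1.3.tr
#   mftraining -U unicharset -O khm.unicharset khm.combined.tr
#   cntraining khm.combined.tr
```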

For convenience, you can write a batch file or shell script that trains,
concatenates, clusters, etc. in one run. Do analyze all errors carefully,
though.
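
A sketch of such a one-run driver. It assumes the same Tesseract 3.0x commands and options quoted in this thread. Each command's output is appended to a log file (training.log, a hypothetical name), and the run stops at the first failing step, so error messages are kept for later reading instead of scrolling past. It covers only training, concatenation and clustering; renaming the outputs and building the traineddata file are left out.

```shell
#!/bin/sh
# One-run driver: train each pair, concatenate the .tr files, cluster.
# Output of every step is appended to $LOG; the first failure stops the run.
LOG=${LOG:-training.log}

run_step() {
    echo "+ $*" >> "$LOG"
    if ! "$@" >> "$LOG" 2>&1; then
        echo "step failed: $* (details in $LOG)" >&2
        return 1
    fi
}

# run_training LANG COMBINED.tr NAME.tif...  (file names must not contain spaces)
run_training() {
    lang=$1
    combined=$2
    shift 2
    : > "$LOG"                        # start a fresh log
    boxes=""
    trs=""
    for tif in "$@"; do
        base=${tif%.tif}
        run_step tesseract "$tif" "$base" nobatch box.train || return 1
        boxes="$boxes $base.box"
        trs="$trs $base.tr"
    done
    # $trs and $boxes are left unquoted on purpose so they split into names.
    cat $trs > "$combined" || return 1
    run_step unicharset_extractor $boxes || return 1
    run_step mftraining -U unicharset -O "$lang.unicharset" "$combined" || return 1
    run_step cntraining "$combined" || return 1
}
```

For example: run_training khm khm.combined.tr khm.limons1.1.tif khm.limons1.2.tif khm.limons1.3.tif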

Warm regards,
Dmitry Silaev




On Wed, Feb 16, 2011 at 5:56 AM, Sriranga(78yrsold) <[email protected]> wrote:

> Dmitry,
> It appears that Khem did not copy you, so I am forwarding his message
> for your valuable guidance and comments, which may also help me with my
> Kannada project.
> with regards,
> -sriranga(78yrs)
>
> ---------- Forwarded message ----------
> From: KHEM Sochenda <[email protected]>
> Date: Wed, Feb 16, 2011 at 7:45 AM
> Subject: Re: Tesseract Training
> To: "Sriranga(78yrsold)" <[email protected]>
>
>
> Dear Sriranga,
>
> Below are the steps I followed for the training:
>
>    1. I created three pages of training images, as you can see in the
>    attachments (khm.limons1.1 is page 1, khm.limons1.2 is page 2, and
>    khm.limons1.3 is page 3).
>    2. I created box files for every page (khm.limons1.1.box and so on)
>    with the command line "tesseract khm.limons1.1.tif khm.limons1.1
>    batch.nochop makebox" for page 1, "tesseract khm.limons1.2.tif
>    khm.limons1.2 batch.nochop makebox" for page 2, and likewise for
>    page 3.
>    3. Then I edited the box files; the final results are in the
>    attachments.
>    4. I merged the images into a single file (khm.limons1.0.tif).
>    5. I merged the three box files into a single box file with page
>    numbers assigned (khm.limons1.0.box).
>    6. I ran the command to train on the single file: "tesseract
>    khm.limons1.1.tif khm.limons1.0.tif khm.limons1.0 nobatch box.train".
>    The result looked okay at this step. (My purpose in merging everything
>    into one file was to have a single font in just one .tr file.)
>    7. I then ran "unicharset_extractor khm.limons1.0.box" to extract
>    every glyph from the box file. The result looked okay.
>    8. Then I tried to extract the features with "mftraining -U unicharset
>    -O khm.unicharset khm.limons1.0.tr" and "cntraining khm.limons1.0.tr",
>    but this step failed.
>
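
Step 5 above can be sketched with awk. This assumes the Tesseract 3.0x box format of six whitespace-separated fields per line (glyph, left, bottom, right, top, page), with zero-based page numbers that follow the page order of the merged tiff. The function name merge_boxes is hypothetical.

```shell
#!/bin/sh
# merge_boxes OUT IN...: join per-page box files into one multi-page box
# file, setting the 6th (page) field to 0 for the first input, 1 for the
# second, and so on. Inputs must be given in the page order of the tiff.
merge_boxes() {
    out=$1
    shift
    : > "$out"                        # create or empty the output file
    page=0
    for box in "$@"; do
        # Blank lines are dropped; assigning $6 adds the field if a line
        # has only five.
        awk -v p="$page" 'NF { $6 = p; print }' "$box" >> "$out" || return 1
        page=$((page + 1))
    done
}
```

For example: merge_boxes khm.limons1.0.box khm.limons1.1.box khm.limons1.2.box khm.limons1.3.box
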
> --------------------------------------------------------------------------------------------------------
>
> Since I had no idea how to make the above approach work, I omitted steps
> 4 and 5 and went straight to steps 6, 7 and 8 using the separate box files.
> That gave me the traineddata in the attached file. However, keeping three
> separate .tr files is not what I want to do.
>
> Currently I am using the resulting traineddata in my temporary OCR system.
> What I want to do is add other fonts, but the number of .tr files is
> limited to 32. This is what concerns me.
>
> Best Regards,
>
> Sochenda

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en.
