> Hello from neighboring Georgia!
Yay!
Thank you, I'll do a git pull and give it a try!
Not right now, though, because I'm swamped with work.

I've also already noticed that I can get it to work without "-l".
Thank you again. I may well have further questions.
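
For anyone following along, the box-generation step without "-l" can be
sketched roughly as below. This is only a sketch under assumptions: the
function name make_boxes_bootstrap is made up for illustration and is not
part of tess_school, and it relies on tesseract falling back to its default
language data (eng) when no "-l" is given, so eng.traineddata has to be
installed.

```shell
#!/bin/sh
# Sketch: generate box files for a brand-new language.
# "-l" is deliberately omitted, because <newlang>.traineddata does not
# exist yet; tesseract then uses its default language data (assumed: eng).
# make_boxes_bootstrap is a hypothetical name, not a tess_school script.
make_boxes_bootstrap() {
    dir="${1:-.}"                      # directory holding the .tif files
    for file in "$dir"/*.tif; do
        [ -e "$file" ] || continue     # skip if the glob matched nothing
        base="${file%.tif}"            # strip the extension for the output base
        tesseract "$file" "$base" batch.nochop makebox
    done
}
```

Called as `make_boxes_bootstrap /path/to/tifs`, it should leave one .box
file next to each .tif, ready for manual editing.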

მადლობთ (thank you)

ლორაირ


On Jun 4, 8:03 pm, Derek <[email protected]> wrote:
> Hi Shikamuk,
>
> Hello from neighboring Georgia! You're exactly right, the issue is that you
> don't have hye.traineddata yet. For completely new character sets, you need
> to issue the tesseract command without "-l yournewlanguage". The line
> you're referring to is suggesting what to do after you have trained
> Tesseract on one font in your new language. Since you are training for a
> unique script, it doesn't really matter what you use as the language code;
> you will get equally bad results no matter what.
>
> I don't suggest using auto_train.sh at this stage; you will need to edit
> the boxfiles generated by make_boxes.sh before continuing the training
> process, so I suggest running make_boxes.sh on its own, and then using
> merge_boxes.py and align_boxfile.py along with manual editing to get the
> boxfiles in order before continuing with the training process. I've made
> some small modifications to the scripts and README to make this clear, so I
> suggest doing 'git pull' to get the latest copy.
>
> Hope that helps!
>
> Derek
>
> On Sunday, June 3, 2012 10:29:26 PM UTC+4, shikamuk wrote:
>
> > Hey, Derek.
> > Thank you for the scripts, they seem to work.
>
> > However, a couple of questions:
>
> > 0. I've compiled the svn version of tesseract and installed it under the
> > /local/tesseract-svn prefix, with all language files.
> > I've also added /local/tesseract-svn/bin to PATH so that the binaries
> > there can be called from the scripts.
>
> > 1. Then I created a text.txt file with a nice long text in it.
>
> > 2. I ran
> > python text2img.py -b -i _some_fonts_here
> > Now I have png files.
>
> > 3. Then I ran png2tif.sh and got all the tif files.
> > That part works.
>
> > 4. Then I am supposed to run auto_train.sh, right?
> > Anyway, it fails at the first step, make_boxes.sh.
> > I debugged the script by adding "set -x" to it, and I get:
>
> > ---
> > + LANG=hye
> > + for file in '*.tif'
> > ++ basename hye.Dejavu_Serifbold.exp0.tif
> > + filename=hye.Dejavu_Serifbold.exp0.tif
> > + filename=hye.Dejavu_Serifbold.exp0
> > + tesseract hye.Dejavu_Serifbold.exp0.tif hye.Dejavu_Serifbold.exp0 -l hye batch.nochop makebox
> > Error opening data file /local/tesseract-svn/share/tessdata/hye.traineddata
> > Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
> > Failed loading language 'hye'
> > Tesseract couldn't load any languages!
> > Could not initialize tesseract.
> > ---
>
> > and the same messages for all the fonts.
>
> > Obviously, there is no hye.traineddata file there.
> > I wonder whether it should be there at this step, when I am
> > bootstrapping a new language?
>
> > According to
> > http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
> > when bootstrapping a new language one has to issue:
> > tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] -l yournewlanguage batch.nochop makebox
>
> > which is what the make_boxes.sh script tries to do, and which also
> > fails from the command line:
>
> > $ tesseract hye.DejaVu_Sansitalic.exp0.tif hye.DejaVu_Sansitalic.exp0 -l hy batch.nochop makebox
> > Error opening data file /local/tesseract-svn/share/tessdata/hy.traineddata
> > Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
> > Failed loading language 'hy'
> > Tesseract couldn't load any languages!
> > Could not initialize tesseract.
>
> > Any ideas?
>
> > On May 24, 11:02 pm, Derek Dohler <[email protected]> wrote:
> > > Hi all,
>
> > > I have been doing a lot of tesseract training recently, so I decided to
> > > put together some Python and shell scripts to speed up the process. I
> > > haven't done anything to prepare these for public consumption, but they
> > > have made my life a lot easier, so I thought I'd throw them out on the list
> > > in case anyone else finds them useful.
>
> > > Just a heads-up: the default language is Georgian because that's what
> > > I'm training for, so make sure to change that to your language when
> > > training.
>
> > > https://github.com/ddohler/tess_school
>
> > > Cheers,
> > > Derek
