Hey, Derek. Thank you for the scripts; they seem to work. However, I have a couple of questions:
0. I compiled the SVN version of tesseract and installed it under the /local/tesseract-svn prefix with all the language files. I also added /local/tesseract-svn/bin to PATH so that the scripts can call the binaries there.
1. Then I created the text.txt file with a nice long text in it.
2. I ran python text2img.py -b -i _some_fonts_here, and now I have PNG files.
3. Then I ran png2tif.sh and got all the TIF files. So far, so good.
4. Then I am supposed to run autotrain.sh, right?

Anyway, it fails at the very first step, make_boxes.sh. I debugged the script by adding "set -x" to it, and got:

---
+ LANG=hye
+ for file in '*.tif'
++ basename hye.Dejavu_Serifbold.exp0.tif
+ filename=hye.Dejavu_Serifbold.exp0.tif
+ filename=hye.Dejavu_Serifbold.exp0
+ tesseract hye.Dejavu_Serifbold.exp0.tif hye.Dejavu_Serifbold.exp0 -l hye batch.nochop makebox
Error opening data file /local/tesseract-svn/share/tessdata/hye.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'hye'
Tesseract couldn't load any languages!
Could not initialize tesseract.
---

The same messages appear for all the fonts. Obviously, there is no hye.traineddata file there. Should it already exist at this step, when I am bootstrapping a new language? According to http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3, when bootstrapping a new language one has to run:

tesseract [lang].[fontname].exp[num].tif [lang].[fontname].exp[num] -l yournewlanguage batch.nochop makebox

That is what make_boxes.sh tries to do, and it fails from the command line as well:

$ tesseract hye.DejaVu_Sansitalic.exp0.tif hye.DejaVu_Sansitalic.exp0 -l hy batch.nochop makebox
Error opening data file /local/tesseract-svn/share/tessdata/hy.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'hy'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Any ideas?

On May 24, 11:02 pm, Derek Dohler <[email protected]> wrote:
> Hi all,
>
> I have been doing a lot of tesseract training recently, so I decided to put
> together some Python and shell scripts to speed up the process. I haven't
> done any prep to prepare these for public consumption, but they have made my
> life a lot easier, so I thought I'd throw them out on the list in case anyone
> else finds them useful.
>
> Just a head's up, the default language is Georgian because that's what I'm
> training for, so make sure to change that to your language when training.
>
> https://github.com/ddohler/tess_school
>
> Cheers,
> Derek
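For anyone reproducing this, here is a minimal sketch of the loop make_boxes.sh appears to run, reconstructed only from the set -x trace above; the actual script in tess_school may differ. The names LANG_CODE, TESSDATA_DIR, box_basename, check_traineddata and make_boxes are hypothetical, and the default tessdata path is simply the one shown in the error message. The sketch adds a check that [lang].traineddata exists before calling tesseract, so the missing file is reported directly rather than through tesseract's generic initialization error.

```shell
#!/bin/sh
# Hypothetical reconstruction of the make_boxes.sh loop, based on the set -x trace.
# The trace sets LANG=hye; this sketch uses LANG_CODE so it does not overwrite
# the locale variable LANG.
LANG_CODE="${LANG_CODE:-hye}"
# Assumed location, taken from the error message; adjust to your install prefix.
TESSDATA_DIR="${TESSDATA_DIR:-/local/tesseract-svn/share/tessdata}"

# Drop any directory part and the .tif suffix, matching the trace's
# filename=...exp0.tif -> filename=...exp0 steps.
box_basename() {
    basename "$1" .tif
}

# Return non-zero (and say which file) if the language data tesseract
# will try to open is not present.
check_traineddata() {
    if [ ! -f "$TESSDATA_DIR/$1.traineddata" ]; then
        echo "missing $TESSDATA_DIR/$1.traineddata" >&2
        return 1
    fi
}

# Run makebox on every .tif in the current directory, as in the trace.
make_boxes() {
    check_traineddata "$LANG_CODE" || return 1
    for file in *.tif; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$file" ] || continue
        name=$(box_basename "$file")
        tesseract "$file" "$name" -l "$LANG_CODE" batch.nochop makebox
    done
}
```

To use it, source the file in the directory holding the .tif images and call make_boxes; with the setup described above it would stop at the check and print the path of the missing hye.traineddata.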

