On 19.04.2010 09:05, MARTIN Pierre wrote:
> Hello Zdpo,
>
> As said in my mail on 13th of April, as an answer to Sriranga:
>
>>> I am extremely thankful for the attachment. I could not understand "OCRB
>>> font" - which I don't have. It is presumed any fonts can do/be used ?
>>>
>> Exactly. Basically, you'll have to create your custom language which will
>> still contain a certain number of fonts. Each font can be trained with
>> multiple pictures. That's why the file names for the boxes are decomposed
>> this way: xxx.FFFFF.ppp.box (xxx=language, FFF=font, ppp=page if you have
>> multiple training pictures by font), this way the files are better organised.
>>
> As you can see, the names of the input files when training Tesseract
> (Especially the .tr files) are determining the font names.
>
> This is visible in the source code too, if you make a search for
> "CurrentFont" in the whole source code, you'll see what i mean.
>
> Pierre.
>

When I ran tests on Linux, tesseract crashed... I tried to understand the source code (and did some work with a debugger ;-) ) and I think there is a bug (or at least the code does not handle all possible inputs correctly). My experience (plus a patch for my problems) can be found at http://www.sk-spell.sk.cx/tesseract-ocr-en-language-training-300...
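
To illustrate the naming scheme quoted above (xxx.FFFFF.ppp.box), here is a minimal Python sketch that splits such a file name into language, font and page. It is only an illustration under assumptions: the helper name parse_box_name and the example file name eng.OCRB.001.box are hypothetical, and the code is not taken from the Tesseract sources.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class BoxName(NamedTuple):
    language: str
    font: str
    page: Optional[str]  # None when the name has no page component


def parse_box_name(filename: str) -> BoxName:
    """Split 'lang.font.page.box' (or 'lang.font.box') into its parts.

    Hypothetical helper; it only mirrors the convention described in the
    quoted mail and does not reproduce any Tesseract logic.
    """
    parts = Path(filename).name.split(".")
    if parts[-1] != "box" or len(parts) not in (3, 4):
        raise ValueError(f"unexpected box file name: {filename!r}")
    language, font = parts[0], parts[1]
    page = parts[2] if len(parts) == 4 else None
    if not language or not font:
        raise ValueError(f"empty language or font in: {filename!r}")
    return BoxName(language, font, page)


if __name__ == "__main__":
    # Example file name is made up for illustration.
    print(parse_box_name("eng.OCRB.001.box"))
```

Names that do not follow the pattern raise ValueError, which makes a misnamed training file visible before it is fed to the training tools.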
Zdenko

