Hi Zdenko, thanks for your quick answer. As you pointed out, I'm already using the tif/box pairs from the Spanish language to train my Catalan .traineddata language (since the Spanish characters suit Catalan too).
But doing just this (with no words in the dictionary files), the result is not very good. I think the difference comes from the words used in those dictionaries. So I'm asking for those utf8 files... I don't know whether you (or a developer) can provide them.

Thanks.
Ramon.

On 28 Apr, 15:55, zdenko podobny <[email protected]> wrote:
> Hello Ramon,
>
> for extending an existing language you need "Tif/Box pairs";
> see http://code.google.com/p/tesseract-ocr/wiki/FAQ and there "How do I add just
> one character or one font to my favourite language, without having to
> retrain from scratch?"
>
> Unfortunately tif/box pairs are provided only for the eng, deu, fra, ita, nld
> and spa languages... So you can wait until somebody releases
> tif/box pairs for your language, or you can start training from scratch. I
> chose the second option, and this is the reason why I started testing the
> training process for tesseract 3.00.
>
> BR,
>
> Zdenko
>
> On Mon, Apr 26, 2010 at 11:06 AM, Ramon <[email protected]> wrote:
> > Hi,
> > After some tests I realized the best thing for me is to put effort into
> > extending the Catalan dictionary which is in the svn repository (v3).
> > It would be very useful if you could do one of these:
> >
> > -> deliver the different files that are combined to create the cat.traineddata
> > unified file (the utf8 files used to generate the dawg would also be
> > amazing!).
> > -> show how to extract these files from the cat.traineddata and how to
> > dawg2utf8 (if it is possible).
> >
> > THANKS!
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "tesseract-ocr" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected]<tesseract-ocr%2bunsubscr...@googlegroups.com>.
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en.

