How to train a language like Arabic?

gold snake Tue, 15 Jan 2013 05:23:41 -0800

My language is a bit special. It is very similar to Arabic script; the only
difference from Arabic is the shape of the characters, and it is also written
right to left.
I'm following the standard tutorial:
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3


But when I finish and test it, it cannot recognize the text accurately.
My steps are:

tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox

tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train

unicharset_extractor as.kadas.exp0.box

shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr

mftraining -F font_properties -U unicharset -O as.unicharset as.kadas.exp0.tr

cntraining as.kadas.exp0.tr

I don't have a word dictionary, so I skipped the dictionary steps.
Then I renamed some of the output files by adding the "as." prefix.

combine_tessdata as.
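The renaming step before combine_tessdata can be sketched as a small POSIX shell function. This is a hypothetical helper (prefix_training_files is not part of Tesseract), and it assumes the Tesseract 3.02 tutorial's file names: mftraining writes inttemp, pffmtable and shapetable, cntraining writes normproto, and mftraining -O has already written as.unicharset, so that file needs no rename.

```shell
# Hypothetical helper, not a Tesseract tool: add a language prefix to the
# Tesseract 3.02 training outputs so combine_tessdata can find them.
prefix_training_files() {
    prefix="$1"
    # Check every file first, so a missing one leaves the directory unchanged.
    for f in inttemp normproto pffmtable shapetable "$prefix.unicharset"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    # Assumed output names from mftraining/cntraining; rename each to <prefix>.<name>.
    for f in inttemp normproto pffmtable shapetable; do
        mv "$f" "$prefix.$f"
    done
}

# Usage, inside the training directory:
#   prefix_training_files as && combine_tessdata as.
```

Checking all files before moving any of them means a failed run never leaves a half-renamed set behind.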

There were no errors at any step, up to and including combine_tessdata, so
I'm sure the training itself has no problem.
But when I test an image whose content is "13", the result is: ئئ
And when I test any word, the result is just: ئ



I also looked in D:\Little\Tesseract-OCR\tessdata and found these files:

ara.cube.bigrams
ara.cube.fold
ara.cube.lm
ara.cube.nn
ara.cube.params
ara.cube.size
ara.cube.word-freq
ara.traineddata

I don't understand why the Arabic data is not just ara.traineddata. What are
all the other ara.* files? Are they necessary if I'm training my own language?
And how can I find or create them?

Thanks very much!

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en