Have you searched the archives for the people working on Farsi? They would have a good idea of how to solve this problem.
"Arsalan Ghasrsaz" <[email protected]>
https://github.com/reza1615/PersianOcr

--Sven

On Sat, Jan 19, 2013 at 7:31 AM, gold snake <[email protected]> wrote:
> My training failed; the final result looks very bad. Maybe that is because I
> don't know how to handle the same character in different positions.
> You see it like this: م , ئما , تىم , مور
> but I actually write it like this: م , ئما , تىم , مور
> Can you see the one character that looks like an O? It is the same character,
> but when its position changes, its shape changes.
> I don't know what to do. I think the result may be so bad because of this:
> the computer gets 1 character for training, but it has 4 different shapes...
>
> Can anybody tell me what I need to do to train a language like this?
>
> On Tuesday, January 15, 2013 9:16:04 PM UTC+8, gold snake wrote:
>
>> My language is somewhat special. It uses a script like Arabic, and the only
>> difference from Arabic is the shape of the letters. It is also written
>> right to left.
>> I'm using the standard tutorial:
>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>
>> but when I finish and test, it can't recognize text accurately.
>> My steps are:
>>
>> tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox
>>
>> tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train
>>
>> unicharset_extractor as.kadas.exp0.box
>>
>> shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr
>>
>> mftraining -F font_properties -U unicharset -O as.unicharset as.kadas.exp0.tr
>>
>> cntraining as.kadas.exp0.tr
>>
>> I don't have a word dictionary, so I skipped those steps.
>> I renamed some files, adding the "as." prefix, and then ran:
>>
>> combine_tessdata as.
>>
>> There were no errors at any step up to and including combine_tessdata, so
>> I'm sure there is no problem with them.
>> When I test a picture whose content is 13, the result is: ئئ
>> When I test any words, the result is just: ئ
>>
>> I looked in D:\Little\Tesseract-OCR\tessdata and found these files:
>>
>> ara.cube.bigrams
>> ara.cube.fold
>> ara.cube.lm
>> ara.cube.nn
>> ara.cube.params
>> ara.cube.size
>> ara.cube.word-freq
>> ara.traineddata
>>
>> I can't understand why the Arabic training data is not just ara.traineddata.
>> What are the other ara.* files? Are they necessary if I'm training my
>> language? And how can I find or create them?
>>
>> Thanks very much...
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."
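Gold snake's four samples differ only on screen. In stored text, the letter that looks like an O is a single code point, ARABIC LETTER MEEM (U+0645), and the font chooses its isolated, initial, medial or final shape from the neighbouring letters. Unicode does have separate legacy "presentation form" code points for each shape, but NFKC normalization folds them back into the base letter. The Python sketch below (standard library only; the choice of meem and the names `MEEM_FORMS`, `SAMPLES` and `meem_forms` are illustrative) makes this visible. If the same holds for the training data, the box file would label all four shapes with the same character, and the training images would need the letter in every position so that each shape gets samples; that is an assumption about the training setup, not something confirmed in this thread.

```python
import unicodedata

# ARABIC LETTER MEEM: the code point actually stored in text, in any position.
MEEM = "\u0645"

# Arabic Presentation Forms-B compatibility code points for the same letter.
# Ordinary text does not use these; the font picks the positional shape itself.
MEEM_FORMS = {
    "isolated": "\uFEE1",
    "final": "\uFEE2",
    "initial": "\uFEE3",
    "medial": "\uFEE4",
}

# The sample words from the thread, spelled with escapes so the code points
# are explicit. Meem is isolated, medial, final and initial respectively.
SAMPLES = [
    "\u0645",              # م
    "\u0626\u0645\u0627",  # ئما
    "\u062A\u0649\u0645",  # تىم
    "\u0645\u0648\u0631",  # مور
]


def meem_forms():
    """Return (position, code point, Unicode name, NFKC result) for each shape."""
    rows = []
    for position, ch in MEEM_FORMS.items():
        # NFKC drops the positional distinction and yields the base letter.
        folded = unicodedata.normalize("NFKC", ch)
        rows.append((position, f"U+{ord(ch):04X}", unicodedata.name(ch),
                     f"U+{ord(folded):04X}"))
    return rows


if __name__ == "__main__":
    for row in meem_forms():
        print(*row, sep="  ")
    # Every sample word stores the very same code point for meem.
    print(all(MEEM in word for word in SAMPLES))
```

Running it prints one row per shape, each folding back to U+0645, followed by `True` for the four sample words.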

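The first command in gold snake's list, `makebox`, is run without `-l`, so Tesseract proposes boxes and labels using its default language (English); for a new script, every label in as.kadas.exp0.box presumably has to be corrected by hand before `box.train`. A box file whose labels are wrong or nearly all identical could plausibly produce output like ئئ, although the thread does not establish that as the cause. A quick check is to count the labels. The Python sketch below assumes the Tesseract 3.x box line format `<char> <left> <bottom> <right> <top> <page>` and a UTF-8 file; `parse_box_lines` and `label_counts` are hypothetical helper names.

```python
import sys
from collections import Counter
from typing import Iterable, List, Tuple

Box = Tuple[str, Tuple[int, int, int, int], int]


def parse_box_lines(lines: Iterable[str]) -> List[Box]:
    """Parse '<char> <left> <bottom> <right> <top> <page>' lines (Tesseract 3.x)."""
    boxes = []
    for lineno, line in enumerate(lines, 1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the label may itself be several characters.
        fields = line.rsplit(" ", 5)
        if len(fields) != 6 or not fields[0]:
            raise ValueError(f"line {lineno}: expected 6 fields, got {line!r}")
        ch, *nums = fields
        left, bottom, right, top, page = map(int, nums)
        boxes.append((ch, (left, bottom, right, top), page))
    return boxes


def label_counts(boxes: List[Box]) -> Counter:
    """Count how often each label occurs in the parsed box file."""
    return Counter(ch for ch, _box, _page in boxes)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # utf-8-sig also accepts files saved with a byte-order mark.
        with open(sys.argv[1], encoding="utf-8-sig") as f:
            counts = label_counts(parse_box_lines(f))
        for ch, n in counts.most_common():
            print(f"{ch}\tU+{ord(ch[0]):04X}\t{n}")
    else:
        print(f"usage: {sys.argv[0]} <file.box>")
```

One would expect a corrected box file for this language to list many distinct Uyghur letters, rather than a handful of Latin letters or one label repeated for every box.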

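Two bookkeeping details in the remaining steps are easy to get wrong. The fontname field of the training file name `<lang>.<fontname>.exp<N>` (here `kadas`) needs a matching line in font_properties of the form `fontname italic bold fixed serif fraktur`, each flag 0 or 1. And the renaming step gold snake mentions refers, per the Tesseract 3.02 training tutorial, to the files inttemp, normproto, pffmtable and shapetable, which combine_tessdata only picks up with the `as.` prefix (as.unicharset already has it from `mftraining -O`). The Python sketch below only plans these two steps and does not touch any files; the helper names and the all-zero font flags are assumptions.

```python
import os
from dataclasses import dataclass
from typing import List, Tuple

# Training outputs that need the "<lang>." prefix before combine_tessdata.
# The unicharset is not listed: "mftraining -O as.unicharset" already named it.
TRAINING_OUTPUTS = ["inttemp", "normproto", "pffmtable", "shapetable"]


@dataclass
class TrainingName:
    lang: str
    font: str
    exp: str


def parse_training_name(path: str) -> TrainingName:
    """Split '<lang>.<fontname>.exp<N>[.tif|.box|.tr]' into its three parts."""
    base = os.path.basename(path)
    for ext in (".tif", ".box", ".tr"):
        if base.endswith(ext):
            base = base[: -len(ext)]
    parts = base.split(".")
    if len(parts) != 3 or not parts[2].startswith("exp"):
        raise ValueError(f"expected <lang>.<fontname>.exp<N>, got {base!r}")
    return TrainingName(*parts)


def font_properties_line(name: TrainingName, italic: int = 0, bold: int = 0,
                         fixed: int = 0, serif: int = 0, fraktur: int = 0) -> str:
    """One font_properties entry; the all-zero default flags are an assumption."""
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([name.font, *map(str, flags)])


def rename_plan(lang: str) -> List[Tuple[str, str]]:
    """(old, new) file-name pairs for the prefix step; nothing is renamed here."""
    return [(f, f"{lang}.{f}") for f in TRAINING_OUTPUTS]


if __name__ == "__main__":
    name = parse_training_name("as.kadas.exp0.tif")
    print(font_properties_line(name))
    for old, new in rename_plan(name.lang):
        print(f"{old} -> {new}")
```

For `as.kadas.exp0.tif` it prints the font_properties line `kadas 0 0 0 0 0` followed by the four planned renames.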