Re: How training language like arab?

Sven Pedersen Tue, 22 Jan 2013 09:55:51 -0800

Have you looked through the archives to check for the people working on
Farsi? They would have a good idea how to solve this problem.


"Arsalan Ghasrsaz" <[email protected]>

https://github.com/reza1615/PersianOcr

--Sven


On Sat, Jan 19, 2013 at 7:31 AM, gold snake <[email protected]> wrote:

> My training failed; the final result looks very bad. Maybe that is because I
> don't know how to handle the same character in different positions.
> What you see looks like this: م , ئما , تىم  , مور
> What I actually type is this: م , ئما , تىم  , مور
> You can see one character that looks like an O. It is the same character,
> but when its position changes, its shape changes.
> I don't know what to do. I think the result may be so terrible because of
> this: the computer gets one character for training, but it has four
> different shapes.
>
> Can anybody tell me what I need to do to train a language like
> this?
>
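
The position-dependent shapes described above can be illustrated with a
minimal Python sketch. It is not part of the Tesseract training tools and
says nothing about how Tesseract itself treats these shapes. It only
assumes the standard unicodedata module, and it uses the Arabic letter
MEEM as the example. The names MEEM_FORMS and base_letter are made up for
this illustration. It shows that the four shapes exist as separate
compatibility code points that all normalize back to one base letter.

```python
import unicodedata

# Presentation-form code points for ARABIC LETTER MEEM (U+0645).
# They exist in Unicode for compatibility; ordinary text stores U+0645
# and the renderer picks the shape from the neighbouring letters.
MEEM_FORMS = {
    "isolated": "\uFEE1",
    "final": "\uFEE2",
    "initial": "\uFEE3",
    "medial": "\uFEE4",
}


def base_letter(ch):
    """Map a presentation form back to its base letter using NFKC."""
    return unicodedata.normalize("NFKC", ch)


for position, glyph in MEEM_FORMS.items():
    base = base_letter(glyph)
    # Print the positional name and the code point of the base letter.
    print(position, unicodedata.name(glyph), "->", "U+%04X" % ord(base))
```

In other words, the text you type contains a single code point per letter,
and the four shapes only appear when the text is rendered.
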
> On Tuesday, January 15, 2013 at 9:16:04 PM UTC+8, gold snake wrote:
>
>> My language is somewhat special. It is like Arabic script, but differs from
>> Arabic in some ways, really only in the shape of the letters. It is
>> also written right to left.
>> I am following the standard tutorial:
>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>
>> But when I finish and test, it cannot recognize text accurately.
>> My steps are:
>>
>> tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox
>>
>> tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train
>>
>> unicharset_extractor as.kadas.exp0.box
>>
>> shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr
>>
>> mftraining -F font_properties -U unicharset -O as.unicharset as.kadas.exp0.tr
>>
>> cntraining as.kadas.exp0.tr
>>
>> I don't have a word dictionary, so I skipped those steps.
>> I renamed the output files, adding the "as." prefix, then ran:
>>
>> combine_tessdata as.
>>
>> There were no errors up to and including the combine step, so I assumed
>> nothing was wrong.
>> But when I test a picture whose content is "13", the result is: ئئ
>> and when I test any word, the result is just: ئ
>>
>>
>>
>> I looked in D:\Little\Tesseract-OCR\tessdata and found these files:
>>
>> ara.cube.bigrams
>> ara.cube.fold
>> ara.cube.lm
>> ara.cube.nn
>> ara.cube.params
>> ara.cube.size
>> ara.cube.word-freq
>> ara.traineddata
>>
>> I don't understand why the Arabic trained data is not just
>> ara.traineddata. What are the other ara.* files? Are they necessary if I
>> train my own language?
>> And how can I find or create those files?
>>
>> Thanks very much.
>>
>>  --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>



-- 
``All that is gold does not glitter,
  not all those who wander are lost;
the old that is strong does not wither,
  deep roots are not reached by the frost.
From the ashes a fire shall be woken,
  a light from the shadows shall spring;
renewed shall be blade that was broken,
  the crownless again shall be king.”
