On Wed, Jan 16, 2013 at 3:34 PM, Sven Pedersen <[email protected]> wrote:

> The reason why Arabic has those files and your language does not is that
> Arabic is set up to use the "cube" feature to combine it with other
> languages, so you can do "-l ara+eng" and OCR a document with both Arabic
> and English.
>

Are you sure? What is the reason for the OEM_CUBE_ONLY mode [1], then?

BTW: I changed OEM_DEFAULT[2] to OEM_TESSERACT_ONLY and simultaneous
multi-language capability works for me...

[1]
https://code.google.com/p/tesseract-ocr/source/browse/trunk/ccstruct/publictypes.h?r=820#244
[2]
https://code.google.com/p/tesseract-ocr/source/browse/trunk/api/tesseractmain.cpp?r=814#138
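
A rough sketch of how the same switch might be made without recompiling. It
has not been verified in this thread. It assumes the 3.0x enum in
publictypes.h, where OEM_TESSERACT_ONLY is 0, and that the
tessedit_ocr_engine_mode parameter can be set from a config file named on the
command line. The file name tess_only and the image name page.tif are made up
for illustration. The script only writes the config file and prints the
command; it does not run tesseract.

```shell
#!/bin/sh
# Sketch: choose the engine mode through a config file instead of editing
# tesseractmain.cpp. Assumes OEM_TESSERACT_ONLY == 0 (3.0x enum).

write_engine_config() {
    # $1 = path of the config file to create (hypothetical name)
    printf 'tessedit_ocr_engine_mode 0\n' > "$1"
}

ocr_command() {
    # $1 = image, $2 = output base, $3 = languages, $4 = config file
    printf 'tesseract %s %s -l %s %s\n' "$1" "$2" "$3" "$4"
}

write_engine_config ./tess_only
# Print the command rather than running it.
ocr_command page.tif page ara+eng ./tess_only
```

If the printed command looks right, it can be run by hand. Whether ara+eng
then gives useful results without cube is something to check on real pages.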

> That training is harder, and not necessary if you mainly want to do
> monolingual documents.
>
>

> And what Zdenko is saying is that you are asking questions that don't show
> that you've tried to solve the problem yourself. We're all professional
> programmers and we want to help people but we don't have time to teach
> elementary web searching or programming. You seem to be a smart guy, but
> your questions appear to be lazy. You need to make an effort to solve the
> problems and come to us for help, not ask us to solve them for you.
> --Sven
>
>
> On Wed, Jan 16, 2013 at 2:59 AM, gold snake <[email protected]> wrote:
>
>> I can't find any answer to my question at these links.
>> Can you just talk to me? Is it necessary to bully a rookie?
>> Please...
>>
>> On Wednesday, January 16, 2013 at 4:02:25 PM UTC+8, zdenop wrote:
>>>
>>> Really ;-)? I got 93 results. E.g.:
>>>
>>> https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ
>>> https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion
>>> https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J
>>>
>>> Please respect the time of the people on this list...
>>>
>>> Zdenko
>>>
>>>
>>> On Wed, Jan 16, 2013 at 8:18 AM, gold snake <[email protected]> wrote:
>>>
>>>> I can't find anything. Come on...
>>>>
>>>> On Tuesday, January 15, 2013 at 10:38:42 PM UTC+8, zdenop wrote:
>>>>>
>>>>> Search the archive of the tesseract forum for "cube".
>>>>>
>>>>> Zdenko
>>>>>
>>>>>
>>>>> On Tue, Jan 15, 2013 at 2:16 PM, gold snake <[email protected]> wrote:
>>>>>
>>>>>> My language is somewhat special. Its script resembles Arabic, but it
>>>>>> differs from Arabic in the shapes of the letters. It is also written
>>>>>> right to left.
>>>>>> I'm following the standard tutorial:
>>>>>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>
>>>>>> But when I finish training and test it, the text is not recognized
>>>>>> accurately. My steps are:
>>>>>>
>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox
>>>>>>
>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train
>>>>>>
>>>>>> unicharset_extractor as.kadas.exp0.box
>>>>>>
>>>>>> shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr
>>>>>>
>>>>>> mftraining -F font_properties -U unicharset -O as.unicharset as.kadas.exp0.tr
>>>>>>
>>>>>> cntraining as.kadas.exp0.tr
>>>>>>
>>>>>> I don't have a word dictionary, so I skipped some steps.
>>>>>> Then I renamed some of the output files, adding the "as." prefix:
>>>>>>
>>>>>> combine_tessdata as.
>>>>>>
>>>>>> There were no errors up through the combine step, so I'm sure
>>>>>> nothing went wrong there.
>>>>>> But when I test a picture whose content is "13", the result is: ئئ
>>>>>> When I test any word, the result is just: ئ
>>>>>>
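
For reference, the steps above can be written out as a dry run that includes
the rename step explicitly. This is only a sketch. The four file names in the
rename loop (shapetable, normproto, inttemp, pffmtable) are taken from the
TrainingTesseract3 wiki, not from the message above, and the helper name
training_commands is made up. It assumes a POSIX shell; on Windows the
equivalent of mv is ren. Nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Dry run: print the training commands for one language/font pair.
training_commands() {
    lang=$1
    font=$2
    base="$lang.$font.exp0"
    echo "tesseract $base.tif $base batch.nochop makebox"
    echo "tesseract $base.tif $base nobatch box.train"
    echo "unicharset_extractor $base.box"
    echo "shapeclustering -F font_properties -U unicharset $base.tr"
    echo "mftraining -F font_properties -U unicharset -O $lang.unicharset $base.tr"
    echo "cntraining $base.tr"
    # Rename step: give the training outputs the language prefix.
    for f in shapetable normproto inttemp pffmtable; do
        echo "mv $f $lang.$f"
    done
    echo "combine_tessdata $lang."
}

training_commands as kadas
```

The commands are meant to be run one at a time, since the box file produced by
the first command normally has to be checked and corrected by hand before the
box.train step.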
>>>>>>
>>>>>>
>>>>>> I also looked in D:\Little\Tesseract-OCR\tessdata and found these
>>>>>> files:
>>>>>>
>>>>>> ara.cube.bigrams
>>>>>> ara.cube.fold
>>>>>> ara.cube.lm
>>>>>> ara.cube.nn
>>>>>> ara.cube.params
>>>>>> ara.cube.size
>>>>>> ara.cube.word-freq
>>>>>> ara.traineddata
>>>>>>
>>>>>> I don't understand: why does the Arabic trained data consist of more
>>>>>> than just ara.traineddata? What are the other ara.* files? Are they
>>>>>> necessary if I'm training my own language?
>>>>>> And how can I find or create those files?
>>>>>>
>>>>>> thanks very much...
>>>>>>
>>>>>>  --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To post to this group, send email to [email protected]
>>>>>>
>>>>>> To unsubscribe from this group, send email to
>>>>>> tesseract-oc...@googlegroups.com
>>>>>>
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
> --
> ``All that is gold does not glitter,
>   not all those who wander are lost;
> the old that is strong does not wither,
>   deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
>   a light from the shadows shall spring;
> renewed shall be blade that was broken,
>   the crownless again shall be king.”
>
>
