Arabic and English text behave very differently, I think.
In English, if you input a+b, the result is: ab
But in Arabic, if you input ئ+ا the result is ئا. If that is not clear,
copy ئا and put a space in the middle: you will see that when two
different letters are joined, the result is a new glyph shape.
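
To make the point above concrete, here is a minimal Python sketch. It assumes the two letters in my example are U+0626 and U+0627, and that a renderer typically draws the joined pair with the initial and final presentation forms (U+FE8B, U+FE8E). The helper name `describe` is only for illustration. It shows that the stored text is still the same two letters; only the drawn shapes change.

```python
import unicodedata

# Assumed code points for the two letters in the example above.
YEH_HAMZA = "\u0626"  # ئ
ALEF = "\u0627"       # ا


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# What is actually stored when the two letters are typed one after the other.
typed = YEH_HAMZA + ALEF

# The contextual shapes a renderer would typically pick for this pair:
# initial-form YEH WITH HAMZA ABOVE followed by final-form ALEF.
# These presentation-form code points exist in Unicode for compatibility only.
drawn = "\uFE8B\uFE8E"

# NFKC normalization folds presentation forms back to the base letters.
folded = unicodedata.normalize("NFKC", drawn)

for row in describe(typed):
    print(row)
print("drawn shapes fold back to typed letters:", folded == typed)
```

So the "new shape" is a rendering effect on top of the same letters, which is at least a hint that it is not something a separate set of data files has to switch on.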

My language works the same way, so I am afraid that cube is what controls
this. If it is, that is terrible, because I don't know how to create the cube
files (I don't mean you should tell me how; I just need an example or some
documentation about it).

As for RTL, it looks like there is no built-in way to handle it, so maybe we
can only handle it in code (when reading is finished, change the display
mode, or something like that).
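
A rough sketch of what I mean by handling it in code, in Python. It assumes the OCR output contains only right-to-left text with the characters of each line in visual (left-to-right) order; the function name `visual_to_logical` is made up for this example. It reverses code points, so combining marks, digits, or embedded Latin words would need extra handling.

```python
def visual_to_logical(ocr_text: str) -> str:
    """Reverse the characters of every line while keeping the line order.

    Assumption: each line holds pure right-to-left text whose characters
    were emitted in visual left-to-right order.
    """
    return "\n".join(line[::-1] for line in ocr_text.split("\n"))


if __name__ == "__main__":
    # Placeholder Latin letters stand in for recognized characters.
    sample = "abc\ndef"
    print(visual_to_logical(sample))
```

It would be applied to the contents of output.txt after tesseract has finished.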

thanks.
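
P.S. For reference, the training steps from my first message (quoted further down) could be generated by a small helper like the one below. This is only a sketch: the function name `training_commands` is hypothetical, it just builds the same command lines I typed by hand, and it does not run anything. The manual rename step (adding the language prefix to the generated files) is left as a comment because I did it by hand.

```python
from typing import List


def training_commands(lang: str, font: str, exp: int = 0) -> List[List[str]]:
    """Build the training command lines from the thread for one font."""
    base = f"{lang}.{font}.exp{exp}"
    return [
        ["tesseract", f"{base}.tif", base, "batch.nochop", "makebox"],
        ["tesseract", f"{base}.tif", base, "nobatch", "box.train"],
        ["unicharset_extractor", f"{base}.box"],
        ["shapeclustering", "-F", "font_properties", "-U", "unicharset",
         f"{base}.tr"],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", f"{base}.tr"],
        ["cntraining", f"{base}.tr"],
        # Manual step here: rename the generated files with the "<lang>." prefix.
        ["combine_tessdata", f"{lang}."],
    ]


if __name__ == "__main__":
    for cmd in training_commands("as", "kadas"):
        print(" ".join(cmd))
```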

On Thursday, January 17, 2013 at 10:36:44 PM UTC+8, sventech wrote:
>
> OK, the fact that cube is something different than combining languages is 
> a major revelation to me. However, huangjingshe, I don't think you need the 
> cube feature for what you're doing. I believe the problem you're having is 
> something else. I would solve the other issues first and then maybe try the 
> cube feature if necessary.
> --Sven
>
>
> On Wed, Jan 16, 2013 at 10:07 PM, gold snake <[email protected]> wrote:
>
>> Thanks again, but I still have a question. If cube is only for combining
>> with other languages during training, why can we choose cube mode when
>> reading a document, as Sven said?
>>
>> Do you mean we can combine languages with -l [lang] only because the
>> language has cube files, and that without any cube files we can't use
>> -l [lang]?
>>
>> But I tested it. As everybody knows, the Chinese language data has only a
>> .traineddata file and no cube files, yet I can still run:
>> tesseract -l chi_sim [lang].[fontname].exp0.tif [lang].[fontname].exp0 
>> batch.nochop makeb
>>
>> So maybe it is not about the cube files, or I am not using it right...
>>
>>
>> On Thursday, January 17, 2013 at 3:34:25 AM UTC+8, sventech wrote:
>>>
>>> Cube means combining different languages. There is not much 
>>> documentation on it -- Google developed it internally. But I don't think 
>>> you need it. The list of files you sent is related to the cube feature, so 
>>> you don't need to create them. For right to left, search the archives for 
>>> "right to left" -- someone wrote a python script to convert, though he 
>>> didn't provide info about how to use it.
>>>
>>> utility to convert training files:
>>> https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/rtl/tesseract-ocr/T035ZyQVlMU/tQVoGWdlBDMJ
>>>
>>> basic trick for right to left output from Dmitri Silaev:
>>> https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/right$20to$20left$20output/tesseract-ocr/8r2qGvMzz9U/so1WuMTyaU8J
>>> --Sven
>>>
>>>
>>> On Wed, Jan 16, 2013 at 10:57 AM, gold snake <[email protected]> wrote:
>>>
>>>> So you mean cube exists only so that users can combine a language with
>>>> other languages, which means I don't need it (because my language is not
>>>> Arabic). Thanks. Maybe my English is not good; I just can't understand what
>>>> "cube" is or what it is used for, and I can't find an introduction.
>>>>
>>>> And that means cube has no relationship to my result being left to right
>>>> (the correct result must be right to left). Then why, when I use the
>>>> command
>>>> tesseract 14.jpg output -l [lang]
>>>> is the content of the result (output.txt) left to right?
>>>>
>>>> I am very sorry if I am taking up the experts' valuable time with these
>>>> small problems. Just a few days ago I didn't even know what OCR was.
>>>> If I could find the answers to these questions myself, believe me, I would
>>>> not ask anybody, because I really understand that everyone is very busy.
>>>> So from now on I will try hard to search for answers first. Sorry again...
>>>>
>>>> On Wednesday, January 16, 2013 at 10:34:21 PM UTC+8, sventech wrote:
>>>>>
>>>>> The reason why Arabic has those files and your language does not is 
>>>>> that Arabic is set up to use the "cube" feature to combine it with other 
>>>>> languages, so you can do "-l ara+eng" and OCR a document with both Arabic 
>>>>> and English. That training is harder, and not necessary if you mainly
>>>>> want to do monolingual documents.
>>>>>
>>>>> And what Zdenko is saying is that you are asking questions that don't
>>>>> show that you've tried to solve the problem yourself. We're all
>>>>> professional programmers and we want to help people but we don't have
>>>>> time to teach elementary web searching or programming. You seem to be a
>>>>> smart guy, but your questions appear to be lazy. You need to make an
>>>>> effort to solve the problems and come to us for help, not ask us to solve
>>>>> them for you.
>>>>> --Sven
>>>>>
>>>>>
>>>>> On Wed, Jan 16, 2013 at 2:59 AM, gold snake <[email protected]> wrote:
>>>>>
>>>>>> I couldn't find any answer to my question at this link.
>>>>>> Can you just talk to me? Is it necessary to bully a rookie?
>>>>>> Please...
>>>>>>
>>>>>> On Wednesday, January 16, 2013 at 4:02:25 PM UTC+8, zdenop wrote:
>>>>>>>
>>>>>>> Really ;-)? I got 93 results. E.g.:
>>>>>>>
>>>>>>> https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ
>>>>>>> https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion
>>>>>>> https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J
>>>>>>>
>>>>>>> Please honor time of people on this list...
>>>>>>>
>>>>>>> Zdenko
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 16, 2013 at 8:18 AM, gold snake <[email protected]> wrote:
>>>>>>>
>>>>>>>> I can't find anything. Come on...
>>>>>>>>
>>>>>>>> On Tuesday, January 15, 2013 at 10:38:42 PM UTC+8, zdenop wrote:
>>>>>>>>>
>>>>>>>>> search archive of tesseract forums for cube.
>>>>>>>>>
>>>>>>>>> Zdenko
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jan 15, 2013 at 2:16 PM, gold snake <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>>  My language is a bit special: it is like Arabic script, but with
>>>>>>>>>> some differences, really only in the shapes of the letters. It is
>>>>>>>>>> also written right to left.
>>>>>>>>>> I am following the standard tutorial:
>>>>>>>>>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>>>>>
>>>>>>>>>> But when I finish and test, it can't recognize the text accurately.
>>>>>>>>>> My steps are:
>>>>>>>>>>
>>>>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox
>>>>>>>>>>
>>>>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train
>>>>>>>>>>
>>>>>>>>>> unicharset_extractor as.kadas.exp0.box
>>>>>>>>>>
>>>>>>>>>> shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr
>>>>>>>>>>
>>>>>>>>>> mftraining -F font_properties -U unicharset -O as.unicharset 
>>>>>>>>>> as.kadas.exp0.tr
>>>>>>>>>>
>>>>>>>>>> cntraining as.kadas.exp0.tr
>>>>>>>>>>
>>>>>>>>>> I don't have a word dictionary, so I skipped some steps.
>>>>>>>>>> Then I renamed some files, adding the as. prefix:
>>>>>>>>>>
>>>>>>>>>> combine_tessdata as.
>>>>>>>>>>
>>>>>>>>>> There was no error at any step up to and including combining, so I
>>>>>>>>>> am sure the training itself ran without problems.
>>>>>>>>>> But when I test a picture whose content is 13, the result is: ئئ
>>>>>>>>>> And when I test any words, the result is just ئ
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I also looked in D:\Little\Tesseract-OCR\tessdata and found these
>>>>>>>>>> files:
>>>>>>>>>>
>>>>>>>>>> ara.cube.bigrams
>>>>>>>>>> ara.cube.fold
>>>>>>>>>> ara.cube.lm
>>>>>>>>>> ara.cube.nn
>>>>>>>>>> ara.cube.params
>>>>>>>>>> ara.cube.size
>>>>>>>>>> ara.cube.word-freq
>>>>>>>>>> ara.traineddata
>>>>>>>>>>
>>>>>>>>>> I can't understand why the Arabic trained data is not just
>>>>>>>>>> ara.traineddata. What are the other ara.* files? Are they necessary
>>>>>>>>>> if I am training my own language?
>>>>>>>>>> And how can I find or create those files?
>>>>>>>>>>
>>>>>>>>>> Thanks very much...
>>>>>>>>>>
>>>>>>>>>>  -- 
>>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>>> To post to this group, send email to [email protected]
>>>>>>>>>>
>>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>>> tesseract-oc...@googlegroups.com
>>>>>>>>>>
>>>>>>>>>> For more options, visit this group at
>>>>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> ``All that is gold does not glitter,
>>>>>   not all those who wander are lost;
>>>>> the old that is strong does not wither,
>>>>>   deep roots are not reached by the frost.
>>>>> From the ashes a fire shall be woken,
>>>>>   a light from the shadows shall spring;
>>>>> renewed shall be blade that was broken,
>>>>>   the crownless again shall be king.” 
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
>
>
>
