Cube means combining different languages. There is not much documentation
on it; Google developed it internally. But I don't think you need it. The
files you listed belong to the cube feature, so you don't need to
create them. For right to left, search the archives for "right to left":
someone wrote a Python script to do the conversion, though he didn't explain
how to use it.

utility to convert training files:
https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/rtl/tesseract-ocr/T035ZyQVlMU/tQVoGWdlBDMJ

basic trick for right to left output from Dmitri Silaev:
https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/right$20to$20left$20output/tesseract-ocr/8r2qGvMzz9U/so1WuMTyaU8J
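
To give a rough idea of what such a conversion could look like, here is a
minimal Python sketch. It is not the script or the trick from the links
above; it only assumes that the OCR output holds each line's characters in
reversed order, and the function names are made up for illustration.

```python
def reverse_line_characters(text: str) -> str:
    """Reverse the characters of every line while keeping the line order.

    Naive on purpose: combining marks end up before their base letter, and
    embedded left-to-right runs such as digits are reversed as well.
    """
    # split("\n") rather than splitlines() so a trailing newline survives.
    return "\n".join(line[::-1] for line in text.split("\n"))


def convert_file(src_path: str, dst_path: str) -> None:
    """Read UTF-8 text from src_path and write the reversed lines to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        converted = reverse_line_characters(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(converted)
```

Something like convert_file("output.txt", "output.rtl.txt") would then be run
on the text file Tesseract wrote; the second file name is just an example.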
--Sven


On Wed, Jan 16, 2013 at 10:57 AM, gold snake <[email protected]> wrote:

> So you mean cube exists only so that users can combine a language with
> other languages, which means I don't need it (because my language is not
> Arabic). Thanks. Maybe my English is not good. I just can't understand
> what "cube" is or what it is used for, and I can't find an introduction.
>
> And that means there is no relationship between cube and my result being
> left to right (the accurate result must be right to left). Then why, when I
> use the command: tesseract 14.jpg output -l [lang], is the content of the
> result (output.txt) left to right?
>
> I'm very sorry if I am taking up the experts' valuable time with these
> small problems. Just a few days ago I didn't even know what OCR was.
> If I could find the answers to these questions myself, believe me, I would
> not ask anybody, because I really understand that everyone is very busy.
> So I'm trying hard to search for answers from now on. Sorry again.
>
> On Wednesday, January 16, 2013 at 10:34:21 PM UTC+8, sventech wrote:
>>
>> The reason why Arabic has those files and your language does not is that
>> Arabic is set up to use the "cube" feature to combine it with other
>> languages, so you can do "-l ara+eng" and OCR a document with both Arabic
>> and English. That training is harder, and not necessary if you mainly want
>> to do monolingual documents.
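
A minimal Python sketch of how the "-l ara+eng" invocation above can be
assembled. The helper name and the file names are made up for illustration;
the only Tesseract-specific part is that language codes are joined with "+".

```python
from typing import List, Sequence


def build_tesseract_command(image: str, output_base: str,
                            languages: Sequence[str]) -> List[str]:
    """Build the argument list for a multi-language tesseract run."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Several languages are passed as a single -l value joined by "+".
    return ["tesseract", image, output_base, "-l", "+".join(languages)]
```

The resulting list can be handed to subprocess.run(cmd, check=True), provided
the traineddata file for every listed language is installed.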
>>
>> And what Zdenko is saying is that you are asking questions that don't
>> show that you've tried to solve the problem yourself. We're all
>> professional programmers and we want to help people but we don't have time
>> to teach elementary web searching or programming. You seem to be a smart
>> guy, but your questions appear to be lazy. You need to make an effort to
>> solve the problems and come to us for help, not ask us to solve them for
>> you.
>> --Sven
>>
>>
>> On Wed, Jan 16, 2013 at 2:59 AM, gold snake <[email protected]> wrote:
>>
>>> I can't find any answer to my question at this link.
>>> Can you just talk to me? Is it necessary to bully a rookie?
>>> Please...
>>>
>>> On Wednesday, January 16, 2013 at 4:02:25 PM UTC+8, zdenop wrote:
>>>>
>>>> Really ;-)? I got 93 results. E.g.:
>>>>
>>>> https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ
>>>> https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion
>>>> https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J
>>>>
>>>> Please respect the time of the people on this list...
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Wed, Jan 16, 2013 at 8:18 AM, gold snake <[email protected]> wrote:
>>>>
>>>>> I can't find anything. Come on....
>>>>>
>>>>> On Tuesday, January 15, 2013 at 10:38:42 PM UTC+8, zdenop wrote:
>>>>>>
>>>>>> Search the archive of the tesseract forum for cube.
>>>>>>
>>>>>> Zdenko
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 15, 2013 at 2:16 PM, gold snake <[email protected]> wrote:
>>>>>>
>>>>>>> My language is somewhat special. Its script looks like Arabic but
>>>>>>> differs from it in some ways, actually only in the shape of the font.
>>>>>>> It is also written right to left.
>>>>>>> I'm using the standard tutorial:
>>>>>>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>>
>>>>>>> But when I finish and test, it can't recognize the text accurately.
>>>>>>> My steps are:
>>>>>>>
>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox
>>>>>>>
>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train
>>>>>>>
>>>>>>> unicharset_extractor as.kadas.exp0.box
>>>>>>>
>>>>>>> shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr
>>>>>>>
>>>>>>> mftraining -F font_properties -U unicharset -O as.unicharset as.kadas.exp0.tr
>>>>>>>
>>>>>>> cntraining as.kadas.exp0.tr
>>>>>>>
>>>>>>> I don't have a word dictionary, so I skipped some steps.
>>>>>>> Then I renamed some files, adding the as. prefix, and ran:
>>>>>>>
>>>>>>> combine_tessdata as.
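
For reference, the steps listed above can be collected into a small Python
sketch that only builds the command lines and runs nothing. The function
names are made up for illustration, and it assumes the file naming shown in
the steps, lang.font.expN, plus an existing font_properties file.

```python
from typing import List


def training_commands(lang: str, font: str, exp: int = 0) -> List[List[str]]:
    """Return the commands from makebox through cntraining, in order."""
    base = f"{lang}.{font}.exp{exp}"
    tr_file = f"{base}.tr"
    return [
        # Generate a box file; it normally needs manual correction afterwards.
        ["tesseract", f"{base}.tif", base, "batch.nochop", "makebox"],
        # Produce the .tr feature file from the image and its box file.
        ["tesseract", f"{base}.tif", base, "nobatch", "box.train"],
        ["unicharset_extractor", f"{base}.box"],
        ["shapeclustering", "-F", "font_properties", "-U", "unicharset", tr_file],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", tr_file],
        ["cntraining", tr_file],
    ]


def combine_command(lang: str) -> List[str]:
    """Return the final step; run it only after the output files of the
    previous steps have been renamed with the "<lang>." prefix."""
    return ["combine_tessdata", f"{lang}."]
```

training_commands("as", "kadas") reproduces the six commands above. Each list
can be passed to subprocess.run(cmd, check=True) so that the first failing
step stops the run.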
>>>>>>>
>>>>>>> There was no error up to and including the combine step, so I'm
>>>>>>> sure the training itself has no problem.
>>>>>>> But when I test a picture whose content is 13, the result is: ئئ
>>>>>>> When I test any word, the result is just ئ
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I looked in D:\Little\Tesseract-OCR\tessdata and found these
>>>>>>> files:
>>>>>>>
>>>>>>> ara.cube.bigrams
>>>>>>> ara.cube.fold
>>>>>>> ara.cube.lm
>>>>>>> ara.cube.nn
>>>>>>> ara.cube.params
>>>>>>> ara.cube.size
>>>>>>> ara.cube.word-freq
>>>>>>> ara.traineddata
>>>>>>>
>>>>>>> I can't understand why the Arabic trained data is not just
>>>>>>> ara.traineddata. What are the other ara.* files? Are they
>>>>>>> necessary if I'm training my own language?
>>>>>>> And how can I find or create those files?
>>>>>>>
>>>>>>> thanks very much...
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To post to this group, send email to [email protected]
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> tesseract-oc...@googlegroups.com
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> ``All that is gold does not glitter,
>>   not all those who wander are lost;
>> the old that is strong does not wither,
>>   deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>>   a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>>   the crownless again shall be king.”
>>
>




