Regarding cube:

   - there is no more public information about cube than the 92 hits on
   the forum I already mentioned (+ source code ;-))
   - there is no information on how to create cube data files (OK, some of
   them are text files...)


So you can:

   1. try to use/train tesseract without the cube part (IMO you will need for
   it for cube, because it looks like some cube files are part of the
   traineddata file [1])
   2. try to analyze the cube data and share your findings - it
   can encourage more people to have a look at it :-)

[1]
http://tesseract-ocr.googlecode.com/svn/trunk/doc/combine_tessdata.1.html#_components
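
For anyone who wants to start on option 2, here is a minimal Python sketch
for a first survey of the cube files. It is not part of tesseract. The
function names (looks_like_text, survey_cube_files) are made up for this
example, the suffix list is copied from the ara.cube.* listing quoted
further down, and the text/binary check is only a rough heuristic (no NUL
bytes, first bytes decode as UTF-8), so treat the result as a hint and
nothing more.

```python
import codecs
from pathlib import Path

# Suffixes copied from the ara.cube.* listing quoted in this thread.
# Other languages or versions may ship a different set (assumption).
CUBE_SUFFIXES = ["bigrams", "fold", "lm", "nn", "params", "size", "word-freq"]


def looks_like_text(path, sample_size=4096):
    """Rough heuristic: no NUL bytes and the sample decodes as UTF-8."""
    with path.open("rb") as handle:
        sample = handle.read(sample_size)
    if b"\x00" in sample:
        return False
    # An incremental decoder tolerates a multi-byte character that was
    # cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def survey_cube_files(tessdata_dir, lang):
    """Map each cube suffix to 'missing', 'text' or 'binary' for one language."""
    report = {}
    for suffix in CUBE_SUFFIXES:
        path = Path(tessdata_dir) / f"{lang}.cube.{suffix}"
        if not path.is_file():
            report[suffix] = "missing"
        elif looks_like_text(path):
            report[suffix] = "text"
        else:
            report[suffix] = "binary"
    return report
```

Calling survey_cube_files(your_tessdata_dir, "ara") returns a dict with one
entry per suffix. The files reported as "text" are the ones worth opening
in an editor first.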

Zdenko


On Thu, Jan 17, 2013 at 5:33 PM, gold snake <[email protected]> wrote:

> Arabic and English fonts are very different.
> With an English font, if you input a+b, the result is: ab
> but with an Arabic font, if you input ئ+ا the result is ئا. If you don't
> understand, copy ئا and add a space in the middle; you will see that
> two letters typed together produce a new joined shape.
>
> My language does this too, so I'm just afraid that cube is what controls
> this. If cube is for this, that's terrible, because I don't know how to
> create the files (I don't mean you should tell me how; I just need some
> example or documentation about it.)
>
> And about RTL: it looks like there is no built-in way to handle it, so
> maybe we can only handle it in our own program (after reading finishes,
> change the display mode... something like that).
>
> thanks.
>
> On Thursday, January 17, 2013 at 10:36:44 PM UTC+8, sventech wrote:
>>
>> OK, the fact that cube is something different than combining languages is
>> a major revelation to me. However, huangjingshe, I don't think you need the
>> cube feature for what you're doing. I believe the problem you're having is
>> something else. I would solve the other issues first and then maybe try the
>> cube feature if necessary.
>> --Sven
>>
>>
>> On Wed, Jan 16, 2013 at 10:07 PM, gold snake <[email protected]> wrote:
>>
>>> Thanks again, but I have the same question. If cube is just for combining
>>> with other languages when training, why can we choose cube mode when
>>> reading a document, just like Sven said?
>>>
>>> Do you mean we can combine with another language using -l [lang] because
>>> it has cube files, and if there are no cube files we can't use
>>> -l [lang]?
>>>
>>> But I tested it, and everybody knows Chinese only has a .traineddata
>>> file, not cube files, yet I can use
>>> tesseract -l chi_sim [lang].[fontname].exp0.tif [lang].[fontname].exp0
>>> batch.nochop makeb
>>>
>>> So maybe it's not about the cube files, or I'm not using it right...
>>>
>>>
>>> On Thursday, January 17, 2013 at 3:34:25 AM UTC+8, sventech wrote:
>>>>
>>>> Cube means combining different languages. There is not much
>>>> documentation on it -- Google developed it internally. But I don't think
>>>> you need it. The list of files you sent is related to the cube feature, so
>>>> you don't need to create them. For right to left, search the archives for
>>>> "right to left" -- someone wrote a python script to convert, though he
>>>> didn't provide info about how to use it.
>>>>
>>>> utility to convert training files:
>>>> https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/rtl/tesseract-ocr/T035ZyQVlMU/tQVoGWdlBDMJ
>>>>
>>>> basic trick for right to left output from Dmitri Silaev:
>>>> https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/right$20to$20left$20output/tesseract-ocr/8r2qGvMzz9U/so1WuMTyaU8J
>>>> --Sven
>>>>
>>>>
>>>> On Wed, Jan 16, 2013 at 10:57 AM, gold snake <[email protected]>wrote:
>>>>
>>>>> So you mean cube exists only so that users can combine a language with
>>>>> other languages, which means I don't need it (because my language is not
>>>>> Arabic). Thanks. Maybe my English is not good; I just can't understand
>>>>> what "cube" is or what it is used for, and I can't find an introduction.
>>>>>
>>>>> And that means cube has nothing to do with my result being left to right
>>>>> (the correct result must be right to left). Then why, when I use the
>>>>> command: tesseract 14.jpg output -l [lang], is the content of the result
>>>>> (output.txt) left to right?
>>>>>
>>>>> I'm very sorry if I'm taking up the experts' valuable time with these
>>>>> small problems. Just a few days ago I didn't even know what OCR was.
>>>>> If I could find the answers to these questions myself, believe me, I
>>>>> would not ask anybody, because I really understand that everyone is very
>>>>> busy. So from now on I'm trying hard to search for answers myself.
>>>>> Sorry again....
>>>>>
>>>>> On Wednesday, January 16, 2013 at 10:34:21 PM UTC+8, sventech wrote:
>>>>>>
>>>>>> The reason why Arabic has those files and your language does not is
>>>>>> that Arabic is set up to use the "cube" feature to combine it with other
>>>>>> languages, so you can do "-l ara+eng" and OCR a document with both Arabic
>>>>>> and English. That training is harder, and not necessary if you mainly
>>>>>> want to do monolingual documents.
>>>>>>
>>>>>> And what Zdenko is saying is that you are asking questions that don't
>>>>>> show that you've tried to solve the problem yourself. We're all
>>>>>> professional programmers and we want to help people but we don't have
>>>>>> time to teach elementary web searching or programming. You seem to be a
>>>>>> smart guy, but your questions appear to be lazy. You need to make an
>>>>>> effort to solve the problems and come to us for help, not ask us to
>>>>>> solve them for you.
>>>>>> --Sven
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 16, 2013 at 2:59 AM, gold snake <[email protected]>wrote:
>>>>>>
>>>>>>> I can't find any answer to my question at this link.
>>>>>>> Can you just talk to me? Is it necessary to bully a rookie?
>>>>>>> Please...
>>>>>>>
>>>>>>> On Wednesday, January 16, 2013 at 4:02:25 PM UTC+8, zdenop wrote:
>>>>>>>>
>>>>>>>> Really ;-)? I got 93 results. E.g.:
>>>>>>>>
>>>>>>>> https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ
>>>>>>>> https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion
>>>>>>>> https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J
>>>>>>>>
>>>>>>>> Please respect the time of people on this list...
>>>>>>>>
>>>>>>>> Zdenko
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 16, 2013 at 8:18 AM, gold snake <[email protected]>wrote:
>>>>>>>>
>>>>>>>>> I can't find anything. Come on....
>>>>>>>>>
>>>>>>>>> On Tuesday, January 15, 2013 at 10:38:42 PM UTC+8, zdenop wrote:
>>>>>>>>>>
>>>>>>>>>> Search the archive of the tesseract forums for cube.
>>>>>>>>>>
>>>>>>>>>> Zdenko
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 15, 2013 at 2:16 PM, gold snake <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> My language is somewhat special: its script is like Arabic, but
>>>>>>>>>>> differs from Arabic in some ways, actually only in the shape of
>>>>>>>>>>> the letters. It is also written right to left.
>>>>>>>>>>> I'm using the standard tutorial:
>>>>>>>>>>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>>>>>>
>>>>>>>>>>> But when I finish and test, it can't recognize text accurately.
>>>>>>>>>>> My steps are:
>>>>>>>>>>>
>>>>>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox
>>>>>>>>>>>
>>>>>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train
>>>>>>>>>>>
>>>>>>>>>>> unicharset_extractor as.kadas.exp0.box
>>>>>>>>>>>
>>>>>>>>>>> shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr
>>>>>>>>>>>
>>>>>>>>>>> mftraining -F font_properties -U unicharset -O as.unicharset as.kadas.exp0.tr
>>>>>>>>>>>
>>>>>>>>>>> cntraining as.kadas.exp0.tr
>>>>>>>>>>>
>>>>>>>>>>> I don't have a word dictionary, so I skipped some steps.
>>>>>>>>>>> I renamed some files, adding the as. prefix
>>>>>>>>>>>
>>>>>>>>>>> combine_tessdata as.
>>>>>>>>>>>
>>>>>>>>>>> There was no error all the way through the combine step, so I'm
>>>>>>>>>>> sure there is no problem there.
>>>>>>>>>>> But when I test a picture whose content is 13, the result is: ئئ
>>>>>>>>>>> and when I test any word, the result is just ئ
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Then I looked in D:\Little\Tesseract-OCR\tessdata and
>>>>>>>>>>> found these files:
>>>>>>>>>>>
>>>>>>>>>>> ara.cube.bigrams
>>>>>>>>>>> ara.cube.fold
>>>>>>>>>>> ara.cube.lm
>>>>>>>>>>> ara.cube.nn
>>>>>>>>>>> ara.cube.params
>>>>>>>>>>> ara.cube.size
>>>>>>>>>>> ara.cube.word-freq
>>>>>>>>>>> ara.traineddata
>>>>>>>>>>>
>>>>>>>>>>> I can't understand why the Arabic trained data isn't just
>>>>>>>>>>> ara.traineddata. What are the other ara.* files? Are they
>>>>>>>>>>> necessary if I'm training my own language?
>>>>>>>>>>> And how can I find or create those files?
>>>>>>>>>>>
>>>>>>>>>>> Thanks very much...
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>>>> To post to this group, send email to [email protected]
>>>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>>>> tesseract-oc...@googlegroups.com
>>>>>>>>>>> For more options, visit this group at
>>>>>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ``All that is gold does not glitter,
>>>>>>   not all those who wander are lost;
>>>>>> the old that is strong does not wither,
>>>>>>   deep roots are not reached by the frost.
>>>>>> From the ashes a fire shall be woken,
>>>>>>   a light from the shadows shall spring;
>>>>>> renewed shall be blade that was broken,
>>>>>>   the crownless again shall be king.”
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>

