OK, the fact that cube is something different than combining languages is a
major revelation to me. However, huangjingshe, I don't think you need the
cube feature for what you're doing. I believe the problem you're having is
something else. I would solve the other issues first and then maybe try the
cube feature if necessary.
--Sven


On Wed, Jan 16, 2013 at 10:07 PM, gold snake <[email protected]> wrote:

> Thanks again, but I still have the same question. If cube is only for
> combining with other languages during training, why can we choose cube
> mode when reading a document, as Sven said?
>
> Do you mean that we can combine with another language using -l [lang]
> because it has cube files, and that without any cube file we can't use
> -l [lang]?
>
> But I tested it. As everybody knows, Chinese has only a .traineddata
> file and no cube files, yet I can run
> tesseract -l chi_sim [lang].[fontname].exp0.tif [lang].[fontname].exp0
> batch.nochop makebox
>
> So maybe it's not about the cube files, or I'm not using it right.
>
>
> On Thursday, January 17, 2013 at 3:34:25 AM UTC+8, sventech wrote:
>>
>> Cube means combining different languages. There is not much documentation
>> on it -- Google developed it internally. But I don't think you need it. The
>> list of files you sent is related to the cube feature, so you don't need to
>> create them. For right to left, search the archives for "right to left" --
>> someone wrote a python script to convert, though he didn't provide info
>> about how to use it.
>>
>> utility to convert training files:
>> https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/rtl/tesseract-ocr/T035ZyQVlMU/tQVoGWdlBDMJ
>>
>> basic trick for right to left output from Dmitri Silaev:
>> https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/right$20to$20left$20output/tesseract-ocr/8r2qGvMzz9U/so1WuMTyaU8J
>> --Sven
>>
>>
>> On Wed, Jan 16, 2013 at 10:57 AM, gold snake <[email protected]> wrote:
>>
>>> So you mean cube exists only so that users can combine a language with
>>> other languages, which means I don't need it (because my language is
>>> not Arabic). Thanks. Maybe my English is not good; I just can't
>>> understand what "cube" is or what it is used for, and I can't find an
>>> introduction to it.
>>>
>>> And that means cube has nothing to do with my result being left to
>>> right (the correct result must be right to left). Then why, when I run
>>> the command: tesseract 14.jpg output -l [lang], is the content of the
>>> result (output.txt) left to right?
>>>
>>> I'm very sorry if I am taking up the experts' valuable time with these
>>> small problems. Just a few days ago I didn't even know what OCR was.
>>> If I could find the answers to these questions myself, believe me, I
>>> would not ask anybody, because I really understand that everyone is
>>> very busy. So from now on I'm trying hard to search for answers
>>> myself. Sorry again.
>>>
>>> On Wednesday, January 16, 2013 at 10:34:21 PM UTC+8, sventech wrote:
>>>>
>>>> The reason why Arabic has those files and your language does not is
>>>> that Arabic is set up to use the "cube" feature to combine it with other
>>>> languages, so you can do "-l ara+eng" and OCR a document with both Arabic
>>>> and English. That training is harder, and not necessary if you mainly want
>>>> to do monolingual documents.
>>>>
>>>> And what Zdenko is saying is that you are asking questions that don't
>>>> show that you've tried to solve the problem yourself. We're all
>>>> professional programmers and we want to help people but we don't have time
>>>> to teach elementary web searching or programming. You seem to be a smart
>>>> guy, but your questions appear to be lazy. You need to make an effort to
>>>> solve the problems and come to us for help, not ask us to solve them for
>>>> you.
>>>> --Sven
>>>>
>>>>
>>>> On Wed, Jan 16, 2013 at 2:59 AM, gold snake <[email protected]> wrote:
>>>>
>>>>> I couldn't find any answer to my question at this link.
>>>>> Can you just talk to me? Is it necessary to bully a rookie?
>>>>> Please...
>>>>>
>>>>> On Wednesday, January 16, 2013 at 4:02:25 PM UTC+8, zdenop wrote:
>>>>>>
>>>>>> Really ;-)? I got 93 results. E.g.:
>>>>>>
>>>>>> https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ
>>>>>> https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion
>>>>>> https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J
>>>>>>
>>>>>> Please honor time of people on this list...
>>>>>>
>>>>>> Zdenko
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 16, 2013 at 8:18 AM, gold snake <[email protected]> wrote:
>>>>>>
>>>>>>> I can't find anything. Come on....
>>>>>>>
>>>>>>> On Tuesday, January 15, 2013 at 10:38:42 PM UTC+8, zdenop wrote:
>>>>>>>>
>>>>>>>> Search the archive of the tesseract forum for cube.
>>>>>>>>
>>>>>>>> Zdenko
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 15, 2013 at 2:16 PM, gold snake <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> My language is somewhat special. It is like the Arabic script,
>>>>>>>>> but differs from Arabic in some ways; actually it differs only
>>>>>>>>> in the shape of the font. It is also written right to left.
>>>>>>>>> I'm using the standard tutorial:
>>>>>>>>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>>>>
>>>>>>>>> But when I finished and tested it, it could not recognize the
>>>>>>>>> text accurately. My steps are:
>>>>>>>>>
>>>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox
>>>>>>>>>
>>>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train
>>>>>>>>>
>>>>>>>>> unicharset_extractor as.kadas.exp0.box
>>>>>>>>>
>>>>>>>>> shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr
>>>>>>>>>
>>>>>>>>> mftraining -F font_properties -U unicharset -O as.unicharset
>>>>>>>>> as.kadas.exp0.tr
>>>>>>>>>
>>>>>>>>> cntraining as.kadas.exp0.tr
>>>>>>>>>
>>>>>>>>> I don't have a word dictionary, so I skipped some steps.
>>>>>>>>> Then I renamed some files, adding the as. prefix.
>>>>>>>>>
>>>>>>>>> combine_tessdata as.
>>>>>>>>>
>>>>>>>>> There were no errors up to the combine step, so I'm sure there
>>>>>>>>> is no problem there.
>>>>>>>>> But when I test a picture whose content is 13, the result is: ئئ
>>>>>>>>> When I test any word, the result is just ئ
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I also looked in D:\Little\Tesseract-OCR\tessdata and found
>>>>>>>>> these files:
>>>>>>>>>
>>>>>>>>> ara.cube.bigrams
>>>>>>>>> ara.cube.fold
>>>>>>>>> ara.cube.lm
>>>>>>>>> ara.cube.nn
>>>>>>>>> ara.cube.params
>>>>>>>>> ara.cube.size
>>>>>>>>> ara.cube.word-freq
>>>>>>>>> ara.traineddata
>>>>>>>>>
>>>>>>>>> I can't understand why the Arabic training data is not just
>>>>>>>>> ara.traineddata. What are the other ara.* files? Are they
>>>>>>>>> necessary if I'm training my own language?
>>>>>>>>> And how can I find or create those files?
>>>>>>>>>
>>>>>>>>> thanks very much...
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To post to this group, send email to [email protected]
>>>>>>>>>
>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>> tesseract-oc...@googlegroups.com
>>>>>>>>>
>>>>>>>>> For more options, visit this group at
>>>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>>>
>>>>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ``All that is gold does not glitter,
>>>>   not all those who wander are lost;
>>>> the old that is strong does not wither,
>>>>   deep roots are not reached by the frost.
>>>> From the ashes a fire shall be woken,
>>>>   a light from the shadows shall spring;
>>>> renewed shall be blade that was broken,
>>>>   the crownless again shall be king.”
>>>>
>>