If I find a way to create cube data for my language, I will definitely use it.
Thanks anyway. That result is important.

On Friday, January 18, 2013 at 6:41:28 AM UTC+8, Patrick Questembert wrote:
>
> Yes, cube remains a mystery for the common mortals ... I am experimenting 
> with it within ScanBizCards and here are my findings so far running 
> Tesseract 3.02 on a black & white rendition of a standard business card 
> (image size 1,024x768), on an iPhone 4S:
>
> 1. OcrEngineMode=OEM_TESSERACT_ONLY          // Tess sources comment: Run 
> Tesseract only - fastest
> Time: 6 seconds
> Accuracy: good
>
> 2. OEM_CUBE_ONLY             // Tess sources comment: Run Cube only - 
> better accuracy, but slower
> Time: 53 (!) seconds
> Accuracy: I have yet to run it on a large enough sample but for now I am 
> not convinced this mode is more accurate than OEM_TESSERACT_ONLY, at least 
> for business cards
>
> 3. OEM_TESSERACT_CUBE_COMBINED  // Tess sources comment: Run both and 
> combine results - best accuracy
> Time: 63 (!) seconds
> Accuracy: best, improves on OEM_TESSERACT_ONLY
>
> As you can see, the performance penalty for cube is severe but if you need 
> highest accuracy I would recommend skipping OEM_CUBE_ONLY and using 
> OEM_TESSERACT_CUBE_COMBINED
>
> Patrick
>
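To put the timings quoted above in proportion, the slowdown of each mode relative to OEM_TESSERACT_ONLY can be worked out directly. This is only arithmetic on the three reported figures (6, 53 and 63 seconds for one business-card image); the dictionary and the function name `slowdown` are illustrative and not part of any Tesseract API.

```python
# Timings reported in the message above (seconds, one 1,024x768 image).
timings = {
    "OEM_TESSERACT_ONLY": 6,
    "OEM_CUBE_ONLY": 53,
    "OEM_TESSERACT_CUBE_COMBINED": 63,
}


def slowdown(timings, baseline="OEM_TESSERACT_ONLY"):
    """Return each mode's run time as a multiple of the baseline mode."""
    base = timings[baseline]
    return {mode: seconds / base for mode, seconds in timings.items()}


ratios = slowdown(timings)
for mode, ratio in ratios.items():
    print(f"{mode}: {ratio:.1f}x")
```

On these figures, cube alone takes roughly 9 times as long as Tesseract alone, and the combined mode about 10.5 times as long. A single image is a small sample, so the ratios are indicative only.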
> On Thu, Jan 17, 2013 at 5:26 PM, zdenko podobny wrote:
>
>> Regarding cube:
>>
>>    - there is no more public information about cube than the 92 hits 
>>    in the forum that I already mentioned (+ source code ;-)) 
>>    - there is no information on how to create cube data files (OK, some 
>>    of them are text files...) 
>>
>>
>> So you can:
>>
>>    1. try to use/train tesseract without the cube part (IMO you will need 
>>    for it for cube, because it looks like some cube files are part of 
>>    the traineddata file [1])
>>    2. try to analyze the cube data and share your findings - it 
>>    can encourage more people to have a look at it :-) 
>>
>> [1] 
>> http://tesseract-ocr.googlecode.com/svn/trunk/doc/combine_tessdata.1.html#_components
>>
>> Zdenko
>>
>>
>> On Thu, Jan 17, 2013 at 5:33 PM, gold snake wrote:
>>
>>> Arabic and English letters behave quite differently. 
>>> With English letters, if you input a+b, the result is: ab
>>> But with Arabic letters, if you input ئ+ا, the result is ئا. If that is 
>>> not clear, copy ئا and add a space in the middle: you will see that when 
>>> two different letters are joined, the result is a new letter shape.
>>>
>>> My language behaves the same way, so I am afraid that cube is what 
>>> handles this. If cube is for this, that is terrible, because I don't know 
>>> how to create the cube files. (I am not asking you to tell me how; I just 
>>> need an example or some documentation about this.)
>>>
>>> As for RTL, it looks like there is no built-in way to handle it, so maybe 
>>> we can only handle it in our own code (for example, changing the display 
>>> direction after recognition finishes, or something like that).
>>>
>>> Thanks.
>>>
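The post-processing idea mentioned above (fixing the direction in code after recognition) could look roughly like the sketch below. It assumes the OCR text comes out with the characters of each line in left-to-right visual order, so that reversing every line restores right-to-left logical order. That assumption may not hold for every setup, and this is not the script or the trick from the forum posts linked further down. The function name `visual_to_logical` is hypothetical.

```python
def visual_to_logical(text):
    """Reverse the characters of each line, keeping the line order.

    Assumption: every line is purely right-to-left text stored in
    left-to-right visual order. Lines that mix in digits or Latin
    words, or that contain combining marks, would need proper
    bidirectional handling instead of a plain reversal.
    """
    return "\n".join(line[::-1] for line in text.split("\n"))


# Example: two lines whose characters came out in visual order.
sample = "ابج\nدهو"
print(visual_to_logical(sample))
```

Applying the function twice returns the original text, which is a quick way to check that nothing is lost in the conversion.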
>>> On Thursday, January 17, 2013 at 10:36:44 PM UTC+8, sventech wrote:
>>>>
>>>> OK, the fact that cube is something different than combining languages 
>>>> is a major revelation to me. However, huangjingshe, I don't think you need 
>>>> the cube feature for what you're doing. I believe the problem you're 
>>>> having is something else. I would solve the other issues first and then 
>>>> maybe try the cube feature if necessary.
>>>> --Sven
>>>>
>>>>
>>>> On Wed, Jan 16, 2013 at 10:07 PM, gold snake wrote:
>>>>
>>>>> Thanks again, but I still have a question. If cube is only for combining 
>>>>> with other languages during training, why can we choose a cube mode when 
>>>>> reading a document, as Sven said?
>>>>>
>>>>> Do you mean that we can combine with another language using -l [lang] 
>>>>> because it has cube files, and that without any cube files we can't use 
>>>>> -l [lang]?
>>>>>
>>>>> But I tested this. As everybody knows, the Chinese language data has 
>>>>> only a .traineddata file and no cube files, yet I can run 
>>>>> tesseract -l chi_sim [lang].[fontname].exp0.tif [lang].[fontname].exp0 
>>>>> batch.nochop makeb
>>>>>
>>>>> So maybe it is not about the cube files, or I am not using it correctly...
>>>>>
>>>>>
>>>>> On Thursday, January 17, 2013 at 3:34:25 AM UTC+8, sventech wrote:
>>>>>>
>>>>>> Cube means combining different languages. There is not much 
>>>>>> documentation on it -- Google developed it internally. But I don't think 
>>>>>> you need it. The list of files you sent is related to the cube feature, 
>>>>>> so you don't need to create them. For right to left, search the archives 
>>>>>> for "right to left" -- someone wrote a python script to convert, though 
>>>>>> he didn't provide info about how to use it.
>>>>>>
>>>>>> utility to convert training files:
>>>>>> https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/rtl/tesseract-ocr/T035ZyQVlMU/tQVoGWdlBDMJ
>>>>>>
>>>>>> basic trick for right to left output from Dmitri Silaev:
>>>>>> https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/right$20to$20left$20output/tesseract-ocr/8r2qGvMzz9U/so1WuMTyaU8J
>>>>>> --Sven
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 16, 2013 at 10:57 AM, gold snake wrote:
>>>>>>
>>>>>>> So you mean that cube exists only so that users can combine a language 
>>>>>>> with other languages, which means I don't need it (because my language 
>>>>>>> is not Arabic). Thanks. Maybe my English is not good; I just can't 
>>>>>>> understand what "cube" is or what it is used for, and I can't find an 
>>>>>>> introduction to it.
>>>>>>>
>>>>>>> And that means cube has nothing to do with my result being left to 
>>>>>>> right (the correct result must be right to left). Then why, when I run 
>>>>>>> the command: tesseract 14.jpg output -l [lang], is the content of the 
>>>>>>> result (output.txt) left to right?
>>>>>>>
>>>>>>> I am very sorry if I am taking up the experts' valuable time with these 
>>>>>>> small problems. Just a few days ago I did not even know what OCR was. 
>>>>>>> If I could find the answers to these questions myself, believe me, I 
>>>>>>> would not ask anybody, because I really understand that everyone is 
>>>>>>> very busy. So from now on I am trying hard to search for solutions 
>>>>>>> myself. Sorry again...
>>>>>>>
>>>>>>> On Wednesday, January 16, 2013 at 10:34:21 PM UTC+8, sventech wrote:
>>>>>>>>
>>>>>>>> The reason why Arabic has those files and your language does not is 
>>>>>>>> that Arabic is set up to use the "cube" feature to combine it with 
>>>>>>>> other languages, so you can do "-l ara+eng" and OCR a document with 
>>>>>>>> both Arabic and English. That training is harder, and not necessary 
>>>>>>>> if you mainly want to do monolingual documents.
>>>>>>>>
>>>>>>>> And what Zdenko is saying is that you are asking questions that 
>>>>>>>> don't show that you've tried to solve the problem yourself. We're all 
>>>>>>>> professional programmers and we want to help people, but we don't 
>>>>>>>> have time to teach elementary web searching or programming. You seem 
>>>>>>>> to be a smart guy, but your questions appear to be lazy. You need to 
>>>>>>>> make an effort to solve the problems and then come to us for help, 
>>>>>>>> not ask us to solve them for you.
>>>>>>>> --Sven
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 16, 2013 at 2:59 AM, gold snake wrote:
>>>>>>>>
>>>>>>>>> I can't find any answer to my question at this link.
>>>>>>>>> Can you just talk to me? Is it necessary to bully a rookie?
>>>>>>>>> Please...
>>>>>>>>>
>>>>>>>>> On Wednesday, January 16, 2013 at 4:02:25 PM UTC+8, zdenop wrote:
>>>>>>>>>>
>>>>>>>>>> Really ;-)? I got 93 results. E.g.:
>>>>>>>>>>
>>>>>>>>>> https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ
>>>>>>>>>> https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion
>>>>>>>>>> https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J
>>>>>>>>>>
>>>>>>>>>> Please respect the time of people on this list...
>>>>>>>>>>
>>>>>>>>>> Zdenko
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 16, 2013 at 8:18 AM, gold snake wrote:
>>>>>>>>>>
>>>>>>>>>>> I can't find anything. Come on....
>>>>>>>>>>>
>>>>>>>>>>> On Tuesday, January 15, 2013 at 10:38:42 PM UTC+8, zdenop wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Search the archive of the tesseract forum for cube.
>>>>>>>>>>>>
>>>>>>>>>>>> Zdenko
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 15, 2013 at 2:16 PM, gold snake wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> My language is somewhat special. Its script is like Arabic, but 
>>>>>>>>>>>>> with some differences; actually the only difference is in the 
>>>>>>>>>>>>> shape of the letters. It is also written right to left.
>>>>>>>>>>>>> I'm following the standard tutorial: 
>>>>>>>>>>>>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>>>>>>>>
>>>>>>>>>>>>> But when I finish and test, it cannot recognize text accurately. 
>>>>>>>>>>>>> My steps are:
>>>>>>>>>>>>>
>>>>>>>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox
>>>>>>>>>>>>>
>>>>>>>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train
>>>>>>>>>>>>>
>>>>>>>>>>>>> unicharset_extractor as.kadas.exp0.box
>>>>>>>>>>>>>
>>>>>>>>>>>>> shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr
>>>>>>>>>>>>>
>>>>>>>>>>>>> mftraining -F font_properties -U unicharset -O as.unicharset as.kadas.exp0.tr
>>>>>>>>>>>>>
>>>>>>>>>>>>> cntraining as.kadas.exp0.tr
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't have a word dictionary, so I skipped some steps.
>>>>>>>>>>>>> Then I renamed some files, adding the as. prefix.
>>>>>>>>>>>>>
>>>>>>>>>>>>> combine_tessdata as.
>>>>>>>>>>>>>
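For reference, the steps listed above can be collected in one place. The sketch below only builds the command lines as argument lists and prints them; it does not run anything. The commands are the ones given in this message (language `as`, font `kadas`). The helper name `training_commands` is made up for illustration, and the dictionary steps that the message skips are left out here too.

```python
def training_commands(lang, font, exp=0):
    """Build the Tesseract 3 training commands listed in the message.

    Only constructs argument lists; nothing is executed.
    """
    base = f"{lang}.{font}.exp{exp}"
    tr_file = f"{base}.tr"
    return [
        ["tesseract", f"{base}.tif", base, "batch.nochop", "makebox"],
        ["tesseract", f"{base}.tif", base, "nobatch", "box.train"],
        ["unicharset_extractor", f"{base}.box"],
        ["shapeclustering", "-F", "font_properties", "-U", "unicharset", tr_file],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", tr_file],
        ["cntraining", tr_file],
        # Before this last step the message renames some output files
        # with the "<lang>." prefix; it does not say which files.
        ["combine_tessdata", f"{lang}."],
    ]


for cmd in training_commands("as", "kadas"):
    print(" ".join(cmd))
```

Keeping the commands as data makes it easier to compare a run against the tutorial step by step before looking for other causes of poor recognition.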
>>>>>>>>>>>>> There were no errors all the way through the combine step, so I 
>>>>>>>>>>>>> believe the procedure itself is fine.
>>>>>>>>>>>>> But when I test a picture whose content is 13, the result is: ئئ
>>>>>>>>>>>>> When I test any word, the result is just ئ
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I also looked in D:\Little\Tesseract-OCR\tessdata and found 
>>>>>>>>>>>>> these files:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ara.cube.bigrams
>>>>>>>>>>>>> ara.cube.fold
>>>>>>>>>>>>> ara.cube.lm
>>>>>>>>>>>>> ara.cube.nn
>>>>>>>>>>>>> ara.cube.params
>>>>>>>>>>>>> ara.cube.size
>>>>>>>>>>>>> ara.cube.word-freq
>>>>>>>>>>>>> ara.traineddata
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't understand: why does the Arabic trained data consist of 
>>>>>>>>>>>>> more than just ara.traineddata? What are the other ara.* files? 
>>>>>>>>>>>>> Are they necessary if I am training my own language?
>>>>>>>>>>>>> And how can I find or create those files?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks very much...
>>>>>>>>>>>>>
>>>>>>>>>>>>>  -- 
>>>>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>>>> To post to this group, send email to [email protected]
>>>>>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>>>>>> tesseract-oc...@googlegroups.com
>>>>>>>>>>>>> For more options, visit this group at
>>>>>>>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>>>>>>>
>>>>>>>> -- 
>>>>>>>> ``All that is gold does not glitter,
>>>>>>>>   not all those who wander are lost;
>>>>>>>> the old that is strong does not wither,
>>>>>>>>   deep roots are not reached by the frost.
>>>>>>>> From the ashes a fire shall be woken,
>>>>>>>>   a light from the shadows shall spring;
>>>>>>>> renewed shall be blade that was broken,
>>>>>>>>   the crownless again shall be king.” 
> -- 
> Patrick Questembert, *ScanBizCards*
> +1-917-250-4177 | www.scanbizcards.com
> twitter.com/ScanBizCards | www.facebook.com/ScanBizCards
> Just released: Power Contacts - 
> http://itunes.apple.com/us/app/power-contacts/id476986356?mt=8
>  
