On Wed, Jan 16, 2013 at 3:34 PM, Sven Pedersen <[email protected]> wrote:
> The reason why Arabic has those files and your language does not is that
> Arabic is set up to use the "cube" feature to combine it with other
> languages, so you can do "-l ara+eng" and OCR a document with both Arabic
> and English.
>

Are you sure? What is the reason for OEM_CUBE_ONLY mode[1]?
BTW: I changed OEM_DEFAULT[2] to OEM_TESSERACT_ONLY and simultaneous multi-language capability works for me...

[1] https://code.google.com/p/tesseract-ocr/source/browse/trunk/ccstruct/publictypes.h?r=820#244
[2] https://code.google.com/p/tesseract-ocr/source/browse/trunk/api/tesseractmain.cpp?r=814#138

> That training is harder, and not necessary if you mainly want to do
> monolingual documents.
>
> And what Zdenko is saying is that you are asking questions that don't show
> that you've tried to solve the problem yourself. We're all professional
> programmers and we want to help people but we don't have time to teach
> elementary web searching or programming. You seem to be a smart guy, but
> your questions appear to be lazy. You need to make an effort to solve the
> problems and come to us for help, not ask us to solve them for you.
> --Sven
>
> On Wed, Jan 16, 2013 at 2:59 AM, gold snake <[email protected]> wrote:
>
>> I can't find any answer to my question in these links.
>> Can you just talk to me? Is it necessary to bully a rookie?
>> Please...
>>
>> On Wednesday, January 16, 2013 at 4:02:25 PM UTC+8, zdenop wrote:
>>>
>>> Really ;-)? I got 93 results. E.g.:
>>>
>>> https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ
>>> https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion
>>> https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J
>>>
>>> Please honor the time of people on this list...
>>>
>>> Zdenko
>>>
>>> On Wed, Jan 16, 2013 at 8:18 AM, gold snake <[email protected]> wrote:
>>>
>>>> I can't find anything. Come on....
>>>>
>>>> On Tuesday, January 15, 2013 at 10:38:42 PM UTC+8, zdenop wrote:
>>>>>
>>>>> Search the archive of the tesseract forums for cube.
>>>>>
>>>>> Zdenko
>>>>>
>>>>> On Tue, Jan 15, 2013 at 2:16 PM, gold snake <[email protected]> wrote:
>>>>>
>>>>>> My language is somewhat special: its script is like Arabic, but it
>>>>>> differs from Arabic, actually only in the shapes of the letters.
>>>>>> It is also written right to left.
>>>>>> I'm using the standard tutorial:
>>>>>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>
>>>>>> But when I finish and test, it can't recognize the text accurately.
>>>>>> My steps are:
>>>>>>
>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox
>>>>>>
>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train
>>>>>>
>>>>>> unicharset_extractor as.kadas.exp0.box
>>>>>>
>>>>>> shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr
>>>>>>
>>>>>> mftraining -F font_properties -U unicharset -O as.unicharset as.kadas.exp0.tr
>>>>>>
>>>>>> cntraining as.kadas.exp0.tr
>>>>>>
>>>>>> I don't have a word dictionary, so I skipped some steps.
>>>>>> I renamed some files, adding the "as." prefix.
>>>>>>
>>>>>> combine_tessdata as.
>>>>>>
>>>>>> There were no errors up to and including the combine step, so I'm
>>>>>> sure that part has no problem.
>>>>>> But when I test a picture whose content is "13", the result is: ئئ
>>>>>> When I test any words, the result is just ئ
>>>>>>
>>>>>> I also looked in D:\Little\Tesseract-OCR\tessdata and found these
>>>>>> files:
>>>>>>
>>>>>> ara.cube.bigrams
>>>>>> ara.cube.fold
>>>>>> ara.cube.lm
>>>>>> ara.cube.nn
>>>>>> ara.cube.params
>>>>>> ara.cube.size
>>>>>> ara.cube.word-freq
>>>>>> ara.traineddata
>>>>>>
>>>>>> and I can't understand them. Why does the Arabic trained data not
>>>>>> consist of ara.traineddata alone? What are the other ara.* files?
>>>>>> Are they necessary if I'm training my own language?
>>>>>> And how can I find or create those files?
>>>>>>
>>>>>> Thanks very much...
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To post to this group, send email to [email protected]
>>>>>> To unsubscribe from this group, send email to
>>>>>> tesseract-oc...@googlegroups.com
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>
> --
> ``All that is gold does not glitter,
> not all those who wander are lost;
> the old that is strong does not wither,
> deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
> a light from the shadows shall spring;
> renewed shall be blade that was broken,
> the crownless again shall be king.”
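
For readers following the thread, the training steps quoted above can be collected into a small dry-run script. This is only a sketch: the function name `print_training_cmds` and its two arguments are hypothetical, introduced here for illustration, while the command lines themselves are copied from the original post (language `as`, font `kadas`). It prints the commands rather than running them, so the sequence can be checked before anything touches the training data.

```shell
#!/bin/sh
# Dry run: print the Tesseract 3 training commands quoted in this thread.
# print_training_cmds is a hypothetical helper; nothing is executed here.
print_training_cmds() {
    lang=$1                  # language code, "as" in the original post
    font=$2                  # font name, "kadas" in the original post
    base="$lang.$font.exp0"  # file stem used by every step

    echo "tesseract $base.tif $base batch.nochop makebox"
    # The box file produced above is normally reviewed by hand before training.
    echo "tesseract $base.tif $base nobatch box.train"
    echo "unicharset_extractor $base.box"
    echo "shapeclustering -F font_properties -U unicharset $base.tr"
    echo "mftraining -F font_properties -U unicharset -O $lang.unicharset $base.tr"
    echo "cntraining $base.tr"
    # The post renames the generated files with the "$lang." prefix before this step.
    echo "combine_tessdata $lang."
}

print_training_cmds as kadas
```

Running the script with `sh` prints the seven commands in order. Executing them for real assumes the Tesseract training tools are on `PATH` and that the `.tif` image and the `font_properties` file already exist in the working directory.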

