Arabic and English script behave very differently. In English, if you type a+b, the result is "ab". But in Arabic script, if you type ئ+ا, the result is ئا. If that is unclear, copy ئا and insert a space between the two letters: you will see that when two letters are typed next to each other, they are displayed in a new, joined shape.
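To make the joining behaviour concrete: in Unicode the word is still stored as two code points, U+0626 (ئ) and U+0627 (ا), and the joined shape is chosen when the text is drawn. The sketch below uses only Python's standard `unicodedata` module to print the code points, and to show that the contextual shapes (the initial and final forms from the Arabic Presentation Forms-B block) fold back to the same two base letters under NFKC normalization. It illustrates Unicode behaviour only; it is not a description of how Tesseract handles shaping internally.

```python
import unicodedata

# What the user types: ئ then ا, two separate code points.
typed = "\u0626\u0627"

# The joined shapes a renderer draws for them, taken from the Unicode
# Arabic Presentation Forms-B block: YEH WITH HAMZA ABOVE in its initial
# form, followed by ALEF in its final form.
drawn = "\uFE8B\uFE8E"

for ch in typed + drawn:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# NFKC maps the contextual presentation forms back to the base letters,
# so the "new shape" carries no information beyond the two letters.
print(unicodedata.normalize("NFKC", drawn) == typed)
```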
My language works the same way, so I am afraid that cube is what controls this. If cube is needed for this, that's terrible, because I don't know how to create it. (I don't mean you should tell me how; I just need some examples or documentation about it.) As for RTL, it looks like there is no built-in way to handle it, so maybe we can only handle it in our own program (for example, change the display order after reading finishes, or something like that). Thanks.

On Thursday, January 17, 2013 UTC+8 10:36:44 PM, sventech wrote:
>
> OK, the fact that cube is something different than combining languages is
> a major revelation to me. However, huangjingshe, I don't think you need the
> cube feature for what you're doing. I believe the problem you're having is
> something else. I would solve the other issues first and then maybe try the
> cube feature if necessary.
> --Sven
>
>
> On Wed, Jan 16, 2013 at 10:07 PM, gold snake <[email protected]> wrote:
>
>> Thanks again, but I still have the same question. If cube is only used to
>> combine with other languages during training, why can we choose cube mode
>> when we read a document, as Sven said?
>>
>> Do you mean we can combine with other languages using -l [lang] because
>> it has cube files? And if there are no cube files, we can't use -l [lang]?
>>
>> But I tested it: as everybody knows, Chinese only has a .traineddata file,
>> not cube files, yet I can use
>> tesseract -l chi_sim [lang].[fontname].exp0.tif [lang].[fontname].exp0 batch.nochop makeb
>>
>> So maybe it's not about the cube files, or I'm not using it right.....
>>
>>
>> On Thursday, January 17, 2013 UTC+8 3:34:25 AM, sventech wrote:
>>>
>>> Cube means combining different languages. There is not much
>>> documentation on it -- Google developed it internally. But I don't think
>>> you need it. The list of files you sent is related to the cube feature, so
>>> you don't need to create them. For right to left, search the archives for
>>> "right to left" -- someone wrote a python script to convert, though he
>>> didn't provide info about how to use it.
>>>
>>> utility to convert training files:
>>> https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/rtl/tesseract-ocr/T035ZyQVlMU/tQVoGWdlBDMJ
>>>
>>> basic trick for right to left output from Dmitri Silaev:
>>> https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/right$20to$20left$20output/tesseract-ocr/8r2qGvMzz9U/so1WuMTyaU8J
>>> --Sven
>>>
>>>
>>> On Wed, Jan 16, 2013 at 10:57 AM, gold snake <[email protected]> wrote:
>>>
>>>> So you mean cube exists just so users can combine a language with other
>>>> languages, which means I don't need it (because my language is not
>>>> Arabic). Thanks. Maybe my English is not good; I just can't understand
>>>> what "cube" is or what it is used for, and I can't find an introduction.
>>>>
>>>> And that means cube has nothing to do with my result coming out left to
>>>> right (the correct result should be right to left). Then why, when I use
>>>> the command tesseract 14.jpg output -l [lang], is the result
>>>> (output.txt) left to right?
>>>>
>>>> I'm very sorry for taking up the experts' valuable time with these small
>>>> problems. Just a few days ago I didn't even know what OCR was.
>>>> If I could find the answers to these questions myself, believe me, I
>>>> wouldn't ask anybody, because I really understand that every friend here
>>>> is very busy. So I'm trying hard to search for answers from now on.
>>>> Sorry again....
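A minimal sketch of the "handle it in our own program after reading" idea from this thread, assuming each line of output.txt contains the right characters but in reversed (left-to-right) order. It reverses every line, then restores the internal order of runs of ASCII digits and Latin letters, which read left to right even inside RTL text; treating only `[0-9A-Za-z]` as left-to-right is a simplifying assumption. This is not the Unicode Bidirectional Algorithm, and it is not the script or the trick linked above; the names `visual_to_logical` and `fix_file` are hypothetical.

```python
import re
import sys

# Assumption: runs of ASCII digits and Latin letters are left-to-right text
# embedded in an RTL line and must keep their own internal order.
LTR_RUN = re.compile(r"[0-9A-Za-z]+")


def visual_to_logical(line: str) -> str:
    """Reverse a line whose characters came out in left-to-right visual order."""
    reversed_line = line[::-1]
    # Reversing the whole line also reversed each LTR run; flip those back.
    return LTR_RUN.sub(lambda m: m.group(0)[::-1], reversed_line)


def fix_file(path: str) -> str:
    """Return the contents of an OCR output file with every line reordered."""
    with open(path, encoding="utf-8") as f:
        return "\n".join(visual_to_logical(line) for line in f.read().splitlines())


if __name__ == "__main__":
    # Usage: python fix_rtl.py output.txt > fixed.txt
    if len(sys.argv) > 1:
        print(fix_file(sys.argv[1]))
```

Lines that mix several directions, or text stored with separate combining marks (which would end up in front of their base letter after reversal), need a real bidi implementation instead.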
>>>>
>>>> On Wednesday, January 16, 2013 UTC+8 10:34:21 PM, sventech wrote:
>>>>>
>>>>> The reason why Arabic has those files and your language does not is
>>>>> that Arabic is set up to use the "cube" feature to combine it with other
>>>>> languages, so you can do "-l ara+eng" and OCR a document with both Arabic
>>>>> and English. That training is harder, and not necessary if you mainly
>>>>> want to do monolingual documents.
>>>>>
>>>>> And what Zdenko is saying is that you are asking questions that don't
>>>>> show that you've tried to solve the problem yourself. We're all
>>>>> professional programmers and we want to help people, but we don't have
>>>>> time to teach elementary web searching or programming. You seem to be a
>>>>> smart guy, but your questions appear to be lazy. You need to make an
>>>>> effort to solve the problems and come to us for help, not ask us to
>>>>> solve them for you.
>>>>> --Sven
>>>>>
>>>>>
>>>>> On Wed, Jan 16, 2013 at 2:59 AM, gold snake <[email protected]> wrote:
>>>>>
>>>>>> I can't find any answer to my question in this link.
>>>>>> Can you just tell me? Is it necessary to bully a rookie?
>>>>>> Please...
>>>>>>
>>>>>> On Wednesday, January 16, 2013 UTC+8 4:02:25 PM, zdenop wrote:
>>>>>>>
>>>>>>> Really ;-)? I got 93 results. E.g.:
>>>>>>>
>>>>>>> https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ
>>>>>>> https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion
>>>>>>> https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J
>>>>>>>
>>>>>>> Please honor the time of people on this list...
>>>>>>>
>>>>>>> Zdenko
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 16, 2013 at 8:18 AM, gold snake <[email protected]> wrote:
>>>>>>>
>>>>>>>> I can't find anything. Come on....
>>>>>>>>
>>>>>>>> On Tuesday, January 15, 2013 UTC+8 10:38:42 PM, zdenop wrote:
>>>>>>>>>
>>>>>>>>> Search the archive of the tesseract forums for cube.
>>>>>>>>>
>>>>>>>>> Zdenko
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jan 15, 2013 at 2:16 PM, gold snake <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> My language is somewhat special: it is written in a script like
>>>>>>>>>> Arabic, and it differs from Arabic only in the shapes of some
>>>>>>>>>> letters. It is also written right to left.
>>>>>>>>>> I'm using the standard tutorial:
>>>>>>>>>> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>>>>>
>>>>>>>>>> But when I finished and tested, it couldn't recognize the text
>>>>>>>>>> accurately. My steps are:
>>>>>>>>>>
>>>>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 batch.nochop makebox
>>>>>>>>>>
>>>>>>>>>> tesseract as.kadas.exp0.tif as.kadas.exp0 nobatch box.train
>>>>>>>>>>
>>>>>>>>>> unicharset_extractor as.kadas.exp0.box
>>>>>>>>>>
>>>>>>>>>> shapeclustering -F font_properties -U unicharset as.kadas.exp0.tr
>>>>>>>>>>
>>>>>>>>>> mftraining -F font_properties -U unicharset -O as.unicharset as.kadas.exp0.tr
>>>>>>>>>>
>>>>>>>>>> cntraining as.kadas.exp0.tr
>>>>>>>>>>
>>>>>>>>>> I don't have a word dictionary, so I skipped some steps.
>>>>>>>>>> I renamed some files, adding the as. prefix, and then ran:
>>>>>>>>>>
>>>>>>>>>> combine_tessdata as.
>>>>>>>>>>
>>>>>>>>>> There were no errors through the combine step, so I'm sure those
>>>>>>>>>> steps have no problems.
>>>>>>>>>> When I test a picture whose content is 13,
>>>>>>>>>> the result is ئئ.
>>>>>>>>>> When I test any words, the result is just ئ.
>>>>>>>>>>
>>>>>>>>>> I looked in D:\Little\Tesseract-OCR\tessdata and found these files:
>>>>>>>>>>
>>>>>>>>>> ara.cube.bigrams
>>>>>>>>>> ara.cube.fold
>>>>>>>>>> ara.cube.lm
>>>>>>>>>> ara.cube.nn
>>>>>>>>>> ara.cube.params
>>>>>>>>>> ara.cube.size
>>>>>>>>>> ara.cube.word-freq
>>>>>>>>>> ara.traineddata
>>>>>>>>>>
>>>>>>>>>> I can't understand this. Why does Arabic have more than just
>>>>>>>>>> ara.traineddata? What are the other ara.* files? If I'm training
>>>>>>>>>> my language, are they necessary? And how can I find or create them?
>>>>>>>>>>
>>>>>>>>>> Thanks very much...
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>>> To post to this group, send email to [email protected]
>>>>>>>>>>
>>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>>> tesseract-oc...@googlegroups.com
>>>>>>>>>>
>>>>>>>>>> For more options, visit this group at
>>>>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>>>>
>>>>>
>>>>> --
>>>>> ``All that is gold does not glitter,
>>>>> not all those who wander are lost;
>>>>> the old that is strong does not wither,
>>>>> deep roots are not reached by the frost.
>>>>> From the ashes a fire shall be woken,
>>>>> a light from the shadows shall spring;
>>>>> renewed shall be blade that was broken,
>>>>> the crownless again shall be king.”
>>>>>
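For reference, the training steps from the original question can be generated for any language/font pair, which makes it easy to check that every file name uses the same `lang.font.exp0` base. This Python sketch only builds and prints the argument lists (a dry run); it does not call the tools. The function names are hypothetical, and the list of files renamed with the `as.` prefix (`shapetable`, `inttemp`, `pffmtable`, `normproto`) is an assumption based on what the Tesseract 3.02 training tools write, since the question does not list them.

```python
from typing import List, Tuple


def build_training_commands(lang: str, font: str) -> List[List[str]]:
    """Argument lists for the training steps listed in the question, in order."""
    base = f"{lang}.{font}.exp0"  # e.g. as.kadas.exp0
    return [
        ["tesseract", f"{base}.tif", base, "batch.nochop", "makebox"],
        # The TrainingTesseract3 procedure expects the generated .box file to be
        # checked and corrected by hand before the box.train step.
        ["tesseract", f"{base}.tif", base, "nobatch", "box.train"],
        ["unicharset_extractor", f"{base}.box"],
        ["shapeclustering", "-F", "font_properties", "-U", "unicharset", f"{base}.tr"],
        ["mftraining", "-F", "font_properties", "-U", "unicharset",
         "-O", f"{lang}.unicharset", f"{base}.tr"],
        ["cntraining", f"{base}.tr"],
    ]


def prefixed_outputs(lang: str) -> List[Tuple[str, str]]:
    """(old, new) names for the 'add the as. prefix' step (assumed file list)."""
    # unicharset is not renamed here: mftraining -O already wrote <lang>.unicharset.
    names = ("shapetable", "inttemp", "pffmtable", "normproto")
    return [(name, f"{lang}.{name}") for name in names]


def combine_command(lang: str) -> List[str]:
    """combine_tessdata takes the language prefix including the trailing dot."""
    return ["combine_tessdata", f"{lang}."]


if __name__ == "__main__":
    # Dry run: print each step so it can be checked and then run by hand.
    for cmd in build_training_commands("as", "kadas"):
        print(" ".join(cmd))
    for old, new in prefixed_outputs("as"):
        print(f"rename {old} -> {new}")
    print(" ".join(combine_command("as")))
```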

