Re: How training language like arab?

2013-01-22 Thread Sven Pedersen
Have you looked through the archives to check for the people working on Farsi? They would have a good idea how to solve this problem. Arsalan Ghasrsaz ghasr...@googlemail.com https://github.com/reza1615/PersianOcr --Sven On Sat, Jan 19, 2013 at 7:31 AM, gold snake huangjin...@gmail.com wrote:

Re: How training language like arab?

2013-01-19 Thread gold snake
My training failed; the final result looks very bad. Maybe it is because I don't know how to handle the same character in different positions. What you see looks like this: م , ئما , تىم , مور. Actually, I write it like this: م , ئما , تىم , مور. Can you see the one character that looks like an O? It's the same character, but
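The "same character in different positions" problem can be seen at the Unicode level. The following is a minimal Python sketch, assuming the letter in question is Meem (م), which appears in all four sample words. It lists the four positional presentation forms of Meem and shows that each one normalizes back to the single base letter U+0645. It only illustrates the shaping behaviour; it does not show how Tesseract training handles it.

```python
import unicodedata

# The four positional presentation forms of ARABIC LETTER MEEM (U+0645),
# taken from the Unicode "Arabic Presentation Forms-B" block.
MEEM_FORMS = ["\uFEE1", "\uFEE2", "\uFEE3", "\uFEE4"]


def describe(forms):
    """Return (code point, Unicode name, base code point) for each form."""
    rows = []
    for ch in forms:
        # NFKC normalization folds a presentation form back to its base letter.
        base = unicodedata.normalize("NFKC", ch)
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch), f"U+{ord(base):04X}"))
    return rows


for row in describe(MEEM_FORMS):
    print(row)
```

All four shapes map to one code point, so ordinary text stores only the base letter and the font chooses the shape from the neighbouring letters.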

Re: How training language like arab?

2013-01-18 Thread gold snake
If I find a way to create cube data for my language, I will have to use it. Thanks anyway, that result is important. On Friday, January 18, 2013 at 6:41:28 AM UTC+8, Patrick Questembert wrote: Yes, cube remains a mystery for the common mortals ... I am experimenting with it within ScanBizCards and here are my findings so far

Re: How training language like arab?

2013-01-17 Thread Nick White
Hi Tesseract folks (it's nice to be back), On Wed, Jan 16, 2013 at 01:34:25PM -0600, Sven Pedersen wrote: "Cube means combining different languages." Really? I don't think this is correct. Certainly using eng+grc works to combine English and Ancient Greek recognition, despite grc having no cube

Re: How training language like arab?

2013-01-17 Thread Sven Pedersen
OK, the fact that cube is something different than combining languages is a major revelation to me. However, huangjingshe, I don't think you need the cube feature for what you're doing. I believe the problem you're having is something else. I would solve the other issues first and then maybe try

Re: How training language like arab?

2013-01-17 Thread gold snake
Arabic and English fonts are in some ways very different. In an English font, if you input a+b, the result is: ab. But in an Arabic font, if you input ئ+ا, the result is ئا. If you don't understand, copy ئا and add a space in the middle. You will find that when you input 2 different letters, the result is a new shape
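The joining behaviour described here happens at rendering time, not in the stored text. A small Python sketch, assuming the two letters in the example are U+0626 and U+0627, shows that the joined-looking string is still two separate code points and that inserting a space only adds a third:

```python
import unicodedata

# Assumed code points for the two letters in the example above.
joined = "\u0626\u0627"   # renders as one connected shape
spaced = "\u0626 \u0627"  # same letters with a space in the middle


def code_points(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in text]


print(code_points(joined))
print(code_points(spaced))
```

The text itself is identical apart from the space; the "new shape" comes from the font's contextual shaping of adjacent letters.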

Re: How training language like arab?

2013-01-17 Thread Sven Pedersen
Yes, glyph handling and combining is important -- if you search the archives you'll see how people have dealt with it for Asian languages -- mainly Indian / Indic scripts. You need to specify the component parts in your training. I sent you 2 links about the right to left support (RTL) in

Re: How training language like arab?

2013-01-17 Thread zdenko podobny
Regarding cube:
- there is no more public information about cube than the 92 hits on the forum I already mentioned (+ source code ;-))
- there is no information on how to create cube data files (OK, some of them are text files...)
So you can: 1. try to use/train tesseract without

Re: How training language like arab?

2013-01-16 Thread zdenko podobny
Really ;-)? I got 93 results. E.g.: https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ https://groups.google.com/d/topic/tesseract-ocr/tyV5_z65XMk/discussion https://groups.google.com/d/msg/tesseract-ocr/R7UCx0oV3PA/GE7KJ_76kS0J Please honor the time of people on this

Re: How training language like arab?

2013-01-16 Thread gold snake
I can't find any answer to my question in these links. Can you just talk to me? Is it necessary to bully a rookie? Please... On Wednesday, January 16, 2013 at 4:02:25 PM UTC+8, zdenop wrote: Really ;-)? I got 93 results. E.g.: https://groups.google.com/forum/#!msg/tesseract-ocr/0msQtTB_XrI/D1noel9GpPgJ

Re: How training language like arab?

2013-01-16 Thread Sven Pedersen
The reason why Arabic has those files and your language does not is that Arabic is set up to use the cube feature to combine it with other languages, so you can do -l ara+eng and OCR a document with both Arabic and English. That training is harder, and not necessary if you mainly want to do
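For reference, the `-l ara+eng` option mentioned above is passed on the tesseract command line as `tesseract imagename outputbase -l lang`. A minimal Python sketch of building that command follows; the file names `page.png` and `page` and the helper `build_tesseract_command` are hypothetical, and the sketch only assembles the argument list without running tesseract or assuming anything about cube.

```python
import shlex


def build_tesseract_command(image_path, output_base, languages):
    """Assemble the argument list for a multi-language tesseract run.

    Languages are joined with '+', as in '-l ara+eng'.
    """
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


# Hypothetical file names, used only for illustration.
cmd = build_tesseract_command("page.png", "page", ["ara", "eng"])
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where tesseract and both language data files are installed.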

Re: How training language like arab?

2013-01-16 Thread gold snake
So you mean cube exists just so that users can combine a language with other languages, which means I don't need it (because my language is not Arabic). Thanks. Maybe my English is not good. I just can't understand what cube is or what it is used for, and I can't find an introduction. And that means cube and my result is left

Re: How training language like arab?

2013-01-16 Thread zdenko podobny
On Wed, Jan 16, 2013 at 3:34 PM, Sven Pedersen sven.peder...@gmail.com wrote: The reason why Arabic has those files and your language does not is that Arabic is set up to use the cube feature to combine it with other languages, so you can do -l ara+eng and OCR a document with both Arabic and

Re: How training language like arab?

2013-01-16 Thread Sven Pedersen
Cube means combining different languages. There is not much documentation on it -- Google developed it internally. But I don't think you need it. The list of files you sent is related to the cube feature, so you don't need to create them. For right to left, search the archives for right to left --

Re: How training language like arab?

2013-01-16 Thread gold snake
Thanks again, but I have the same question. If cube is only used for combining with other languages during training, why can we choose cube mode when reading a document, just like Sven said? Do you mean we can combine with other languages using -l [lang] because it has cube files? If there is no cube

How training language like arab?

2013-01-15 Thread gold snake
My language is somewhat special. It is just like Arabic script, but there are some differences from the Arabic font; actually, only the shapes of the letters differ. It is also written right to left. I'm using the standard tutorial: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 but when I finish and
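The right-to-left property mentioned in the question is recorded per character in the Unicode database. A short Python sketch, assuming the sample word is made of the letters U+0645, U+0648 and U+0631, reads the bidirectional class of each character; Arabic-script letters report `AL` (right-to-left Arabic) while Latin letters report `L` (left-to-right). This only inspects Unicode data and says nothing about how Tesseract itself handles direction.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(c) for c in text]


# Assumed code points for an Arabic-script sample word, plus a Latin sample.
print(bidi_classes("\u0645\u0648\u0631"))
print(bidi_classes("ab"))
```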

Re: How training language like arab?

2013-01-15 Thread zdenko podobny
Search the archive of the Tesseract forums for cube. Zdenko On Tue, Jan 15, 2013 at 2:16 PM, gold snake huangjin...@gmail.com wrote: My language is somewhat special. It is just like Arabic script, but there are some differences from the Arabic font; actually, only the shapes of the letters differ. It is also written right to left

Re: How training language like arab?

2013-01-15 Thread gold snake
I can't find anything. Come on. On Tuesday, January 15, 2013 at 10:38:42 PM UTC+8, zdenop wrote: Search the archive of the Tesseract forums for cube. Zdenko On Tue, Jan 15, 2013 at 2:16 PM, gold snake huang...@gmail.com wrote: My language is somewhat special. It is just like Arabic script, but there are some differences from the Arabic font