Re: Customising Tesseract for character recognition

2012-10-14 Thread zdenko podobny
On Sat, Oct 13, 2012 at 10:47 PM, JVIyer wrote: > A lot of times I have seen fairly good number plate images being OCRed > inaccurately. This could possibly be due to the word recognition stage. Has > anyone found a way to disable the dictionary / word recognition. > > Saurabh, Have you been a

Re: Customising Tesseract for character recognition

2012-10-13 Thread JVIyer
A lot of times I have seen fairly good number plate images being OCRed inaccurately. This could possibly be due to the word recognition stage. Has anyone found a way to disable the dictionary / word recognition? Saurabh, have you been able to accomplish this? Could you kindly share your insi
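
One approach often suggested for this question is to switch the dictionaries off through a config file. A minimal sketch, assuming a Tesseract build that exposes the `load_system_dawg` and `load_freq_dawg` variables (neither is mentioned in this thread, and the file name `nodict` is hypothetical):

```
load_system_dawg F
load_freq_dawg F
```

Saved as `tessdata/configs/nodict`, the file would be named at the end of the command line, for example `tesseract plate.tif out -l eng nodict` (`plate.tif` is a placeholder, and the exact command line differs slightly between versions). Both variables are read while the language data loads, so setting them with SetVariable after Init is unlikely to have any effect.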

Re: Customising Tesseract for character recognition

2012-02-20 Thread Aruna Devi
By looking at the output I got. My image has 6 rows and 12 columns, but in my output I got 12 rows and 6 columns, and everything was read from the right first (it should have started from the left). On Feb 17, 6:24 pm, Andres wrote: > Just by curiosity, how did you find that ? > > 2012/2/17 Aruna Devi

Re: Customising Tesseract for character recognition

2012-02-17 Thread Andres
Just out of curiosity, how did you find that? 2012/2/17 Aruna Devi > Even i wanted to know how to make tesseract to read my image horizontally. > I have an image consisting of 6 rows, After training i found that my image > is read from right side(Should be from left) and also its going down by > co

Re: Customising Tesseract for character recognition

2012-02-16 Thread Aruna Devi
I also wanted to know how to make Tesseract read my image horizontally. I have an image consisting of 6 rows. After training I found that my image is read from the right side (it should be from the left) and also that it goes down by column and not by row. How do I solve this issue? -- You received this me

Re: Customising Tesseract for character recognition

2011-12-14 Thread Prachi Joshi
how to set all these variables? -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group, send email to tesseract-ocr@googlegroups.com To unsubscribe from this group, send email to tesseract-ocr+unsubscr...@googlegroups.com For mor

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
In future that will be my preferred approach! For the time being I just need a fast and easy solution! I know it's not the most beautiful approach... but I haven't touched much of the Tesseract framework, so as not to break anything! I was just short of time and it was easier for me to modify the sourc

Re: Customising Tesseract for character recognition

2011-03-14 Thread Dmitry Silaev
Why don't you consider making your own project and statically linking Tesseract into it, or using Tesseract as a dynamic link library? That way you can implement any formatting and other special logic you wish... Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 2:13 PM, Jose wrote: > I fire

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
I run Tesseract from the command line and I didn't find a way to format the results with more info.

Re: Customising Tesseract for character recognition

2011-03-14 Thread Dmitry Silaev
Ehmm... I don't get it. If you've succeeded in using iterators, it's at your full disposal to format the output in any way you want programmatically, isn't it? Warm regards, Dmitry Silaev On Mon, Mar 14, 2011 at 1:56 PM, Jose wrote: > *I only modify how the result is printed! nothing else...

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
I only modify how the result is printed! Nothing else... I grab all the info from the word and its bounding box! That is ok, right?

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
Yes, I got the information from the result! I only modify how the result method prints the result.. nothing more, of course! I got the information from the bounding box of the result! I'm not modifying it deeper than that.

Re: Customising Tesseract for character recognition

2011-03-14 Thread Dmitry Silaev
I think the best approach would be to stay as far as possible from modifying the 3rd party code. Take a closer look at the ResultIterator and PageIterator classes. Often they suffice for getting all the information you need about Tess's recognition results. Warm regards, Dmitry Silaev On Mon, Mar 14,

Re: Customising Tesseract for character recognition

2011-03-14 Thread Jose
Hi Dmitry, thanks for the help! In the end what I did was modify the function that returns the result to include the top location of the bounding box. Then I have the following result: xy x1y1 x2y2 x3y3 x4y4 x5y5 x6y6 x7y7 then I parse

Re: Customising Tesseract for character recognition

2011-03-13 Thread Dmitry Silaev
Jose, I run Tesseract revision 549 from the command line under Windows with no special config and get the segmentation which is almost correct. What language file do you use? I used the following command line tesseract 3.tiff test3 -l eng with no pageseg_mode (-psm argument) as well as with it,

Re: Customising Tesseract for character recognition

2011-03-13 Thread patrickq
You expect way too much from Tesseract: it's not Tesseract's job to slice and dice the text according to various organizational requirements of applications - that's for the application to handle. You can get all the coordinates of all characters and easily determine which ones are in what you consi
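
The application-side regrouping suggested here could look roughly like the sketch below. It assumes the words and the top-left corners of their bounding boxes have already been read out of Tesseract; the `Word` struct, the `GroupIntoRows` name and the pixel tolerance are all hypothetical and not part of the Tesseract API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One recognised word plus the top-left corner of its bounding box.
// Hypothetical type for illustration; not a Tesseract class.
struct Word {
    std::string text;
    int left;
    int top;
};

// Groups words into rows. A word joins the current row when its top edge
// lies within `tolerance` pixels of the first (topmost) word of that row.
// Rows are returned top-to-bottom, words within a row left-to-right.
std::vector<std::vector<Word>> GroupIntoRows(std::vector<Word> words,
                                             int tolerance) {
    std::sort(words.begin(), words.end(),
              [](const Word& a, const Word& b) { return a.top < b.top; });

    std::vector<std::vector<Word>> rows;
    for (const Word& w : words) {
        if (rows.empty() || w.top - rows.back().front().top > tolerance) {
            rows.emplace_back();  // start a new row
        }
        rows.back().push_back(w);
    }

    for (auto& row : rows) {
        std::sort(row.begin(), row.end(),
                  [](const Word& a, const Word& b) { return a.left < b.left; });
    }
    return rows;
}
```

With this kind of grouping the order in which Tesseract emits the words stops mattering: two words printed on the same line end up in the same row even if the engine treated them as separate columns. The tolerance would need tuning to the line height of the actual images.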

Re: Customising Tesseract for character recognition

2011-03-13 Thread Jose
Hi Patrick, yes, the results are correct! But the format of the results is not! That's my trouble

Re: Customising Tesseract for character recognition

2011-03-13 Thread patrickq
Tesseract 3.00 gets this text 100% correct, including the smudged numbers at the bottom. See: http://www.scanbizcards.com/plate1.jpg http://www.scanbizcards.com/plate2.jpg (scanning was done with ScanBizCards on an iPhone - if you try it yourself with the app on Android or iPhone, please disable i

Re: Customising Tesseract for character recognition

2011-03-13 Thread Jose
Hi Dmitry, sorry for the delay... I produced some samples; see if you can take a look at them! regards, jose

Re: Customising Tesseract for character recognition

2011-02-24 Thread Jose
Ok I'll try to do that this afternoon. thank you for the help regards, jose

Re: Customising Tesseract for character recognition

2011-02-24 Thread Dmitry Silaev
The best way to explain everything would be just to send your source image examples, describe what information you want to get from them and provide the community with the code snippets you use to interface with Tess. And please be as detailed as possible. Warm regards, Dmitry Silaev On Thu,

Re: Customising Tesseract for character recognition

2011-02-24 Thread Jose
In my particular case it is just that the first word of each column is in one font and the other is in another, so instead of reading column by column it reads all the columns of the first row and then all the columns of the second row! My god, it is really hard to explain in English. I get an acc

Re: Customising Tesseract for character recognition

2011-02-24 Thread Dmitry Silaev
Unfortunately not only text output order can suffer from Tess's segmentation, but also extents of some text fragments can be identified incorrectly (say one "segmented" row can span over two "real" rows, probably in partial way), and that in turn can lead to *completely* irrelevant recognition resu

Re: Customising Tesseract for character recognition

2011-02-24 Thread Jose
Dmitry, the recognition works; the only problem is the way it is parsing it... :S I think segmenting the images would be too painful! I only want to change the order in which it is displayed, or get the bounding boxes, so I could know the x and y of the recognized word and thereby organise the results b

Re: Customising Tesseract for character recognition

2011-02-24 Thread Dmitry Silaev
I don't know if it's affordable for you, but imho decent results can only be achieved if you do segmentation yourself and then pass image fragments to Tesseract on a word-by-word basis. Problems may appear when you have words that are too short, however, as I can see, it's not your case. Long time

Re: Customising Tesseract for character recognition

2011-02-24 Thread Jose
Hi, (as you know, Saurabh, because we talked in private the other day) I tried PSM_SINGLE_COLUMN and the accuracy drops dramatically... I can't afford to lose that accuracy. Is it possible to change the way the output is displayed? Looking at the code it seems rather hard to change it... perhaps I c

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose Granja
Hi, do you know how to force the page layout to be recognised as horizontal? My issue is with that! You'll make me the happiest person on earth On 17 Feb 2011, at 04:48, Saurabh Gandhi wrote: > Hello everyone, > > I am currently using tesseract 3.x for license plate recognition. > I have an algo

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
Saurabh, by setting PSM_AUTO, PSM_SINGLE_BLOCK, PSM_CHAR are you forcing the page to be read horizontally? My problem is that I have a column of two words separated by a white space (each word is in a different font) and instead of seeing one column of two words the OCR sees two columns of one w

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
Ok I'm recompiling now... I'll let you know when it's done! thanks for the help anyway :)

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
You know Saurabh, that was EXACTLY what I was looking for! I couldn't be more thankful to you! That line of code changed my life :D thank you again :)

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
This is what the JPG looks like:

WORD1      WORD2   (the white space is quite "big")
WORD1      WORD2
WORD1      WORD2
WORD1      WORD2
WORD1      WORD2
WORD1      WORD2
WORD1      WORD2

and it reads like:

WORD1
WORD1
WORD1
WORD1
WORD1
WORD1
WORD2
WORD2
WORD2
WORD2
WORD2
WORD2
WORD2

any help would be r

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
Ok, I'll try that! I have to modify this in tesseractmain.cpp, right? (I'm using command line execution) I replace this line: api.SetPageSegMode(tesseract::PSM_AUTO); with api.SetPageSegMode(tesseract::PSM_SINGLE_COLUMN); and then recompile, right? Thanks for the help

Re: Customising Tesseract for character recognition

2011-02-21 Thread Jose
Is there no other workaround? If I reduce the size of the white space between WORD1 and WORD2 then it all works fine! This space is making the OCR think it's another column! Is there no other way? Splitting the image into many rows does not look very efficient

Re: Customising Tesseract for character recognition

2011-02-18 Thread Saurabh Gandhi
great... -- Regards, Saurabh Gandhi On Fri, Feb 18, 2011 at 5:16 PM, Jose wrote: > you now Saurabh, that was EXACTLY was I was looking for! I couldn't be more > thankful to you! that line of code changed my life :D > > thank you again :)

Re: Customising Tesseract for character recognition

2011-02-18 Thread Saurabh Gandhi
Yes, that's right. -- Regards, Saurabh Gandhi On Fri, Feb 18, 2011 at 4:57 PM, Jose wrote: > ok I'll try that! I have to modify this on the tesseractmain.cpp right? > (I'm using command line execution) > > I replace this line : api.SetPageSegMode(tesseract::PSM_AUTO); > for api.SetPageSegMode

Re: Customising Tesseract for character recognition

2011-02-18 Thread Saurabh Gandhi
Did you try PSM_SINGLE_COLUMN? I think that is what you need. Could you try this and let us know how it behaves, please? PSM_SINGLE_COLUMN, ///< Assume a single column of text of variable sizes. -- Regards, Saurabh Gandhi On Fri, Feb 18, 2011 at 4:29 PM, Jose wrote: > Is there no other work

Re: Customising Tesseract for character recognition

2011-02-18 Thread Saurabh Gandhi
Hello Jose, Setting the mode to PSM_SINGLE_BLOCK or PSM_SINGLE_LINE will not force horizontal reading. These modes will just assume that your input image itself is segmented and consists of just a single line. So, if you want horizontal reading you will have to segment your image and provide it to
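
The "segment your image" step can be approached in several ways. The sketch below uses a horizontal projection profile, a common technique that is swapped in here for illustration and is not named anywhere in this thread. It assumes the image has already been binarised into a grid of booleans; the `FindTextRows` name and the image representation are hypothetical, and cropping each band and handing it to Tesseract with PSM_SINGLE_LINE is left out.

```cpp
#include <utility>
#include <vector>

// Finds horizontal text bands in a binarised image.
// ink[y][x] is true where the pixel is dark (part of a glyph).
// Returns inclusive [first_row, last_row] pairs, top to bottom.
// A band is a maximal run of consecutive rows that contain any ink.
std::vector<std::pair<int, int>> FindTextRows(
        const std::vector<std::vector<bool>>& ink) {
    std::vector<std::pair<int, int>> bands;
    const int height = static_cast<int>(ink.size());
    int start = -1;  // first row of the band being built, or -1 if none

    for (int y = 0; y < height; ++y) {
        bool has_ink = false;
        for (bool px : ink[y]) {
            if (px) { has_ink = true; break; }
        }
        if (has_ink && start < 0) {
            start = y;                        // a band begins
        } else if (!has_ink && start >= 0) {
            bands.emplace_back(start, y - 1); // a blank row closes the band
            start = -1;
        }
    }
    if (start >= 0) {
        bands.emplace_back(start, height - 1); // band touching the bottom edge
    }
    return bands;
}
```

Each returned band would then be cropped across the full image width, so that both words of a line stay together regardless of the gap between them. Noisy scans would need a threshold on the amount of ink per row rather than the all-or-nothing test used here.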

Re: Customising Tesseract for character recognition

2011-02-18 Thread Saurabh Gandhi
You can simply use this in your program just after init to set whitelist / blacklist:

    api.Init(argv[0], lang, &(argv[arg]), argc-arg, false);
    api.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ.0123456789 ");

-- Regards, Saurabh Gandhi

Re: Customising Tesseract for character recognition

2011-02-18 Thread Sriranga(78yrsold)
"Customise the tesseract engine to recognize only the characters from A-Z,0-9,.(dot), (space) by setting the character white-list" Kindly tell me the name of the folder in which the whitelist and the blacklist are kept. I want to use the same for Kannada scripts. -sriranga(78yrs) On Fr
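
The whitelist is a Tesseract variable rather than a file kept in some folder, so there is nothing to look up on disk unless it is written into a config file. A sketch of such a file, modelled on the `digits` config that ships in `tessdata/configs` (the file name `plate` is hypothetical):

```
tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ.0123456789
```

Saved as `tessdata/configs/plate`, it would be named at the end of the command line, for example `tesseract image.tif out -l eng plate`; the exact command line differs slightly between versions. The space character from the whitelist is left out of this sketch because it is not confirmed here that a config line can carry it. For another script, the value would presumably list the wanted characters of that script instead.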

Re: Customising Tesseract for character recognition

2011-02-17 Thread Ray Smith
>From all this, I have identified the following ways of improving the results: 1. Customise the tesseract engine to recognize only the characters from A-Z,0-9,.(dot), (space) by setting the character white-list. My understanding is that the white-list is the list of characters that are

Customising Tesseract for character recognition

2011-02-16 Thread Saurabh Gandhi
Hello everyone, I am currently using tesseract 3.x for license plate recognition. I have an algorithm which does a good job in pre-processing the input image to localize the plate. However, when I use the Tesseract OCR engine to classify the plate number, the recognition is not that accurate. I