Could you give us a link to where the text of this article can be downloaded from? Can't find it anywhere, only the title and authors.
On Thu, Mar 31, 2011 at 6:09 AM, Cong Nguyen <[email protected]> wrote:
> Please refer to "OPTIMIZING SPEED FOR ADAPTIVE LOCAL THRESHOLDING ALGORITHM
> USING DYNAMIC PROGRAMMING".
> Complexity is: O(n), n is number of pixels.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Max Cantor
> Sent: Thursday, March 31, 2011 7:28 AM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: tips for improving Tesseract accuracy and speed...
>
> Yes. I've had great experience with sauvola binarize from leptonica. Gamer
> works too but is much much slower
>
> On Mar 31, 2011, at 0:02, cong nguyenba <[email protected]> wrote:
>
>> I have another approach for you here: try to apply binarization using
>> adaptive threshold! Delving into engine by following adaptive
>> classification in source code for speedup! I think it is enough for
>> your expectation!
>>
>> On Wednesday, March 30, 2011, Dmitri Silaev <[email protected]> wrote:
>>> P.S.: If you're still sure that reasonable downscaling of your images
>>> sacrifices the accuracy, please share one or two of your *unprocessed*
>>> images to investigate further.
>>>
>>> And I'd suggest to keep up with the latest revisions of Tesseract. The
>>> API changes significantly, but Tess is definitely being improved in
>>> the sense of stability, new capabilities and also code efficiency,
>>> which explicitly may lead to improved performance which you are
>>> looking for.
>>>
>>> Warm regards,
>>> Dmitri Silaev
>>>
>>> On Tue, Mar 29, 2011 at 8:17 AM, Andres <[email protected]> wrote:
>>>> ...required.
>>>>
>>>> Hello people,
>>>>
>>>> I'm developing a licence plate recognition system from long ago and I
>>>> still have to improve the use of Tesseract to make it usable.
>>>>
>>>> My first concern is about speed:
>>>> After extracting the licence plate image, I get an image like this:
>>>>
>>>> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
>>>>
>>>> As you may see, there are only 6 characters (tess is recognizing more
>>>> because there are some blemishes over there, but I get rid of them with
>>>> some postprocessing of the layout of the recognized chars)
>>>>
>>>> In an Intel I7 720 (good power, but using a single thread) the tesseract
>>>> part is taking something like 230 ms. This is too much time for what I
>>>> need.
>>>>
>>>> The image is 500 x 117 pixels. I noted that when I reduce the size of
>>>> this image the detection time is reduced in proportion with the image
>>>> area, which makes good sense. But the accuracy of the OCR is poor when
>>>> the characters height is below 90 pixels.
>>>>
>>>> So, I assume that there is a problem with the way I trained tesseract.
>>>>
>>>> Because the characters in the plates are assorted (3 alphanumeric, 3
>>>> numeric) I trained it with just a single image with all the letters in
>>>> the alphabet. I saw that you suggest large training but I imagine that
>>>> that doesn't apply here where the characters are not organized in words.
>>>> Am I correct with this ?
>>>>
>>>> So, for you to see, this is the image with what I trained Tesseract:
>>>>
>>>> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
>>>>
>>>> In this image the characters are about 55 pixels height.
>>>>
>>>> Then, for frequent_word_list and words_list I included a single entry
>>>> for each character, I mean, something starting with this:
>>>>
>>>> A
>>>> B
>>>> C
>>>> D
>>>> ...
>>>>
>>>> Do you see something to be improved on what I did ? Should I perhaps use
>>>> a training image with more letters, with more combinations ? Will that
>>>> help somehow ?
>>>>
>>>> Should I include in the same image a copy the same character set but
>>>> with smaller size ? In that way, will I be able to pass Tesseract
>>>> smaller images and get more speed without sacrificing detection
>>>> quality ?
>>>>
>>>> On the other hand, I found some strange behavior of Tesseract about
>>>> which I would like to know a little more:
>>>> In my preprocessing I tried Otsu thresholding
>>>> (http://en.wikipedia.org/wiki/Otsu%27s_method) and I visually got too
>>>> much better results, but surprisingly for Tesseract it was worse. It
>>>> decreased the thickness of the draw of the chars, and the chars I used
>>>> to train Tesseract were bolder. So, Tesseract matches the "boldness" of
>>>> the characters ? Should I train Tesseract with different levels of
>>>> boldness ?
>>>>
>>>> I'm using Tesseract 2.04 for this. Do you think that some of these
>>>> issues will go better by using Tess 3.0 ?
>>>>
>>>> Thanks,
>>>>
>>>> Andres
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To post to this group, send email to [email protected].
>>>> To unsubscribe from this group, send email to
>>>> [email protected].
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/tesseract-ocr?hl=en.
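
P.S. For anyone following the thread: the O(n) complexity quoted above is what a summed-area table (integral image) gives for local-mean thresholding, since each window sum then costs four lookups regardless of window size. The sketch below is that generic technique only. It is an assumption that the cited paper does something along these lines; this is not the paper's algorithm, and it is not leptonica's Sauvola implementation either. The function names, the window radius and the offset are hypothetical, chosen just for illustration.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros.

    sat[y][x] holds the sum of img[0:y][0:x], so any rectangle sum
    can be read back with four lookups.
    """
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def adaptive_threshold(img, radius=7, offset=10):
    """Mark a pixel as foreground (1) if it is darker than the mean of its
    (2*radius+1)-square neighbourhood by more than `offset` grey levels.

    `img` is a list of rows of integer grey values (0 = black).
    `radius` and `offset` are illustrative defaults, not tuned values.
    """
    h, w = len(img), len(img[0])
    sat = integral_image(img)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        # Clip the window at the image border.
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            # Same test as img[y][x] < total / area - offset,
            # rearranged to stay in integer arithmetic.
            if img[y][x] * area < total - offset * area:
                out[y][x] = 1
    return out


if __name__ == "__main__":
    # Tiny synthetic example: a light background with one dark pixel.
    sample = [[200] * 5 for _ in range(5)]
    sample[2][2] = 50
    for row in adaptive_threshold(sample, radius=1, offset=10):
        print(row)
```

Both passes visit each pixel once, so the running time stays linear in the number of pixels whatever radius is chosen. Sauvola's method additionally needs the local standard deviation, which can be obtained the same way from a second table built over squared pixel values.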

