Could you perhaps provide a code sample for calling the leptonica sauvola method on a raw byte array (an unsigned char* buffer)? I am trying to put something similar together. I currently use adaptive thresholding, which works pretty well, but I want to compare it against the sauvola method and see whether I can get better results.
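To make the question concrete, here is a minimal plain-C sketch of what I understand the Sauvola rule to be, applied directly to an 8-bit grayscale buffer. It does not call leptonica at all, so it only shows the kind of interface I am after. The function name `sauvola_binarize`, the parameters `half_window` and `k`, the constant `R = 128`, and the 0/255 output convention are all my own assumptions for illustration, and leptonica's real implementation may well differ in its details.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, NOT a leptonica function.
 * Sauvola rule: t = mean * (1 + k * (sd / R - 1)), with mean and sd taken
 * over a (2*half_window+1)^2 window clipped at the image borders.
 * src and dst are w*h bytes, row-major, no padding (an assumption).
 * dst gets 0 for dark (text) pixels and 255 for background.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int sauvola_binarize(const unsigned char *src, unsigned char *dst,
                     int w, int h, int half_window, double k)
{
    const double R = 128.0; /* assumed range of the std. deviation, 8-bit input */
    if (!src || !dst || w <= 0 || h <= 0 || half_window < 1 || k < 0.0)
        return -1;

    /* Integral images of values and squared values, with a zero border. */
    size_t stride = (size_t)w + 1;
    size_t cells = stride * ((size_t)h + 1);
    double *sum = calloc(cells, sizeof *sum);
    double *sq = calloc(cells, sizeof *sq);
    if (!sum || !sq) { free(sum); free(sq); return -1; }

    for (size_t y = 0; y < (size_t)h; y++) {
        double row = 0.0, row_sq = 0.0;
        for (size_t x = 0; x < (size_t)w; x++) {
            double v = src[y * (size_t)w + x];
            row += v;
            row_sq += v * v;
            sum[(y + 1) * stride + x + 1] = sum[y * stride + x + 1] + row;
            sq[(y + 1) * stride + x + 1] = sq[y * stride + x + 1] + row_sq;
        }
    }

    for (int y = 0; y < h; y++) {
        /* Window rows [y0, y1), clipped to the image. */
        size_t y0 = (size_t)(y > half_window ? y - half_window : 0);
        size_t y1 = (size_t)(h - 1 - y > half_window ? y + half_window : h - 1) + 1;
        for (int x = 0; x < w; x++) {
            size_t x0 = (size_t)(x > half_window ? x - half_window : 0);
            size_t x1 = (size_t)(w - 1 - x > half_window ? x + half_window : w - 1) + 1;
            double n = (double)((x1 - x0) * (y1 - y0));
            double s = sum[y1 * stride + x1] - sum[y0 * stride + x1]
                     - sum[y1 * stride + x0] + sum[y0 * stride + x0];
            double q = sq[y1 * stride + x1] - sq[y0 * stride + x1]
                     - sq[y1 * stride + x0] + sq[y0 * stride + x0];
            double mean = s / n;
            double var = q / n - mean * mean;
            if (var < 0.0) var = 0.0; /* guard against rounding */

            /* "v > t" rewritten as (v - mean*(1-k))*R > k*mean*sd and then
             * squared, so no sqrt (and no libm) is needed. */
            size_t i = (size_t)y * (size_t)w + (size_t)x;
            double lhs = ((double)src[i] - mean * (1.0 - k)) * R;
            double rhs = k * mean; /* non-negative because k >= 0 */
            int background = lhs > 0.0 && lhs * lhs > rhs * rhs * var;
            dst[i] = background ? 255 : 0;
        }
    }

    free(sum);
    free(sq);
    return 0;
}
```

I would pass it the same w*h buffer that currently goes into my adaptive thresholding step, which should make a side-by-side comparison easy. The window size and k would presumably still need tuning for each kind of image.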
I tried posting earlier; I hope this is not a second post. I am not sure if the first one bounced.

Thank you,
Adam

On Wednesday, March 30, 2011 5:27:43 PM UTC-7, Max Cantor wrote:
> Yes. I've had great experience with sauvola binarize from leptonica. Gamer
> works too but is much much slower
>
> On Mar 31, 2011, at 0:02, cong nguyenba <[email protected]> wrote:
>
> > I have another approach for you here: try to apply binarization using
> > adaptive threshold! Delving into engine by following adaptive
> > classification in source code for speedup! I think it is enough for
> > your expectation!
> >
> > On Wednesday, March 30, 2011, Dmitri Silaev <[email protected]> wrote:
> >> P.S.: If you're still sure that reasonable downscaling of your images
> >> sacrifices the accuracy, please share one or two of your *unprocessed*
> >> images to investigate further.
> >>
> >> And I'd suggest to keep up with the latest revisions of Tesseract. The
> >> API changes significantly, but Tess is definitely being improved in
> >> the sense of stability, new capabilities and also code efficiency,
> >> which explicitly may lead to improved performance which you are
> >> looking for.
> >>
> >> Warm regards,
> >> Dmitri Silaev
> >>
> >> On Tue, Mar 29, 2011 at 8:17 AM, Andres <[email protected]> wrote:
> >>> ...required.
> >>>
> >>> Hello people,
> >>>
> >>> I'm developing a licence plate recognition system from long ago and I still
> >>> have to improve the use of Tesseract to make it usable.
> >>>
> >>> My first concern is about speed:
> >>> After extracting the licence plate image, I get an image like this:
> >>>
> >>> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
> >>>
> >>> As you may see, there are only 6 characters (tess is recognizing more
> >>> because there are some blemishes over there, but I get rid of them with some
> >>> postprocessing of the layout of the recognized chars)
> >>>
> >>> In an Intel I7 720 (good power, but using a single thread) the tesseract
> >>> part is taking something like 230 ms. This is too much time for what I need.
> >>>
> >>> The image is 500 x 117 pixels. I noted that when I reduce the size of this
> >>> image the detection time is reduced in proportion with the image area, which
> >>> makes good sense. But the accuracy of the OCR is poor when the character
> >>> height is below 90 pixels.
> >>>
> >>> So, I assume that there is a problem with the way I trained tesseract.
> >>>
> >>> Because the characters in the plates are assorted (3 alphanumeric, 3
> >>> numeric) I trained it with just a single image with all the letters in the
> >>> alphabet. I saw that you suggest large training but I imagine that that
> >>> doesn't apply here where the characters are not organized in words. Am I
> >>> correct with this?
> >>>
> >>> So, for you to see, this is the image with which I trained Tesseract:
> >>>
> >>> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
> >>>
> >>> In this image the characters are about 55 pixels high.
> >>>
> >>> Then, for frequent_word_list and words_list I included a single entry for
> >>> each character, I mean, something starting with this:
> >>>
> >>> A
> >>> B
> >>> C
> >>> D
> >>> ...
> >>>
> >>> Do you see something to be improved on what I did?
> >>> Should I perhaps use a training image with more letters, with more
> >>> combinations? Will that help somehow?
> >>>
> >>> Should I include in the same image a copy of the same character set but with
> >>> smaller size? In that way, will I be able to pass Tesseract smaller images
> >>> and get more speed without sacrificing detection quality?
> >>>
> >>> On the other hand, I found some strange behavior of Tesseract about which I
> >>> would like to know a little more:
> >>> In my preprocessing I tried Otsu thresholding
> >>> (http://en.wikipedia.org/wiki/Otsu%27s_method) and I visually got much
> >>> better results, but surprisingly for Tesseract it was worse. It decreased
> >>> the stroke thickness of the chars, and the chars I used to train
> >>> Tesseract were bolder. So, does Tesseract match the "boldness" of the
> >>> characters? Should I train Tesseract with different levels of boldness?
> >>>
> >>> I'm using Tesseract 2.04 for this. Do you think that some of these issues
> >>> will improve by using Tess 3.0?
> >>>
> >>> Thanks,
> >>>
> >>> Andres
> >>>
> >>> --
> >>> You received this message because you are subscribed to the Google Groups
> >>> "tesseract-ocr" group.
> >>> To post to this group, send email to [email protected].
> >>> To unsubscribe from this group, send email to [email protected].
> >>> For more options, visit this group at
> >>> http://groups.google.com/group/tesseract-ocr?hl=en.

