Re: tips for improving Tesseract accuracy and speed...

Adam Freeman Mon, 04 Nov 2013 23:35:19 -0800

Could you provide a code sample for calling the Leptonica Sauvola method on, 
say, a raw byte buffer (unsigned char*)?  I am trying to put something 
similar together.  I currently use adaptive thresholding, which works pretty 
well, but I would like to compare it with the Sauvola method and see whether 
I can get better results.
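
For the comparison described above, here is a minimal sketch of Sauvola
binarization over a raw 8-bit grayscale buffer (unsigned char*, one byte per
pixel, rows packed without padding). It is not Leptonica's code and does not
call Leptonica; as far as I know, Leptonica's own entry point is
pixSauvolaBinarize, which works on a PIX rather than a raw buffer. The
function name sauvolaBinarize, the halfWindow parameter, and the defaults
k = 0.34 and R = 128 are assumptions chosen for illustration, not values tuned
for licence plates.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Leptonica or Tesseract function).
// Sauvola threshold per pixel: T = mean * (1 + k * (stddev / R - 1)),
// where mean and stddev are taken over a (2*halfWindow+1)^2 window
// clipped to the image. Output is 255 where gray > T (background)
// and 0 otherwise (ink).
std::vector<unsigned char> sauvolaBinarize(const unsigned char* gray,
                                           int width, int height,
                                           int halfWindow = 7,
                                           double k = 0.34,
                                           double R = 128.0)
{
    assert(gray != nullptr && width > 0 && height > 0 && halfWindow > 0);

    // Integral images of pixel values and squared values, padded with one
    // leading row and column of zeros so window sums need no special cases.
    const std::size_t stride = static_cast<std::size_t>(width) + 1;
    std::vector<std::uint64_t> sum(stride * (height + 1), 0);
    std::vector<std::uint64_t> sumSq(stride * (height + 1), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::uint64_t v = gray[static_cast<std::size_t>(y) * width + x];
            const std::size_t i = (y + 1) * stride + (x + 1);
            sum[i]   = v     + sum[i - 1]   + sum[i - stride]   - sum[i - stride - 1];
            sumSq[i] = v * v + sumSq[i - 1] + sumSq[i - stride] - sumSq[i - stride - 1];
        }
    }

    std::vector<unsigned char> out(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        const int y0 = std::max(0, y - halfWindow);
        const int y1 = std::min(height - 1, y + halfWindow);
        for (int x = 0; x < width; ++x) {
            const int x0 = std::max(0, x - halfWindow);
            const int x1 = std::min(width - 1, x + halfWindow);
            const double n = static_cast<double>(x1 - x0 + 1) * (y1 - y0 + 1);

            // Corners of the window in the padded integral images.
            const std::size_t a = y0 * stride + x0;
            const std::size_t b = y0 * stride + (x1 + 1);
            const std::size_t c = (y1 + 1) * stride + x0;
            const std::size_t d = (y1 + 1) * stride + (x1 + 1);
            const double s  = static_cast<double>(sum[d] + sum[a] - sum[b] - sum[c]);
            const double sq = static_cast<double>(sumSq[d] + sumSq[a] - sumSq[b] - sumSq[c]);

            const double mean = s / n;
            const double variance = std::max(0.0, sq / n - mean * mean);
            const double threshold =
                mean * (1.0 + k * (std::sqrt(variance) / R - 1.0));

            const std::size_t p = static_cast<std::size_t>(y) * width + x;
            out[p] = (gray[p] > threshold) ? 255 : 0;
        }
    }
    return out;
}
```

Calling sauvolaBinarize(buf, width, height) returns a width*height buffer with
ink as 0 and background as 255. The integral images keep the cost per pixel
constant regardless of window size, which matters when timing is tight. With
the Tesseract 3.x C++ API, a buffer like this could presumably be passed to
TessBaseAPI::SetImage with 1 byte per pixel, but check that against the version
you build with.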


I tried posting earlier.  I hope this is not a second post.  I am not sure 
if the first one bounced or ... ?
Thank you,
Adam

On Wednesday, March 30, 2011 5:27:43 PM UTC-7, Max Cantor wrote:
>
> Yes. I've had great experience with Sauvola binarization from Leptonica. 
> Gamera works too but is much, much slower.
>
> On Mar 31, 2011, at 0:02, cong nguyenba <[email protected]> wrote:
>
> > Here is another approach for you: try binarizing with an adaptive
> > threshold. For speed, dig into the engine and follow the adaptive
> > classification code in the source. I think that should be enough for
> > what you need.
> > 
> > On Wednesday, March 30, 2011, Dmitri Silaev <[email protected]> wrote:
> >> P.S.: If you're still sure that reasonable downscaling of your images
> >> sacrifices the accuracy, please share one or two of your *unprocessed*
> >> images to investigate further.
> >> 
> >> I'd also suggest keeping up with the latest revisions of Tesseract. The
> >> API changes significantly, but Tesseract is definitely improving in
> >> stability, new capabilities and code efficiency, which may well give
> >> you the performance improvement you are looking for.
> >> 
> >> Warm regards,
> >> Dmitri Silaev
> >> 
> >> 
> >> 
> >> 
> >> 
> >> On Tue, Mar 29, 2011 at 8:17 AM, Andres <[email protected]> wrote:
> >>> ...required.
> >>> 
> >>> Hello people,
> >>> 
> >>> I have been developing a licence plate recognition system for a long
> >>> time, and I still have to improve my use of Tesseract to make it usable.
> >>> 
> >>> My first concern is about speed:
> >>> After extracting the licence plate image, I get an image like this:
> >>> 
> >>> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
> >>> 
> >>> As you can see, there are only 6 characters (Tesseract recognizes more
> >>> because of some blemishes, but I get rid of those by postprocessing
> >>> the layout of the recognized chars).
> >>> 
> >>> On an Intel i7 720 (plenty of power, but using a single thread) the
> >>> Tesseract part takes about 230 ms. That is too long for what I need.
> >>> 
> >>> The image is 500 x 117 pixels. I noticed that when I reduce the size of
> >>> this image, the detection time drops in proportion to the image area,
> >>> which makes sense. But OCR accuracy is poor when the character height
> >>> is below 90 pixels.
> >>> 
> >>> So, I assume that there is a problem with the way I trained tesseract.
> >>> 
> >>> Because the characters on the plates are assorted (3 alphanumeric, 3
> >>> numeric), I trained it with just a single image containing all the
> >>> letters of the alphabet. I saw that you suggest large training sets, but
> >>> I imagine that doesn't apply here, where the characters are not
> >>> organized into words. Am I correct?
> >>> 
> >>> For reference, this is the image I trained Tesseract with:
> >>> 
> >>> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
> >>> 
> >>> In this image the characters are about 55 pixels high.
> >>> 
> >>> Then, for frequent_word_list and words_list I included a single entry
> >>> for each character, that is, something starting like this:
> >>> 
> >>> A
> >>> B
> >>> C
> >>> D
> >>> ...
> >>> 
> >>> Do you see anything I could improve in what I did? Should I perhaps use
> >>> a training image with more letters and more combinations? Would that
> >>> help?
> >>> 
> >>> Should I include in the same image a copy of the same character set at a
> >>> smaller size? That way, would I be able to pass Tesseract smaller
> >>> images and gain speed without sacrificing detection quality?
> >>> 
> >>> 
> >>> On the other hand, I found some strange Tesseract behavior that I would
> >>> like to understand better:
> >>> In my preprocessing I tried Otsu thresholding
> >>> (http://en.wikipedia.org/wiki/Otsu%27s_method) and visually got much
> >>> better results, but surprisingly they were worse for Tesseract. Otsu
> >>> made the character strokes thinner, and the chars I used to train
> >>> Tesseract were bolder. So does Tesseract match the "boldness" of the
> >>> characters? Should I train Tesseract with different levels of boldness?
> >>> 
> >>> I'm using Tesseract 2.04 for this. Do you think some of these issues
> >>> would improve with Tesseract 3.0?
> >>> 
> >>> 
> >>> Thanks,
> >>> 
> >>> Andres
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> 
> >>> --
> >>> You received this message because you are subscribed to the Google
> >>> Groups "tesseract-ocr" group.
> >>> To post to this group, send email to [email protected].
> >>> To unsubscribe from this group, send email to [email protected].
> >>> For more options, visit this group at
> >>> http://groups.google.com/group/tesseract-ocr?hl=en.
> >>> 
