Actually, the accuracy of the OCR is hard to guarantee. As far as I know, you 
may select a smaller region from mGray where your text is, before 
createBitmap - so the heavier methods that follow process a smaller 
image.
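
The cropping idea above can be sketched as follows. This is a minimal illustration in Python (the original context is Android/OpenCV, where it would be something like mGray.submat(...) before createBitmap); the image here is just a nested list of grayscale values, and the region coordinates are made up for the example.

```python
# Sketch: crop a region of interest (ROI) from a grayscale image before
# handing it to the heavier OCR step, so later stages see fewer pixels.
# The "image" is a plain list of rows; coordinates are illustrative.

def crop_roi(gray, top, left, height, width):
    """Return the sub-image covering rows [top, top+height) and
    columns [left, left+width)."""
    return [row[left:left + width] for row in gray[top:top + height]]

# 4 x 6 dummy image where pixel (r, c) holds the value r*10 + c.
image = [[r * 10 + c for c in range(6)] for r in range(4)]

# Crop the 2 x 3 block starting at row 1, column 2.
roi = crop_roi(image, top=1, left=2, height=2, width=3)
```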

On Tuesday, March 29, 2011 at 12:17:47 PM UTC+8, Andres wrote:
>
> ...required.
>
> Hello people,
>
> I've been developing a licence plate recognition system for a long time, and 
> I still have to improve my use of Tesseract 
> <http://www.myknown.com/ocr/tesseract-ocr-engine/> to make it usable.
>
> My first concern is about speed:
> After extracting the licence plate image, I get an image like this:
>
>
> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
>
> As you can see, there are only 6 characters (Tesseract is recognizing more 
> because of some blemishes, but I get rid of those with some postprocessing 
> of the layout of the recognized characters).
>
> On an Intel i7 720 (good power, but using a single thread) the Tesseract 
> part takes about 230 ms. That is too much time for what I need.
>
> The image is 500 x 117 pixels. I noticed that when I reduce the size of this 
> image, the detection time drops in proportion to the image area, which makes 
> good sense. But the accuracy of the OCR 
> <http://www.myknown.com/ocr/optimization/> is poor when the character 
> height is below 90 pixels.
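
The "time in proportion to area" observation above suggests a quick back-of-the-envelope estimate. This is a hypothetical sketch, assuming the 230 ms measurement for the 500 x 117 image from the post and a strictly area-proportional cost:

```python
# Rough estimate: if OCR time scales with pixel area, predict the time
# for the same image resized by `scale` in each dimension.
# Baseline figures (230 ms at 500 x 117) come from the post; the
# proportionality itself is an assumption, not a guarantee.

def estimated_time_ms(base_ms, scale):
    """Area shrinks by scale**2 when both dimensions shrink by `scale`."""
    return base_ms * scale ** 2

# Halving both dimensions quarters the area: ~57.5 ms expected.
t = estimated_time_ms(230.0, 0.5)
```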
>
> So, I assume there is a problem with the way I trained Tesseract.
>
> Because the characters in the plates are assorted (3 alphanumeric, 3 
> numeric), I trained it with just a single image containing all the letters 
> of the alphabet. I saw that you suggest a large training set, but I imagine 
> that doesn't apply here, where the characters are not organized into words. 
> Am I correct about this?
>
> For reference, this is the image I used to train Tesseract:
>
>
> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
>
> In this image the characters are about 55 pixels high.
>
> Then, for frequent_words_list and words_list I included a single entry for 
> each character, i.e. something starting like this:
>
> A
> B
> C
> D
> ...
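
A one-character-per-line word list like the one described above can be generated trivially. This is a minimal sketch of the idea (the exact character set a plate uses is an assumption here - A-Z plus 0-9):

```python
# Build a word list with a single entry per character, one per line,
# matching the frequent_words_list / words_list format described above.
# The assumed alphabet is uppercase A-Z plus digits 0-9.
import string

chars = list(string.ascii_uppercase) + list(string.digits)
word_list = "\n".join(chars) + "\n"
```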
>
> Do you see anything to improve in what I did? Should I perhaps use a 
> training image with more letters, with more combinations? Would that help 
> somehow?
>
> Should I include in the same image a copy of the same character set at a 
> smaller size? That way, would I be able to pass Tesseract smaller images 
> and get more speed without sacrificing detection quality?
>
>
> On the other hand, I found some strange Tesseract behavior that I would 
> like to understand better:
> In my preprocessing I tried Otsu thresholding (
> http://en.wikipedia.org/wiki/Otsu%27s_method) and visually the results were 
> much better, but surprisingly they were worse for Tesseract. The 
> thresholding reduced the stroke thickness of the characters, and the 
> characters I used to train Tesseract were bolder. So, does Tesseract match 
> the "boldness" of the characters? Should I train it with different levels 
> of boldness?
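
For readers unfamiliar with the Otsu method mentioned above: it picks the threshold that maximizes the between-class variance of the two resulting pixel groups. This is a standalone pure-Python illustration of the standard algorithm, not the poster's actual preprocessing code:

```python
# Minimal Otsu's method on a flat list of 8-bit grayscale values:
# choose the threshold t that maximizes the between-class variance
# of the pixels at or below t (background) vs above t (foreground).

def otsu_threshold(pixels):
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    sum_bg = 0.0   # weighted sum of background intensities
    w_bg = 0       # background pixel count
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two well-separated intensity clusters: the threshold lands between them.
sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
```

Whether the resulting strokes come out thinner than a fixed threshold would produce depends entirely on the illumination of the input, which may explain the mismatch with the bolder training characters.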
>
> I'm using Tesseract 2.04 for this. Do you think some of these issues will 
> improve with Tesseract 3.0?
>
>
> Thanks,
>
> Andres
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/41a8c2fc-9533-4375-925d-71663057a882%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.