Re: [tesseract-ocr] Re: tips for improving Tesseract accuracy and speed...

Adam Fri, 10 Oct 2014 09:53:36 -0700

Hi Wasim,
Where are you from?
Thanks,
Adam


On Thu, Oct 9, 2014 at 11:22 AM, Wasim Safdar <[email protected]>
wrote:

> Hello,
>           I saw this post and found it very interesting. I am also
> working on Tesseract. I think that preprocessing or downscaling the
> original image reduces the accuracy of the algorithm. Preprocessing
> also adds to the overall execution time. I think you are training on
> the images well. What you can do is train Tesseract on different
> character sizes. Then, if you downscale your image, accuracy will not be
> affected and your speed will increase.
>
>
> On Tuesday, March 29, 2011 6:17:47 AM UTC+2, Andres wrote:
>>
>> ...required.
>>
>> Hello people,
>>
>> I have been developing a licence plate recognition system for a long
>> time, and I still have to improve my use of Tesseract to make it usable.
>>
>> My first concern is about speed:
>> After extracting the licence plate image, I get an image like this:
>>
>> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5L
>> WE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
>>
>> As you may see, there are only 6 characters (tess is recognizing more
>> because there are some blemishes over there, but I get rid of them with
>> some postprocessing of the layout of the recognized chars)
>>
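
The layout postprocessing mentioned above is not shown in the post. A minimal sketch of one way it could work, assuming blemishes show up as boxes much shorter than the real characters; the `(char, left, top, width, height)` tuple layout, the `drop_blemishes` name and the sample values are all hypothetical, not Tesseract's output format:

```python
from statistics import median

def drop_blemishes(boxes, tolerance=0.25):
    """Keep boxes whose height is within `tolerance` of the median height.

    `boxes` is a list of (char, left, top, width, height) tuples. This
    layout is an assumption for illustration only.
    """
    if not boxes:
        return []
    typical = median(b[4] for b in boxes)
    return [b for b in boxes if abs(b[4] - typical) <= tolerance * typical]

# Hypothetical recognition result: six plate characters plus two specks.
boxes = [
    ("A", 10, 12, 60, 92), ("B", 85, 11, 61, 93), ("C", 160, 12, 60, 92),
    (".", 230, 95, 8, 9),
    ("1", 260, 12, 40, 91), ("2", 330, 11, 60, 93), ("3", 405, 12, 60, 92),
    ("'", 480, 5, 6, 12),
]
plate = "".join(b[0] for b in drop_blemishes(boxes))
print(plate)
```

The median is used rather than the mean so that a few tiny specks do not drag the reference height down.
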
>> On an Intel i7 720 (good power, but using a single thread) the Tesseract
>> part takes about 230 ms. This is too much time for what I need.
>>
>> The image is 500 x 117 pixels. I noticed that when I reduce the size of
>> this image, the recognition time drops in proportion to the image area,
>> which makes sense. But the accuracy of the OCR is poor when the
>> character height is below 90 pixels.
>>
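
To put numbers on the area relationship described above, here is a rough sketch. It assumes the time really is proportional to pixel area, which the post reports as an observation and not as a guarantee; the function name `estimated_ocr_ms` is made up for this example:

```python
def estimated_ocr_ms(base_ms, scale):
    """Estimate OCR time after resizing both dimensions by `scale`,
    assuming the time is proportional to the pixel area."""
    return base_ms * scale * scale

# Figures from the post: about 230 ms for the 500 x 117 image.
for scale in (1.0, 0.75, 0.5):
    width, height = round(500 * scale), round(117 * scale)
    print(f"{width} x {height}: ~{estimated_ocr_ms(230, scale):.1f} ms")
```

Halving both dimensions cuts the area to a quarter, so the estimate falls to roughly a quarter of the original time, but it also halves the character height, which is where the accuracy problem starts.
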
>> So, I assume that there is a problem with the way I trained tesseract.
>>
>> Because the characters in the plates are assorted (3 alphanumeric, 3
>> numeric) I trained it with just a single image with all the letters in the
>> alphabet. I saw that you suggest a large training set, but I imagine that
>> doesn't apply here, where the characters are not organized in words. Am I
>> correct about this?
>>
>> So, for you to see, this is the image I trained Tesseract with:
>>
>> https://docs.google.com/viewer?a=v&pid=explorer&
>> chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3L
>> TljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
>>
>> In this image the characters are about 55 pixels high.
>>
>> Then, for frequent_word_list and words_list I included a single entry for
>> each character, that is, something starting like this:
>>
>> A
>> B
>> C
>> D
>> ...
>>
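
A list like the one above can be generated instead of typed by hand. This is only a sketch: the post shows the list starting with A, B, C, D, and whether digits were also included is an assumption here, as is the helper name `single_char_word_list`:

```python
import string

def single_char_word_list(alphabet=string.ascii_uppercase + string.digits):
    """Return word-list text with one single-character entry per line.

    Including the digits is an assumption; the post only shows letters.
    """
    return "\n".join(alphabet) + "\n"

text = single_char_word_list()
print(text[:8])  # first four entries, one per line
```

The resulting text could then be saved under whatever word-list file names the training setup expects.
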
>> Do you see anything to be improved in what I did? Should I perhaps use
>> a training image with more letters, with more combinations? Will that help
>> somehow?
>>
>> Should I include in the same image a copy of the same character set but at
>> a smaller size? That way, will I be able to pass Tesseract smaller images
>> and get more speed without sacrificing detection quality?
>>
>>
>> On the other hand, I found some strange behavior in Tesseract that
>> I would like to know a little more about:
>> In my preprocessing I tried Otsu thresholding (
>> http://en.wikipedia.org/wiki/Otsu%27s_method) and visually I got
>> much better results, but surprisingly they were worse for Tesseract. It
>> decreased the stroke thickness of the characters, and the characters I
>> used to train Tesseract were bolder. So, does Tesseract match the
>> "boldness" of the characters? Should I train Tesseract with different
>> levels of boldness?
>>
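
For reference, Otsu's method picks the grey level that maximizes the between-class variance of the two resulting pixel groups. The sketch below is a plain-Python illustration of that idea on a flat list of 8-bit grey values. It is not the preprocessing code used in the post, and the function name and sample data are made up:

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break     # every pixel is already in the lower class
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Made-up sample: dark character strokes (30) on a light plate (200).
pixels = [30] * 50 + [200] * 50
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t, sum(binary))
```

Where the threshold lands decides how many edge pixels count as stroke, which is one way the binarized characters can end up thinner or bolder than the training samples.
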
>> I'm using Tesseract 2.04 for this. Do you think that some of these issues
>> would improve with Tesseract 3.0?
>>
>>
>> Thanks,
>>
>> Andres
>>
>>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "tesseract-ocr" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/tesseract-ocr/enwft4qSDfE/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/0c0ae433-2ef5-4df8-aff5-b80e4558e4f4%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/0c0ae433-2ef5-4df8-aff5-b80e4558e4f4%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

