Hi Wasim,

Where are you from?

Thanks,
Adam

On Thu, Oct 9, 2014 at 11:22 AM, Wasim Safdar <[email protected]> wrote:

> Hello,
> I saw this post and found it very interesting. I am also working on
> Tesseract. I think that preprocessing or downscaling the original image
> decreases the accuracy of the algorithm. Preprocessing also adds to the
> overall execution time. I think you are training the images well. What
> you can do is train Tesseract on different character sizes. Then, if you
> downscale your image, it will not affect accuracy and your speed will
> also increase.
>
>
> On Tuesday, March 29, 2011 6:17:47 AM UTC+2, Andres wrote:
>>
>> ...required.
>>
>> Hello people,
>>
>> I have been developing a licence plate recognition system for a long
>> time and I still have to improve my use of Tesseract to make it usable.
>>
>> My first concern is about speed:
>> After extracting the licence plate image, I get an image like this:
>>
>> https://docs.google.com/leaf?id=0BxkuvS_LuBAzNmRkODhkYTUtNjcyYS00Nzg5LWE0ZDItNWM4YjRkYzhjYTFh&hl=en&authkey=CP-6tsgP
>>
>> As you may see, there are only 6 characters (Tesseract recognizes more
>> because there are some blemishes, but I get rid of them with some
>> postprocessing of the layout of the recognized chars).
>>
>> On an Intel i7 720 (good power, but using a single thread) the Tesseract
>> part takes something like 230 ms. This is too much time for what I need.
>>
>> The image is 500 x 117 pixels. I noticed that when I reduce the size of
>> this image, the detection time is reduced in proportion to the image
>> area, which makes good sense. But the accuracy of the OCR is poor when
>> the character height is below 90 pixels.
>>
>> So, I assume that there is a problem with the way I trained Tesseract.
>>
>> Because the characters in the plates are assorted (3 alphanumeric, 3
>> numeric) I trained it with just a single image with all the letters in
>> the alphabet. I saw that you suggest large training, but I imagine that
>> doesn't apply here, where the characters are not organized in words. Am
>> I correct about this?
>>
>> So, for you to see, this is the image I trained Tesseract with:
>>
>> https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0BxkuvS_LuBAzODc1YjIxNWUtNzIxMS00Yjg3LTljMDctNDkyZGIxZWM4YWVm&hl=en&authkey=CMXwo-AL
>>
>> In this image the characters are about 55 pixels high.
>>
>> Then, for frequent_word_list and words_list I included a single entry
>> for each character, I mean, something starting with this:
>>
>> A
>> B
>> C
>> D
>> ...
>>
>> Do you see something to be improved in what I did? Should I perhaps use
>> a training image with more letters, with more combinations? Will that
>> help somehow?
>>
>> Should I include in the same image a copy of the same character set but
>> at a smaller size? That way, will I be able to pass Tesseract smaller
>> images and get more speed without sacrificing detection quality?
>>
>>
>> On the other hand, I found some strange behavior in Tesseract about
>> which I would like to know a little more:
>> In my preprocessing I tried Otsu thresholding
>> (http://en.wikipedia.org/wiki/Otsu%27s_method) and visually I got much
>> better results, but surprisingly for Tesseract it was worse. It
>> decreased the stroke thickness of the chars, and the chars I used to
>> train Tesseract were bolder. So, does Tesseract match the "boldness" of
>> the characters? Should I train Tesseract with different levels of
>> boldness?
>>
>> I'm using Tesseract 2.04 for this. Do you think that some of these
>> issues will improve by using Tesseract 3.0?
>>
>>
>> Thanks,
>>
>> Andres
>>
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "tesseract-ocr" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/tesseract-ocr/enwft4qSDfE/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/0c0ae433-2ef5-4df8-aff5-b80e4558e4f4%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
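
As a rough sanity check on the speed question in the original post: if recognition time really scales with image area, as Andres observed, the expected time after downscaling can be estimated with simple arithmetic. The sketch below is only that. The function name `estimate_ocr_time_ms` is made up for illustration, and the linear-in-area model is an assumption taken from the post, not a measured property of Tesseract.

```python
def estimate_ocr_time_ms(base_time_ms, base_size, scale):
    """Estimate OCR time after uniform downscaling.

    ASSUMPTION (from the post, not measured here): recognition time is
    proportional to the pixel area of the input image.

    base_time_ms: measured time on the original image, in milliseconds
    base_size:    (width, height) of the original image, in pixels
    scale:        uniform scale factor applied to both dimensions
    Returns (new_width, new_height, estimated_time_ms).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    width, height = base_size
    new_width = round(width * scale)
    new_height = round(height * scale)
    # Area ratio between the scaled and the original image.
    area_ratio = (new_width * new_height) / (width * height)
    return new_width, new_height, base_time_ms * area_ratio


if __name__ == "__main__":
    # Figures quoted in the post: 500 x 117 pixels, about 230 ms.
    for scale in (1.0, 0.75, 0.5):
        w, h, t = estimate_ocr_time_ms(230.0, (500, 117), scale)
        print(f"scale={scale:.2f}: {w}x{h} px, estimated {t:.1f} ms")
```

With the figures from the post, halving both dimensions gives about 57 ms under this model, roughly a quarter of the original time. The catch is the one Andres already describes: the characters in a 58-pixel-high image are well below the 90 pixels at which accuracy reportedly starts to drop, so the speed gain only helps if the training copes with smaller characters.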

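
For readers unfamiliar with the Otsu thresholding mentioned in the original post, here is a minimal pure-Python sketch of the method on 8-bit grayscale values. It is illustrative only: `otsu_threshold` and `binarize` are hypothetical helper names, not Tesseract functions, and nothing here reflects how the preprocessing in the post was actually implemented.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a sequence of 8-bit gray values (0..255).

    The threshold t maximizes the between-class variance of the two
    classes {p <= t} and {p > t}.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    weight_low = 0      # number of pixels with value <= t
    sum_low = 0         # sum of gray values of those pixels
    best_t = 0
    best_var = -1.0

    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue            # no pixels in the low class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break               # all pixels are in the low class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var = between_var
            best_t = t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 if it is above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Toy data: a dark cluster and a bright cluster.
    sample = [10] * 50 + [200] * 50
    t = otsu_threshold(sample)
    print("threshold:", t)
    print("bright pixels:", sum(binarize(sample, t)))
```

Where the threshold lands decides which edge pixels count as ink, so the stroke width in the binarized image can differ from the stroke width in the training image. That may be related to the "boldness" mismatch described in the post, although the thread itself does not confirm the cause.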
