Hello everyone, I am currently using Tesseract 3.x for license plate recognition (LPR). I have an algorithm that does a good job of pre-processing the input image to localize the plate. However, when I use the Tesseract OCR engine to read the plate number, the recognition is not very accurate. I have gone through the Tesseract whitepapers as well as some of the threads discussing LPR with Tesseract.
From all this, I have identified the following ways of improving the results:

1. Customise the Tesseract engine to recognize only the characters A-Z, 0-9, . (dot) and space by setting the character white-list. My understanding is that the white-list is the set of characters the engine is allowed to output. What is the blacklist meant to do?

2. I have often seen fairly good number plate images being OCRed inaccurately. This could be due to the word recognition stage. Has anyone found a way to disable the dictionary / word recognition?

3. Then there are the page segmentation modes (PSM_AUTO, PSM_SINGLE_BLOCK, PSM_CHAR etc). Does PSM_CHAR imply that the input image is treated as a single character and the algorithm is run accordingly, without attempting word recognition?

4. Another configuration setting that I have seen in the code is AVS_FASTEST = 0, AVS_MOST_ACCURATE = 100. However, I could not find it being used anywhere in the code. Does it have any impact on the *character recognition* accuracy?

5. Finally, I also plan to use the confidence level data. Word confidence data is available through TessBaseAPI::AllWordConfidences(). Are there any indicators of confidence for individual characters as well?

Awaiting your valuable insights. Thank you.

Regards,
Saurabh Gandhi
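
To make items 1 and 2 concrete, here is a minimal sketch of the kind of config file I have in mind, written as a small Python helper that only produces the file text. It assumes the Tesseract variables tessedit_char_whitelist, load_system_dawg and load_freq_dawg behave as their names suggest; whether switching off the two dawgs fully disables word recognition is exactly what I am unsure about. The helper name build_plate_config is made up for illustration, and space is left out of the white-list because I do not know how a config-file value containing a space is parsed.

```python
import string

# Characters a plate may contain: A-Z, 0-9 and dot.
# Space is deliberately omitted (assumption: unclear how it is parsed).
PLATE_CHARS = string.ascii_uppercase + string.digits + "."


def build_plate_config(whitelist=PLATE_CHARS, use_dictionary=False):
    """Return the text of a Tesseract config file, one 'name value' per line.

    Hypothetical helper: it only formats text and does not call Tesseract.
    """
    flag = "T" if use_dictionary else "F"
    lines = [
        "tessedit_char_whitelist " + whitelist,
        # Assumption: turning these off stops the dictionary word lists loading.
        "load_system_dawg " + flag,
        "load_freq_dawg " + flag,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the config so it can be saved to a file and passed to Tesseract.
    print(build_plate_config(), end="")
```

The output would be saved to a file and passed to Tesseract as a config, either on the command line or through the configs argument of TessBaseAPI::Init().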

