Customising Tesseract for character recognition

Saurabh Gandhi Wed, 16 Feb 2011 20:54:56 -0800

Hello everyone,

I am currently using Tesseract 3.x for license plate recognition.
I have an algorithm that does a good job of pre-processing the input image
to localize the plate.
However, when I use the Tesseract OCR engine to classify the plate number,
the recognition is not very accurate. I have gone through the Tesseract
whitepapers as well as some of the threads discussing LPR using
Tesseract.


From all this, I have identified the following ways of improving the
results:

   1. Customise the Tesseract engine to recognize only the characters A-Z,
   0-9, . (dot) and (space) by setting the character whitelist. My
   understanding is that the whitelist is the list of characters that the
   engine is allowed to recognize. I would also like to know what the
   blacklist is meant to do.
   2. I have often seen fairly good number plate images being OCRed
   inaccurately. This could be due to the word recognition stage. Has
   anyone found a way to disable the dictionary / word recognition?
   3. Then there are the page segmentation modes (PSM_AUTO,
   PSM_SINGLE_BLOCK, PSM_SINGLE_CHAR, etc.). Does PSM_SINGLE_CHAR imply that
   the engine will treat the input image as a single character and run the
   algorithm accordingly, without attempting word recognition?
   4. Another important configuration setting that I have seen in the code
   is AVS_FASTEST = 0, AVS_MOST_ACCURATE = 100. However, I could not find it
   being used anywhere in the code. Does it have any impact on *character
   recognition* accuracy?
   5. Finally, I also plan to use the confidence data. Are there any
   confidence indicators for individual characters as well? There is word
   confidence data, which can be obtained from
   TessBaseAPI::AllWordConfidences().
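
A minimal C++ sketch of items 1 and 2, assuming the Tesseract variable names `tessedit_char_whitelist`, `load_system_dawg` and `load_freq_dawg` (present in later 3.x sources; check the version you build against). It only builds the name/value pairs; applying them would be done with `TessBaseAPI::SetVariable(name, value)`. The two dictionary switches are assumed to be init-time settings that only take effect if set before `Init()`, for example in a config file. `PlateWhitelist` and `PlateOcrVariables` are hypothetical helper names. The complementary `tessedit_char_blacklist` variable is understood to list characters to exclude rather than allow.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the plate alphabet A-Z, 0-9 and '.'.
// Space is omitted on the assumption that Tesseract derives word gaps from
// segmentation rather than classifying a space character.
std::string PlateWhitelist() {
  std::string chars;
  for (char c = 'A'; c <= 'Z'; ++c) chars += c;
  for (char c = '0'; c <= '9'; ++c) chars += c;
  chars += '.';
  return chars;
}

// Hypothetical helper: Tesseract variable name/value pairs for plate OCR.
// Each pair would be applied with TessBaseAPI::SetVariable(name, value).
std::vector<std::pair<std::string, std::string>> PlateOcrVariables() {
  return {
      // Restrict the classifier to the plate alphabet.
      {"tessedit_char_whitelist", PlateWhitelist()},
      // Skip loading the system and frequent-word dictionaries. Assumed to be
      // init-time settings: set them before Init() (e.g. in a config file)
      // or they may have no effect.
      {"load_system_dawg", "F"},
      {"load_freq_dawg", "F"},
  };
}
```

The page segmentation mode from item 3 is set separately, through `TessBaseAPI::SetPageSegMode()`.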
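
For item 5, later 3.x releases provide a `tesseract::ResultIterator` whose `GetUTF8Text(tesseract::RIL_SYMBOL)` and `Confidence(tesseract::RIL_SYMBOL)` report the text and a 0-100 confidence for each character; this iterator may not exist in the release you are using. The sketch below assumes those values have already been copied into a hypothetical `SymbolResult` struct and flags the characters whose confidence falls below a threshold, so that a plate reading can be re-checked or rejected. The threshold is an arbitrary assumption that would need tuning on real plates.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one recognized character.
struct SymbolResult {
  std::string text;  // UTF-8 text of the character
  float confidence;  // assumed 0 (worst) to 100 (best), as ResultIterator reports it
};

// Returns the positions of characters whose confidence is below min_conf.
std::vector<std::size_t> LowConfidenceSymbols(
    const std::vector<SymbolResult>& symbols, float min_conf) {
  std::vector<std::size_t> weak;
  for (std::size_t i = 0; i < symbols.size(); ++i) {
    if (symbols[i].confidence < min_conf) weak.push_back(i);
  }
  return weak;
}

// Filling `symbols` from a tesseract::ResultIterator* it (assumed API,
// available in later 3.x releases) would look roughly like:
//   do {
//     char* text = it->GetUTF8Text(tesseract::RIL_SYMBOL);  // delete[] after use
//     if (text != nullptr) {
//       symbols.push_back({text, it->Confidence(tesseract::RIL_SYMBOL)});
//       delete[] text;
//     }
//   } while (it->Next(tesseract::RIL_SYMBOL));
```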

Awaiting your valuable insights.
Thank you.

Regards,
Saurabh Gandhi

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
