Olivier, you would need to change your image preprocessing before sending the image to Tesseract. Training will not help Tesseract read this image any better, because the pixels around the text are noise that has to be removed before the image is passed to Tesseract.
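As a rough illustration of the kind of cleanup meant here, the sketch below binarizes a grayscale image so that a light, noisy background becomes plain white and the dark text becomes plain black. It uses Otsu's method to pick the threshold, which is only one possible choice and not necessarily the right one for this particular card. The function names `otsu_threshold` and `binarize` are made up for this example, and the image is assumed to be a nested list of 0-255 grey values; in practice you would probably do the same step with an image library before calling Tesseract.

```python
def otsu_threshold(pixels):
    """Return a grey level (0-255) that separates dark pixels from light ones.

    `pixels` is assumed to be a list of rows, each a list of ints in 0..255.
    """
    # Histogram of grey levels.
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor).
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(pixels, threshold):
    """Map pixels at or below `threshold` to black (0), the rest to white (255)."""
    return [[0 if value <= threshold else 255 for value in row] for row in pixels]


if __name__ == "__main__":
    # Tiny made-up image: dark "text" pixels on a light, uneven background.
    image = [
        [200, 210, 30],
        [190, 25, 220],
        [35, 205, 215],
    ]
    level = otsu_threshold(image)
    for row in binarize(image, level):
        print(row)
```

A single global threshold like this only helps when the text is clearly darker (or lighter) than everything around it. If the background pattern overlaps the text in brightness, a locally adaptive threshold or a despeckling step is likely to be needed instead.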
On Thursday, February 15, 2018 at 5:33:44 PM UTC+5:30, Olivier Demin wrote:
>
> Hi all. I'm completely new to tesseract, so please excuse any "dummy" questions. You're free to give "dummy" answers as well :-)
>
> I would like to OCR pictures of bank cards in order to extract bank account numbers. I can easily post-process the recognized text with regular expressions to extract the bank account number, but the quality of the OCR is not good enough with the default parameters of tesseract. Here is an example of an original picture, and the resulting image after processing by tesseract. I want to extract the number starting with "BE19 3770", but tesseract returns "$193,770 7513" instead. Is it possible to improve this by tuning the tesseract parameters, or do I need another image processing library to prepare my images before tesseract?
>
> <https://lh3.googleusercontent.com/-yBwtnhYI8OQ/WoVpR8S8CqI/AAAAAAAAG98/0n54CoBZOOkZ55DWULcCUN6LkIZT9tu3ACLcBGAs/s1600/original.PNG>
>
> <https://lh3.googleusercontent.com/-8CanWZj7HHw/WoVpVeyjL5I/AAAAAAAAG-A/FYjEi3t4LOAXy3YL_etCUqta1tYLHo3HQCLcBGAs/s1600/processed.PNG>

