Hello everyone,

I am using OCR to read scanned supplier certificates. The images contain Simplified Chinese, Traditional Chinese, and English text. With a standard OCR API the accuracy is not high, because the images have a lot of noise, red/blue seals and circles, and specialized terminology. Please help me, experts.

For example, we need to extract the company name, address, and validity date.

Q1: How should I preprocess the image?
Q2: How do I extract the text we need?
Q3: If I use the Tesseract API, do I need to prepare a list of terminology to add to the language data?

Thank you

[image: 01.jpg]