[tesseract-ocr] New fonts/terminologies in Chinese

Bo Tang Wed, 12 Feb 2020 02:00:01 -0800

Hello everyone, I am using OCR to read scanned supplier certificates. The 
images contain Simplified Chinese, Traditional Chinese, and English text. 
With the standard OCR API the accuracy is low, because the images contain 
a lot of noise, red/blue seals and circles, and specialized terminology. 
Please help me, experts. 
For example, we need to extract the company name, address, and validity date.
Q1: How should I preprocess the images?
Q2: How do I extract only the text we need?
Q3: If I use the Tesseract API, do I need to prepare a list of terminology 
to add to the language data?
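
As an illustration of the kind of preprocessing Q1 is asking about, here is a minimal sketch, not a tested pipeline for these certificates. It assumes the seals are strongly saturated red or blue while the printed text is dark and nearly colorless, so seal pixels can be whitened before the page is converted to grayscale for OCR. The function names, hue ranges, and thresholds are all hypothetical and would need tuning on real scans. Pure Python is used only to keep the example self-contained; a real image would normally be handled as an array with an imaging library.

```python
import colorsys


def suppress_seal_pixels(pixels, min_saturation=0.45, min_value=0.35):
    """Replace strongly red or blue pixels with white.

    pixels: rows of (r, g, b) tuples with 0-255 channels.
    The hue ranges and thresholds are assumed values, not calibrated ones.
    """
    white = (255, 255, 255)
    cleaned = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            hue_deg = h * 360
            is_red = hue_deg <= 20 or hue_deg >= 340
            is_blue = 200 <= hue_deg <= 260
            # Only saturated, reasonably bright pixels count as seal ink,
            # so dark, nearly gray text is left untouched.
            if s >= min_saturation and v >= min_value and (is_red or is_blue):
                new_row.append(white)
            else:
                new_row.append((r, g, b))
        cleaned.append(new_row)
    return cleaned


def to_grayscale(pixels):
    """Convert RGB rows to 0-255 gray values using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in pixels
    ]


# Tiny synthetic row: red seal ink, dark text, blue seal ink, paper.
sample = [[(200, 30, 30), (20, 20, 20), (40, 60, 200), (250, 250, 250)]]
cleaned = suppress_seal_pixels(sample)
gray = to_grayscale(cleaned)
```

Text that lies directly under a seal is not recovered by this approach; it only removes the colored ink around and between the characters.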


Thank you 

[image: 01.jpg]

