On Tuesday, February 18, 2014 4:54:45 PM UTC+8, Nick White wrote:
>
> Hi Richard,
>
> > So I have tried to sharpen
> > the image first and then perform OCR, the result is still wrong.
>
> If you post the image you sent to Tesseract, after you've done all
> the preprocessing, we can look and see if there's some obvious
> reason for any recognition errors.
Here is the image I preprocessed using Scan Tailor. I should mention that for this image Scan Tailor did not detect the skew correctly, so I deskewed it manually. The input to Tesseract is a binary image produced after content selection and deskewing. To check which phase caused the error, I generated the box file for this binary image again and uploaded the image with the boxes superimposed. The boxes in the region of interest are not correct.

Boxes superimposed: <https://lh4.googleusercontent.com/-HKtSB7xEg1k/UwQW8UMKSLI/AAAAAAAAAFU/Bzsg2FDbhnk/s1600/box_superimposed.png>
Preprocessed input image: <https://lh4.googleusercontent.com/-FGvO-_LHd5c/UwQWtYdGtLI/AAAAAAAAAFM/sMWxMYpMK3Q/s1600/IMG_20140215_152033_tailored.tif>

>
> > By the way, do you think it will make the recognition process slower if I
> > enable Chinese recognition? As you know, the character recognition process is a
> > template matching process. Given an unknown, more templates means more
> > candidates to match, which takes longer.
>
> Yes, it will almost certainly make the process slower. That's a big
> disadvantage to that approach.
>
> > This is what I am thinking of, too. I just have not figured out how
> > to quickly select candidate patches.
>
> That isn't an area I know much about, either. I'm sure it can be
> done, though...
>
> Nick

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
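When the box file is used to find which phase went wrong, keep in mind that Tesseract box files measure coordinates from the bottom-left corner of the image, while most drawing tools put the origin at the top-left. If the y axis is not flipped, the superimposed boxes land in the wrong place even when Tesseract's segmentation was fine. The following is a minimal Python sketch, assuming the six-column `<symbol> <left> <bottom> <right> <top> <page>` layout of Tesseract 3.x box files; `Box` and `parse_box_file` are hypothetical helpers, not part of Tesseract.

```python
# Hypothetical helpers (not part of Tesseract): convert box-file entries,
# whose origin is the bottom-left of the image, into top-left-origin
# rectangles that can be drawn over the image with most graphics tools.
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    symbol: str
    left: int
    top: int      # distance from the TOP edge of the image
    right: int
    bottom: int   # distance from the TOP edge of the image
    page: int


def parse_box_file(text: str, image_height: int) -> List[Box]:
    """Parse lines of the form '<symbol> <left> <bottom> <right> <top> <page>'.

    Assumes the six-column layout; lines with any other number of fields
    (including blank lines) are skipped.
    """
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        symbol = parts[0]
        left, bottom, right, top, page = map(int, parts[1:])
        # Flip the y axis: box files measure y upward from the bottom edge.
        boxes.append(Box(symbol, left, image_height - top,
                         right, image_height - bottom, page))
    return boxes


if __name__ == "__main__":
    sample = "A 10 80 30 100 0\nB 35 80 55 100 0"
    for b in parse_box_file(sample, image_height=120):
        print(b.symbol, (b.left, b.top, b.right, b.bottom))
```

For a 120-pixel-tall image, the entry `A 10 80 30 100 0` becomes the top-left rectangle `(10, 20, 30, 40)`. If the boxes still look wrong after the flip, the problem is more likely in segmentation than in the overlay.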

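The point that more templates mean more candidates to match can be illustrated with a toy brute-force matcher. This is a simplified sketch, not how Tesseract's classifier works: Tesseract is feature-based and prunes candidate classes before detailed matching. Here the binary glyph vectors, the Hamming distance, and the names `hamming` and `classify` are hypothetical choices for illustration only. The sketch shows that an exhaustive scan costs one comparison per template, so adding a large Chinese character set grows the work in proportion.

```python
# Toy illustration (NOT Tesseract's classifier): a brute-force
# nearest-template matcher compares the unknown glyph with every
# template, so its cost grows linearly with the number of templates.
from typing import Dict, List, Optional, Tuple

Glyph = List[int]  # hypothetical: a glyph flattened to a binary feature vector


def hamming(a: Glyph, b: Glyph) -> int:
    """Number of positions where the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(unknown: Glyph, templates: Dict[str, Glyph]) -> Tuple[str, int]:
    """Return (best label, number of template comparisons performed)."""
    best_label = ""
    best_dist: Optional[int] = None
    comparisons = 0
    for label, tmpl in templates.items():
        comparisons += 1
        d = hamming(unknown, tmpl)
        if best_dist is None or d < best_dist:
            best_label, best_dist = label, d
    return best_label, comparisons


if __name__ == "__main__":
    latin = {"l": [1, 1, 1, 0], "o": [1, 0, 0, 1]}
    both = {**latin, "中": [0, 1, 1, 1], "口": [1, 1, 0, 1]}
    print(classify([1, 1, 1, 0], latin))
    print(classify([1, 1, 1, 0], both))
```

Both calls return the label `l`, but the second one performs twice as many comparisons. Selecting a small set of candidate patches or classes up front is one way to avoid paying that cost on every glyph.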
