On Tuesday, February 18, 2014 4:54:45 PM UTC+8, Nick White wrote:
>
> Hi Richard, 
>
> > So I have tried sharpening the image first and then performing OCR, but
> > the result is still wrong.
>
> If you post the image you sent to Tesseract, after you've done all 
> the preprocessing, we can look and see if there's some obvious 
> reason for any recognition errors.


Here is the image I preprocessed using Scan Tailor. I should mention that
for this image Scan Tailor did not detect the skew correctly, so I
deskewed it manually. The input to Tesseract is the binary image produced
after content selection and deskewing.

To check which stage caused the error, I generated the box file for this
binary image again, and I have uploaded the image with the boxes
superimposed.

The boxes for the region of interest are not correct.

<https://lh4.googleusercontent.com/-HKtSB7xEg1k/UwQW8UMKSLI/AAAAAAAAAFU/Bzsg2FDbhnk/s1600/box_superimposed.png>

<https://lh4.googleusercontent.com/-FGvO-_LHd5c/UwQWtYdGtLI/AAAAAAAAAFM/sMWxMYpMK3Q/s1600/IMG_20140215_152033_tailored.tif>
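For anyone who wants to reproduce this kind of overlay, here is a minimal sketch. It assumes the usual Tesseract box-file layout (one glyph per line: character, left, bottom, right, top, and optionally a page number, with the origin at the bottom-left corner of the image) and converts each line to top-left image coordinates so the rectangles can be drawn with any imaging library. The names `Box` and `parse_box_file` are my own, not part of Tesseract.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    char: str
    x0: int  # left edge (image coordinates, origin at top-left)
    y0: int  # top edge
    x1: int  # right edge
    y1: int  # bottom edge
    page: int


def parse_box_file(text: str, image_height: int) -> List[Box]:
    """Parse box-file text, flipping the bottom-left origin to top-left.

    Assumed line layout: '<char> <left> <bottom> <right> <top> [<page>]'.
    """
    boxes: List[Box] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the character field itself is never split.
        parts = line.rsplit(" ", 5)
        if len(parts) == 6:
            char, left, bottom, right, top, page = parts
        elif len(parts) == 5:  # lines without a page column
            char, left, bottom, right, top = parts
            page = "0"
        else:
            continue  # malformed line
        boxes.append(Box(
            char=char,
            x0=int(left),
            y0=image_height - int(top),     # top edge measured from the top
            x1=int(right),
            y1=image_height - int(bottom),  # bottom edge measured from the top
            page=int(page),
        ))
    return boxes
```

With Pillow, for example, each `(b.x0, b.y0, b.x1, b.y1)` tuple can then be passed to `ImageDraw.Draw(img).rectangle(...)` to outline the glyph on the binary image.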
 
 

>
>
> > By the way, do you think it will make the recognition process slower if I
> > enable Chinese recognition? As you know, the character recognition process
> > is a template matching process. Given an unknown character, more templates
> > means more candidates to match, which takes longer.
>
> Yes, it will almost certainly make the process slower. That's a big 
> disadvantage to that approach. 
>
> > This is what I was thinking of, too. It's just that I have not figured out
> > how to quickly select candidate patches.
>
> That isn't an area I know much about, either. I'm sure it can be 
> done, though... 
>
> Nick 
>
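On the speed question quoted above, here is a toy sketch of why more templates cost more time. It assumes the simplest possible brute-force matcher: every unknown glyph is compared against every stored template, so the work grows linearly with the number of templates. The bitmap representation, the Hamming distance, and the names `hamming` and `classify` are illustrative assumptions; Tesseract's actual classifier is considerably more elaborate than this.

```python
from typing import Dict, List, Optional, Tuple

Glyph = Tuple[int, ...]  # hypothetical: a flattened binary bitmap, one 0/1 per pixel


def hamming(a: Glyph, b: Glyph) -> int:
    """Number of pixels that differ between two equally sized bitmaps."""
    return sum(p != q for p, q in zip(a, b))


def classify(unknown: Glyph, templates: Dict[str, List[Glyph]]) -> Tuple[str, int]:
    """Brute-force nearest-template match.

    Returns the best label and how many template comparisons were needed.
    """
    best_label = ""
    best_dist: Optional[int] = None
    comparisons = 0
    for label, samples in templates.items():
        for template in samples:
            comparisons += 1
            dist = hamming(unknown, template)
            if best_dist is None or dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, comparisons
```

For example, `classify((1, 0, 1, 1), {"a": [(1, 0, 1, 0)], "b": [(0, 1, 0, 1)]})` returns `("a", 2)`. Under this model, going from a few dozen Latin classes to several thousand Chinese characters multiplies the comparisons per glyph accordingly, which is the slowdown discussed above.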

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
