Hello all,

I've noticed that OCRing an image treated with the textcleaner script yields 
some errors that affect its precision.

First I treat the image with the script to enhance it and make it grayscale.

textcleaner -g -e normalize -s 1 scan.tif scan2.tif

Then, I OCR it with tesseract and get the following error messages:

tesseract scan2.tif output2 -l eng
Tesseract Open Source OCR Engine v3.02 with Leptonica
Page 0
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

Now, the first line of the original text on the image reads:

"than Phone: her line 1 | No feel good"

But I get the following instead in the output2.txt file:

"thaeePleoecbakeorliIZB-TlNofoclgood"
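To put a rough number on how far off this is, here is a minimal Python sketch that compares the two lines above using the standard library's difflib. The names `similarity`, `expected` and `actual` are just mine for illustration, and the ratio it prints is a generic string-similarity score, not a proper OCR character error rate.

```python
from difflib import SequenceMatcher


def similarity(expected: str, actual: str) -> float:
    """Return a 0..1 similarity ratio between two strings (1.0 means identical)."""
    return SequenceMatcher(None, expected, actual).ratio()


# First line as it appears on the image, and as it appears in output2.txt.
expected = "than Phone: her line 1 | No feel good"
actual = "thaeePleoecbakeorliIZB-TlNofoclgood"

score = similarity(expected, actual)
print(f"similarity: {score:.2f}")
```

Running the same comparison on the OCR output of the untreated scan.tif would show whether the textcleaner step helps or hurts.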

It seems to me that some adjustments need to be made.

I have trained the system to use this font, included the dictionary, and 
followed all the steps as per the guide.

However, the output is not good at all and the errors might point to 
something that I am doing wrong.

I wonder if anyone could share some experience with me and give me some 
best-practice hints on how to fix this.

Thanks.


-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en