I was in PSM_SINGLE_LINE mode indeed, because my text is already segmented
into lines, and changing to PSM_AUTO does help with the I-I issue, but I
have to say that the overall quality is still better with PSM_SINGLE_LINE.
With PSM_AUTO I start getting all kinds of punctuation and other errors.
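For anyone wanting to reproduce the comparison above, here is a minimal sketch that builds the two command lines. It only builds them; it does not run Tesseract. The numeric values are those of Tesseract's PageSegMode enum. The helper name `tesseract_cmd`, the file names, and the `legacy_flag` switch are made up for illustration. The sketch assumes the 3.x command line spells the option `-psm` and later releases spell it `--psm`.

```python
# Page segmentation modes discussed in this thread, using the numeric
# values of Tesseract's PageSegMode enum.
PSM_AUTO = 3
PSM_SINGLE_LINE = 7


def tesseract_cmd(image, outbase, psm, lang="eng", legacy_flag=True):
    """Build (but do not run) a tesseract command line as an argument list.

    legacy_flag=True uses the 3.x spelling "-psm"; set it to False for the
    "--psm" spelling used by later releases. The helper is hypothetical.
    """
    psm_flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image, outbase, "-l", lang, psm_flag, str(psm)]
```

Passing the returned list to `subprocess.run` once per mode and diffing the two output files should show whether the punctuation errors track the segmentation mode.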
Please attach a copy of the image so that I can try.
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Tue, Nov 11, 2014 at 9:43 PM, misonis...@gmail.com wrote:
I was in PSM_SINGLE_LINE mode indeed, because my text is
OK, here is a clean example of what I'm talking about. Running vanilla
tesseract 3.02.02 on this image (in eng and single line mode) yields "6od's
family" instead of "God's family". Adding the 6 - G rule to unicharambigs
made no difference for me.
You need to pre-process the image so that G shows up correctly. In the
attached image G looks like a 6 as it is connected.
If that is the shape of G in the font and you need to OCR it, you may
either need to retrain or post-process the text.
You could also try a newer version of the program.
I checked with VietOCR beta4, which uses a newer version of Tesseract; it
recognizes your TIFF correctly.
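As a rough illustration of the post-processing route mentioned above, the sketch below rewrites a word-initial 6 as G when lowercase letters follow it. This is only a heuristic written for this one confusion, not anything built into Tesseract. The function name `fix_leading_six` and the pattern are made up for illustration.

```python
import re

# A "6" that starts a token and is followed by at least two lowercase
# letters is assumed to be a misread "G". The ordinal "6th" is left alone.
_LEADING_SIX = re.compile(r"(?<![0-9A-Za-z])6(?!th\b)(?=[a-z]{2,})")


def fix_leading_six(text):
    """Replace a word-initial 6 with G, e.g. in the "6od's" case."""
    return _LEADING_SIX.sub("G", text)
```

The rule is deliberately crude: a token such as "6ft" would also be rewritten, so checking the corrected word against a dictionary before accepting it would be safer.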
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Wed, Nov 12, 2014 at 8:12 AM, ShreeDevi Kumar
Yes, I can pre-process each individual image to make it work, but
unfortunately I've been unable to come up with a consistent pre-processing
method that would work in general. I've been trying for a while now.
I've known that retraining is an option from the beginning but I'm
concerned that it
What PSM mode are you in? I see the H chopped into |-| when
using PSM_SINGLE_LINE especially, and I don't think ever with PSM_AUTO.
For my project I was running into the same issue, but I know my glyphs
never touch or overlap, so I simply disabled chopping altogether. But for
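One way to disable chopping might look like the sketch below, which writes a small config file and appends it to the command line. The message does not name the variable, so `chop_enable` is an assumption taken from Tesseract's parameter list. The helper names and file names are hypothetical. The sketch also assumes that config files hold one "name value" pair per line and are passed as trailing arguments.

```python
from pathlib import Path


def config_text(params):
    """Render {"name": "value"} pairs as config file lines, one per line."""
    return "".join(f"{name} {value}\n" for name, value in params.items())


def write_no_chop_config(path):
    """Write a config file that switches the chopper off.

    Assumption: the relevant parameter is called chop_enable.
    """
    Path(path).write_text(config_text({"chop_enable": "0"}))
    return str(path)


def cmd_with_config(image, outbase, config_path, lang="eng"):
    """Build (but do not run) a command that loads the config file."""
    return ["tesseract", image, outbase, "-l", lang, config_path]
```

This only makes sense when glyphs are known not to touch, as in the case described above; with connected glyphs, turning the chopper off would presumably make things worse.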