[tesseract-ocr] Re: 6od instead of God

2014-11-11 Thread misonistic
I was in PSM_SINGLE_LINE mode indeed, because my text is already segmented into lines, and changing to PSM_AUTO does help with the I-I issue, but I have to say that the overall quality is still better with PSM_SINGLE_LINE. With PSM_AUTO I start getting all kinds of punctuation and other errors.
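For anyone wanting to compare the two modes on the same image, here is a minimal sketch that only builds the command lines. It assumes the stock `tesseract` command-line tool, where page segmentation mode 7 corresponds to PSM_SINGLE_LINE and 3 to PSM_AUTO, and where 3.x releases spell the option `-psm` (later releases use `--psm`). The helper name and the file names are made up for illustration.

```python
# Page segmentation mode numbers as used by the tesseract CLI.
PSM_AUTO = 3         # fully automatic page segmentation, no OSD
PSM_SINGLE_LINE = 7  # treat the image as a single text line


def build_tesseract_command(image_path, output_base, psm, lang="eng", legacy_flag=True):
    """Return the argv list for one tesseract run (hypothetical helper).

    legacy_flag=True uses the 3.x spelling "-psm"; set it to False for
    releases that expect "--psm".
    """
    psm_flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image_path, output_base, "-l", lang, psm_flag, str(psm)]


if __name__ == "__main__":
    # "line.tif" and the output bases are placeholder names.
    for name, mode in (("single_line", PSM_SINGLE_LINE), ("auto", PSM_AUTO)):
        print(" ".join(build_tesseract_command("line.tif", "out_" + name, mode)))
```

Each list can be passed to `subprocess.run` as is, so the two outputs can be diffed line by line.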

Re: [tesseract-ocr] Re: 6od instead of God

2014-11-11 Thread ShreeDevi Kumar
Please attach a copy of the image so that I can try.

[tesseract-ocr] Re: 6od instead of God

2014-11-11 Thread misonistic
OK, here is a clean example of what I'm talking about. Running vanilla tesseract 3.02.02 on this image (in eng and single line mode) yields 6od's family instead of God's family. Adding the 6 - G rule to unicharambigs made no difference for me.

Re: [tesseract-ocr] Re: 6od instead of God

2014-11-11 Thread ShreeDevi Kumar
You need to pre-process the image so that G shows up correctly. In the attached image G looks like a 6 as it is connected. If that is the shape of G in the font and you need to OCR it, you may either need to retrain or post-process the text. You could also try with a newer version of the program.
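As a rough illustration of the post-processing route, the sketch below rewrites a leading "6" as "G" only when the corrected word appears in a word list. It is one possible approach, not something proposed in the thread: the function name and the tiny vocabulary are hypothetical, and a real run would need a proper dictionary.

```python
import re

# Hypothetical vocabulary; in practice this would be a real word list.
KNOWN_WORDS = {"god", "good", "great"}


def fix_leading_six(text, vocabulary=KNOWN_WORDS):
    """Replace a word-initial '6' with 'G' when that yields a known word."""

    def repl(match):
        candidate = "G" + match.group(1)
        # Keep the original token unless the corrected word is in the vocabulary.
        return candidate if candidate.lower() in vocabulary else match.group(0)

    # A '6' at a word boundary followed by one or more lowercase letters.
    return re.sub(r"\b6([a-z]+)", repl, text)


if __name__ == "__main__":
    print(fix_leading_six("6od's family"))
```

The vocabulary check is there so that legitimate tokens such as "6th" are left alone; it will still miss any word that is absent from the list.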

Re: [tesseract-ocr] Re: 6od instead of God

2014-11-11 Thread ShreeDevi Kumar
I checked with vietocr beta4, which uses a newer version of tesseract - it recognizes your tiff correctly.

[tesseract-ocr] Re: 6od instead of God

2014-11-11 Thread misonistic
Yes, I can pre-process each individual image to make it work, but unfortunately I've been unable to come up with a consistent pre-processing method that would work in general. I've been trying for a while now. I've known that retraining is an option from the beginning but I'm concerned that it

[tesseract-ocr] Re: 6od instead of God

2014-11-10 Thread Ryan Dev
What PSM mode are you in? I see the H chopped into |-| when using PSM_SINGLE_LINE especially, and I don't think ever with PSM_AUTO. For my project I was running into the same issue, but I know my glyphs are not ever touching or overlapping, so I simply disabled chopping altogether. But for
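The message does not say how chopping was disabled. One way that may work is a config file setting Tesseract's `chop_enable` variable to 0, passed as the trailing config argument on the command line. The sketch below assumes that variable is the relevant one and that the installed version accepts a config file path in that position; the helper names and file names are hypothetical.

```python
from pathlib import Path


def render_config(settings):
    """Render Tesseract config text: one 'name value' pair per line."""
    return "".join("{} {}\n".format(name, value) for name, value in settings.items())


def write_config(path, settings):
    """Write the rendered settings to a config file and return its path as str."""
    Path(path).write_text(render_config(settings))
    return str(path)


def command_with_config(image_path, output_base, config_path, psm=7, lang="eng"):
    """Build a 3.x-style command line with the config file as the last argument."""
    return ["tesseract", image_path, output_base, "-l", lang, "-psm", str(psm), config_path]


if __name__ == "__main__":
    # Assumption: chop_enable 0 turns character chopping off.
    cfg = write_config("nochop.cfg", {"chop_enable": "0"})
    print(" ".join(command_with_config("line.tif", "out", cfg)))
```

This only makes sense when glyphs never touch, as in the case described above; with connected characters, turning chopping off is likely to hurt recognition.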