I don't know if it's affordable for you, but IMHO decent results can only be achieved if you do the segmentation yourself and then pass image fragments to Tesseract word by word. Problems may appear when words are too short; however, as far as I can see, that is not your case.
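To illustrate what "doing segmentation yourself" can look like, here is a minimal sketch that splits one already-binarized text line into word fragments using a vertical projection profile. This is only an assumption about one possible approach, not the method used in my project: the names `split_words` and `crop` and the `min_gap` threshold are hypothetical, and the OCR call itself is left out. Each cropped fragment would then be handed to Tesseract, for example with the page segmentation mode set to PSM_SINGLE_WORD.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # rows of pixels: 1 = ink, 0 = background


def split_words(line: Bitmap, min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (end exclusive) of the words in one text line.

    A run of at least `min_gap` blank columns is treated as a word break;
    shorter blank runs are assumed to be gaps between characters.
    """
    if not line:
        return []
    width = len(line[0])
    # Vertical projection: a column counts as inked if any row has ink in it.
    inked = [any(row[x] for row in line) for x in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None  # first column of the word being collected
    gap = 0       # length of the current run of blank columns
    for x, has_ink in enumerate(inked):
        if has_ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The word ended just before this blank run began.
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the last word, trimming any trailing blank columns.
        spans.append((start, width - gap))
    return spans


def crop(line: Bitmap, start: int, end: int) -> Bitmap:
    """Cut columns [start, end) out of the line; this fragment goes to the OCR engine."""
    return [row[start:end] for row in line]
```

The right value for `min_gap` depends on font size and resolution, so in practice it would have to be estimated per line (for instance from the distribution of gap widths) rather than fixed.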
A long time ago I started my project relying on Tess's segmentation and struggled a lot with it until I came to a word-by-word approach. Eventually I even switched to character-wise recognition, which at last produces decent results. This transition was mostly caused by the specifics of the input images I'm working with (photos, usually of low quality), but I think it is almost required even for ideally scanned images. There are some fruitful mathematical ideas behind Tess's segmentation, but I think the current implementation is not mature enough to be used extensively in production.

Warm regards,
Dmitry Silaev

On Thu, Feb 24, 2011 at 1:05 PM, Jose <[email protected]> wrote:
> Hi, (as you know, Saurabh, because we talked in private the other day) I tried
> PSM_SINGLE_COLUMN and the accuracy drops dramatically... I can't afford
> to lose that accuracy. Is it possible to change the way the output is
> displayed? Looking at the code, it seems rather hard to change... perhaps I
> could print the x,y position of each word found and then work out the
> horizontal/vertical layout? What are your thoughts? regards

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

