As a developer, I am cautious about estimating how long a code change will take. I am thrilled to have the code and look forward to enhancements as they are ported to .NET environments. For now I am cleaning up the image in preprocessing steps to remove blobs that are inconsistent with the others. This is not a problem in my use case, and it gets around this Tesseract issue just fine.
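For anyone wanting to try the same workaround, here is a rough sketch of that kind of preprocessing step. It is not the code I actually run (mine lives in a .NET project); it is a minimal pure-Python illustration that assumes the image is already binarized into a list of rows of 0/1 values, that a "blob" is a 4-connected group of ink pixels, and that "inconsistent" means a blob height far from the median blob height. The function names and the `tolerance` parameter are made up for this example and are not part of Tesseract.

```python
from collections import deque
from statistics import median


def find_blobs(image):
    """Return a list of blobs; each blob is a list of (row, col) ink pixels.

    `image` is a list of rows holding 0 (background) or 1 (ink).
    Pixels are grouped by 4-connectivity using a breadth-first search.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(pixels)
    return blobs


def blob_height(pixels):
    """Height of a blob's bounding box, in pixels."""
    ys = [y for y, _ in pixels]
    return max(ys) - min(ys) + 1


def remove_inconsistent_blobs(image, tolerance=0.5):
    """Return a copy of `image` with atypically sized blobs erased.

    A blob is erased when its height differs from the median blob height
    by more than `tolerance` times that median (a hypothetical threshold;
    tune it for your own material).
    """
    cleaned = [row[:] for row in image]
    blobs = find_blobs(image)
    if not blobs:
        return cleaned
    typical = median(blob_height(b) for b in blobs)
    for pixels in blobs:
        if abs(blob_height(pixels) - typical) > tolerance * typical:
            for y, x in pixels:
                cleaned[y][x] = 0
    return cleaned
```

Note that this throws away the odd-sized blobs entirely, which is only acceptable when, as in my case, you do not need that text. If you need both the large and the small text, you would have to crop the regions and feed them to Tesseract separately instead.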
Thanks to the group for clarifying what the issue was. It helped me solve my problem.

On Jul 19, 2010 1:01 PM, "patrickq" <[email protected]> wrote:

Wrong ... option 2 won't really work unless you want to cut out individual words. This image, where everything is on one line, still fails with the same insane forcing of the letters in "John" to be interpreted as tall letters: http://www.scanbizcards.com/johndoeoneline.jpg

I think option 2 should be for all of us together now to beg Jimmy to spend the 3-4 hours required to just tell Tesseract to quit this persistent folly of pretending that all blocks are of the same height. This issue is arguably the most damaging Tesseract flaw for mixed text material (which is almost everything except books).

On Jul 19, 1:34 pm, "Austin Henderson" <[email protected]> wrote:
> Ok so safe to say for now my options are..
>
> 1- Live with it
> 2- Figure out how to get the line...
> On 19 July 2010 15:34, Austin Henderson <[email protected]> wrote:
> > Thank you for your...
> > I just wanted to make sure I didn't miss an optional setting that would
> > allow it to differentiate better between these blocks.
> >
> > Nah. Most of the open source OCR guis...
> >
> > I suppose I don't understand why the space before/after the word is not
> > "enough" for it to see those as different objects?
> >
> > Do you think tosp_table_xht_sp_ratio coul...
> >
> > "[email protected]":http://www.scanbizcards.com/johndoe.jpg
> >
> > Just because the email address uses a smaller font, Tesseract 3.0
> > stubbornly insists on inte...

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

