As a developer I am cautious about estimating how long a code change
will take. I am thrilled to have the code and look forward to enhancements
as they are ported to .NET environments. For now I am cleaning up the image
in preprocessing steps to remove blobs that are inconsistent with the others.
Losing those blobs is not a problem in my use case, and it gets around this
Tesseract issue just fine.
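
For anyone wanting to try the same workaround, here is a rough sketch of the
filtering step, not my actual code. It assumes "inconsistent" means a blob
whose height is far from the median height, and that some earlier
connected-component pass has already produced bounding boxes. The Blob type,
the split_by_height name and the 0.4 tolerance are all made up for
illustration.

```python
from statistics import median
from typing import List, NamedTuple, Tuple


class Blob(NamedTuple):
    """Bounding box of one connected component (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def split_by_height(
    blobs: List[Blob], tolerance: float = 0.4
) -> Tuple[List[Blob], List[Blob]]:
    """Split blobs into (kept, dropped) by comparing heights to the median.

    A blob is kept when its height is within `tolerance` (a fraction of
    the median height) of the median. The 0.4 default is an arbitrary
    starting point, not a tuned value.
    """
    if not blobs:
        return [], []
    typical = median(b.height for b in blobs)
    kept: List[Blob] = []
    dropped: List[Blob] = []
    for blob in blobs:
        if abs(blob.height - typical) <= tolerance * typical:
            kept.append(blob)
        else:
            dropped.append(blob)
    return kept, dropped


if __name__ == "__main__":
    sample = [
        Blob(0, 0, 10, 20),
        Blob(12, 0, 10, 22),
        Blob(24, 0, 10, 21),
        Blob(36, 6, 5, 8),   # much shorter than its neighbours
        Blob(44, 0, 10, 19),
    ]
    kept, dropped = split_by_height(sample)
    print("kept:", kept)
    print("dropped:", dropped)
```

The dropped boxes would then be painted over with the background colour
before the image is handed to Tesseract. This only makes sense when the
odd-sized text is something you can afford to lose.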

Thanks to the group for clarifying what the issue was. It helped me solve my
problem.

On Jul 19, 2010 1:01 PM, "patrickq" <[email protected]> wrote:

Wrong ... option 2 won't really work unless you want to cut out
individual words. This image, where everything is on one line, still
fails with the same insane forcing of the letters in "John" to be
interpreted as tall letters:
http://www.scanbizcards.com/johndoeoneline.jpg

I think option 2 should be for all of us together now to beg Jimmy to
spend the 3-4 hours required to just tell Tesseract to quit this
persistent folly of pretending that all blocks are of the same
height. This issue is arguably the most damaging Tesseract flaw
for mixed text material (which is almost everything except books).

On Jul 19, 1:34 pm, "Austin Henderson" <[email protected]>
wrote:

> Ok so safe to say for now my options are..
>
> 1- Live with it
> 2- Figure out how to get the line...

> On 19 July 2010 15:34, Austin Henderson <[email protected]> wrote:
> > Thank you for your...
> > I just wanted to make sure I didn't miss an optional setting that would
> > allow it to differentiate better between these blocks.
>
> Nah. Most of the open source OCR guis...
> > I suppose I don't understand why the space before/after the word is not
> > "enough" for it to see those as different objects?
> > Do you think tosp_table_xht_sp_ratio coul...
> > "[email protected]":http://www.scanbizcards.com/johndoe.jpg

> > Just because the email address uses a smaller font, Tesseract 3.0
> > stubbornly insists on inte...
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en.
