Re: Tesseract Reading Issue

Jimmy O'Regan Tue, 20 Jul 2010 05:20:27 -0700

On 20 July 2010 02:52, Austin Henderson <[email protected]> wrote:
> As a developer I am cautious to estimate the amount of time a code change
> will take.


:D I like you a lot right now.

> I am thrilled to have the code and look forward to enhancements
> as they are ported to .net environments.

Nobody has mentioned any plans to write a .NET wrapper for Tesseract
3, and the developer of tessnet2 has mentioned that he would rather
pay for someone to reimplement Tesseract than touch it again, so I
wouldn't hold my breath, if I were you.

(On a related note, I spent a little while yesterday looking at some
truly horrifically written spaghetti code[1], so I'm a little less
unsympathetic than before, but I think he's seriously underestimating
the magnitude of such a reimplementation).

[1] Reminded me of this: http://www.ioccc.org/

> For now I am cleaning up the image
> in preprocessing steps to remove blobs that are inconsistent with others -
> this is not a problem in my use case and gets around this Tesseract issue
> just fine.
>
> Thanks to the group for clarifying what the issue was. It helped me solve my
> problem.
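
The preprocessing workaround described above could look roughly like the
sketch below. The thread does not say how the blobs are found or what counts
as "inconsistent", so this assumes bounding boxes (x, y, width, height) are
already available from some connected-component step, and it treats a blob as
inconsistent when its height is far from the median height. The function name
and the tolerance value are hypothetical, not taken from Tesseract or tessnet2.

```python
from statistics import median


def filter_blobs_by_height(boxes, tolerance=0.5):
    """Keep boxes whose height is close to the median height.

    boxes: list of (x, y, width, height) tuples (assumed input format).
    tolerance: allowed deviation from the median, as a fraction of it
               (hypothetical default; tune per use case).
    """
    if not boxes:
        return []
    typical = median(h for (_x, _y, _w, h) in boxes)
    lowest = typical * (1 - tolerance)
    highest = typical * (1 + tolerance)
    # A box survives only if its height falls inside the accepted band.
    return [box for box in boxes if lowest <= box[3] <= highest]


if __name__ == "__main__":
    sample = [(0, 0, 10, 20), (12, 0, 10, 22), (24, 0, 10, 21),
              (40, 5, 6, 8), (50, 0, 30, 60)]
    print(filter_blobs_by_height(sample))
```

The rejected boxes would then be painted over with the background colour
before the image is handed to Tesseract. This only suits cases where the
discarded text is not needed, as in the use case quoted above.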
>
> On Jul 19, 2010 1:01 PM, "patrickq" <[email protected]> wrote:
>
> Wrong ... option 2 won't really work unless you want to cut out
> individual words. This image, where everything is on one line, still
> fails with the same insane forcing of the letters in "John" to be
> interpreted as tall letters:
> http://www.scanbizcards.com/johndoeoneline.jpg
>
> I think option 2 should be for all of us together now to beg Jimmy to
> spend the 3-4 hours required to just tell Tesseract to quit this
> persistent folly of pretending that all blocks are of the same
> height. This issue is arguably the most damaging Tesseract flaw
> for mixed text material (which is almost everything except books).
>
> On Jul 19, 1:34 pm, "Austin Henderson" <[email protected]>
> wrote:
>
>> Ok so safe to say for now my options are..
>>
>> 1- Live with it
>> 2- Figure out how to get the line...
>
>> On 19 July 2010 15:34, Austin Henderson <[email protected]>
>> wrote:
>> > Thank you for your...
>
>> > I just wanted to make sure I didn't miss an optional setting that would
>> > allow it to differentiate better between these blocks.
>>
>> Nah. Most of the open source OCR guis...
>
>> > I suppose I don't understand why the space before/after the word is not
>> > "enough" for it to see those as different objects?
>> > Do you think tosp_table_xht_sp_ratio coul...
>
>> > "[email protected]":http://www.scanbizcards.com/johndoe.jpg
>
>> > Just because the email address uses a smaller font, Tesseract 3.0
>> > stubbornly insists on inte...
>
>> For more options, visit this group
>> at http://groups.google.com/group/tesseract-ocr?hl=en.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>



-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
