On 20 July 2010 02:52, Austin Henderson <[email protected]> wrote:
> As a developer I am cautious to estimate the amount of time a code change
> will take.

:D I like you a lot right now.

> I am thrilled to have the code and look forward to enhancements
> as they are ported to .net environments.

Nobody has mentioned any plans to write a .net wrapper for Tesseract 3, and the developer of tessnet2 has mentioned that he would rather pay for someone to reimplement Tesseract than touch it again, so I wouldn't hold my breath, if I were you.

(On a related note, I spent a little while yesterday looking at some truly horrifically written spaghetti code[1], so I'm a little less unsympathetic than before, but I think he's seriously underestimating the magnitude of such a reimplementation).

[1] Reminded me of this: http://www.ioccc.org/

> For now I am cleaning up the image
> in pre processing steps to remove blobs that are inconsistent with others -
> this is not a problem in my use case and gets around this tesseract issue
> just fine.
>
> Thanks to the group for clarifying what the issue was. It helped me solve my
> problem.
>
> On Jul 19, 2010 1:01 PM, "patrickq" <[email protected]> wrote:
>
> Wrong ... option 2 won't really work unless you want to cut-out
> individual words. This image where everything is on one line still
> fails with the same insane forcing of the letters in "John" to be
> interpreted as tall letters:
> http://www.scanbizcards.com/johndoeoneline.jpg
>
> I think option 2 should be for all of us together now to beg Jimmy to
> spend the 3-4 hours required to just tell Tesseract to quit this
> persistent folly of pretending that all blocks are of the same
> heights. This issue is arguably the most damaging Tesseract flaw
> for mixed text material (which is almost everything except books).
>
> On Jul 19, 1:34 pm, "Austin Henderson" <[email protected]>
> wrote:
>
>> Ok so safe to say for now my options are..
>>
>> 1- Live with it
>> 2- Figure out how to get the line...
>
>> On 19 July 2010 15:34, Austin Henderson <[email protected]>
>> wrote:
>> > Thank you for your...
>
>> > I just wanted to make sure I didn't miss an optional setting that would
>> > allow it to differentiate better between these blocks.
>>
>> Nah. Most of the open source OCR guis...
>
>> > I suppose I don't understand why the space before/after the word is not
>> > "enough" for it to see those as different objects?
>> > Do you think tosp_table_xht_sp_ratio coul...
>
>> > "[email protected]":http://www.scanbizcards.com/johndoe.jpg
>
>> > Just because the email address uses a smaller font, Tesseract 3.0
>> > stubbornly insists on inte...
>
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group...
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>

--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
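
For anyone wanting to try the preprocessing workaround mentioned in the thread (removing blobs that are inconsistent with the others before handing the image to Tesseract), here is a rough sketch of one way it might look. The thread does not say how the blobs were detected or what "inconsistent" meant in that use case, so everything here is an assumption: the blobs are taken to be bounding boxes `(x, y, width, height)` from some earlier connected-component step, and the function name `filter_blobs_by_height` and the `tolerance` parameter are made up for illustration.

```python
from statistics import median


def filter_blobs_by_height(boxes, tolerance=0.5):
    """Keep only boxes whose height is close to the median height.

    boxes: list of (x, y, width, height) tuples (assumed input format).
    tolerance: allowed deviation as a fraction of the median height
               (hypothetical parameter, tune for your own images).
    """
    if not boxes:
        return []
    # The median is less sensitive to a few very large or very small blobs
    # than the mean would be.
    median_height = median(box[3] for box in boxes)
    limit = tolerance * median_height
    return [box for box in boxes if abs(box[3] - median_height) <= limit]


if __name__ == "__main__":
    sample = [
        (0, 0, 10, 20),
        (12, 0, 10, 22),
        (24, 0, 10, 19),
        (40, 0, 3, 4),     # speck, much shorter than the rest
        (50, 0, 30, 60),   # large blob, much taller than the rest
    ]
    for box in filter_blobs_by_height(sample):
        print(box)
```

This only decides which boxes to keep; erasing the rejected regions from the image is left to whatever imaging library is already in use. It also throws away any text that is legitimately a different size, which is fine only if, as in the original poster's case, that text is not needed.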

