OCRopus, which can work with Tesseract, has some capability for layout
analysis (breaking pages down into segments), including the ability to
discern text lines. Version 0.5 is due to go "beta" this month, but you
can check out version 0.4 right now. There is also a GUI in
development, and OCRopus and Tesseract are due to be integrated more
closely in the near future.
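
In case it helps, the "figure out where the rows go and space them out"
idea from the quoted message can be sketched with a horizontal projection
profile. This is only a rough sketch, not what OCRopus or Tesseract do
internally. It assumes the page is already binarized into a list of rows
of 0 (background) and 1 (ink); the names find_row_bands, space_out_rows,
gap_ratio and padding are made up for illustration. Scanlines with very
little ink (for example, only the stroke where a "g" touches an "l") are
treated as the gap between rows and dropped, which can clip descenders.

```python
def find_row_bands(image, gap_ratio=0.2):
    """Return (top, bottom) pairs, bottom exclusive, one per text row.

    image: list of equal-length rows, each a list of 0 (background) / 1 (ink).
    gap_ratio: hypothetical tuning knob; a scanline whose ink count is at or
    below gap_ratio * (largest ink count) is treated as an inter-row gap.
    """
    # Horizontal projection profile: amount of ink on each scanline.
    profile = [sum(row) for row in image]
    peak = max(profile, default=0)
    if peak == 0:
        return []  # blank page, no rows
    threshold = gap_ratio * peak

    bands = []
    start = None
    for y, ink in enumerate(profile):
        if ink > threshold and start is None:
            start = y                      # entering a text row
        elif ink <= threshold and start is not None:
            bands.append((start, y))       # leaving a text row
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def space_out_rows(image, bands, padding=2):
    """Rebuild the image with `padding` blank scanlines around each band.

    Scanlines that fall between bands (the thin "touching" strokes) are
    discarded, which is what separates connected letters.
    """
    width = len(image[0])
    out = []
    for top, bottom in bands:
        out.extend([0] * width for _ in range(padding))
        out.extend(list(row) for row in image[top:bottom])
    out.extend([0] * width for _ in range(padding))
    return out
```

The spaced-out image could then be saved and passed to Tesseract as
usual. The threshold would need tuning per font and scan resolution, and
a skewed page would have to be deskewed first for the profile to show
clear gaps.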
Regards...

On Aug 12, 6:12 am, Alcareru <[email protected]> wrote:
> Hi
>
> If the rows on an image are close to each other and some letters are
> connected (for example, a j connects to an l on the row below it),
> Tesseract fails to process the image correctly. At best it ignores the
> letter below entirely or pulls it onto the row above. So if I have text
> like this (l connects to g):
> Aug
>  Helsinki
> it is read as:
> Aulg
>  He sinki
> or
> Aug
>  He sinki
>
> Is there anything one can do to avoid that? (I'm not too keen on
> trying to implement an algorithm that figures out where the rows go
> and spaces them out a bit, which is the only thing I can come up
> with.) Are future releases of Tesseract possibly addressing this
> issue?
