On 27.08.10 11:53, Jimmy O'Regan wrote:
On 26 August 2010 16:27, albert<[email protected]> wrote:
Hi,
I need an open OCR library which is able to scan complex printed math
formulas (for example some formulas which were generated via LaTeX). I
want to get some LaTeX-like output (or just some AST-like data).
Can Tesseract do this? Is there something like this already? Or are
current OCR techniques only able to parse line-oriented text?
Tesseract does not do that. There's an open enhancement request that
might have more information:
http://code.google.com/p/tesseract-ocr/issues/detail?id=270
Ah, but I am asking for more than just being able to scan math symbols. I
want support for scanning full formulas, which can be quite complex: a
combination of \frac, \int, \sum, etc. It must not only detect the
symbols, it must also see how they belong together (for example the
numerator and the denominator in a fraction).
Is it possible to extend Tesseract to do this, or would it need a
heavy redesign of the whole engine (and some fundamentally different
techniques)?
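
To make the request concrete, here is a minimal sketch in Python of the kind
of AST-like output meant above, together with a conversion to LaTeX. The class
names (Sym, Frac, BigOp, Row) and the to_latex function are hypothetical,
chosen only for illustration; they are not part of Tesseract or of any existing
OCR library, and the recognition step that would build such a tree is not
shown.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Sym:
    """A single recognized symbol or short token (hypothetical node type)."""
    text: str


@dataclass
class Frac:
    """A fraction: the recognizer must decide what is above and below the bar."""
    num: "Node"
    den: "Node"


@dataclass
class BigOp:
    """A large operator such as \\sum or \\int with its limits and body."""
    op: str  # LaTeX command name without the backslash, e.g. "sum" or "int"
    lower: "Node"
    upper: "Node"
    body: "Node"


@dataclass
class Row:
    """A horizontal sequence of sub-expressions."""
    items: List["Node"]


Node = Union[Sym, Frac, BigOp, Row]


def to_latex(node: Node) -> str:
    """Serialize the tree to a LaTeX-like string."""
    if isinstance(node, Sym):
        return node.text
    if isinstance(node, Frac):
        return r"\frac{%s}{%s}" % (to_latex(node.num), to_latex(node.den))
    if isinstance(node, BigOp):
        return r"\%s_{%s}^{%s} %s" % (
            node.op,
            to_latex(node.lower),
            to_latex(node.upper),
            to_latex(node.body),
        )
    if isinstance(node, Row):
        return " ".join(to_latex(item) for item in node.items)
    raise TypeError("unknown node type: %r" % (node,))


# Example tree: a sum over i from 1 to n of the fraction 1 / i^2.
formula = BigOp("sum", Sym("i=1"), Sym("n"), Frac(Sym("1"), Sym("i^2")))
print(to_latex(formula))
```

The point of the sketch is that the symbols alone are not enough: the tree
records which symbols form the numerator, the denominator, and the limits.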