On 27.08.10 11:53, Jimmy O'Regan wrote:
On 26 August 2010 16:27, albert<[email protected]>  wrote:
Hi,

I need an open source OCR library that can recognize complex printed math
formulas (for example, formulas generated with LaTeX). I would like to
get LaTeX-like output (or just some AST-like data).

Can Tesseract do this? Does something like this already exist? Or are
current OCR techniques only able to parse line-oriented text?

Tesseract does not do that. There's an open enhancement request that
might have more information:
http://code.google.com/p/tesseract-ocr/issues/detail?id=270

Ah, but I am asking for more than just being able to recognize math symbols. I want support for recognizing full formulas, which can be quite complex: a combination of \frac, \int, \sum, etc. The engine must not only detect the symbols, it must also work out how they belong together (for example, which symbols form the numerator and which the denominator of a fraction).
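To make concrete what I mean by "how they belong together", here is a minimal Python sketch. It is not Tesseract code and uses no Tesseract API. It assumes some recognizer has already produced one bounding box per symbol, and the names `Box`, `to_latex` and the label `frac_bar` are hypothetical, made up for this illustration. It only handles fractions, by grouping the symbols above and below a fraction bar:

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One recognized symbol (hypothetical recognizer output)."""
    label: str   # symbol text, or "frac_bar" for a fraction bar
    x0: float    # left edge
    y0: float    # top edge (image coordinates, y grows downward)
    x1: float    # right edge
    y1: float    # bottom edge

    @property
    def cx(self):
        return (self.x0 + self.x1) / 2

    @property
    def cy(self):
        return (self.y0 + self.y1) / 2


def to_latex(boxes):
    """Build a LaTeX-like string from symbol boxes (fractions only)."""
    bars = [b for b in boxes if b.label == "frac_bar"]
    if not bars:
        # No fraction here: read the symbols left to right.
        return " ".join(b.label for b in sorted(boxes, key=lambda b: b.x0))

    # Assumption: the widest bar belongs to the outermost fraction.
    bar = max(bars, key=lambda b: b.x1 - b.x0)
    others = [b for b in boxes if b is not bar]

    # Symbols whose center lies within the bar's horizontal extent
    # belong to the fraction: above the bar's center is the numerator,
    # everything else is the denominator.
    inside = [b for b in others if bar.x0 <= b.cx <= bar.x1]
    num = [b for b in inside if b.cy < bar.cy]
    den = [b for b in inside if b.cy >= bar.cy]
    left = [b for b in others if b.cx < bar.x0]
    right = [b for b in others if b.cx > bar.x1]

    parts = []
    if left:
        parts.append(to_latex(left))
    # Recurse so that nested fractions are handled as well.
    parts.append(r"\frac{%s}{%s}" % (to_latex(num), to_latex(den)))
    if right:
        parts.append(to_latex(right))
    return " ".join(parts)


# Hand-made boxes for "x = (a + b) / c" typeset as a fraction.
example = [
    Box("x", 0, 10, 8, 20),
    Box("=", 10, 12, 18, 18),
    Box("frac_bar", 20, 14, 50, 16),
    Box("a", 22, 2, 28, 12),
    Box("+", 31, 4, 39, 10),
    Box("b", 42, 2, 48, 12),
    Box("c", 32, 18, 38, 28),
]
print(to_latex(example))
```

Even this toy case needs two-dimensional layout analysis rather than line-by-line reading, and real formulas add superscripts, subscripts, integral limits, roots and so on, which this sketch ignores entirely.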

Is it possible to extend Tesseract to do this, or would it require a heavy redesign of the whole engine (and fundamentally different techniques)?
