On 27 August 2010 14:06, Albert Zeyer <[email protected]> wrote:
> On 27.08.10 11:53, Jimmy O'Regan wrote:
>
> On 26 August 2010 16:27, albert <[email protected]> wrote:
>

[I don't know what e-mail client you're using, but it's completely
useless at quoting text]

>
> Ah, but I am asking for more than just being able to scan math symbols. I
> want support for scanning full formulas, which can be quite complex: a
> combination of \frac, \int, \sum, etc.

I realise that. If I had thought you wanted to recognise individual
symbols, I would have told you to retrain for those characters.

As it is, I pointed you to the enhancement request, which you seem not
to have read; it has some - admittedly, not much - extra information
on the topic.

> It must not only detect the symbols,
> it must also see how they belong together (for example the numerator and the
> denominator in a fraction).
>
> Is it possible to extend Tesseract to be able to do this, or is some heavy
> redesign of the whole engine needed (and some fundamentally different
> techniques) to do this?
>

The only maths recognition system currently available - the link is
in the enhancement request - implements its maths recognition as a
separate engine. I don't think that's strictly necessary, but maths
would need to be processed in an entirely different way from text, and
a formula detection mechanism would be required to route formulas to
that processing. At the very least, the formula would need to be
segmented into a grid, because relative position and size are much more
significant than in text - not just in detecting
superscripts/subscripts, but also in determining whether pi means pi or
product, etc.
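
To make the point about relative position and size a bit more
concrete, here is a rough sketch in Python. It is not Tesseract code
and not how any existing engine does it: the Box class, the relation()
function and both thresholds are made up for illustration, and it
assumes you already have a bounding box for each glyph.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of one glyph; y grows downward, as in image coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def height(self) -> float:
        return self.bottom - self.top

    @property
    def centre_y(self) -> float:
        return (self.top + self.bottom) / 2.0


def relation(base: Box, other: Box,
             size_ratio: float = 0.8, shift: float = 0.25) -> str:
    """Guess how `other` relates to the `base` glyph on its left.

    Both thresholds are illustrative guesses, not tuned values.
    """
    # A script glyph is normally noticeably smaller than its base.
    smaller = other.height < size_ratio * base.height
    # Vertical offset of the centres, as a fraction of the base height.
    # Negative means `other` sits higher on the page than `base`.
    offset = (other.centre_y - base.centre_y) / base.height
    if smaller and offset < -shift:
        return "superscript"
    if smaller and offset > shift:
        return "subscript"
    return "inline"


x = Box(left=0, top=10, right=10, bottom=30)
print(relation(x, Box(left=11, top=5, right=17, bottom=15)))
print(relation(x, Box(left=11, top=25, right=17, bottom=35)))
print(relation(x, Box(left=11, top=10, right=21, bottom=30)))
```

A real formula recogniser would need far more than this (fractions,
limits on sums and integrals, nested scripts), but the same kind of
comparison against neighbouring glyphs is presumably what would let
you tell a product sign from an ordinary pi.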

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
