On 27 August 2010 14:06, Albert Zeyer <[email protected]> wrote:
> On 27.08.10 11:53, Jimmy O'Regan wrote:
> > On 26 August 2010 16:27, albert <[email protected]> wrote:
>
[I don't know what e-mail client you're using, but it's completely useless at quoting text]

> Ah, but I am asking for more than just be able to scan math symbols. I want
> to have support to scan full formulas which can be quite complex. A
> combination of \frac, \int, \sum, etc.

I realise that. If I had thought you wanted to recognise individual symbols, I would have told you to retrain for those characters. As it is, I pointed you to the enhancement request, which, as you seem not to have read it, has some - admittedly, not much - extra information on the topic.

> It must not only detect the symbols,
> it must also see how they belong together (for example the numerator and the
> denominator in a fraction).
>
> Is it possible to extend Tesseract to be able to do this or is some heavy
> redesign of the whole engine needed (and some fundamental other techniques) to
> do this?

The only system currently available for maths recognition - the link is in the enhancement request - implements its maths recognition as a separate engine. I don't think that's strictly necessary, but maths would need to be processed in an entirely different way from text, and a formula detection mechanism would be required to route formulas to that processing.

At the very least, the formula would need to be segmented into a grid, because relative position and size are much more significant than in text - not just in detecting superscripts/subscripts, but also in determining whether pi means pi or product, etc.

-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
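
To illustrate the point about relative position and size, here is a rough Python sketch of how two symbol bounding boxes might be compared. Nothing here comes from Tesseract: the `Box` class, the `relation` function and both thresholds are hypothetical, picked only for illustration, and a real system would need far more than this (it does not attempt the pi versus product case, for instance).

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in image coordinates (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def height(self) -> float:
        return self.bottom - self.top

    @property
    def center_y(self) -> float:
        return (self.top + self.bottom) / 2


def relation(base: Box, other: Box,
             size_ratio: float = 0.8, shift: float = 0.25) -> str:
    """Guess how `other` relates to `base` from size and vertical offset.

    Both thresholds are illustrative guesses, not tuned values.
    """
    # A script symbol is assumed to be noticeably smaller than its base.
    smaller = other.height < size_ratio * base.height
    # Positive offset means `other` sits lower on the page than `base`.
    offset = (other.center_y - base.center_y) / base.height
    if smaller and offset < -shift:
        return "superscript"
    if smaller and offset > shift:
        return "subscript"
    return "inline"


base = Box(0, 0, 10, 20)
print(relation(base, Box(11, -2, 16, 8)))   # small and raised
print(relation(base, Box(11, 12, 16, 22)))  # small and lowered
print(relation(base, Box(11, 0, 21, 20)))   # same size, same line
```

In ordinary text the same two boxes could simply be read left to right; in a formula the answer changes what the expression means, which is why the layout step can't be skipped.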

