Hello Robert,

> I was thinking of an earlier post of yours where you were asking if
> your source code would need to be re-distributed if you used
> Tesseract. I thought that a program based on scripts would probably
> be more difficult to keep proprietary (i.e. not fully open source)
> than a program based on compiled code, so maybe that's why you
> didn't want to work with OCRopus.

Not at all. I've read that the core is plain C++, which would be okay. My customer does not wish to reveal his sources, but my contribution to the Tesseract core is another matter: I could contribute that code while my customer's product stays closed.
> I suspect that OCR is not a simple problem that can be solved with a
> clean design. Tesseract is probably filled with small kludges and
> workarounds to improve performance. To throw out the code and begin
> again based on Tesseract's general design probably means hitting and
> working around all the same small problems they already dealt with.

Hit me with those problems.

> Also, there may be problems in Tesseract's general design that would
> be better to avoid in a new project. For example, italics never seem
> to be recognized correctly, and someone on this list pointed out a while
> ago that the problem is that the bounding boxes for the italic characters
> overlap, and this is not handled properly by Tesseract. I'm sure there
> are other fundamental problems.

How complicated would it be to add a shear (slant) factor to the bounding boxes?

> For these reasons, I personally think it would be a mistake to start a
> new project by reverse engineering Tesseract. I do think that tweaking
> the existing code to fix memory leaks and such (maybe introducing
> doxygen comments to improve documentation) would be a good thing.

Well, I'm not very good at math (complicated math, anyway), so this part is not really for me.

> Hmm, the Tesseract page shows that two of the people with commit privileges
> work on OCRopus now. Maybe helping with OCRopus would be a roundabout way of
> getting small fixes pushed upstream to Tesseract. At the least,
> they are probably in better contact with Ray Smith than anyone here.

Okay, thank you for this information. I'll see if I can do something with them...

But then I must ask: what are people doing on this list? Are some of you coding on Tesseract a bit, or not at all?

Thank you for answering me.

Pierre.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

