Hi Pierre, I'll respond here to the points I did not cover in my other message -- I don't think Ray Smith has stated that he is abandoning the project. His last message was in December; however, the last source code change was last summer, so I'm not sure what's happening. I expect that Tesseract would be of use to Google, since it has the potential to be a high-accuracy engine under their control that could recognize the scripts of many languages -- their goal is to index all the information out there, and much of it is in less common languages whose scripts need special OCR support.
I would say, please email Ray Smith and the two OCRopus developers to suggest a fork that allows community involvement. I suspect there is just a pause in development -- I remember there was a similar one last year, and then some improvements came. If they are willing to open up and allow even one or two other developers, then great. We could make small contributions under your oversight and that of maybe one or two others. Otherwise we could fork and move to SourceForge or something similar. I believe there are 3 or 4 others here who would make moderately large contributions, and a few like me who could help here and there (my main languages are not C++ but Perl, Java, and Python). Once the inertia is overcome I believe several other developers will come along, since this is a very promising project.

Godspeed!
--Sven

On Wed, Apr 7, 2010 at 1:26 PM, MARTIN Pierre <[email protected]> wrote:
>
>> Hi Martin,
> Hello Sven, by the way, my first name is Pierre ;)
>
>> While I agree to some extent with Fuad that there have not been many
>> updates from Ray Smith, it is not safe to assume that he has abandoned
>> the project.
> Is there any news from Ray saying this or the opposite?
>
>> This is something that is very useful to Google
> In what way? Is such a program *really* used by Google for recognition,
> knowing it contains so many memory leaks, for example?
>
>> , and I believe that they are doing their usual take on open source
>> development -- making an internal version before releasing it to the
>> public, then doing more internal development and releasing that.
>> OCRopus is clearly done in a more open fashion, but support for multiple
>> character sets is likely to be provided by Tesseract.
> That's good news, it would mean Tess has something OCRopus doesn't, which
> would make it a priority for allocating resources dedicated to its
> maintenance... But come on... How hard would it be for OCRopus to support
> multiple charsets? Not very, I think.
>
>> There are indications that the code is improving and stabilizing and
>> that new features are developing, even though it is at a scale of a
>> year or two rather than a few months. This is a very old program in
>> software years and it is wise that they are cleaning up the API slowly
>> so as to maintain stability.
> I understand your point here, but then why "close" this aspect of the
> collaboration? I find myself already improving the sources of Tesseract,
> after less than 2 days. I'm no PhD, I'm no engineer, I'm just a really good
> C / C++ developer with 11 years of experience, which is clearly what Tess
> is missing here: clean and modular OO re-modeling. But that's really not a
> one-man job, it's team work, which is totally prevented by this "closed"
> state of Tesseract.
>
> Now, let's say I have the real and hopeful wish of making Tesseract better,
> or at least a fork of Tesseract, allowing it to be Apache 2 licensed too. Who
> would be able to contribute to that ambition? Are there many good
> mathematicians around on this list, or just end-users "waiting" for new
> versions? Who would like to team up with us?
>
> Again, thanks a lot for your time, this project needs more of it. Human time.
> Pierre.

