Re: Using tesseract from a C++ application.

MARTIN Pierre Thu, 08 Apr 2010 16:50:20 -0700

The rest of my answer, which is another subject:

> Otherwise we could fork and move to SourceForge or
> something.
This is secondary. Google Code seems fine, except for the feedback system, which
I find very poor; I prefer Trac. A good feedback system would be a requirement,
to keep the project from stagnating for so long again :)
I won't do anything until I have some news about Ray, or at least something
more concrete about the maintenance, since I don't want to fork something that
is still maintained, even at a slow pace. The best option would be a
centralized fork, where everyone contributes to a public branch and serious
contributions are merged into a moderated release branch.


> I believe there are 3 or 4 others here who would make
> moderately large contributions, and a few like me who could help here
> and there
This is very good news. I'm not looking for a "corporate" rate of
maintenance, but for the assurance that if I stand up for the project, it won't
be in vain (this is clearly not a one-man job). I would put a lot of effort
into this project myself, but being alone is not an option.

> (my main language is not C++, but perl, Java, and Python).
This is no big deal, as long as we remodel the objects used in Tess. I have
created fairly big projects myself, always aiming to provide a framework that
is easy to grasp for developers coming from "elsewhere". Knowing Python is
important, because remodeling Tess will require a lot of object-oriented
background, but I believe that's part of the brainstorming we'll have to do:
making the first modifications in a way that eases later ones, and exposing
easy-to-use objects and types.

> Once the inertia is overcome I believe several other developers will come 
> along, since this is a very promising project.
That's exactly my point, agreed :)

As I wrote a while ago on the wiki, there are a few mistakes to avoid. The
first one is obviously to rely on one library, then another, then another.
Reading JPG or TIFF files is not an OCR library's concern. It should accept a
const unsigned char* plus a width/height pair, and that's all, given that the
data should always be 1bpp for it to work best. The conversion problem is, if I
may say so, "the developer's problem": the library itself isn't easy to use
standalone anyway, so a developer who knows what he's doing will also know how
to handle the I/O.
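To make this concrete, here is a minimal sketch of such a narrow interface,
assuming packed 1bpp rows (most significant bit first, 1 = black). BinaryImage,
pack_to_1bpp and count_black_pixels are hypothetical names, not part of
Tesseract's API, and the pixel count only stands in for real recognition.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input type for an OCR core that does no file I/O:
// a borrowed pointer to packed 1bpp rows plus the image dimensions.
struct BinaryImage {
    const unsigned char* data;  // packed bits, MSB first, 1 = black
    int width;                  // pixels per row
    int height;                 // number of rows
    int bytes_per_line;         // (width + 7) / 8 for tightly packed rows
};

// Hypothetical caller-side helper: converts 8-bit grayscale pixels to
// packed 1bpp with a fixed threshold. Decoding JPG/TIFF into the grayscale
// buffer stays in the application, not in the OCR library.
std::vector<unsigned char> pack_to_1bpp(const unsigned char* gray,
                                        int width, int height,
                                        unsigned char threshold) {
    assert(gray != nullptr && width > 0 && height > 0);
    const int stride = (width + 7) / 8;
    std::vector<unsigned char> bits(static_cast<std::size_t>(stride) * height, 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (gray[y * width + x] < threshold) {  // dark pixel -> bit set
                bits[y * stride + x / 8] |=
                    static_cast<unsigned char>(0x80 >> (x % 8));
            }
        }
    }
    return bits;
}

// Hypothetical library entry point: counts black pixels as a stand-in
// for recognition, only to show the interface being consumed.
long count_black_pixels(const BinaryImage& img) {
    long count = 0;
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x)
            if (img.data[y * img.bytes_per_line + x / 8] & (0x80 >> (x % 8)))
                ++count;
    return count;
}
```

The point of the design is that file decoding never enters the core: it only
ever sees a pointer, the dimensions and the row stride.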

Also, unifying the parameters would be a good idea. The configuration system
is very obscure; I had to go through the source code to find out how to set up
the API. We could design a powerful yet simple API that takes tuples for
configuration (and that's clearly what was attempted with this strange
"varable" class I came across, which mixes macros and ugly copy-constructed
entities). For example, using a framework such as Qt (which is LGPL and thus
doesn't break the Tess license either) would offer ready-made yet very powerful
functionality: QImage for manipulating pixels very fast, QVariant for an
"all-in-one" configuration method, QObject / QMetaObject to dynamically invoke
an object's method at runtime from its name (yes... amazing), etc. Even so, we
could later give up the Qt (or other) classes once we have something clean.
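A minimal sketch of the "tuples for configuration" idea, assuming C++17:
std::variant stands in for QVariant here so the example needs no Qt. The class
Config, its methods and the parameter names are all hypothetical, not
Tesseract's existing API.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical unified configuration store: every parameter is a
// (name, value) pair; std::variant plays the role QVariant would play.
using ConfigValue = std::variant<bool, int, double, std::string>;

class Config {
public:
    // Registers a parameter with its default value; the default's type
    // fixes the type every later set() must use. Pass strings as
    // std::string explicitly, since a bare const char* may convert to bool.
    void declare(const std::string& name, ConfigValue default_value) {
        assert(values_.count(name) == 0 && "parameter declared twice");
        values_[name] = std::move(default_value);
    }

    // Single entry point for all parameters, whatever their type.
    void set(const std::string& name, const ConfigValue& value) {
        auto it = values_.find(name);
        if (it == values_.end())
            throw std::invalid_argument("unknown parameter: " + name);
        if (it->second.index() != value.index())
            throw std::invalid_argument("wrong type for parameter: " + name);
        it->second = value;
    }

    // Typed read access; throws std::bad_variant_access on a type mismatch
    // and std::out_of_range for an unknown name.
    template <typename T>
    T get(const std::string& name) const {
        return std::get<T>(values_.at(name));
    }

private:
    std::map<std::string, ConfigValue> values_;
};
```

Unknown names and wrong types are rejected at one place, instead of being
spread over per-type macros.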

Anyway, I need to know who could help, and at what rate (again, nothing
"corporation-like": it has to be serious, but mainly for our personal culture
and fun), so I can evaluate the human resources we represent. As for me, I'm
very good at object modeling, workflow analysis, C / C++, and optimisation.

Thanks a lot,
Pierre.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en.
