Re: Using Tesseract from a C++ application.

MARTIN Pierre Thu, 08 Apr 2010 07:07:09 -0700

Hello Robert,

> I was thinking of an earlier post of yours where you were asking if
> your source code would need to be re-distributed if you used
> Tesseract.  I thought that a program based on scripts would probably
> be more difficult to keep proprietary (i.e. not fully open source)
> than a program based on compiled code, so maybe that's why you
> didn't want to work with OCRopus.
Not at all. I've read that the core is plain C++, which would be okay. My 
customer does not wish to reveal his sources, but my contributions to the 
Tesseract core are a separate matter: I could contribute that code while my 
customer's product stays closed.


> I suspect that OCR is not a simple problem that can be solved with a
> clean design.  Tesseract is probably filled with small kludges and
> workarounds to improve performance.  To throw out the code and begin
> again based on Tesseract's general design probably means hitting and
> working around all the same small problems they already dealt with.
Hit me with those problems.

> Also, there may be problems in Tesseract's general design that would
> be better to avoid in a new project.  For example, italics never seem
> to be recognized correctly, and someone on this list pointed out a while
> ago that the problem is that the bounding boxes for the italic characters
> overlap, and this is not handled properly by Tesseract.  I'm sure there
> are other fundamental problems.
How complicated would it be to add a shear factor to the bounding boxes?
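
To make the question concrete, here is a rough sketch of what I mean. It is 
not Tesseract code: Point, Box, ShearedBounds and OverlapsHorizontally are 
names I made up, and I assume the slant (shear) of the text is already known 
from somewhere else. The idea is only to undo the slant before taking the 
box, so that the boxes of neighbouring italic characters stop overlapping.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical types for illustration; these are not Tesseract classes.
struct Point { double x, y; };  // y is measured upward from the baseline
struct Box { double left, right, bottom, top; };

// Bounding box of a glyph outline after undoing a horizontal shear:
// x' = x - shear * y. With shear == 0 this is the usual upright box.
Box ShearedBounds(const std::vector<Point>& outline, double shear) {
  const double inf = std::numeric_limits<double>::infinity();
  Box b{inf, -inf, inf, -inf};
  for (const Point& p : outline) {
    const double x = p.x - shear * p.y;  // remove the slant
    b.left = std::min(b.left, x);
    b.right = std::max(b.right, x);
    b.bottom = std::min(b.bottom, p.y);
    b.top = std::max(b.top, p.y);
  }
  return b;
}

// True when the horizontal extents of two boxes overlap.
bool OverlapsHorizontally(const Box& a, const Box& b) {
  return a.left < b.right && b.left < a.right;
}
```

For two slanted strokes standing next to each other, the upright boxes 
(shear 0) overlap, while the boxes computed with the right shear do not. How 
the shear would be estimated, and where this would plug into Tesseract, I do 
not know.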

> For these reasons, I personally think it would be a mistake to start a
> new project by reverse engineering Tesseract.  I do think that tweaking
> the existing code to fix memory leaks and such (maybe introducing
> doxygen comments to improve documentation) would be a good thing.
Well, I'm not very good at math (complicated math, at least). This part is not 
really for me.

> Hmm, the Tesseract page shows that two of the people with commit privileges 
> work on OCRopus now.  Maybe helping with OCRopus would be a roundabout way of 
> getting small fixes pushed upstream to Tesseract.  At the least,
> they are probably in better contact with Ray Smith than anyone here.
Okay, thank you for this information. I'll see if I can do something with them...

But then I must ask: what are people doing on this list? Are some of you coding 
a bit on Tesseract, or not at all?

Thank you for answering me.
Pierre.


