Hi,

What is OCR? You have a 1-bit (black-and-white) image and you try to get text out of it.

From my point of view, an OCR engine doesn't need an image library or an image
processing library.
Keep the code simple and let developers bundle it with whichever image library
they like or want to use: open source (libtiff), included with the OS
(gdiplus.dll), or commercial (LeadTools, Accusoft...).
The only image processing worth including is thresholding a 24-bit image
down to a bilevel one.
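As an illustration of how small that thresholding step can be, here is a minimal sketch, assuming a packed 24-bit RGB buffer with no row padding. The function name `thresholdRgb24` and the fixed threshold of 128 are hypothetical choices; a real engine might pick the threshold adaptively (for example with Otsu's method).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts a packed 24-bit RGB buffer (R, G, B per
// pixel, no row padding assumed) into a bilevel image with one byte per
// pixel: 1 = ink (dark), 0 = background.
// Gray level uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) in
// integer form. The fixed default threshold is an assumption.
std::vector<std::uint8_t> thresholdRgb24(const std::vector<std::uint8_t>& rgb,
                                         std::size_t width, std::size_t height,
                                         int threshold = 128) {
    const std::size_t pixels = width * height;
    if (rgb.size() < 3 * pixels) return {};  // buffer too small: no result

    std::vector<std::uint8_t> out(pixels, 0);
    for (std::size_t i = 0; i < pixels; ++i) {
        const int r = rgb[3 * i];
        const int g = rgb[3 * i + 1];
        const int b = rgb[3 * i + 2];
        const int gray = (299 * r + 587 * g + 114 * b) / 1000;  // 0..255
        out[i] = gray < threshold ? 1 : 0;  // dark pixels become ink
    }
    return out;
}
```

The engine would then only ever see the bilevel result; decoding the 24-bit image in the first place stays the job of whatever image library the developer chose.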

Today, Tesseract has three big problems:
- memory leaks.
- overly complex code.
- it is process-oriented and not designed to be used as a library (it calls
exit(), does file I/O...).
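To make the first and third points concrete, a library-friendly design reports errors to the caller instead of calling exit(), and keeps its state in RAII containers so nothing has to be freed by hand. The sketch below assumes a hypothetical `Recognizer` class and `OcrStatus` codes; it is not Tesseract's API, and the recognition step itself is only a placeholder.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical status codes: errors are returned to the caller instead of
// terminating the host process with exit().
enum class OcrStatus { Ok, EmptyImage, BadDimensions };

struct OcrResult {
    OcrStatus status;
    std::string text;  // empty unless status == OcrStatus::Ok
};

// Hypothetical recognizer. Everything it owns lives in standard containers,
// so memory is released automatically when objects go out of scope (RAII)
// instead of relying on manual new/delete pairs that can leak.
class Recognizer {
public:
    // Takes a bilevel image already in memory: one byte per pixel.
    OcrResult recognize(const std::vector<unsigned char>& pixels,
                        int width, int height) const {
        if (pixels.empty()) return {OcrStatus::EmptyImage, {}};
        if (width <= 0 || height <= 0 ||
            pixels.size() != static_cast<std::size_t>(width) * height)
            return {OcrStatus::BadDimensions, {}};
        // Placeholder: the actual classification would go here.
        return {OcrStatus::Ok, std::string()};
    }
};
```

A host application can then decide what to do with a bad image (log it, skip it, show a message) without the library killing its process.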

What we need, I'm sure, is a complete rewrite: shrink the 222 .cpp files
to fewer than 20.
A C++ library can be OS-independent simply because it doesn't need any
OS-specific API (no I/O).
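One way to get "no I/O" is for the engine to accept only a view over memory that the caller has already filled using libtiff, GDI+ or anything else. This is a minimal sketch with a hypothetical `BilevelView` struct. It assumes rows are stored top to bottom, each row takes `stride` bytes (possibly with padding), and pixels are packed most significant bit first with 1 meaning ink; real buffers may differ (bottom-up rows, inverted polarity), so the caller would have to normalize them.

```cpp
#include <cstddef>

// Hypothetical non-owning view over a 1-bit-per-pixel image decoded by the
// caller. The engine never opens files; it only reads this memory.
// Assumptions: top-to-bottom rows, `stride` bytes per row (>= (width+7)/8,
// may include padding), MSB-first packing, bit value 1 = ink.
struct BilevelView {
    const unsigned char* data;
    int width;
    int height;
    int stride;

    bool pixel(int x, int y) const {
        const unsigned char byte =
            data[static_cast<std::size_t>(y) * stride + x / 8];
        return (byte >> (7 - x % 8)) & 1;
    }

    // Example of engine-side work that touches only memory: count ink
    // pixels, ignoring any padding bits at the end of each row.
    std::size_t countInk() const {
        std::size_t n = 0;
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                n += pixel(x, y) ? 1 : 0;
        return n;
    }
};
```

Because the view does not own or copy the buffer, the same engine code works whether the pixels came from libtiff, gdiplus.dll or a commercial toolkit.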

I think the right approach is to:
1) reverse-engineer the code and document it
2) rewrite it completely from that documentation

Once you have a good OCR library, you can package it for "public"
use.
I have spent a lot of time in the Tesseract source code and I don't want to
spend any more time in it.
I'm ready to help with a complete rewrite.

Remi

Tessnet2 author
C++ dev since 1989
Windows platform expert (C++/C#)
Image processing expert
Freelance since 2001
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.