I apologize for repeating this comment from the other thread, but I
think this is perhaps the proper place for it. I tend to disagree
with leptonica.

Have you considered instead simplifying the image format all the way?
By this I mean dropping the TIFF input and replacing it with a trivial
encoding such as the Netpbm formats (http://netpbm.sourceforge.net/),
which are extremely portable.

The advantage is simplicity: a reader or writer for the black and
white format takes 5-10 lines of code to write from scratch in just
about any language, so anybody could interface with tesseract easily.
There would be no build dependencies, and no special handling of many
different file formats in the tesseract code, so you can concentrate
on the OCR.
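
To give an idea of how little code is involved, here is a rough sketch
in Python of a reader and writer for the plain (ASCII, "P1") variant of
PBM. The function names write_pbm and read_pbm are my own, not anything
in tesseract, and the sketch assumes a well-formed file with a non-empty
image where 1 means black.

```python
def write_pbm(path, rows):
    """Write rows of 0/1 ints (1 = black) as a plain 'P1' PBM file."""
    height, width = len(rows), len(rows[0])
    data = "".join(str(bit) for row in rows for bit in row)
    with open(path, "w", encoding="ascii") as f:
        f.write(f"P1\n{width} {height}\n")
        # Keep lines short; plain PBM readers ignore line breaks in the raster.
        for i in range(0, len(data), 70):
            f.write(data[i:i + 70] + "\n")


def read_pbm(path):
    """Read a plain 'P1' PBM file back into a list of rows of 0/1 ints."""
    with open(path, encoding="ascii") as f:
        # Drop '#' comments, then split everything on whitespace.
        tokens = " ".join(line.split("#", 1)[0] for line in f).split()
    if tokens[0] != "P1":
        raise ValueError("not a plain PBM file")
    width, height = int(tokens[1]), int(tokens[2])
    # Pixels may or may not be separated by whitespace, so rejoin them first.
    bits = [int(c) for c in "".join(tokens[3:])]
    return [bits[r * width:(r + 1) * width] for r in range(height)]
```

The raw ("P4") variant packs eight pixels per byte and needs a few more
lines for the bit twiddling, but not many.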

The disadvantage would be that scanners probably don't write the PBM
format natively, so users would likely need to convert their images at
some point. Also, PBM files tend to get big, but that's easy to fix
with compression. Many free compression libraries have wrappers for
file handling routines which make reading and writing compressed files
transparent.
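
As an illustration of what I mean by transparent, here is a small
sketch using Python's standard gzip module, whose gzip.open returns an
object that behaves like an ordinary file. The helper names
open_image_file, save_bitmap and load_text are made up for this
example, and choosing by the ".gz" suffix is just one possible
convention.

```python
import gzip


def open_image_file(path, mode):
    """Open path in text mode ('rt' or 'wt'), using gzip for '.gz' names."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, mode, encoding="ascii")


def save_bitmap(path, rows):
    """Write rows of 0/1 ints as plain PBM; the caller never sees the gzip."""
    with open_image_file(path, "wt") as f:
        f.write(f"P1\n{len(rows[0])} {len(rows)}\n")
        for row in rows:
            f.write("".join(str(bit) for bit in row) + "\n")


def load_text(path):
    """Return the decompressed text of a file written by save_bitmap."""
    with open_image_file(path, "rt") as f:
        return f.read()
```

The code that writes the image is the same either way; only the
function used to open the file changes.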

BTW, I've uploaded my box editor to the files area and added a comment
to the training instructions, but I am unclear which wiki page is
specifically intended for add-ons.