I apologize for repeating this comment from the other thread, but I think this is perhaps the proper place for it. I tend to disagree with leptonica.
Have you considered simplifying the image format all the way instead? By this I mean dropping the TIFF input and replacing it with a trivial encoding such as the Netpbm formats (http://netpbm.sourceforge.net/), which are extremely portable.

The advantage is simplicity: a reader or writer for the black-and-white format (PBM) takes 5-10 lines of code to write from scratch in just about any language, so anybody could interface with tesseract easily. There would be no build dependencies and no special handling of many different file formats in the tesseract code, so you could concentrate on the OCR.

The disadvantage is that scanners probably don't write the PBM format natively, so users would likely need to convert their images at some point. PBM files also tend to get big, but that is easy to fix with compression: many free compression libraries provide wrappers around the file handling routines that make reading and writing compressed files transparent.

BTW, I've uploaded my box editor to the files area and added a comment to the training instructions, but I am unclear which wiki page is specifically intended for add-ons.
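To make the simplicity argument for PBM concrete, here is a rough sketch in Python of a reader and writer for the plain-text "P1" variant of the format, with optional gzip compression. It is only an illustration, not anything taken from tesseract: the names `write_pbm`, `read_pbm`, and `_open`, and the convention of choosing compression by a ".gz" suffix, are my own assumptions. It runs a little longer than the bare minimum because it also skips `#` comments and handles the compressed case.

```python
import gzip


def _open(path, mode):
    # Hypothetical helper: pick gzip transparently when the name ends in ".gz".
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, mode, encoding="ascii")


def write_pbm(path, rows):
    """Write rows (lists of 0/1, where 1 means black) as a plain 'P1' PBM."""
    height, width = len(rows), len(rows[0])
    with _open(path, "wt") as f:
        f.write(f"P1\n{width} {height}\n")
        for row in rows:
            bits = "".join(str(bit) for bit in row)
            # The format description asks for lines of at most 70 characters.
            for i in range(0, len(bits), 70):
                f.write(bits[i:i + 70] + "\n")


def read_pbm(path):
    """Read a plain 'P1' PBM and return it as a list of rows of 0/1 ints."""
    with _open(path, "rt") as f:
        # Drop everything from '#' to the end of each line (comments).
        lines = [line.split("#", 1)[0] for line in f.read().splitlines()]
    tokens = " ".join(lines).split()
    if not tokens or tokens[0] != "P1":
        raise ValueError("not a plain PBM (P1) file")
    width, height = int(tokens[1]), int(tokens[2])
    # Whitespace between pixels is optional, so join the rest and split by char.
    bits = [int(ch) for ch in "".join(tokens[3:])]
    if len(bits) != width * height:
        raise ValueError("pixel count does not match the header")
    return [bits[r * width:(r + 1) * width] for r in range(height)]
```

With this sketch, `write_pbm("page.pbm.gz", rows)` followed by `read_pbm("page.pbm.gz")` should give back the same rows, and dropping the ".gz" suffix gives an uncompressed file that any Netpbm tool can read. The more compact binary "P4" variant would need a few extra lines for bit packing.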

