> Here are some thoughts, and I would like to get input from the
> developer/user community on this issue:
>
> For leptonica:
>
> - Some features will depend on it. To get best performance you will need
> it.
> - It could allow simplification of the code, and elimination of the old
> IMAGE class.
> - It will allow reading of many more image formats, which a lot of users
> have requested.
> - It might be easier if the default windows project files assume that you
> have leptonica. That would make it easier to build with it, and it would
> only be a case of downloading it.
>
> Against making tesseract dependent on leptonica:
>
> - It will require several additional components: leptonica, libtiff,
> libjpg, libpng, which would bloat the executable, and many (windows) users
> have refused to even download libtiff.
> - Installation and build support will become much more effort. (Mostly
> for windows) If somebody could write a windows installer for it (open
> source of course), then that would simplify installation a lot for the
> windows user-only community.
Have you considered instead simplifying the image format all the way? By this I mean dropping the TIFF input and replacing it with a trivial encoding such as the Netpbm formats (http://netpbm.sourceforge.net/), which are extremely portable.

The advantage is simplicity: a reader or writer for the black-and-white format (PBM) takes 5-10 lines of code to write from scratch in just about any language, so anybody could interface with tesseract easily. There would be no build dependencies and no special handling of many different file formats in the tesseract code, so you could concentrate on the OCR.

The disadvantage is that scanners probably don't write PBM natively, so users would likely need to convert their images at some point. Also, PBM files tend to get big, but that's easy to fix with compression: many free compression libraries provide wrappers for the file-handling routines that make reading and writing compressed files transparent.
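To give a feel for how small a PBM reader/writer is, here is a minimal sketch in Python of the binary ("P4") PBM variant, assuming the layout documented on the Netpbm site: a header of magic number, width and height as whitespace-separated tokens ('#' starts a comment), exactly one whitespace byte, then rows of packed bits with 1 = black, leftmost pixel in the high bit, each row padded to a whole byte. The function names are hypothetical, and it ignores the plain-text "P1" variant. The encoder is about ten lines; the decoder runs a bit longer only because it tolerates header comments. For the compression point, Python's `gzip.open()` stands in for "a wrapper that makes compressed files transparent" (zlib's `gzopen`/`gzread`/`gzwrite` play the same role in C).

```python
import gzip


def encode_pbm(pixels):
    """Encode rows of 0/1 values (1 = black) as a binary "P4" PBM image."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    out = bytearray(b"P4\n%d %d\n" % (width, height))
    stride = (width + 7) // 8  # each row is padded to a whole number of bytes
    for row in pixels:
        packed = bytearray(stride)
        for x, bit in enumerate(row):
            if bit:
                packed[x // 8] |= 0x80 >> (x % 8)  # leftmost pixel goes in the high bit
        out += packed
    return bytes(out)


def _next_token(data, pos):
    """Skip whitespace and '#' comments, then return (token, position after it)."""
    while pos < len(data):
        if data[pos:pos + 1].isspace():
            pos += 1
        elif data[pos:pos + 1] == b"#":
            while pos < len(data) and data[pos:pos + 1] != b"\n":
                pos += 1
        else:
            break
    start = pos
    while pos < len(data) and not data[pos:pos + 1].isspace():
        pos += 1
    return data[start:pos], pos


def decode_pbm(data):
    """Decode a binary "P4" PBM image into rows of 0/1 values (1 = black)."""
    magic, pos = _next_token(data, 0)
    if magic != b"P4":
        raise ValueError("not a binary (P4) PBM image")
    width, pos = _next_token(data, pos)
    height, pos = _next_token(data, pos)
    width, height = int(width), int(height)
    pos += 1  # exactly one whitespace byte separates the header from the raster
    stride = (width + 7) // 8
    rows = []
    for y in range(height):
        line = data[pos + y * stride: pos + (y + 1) * stride]
        rows.append([(line[x // 8] >> (7 - x % 8)) & 1 for x in range(width)])
    return rows


def save_pbm(path, pixels):
    # A ".gz" suffix selects gzip; gzip.open() is a drop-in for open() here.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wb") as f:
        f.write(encode_pbm(pixels))


def load_pbm(path):
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return decode_pbm(f.read())
```

For example, `encode_pbm([[1, 0, 1]])` yields the header `b"P4\n3 1\n"` followed by the single byte `0xA0`. Calling `save_pbm("page.pbm.gz", rows)` and later `load_pbm("page.pbm.gz")` round-trips the image through gzip without the PBM code itself knowing anything about compression, which is the design point: the OCR side only ever sees the trivial format.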

