It appears to be an excellent idea. I hope the proposed code will support UTF-8 (e.g. Indic and other world languages that have consonants plus dependent vowels); if so, well and good.
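To make the concern concrete: in Devanagari, the syllable "कि" is the consonant KA (U+0915) followed by the dependent vowel sign I (U+093F), so one visible unit is two code points and six UTF-8 bytes. Below is a minimal sketch, not taken from any existing code base; the helper name countCodePoints is hypothetical, and it assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string that is
// assumed to be valid UTF-8. Continuation bytes have the form 10xxxxxx,
// so every byte that is NOT a continuation byte starts a new code point.
std::size_t countCodePoints(const std::string &utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// U+0915 (KA) followed by U+093F (vowel sign I), written as raw UTF-8 bytes.
const std::string kDevanagariKi = "\xE0\xA4\x95\xE0\xA4\xBF";
```

For kDevanagariKi, size() reports 6 bytes while countCodePoints() reports 2, and a reader still sees a single syllable. Any string type that equates one byte (or even one code point) with one character will mis-segment such text.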
On Sat, Apr 10, 2010 at 11:06 PM, MARTIN Pierre <[email protected]> wrote:
> In answer to your example, let me show you an example of code I'm working
> on. It is, let's say, a "document handler" for a recognition platform, and
> it's a small part of it.
>
> #ifndef Document_h
> #define Document_h
>
> #include "RETypes.h"
> #include <QString>
> #include <QImage>
> #include <QHash>
> #include <QMap>     // Needed for AttachedStrings.
>
> namespace RE {
>
> // Document class.
> class RELibOpt Document {
> public:
>
>
>     // Document string identifier.
>     typedef enum RELibOpt _StringId : unsigned int {
>         StringIdUnknown = 0,    // Unknown string.
>         StringIdMicr    = 1     // MICR string.
>     } StringId;
>     typedef QMap<StringId, QString> AttachedStrings;
>
>
>     // Document image side.
>     typedef enum RELibOpt _ImageId : unsigned int {
>         ImageIdUnknown = 0,
>         ImageIdFront   = 1,
>         ImageIdRear    = 2,
>         ImageIdBarcode = 3,
>         ImageIdOcrLine = 4
>     } ImageId;
>     typedef QHash<ImageId, QImage> AttachedImages;
>
>
>     // Document types.
>     typedef enum RELibOpt _Type : unsigned int {
>         TypeUnknown = 0,
>         TypeSDX     = 1
>     } Type;
>     // Document subtypes.
>     typedef enum RELibOpt _Subtype : unsigned int {
>         SubtypeUnknown = 0,
>         SubTypeA       = 1,
>         SubtypeTR      = 2,
>         SubtypeCDT     = 3,
>         SubtypeCR      = 4
>     } Subtype;
>
>
>     // Recognition results.
>     typedef QHash<QString, QString> RecognizedData;
>
> public:
>     // Constructor / Destructor.
>     Document (Type type=TypeUnknown, Subtype subtype=SubtypeUnknown);
>     Document (const Document&);
>     virtual ~Document ();
>
>
> private:
>     Type            _type;
>     Subtype         _subtype;
>     AttachedStrings _strings;
>     AttachedImages  _images;
>     RecognizedData  _recognizedData;
>
> // Inline accessors.
> public:
>     // Getter / Setter on type.
>     inline Type type () const
>     { return _type; };
>     inline void setType (Type type)
>     { _type = type; };
>
>
>     // Getter / Setter on subtype.
>     inline Subtype subtype () const
>     { return _subtype; };
>     inline void setSubtype (Subtype subtype)
>     { _subtype = subtype; };
>
>
>     // Attachment management.
>     inline void attachString (StringId id, const QString &string)
>     { _strings[id] = string; };
>     inline const AttachedStrings& attachedStrings () const
>     { return _strings; };
>     inline void attachImage (ImageId id, const QImage &image)
>     { _images[id] = image; };
>     inline const AttachedImages& attachedImages () const
>     { return _images; };
>
>
>     // Recognition results.
>     inline const RecognizedData& recognizedData () const
>     { return _recognizedData; };
>     inline void setRecognizedData (const QString &id, const QString &value)
>     { _recognizedData[id] = value; };
> };
>
> };
>
> #endif
>
>
> Then the developer who wants to use it just does as follows:
>
> RecognitionEngine *re = RecognitionEngine::sharedUserspaceInstance();
> // Create an empty document.
> Document doc (Document::TypeSDX);
> doc.attachString(Document::StringIdMicr, micr);
> QImage imgFront = re->fixImageTilt(QImage("~/TestPics/Test.tif"));
> doc.attachImage(Document::ImageIdFront, imgFront);
> // Recognize document.
> re->recognizeDocument(doc);
>
> The document handler is fairly simple. The recognition engine is also very
> simple, as far as the header file goes. There are not many external
> dependencies (Qt, that's all).
> There is no leak at all, but I can assure you that the engine is full of
> new / delete, mostly because there is a single instance of it per userspace
> (Singleton pattern).
>
> The only reason I can pass QImages by copy (copy constructed indeed) is
> that Qt is such a complete framework that QImage internally maintains
> reference-counted pointers to its data. Two image objects, one copied from
> the other, will internally point to the same memory address for the image
> data itself (unless one of them is modified, in which case a deep copy is
> done).
> And as you can see, my Document object can be copy constructed. Internally
> it's a mess of pointers to avoid copying memory, but still: no leaks.
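The sharing behaviour described above (copies point at the same data until one of them is written to) can be sketched in plain C++ as follows. This is only an illustration of the copy-on-write idea under stated assumptions: it is not Qt's implementation, the class name SharedPixels and its members are hypothetical, and the use_count() check is only meaningful in single-threaded code.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an implicitly shared image class. It is NOT
// Qt's implementation, only an illustration of the copy-on-write idea.
class SharedPixels {
public:
    explicit SharedPixels(std::size_t size)
        : _data(std::make_shared<std::vector<unsigned char>>(size, 0)) {}

    // Copying is cheap: the compiler-generated copy constructor copies the
    // shared_ptr, so both objects point at the same pixel buffer.

    // Read access never triggers a copy.
    unsigned char at(std::size_t i) const { return (*_data)[i]; }

    // Write access detaches first, so other copies are left untouched.
    void set(std::size_t i, unsigned char value)
    {
        detach();
        (*_data)[i] = value;
    }

    // True while both objects still share one buffer.
    bool sharesBufferWith(const SharedPixels &other) const
    { return _data == other._data; }

private:
    // Deep-copy the buffer only if another object is also holding it.
    void detach()
    {
        if (_data.use_count() > 1)
            _data = std::make_shared<std::vector<unsigned char>>(*_data);
    }

    std::shared_ptr<std::vector<unsigned char>> _data;
};
```

With this sketch, `SharedPixels b = a;` copies no pixel data, and the deep copy only happens on the first `b.set(...)`. Because ownership lives in the shared_ptr, the buffer is released when the last copy goes away, which is the "no leaks" property without any hand-written delete.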
> Re-developing these kinds of objects from scratch is time-consuming, and
> can easily lead to bugs if they are maintained "on-the-fly" (for Tesseract,
> creating new types would require us to code them partially, and then add
> new functionality to them once we need it).
>
> It's fully object-oriented and easily wrappable (the "surrounding" API is
> nonexistent, but would be a snap to create). Most of the types are
> internal. However, using home-made imaging functions is not an option. An
> OCR engine, like my document layout analysis program, may require
> information about the image compression (to decide whether or not to
> correct artifacts), depth (which could be something other than 1bpp in
> future versions of Tesseract, and variable), etc.
>
> Pierre.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.

