In answer to your example, let me show you some code I'm working on.
It is, let's say, a "document handler" for a recognition platform, and it is
only a small part of it.
#ifndef Document_h
#define Document_h
#include "RETypes.h"
#include <QString>
#include <QImage>
#include <QHash>
#include <QMap> // Needed for the AttachedStrings typedef below.
namespace RE {
// Document class.
class RELibOpt Document {
public:
// Document string identifier.
typedef enum RELibOpt _StringId : unsigned int {
StringIdUnknown = 0, // Unknown string.
StringIdMicr = 1 // MICR string.
} StringId;
typedef QMap<StringId, QString> AttachedStrings;
// Document image side.
typedef enum RELibOpt _ImageId : unsigned int {
ImageIdUnknown = 0,
ImageIdFront = 1,
ImageIdRear = 2,
ImageIdBarcode = 3,
ImageIdOcrLine = 4
} ImageId;
typedef QHash<ImageId, QImage> AttachedImages;
// Document types.
typedef enum RELibOpt _Type : unsigned int {
TypeUnknown = 0,
TypeSDX = 1
} Type;
// Document subtypes.
typedef enum RELibOpt _Subtype : unsigned int{
SubtypeUnknown = 0,
SubTypeA = 1,
SubtypeTR = 2,
SubtypeCDT = 3,
SubtypeCR = 4
} Subtype;
// Recognition results.
typedef QHash<QString, QString> RecognizedData;
public:
// Constructor / Destructor.
Document
(Type type=TypeUnknown, Subtype subtype=SubtypeUnknown);
Document
(const Document&);
virtual ~Document
();
private:
Type _type;
Subtype _subtype;
AttachedStrings _strings;
AttachedImages _images;
RecognizedData _recognizedData;
// Inline accessors.
public:
// Getter / Setter on type.
inline Type type
() const
{ return _type; };
inline void setType
(Type type)
{ _type = type; };
// Getter / Setter on subtype.
inline Subtype subtype
() const
{ return _subtype; };
inline void setSubtype
(Subtype subtype)
{ _subtype = subtype; };
// Attachment management.
inline void attachString
(StringId id, const QString &string)
{ _strings[id] = string; };
inline const AttachedStrings& attachedStrings () const
{ return _strings; };
inline void attachImage
(ImageId id, const QImage &image)
{ _images[id] = image; };
inline const AttachedImages& attachedImages () const
{ return _images; };
// Recognition results.
inline const RecognizedData& recognizedData () const
{ return _recognizedData; };
inline void
setRecognizedData (const QString &id, const QString &value)
{ _recognizedData[id] = value; };
};
} // namespace RE
#endif
Then the developer who wants to use it just does as follows:
RecognitionEngine *re = RecognitionEngine::sharedUserspaceInstance();
// Create an empty document.
Document doc (Document::TypeSDX);
doc.attachString(Document::StringIdMicr, micr);
QImage imgFront = re->fixImageTilt(QImage("~/TestPics/Test.tif"));
doc.attachImage(Document::ImageIdFront, imgFront);
// Recognize document.
re->recognizeDocument(doc);
The document handler is fairly simple. The recognition engine is also very
simple, as far as the header file goes. There are few external dependencies
(Qt, that's all).
There is no leak at all, but I can assure you that the engine is full of new /
delete, mostly because there is a single instance of it per userspace
(Singleton pattern).
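
For readers unfamiliar with the pattern, here is a minimal sketch of a
single-instance accessor. The class name Engine and all of its members are
hypothetical, and this is not the actual RecognitionEngine code (which, as said
above, relies on new / delete); the sketch assumes C++11 and uses a
function-local static instead.

```cpp
#include <cassert>

// Hypothetical engine class, for illustration only.
class Engine {
public:
    // Returns the one shared instance, constructed on first use.
    // Since C++11 the initialization of a local static is thread-safe.
    static Engine *sharedInstance()
    {
        static Engine instance;
        return &instance;
    }

    // Copying is disabled so that no second instance can appear.
    Engine(const Engine &) = delete;
    Engine &operator=(const Engine &) = delete;

    // Placeholder for real work: only counts the calls.
    void recognize() { ++_recognizedCount; }
    int recognizedCount() const { return _recognizedCount; }

private:
    Engine() : _recognizedCount(0) {}
    int _recognizedCount;
};

// Every call to the accessor yields the same object.
inline void demoSingleInstance()
{
    assert(Engine::sharedInstance() == Engine::sharedInstance());
}
```

With this variant there is nothing to delete by hand: the instance is destroyed
automatically at program exit.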
The only reason I can pass QImages by copy (copy-constructed, indeed) is that
Qt is such a complete framework that QImage internally keeps reference-counted
pointers to its data. Two image objects, one copied from the other, internally
point to the same memory address for the image data itself (unless one of them
is modified, in which case a deep copy is made).
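
The sharing behaviour described above can be sketched in plain C++ without Qt.
SharedImage below is a hypothetical stand-in, not QImage, and it only
illustrates the idea (share on copy, deep-copy on first write); Qt's real
implementation differs in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for an implicitly shared image (NOT QImage).
class SharedImage {
public:
    SharedImage(std::size_t width, std::size_t height)
        : _width(width), _height(height),
          _pixels(std::make_shared<std::vector<std::uint8_t>>(width * height, 0)) {}

    // The compiler-generated copy constructor copies only the shared_ptr,
    // so a copy is cheap and both objects see the same pixel buffer.

    std::size_t width() const { return _width; }
    std::size_t height() const { return _height; }

    std::uint8_t pixel(std::size_t x, std::size_t y) const
    { return (*_pixels)[y * _width + x]; }

    // Writing detaches first, so other copies are left untouched.
    void setPixel(std::size_t x, std::size_t y, std::uint8_t value)
    {
        detach();
        (*_pixels)[y * _width + x] = value;
    }

    // True while both objects still point to the same buffer.
    bool sharesDataWith(const SharedImage &other) const
    { return _pixels == other._pixels; }

private:
    // Deep-copy the buffer if someone else is still referencing it.
    void detach()
    {
        if (_pixels.use_count() > 1)
            _pixels = std::make_shared<std::vector<std::uint8_t>>(*_pixels);
    }

    std::size_t _width;
    std::size_t _height;
    std::shared_ptr<std::vector<std::uint8_t>> _pixels;
};

// A copy shares its data until one side is modified.
inline void demoImplicitSharing()
{
    SharedImage a(4, 4);
    SharedImage b = a;
    assert(a.sharesDataWith(b));
    b.setPixel(0, 0, 255);
    assert(!a.sharesDataWith(b));
    assert(a.pixel(0, 0) == 0);
    assert(b.pixel(0, 0) == 255);
}
```

The use_count() check is only reliable in single-threaded code; a production
version would need more care there.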
And as you can see, my Document object can be copy-constructed; internally it's
a mess of pointers to avoid copying memory, but still: no leaks.
Re-developing this kind of object from scratch is time-consuming, and can
easily lead to bugs if maintained "on the fly" (for Tesseract, creating new
types would require us to code them partially, and then add new functionality
to them once we need it).
It's fully object-oriented and easily wrappable (the "surrounding" API is
nonexistent, but would be a snap to create). Most of the types are internal.
However, using home-made imaging functions is not an option. An OCR engine,
like my document layout analysis program, may require information about the
image compression (to decide whether or not to correct artifacts), depth (which
could be something other than 1bpp for Tesseract in future versions, possibly
variable), etc.
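
To make that last point concrete, here is a small sketch of the kind of
per-image information an engine could query. Every name in it (Compression,
ImageInfo, needsArtifactCleanup, isBilevel) is hypothetical; none of them
comes from Tesseract, Qt, or my own code.

```cpp
#include <cassert>

// Hypothetical description of how an image was stored.
enum class Compression { None, Lossless, Lossy };

// Hypothetical metadata an imaging layer would hand to the engine.
struct ImageInfo {
    Compression compression;
    int bitsPerPixel;
};

// Only lossy compression leaves artifacts worth correcting.
inline bool needsArtifactCleanup(const ImageInfo &info)
{ return info.compression == Compression::Lossy; }

// True for 1bpp images, false for any greater depth.
inline bool isBilevel(const ImageInfo &info)
{ return info.bitsPerPixel == 1; }

// An 8bpp lossy scan needs cleanup and is not bilevel.
inline void demoImageInfo()
{
    ImageInfo scan = { Compression::Lossy, 8 };
    assert(needsArtifactCleanup(scan));
    assert(!isBilevel(scan));
}
```

A home-made image type that only carries raw pixels cannot answer either
question, which is why an existing imaging framework is attractive.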
Pierre.