I have compiled the *basic example* from the wiki:
https://github.com/tesseract-ocr/tesseract/wiki/APIExample
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
int main()
{
    char *outText;
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }
    // Open input image with leptonica library
    Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
    api->SetImage(image);
    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);
    // Destroy used object and release memory
    api->End();
    delete [] outText;
    pixDestroy(&image);
    delete api; // <-- added by me, as it is missing in the example
    return 0;
}
When I run valgrind on it, it reports a serious memory leak:
==18441== 18,635,728 bytes in 1 blocks are still reachable in loss record 29 of 29
==18441==    at 0x4C2CB3F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18441==    by 0x5445978: tesseract::SquishedDawg::read_squished_dawg(_IO_FILE*, tesseract::DawgType, STRING const&, PermuterType, int) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5446C75: tesseract::DawgLoader::Load() (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5446FD6: tesseract::DawgCache::GetSquishedDawg(STRING const&, char const*, tesseract::TessdataType, int) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x544D7B5: tesseract::Dict::Load(tesseract::DawgCache*) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x541199D: tesseract::Wordrec::program_editup(char const*, bool, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5350D68: tesseract::Tesseract::init_tesseract_internal(char const*, char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x535184C: tesseract::Tesseract::init_tesseract(char const*, char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5302247: tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x108F26: tesseract::TessBaseAPI::Init(char const*, char const*) (baseapi.h:240)
==18441==    by 0x108DC4: main (main.cpp:10)
There are several more leaks related to this function, and some others
related to leptonica, but nothing of this magnitude. I added the
delete api; line, which is missing from the API example, but it
changes nothing.
Is there really such a major leak in the library, or am I using it incorrectly?
OS: Ubuntu 16.10 (x64)
Tesseract: 3.0.4 from Ubuntu repositories
GCC: gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12)
Best,
vf