I have compiled the *basic example* from the wiki:
https://github.com/tesseract-ocr/tesseract/wiki/APIExample

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    char *outText;

    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image with leptonica library
    Pix *image = pixRead("/usr/src/tesseract/testing/phototest.tif");
    api->SetImage(image);
    // Get OCR result
    outText = api->GetUTF8Text();
    printf("OCR output:\n%s", outText);

    // Destroy used object and release memory
    api->End();
    delete [] outText;
    pixDestroy(&image);
    delete api; // <-- added by me, as it is missing in the example

    return 0;
}




When I run valgrind on it, it reports a serious memory leak:

==18441== 18,635,728 bytes in 1 blocks are still reachable in loss record 29 of 29
==18441==    at 0x4C2CB3F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18441==    by 0x5445978: tesseract::SquishedDawg::read_squished_dawg(_IO_FILE*, tesseract::DawgType, STRING const&, PermuterType, int) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5446C75: tesseract::DawgLoader::Load() (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5446FD6: tesseract::DawgCache::GetSquishedDawg(STRING const&, char const*, tesseract::TessdataType, int) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x544D7B5: tesseract::Dict::Load(tesseract::DawgCache*) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x541199D: tesseract::Wordrec::program_editup(char const*, bool, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5350D68: tesseract::Tesseract::init_tesseract_internal(char const*, char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x535184C: tesseract::Tesseract::init_tesseract(char const*, char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x5302247: tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) (in /usr/lib/libtesseract.so.3.0.4)
==18441==    by 0x108F26: tesseract::TessBaseAPI::Init(char const*, char const*) (baseapi.h:240)
==18441==    by 0x108DC4: main (main.cpp:10)



There are several more leaks related to this function, and some others 
related to leptonica, but nothing of this magnitude. I have added the 
delete api; line that is missing from the API example, but that 
changes nothing.

Is there really such a major leak in the lib, or am I using it incorrectly?

OS: Ubuntu 16.10 (x64)
Tesseract: 3.0.4 from Ubuntu repositories
GCC: gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12) 


Best,
vf
