Monitor var problems (maybe unicode?)

Francesco Sat, 13 Mar 2010 19:03:28 -0800

Hi all,
I'm working with Tesseract and the Tessnet2 .NET library, as mentioned
in my previous post:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/26b6f0b26c21062c
Tesseract is able to detect the text inside this sample image:
http://www.francescovannini.com/pub/importo.jpg
Calling


char* TessBaseAPI::TesseractToText(PAGE_RES* page_res)

after

PAGE_RES* TessBaseAPI::Recognize(BLOCK_LIST* block_list, ETEXT_DESC* monitor)

gives good output text. However, the tessnet2 .NET library uses the
monitor variable of Recognize to track progress and extract results,
like this:

List<tessnet2::Word^>^ tessnet2::Tesseract::BuildPage()
{
        List<Word ^>^ result = gcnew List<Word ^>();
        Word^ currentWord = nullptr;
        int j=0;
        int lineIndex=0;
        char unistr[8] = {};
        int confidenceTotal;
        int confidenceCount;

        for (int i=0; i<m_monitor->count; i=j)
        {
                EANYCODE_CHAR* ch = &m_monitor->text[i];

[...]

While debugging the code I noticed that page_res is filled correctly
but m_monitor is not. In some cases (as with the image linked above),
m_monitor contains only a single character (a tilde) with a confidence
of 100. So I suppose the problem lies in the way the monitor variable
is updated during image recognition. I also suspect that the problem
may come from character encoding, because the image above is
recognized as a string that starts with a Unicode character.
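
One way to test the encoding hypothesis is to look at the raw bytes of
the string returned by TesseractToText. The sketch below is standard
C++ only and does not touch the Tesseract API. It assumes the returned
text is UTF-8 (not stated above), and the helper names
utf8_sequence_length and split_utf8 are made up for illustration. It
only inspects lead bytes and does not validate continuation bytes.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes in the UTF-8 sequence introduced by `lead`, judged
// from the lead byte pattern only. Returns 0 for a continuation byte
// (10xxxxxx) or a byte that cannot start a sequence (0xF8 and above).
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Split a string assumed to be UTF-8 into one std::string per
// character. Bytes that do not fit the pattern, or sequences cut off
// by the end of the input, are emitted one byte at a time instead of
// being dropped.
std::vector<std::string> split_utf8(const std::string& text) {
    std::vector<std::string> chars;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t len =
            utf8_sequence_length(static_cast<unsigned char>(text[i]));
        if (len == 0 || i + len > text.size()) len = 1;
        chars.push_back(text.substr(i, len));
        i += len;
    }
    return chars;
}
```

If the first entry returned by split_utf8 is longer than one byte, the
text really does start with a multi-byte character. That would be
consistent with the encoding hypothesis, but it would not prove it.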
I have worked on this problem for weeks, but unfortunately I haven't
found a solution yet.
I think I need help from an experienced developer. I have also written
to the author of tessnet2 but got no answer (though the problem here is
in Tesseract, not in tessnet2).
Any help is appreciated!
Thank you.

FV
