Hi all,
I'm working with Tesseract and the tessnet2 .NET library, as mentioned in
my previous post:
http://groups.google.com/group/tesseract-ocr/browse_thread/thread/26b6f0b26c21062c
Tesseract is able to detect the text in this sample image:
http://www.francescovannini.com/pub/importo.jpg
Calling
char* TessBaseAPI::TesseractToText(PAGE_RES* page_res)
after
PAGE_RES* TessBaseAPI::Recognize(BLOCK_LIST* block_list, ETEXT_DESC* monitor)
produces good output text. However, the tessnet2 .NET library uses the
monitor argument of Recognize both to track progress and to extract the
results, like this:
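List<tessnet2::Word^>^ tessnet2::Tesseract::BuildPage()
{
    List<Word ^>^ result = gcnew List<Word ^>();
    Word^ currentWord = nullptr;
    int j=0;
    int lineIndex=0;
    char unistr[8] = {};   // buffer for one (possibly multi-byte) character
    int confidenceTotal;
    int confidenceCount;
    // Walk the per-character entries of the ETEXT_DESC monitor; the
    // elided loop body presumably advances j past the current word.
    for (int i=0; i<m_monitor->count; i=j)
    {
        EANYCODE_CHAR* ch = &m_monitor->text[i];
        [...]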
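To reproduce the word-building logic outside .NET, here is a minimal standard C++ sketch of the same loop shape (`for (i = 0; i < count; i = j)`, with an inner loop advancing `j`). It is not tessnet2's actual code: `MonitorChar`, `WordResult` and `GroupWords` are hypothetical names, the fields are only loosely modeled on `EANYCODE_CHAR`, and the rule "an entry with `blanks > 0` starts a new word" is an assumption. Note that the accumulators are initialized here, unlike `confidenceTotal`/`confidenceCount` in the excerpt above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one monitor entry; the real EANYCODE_CHAR
// layout in your Tesseract version may differ.
struct MonitorChar {
    std::uint16_t char_code;  // recognized character code (low byte used below)
    std::uint8_t confidence;  // per-character confidence as stored by the engine
    std::int8_t blanks;       // number of spaces preceding this character
};

struct WordResult {
    std::string text;
    int confidence_total = 0;  // initialized, unlike the excerpt above
    int confidence_count = 0;
};

// Groups consecutive entries into words. A new word starts at every entry
// whose blanks field is non-zero (assumed rule). The outer loop jumps to j,
// which the inner loop has advanced past the current word.
std::vector<WordResult> GroupWords(const std::vector<MonitorChar>& text) {
    std::vector<WordResult> words;
    std::size_t j = 0;
    for (std::size_t i = 0; i < text.size(); i = j) {
        WordResult word;
        j = i;
        do {  // do-while guarantees j > i, so the outer loop always progresses
            word.text.push_back(static_cast<char>(text[j].char_code & 0xFF));
            word.confidence_total += text[j].confidence;
            ++word.confidence_count;
            ++j;
        } while (j < text.size() && text[j].blanks == 0);
        words.push_back(word);
    }
    return words;
}
```

Feeding it the entries the monitor actually holds (for example, copied out in a debugger) makes it easy to see whether the single tilde comes from the monitor contents or from the grouping step.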
While debugging, I noticed that page_res is filled correctly but
m_monitor is not. In some cases (such as the image linked above),
m_monitor contains only a single character (a tilde) with a confidence
of 100. So I suspect the problem lies in the way the monitor is updated
during recognition. I also suspect the problem may come from character
encoding, because the text recognized in the image above starts with a
Unicode (non-ASCII) character.
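One way to test the encoding hypothesis is to dump the raw codes stored in the monitor and check whether they form valid UTF-8. The sketch below assumes, without confirmation from Tesseract's sources, that a non-ASCII character is spread over several consecutive entries as UTF-8 bytes (the `unistr[8]` buffer in BuildPage() may hint at this). `Utf8SequenceLength` and `SplitUtf8` are hypothetical helpers, not part of Tesseract or tessnet2; the check catches bad lead bytes, bad continuation bytes and truncated sequences, but does not reject overlong encodings.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Number of bytes in a UTF-8 sequence, judged from its lead byte;
// returns 0 for a byte that cannot start a sequence.
int Utf8SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // continuation byte or invalid lead
}

// Splits a byte stream (e.g. the low bytes of consecutive monitor entries)
// into UTF-8 characters. Returns std::nullopt on a bad lead byte, a bad
// continuation byte, or a sequence cut off at the end of the stream.
std::optional<std::vector<std::string>> SplitUtf8(const std::vector<unsigned char>& bytes) {
    std::vector<std::string> chars;
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::size_t len = static_cast<std::size_t>(Utf8SequenceLength(bytes[i]));
        if (len == 0 || i + len > bytes.size()) return std::nullopt;
        for (std::size_t k = 1; k < len; ++k) {
            if ((bytes[i + k] & 0xC0) != 0x80) return std::nullopt;  // not 10xxxxxx
        }
        chars.emplace_back(bytes.begin() + i, bytes.begin() + i + len);
        i += len;
    }
    return chars;
}
```

If the bytes copied from the monitor for this image fail the check while the TesseractToText output is fine, that would point towards the encoding step rather than the recognition itself.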
I've been working on this problem for weeks, but unfortunately I haven't
found a solution yet.
I think I need help from an experienced developer. I've also written to
the author of tessnet2 but received no answer (although the problem here
is not tessnet2 but Tesseract).
Any help is appreciated!
Thank you.
FV