Hi crews,
I'm developing a Java wrapper for the new Tesseract. Recognition results
will be retrieved using BaseAPI and returned as Java arrays. To do so, I
need to know the number of items of a particular kind, for example words. I
wasn't able to find any function for obtaining such a count, something like
a hypothetical tesseract::GetItemsCount(RIL_WORD);
What I currently do is iterate through the recognition results twice: the
first pass counts the items, the second one fills the Java array with data.
That looks slow and ugly IMHO. The code is below:
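numLines = 0;  /* stays 0 if recognition fails */
if (nat->api.Recognize(NULL) < 0) {
    fprintf(stderr, "Error during recognize!\n");
} else {
    /* First pass: count the recognized lines
       (this is a bit ugly because I don't know how to obtain
       the number of lines without iterating) */
    it = nat->api.GetIterator();
    if (it != NULL) {
        do {
            if (it->Empty(RIL_TEXTLINE)) continue;
            numLines++;
        } while (it->Next(RIL_TEXTLINE));
        delete it;
    }
}
ret = (jobjectArray)env->NewObjectArray(numLines,
        env->FindClass("java/lang/String"),
        env->NewStringUTF(""));
if (numLines) {
    /* Second pass: fill the Java array */
    it = nat->api.GetIterator();
    i = 0;
    do {
        /* skip empty lines here too, so the index matches the count */
        if (it->Empty(RIL_TEXTLINE)) continue;
        /* GetUTF8Text() returns a new[]-allocated string;
           free it once it has been copied into a Java string */
        char* text = it->GetUTF8Text(RIL_TEXTLINE);
        jstring jtext = env->NewStringUTF(text);
        delete[] text;
        env->SetObjectArrayElement(ret, i++, jtext);
        env->DeleteLocalRef(jtext);
    } while (it->Next(RIL_TEXTLINE));
    delete it;
}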
*Is there any way to get the item count (lines, words, symbols) without
iterating over the results twice?*
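The only workaround I can think of so far is to buffer the text in a growable container during a single pass and read the count from the container afterwards. Below is a minimal sketch of that idea, not a tested integration. CollectText, FakeIterator and Level are hypothetical names I made up for illustration; FakeIterator only mimics the three iterator calls used above (Empty, Next, GetUTF8Text) so that the sketch compiles without Tesseract or JNI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Placeholder for tesseract::PageIteratorLevel (assumption: only one level is needed here).
enum Level { LEVEL_TEXTLINE };

// Hypothetical stand-in for the result iterator, so the sketch builds without Tesseract.
// An empty string simulates an empty element at the current position.
class FakeIterator {
 public:
  explicit FakeIterator(std::vector<std::string> lines)
      : lines_(std::move(lines)), pos_(0) {}

  bool Empty(Level) const {
    return pos_ >= lines_.size() || lines_[pos_].empty();
  }

  // Advances to the next element; returns false once the end is reached.
  bool Next(Level) {
    if (pos_ < lines_.size()) ++pos_;
    return pos_ < lines_.size();
  }

  // Returns a new[]-allocated copy that the caller must delete[].
  char* GetUTF8Text(Level) const {
    assert(pos_ < lines_.size());
    const std::string& s = lines_[pos_];
    char* out = new char[s.size() + 1];
    std::memcpy(out, s.c_str(), s.size() + 1);
    return out;
  }

 private:
  std::vector<std::string> lines_;
  std::size_t pos_;
};

// Single pass: copy every non-empty element into a vector.
// The item count is then simply the size of the returned vector.
template <typename Iterator, typename LevelT>
std::vector<std::string> CollectText(Iterator& it, LevelT level) {
  std::vector<std::string> items;
  do {
    if (it.Empty(level)) continue;  // in a do-while, continue jumps to the condition
    char* text = it.GetUTF8Text(level);
    if (text != nullptr) {
      items.emplace_back(text);  // copy into the vector
      delete[] text;             // release the iterator's buffer
    }
  } while (it.Next(level));
  return items;
}
```

With the real iterator, the idea would be to call CollectText(*it, RIL_TEXTLINE), pass items.size() to NewObjectArray, and then fill the array from the vector with SetObjectArrayElement. That still loops twice, but the second loop runs over plain strings in memory rather than over the recognition results, at the cost of one extra copy of the text.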
Many thanks in advance!
Best regards
Max