Hi crews,

I'm developing a Java wrapper for the new Tesseract. Recognition results 
will be retrieved using BaseAPI and returned as Java arrays. To do so, I 
need to know the number of particular items, for example words. I wasn't 
able to find any function for obtaining an item count, something like:

tesseract::GetItemsCount(RIL_WORDS);

What I currently do is iterate through the recognition results twice: the 
first pass counts the items, the second fills the Java array with data. 
That looks slow and ugly IMHO. The code is below:

  numLines = 0;  /* stays 0 if recognition fails */
  if (nat->api.Recognize(NULL) < 0) {
    fprintf(stderr, "Error during recognize!\n");
  } else {
    /* count the recognized lines
       (this is a bit ugly because I don't know how to obtain
        the number of lines without iterating) */
    it = nat->api.GetIterator();
    do {
      if (it->Empty(RIL_TEXTLINE)) continue;
      numLines++;
    } while (it->Next(RIL_TEXTLINE));
    delete it;
  }

ret = (jobjectArray)env->NewObjectArray(numLines,
         env->FindClass("java/lang/String"),
         env->NewStringUTF(""));

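  if (numLines) {
    it = nat->api.GetIterator();
    i = 0;
    do {
      /* skip empty lines here too, so that i never exceeds numLines */
      if (it->Empty(RIL_TEXTLINE)) continue;
      char* text = it->GetUTF8Text(RIL_TEXTLINE);
      env->SetObjectArrayElement(ret, i++, env->NewStringUTF(text));
      delete[] text;  /* GetUTF8Text returns a buffer the caller must free */
    } while (it->Next(RIL_TEXTLINE));
    delete it;
  }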

Is there any way to get the item count (lines, words, symbols) without 
iterating over the results twice?
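
The only alternative I can think of so far is a single pass that collects 
the text into a growable container, so that the count is simply the 
container's size afterwards. Below is a minimal sketch of that idea, not 
tested against Tesseract itself. FakeLineIterator and collectLines are 
hypothetical names I made up for illustration; the fake iterator only 
stands in for the real result iterator (whose methods take a level such as 
RIL_TEXTLINE) so that the snippet compiles on its own.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Tesseract result iterator, so the sketch
// compiles without Tesseract. The real methods take a level argument.
class FakeLineIterator {
 public:
  explicit FakeLineIterator(std::vector<std::string> lines)
      : lines_(std::move(lines)), pos_(0) {}

  // True when there is nothing to read at the current position.
  bool Empty() const { return pos_ >= lines_.size() || lines_[pos_].empty(); }

  // Returns a new[]-allocated copy that the caller must delete[],
  // mirroring the ownership rule of GetUTF8Text.
  char* GetUTF8Text() const {
    const std::string& s = lines_[pos_];
    char* buf = new char[s.size() + 1];
    std::memcpy(buf, s.c_str(), s.size() + 1);
    return buf;
  }

  // Advances to the next line; false when there are no more lines.
  bool Next() { return ++pos_ < lines_.size(); }

 private:
  std::vector<std::string> lines_;
  std::size_t pos_;
};

// Collects all non-empty lines in one pass. The item count is then
// just the size of the returned vector.
template <typename Iter>
std::vector<std::string> collectLines(Iter& it) {
  std::vector<std::string> out;
  do {
    if (it.Empty()) continue;  // in a do-while, continue jumps to the condition
    char* text = it.GetUTF8Text();
    if (text != nullptr) {
      out.emplace_back(text);  // copy into the vector
      delete[] text;           // then release the iterator's buffer
    }
  } while (it.Next());
  return out;
}
```

With the real iterator, the JNI part would then call NewObjectArray with 
lines.size() as the length and fill it with SetObjectArrayElement in a plain 
loop over the vector. This still copies every string once more, so it 
avoids the second iteration rather than the extra work.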

Many thanks in advance!
Best regards
Max
