On Mon, Mar 26, 2012 at 5:13 AM, Max Pole <[email protected]> wrote:
> Hi crews,
>
> I'm developing a Java wrapper for the new tesseract. Recognition results
> will be retrieved using BaseAPI and returned as Java arrays. In order to do
> so, I need to know the number of particular items, for example words. I
> wasn't able to find any function for obtaining items count like that
>
> tesseract::GetItemsCount(RIL_WORDS);
>
> What I currently do is iterate through the recognition results twice: the
> first pass counts the items, the second one fills the Java array with
> data. That looks slow and ugly IMHO. The code is below:
>
> if (nat->api.Recognize(NULL) < 0) {
>     fprintf(stderr, "Error during recognize!\n");
>   } else {
>     /* count the number of recognized lines
>       (this is a bit ugly because I don't know how to obtain
>        the number of lines without iterating) */
>     it = nat->api.GetIterator();
>     numLines = 0;
>     do {
>       if (it->Empty(RIL_TEXTLINE)) continue;
>       numLines++;
>     } while (it->Next(RIL_TEXTLINE));
>     delete it;
>   }
>
> ret = (jobjectArray)env->NewObjectArray(numLines,
>          env->FindClass("java/lang/String"),
>          env->NewStringUTF(""));
>
>   if (numLines) {
>     it = nat->api.GetIterator();
>     i = 0;
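>     do {
>       /* skip empty lines, same filter as the counting pass */
>       if (it->Empty(RIL_TEXTLINE)) continue;
>       /* GetUTF8Text() returns a buffer the caller must delete[] */
>       char *text = it->GetUTF8Text(RIL_TEXTLINE);
>       env->SetObjectArrayElement(ret, i++, env->NewStringUTF(text));
>       delete[] text;
>     } while(it->Next(RIL_TEXTLINE));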
>     delete it;
>   }
>
>
> Is there any possibility to get items count (lines, words, symbols) without
> iterating the results twice?

Looking through the source, this is pretty much what tesseract does
itself. See, for example, TessBaseAPI::GetComponentImages() and
TessBaseAPI::AllWordConfidences(). The list classes defined in elst.h
do have a length() method, but that doesn't help here, because you need
to count only the members of one particular type in the list.
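
If the double pass bothers you, one option is to iterate once into a
growable container and size the Java array from that afterwards. Below
is a minimal sketch of the idea, not tested against tesseract itself.
FakeLineIterator and CollectLines are hypothetical names of mine;
the fake only mimics the Empty()/GetUTF8Text()/Next() calls from your
code so the snippet compiles on its own. With the real iterator you
would pass RIL_TEXTLINE to each call.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the few iterator calls used in the loop.
// It is NOT the real tesseract class; the real calls take a level
// argument such as RIL_TEXTLINE.
struct FakeLineIterator {
  std::vector<std::string> lines;  // an empty string models an empty line
  std::size_t pos = 0;

  bool Empty() const { return lines.empty() || lines[pos].empty(); }

  // Like GetUTF8Text(), hands back a new[] buffer the caller must delete[].
  char* GetUTF8Text() const {
    char* buf = new char[lines[pos].size() + 1];
    std::strcpy(buf, lines[pos].c_str());
    return buf;
  }

  // Advances to the next line; returns false when there is none.
  bool Next() {
    if (pos + 1 < lines.size()) {
      ++pos;
      return true;
    }
    return false;
  }
};

// Single pass: collect every non-empty line into a vector. The vector's
// size() is the item count, so no separate counting loop is needed.
template <typename Iterator>
std::vector<std::string> CollectLines(Iterator& it) {
  std::vector<std::string> out;
  do {
    if (it.Empty()) continue;       // jumps to the while condition
    char* text = it.GetUTF8Text();
    out.emplace_back(text);         // copy into a std::string
    delete[] text;                  // release the iterator's buffer
  } while (it.Next());
  return out;
}
```

After that, the vector's size() can be passed to env->NewObjectArray()
and the array filled with NewStringUTF() in a plain loop over the
vector. The cost is one extra copy of each string, in exchange for
walking the recognition results only once.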
