On Mon, Mar 26, 2012 at 5:13 AM, Max Pole <[email protected]> wrote:
> Hi crew,
>
> I'm developing a Java wrapper for the new tesseract. Recognition results
> will be retrieved using BaseAPI and returned as Java arrays. In order to do
> so, I need to know the number of particular items, for example words. I
> wasn't able to find any function for obtaining an item count, like
>
>   tesseract::GetItemsCount(RIL_WORDS);
>
> What I currently do is iterate through the recognition results twice: the
> first pass counts the items, the second fills the Java array with data.
> That looks slow and ugly IMHO. Below is the code:
>
>   numLines = 0;
>   if (nat->api.Recognize(NULL) < 0) {
>     fprintf(stderr, "Error during recognize!\n");
>   } else {
>     /* count number of recognized lines
>        (this is a bit ugly because I don't know how to obtain
>        the number of lines without iterating) */
>     it = nat->api.GetIterator();
>     do {
>       if (it->Empty(RIL_TEXTLINE)) continue;
>       numLines++;
>     } while (it->Next(RIL_TEXTLINE));
>     delete it;
>   }
>
>   ret = (jobjectArray)env->NewObjectArray(numLines,
>       env->FindClass("java/lang/String"),
>       env->NewStringUTF(""));
>
>   if (numLines) {
>     it = nat->api.GetIterator();
>     i = 0;
>     do {
>       /* skip empty lines here too, so i stays below numLines */
>       if (it->Empty(RIL_TEXTLINE)) continue;
>       char *text = it->GetUTF8Text(RIL_TEXTLINE);
>       env->SetObjectArrayElement(ret, i++, env->NewStringUTF(text));
>       delete[] text;  /* GetUTF8Text() returns a new[] buffer */
>     } while (it->Next(RIL_TEXTLINE));
>     delete it;
>   }
>
> Is there any possibility to get the item count (lines, words, symbols)
> without iterating over the results twice?
Looking through the source, this is pretty much what tesseract does itself. See, for example, TessBaseAPI::GetComponentImages(), TessBaseAPI::AllWordConfidences(), etc. While the list classes defined in elst.h have a length() method, the problem in this case is that you have to count only a particular type of member in the list.
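A minimal sketch of one way to avoid the second pass (not something proposed in the thread): walk the lines once, copy each non-empty line into a std::vector<std::string>, and take the count from size() before allocating the Java array. To keep it self-contained and testable without tesseract, FakeLineIterator is a hypothetical stand-in that mimics only the iterator calls used above, with the level argument dropped; with the real API these would be it->Empty(RIL_TEXTLINE), it->GetUTF8Text(RIL_TEXTLINE) and it->Next(RIL_TEXTLINE).

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tesseract result iterator fixed at the
// RIL_TEXTLINE level. It mimics only the calls used in the thread:
// Empty(), GetUTF8Text() (returns a new[] buffer the caller must
// delete[]) and Next(), which returns false once the last item is passed.
class FakeLineIterator {
 public:
  explicit FakeLineIterator(std::vector<std::string> lines)
      : lines_(std::move(lines)) {}

  bool Empty() const {
    return pos_ >= lines_.size() || lines_[pos_].empty();
  }

  char* GetUTF8Text() const {
    if (Empty()) return nullptr;
    const std::string& s = lines_[pos_];
    char* out = new char[s.size() + 1];
    std::memcpy(out, s.c_str(), s.size() + 1);  // copies the trailing '\0'
    return out;
  }

  bool Next() {
    ++pos_;
    return pos_ < lines_.size();
  }

 private:
  std::vector<std::string> lines_;
  std::size_t pos_ = 0;
};

// Single pass over the results: every non-empty line is copied into a
// std::vector, so the item count is simply the vector's size() and no
// second iteration is needed to fill the output array.
std::vector<std::string> CollectLines(FakeLineIterator* it) {
  std::vector<std::string> lines;
  if (it == nullptr) return lines;  // nothing was recognized
  do {
    if (it->Empty()) continue;  // same filter as the counting pass
    char* text = it->GetUTF8Text();
    if (text != nullptr) {
      lines.emplace_back(text);
      delete[] text;  // caller owns the buffer
    }
  } while (it->Next());
  return lines;
}
```

With JNI, the collected vector could then be turned into the array with a single env->NewObjectArray((jsize)lines.size(), ...) call followed by env->SetObjectArrayElement(ret, i, env->NewStringUTF(lines[i].c_str())) for each index. The trade-off is an extra copy of the text in exchange for a single iterator pass; the count itself still comes from iterating, as described above.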

