I am using the tesseract-ocr-android library (tess-two).
See https://github.com/rmtheis/tess-two

Now I want to add a new API.

(1) Current output API: getUTF8Text()

getUTF8Text() outputs the recognized text with the words separated by a space. However, I find that a word can itself be a space (or empty). So we cannot actually recover word indexes from the getUTF8Text() output. Without word indexes, we cannot match the results of the wordBoundingBoxes and wordConfidences APIs to each word.

For example, for this line:

line 8: “ @- 7- - @ - - @ ”

How can you tell how many words are in the line? In the hOCR text, we can see:

<span class='ocr_line' id='line_8' title="bbox 0 771 2876 834">
  <span class='ocrx_word' id='word_125' title="bbox 0 805 70 834"></span>
  <span class='ocrx_word' id='word_126' title="bbox 184 773 318 831">@-</span>
  <span class='ocrx_word' id='word_127' title="bbox 387 820 389 822"> </span>
  <span class='ocrx_word' id='word_128' title="bbox 452 771 550 816">7-</span>
  <span class='ocrx_word' id='word_129' title="bbox 583 823 585 825"> </span>
  <span class='ocrx_word' id='word_130' title="bbox 616 808 622 815">-</span>
  <span class='ocrx_word' id='word_131' title="bbox 758 819 812 832"></span>
  <span class='ocrx_word' id='word_132' title="bbox 839 821 841 823"> </span>
  <span class='ocrx_word' id='word_133' title="bbox 865 818 888 831"></span>
  <span class='ocrx_word' id='word_134' title="bbox 923 802 1016 830"></span>
  <span class='ocrx_word' id='word_135' title="bbox 1214 816 1216 819">@</span>
  <span …..

Some words are spaces, some are empty.

(2) I want to add a new API that outputs the result as a string in the following format:

- - - - - - - - - - - - - - - -
line, 1, left, top, right, bottom, word1, word2, word3, word4, word5, meanConfidenceOfThisLine \n
line, 2, left, top, right, bottom, word1, word2, meanConfidenceOfThisLine \n
- - - - - - - - - - - - - - - -

I am using tess-two as the base.
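To make the intended record format concrete, here is a minimal Java sketch that builds one such line. The class name LineRecordSketch and the method formatLine are hypothetical and not part of tess-two. The bounding box and the first four words are taken from line 8 of the hOCR excerpt; the confidence value 50 is made up for illustration. Empty and space-only words are kept as their own fields so that word indexes stay aligned with the bounding boxes and confidences.

```java
import java.util.Arrays;
import java.util.List;

public class LineRecordSketch {

    // Builds "line, N, left, top, right, bottom, word1, ..., meanConfidence\n".
    // Every word becomes its own field, even if it is empty or a single space.
    static String formatLine(int lineNumber, int left, int top, int right, int bottom,
                             List<String> words, int meanConfidence) {
        StringBuilder sb = new StringBuilder();
        sb.append("line, ").append(lineNumber)
          .append(", ").append(left)
          .append(", ").append(top)
          .append(", ").append(right)
          .append(", ").append(bottom);
        for (String word : words) {
            sb.append(", ").append(word);
        }
        sb.append(", ").append(meanConfidence).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // First four words of line 8: empty, "@-", a single space, "7-".
        List<String> words = Arrays.asList("", "@-", " ", "7-");
        // The confidence 50 is an illustrative value, not real OCR output.
        System.out.print(formatLine(8, 0, 771, 2876, 834, words, 50));
    }
}
```

Note that a plain comma separator becomes ambiguous if a recognized word itself contains a comma, so a real implementation would probably need some escaping or a different delimiter.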
I define a new API in /jni/com_googlecode_tesseract_android/src/api/baseapi.cpp:

char* TessBaseAPI::GetLineWordConfidenceText(int page_number)

(The definition is long, so it is omitted here.) I declare the method in baseapi.h.

Then in /jni/com_googlecode_tesseract_android/tessbaseapi.cpp, I add:

- - - - - - - - - - - - - - - -
jstring Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetLineWordConfidenceText(JNIEnv *env, jobject thiz, jint page_number) {
  native_data_t *nat = get_native_data(env, thiz);
  char *text = nat->api.GetLineWordConfidenceText((int) page_number);
  jstring result = env->NewStringUTF(text);
  free(text);
  return result;
}
- - - - - - - - - - - - - - - -

Then in /src/com.googlecode.tesseract.android/TessBaseAPI.java, I add:

- - - - - - - - - - - - - - - -
public String getLineWordConfidenceText() {
    // Trim because the text will have extra line breaks at the end
    String text = nativeGetLineWordConfidenceText(0);
    return text.trim();
}
- - - - - - - - - - - - - - - -

Now in my Android app (based on SimpleAndroidOCR from https://github.com/GautamGupta/Simple-Android-OCR), I add:

- - - - - - - - - - - - - - - -
String str_LineWordConfidenceText = baseApi.getLineWordConfidenceText();
- - - - - - - - - - - - - - - -

I run the app.
LogCat shows the following error:

- - - - - - - - - - - - - - - -
11-18 10:36:34.334: E/AndroidRuntime(12775): FATAL EXCEPTION: main
11-18 10:36:34.334: E/AndroidRuntime(12775): java.lang.UnsatisfiedLinkError: nativeGetLineWordConfidenceText
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.googlecode.tesseract.android.TessBaseAPI.nativeGetLineWordConfidenceText(Native Method)
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.googlecode.tesseract.android.TessBaseAPI.getLineWordConfidenceText(TessBaseAPI.java:378)
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.datumdroid.android.ocr.simple.SimpleAndroidOCRActivity.onPhotoTaken(SimpleAndroidOCRActivity.java:360)
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.datumdroid.android.ocr.simple.SimpleAndroidOCRActivity.onActivityResult(SimpleAndroidOCRActivity.java:168)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.Activity.dispatchActivityResult(Activity.java:3997)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.ActivityThread.deliverResults(ActivityThread.java:2905)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.ActivityThread.handleSendResult(ActivityThread.java:2961)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.ActivityThread.access$2000(ActivityThread.java:132)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1068)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.os.Handler.dispatchMessage(Handler.java:99)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.os.Looper.loop(Looper.java:150)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.ActivityThread.main(ActivityThread.java:4263)
11-18 10:36:34.334: E/AndroidRuntime(12775): at java.lang.reflect.Method.invokeNative(Native Method)
11-18 10:36:34.334: E/AndroidRuntime(12775): at java.lang.reflect.Method.invoke(Method.java:507)
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
11-18 10:36:34.334: E/AndroidRuntime(12775): at dalvik.system.NativeStart.main(Native Method)
- - - - - - - - - - - - - - - -

How can I correct it? (Sorry, I am new to Android and Java JNI.)

Thanks a lot in advance!
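
For reference, an UnsatisfiedLinkError on a "(Native Method)" frame generally means the runtime saw the Java native declaration but could not bind it to a native function. The sketch below reproduces that condition on a desktop JVM. The class MissingNativeSketch is hypothetical and not part of tess-two; it simply declares a native method of the same name and never loads a library that implements it.

```java
public class MissingNativeSketch {

    // Declared native, but no library providing an implementation is ever loaded.
    // The name mirrors the method in the post purely for illustration.
    private static native String nativeGetLineWordConfidenceText(int pageNumber);

    // Returns true if calling the unbound native method raises UnsatisfiedLinkError.
    static boolean callFailsWithUnsatisfiedLinkError() {
        try {
            nativeGetLineWordConfidenceText(0);
            return false;
        } catch (UnsatisfiedLinkError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callFailsWithUnsatisfiedLinkError());
    }
}
```

If the same condition applies here, some things that may be worth checking (these are assumptions, not confirmed by anything in the post): whether the native library was rebuilt after the change and the app actually loads the rebuilt .so, whether the new Java_... function is compiled with C linkage like the other JNI functions in tessbaseapi.cpp, and whether the parameter list of the Java native declaration matches the C++ function.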

