I am using the tesseract-ocr-android library.

See https://github.com/rmtheis/tess-two

Now I want to add a new API method to it.

(1) Current output API: getUTF8Text()

getUTF8Text() returns the recognized text with the words separated by a
space. However, I find that a word itself can be a space (or empty). So we
cannot reliably recover word indexes from the getUTF8Text() output, and
without word indexes we cannot match the results of the wordBoundingBoxes
and wordConfidences APIs to each word.


 For example, for this line:

line 8: 

“ @- 7- - @ - - @ ”

How can you tell how many words are in the line?

In the hOCR text, we can see:

<span class='ocr_line' id='line_8' title="bbox 0 771 2876 834">

<span class='ocrx_word' id='word_125' title="bbox 0 805 70 834"></span> 

<span class='ocrx_word' id='word_126' title="bbox 184 773 318 831">@-</span> 

<span class='ocrx_word' id='word_127' title="bbox 387 820 389 822"> </span> 

<span class='ocrx_word' id='word_128' title="bbox 452 771 550 816">7-</span> 

<span class='ocrx_word' id='word_129' title="bbox 583 823 585 825"> </span> 

<span class='ocrx_word' id='word_130' title="bbox 616 808 622 815">-</span> 

<span class='ocrx_word' id='word_131' title="bbox 758 819 812 832"></span> 

<span class='ocrx_word' id='word_132' title="bbox 839 821 841 823"> </span> 

<span class='ocrx_word' id='word_133' title="bbox 865 818 888 831"></span> 

<span class='ocrx_word' id='word_134' title="bbox 923 802 1016 830"></span> 

<span class='ocrx_word' id='word_135' title="bbox 1214 816 1216 819">@</span> 

<span …..


 Some words are spaces, some are empty. 
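To make the ambiguity concrete, here is a minimal Java sketch. It assumes, as described above, that the words of a line are joined with one space each; the class and method names are made up for illustration and are not part of tess-two. The word list is the first six words of line 8 from the hOCR output.

```java
import java.util.Arrays;
import java.util.List;

final class SpaceSplitDemo {
    // Assumption: words of a line are joined with exactly one space,
    // which is how the getUTF8Text() output appears to be built.
    static String joinWithSpaces(List<String> words) {
        return String.join(" ", words);
    }

    // Naive attempt to recover the words from the joined text.
    static String[] splitOnSpaces(String text) {
        return text.split(" ");
    }

    public static void main(String[] args) {
        // word_125 .. word_130 from the hOCR output: empty, "@-", " ", "7-", " ", "-"
        List<String> words = Arrays.asList("", "@-", " ", "7-", " ", "-");
        String text = joinWithSpaces(words);
        String[] tokens = splitOnSpaces(text);

        System.out.println("words in hOCR:      " + words.size());
        System.out.println("tokens after split: " + tokens.length);
    }
}
```

The two counts differ, so token positions from the split cannot be used as indexes into the bounding-box or confidence arrays.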


(2) I want to add a new API method that returns the result as a single
string, in the following format:

- - - - - - - - - - - - - - - - 

line, 1, left, top, right, bottom, word1, word2, word3, word4, word5, meanConfidenceOfThisLine \n

line, 2, left, top, right, bottom, word1, word2, meanConfidenceOfThisLine \n

- - - - - - - - - - - - - - - - 
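As a rough illustration of that format, the Java sketch below builds one such record. It is not the C++ implementation I added to baseapi.cpp; the class name, method name, and parameters are hypothetical, and the confidence value used in the usage note is invented.

```java
import java.util.List;
import java.util.StringJoiner;

final class LineRecordFormatter {
    // Builds "line, N, left, top, right, bottom, word1, ..., meanConfidence\n".
    // Empty or single-space words are kept as their own fields so that the
    // field position still identifies the word index.
    static String formatLine(int lineNumber,
                             int left, int top, int right, int bottom,
                             List<String> words,
                             int meanConfidence) {
        StringJoiner fields = new StringJoiner(", ");
        fields.add("line");
        fields.add(Integer.toString(lineNumber));
        fields.add(Integer.toString(left));
        fields.add(Integer.toString(top));
        fields.add(Integer.toString(right));
        fields.add(Integer.toString(bottom));
        for (String word : words) {
            fields.add(word);
        }
        fields.add(Integer.toString(meanConfidence));
        return fields.toString() + "\n";
    }
}
```

For example, formatLine(8, 0, 771, 2876, 834, Arrays.asList("@-", "7-"), 80) uses the bounding box of line_8 from the hOCR output above and a made-up confidence of 80. One caveat of this format is that a word containing ", " would be indistinguishable from two fields.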

I am using tess-two as the base.


I defined a new method in 
/jni/com_googlecode_tesseract_android/src/api/baseapi.cpp:

char* TessBaseAPI::GetLineWordConfidenceText(int page_number) 

(The definition is long, so it is omitted here.)

I declared the method in baseapi.h.


Then in /jni/com_googlecode_tesseract_android/tessbaseapi.cpp, I added:

- - - - - - - - - - - - - - - - 

jstring
Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetLineWordConfidenceText(
    JNIEnv *env, jobject thiz, jint page_number) {

  native_data_t *nat = get_native_data(env, thiz);

  char *text = nat->api.GetLineWordConfidenceText((int) page_number);

  jstring result = env->NewStringUTF(text);

  free(text);

  return result;
}

- - - - - - - - - - - - - - - - 
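To double-check the spelling of the function name, here is a small Java sketch of the JNI naming convention as I understand it: "Java_", then the fully qualified class name with "." replaced by "_", then "_" and the method name, with any literal "_" escaped as "_1". The helper class is hypothetical and deliberately simplified: it does not handle non-ASCII characters, ";" or "[", or overloaded native methods.

```java
final class JniNames {
    // Escapes one name component for use in a JNI symbol.
    // Simplification: only '_' and the package separator '.' are handled.
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '_') {
                out.append("_1");   // a literal underscore is escaped
            } else if (c == '.') {
                out.append('_');    // package separator becomes an underscore
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Expected native symbol for a non-overloaded native method.
    static String symbolFor(String fullyQualifiedClass, String method) {
        return "Java_" + mangle(fullyQualifiedClass) + "_" + mangle(method);
    }

    public static void main(String[] args) {
        System.out.println(symbolFor(
                "com.googlecode.tesseract.android.TessBaseAPI",
                "nativeGetLineWordConfidenceText"));
    }
}
```

This only checks the spelling of the name. It does not tell me whether the rebuilt native library actually exports that symbol.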


Then in /src/com.googlecode.tesseract.android/TessBaseAPI.java, I added:

- - - - - - - - - - - - - - - - 

public String getLineWordConfidenceText() {
    // Trim because the text will have extra line breaks at the end
    String text = nativeGetLineWordConfidenceText(0);

    return text.trim();
}

- - - - - - - - - - - - - - - - 


Now in my Android app (based on SimpleAndroidOCR from 
https://github.com/GautamGupta/Simple-Android-OCR), I added:

- - - - - - - - - - - - - - - - 

String str_LineWordConfidenceText = baseApi.getLineWordConfidenceText();

- - - - - - - - - - - - - - - - 


When I run the app, LogCat shows the following error:

- - - - - - - - - - - - - - - - 

11-18 10:36:34.334: E/AndroidRuntime(12775): FATAL EXCEPTION: main
11-18 10:36:34.334: E/AndroidRuntime(12775): java.lang.UnsatisfiedLinkError: nativeGetLineWordConfidenceText
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.googlecode.tesseract.android.TessBaseAPI.nativeGetLineWordConfidenceText(Native Method)
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.googlecode.tesseract.android.TessBaseAPI.getLineWordConfidenceText(TessBaseAPI.java:378)
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.datumdroid.android.ocr.simple.SimpleAndroidOCRActivity.onPhotoTaken(SimpleAndroidOCRActivity.java:360)
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.datumdroid.android.ocr.simple.SimpleAndroidOCRActivity.onActivityResult(SimpleAndroidOCRActivity.java:168)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.Activity.dispatchActivityResult(Activity.java:3997)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.ActivityThread.deliverResults(ActivityThread.java:2905)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.ActivityThread.handleSendResult(ActivityThread.java:2961)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.ActivityThread.access$2000(ActivityThread.java:132)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1068)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.os.Handler.dispatchMessage(Handler.java:99)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.os.Looper.loop(Looper.java:150)
11-18 10:36:34.334: E/AndroidRuntime(12775): at android.app.ActivityThread.main(ActivityThread.java:4263)
11-18 10:36:34.334: E/AndroidRuntime(12775): at java.lang.reflect.Method.invokeNative(Native Method)
11-18 10:36:34.334: E/AndroidRuntime(12775): at java.lang.reflect.Method.invoke(Method.java:507)
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
11-18 10:36:34.334: E/AndroidRuntime(12775): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
11-18 10:36:34.334: E/AndroidRuntime(12775): at dalvik.system.NativeStart.main(Native Method)

- - - - - - - - - - - - - - - - 


How can I fix this? (Sorry, I am new to Android, Java, and JNI.)

Thanks a lot in advance!
