I'm trying to make an Android application that does real-time OCR and then 
puts the recognized text on the corresponding text areas.
The libraries I use are:
    OpenCV4Android 3.1.0
    tess-two (https://github.com/rmtheis/tess-two)
The target language is English (I use eng.traineddata).


I've been testing it on my tablet (ASUS MeMO Pad 7). 
The text-detection part is good enough, but the OCR output is little better 
than gibberish and the frame rate drops.
I want to know if there is a way to improve the OCR result and the frame rate.

I searched several articles for ways to improve the OCR result, but 
none of them worked.
I'm very confident that real-time OCR is possible, because 2 years ago 
I saw this video <https://www.youtube.com/watch?v=4IKoxb_pbLo>
I've been studying his code for over 6 months and finally made it this far.
But there is a striking difference between the OCR result of my app and his 
work.
I will keep trying to contact the author.
But he is now starting a new company in Japan, and it seems that he will 
not have any time to deal with my problem.
Therefore I am posting this question here, hoping someone can give me some help.

The following is my code to detect text and do OCR:
import java.util.ArrayList;
import java.util.List;

import android.graphics.Bitmap;
import android.util.Log;

import com.googlecode.tesseract.android.TessBaseAPI;

import org.opencv.android.Utils;
import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.MatOfPoint2f;
import org.opencv.core.Point;
import org.opencv.core.Rect;
import org.opencv.core.Scalar;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

public class TextDetector implements Filter {
    Bitmap bmp = null;
    TessBaseAPI baseApi = new TessBaseAPI();

    // Runs Tesseract on one cropped region and returns the recognized text.
    // Note: init() is called on every invocation.
    public String ocr(Mat tmp) {
        baseApi.init("/mnt/sdcard/tesseract/", "eng");
        baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_LINE);
        bmp = Bitmap.createBitmap(tmp.cols(), tmp.rows(), Bitmap.Config.ARGB_8888);
        Utils.matToBitmap(tmp, bmp);
        baseApi.setImage(bmp);
        return baseApi.getUTF8Text();
    }

    @Override
    public void apply(Mat src, Mat dst) {
        if (dst != src) {
            src.copyTo(dst);
        }
        // This text detection method is translated from
        // http://stackoverflow.com/questions/23506105/extracting-text-opencv
        Mat img_gray, img_sobel, img_threshold, element;
        // Turn the source image to gray scale
        img_gray = new Mat();
        Imgproc.cvtColor(src, img_gray, Imgproc.COLOR_RGB2GRAY);
        // Sobel edge detection (horizontal gradient)
        img_sobel = new Mat();
        Imgproc.Sobel(img_gray, img_sobel, CvType.CV_8U, 1, 0, 3, 1, 0, Core.BORDER_DEFAULT);
        // Thresholding (Otsu)
        img_threshold = new Mat();
        Imgproc.threshold(img_sobel, img_threshold, 0, 255,
                Imgproc.THRESH_OTSU + Imgproc.THRESH_BINARY);
        // Removing noise: close gaps with a wide, flat kernel
        element = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(17, 3));
        Imgproc.morphologyEx(img_threshold, img_threshold, Imgproc.MORPH_CLOSE, element);
        // Extract contours (mode 0 = RETR_EXTERNAL, method 1 = CHAIN_APPROX_NONE)
        List<MatOfPoint> contours = new ArrayList<MatOfPoint>();
        Mat hierarchy = new Mat();
        Imgproc.findContours(img_threshold, contours, hierarchy,
                Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_NONE);
        List<MatOfPoint> contours_poly = new ArrayList<MatOfPoint>(contours.size());
        // For OpenCV4Android only: you have to turn the MatOfPoint into a
        // MatOfPoint2f to run the approxPolyDP function, then turn it back
        // into a MatOfPoint.
        contours_poly.addAll(contours);
        MatOfPoint2f mMOP2f1, mMOP2f2;
        mMOP2f1 = new MatOfPoint2f();
        mMOP2f2 = new MatOfPoint2f();

        for (int i = 0; i < contours.size(); i++) {
            if (contours.get(i).toList().size() > 100) {
                contours.get(i).convertTo(mMOP2f1, CvType.CV_32FC2);
                Imgproc.approxPolyDP(mMOP2f1, mMOP2f2, 3, true);
                mMOP2f2.convertTo(contours_poly.get(i), CvType.CV_32S);
                Rect appRect = Imgproc.boundingRect(contours_poly.get(i));
                // Keep only regions that are wider than they are tall
                if (appRect.width > appRect.height) {
                    Imgproc.rectangle(dst, new Point(appRect.x, appRect.y),
                            new Point(appRect.x + appRect.width, appRect.y + appRect.height),
                            new Scalar(255, 0, 0));
                    Log.i(null, "cut Roi");
                    Mat roiMat = new Mat(src, appRect);
                    Log.i(null, "OCR and puttext");
                    Imgproc.putText(dst, ocr(roiMat), new Point(appRect.x, appRect.y),
                            Core.FONT_HERSHEY_SIMPLEX, 1, new Scalar(0, 255, 0));
                    Log.i(null, "Complete!");
                }
            }
        }
    }
}
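
One thing that may contribute to the frame-rate drop, offered as an untested 
assumption rather than a diagnosis: ocr() above calls baseApi.init(...) for 
every detected region in every frame. The sketch below shows the 
initialize-once-and-reuse pattern in isolation. OcrEngine and ReusableOcr are 
hypothetical names, and OcrEngine is only a stand-in for TessBaseAPI so that 
the example is self-contained; it is not part of tess-two.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the OCR engine (TessBaseAPI in the code above). */
interface OcrEngine {
    String recognize(int[] pixels);
}

/** Creates the engine on first use and reuses it for every later call. */
final class ReusableOcr {
    private final Supplier<OcrEngine> factory;
    private OcrEngine engine;      // stays null until the first recognize() call
    private int initCount = 0;     // how many times the factory has been invoked

    ReusableOcr(Supplier<OcrEngine> factory) {
        this.factory = factory;
    }

    String recognize(int[] pixels) {
        if (engine == null) {
            // The expensive setup step runs once, not once per region.
            engine = factory.get();
            initCount++;
        }
        return engine.recognize(pixels);
    }

    int initCount() {
        return initCount;
    }
}
```

In the real class, the equivalent change would be to move the init and 
setPageSegMode calls out of ocr() into a one-time setup step. Whether that 
alone restores the frame rate is something I have not measured.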



