Have you followed the suggestions given on https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality?
ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti (devotional songs) @ http://bhajans.ramparivar.com

On Mon, Mar 9, 2015 at 10:26 AM, Daniel <danieluc...@gmail.com> wrote:
> Hey Pierre, I'm trying to accomplish the same thing as you for my thesis.
> Could you tell me whether you managed to preprocess the images well enough
> to read the documents? So far I've applied Unsharp Mask and Threshold as you
> did, and I even corrected the skew angle by following this link
> http://felix.abecassis.me/2011/10/opencv-rotation-deskewing/, but I still
> haven't gotten acceptable results.
>
> What other filters can I use to make the image more readable for Tesseract?
>
> Attached are the image I'm testing (cropped and rotated), the outputs of
> the two filters, and the text Tesseract returned.
>
> On Thursday, July 17, 2014 at 7:36:50 AM UTC-5, Pierre-Henri DAUVERGNE
> wrote:
>>
>> Hello,
>> I am relatively new to Android development and I am working on an OCR app
>> that takes a picture of a document and extracts its text (the photos may
>> come from relatively old phone cameras). During my research I found that
>> Tesseract was the best API to use, so here I am :)
>>
>> I understand that getting the best possible results requires:
>> - Binarization (converting the picture to black and white)
>> - Border removal (I'm using another library to crop the photo and process
>>   only the part I want)
>> - Deskewing
>> - Training
>>
>> Other factors that I guess would matter are scaling and recognizing one
>> character at a time (I haven't looked into that much yet).
>>
>> But I can't find any documentation or anyone with the same issue as me:
>> I added "eng.traineddata" to my project, but I don't feel like it's being
>> used at all. I just added the file I found online and haven't done
>> anything else, and Tesseract seems to have trouble reading characters that
>> look fine (or at least not that bad).
>> I can't find any guide or tutorial online on "how to train Tesseract for
>> Android". Could anyone help? I understand it will take time, but I'm
>> willing to do it on my own.
>>
>> The other issue is deskewing. Same story: no guides or tutorials online,
>> and the Skew class doesn't seem to work properly, as it always returns
>> 0.0. Could anyone help? ^^
>>
>> Thank you for your help; I hope I've explained my issues clearly.
>>
>> I attached a picture of the photos I'm taking, the cropped and binarized
>> result, and the returned string (sorry, it's not English, but you can see
>> it's not very good :x)
>>
>> Do you know how I could improve my image preprocessing? As you can see,
>> there's still a lot of noise around the characters.
>>
>> This is what I'm doing so far:
>>
>> import com.googlecode.leptonica.android.AdaptiveMap;
>> import com.googlecode.leptonica.android.Binarize;
>> import com.googlecode.leptonica.android.Convert;
>> import com.googlecode.leptonica.android.Enhance;
>> import com.googlecode.leptonica.android.Pix;
>> import com.googlecode.leptonica.android.ReadFile;
>> import com.googlecode.tesseract.android.TessBaseAPI;
>>
>> Pix pix = Convert.convertTo8(ReadFile.readBitmap(photo)); // 8 bpp grayscale; the Otsu step requires 8 bpp
>> pix = AdaptiveMap.backgroundNormMorph(pix); // locally adaptive; normalizes uneven lighting before binarizing
>> pix = Enhance.unsharpMasking(pix, 1, 0.5f); // sharpen while still grayscale (Leptonica rejects 1 bpp input); not sure about these parameters
>> pix = Binarize.otsuAdaptiveThreshold(pix);  // locally adaptive binarization; output is 1 bpp
>> // Each call returns a new Pix: recycle() the previous one to free native memory.
>>
>> // OEM_* constants select the engine and belong in init(), e.g.
>> // ocr_engine.init(dataPath, "eng", TessBaseAPI.OEM_TESSERACT_CUBE_COMBINED); // dataPath: folder containing tessdata/
>> ocr_engine.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO); // PSM_AUTO is one example page segmentation mode
>> ocr_engine.setVariable("textord_max_noise_size", "3");
>> ocr_engine.setVariable("textord_heavy_nr", "1"); // no trailing space in the variable name
>> ocr_engine.setImage(pix);
>> String recognizedText = ocr_engine.getUTF8Text();
>>
>> Thank you for any help
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/c4e390d2-cb38-4b08-b713-39650fb45c34%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.
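Both Daniel and Pierre binarize with Otsu's method. To show what that step computes, here is a plain-Java sketch of global Otsu thresholding on 8-bit grayscale values (0..255): it picks the threshold that maximizes the between-class variance of the dark and light pixels. OtsuSketch is a hypothetical helper for illustration only; it is not tess-two's Binarize.otsuAdaptiveThreshold, which computes a threshold per tile rather than a single one for the whole page.

```java
/** Hypothetical helper (illustration only): global Otsu thresholding on 8-bit grayscale pixels. */
final class OtsuSketch {
    private OtsuSketch() {}

    /** Returns the threshold t maximizing between-class variance; pixels <= t form the dark class. */
    static int otsuThreshold(int[] gray) {
        long[] hist = new long[256];          // assumes every value is in 0..255
        for (int v : gray) {
            hist[v]++;
        }
        long total = gray.length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (double) i * hist[i];
        }
        double sumDark = 0;   // sum of intensities in the dark class
        long countDark = 0;   // number of pixels in the dark class
        double bestVariance = -1;
        int bestT = 0;
        for (int t = 0; t < 256; t++) {
            countDark += hist[t];
            if (countDark == 0) {
                continue;                     // no dark pixels yet
            }
            long countLight = total - countDark;
            if (countLight == 0) {
                break;                        // every pixel is already dark
            }
            sumDark += (double) t * hist[t];
            double meanDark = sumDark / countDark;
            double meanLight = (sumAll - sumDark) / countLight;
            double diff = meanDark - meanLight;
            double betweenVariance = (double) countDark * countLight * diff * diff;
            if (betweenVariance > bestVariance) {
                bestVariance = betweenVariance;
                bestT = t;
            }
        }
        return bestT;
    }

    /** Binarizes the image: true marks ink (pixels at or below the Otsu threshold). */
    static boolean[] binarize(int[] gray) {
        int t = otsuThreshold(gray);
        boolean[] ink = new boolean[gray.length];
        for (int i = 0; i < gray.length; i++) {
            ink[i] = gray[i] <= t;
        }
        return ink;
    }
}
```

For example, OtsuSketch.binarize on the values {10, 10, 200, 200} marks the two 10s as ink. A single global threshold struggles with the uneven lighting typical of phone photos, which is why normalizing the background first (the backgroundNormMorph step) or thresholding per tile tends to help.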
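Pierre's Skew class always returned 0.0, and Daniel deskewed with OpenCV instead. To illustrate what a skew estimator does, here is a plain-Java sketch of the projection-profile method: shear the binarized ink by each candidate angle, count ink per row, and keep the angle whose row profile has the sharpest peaks (text lines collapse into a few dense rows when they are horizontal). SkewSketch and its parameters are hypothetical and are not tess-two's Skew API. One thing worth checking: Leptonica's skew finder expects a 1 bpp (binarized) image, so Skew should probably be given the binarized Pix. Whether that explains the 0.0 is only a guess.

```java
/** Hypothetical helper (illustration only): projection-profile skew estimation on a binary image. */
final class SkewSketch {
    private SkewSketch() {}

    /**
     * ink[y][x] is true for black pixels (e.g. the output of a binarization step).
     * Tries angles from -maxDeg to +maxDeg in steps of stepDeg and returns the angle in degrees
     * whose sheared row profile is sharpest. Positive means text lines descend to the right
     * (the image y axis points down).
     */
    static double estimateSkewDegrees(boolean[][] ink, double maxDeg, double stepDeg) {
        int height = ink.length;
        int width = height == 0 ? 0 : ink[0].length;
        int steps = (int) Math.round(maxDeg / stepDeg);
        double bestAngle = 0;
        double bestScore = -1;
        for (int k = -steps; k <= steps; k++) {
            double angle = k * stepDeg;
            double slope = Math.tan(Math.toRadians(angle));
            // Shearing by -slope moves a pixel up or down by at most |slope| * width rows.
            int margin = (int) Math.ceil(Math.abs(slope) * width) + 1;
            long[] rowCounts = new long[height + 2 * margin];
            for (int y = 0; y < height; y++) {
                for (int x = 0; x < width; x++) {
                    if (ink[y][x]) {
                        int shiftedRow = (int) Math.round(y - x * slope) + margin;
                        rowCounts[shiftedRow]++;
                    }
                }
            }
            // Sum of squared differences between neighbouring rows: largest when lines are horizontal.
            double score = 0;
            for (int r = 1; r < rowCounts.length; r++) {
                double d = rowCounts[r] - rowCounts[r - 1];
                score += d * d;
            }
            if (score > bestScore) {
                bestScore = score;
                bestAngle = angle;
            }
        }
        return bestAngle;
    }
}
```

A shear is only a close approximation of a rotation for small angles, so the sweep is kept to a few degrees (for example, estimateSkewDegrees(ink, 5.0, 0.5)). Once the angle is known, the image is rotated by the opposite angle, for example with the OpenCV approach from Daniel's link.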