Have you followed the suggestions given at:

https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
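
In case it helps, here is a minimal plain-Java sketch of the global Otsu thresholding idea behind the binarization step discussed in the quoted messages. It is only an illustration under stated assumptions, not Leptonica's or tess-two's implementation (Binarize.otsuAdaptiveThreshold works on local tiles). The class OtsuSketch and its methods are hypothetical names, and the input is assumed to be an array of 8-bit grayscale values.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of tess-two or Leptonica.
class OtsuSketch {

    // Returns the threshold (0..255) that maximizes the between-class variance.
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) {
            hist[v & 0xFF]++;
        }
        long total = gray.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (long) i * hist[i];
        }
        long sumBg = 0;   // sum of intensities at or below the candidate threshold
        long wBg = 0;     // number of pixels at or below the candidate threshold
        double bestVar = -1.0;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            wBg += hist[t];
            sumBg += (long) t * hist[t];
            if (wBg == 0) {
                continue;          // no dark pixels yet
            }
            long wFg = total - wBg;
            if (wFg == 0) {
                break;             // every pixel is already in the dark class
            }
            double meanBg = (double) sumBg / wBg;
            double meanFg = (double) (sumAll - sumBg) / wFg;
            double between = (double) wBg * wFg * (meanBg - meanFg) * (meanBg - meanFg);
            if (between > bestVar) {
                bestVar = between;
                best = t;
            }
        }
        return best;
    }

    // Maps pixels at or below the threshold to 0 (ink) and the rest to 255 (paper).
    static int[] binarize(int[] gray, int threshold) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = (gray[i] & 0xFF) <= threshold ? 0 : 255;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] pixels = {10, 12, 11, 200, 205, 210};  // made-up sample values
        int t = otsuThreshold(pixels);
        System.out.println("threshold = " + t);
        System.out.println(Arrays.toString(binarize(pixels, t)));
    }
}
```

A locally adaptive variant applies the same calculation per tile instead of once for the whole image, which tends to cope better with the uneven lighting of phone photos.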

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Mon, Mar 9, 2015 at 10:26 AM, Daniel <danieluc...@gmail.com> wrote:

> Hey Pierre, I'm trying to accomplish the same thing as you for my thesis.
> Could you tell me whether you managed to preprocess the images well enough
> to read the documents? So far I've applied Unsharp Mask and Threshold as you
> did, and I even fixed the skew angle by following
> http://felix.abecassis.me/2011/10/opencv-rotation-deskewing/ but I still
> haven't gotten acceptable results.
>
> What other filters can I use to make the image more readable for Tesseract?
>
> Attached is the image I'm testing (cropped and rotated), the two outcomes
> from the filters and the text resulting from Tesseract.
>
>
> On Thursday, July 17, 2014 at 7:36:50 AM UTC-5, Pierre-Henri DAUVERGNE
> wrote:
>>
>> Hello,
>> I am relatively new to Android development and I am working on an OCR app
>> that takes a picture of a document and extracts the text from it (the
>> cameras could be from relatively old phones). During my research, I found
>> that Tesseract was the best API to use, so here I am :)
>>
>> I understand that the image needs to be as good as possible, which means:
>> - Binarization (having a picture in black and white)
>> - Without border (I'm using another library to crop the photo and process
>> only the part I want)
>> - Deskewing
>> - Training
>>
>> Other parameters that I guess would influence the result are scaling and
>> recognizing one character at a time (I haven't looked into that much
>> yet).
>>
>> But I can't find any documentation, or anyone having the same issue as I
>> have:
>> I added "eng.traineddata" to my project, but I don't feel like it's
>> being used at all. I just added the file I found online and haven't
>> done anything else, and Tesseract seems to have trouble reading
>> characters that appear to be fine (well, at least not that inaccurate). I
>> can't find any guide or tutorial online on "how to train tesseract for
>> android". Could anyone help? I understand that it will take time, but
>> I'm willing to do it on my own.
>>
>> The other thing is deskewing. Same story: there are no guides or tutorials
>> online, and the Skew class doesn't seem to work properly, as it always
>> returns 0.0. Could anyone help? ^^
>>
>> Thank you for your help. I hope I'm clear enough about my issues.
>>
>> I attached one of the photos I'm taking and the cropped+binarized
>> result, as well as the returned string (sorry, it's not in English, but you
>> can see it's not really good :x)
>>
>>
>> Do you know how I could improve my picture preprocessing? As you can
>> see, there's still a lot of noise around the characters.
>>
>>
>>
>> This is what I'm doing so far:
>>
>> // Locally adaptive background normalization, in preparation for binarizing
>> photo = WriteFile.writeBitmap(
>>         AdaptiveMap.backgroundNormMorph(ReadFile.readBitmap(photo)));
>> // Locally adaptive Otsu binarization
>> photo = WriteFile.writeBitmap(
>>         Binarize.otsuAdaptiveThreshold(ReadFile.readBitmap(photo)));
>> // Unsharp masking (halfwidth 1, fraction 0.5); I'm not sure about these parameters
>> photo = WriteFile.writeBitmap(
>>         Enhance.unsharpMasking(ReadFile.readBitmap(photo), 1, 0.5f));
>>
>> ocr_engine.setVariable("textord_max_noise_size", "3");
>> // No trailing space in the variable name, or the lookup will likely fail
>> ocr_engine.setVariable("textord_heavy_nr", "1");
>> ocr_engine.setImage(photo);
>> // setPageSegMode() expects a PageSegMode constant; an OEM_* engine mode
>> // (e.g. OEM_TESSERACT_CUBE_COMBINED) belongs in the init() call instead
>> ocr_engine.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);
>> String recognizedText = ocr_engine.getUTF8Text();
>>
>>
>>
>> Thank you for any help
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/c4e390d2-cb38-4b08-b713-39650fb45c34%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.
>

