[tesseract-ocr] Re: Why I am getting different results through GUI and programmatically?

Quan Nguyen Mon, 07 Apr 2014 17:52:26 -0700

It depends on the image. The ImproveQuality page lists some types of 
preprocessing you can employ.


The following pages contain many useful filter routines in Java:

http://www.java2s.com/Code/Java/Advanced-Graphics/Image.htm
http://www.jhlabs.com/ip/filters/
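
As a rough illustration of the kind of preprocessing meant here, the sketch below uses only the JDK (java.awt) to upscale an image, convert it to grayscale, and binarize it with a fixed threshold. The class name OcrPreprocess, the method names, and the threshold of 128 are assumptions made for the example; they are not part of Tesseract or Tess4J, and the right scale factor and threshold depend on your images.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper class; not part of Tesseract or Tess4J.
final class OcrPreprocess {

    /** Scales the image by the given factor (bicubic) and converts it to 8-bit grayscale. */
    static BufferedImage scaleToGray(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        // Drawing into a gray image performs the color-to-gray conversion.
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return out;
    }

    /** Fixed-threshold binarization: gray levels below the threshold become black, the rest white. */
    static BufferedImage binarize(BufferedImage gray, int threshold) {
        int w = gray.getWidth();
        int h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int level = gray.getRaster().getSample(x, y, 0); // 0..255
                out.setRGB(x, y, level < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }

    /** Convenience wrapper: scale, convert to gray, then binarize. */
    static BufferedImage prepare(BufferedImage src, double factor, int threshold) {
        return binarize(scaleToGray(src, factor), threshold);
    }
}
```

To approximate 300 DPI, the factor would be 300 divided by the source resolution (for example, roughly 3 for a 96 DPI screenshot). The resulting BufferedImage can then be passed to the OCR call in place of the original image.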

On Monday, April 7, 2014 7:06:55 PM UTC-5, Ahmad Chan wrote:
>
> Thanks, Quan, for the reply. Yes, I have noticed that with rescaling I get 
> comparatively better results, but they are still not as good as what I 
> get through the Tesseract-OCR GUI. I would like to know what sort of 
> preprocessing I should carry out before passing the image to 
> Tesseract-OCR. The wiki guide mentions that Tesseract does some basic 
> image processing on its own, but it is not clear from the guide what sort 
> of preprocessing it performs. I want to know whether Tesseract-OCR converts 
> a color image to black and white and does a little denoising or not. 
> Moreover, it would be helpful if someone could share links to Java code for 
> image preprocessing. Thanks a lot.
>
> On Monday, April 7, 2014 8:37:10 PM UTC-3, Quan Nguyen wrote:
>>
>> It's likely the GUI programs have added some preprocessing on the image. 
>> If you ran it directly with Tesseract executable, you would get results 
>> similar to that of Tess4J.
>>
>> Rescaling your image to 300DPI will produce better output.
>>
>> https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
>>
>> On Monday, April 7, 2014 11:05:02 AM UTC-5, Ahmad Chan wrote:
>>>
>>>   Hi,
>>>
>>> I am doing some experiments with Tesseract-OCR (3.02) to extract text 
>>> (without training) from a pool of sequence images (a sample is given 
>>> below). The issue I am currently facing is that I get almost correct 
>>> results through a GUI (http://sourceforge.net/projects/tesseract-gui/ 
>>> on Ubuntu and http://vietocr.sourceforge.net/ on Windows) but only about 
>>> 50% accuracy when I use Tess4J to run the OCR programmatically. Does 
>>> anyone know the reason behind this? I need to get better results through 
>>> the program. 
>>>
>>>
>>>
>>> <https://lh6.googleusercontent.com/-cNHxknh9iZc/U0LHgWnCz0I/AAAAAAAAARc/ujBaUVAHqPg/s1600/1.jpg>
>>>
>>>
>>> *OCR results using GUI*
>>>
>>>  CLUSTAL 2.0.2 multiple sequence alxgnment
>>>
>>> 907307 wvmqsscwrsascmmwnznmwcqLmm:u.wmswr::Qn'1vQz-rrm>rm'wn:L1'ns 60
>>> PC7306 
>>> ———ELERSCYW'FSRSG!iNﬂ\DADNYCRLEDAELWVTSWEEQK!‘VQ1-D-IIGPVNTWMGLHDQ 216
>>>
>>>  
>>>
>>> PC7307 DESWIOJVDGTDYRHNYICNWAVTQPDVMHGHELGGSECVEVQPDGRWIDDFCLQVYEWVC 120
>>> P07306 uspwxwvuarm51‘crmwwzqmnwrcacLsssmczuatrnnsnwlnnvcgmavnwvc 276
>>>
>>> PC7307 ex 122
>>> PC7306 :— 277
>>>
>>> *OCR results using Tess4J API (programmatic access)*
>>>
>>> CLUSTAL 2.0.2 multxple sequence alignment
>>>
>>> 207307 .4 nnQGSCYWFSESGR7lWI\EAEKYC WINSVIEEQKFIVQHTMPFNTWIGLTD5 so
>>>
>>> E07306 ———n.EnscYw1~'sI\ss1vm\D1\Dmc 
>>> wAm,wvTsIvE=Q!<rvQx-n-IIL:1>vuTm4GLI-11:0 216
>>>
>>> P0 7 3 0 7 .. AnNYIGWAVTQPDNWHGHELGGSIIDCVEVQPDGNIHIDDFC LQVY nwvc 12 0
>>>
>>> P07306 ALJILJ » . nwxﬁw 1. 276
>>>
>>> P0730‘! ax 122
>>> P137306 :— 277I
>>>
>>> *Second question:* Do I need any training to improve the OCR results? 
>>> All of my images use the Courier font (as shown in the image attached 
>>> above). Moreover, I only need to extract letters, not digits or special 
>>> characters. Another important point: the letter strings never contain 
>>> spaces. I tried disabling the dictionary because I do not need it, but 
>>> that did not improve my results. Any tips or techniques that could help 
>>> me improve my results programmatically would be highly appreciated. 
>>> Thanks 
>>>
>>>
>>>
>>>
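
On the requirement above that only letters are needed, one simple option is to filter the OCR output after recognition. The sketch below is plain Java and independent of Tess4J; the class name LettersOnly and the method keepLetters are made up for this example. It does not improve recognition itself, it only discards characters that cannot be valid. Tesseract also has a tessedit_char_whitelist variable that restricts the recognized character set, which may be worth trying as an alternative.

```java
// Hypothetical helper; a post-processing step, not a Tesseract feature.
final class LettersOnly {

    /** Returns the input with everything except ASCII letters (A-Z, a-z) removed. */
    static String keepLetters(String ocrText) {
        StringBuilder sb = new StringBuilder(ocrText.length());
        for (int i = 0; i < ocrText.length(); i++) {
            char c = ocrText.charAt(i);
            boolean isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (isLetter) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Note that applying this to a whole line also strips the digits from the sequence identifiers and the trailing position counts, so in practice you would probably split each line into fields first and filter only the sequence field.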

