[tesseract-ocr] Re: Why I am getting different results through GUI and programmatically?

Ahmad Chan Mon, 07 Apr 2014 17:07:24 -0700

Thanks, Quan, for the reply. Yes, I have noticed that with rescaling I get 
comparatively better results, but they are still not as good as what I 
get through the Tesseract-OCR GUI. I would like to know what sort of 
preprocessing I should carry out before passing the image to 
Tesseract-OCR. The wiki guide mentions that Tesseract does some basic 
image processing on its own, but it is not clear from the guide what sort of 
preprocessing it performs. I want to know whether Tesseract-OCR converts a 
color image to black and white and does a little denoising or not. Moreover, 
it would be helpful if someone could share links to Java code for image 
preprocessing. Thanks a lot.
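
A minimal sketch of the kind of preprocessing asked about here, in plain
Java using only java.awt and javax.imageio. It is not what the GUI front
ends or Tesseract do internally; the class name OcrPreprocess, the scale
factor, and the fixed threshold of 128 are assumptions chosen for
illustration. It upscales the image (a screenshot at roughly 96 DPI would
need a factor of about 3 to approach the 300 DPI suggested in the quoted
reply), converts it to grayscale, and binarizes it.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper for illustration; not part of Tesseract or Tess4J. */
final class OcrPreprocess {

    /** Scales the image by {@code factor} (bicubic) and converts it to 8-bit grayscale. */
    static BufferedImage toScaledGray(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null); // color conversion happens while drawing
        g.dispose();
        return out;
    }

    /** Fixed-threshold binarization; expects a TYPE_BYTE_GRAY image. */
    static BufferedImage binarize(BufferedImage gray, int threshold) {
        int w = gray.getWidth();
        int h = gray.getHeight();
        WritableRaster in = gray.getRaster();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int level = in.getSample(x, y, 0); // 0 (black) .. 255 (white)
                out.setRGB(x, y, level >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }

    /** Usage: OcrPreprocess input-image output.png */
    public static void main(String[] args) throws IOException {
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("Unsupported image format: " + args[0]);
        }
        // Assumed values: factor 3.0 and threshold 128 need tuning per image set.
        BufferedImage bw = binarize(toScaledGray(src, 3.0), 128);
        ImageIO.write(bw, "png", new File(args[1]));
    }
}
```

The resulting BufferedImage could then be handed to Tess4J, for example
through its doOCR overload that accepts a BufferedImage, instead of the
original file. A fixed threshold is the simplest choice; an adaptive
method may work better on unevenly lit images.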


On Monday, April 7, 2014 8:37:10 PM UTC-3, Quan Nguyen wrote:
>
> It's likely the GUI programs apply some preprocessing to the image. 
> If you ran it directly with the Tesseract executable, you would get results 
> similar to those of Tess4J.
>
> Rescaling your image to 300 DPI will produce better output.
>
> https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
>
> On Monday, April 7, 2014 11:05:02 AM UTC-5, Ahmad Chan wrote:
>>
>>   Hi,
>>
>> I am doing some experiments with Tesseract-OCR (3.02) to extract text 
>> (without training) from a pool of sequence images (a sample is given 
>> below). The issue I am currently facing is that I get almost 
>> correct results through the GUIs (http://sourceforge.net/projects/tesseract-gui/ 
>> on Ubuntu and http://vietocr.sourceforge.net/ on Windows) but only about 
>> 50% accuracy when I use Tess4J to run the OCR programmatically. Does anyone 
>> know the reason behind this? I need to get better results through the program. 
>>
>>
>>
>> <https://lh6.googleusercontent.com/-cNHxknh9iZc/U0LHgWnCz0I/AAAAAAAAARc/ujBaUVAHqPg/s1600/1.jpg>
>>
>> *OCR results using GUI*
>>
>>  CLUSTAL 2.0.2 multiple sequence alxgnment
>>
>> 907307 wvmqsscwrsascmmwnznmwcqLmm:u.wmswr::Qn'1vQz-rrm>rm'wn:L1'ns 60
>> PC7306 ———ELERSCYW'FSRSG!iNﬂ\DADNYCRLEDAELWVTSWEEQK!‘VQ1-D-IIGPVNTWMGLHDQ 216
>>
>>  
>>
>> PC7307 DESWIOJVDGTDYRHNYICNWAVTQPDVMHGHELGGSECVEVQPDGRWIDDFCLQVYEWVC 120
>> P07306 uspwxwvuarm51‘crmwwzqmnwrcacLsssmczuatrnnsnwlnnvcgmavnwvc 276
>>
>> PC7307 ex 122
>> PC7306 :— 277
>>
>> *OCR results using Tess4J API (programmatic access)*
>>
>> CLUSTAL 2.0.2 multxple sequence alignment
>>
>> 207307 .4 nnQGSCYWFSESGR7lWI\EAEKYC WINSVIEEQKFIVQHTMPFNTWIGLTD5 so
>>
>> E07306 ———n.EnscYw1~'sI\ss1vm\D1\Dmc 
>> wAm,wvTsIvE=Q!<rvQx-n-IIL:1>vuTm4GLI-11:0 216
>>
>> P0 7 3 0 7 .. AnNYIGWAVTQPDNWHGHELGGSIIDCVEVQPDGNIHIDDFC LQVY nwvc 12 0
>>
>> P07306 ALJILJ » . nwxﬁw 1. 276
>>
>> P0730‘! ax 122
>> P137306 :— 277I
>>
>> *Second question:* Do I need any training to improve the OCR results? The 
>> images I have all use the Courier font (as displayed in the attached image 
>> above). Moreover, I only need to extract the letters, no digits or special 
>> characters. Another important point: the letter strings always come without 
>> spaces. I tried disabling the dictionary because I do not need it, but it 
>> did not help to improve my results. Any tip or technique that can help me 
>> improve my results programmatically will be highly appreciated. 
>> Thanks 
>>
>>
>>
>>
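
On the second question quoted above: since only letters are wanted, one
option that does not depend on any Tesseract setting is to filter the
recognized text afterwards. The sketch below is an assumption-laden
illustration, not a Tess4J feature; the class name LettersOnly is
hypothetical. It only drops unwanted characters and cannot repair letters
that were misrecognized as other letters, so it complements better
preprocessing rather than replacing it. Tesseract also has a
tessedit_char_whitelist configuration variable that restricts the
recognized character set; whether it helps on these images is untested
here.

```java
/** Hypothetical post-filter for OCR output; not part of Tesseract or Tess4J. */
final class LettersOnly {

    /** Returns the input with everything except ASCII letters removed. */
    static String lettersOnly(String ocrText) {
        StringBuilder sb = new StringBuilder(ocrText.length());
        for (int i = 0; i < ocrText.length(); i++) {
            char c = ocrText.charAt(i);
            boolean isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (isLetter) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Digits, punctuation and spaces are dropped; letters keep their case.
        System.out.println(lettersOnly("DESWI0J-VDG 120")); // prints DESWIJVDG
    }
}
```

Note that applying this to a whole alignment line would also strip the
digits from the sequence identifier, so the identifier column would need
to be split off first if it matters.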
