It's likely that the GUI programs apply some preprocessing to the image 
before recognition. If you ran the Tesseract executable directly on the 
image, you would get results similar to those from Tess4J.

Rescaling your image to 300 DPI should produce better output.

https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
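
As a rough illustration of the rescaling step, here is a minimal sketch 
that uses only the standard Java imaging classes. The class name OcrRescale, 
the method rescaleToDpi, and the 96 DPI source resolution in the usage note 
are assumptions made for the example, not something taken from your setup. 
Converting to grayscale is also my own choice here, not a requirement.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class OcrRescale {

    /**
     * Returns a grayscale copy of src, scaled by targetDpi / sourceDpi.
     * sourceDpi is whatever resolution you assume the input was rendered at.
     */
    public static BufferedImage rescaleToDpi(BufferedImage src,
                                             double sourceDpi,
                                             double targetDpi) {
        if (sourceDpi <= 0 || targetDpi <= 0) {
            throw new IllegalArgumentException("DPI values must be positive");
        }
        double factor = targetDpi / sourceDpi;
        int width = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int height = Math.max(1, (int) Math.round(src.getHeight() * factor));

        // Draw the source into a larger grayscale canvas.
        BufferedImage scaled =
                new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = scaled.createGraphics();
        try {
            // Bicubic interpolation keeps character edges smoother than
            // the default nearest-neighbor scaling.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return scaled;
    }
}
```

For a screenshot assumed to be at 96 DPI, OcrRescale.rescaleToDpi(image, 96, 300) 
enlarges it by a factor of about 3.1. The returned BufferedImage can then be 
passed to Tess4J's doOCR(BufferedImage) in place of the original image.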

On Monday, April 7, 2014 11:05:02 AM UTC-5, Ahmad Chan wrote:
>
>   Hi,
>
> I am experimenting with Tesseract-OCR (3.02) to extract text (without 
> training) from a pool of sequence images (a sample is given 
> below). The issue I am currently facing is that I get almost 
> correct results through the GUIs (http://sourceforge.net/projects/tesseract-gui/ 
> on Ubuntu and 
> http://vietocr.sourceforge.net/ on Windows) but only about 50% accuracy when I 
> use Tess4J to run the OCR programmatically. Does anyone know the reason 
> behind this? I need to get better results through the program. 
>
>
>
> <https://lh6.googleusercontent.com/-cNHxknh9iZc/U0LHgWnCz0I/AAAAAAAAARc/ujBaUVAHqPg/s1600/1.jpg>
>
> *OCR results using GUI*
>
>  CLUSTAL 2.0.2 multiple sequence alxgnment
>
> 907307 wvmqsscwrsascmmwnznmwcqLmm:u.wmswr::Qn'1vQz-rrm>rm'wn:L1'ns 60
> PC7306 ———ELERSCYW'FSRSG!iNfl\DADNYCRLEDAELWVTSWEEQK!‘VQ1-D-IIGPVNTWMGLHDQ 216
>
>  
>
> PC7307 DESWIOJVDGTDYRHNYICNWAVTQPDVMHGHELGGSECVEVQPDGRWIDDFCLQVYEWVC 120
> P07306 uspwxwvuarm51‘crmwwzqmnwrcacLsssmczuatrnnsnwlnnvcgmavnwvc 276
>
> PC7307 ex 122
> PC7306 :— 277
>
> *OCR results using the Tess4J API (programmatic access)*
>
> CLUSTAL 2.0.2 multxple sequence alignment
>
> 207307 .4 nnQGSCYWFSESGR7lWI\EAEKYC WINSVIEEQKFIVQHTMPFNTWIGLTD5 so
>
> E07306 ———n.EnscYw1~'sI\ss1vm\D1\Dmc 
> wAm,wvTsIvE=Q!<rvQx-n-IIL:1>vuTm4GLI-11:0 216
>
> P0 7 3 0 7 .. AnNYIGWAVTQPDNWHGHELGGSIIDCVEVQPDGNIHIDDFC LQVY nwvc 12 0
>
> P07306 ALJILJ » . nwxfiw 1. 276
>
> P0730‘! ax 122
> P137306 :— 277I
>
> *Second question:* Do I need any training to improve the OCR results? All 
> of my images use the Courier font (as shown in the image above). 
> Moreover, I only need to extract letters, not digits or special 
> characters. Another important point: the letter strings never contain 
> spaces. I tried disabling the dictionary because I do not need it, but 
> that did not improve my results. Any tips or techniques that could help 
> me improve my results programmatically would be highly appreciated. 
> Thanks 
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en