Hi,

I am experimenting with Tesseract-OCR (3.02) to extract text (without 
training) from a pool of sequence images (a sample is given below). The 
issue I am currently facing is that I get almost correct results through 
the GUI front ends (http://sourceforge.net/projects/tesseract-gui/ on 
Ubuntu and http://vietocr.sourceforge.net/ on Windows), but only 50% 
accuracy when I use Tess4J to run the OCR programmatically. Does anyone 
know the reason behind this? I need to get better results through the 
program. 


<https://lh6.googleusercontent.com/-cNHxknh9iZc/U0LHgWnCz0I/AAAAAAAAARc/ujBaUVAHqPg/s1600/1.jpg>

*OCR results using GUI*

 CLUSTAL 2.0.2 multiple sequence alxgnment

907307 wvmqsscwrsascmmwnznmwcqLmm:u.wmswr::Qn'1vQz-rrm>rm'wn:L1'ns 60
PC7306 ———ELERSCYW'FSRSG!iNfl\DADNYCRLEDAELWVTSWEEQK!‘VQ1-D-IIGPVNTWMGLHDQ 
216

 

PC7307 DESWIOJVDGTDYRHNYICNWAVTQPDVMHGHELGGSECVEVQPDGRWIDDFCLQVYEWVC 120
P07306 uspwxwvuarm51‘crmwwzqmnwrcacLsssmczuatrnnsnwlnnvcgmavnwvc 276

PC7307 ex 122
PC7306 :— 277

*OCR results using Tess4J API (programmatic access)*

CLUSTAL 2.0.2 multxple sequence alignment

207307 .4 nnQGSCYWFSESGR7lWI\EAEKYC WINSVIEEQKFIVQHTMPFNTWIGLTD5 so

E07306 ———n.EnscYw1~'sI\ss1vm\D1\Dmc 
wAm,wvTsIvE=Q!<rvQx-n-IIL:1>vuTm4GLI-11:0 216

P0 7 3 0 7 .. AnNYIGWAVTQPDNWHGHELGGSIIDCVEVQPDGNIHIDDFC LQVY nwvc 12 0

P07306 ALJILJ » . nwxfiw 1. 276

P0730‘! ax 122
P137306 :— 277I
*Second question:* Do I need any training to improve the OCR results? All 
of my images use the Courier font (as shown in the image above). 
Moreover, I only need to extract alphabetic characters, no digits or 
special characters. Another important point: the letter strings never 
contain spaces. I tried disabling the dictionary because I do not need it, 
but it did not improve my results. Any tips or techniques that can help me 
improve my results programmatically would be highly appreciated. Thanks 



-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
