Thanks, Quan, for the reply. Yes, I have noticed that with rescaling I get comparatively better results, but they are still not as good as what I get through the Tesseract-OCR GUI. I would like to know what sort of preprocessing I should carry out before passing the image to Tesseract-OCR. The wiki guide mentions that Tesseract does some basic image processing on its own, but it is not clear from the guide what that preprocessing is. In particular, I want to know whether Tesseract-OCR converts a color image to black and white itself, and whether it does anything beyond that. Moreover, it would be helpful if someone could share links to Java code for image preprocessing. Thanks a lot.
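To be concrete, this is roughly the kind of preprocessing I have in mind. It is only a minimal sketch using the standard java.awt classes, and I do not know whether it matches what the GUI tools or Tesseract do internally. The class name OcrPreprocess, the scale factor of 3 (my guess at getting a screen capture close to 300 DPI) and the fixed threshold of 128 are all my own assumptions.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class OcrPreprocess {

    /** Enlarge the image by the given factor using bicubic interpolation. */
    public static BufferedImage scale(BufferedImage src, double factor) {
        int w = (int) Math.round(src.getWidth() * factor);
        int h = (int) Math.round(src.getHeight() * factor);
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }

    /**
     * Convert to black and white: compute a luma value per pixel and
     * compare it with a fixed threshold (0..255). Pixels darker than the
     * threshold become black, all others white.
     */
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luma weights.
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                dst.setRGB(x, y, luma < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return dst;
    }
}
```

The idea would be to load the image with ImageIO.read, call binarize(scale(img, 3.0), 128), and pass the resulting BufferedImage to Tess4J instead of the original file. Whether a fixed threshold is good enough for these images is something I have not verified.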
On Monday, April 7, 2014 8:37:10 PM UTC-3, Quan Nguyen wrote:
> It's likely the GUI programs have added some preprocessing on the image.
> If you ran it directly with the Tesseract executable, you would get results
> similar to those of Tess4J.
>
> Rescaling your image to 300 DPI will produce better output.
>
> https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
>
> On Monday, April 7, 2014 11:05:02 AM UTC-5, Ahmad Chan wrote:
>> Hi,
>>
>> I am doing some experiments with Tesseract-OCR (3.02) to extract text
>> (without training) from a pool of sequence images (a sample is given
>> below). The issue I am currently facing is that I get almost
>> correct results through the GUIs (http://sourceforge.net/projects/tesseract-gui/
>> on Ubuntu and http://vietocr.sourceforge.net/ on Windows) but only about 50%
>> accuracy when I use Tess4J to run the OCR programmatically. Does anyone know
>> the reason behind this? I have to get better results through the program.
>>
>> <https://lh6.googleusercontent.com/-cNHxknh9iZc/U0LHgWnCz0I/AAAAAAAAARc/ujBaUVAHqPg/s1600/1.jpg>
>>
>> *OCR results using GUI*
>>
>> CLUSTAL 2.0.2 multiple sequence alxgnment
>>
>> 907307 wvmqsscwrsascmmwnznmwcqLmm:u.wmswr::Qn'1vQz-rrm>rm'wn:L1'ns 60
>> PC7306 ———ELERSCYW'FSRSG!iNfl\DADNYCRLEDAELWVTSWEEQK!‘VQ1-D-IIGPVNTWMGLHDQ
>> 216
>>
>> PC7307 DESWIOJVDGTDYRHNYICNWAVTQPDVMHGHELGGSECVEVQPDGRWIDDFCLQVYEWVC 120
>> P07306 uspwxwvuarm51‘crmwwzqmnwrcacLsssmczuatrnnsnwlnnvcgmavnwvc 276
>>
>> PC7307 ex 122
>> PC7306 :— 277
>>
>> *OCR result using Tess4J API (programmatic access)*
>>
>> CLUSTAL 2.0.2 multxple sequence alignment
>>
>> 207307 .4 nnQGSCYWFSESGR7lWI\EAEKYC WINSVIEEQKFIVQHTMPFNTWIGLTD5 so
>>
>> E07306 ———n.EnscYw1~'sI\ss1vm\D1\Dmc
>> wAm,wvTsIvE=Q!<rvQx-n-IIL:1>vuTm4GLI-11:0 216
>>
>> P0 7 3 0 7 .. AnNYIGWAVTQPDNWHGHELGGSIIDCVEVQPDGNIHIDDFC LQVY nwvc 12 0
>>
>> P07306 ALJILJ » . nwxfiw 1. 276
>>
>> P0730‘! ax 122
>> P137306 :— 277I
>>
>> *Second Question:* Do I need any training to improve the OCR result? The
>> images I have all use the Courier font (shown in the attached image above).
>> Moreover, I just need to extract the letters, no digits or special
>> characters. Another important thing: the letter strings always come without
>> spaces. I tried disabling the dictionary because I do not need it, but it did
>> not help to improve my results. Any tip or technique that can help me
>> improve my results programmatically will be highly appreciated.
>> Thanks
>>
>> --
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

