*Denoising, not dancing.* *I apologize for my mistake.*

On Monday, April 7, 2014 9:06:55 PM UTC-3, Ahmad Chan wrote:
>
> Thanks Quan for the reply. Yes, I have noticed that with rescaling I am getting
> comparatively better results, but the results are still not as good as what I
> get through the Tesseract-OCR GUI. I would like to know what sort
> of preprocessing I should carry out before passing the image to
> Tesseract-OCR. The wiki guide mentions that Tesseract does some basic
> image processing on its own, but it is not clear from the guide what sort of
> preprocessing it performs. I want to know whether Tesseract-OCR converts a
> color image into black and white and does a little dancing or not. Moreover,
> it would be helpful if someone could share links to Java code for image
> preprocessing. Thanks a lot
>
> On Monday, April 7, 2014 8:37:10 PM UTC-3, Quan Nguyen wrote:
>>
>> It's likely the GUI programs have added some preprocessing on the image.
>> If you ran it directly with the Tesseract executable, you would get results
>> similar to those of Tess4J.
>>
>> Rescaling your image to 300 DPI will produce better output.
>>
>> https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
>>
>> On Monday, April 7, 2014 11:05:02 AM UTC-5, Ahmad Chan wrote:
>>>
>>> Hi,
>>>
>>> I am doing some experiments with Tesseract-OCR (3.02) to extract text
>>> (without training) from a pool of sequence images (a sample is given
>>> below). The issue I am currently facing is that I am getting almost
>>> correct results through the GUIs (http://sourceforge.net/projects/tesseract-gui/
>>> on Ubuntu and
>>> http://vietocr.sourceforge.net/ on Windows) but only about 50% accuracy when
>>> I use Tess4J to run the OCR programmatically. Does anyone know the reason
>>> behind this? I have to get better results through the program.
>>>
>>> <https://lh6.googleusercontent.com/-cNHxknh9iZc/U0LHgWnCz0I/AAAAAAAAARc/ujBaUVAHqPg/s1600/1.jpg>
>>>
>>> *OCR results using GUI*
>>>
>>> CLUSTAL 2.0.2 multiple sequence alxgnment
>>>
>>> 907307 wvmqsscwrsascmmwnznmwcqLmm:u.wmswr::Qn'1vQz-rrm>rm'wn:L1'ns 60
>>> PC7306
>>> ———ELERSCYW'FSRSG!iNfl\DADNYCRLEDAELWVTSWEEQK!‘VQ1-D-IIGPVNTWMGLHDQ 216
>>>
>>> PC7307 DESWIOJVDGTDYRHNYICNWAVTQPDVMHGHELGGSECVEVQPDGRWIDDFCLQVYEWVC 120
>>> P07306 uspwxwvuarm51‘crmwwzqmnwrcacLsssmczuatrnnsnwlnnvcgmavnwvc 276
>>>
>>> PC7307 ex 122
>>> PC7306 :— 277
>>>
>>> *OCR result using Tess4J API (programmatic access)*
>>>
>>> CLUSTAL 2.0.2 multxple sequence alignment
>>>
>>> 207307 .4 nnQGSCYWFSESGR7lWI\EAEKYC WINSVIEEQKFIVQHTMPFNTWIGLTD5 so
>>>
>>> E07306 ———n.EnscYw1~'sI\ss1vm\D1\Dmc
>>> wAm,wvTsIvE=Q!<rvQx-n-IIL:1>vuTm4GLI-11:0 216
>>>
>>> P0 7 3 0 7 .. AnNYIGWAVTQPDNWHGHELGGSIIDCVEVQPDGNIHIDDFC LQVY nwvc 12 0
>>>
>>> P07306 ALJILJ » . nwxfiw 1. 276
>>>
>>> P0730‘! ax 122
>>> P137306 :— 277I
>>>
>>> *Second Question:* Do I need any training to improve the OCR result? The
>>> images I have all use the Courier font (as shown in the attached
>>> image above).
>>> Moreover, I just need to extract the letters, no digits or special
>>> characters. Another important thing: the letter strings always come without
>>> spaces. I tried to disable the dictionary because I do not require it, but it did
>>> not help to improve my results. Any tip or technique that can help me
>>> improve my results programmatically will be highly appreciated.
>>> Thanks
>>>
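
On the request for Java preprocessing code: the thread does not say what the GUI programs actually do to the image, so the following is only a minimal sketch of three common steps (grayscale conversion, upscaling, and binarization with an Otsu threshold) using nothing but the JDK's java.awt classes. The class and method names are hypothetical and are not part of Tess4J or Tesseract, and the scale factor is an assumption that has to be chosen per image.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; these names are not part of Tess4J or Tesseract.
class OcrPreprocess {

    // Convert to 8-bit grayscale using the common 0.299/0.587/0.114 luma weights.
    public static BufferedImage toGray(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                gray.getRaster().setSample(x, y, 0, (299 * r + 587 * g + 114 * b + 500) / 1000);
            }
        }
        return gray;
    }

    // Enlarge (or shrink) a grayscale image by the given factor with bicubic interpolation.
    public static BufferedImage scale(BufferedImage gray, double factor) {
        int w = Math.max(1, (int) Math.round(gray.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(gray.getHeight() * factor));
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(gray, 0, 0, w, h, null);
        g.dispose();
        return out;
    }

    // Otsu's method: pick the gray level that maximizes the between-class variance.
    public static int otsuThreshold(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        int[] hist = new int[256];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                hist[gray.getRaster().getSample(x, y, 0)]++;
            }
        }
        long total = (long) w * h;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double) i * hist[i];

        double sumDark = 0, bestVar = -1;
        long nDark = 0;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            nDark += hist[t];
            sumDark += (double) t * hist[t];
            if (nDark == 0) continue;          // no pixels at or below t yet
            long nLight = total - nDark;
            if (nLight == 0) break;            // every pixel is at or below t
            double diff = sumDark / nDark - (sumAll - sumDark) / nLight;
            double between = (double) nDark * nLight * diff * diff;
            if (between > bestVar) { bestVar = between; best = t; }
        }
        return best;
    }

    // Pixels at or below the threshold become black, all others white.
    public static BufferedImage binarize(BufferedImage gray, int threshold) {
        int w = gray.getWidth(), h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = gray.getRaster().getSample(x, y, 0);
                out.setRGB(x, y, v <= threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return out;
    }
}
```

A typical order would be toGray, then scale, then otsuThreshold and binarize on the scaled image. Quan's suggestion is 300 DPI, so for an image captured at roughly 96 DPI a factor of about 3 might be a reasonable starting point, but that is a guess to tune against real samples. As far as I know, Tess4J's doOCR has an overload that accepts a BufferedImage, so the result could be passed in directly instead of the original file.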

