I am cleaning up an image using Leptonica and then passing it to Tesseract
for OCR. However, Tesseract is not able to recognize the characters, even
though the image is of high quality. The image specifications are as
follows:

1 bpp, uncompressed, 1280 x 960, 300 dpi horizontal and vertical resolution


These are the image processing operations I carry out, in sequence, using
Leptonica:

1. pixConvertTo8
2. pixBackgroundNormSimple
3. pixOtsuAdaptiveThreshold
4. pixContrastTRC (I am passing high values like 1.0 or even 5.0, but the
   image doesn't really change)
5. pixFindSkew
6. pixRotate (rotate by the angle found by pixFindSkew)
7. pixRotate90 (done 4 times, to read the image in all 4 orientations)
8. pixClipRectangle (crop the image)

Finally, I run the tesseract command.
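
On the pixContrastTRC point, here is a small sketch of how an arctangent-style
contrast curve behaves. It is written in Python purely for illustration; the
function name contrast_curve and the scale constant are my own assumptions and
this is not Leptonica's code. With a curve of this shape, pixels that are
already pure black (0) or pure white (255) map to themselves, which may be why
the image looks unchanged once it has been binarized.

```python
import math


def contrast_curve(value, factor, scale=5.0):
    """Map an 8-bit gray value through an arctangent S-curve around mid-gray.

    factor <= 0 returns the value unchanged; larger factors steepen the curve.
    The scale constant is an assumption chosen for illustration only.
    """
    if factor <= 0:
        return value

    def curve(v):
        # Unnormalized S-curve, centred on gray level 127.
        return math.atan(factor * scale * (v - 127.0) / 128.0)

    ymin = curve(0)
    ymax = curve(255)
    # Rescale so that 0 stays at 0 and 255 stays at 255.
    return int(255.0 * (curve(value) - ymin) / (ymax - ymin) + 0.5)


if __name__ == "__main__":
    for factor in (1.0, 5.0):
        mapped = [contrast_curve(v, factor) for v in (0, 100, 200, 255)]
        print(factor, mapped)
```

Mid-gray values move toward black or white under this mapping, but the two
end values do not, so an image containing only 0 and 255 passes through
untouched whatever factor is used.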

I get garbage characters in the output. A sample input image is as follows:

<https://lh4.googleusercontent.com/-kG9mHG4xOVQ/U9DQ7tIxibI/AAAAAAAABME/G88fZRRRCgU/s1600/90_cropped.tif>

The output that I get is as follows:

Final K-1
II]
s h d | K-1 ,.,
(F°o.~?n‘i&1) 5/>.©12 mm E2‘;
Deparlrnenl of tho Treasury , ,
I 1 I l I
‘mama, Ravenuo SGMW For cnlundm your 201), ‘ " °F°$ "'100fTIO
or lax yum boqmnnnq 7 _ 20\Q_
‘ 7660
and ondmg _  W vv I go
Beneﬁciary's Share of Income, Deductions,cl'editS, etc. F 800 buck 01 loam nnd 
lnstruoflons»
___lnformatI0n About mo Estate or Trust
‘ Ordmary d|v|dmi 12113
 _
‘; Quahfmd dlVIdG
\ 8132
3 1
Net shun-term
A Estate's at trust's omgiuym ldonnlmnluon numbol
56-0987654
B Estate's u trust‘: namo
ESTATE OF MARTHA SMITH
0 Fiduc§ary's name, address, clly, smlu‘ and /IP codo
N01 long~lerm c
\ 24043 u 
‘ 28% vale gann
Ti
Unreptumd 5
Omar porfloho 4nonbuslness lfll
/\..4........ L. ._.._ ,.

What should I do to improve the accuracy?

Part 2:

I tried to follow this link
<http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data>
and created an eng.user-words.traineddata file and a bazaar.train file, then
tried to run Tesseract with "bazaar" as an additional parameter, but I get a
"read_params_file: can't open bazaar" error. Any suggestions?
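
For clarity, the invocation has roughly the shape sketched below. The Python is
only there to spell out the argument order and to check for the config file.
The file names input.tif and output, the -l eng language flag, the
/usr/share/tessdata directory, and both helper names are placeholders and
assumptions, not my exact command. The sketch also assumes that a config name
given on the command line is looked up under tessdata/configs, which is my
reading of the man page.

```python
import os


def build_tesseract_command(image, output_base, lang, configs):
    """Return the argument list: tesseract IMAGE OUTPUT_BASE -l LANG [CONFIG ...]."""
    return ["tesseract", image, output_base, "-l", lang] + list(configs)


def expected_config_path(tessdata_dir, config_name):
    """Path where a named config file is assumed to live (tessdata/configs/NAME)."""
    return os.path.join(tessdata_dir, "configs", config_name)


if __name__ == "__main__":
    cmd = build_tesseract_command("input.tif", "output", "eng", ["bazaar"])
    print(" ".join(cmd))

    # Hypothetical tessdata location; adjust to the real installation.
    path = expected_config_path("/usr/share/tessdata", "bazaar")
    print(path, "exists" if os.path.isfile(path) else "is missing")
```

If the path printed by the second helper is reported as missing, that would be
consistent with the "can't open bazaar" message, under the lookup assumption
stated above.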
