On Tue, Mar 20, 2012 at 2:32 PM, Gytis <[email protected]> wrote:
> Hi all,
>
> first thank you for super software and FAQ on how to install it - went
> smoothly.
>
> The question is though, maybe anyone has any tips on how to prepare
> image for tesseract? I've read the FAQ on borders, resolution, etc,
> but i have one particular scenario were image prepared on my Mac with
> Preview works almost flawlessly (considering how bad the original jpg
> is) and if i try to do the same on linux machine with imagemagick
> convert function, the results are disappointing .
>
> Currently i have an image -
> http://dl.dropbox.com/u/12535857/tesseract/sample.jpg
>
> If i convert it on my Mac and apply auto-levels i get image like this
> http://dl.dropbox.com/u/12535857/tesseract/mac.tiff
>
> which is finely transformed into text like this:
>
> ‘or the short pastry:
> a|1.purp0se flour,
> 1% 613:5 more as needed
> stick) unsalted butter
> Z/6 cup sugar
> 5 egg yolks
> Salt
> l4 lb. (1
> For the filling:
> 6 oz. blanched almonds
> 6 large eggs, separated
> 1% cup sugar
> 1 pinch ground cinnamon
> Grated zest of 1 lemon
> ‘A cup pearjelly,
> warmed to liquld
> For the glaze:
> 1% cups sugar
> 107- (1 square) unsweetened
> l chocolate
>
>
> And if i run transform with imagemagick
>
> convert -compress none -auto-level -auto-gamma sample.jpg linux.tiff
>
> I get an image like this
> http://dl.dropbox.com/u/12535857/tesseract/linux.tiff
> and the output is truly worse:
>
>
> .17 flied
> 3-‘butter
> nsugar
> yolks
> Salt
> filling:
> lalmonds
> Eseparated
> Q {cup sugar
> cinnamon
> 10:1 lemon
> 1 pearjelly,
> ; " _ ed to liquid
> ‘:5 i
> the glazg:
> 3%‘; .
> cups sugar
> imsweetened
> 311‘_’ - chocolate
>
> Maybe i should use some other tools ? or fine tune conversion? Tried
> playing with contrast, but that doesn't seem to help. Thanks!
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

Try this:

    convert -colorspace gray -density 300 -sigmoidal-contrast 3,0% -depth 8 sample.jpg im-sample.png

But I would pay more attention to getting a better image: uniform
illumination and straight text lines. I just tried fixing the warped lines
in scantailor [1] (on Windows, but I compiled it on Linux without problems,
and I expect Mac should not be a problem either) and it did a great job - see
sample.tif and the tesseract output (sample.txt).

Anyway, it looks like you will need to train this font (to some extent).

BTW: Scantailor also has a command-line version.

Zdenko

[1] http://scantailor.sourceforge.net/

gr’ For the short pastry:
1% cups all-purpose flour,
plus more as needed
% lb. (l stick) unsalted butter
2/6 cup sugar
5 egg yolks
Salt
For the filling:
6 oz. blanched almonds
5 large eggs, separated
% cup sugar
1 pinch ground cinnamon
Grated zest of 1 lemon
‘/4 cup pearjelly,
warmed to liquid
For the glaze:
1% cups sugar
l oz. (1 square) unsweetened
chocolate
<<attachment: sample.tif>>
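
For anyone who wants to repeat the preprocessing and OCR steps from this thread on several images, here is a minimal Python sketch. It only wraps the convert options quoted in the reply and a plain tesseract call. The function names (build_convert_cmd, build_tesseract_cmd, preprocess_and_ocr), the "im-" output prefix and the "-l eng" language choice are assumptions made for illustration, not something prescribed in the thread. It also assumes both tools are on PATH; on ImageMagick 7 installs the command may be "magick" rather than "convert".

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_cmd(src, dst):
    """Return the ImageMagick call quoted in the reply as an argument list."""
    return [
        "convert",
        "-colorspace", "gray",           # drop colour information
        "-density", "300",               # tag the image as 300 dpi
        "-sigmoidal-contrast", "3,0%",   # contrast setting from the reply
        "-depth", "8",                   # 8 bits per channel
        str(src),
        str(dst),
    ]


def build_tesseract_cmd(image, out_base, lang="eng"):
    """Return a basic tesseract call; tesseract appends '.txt' to out_base."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def preprocess_and_ocr(src, workdir="."):
    """Clean up one image with convert, run tesseract, return the text."""
    src = Path(src)
    workdir = Path(workdir)
    cleaned = workdir / f"im-{src.stem}.png"   # hypothetical naming scheme
    out_base = workdir / src.stem

    # Fail early with a clear message if a tool is missing.
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    subprocess.run(build_convert_cmd(src, cleaned), check=True)
    subprocess.run(build_tesseract_cmd(cleaned, out_base), check=True)
    return (workdir / f"{src.stem}.txt").read_text(encoding="utf-8")
```

Calling preprocess_and_ocr("sample.jpg") would write im-sample.png and sample.txt next to the script and return the recognized text. Keeping the command construction in separate functions makes it easy to try other convert options (for example the -auto-level variant from the original question) and compare the resulting text.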

