On Tue, Mar 20, 2012 at 2:32 PM, Gytis <[email protected]> wrote:

> Hi all,
>
> first, thank you for the great software and the FAQ on how to install
> it - it went smoothly.
>
> My question: does anyone have tips on how to prepare an image for
> tesseract? I've read the FAQ on borders, resolution, etc., but I have
> one particular scenario where an image prepared on my Mac with Preview
> works almost flawlessly (considering how bad the original JPG is),
> while doing the same on a Linux machine with ImageMagick's convert
> gives disappointing results.
>
> Currently I have this image:
> http://dl.dropbox.com/u/12535857/tesseract/sample.jpg
>
> If I convert it on my Mac and apply auto-levels, I get an image like this:
> http://dl.dropbox.com/u/12535857/tesseract/mac.tiff
>
> which is converted nicely into text like this:
>
> ‘or the short pastry:
> a|1.purp0se flour,
> 1% 613:5 more as needed
> stick) unsalted butter
> Z/6 cup sugar
> 5 egg yolks
> Salt
> l4 lb. (1
> For the filling:
> 6 oz. blanched almonds
> 6 large eggs, separated
> 1% cup sugar
> 1 pinch ground cinnamon
> Grated zest of 1 lemon
> ‘A cup pearjelly,
> warmed to liquld
> For the glaze:
> 1% cups sugar
> 107- (1 square) unsweetened
> l chocolate
>
>
> And if I run the conversion with ImageMagick:
>
> convert -compress none -auto-level -auto-gamma sample.jpg linux.tiff
>
> I get an image like this:
> http://dl.dropbox.com/u/12535857/tesseract/linux.tiff
> and the output is much worse:
>
>
> .17 flied
> 3-‘butter
> nsugar
>  yolks
>  Salt
>  filling:
>  lalmonds
> Eseparated
> Q {cup sugar
>  cinnamon
> 10:1 lemon
>  1 pearjelly,
>  ; " _ ed to liquid
> ‘:5 i
>  the glazg:
> 3%‘; .
>  cups sugar
> imsweetened
> 311‘_’ - chocolate
>
> Maybe I should use some other tools, or fine-tune the conversion? I tried
> playing with contrast, but that doesn't seem to help. Thanks!
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>

Try this:
convert -colorspace gray -density 300 -sigmoidal-contrast 3,0% -depth 8 sample.jpg im-sample.png

But I would pay more attention to getting a better image: uniform
illumination and straight text lines.
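
For batch use, the convert command above and the tesseract run that
follows it could be wrapped in a small script. This is only a minimal
sketch: the function names and the default output names (im-sample.png,
im-sample) are made up for illustration, and it assumes that ImageMagick's
`convert` and `tesseract` are both on the PATH. The convert options are
kept in the same order as in the command above.

```python
import shutil
import subprocess


def build_convert_cmd(src, dst, density=300, contrast="3,0%"):
    """Return the ImageMagick argument list for the preprocessing step.

    The options and their order mirror the convert command above.
    """
    return [
        "convert",
        "-colorspace", "gray",
        "-density", str(density),
        "-sigmoidal-contrast", contrast,
        "-depth", "8",
        src,
        dst,
    ]


def build_tesseract_cmd(image, out_base):
    """Return the tesseract argument list; text goes to <out_base>.txt."""
    return ["tesseract", image, out_base]


def preprocess_and_ocr(src, cleaned="im-sample.png", out_base="im-sample"):
    """Run convert, then tesseract; return the path of the text file."""
    # Fail early with a clear message if a tool is missing (assumption:
    # both tools are installed under these names).
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    subprocess.run(build_convert_cmd(src, cleaned), check=True)
    subprocess.run(build_tesseract_cmd(cleaned, out_base), check=True)
    return out_base + ".txt"
```

With both tools installed, calling preprocess_and_ocr("sample.jpg") would
write the cleaned image to im-sample.png and the recognized text to
im-sample.txt. Keeping the argument lists in separate functions makes it
easy to try other contrast or density values on the same image.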

I just tried to fix the warped lines in Scan Tailor [1] (on Windows, but I
have compiled it on Linux without problems, and I expect Mac should not be
a problem either) and it did a great job - see sample.tif and the tesseract
output (sample.txt). Anyway, it looks like you will need to train this font
(to some extent).
BTW: Scan Tailor also has a command-line version.

Zdenko

[1] http://scantailor.sourceforge.net/

gr’
For the short pastry:
1% cups all-purpose flour,
plus more as needed
% lb. (l stick) unsalted butter
2/6 cup sugar
5 egg yolks
Salt
For the filling:
6 oz. blanched almonds
5 large eggs, separated
% cup sugar
1 pinch ground cinnamon
Grated zest of 1 lemon
‘/4 cup pearjelly,
warmed to liquid
For the glaze:
1% cups sugar
l oz. (1 square) unsweetened
chocolate

<<attachment: sample.tif>>
