I kept experimenting and tried inverting the image by commenting out
the "if (high_value == 0)" test towards the end of read_tiff_image().
The value of TIFFTAG_PHOTOMETRIC is 1, so high_value is set to 1
rather than zero (uinT8 high_value = photometric == 1;). I am nearly
certain that in my image the lettering is zero-valued and the
background is 255-valued. I don't know what this does inside
tesseract, but it seems to agree with the comment in
read_tiff_image() regarding the value of photometric (except that my
background is 255 and not 1). I'm not sure why this was causing the
problem. Is the lettering supposed to be high-valued and the
background zero-valued?
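
For reference, the TIFF 6.0 spec defines PhotometricInterpretation 0
as WhiteIsZero and 1 as BlackIsZero. Here is a small Python sketch of
that mapping for 8-bit samples. The function name is made up for
illustration and this is not tesseract's code, just my reading of the
tag:

```python
# PhotometricInterpretation values from the TIFF 6.0 spec.
PHOTOMETRIC_MINISWHITE = 0  # WhiteIsZero: sample 0 is white
PHOTOMETRIC_MINISBLACK = 1  # BlackIsZero: sample 0 is black


def displayed_brightness(sample, photometric, max_value=255):
    """Return the brightness of a stored sample (0 = black, max_value = white).

    Hypothetical helper for illustration only.
    """
    if photometric == PHOTOMETRIC_MINISBLACK:
        return sample
    if photometric == PHOTOMETRIC_MINISWHITE:
        return max_value - sample
    raise ValueError("not a bilevel/grayscale photometric value")
```

By this reading, with photometric 1 my zero-valued lettering is black
and the 255-valued background is white.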

Whatever the cause, it's working great for my test cases now with the
inversion hardwired in. Maybe a future release could add an input
switch (--invert-image) so people could test this themselves if they
run into trouble.
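
Until then, the same inversion could be done on the pixel data before
tesseract sees it. This is a minimal Python sketch, assuming the image
is 8-bit grayscale and the samples are already in memory as bytes.
invert_gray8 is a hypothetical helper, not part of tesseract:

```python
def invert_gray8(samples):
    """Return new bytes with every 8-bit sample flipped (v -> 255 - v)."""
    return bytes(255 - v for v in samples)
```

Applying it twice gives back the original data, so it is easy to try
both polarities on a problem image.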

Thanks, mods, for letting me post here and best of luck to you.


On Jan 18, 11:58 pm, rutiger <[email protected]> wrote:
> Can anyone help me out with why I am getting output like this:
>
> >> head -5 try0.txt
>
> % ~ E§@@m®%@ ®m Lm&@@ ii �...@iw§@i...@m
> §wm@@i...@m im%§@@& :     @w@@jy§
> % @&@§ @   @@;@@%m;@ i...@w@ f@@w
> gw;
> % ;m%§@;@A ® g...@im;@@ @@ @%m@@@& &m...@g@
>
> When it should be:
>
> % - Execute on Image formation
> function im_data = sar_proc(im_data, subdir)
> %
> % data        - processing structure input from gui
> % im_data   - pointer to general image parameters
> % subdiir     - output directory for saving
>
> I am running Debian (Lenny); my system is
>
> Linux yournamehere 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009
> i686 GNU/Linux
>
> The input is a high-res tif (400x400) with 16-point Courier New font.
> To install the tesseract-ocr package I used wajig. To run it I used
>
> tesseract inputTif.tif outputBaseName