I've kept messing around with it and tried inverting the image by commenting out the "if (high_value == 0)" test towards the end of read_tiff_image(), so that the inversion always runs. The value of TIFFTAG_PHOTOMETRIC is 1, so high_value is getting set to 1 and not zero (uinT8 high_value = photometric == 1;). I am nearly certain that in my image the lettering is zero-valued and the background is 255-valued. I do not know what this does inside tesseract, but it seems to agree with the comment in read_tiff_image() regarding the value of photometric (except that my background is 255 and not 1). I am not sure why this was causing the problem. Is the lettering supposed to be high-valued and the background zero-valued?
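For anyone who wants to try the same kind of inversion on their own pixel data outside of tesseract, here is a minimal sketch in Python, assuming 8-bit grayscale samples. The function names are my own (hypothetical) and are not part of tesseract or libtiff; the two constants are the PhotometricInterpretation values from the TIFF 6.0 spec (0 means zero is white, 1 means zero is black). It does not claim to reproduce what read_tiff_image() does internally.

```python
# Hypothetical helpers, not part of tesseract or libtiff.
# Assumption: pixels are 8-bit grayscale samples (0..255).

PHOTOMETRIC_MINISWHITE = 0  # TIFF 6.0: sample value 0 is white
PHOTOMETRIC_MINISBLACK = 1  # TIFF 6.0: sample value 0 is black


def invert_8bit(pixels):
    """Return a copy of the samples with every value v replaced by 255 - v."""
    return bytes(255 - p for p in pixels)


def to_black_is_zero(pixels, photometric):
    """Return samples in 'zero is black' polarity, inverting only if needed."""
    if photometric == PHOTOMETRIC_MINISBLACK:
        return bytes(pixels)        # already zero-is-black, leave untouched
    if photometric == PHOTOMETRIC_MINISWHITE:
        return invert_8bit(pixels)  # zero-is-white, so flip every sample
    raise ValueError("unsupported photometric value: %r" % (photometric,))


if __name__ == "__main__":
    row = bytes([0, 255, 10])       # e.g. letter, background, dark grey
    print(list(invert_8bit(row)))
```

With photometric equal to 1, as in my file, to_black_is_zero() leaves the data alone, so forcing the inversion the way I did corresponds to calling invert_8bit() directly.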
Whatever it means, it's working great for my test cases now with the inversion hardwired in. Maybe in a future release an input switch (--invert-image) could be added so people could test this themselves if they find they are having trouble.

Thanks, mods, for letting me post here and best of luck to you.

On Jan 18, 11:58 pm, rutiger <[email protected]> wrote:
> Can anyone help me out with why I am getting output like this:
>
> >> head -5 try0.txt
>
> % ~ E§@@m®%@ ®m Lm&@@ ii �...@iw§@i...@m
> §wm@@i...@m im%§@@& : @w@@jy§
> % @&@§ @ @@;@@%m;@ i...@w@ f@@w
> gw;
> % ;m%§@;@A ® g...@im;@@ @@ @%m@@@& &m...@g@
>
> When it should be:
>
> % - Execute on Image formation
> function im_data = sar_proc(im_data, subdir)
> %
> % data - processing structure input from
> gui
> % im_data - pointer to general image
> parameters
> % subdiir - output directory for saving
>
> I am running Debian (Lenny) my system is
>
> Linux yournamehere 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009
> i686 GNU/Linux
>
> The input is a high-res tif (400x400) with 16-point Courier New font.
> To install the tesseract-ocr package I used wajig. To run it I used
>
> tesseract inputTif.tif outputBaseName

