Uwe,

Here are some further manipulations; perhaps they are useful. I'm also
on Debian (sid):

% xloadimage -identify fax000000095.tif
fax000000095.tif is a 3456x4677 single-plane black-on-white G4FAX TIFF
imageTitled "0389263661"

% xloadimage -identify fax95_cut.tif
fax95_cut.tif is a 1800x300 32-bit single-plane RGB standard TIFF
imageTitled "/tmp/fax95_cut.tif"

% convert fax95_cut.tif fax95_cut.pbm
% convert fax95_cut.pbm fax95_cut_bw.tif
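
My guess at why the round trip through PBM helps: PBM holds exactly one
bit per pixel, so the 32-bit RGB file that Gimp wrote is forced back to
a bilevel image before tesseract sees it. Below is a rough Python sketch
of that reduction. The function name rgb_to_pbm and the 50% luminance
threshold are my own assumptions for illustration, not necessarily what
convert does internally.

```python
# Sketch: collapse RGB pixels to 1 bit per pixel and emit plain PBM ("P1").
# The fixed threshold is an assumption for illustration only; ImageMagick's
# convert may choose black/white differently.

def rgb_to_pbm(pixels, threshold=128):
    """pixels: list of rows, each row a list of (r, g, b) tuples in 0..255."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    lines = ["P1", f"{width} {height}"]
    for row in pixels:
        bits = []
        for r, g, b in row:
            # Rec. 601 luma weights, kept in integer arithmetic.
            luma = (299 * r + 587 * g + 114 * b) // 1000
            # In PBM, 1 means black and 0 means white.
            bits.append("1" if luma < threshold else "0")
        lines.append(" ".join(bits))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical 3x2 sample, not taken from the fax image.
    sample = [
        [(255, 255, 255), (0, 0, 0), (250, 250, 250)],
        [(10, 10, 10), (200, 200, 200), (30, 30, 30)],
    ]
    print(rgb_to_pbm(sample), end="")
```

Writing the bilevel data back out as TIFF is then what the second
convert call does.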

% tesseract fax95_cut_bw.tif output && cat output.txt
Tesseract Open Source OCR Engine
AZLAN AT UNITEN DOT EDU DOT MY

% tesseract fax000000095.tif output -l eng && cat output.txt
Tesseract Open Source OCR Engine
AZLAN AT UNITEN DOT EDU DOT MY A

% tesseract fax000000095.tif output -l fra && cat output.txt
Tesseract Open Source OCR Engine
AZLAN AT UNITEN DOT EDU DOT MY *

Best regards,
Laird.


On Mar 14, 3:10 am, udippel <[email protected]> wrote:
> Following up on my earlier observations, I have now uploaded two
> files:
> fax000000095.tif
> which, on Debian (Lenny) as well as Ubuntu (8.10), produces an extra
> 'A' at the end of the line when run through tesseract, one that is
> definitely not in the image ("... MY A" is what I get on Debian and
> Ubuntu, in case your tesseract works okay)
> fax95_cut.tif
> is the same image, with the area cut out in GIMP. On both Debian and
> Ubuntu this results in an empty text file.
>
> Could someone please confirm that these problems are limited to
> Debian/Ubuntu?
>
> Especially the first one seems to point to an imaging problem, rather
> than an OCR problem.
>
> Uwe
