On Tue, 25 Apr 2006, [email protected] wrote:
> Please advise me as to availability of SANE with capability of scanning
> documents to produce XML,

Hello Richard,

Producing a text file is a function of the software that comes bundled with the scanner rather than of the scanner itself. SANE does not itself provide OCR, but calls gocr to produce a text file.

As of version 0.3.5, gocr supported the output "formats" ISO8859_1, TeX, HTML and UTF8. It would probably be better to call these "character encodings" rather than formats.

http://jocr.sourceforge.net (Note the j.)

My experience with gocr is that the text file requires human review and correction to be usable. Commercial OCR does better but will never be 100% accurate.

When you say "produce XML", do you mean "produce a valid marked-up document according to a given DTD"?

Roger
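
P.S. To illustrate the difference between "text wrapped in tags" and "a document valid against a DTD", here is a minimal sketch in Python. It assumes you already have the reviewed OCR text in hand as a string. The element names "document" and "para" and the function name text_to_xml are made up for the example and do not come from gocr, SANE or any particular DTD. The result is well-formed XML only; checking it against a DTD would be a separate step.

```python
import re
import xml.etree.ElementTree as ET


def text_to_xml(text: str) -> str:
    """Wrap plain OCR text in a minimal, well-formed XML document.

    Paragraphs are assumed to be separated by blank lines. The element
    names "document" and "para" are hypothetical placeholders.
    """
    root = ET.Element("document")
    for block in re.split(r"\n\s*\n", text.strip()):
        if block.strip():
            para = ET.SubElement(root, "para")
            # Join the lines of a paragraph and collapse runs of whitespace.
            para.text = " ".join(block.split())
    # ElementTree escapes characters such as & and < in the text for us.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "Fish & chips\ncost < 5 pounds.\n\nSecond paragraph."
    print(text_to_xml(sample))
```

Anything beyond paragraphs (headings, tables, and so on) cannot be recovered from plain text automatically, which is why the question about the target DTD matters.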
