On Tue, 25 Apr 2006, [email protected] wrote:

> Please advise me as to availability of SANE with capability of scanning 
> documents to produce XML,

Hello Richard,

Producing a text file is a function of the software that comes bundled 
with the scanner rather than of the scanner itself.  SANE does not itself 
provide OCR, but a frontend such as XSane can call gocr to produce a text 
file.  As of version 0.3.5, gocr supported the output "formats" ISO8859_1, 
TeX, HTML and UTF8.  It would probably be better to call these "character 
encodings" rather than formats.  See http://jocr.sourceforge.net (note 
the j).

My experience with gocr is that the text file requires human review and 
correction to be usable.  Commercial OCR does better but will never be 
100% accurate.

When you say "produce XML", do you mean "produce a valid marked-up 
document according to a given DTD"?
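
If well-formed XML with a simple structure is all you need, wrapping the 
OCR text yourself is not much work.  Here is a minimal sketch in Python.  
The <document> and <para> element names are my own invention and follow 
no particular DTD, and I am assuming a gocr binary on your PATH that 
accepts -i (input file) and -f (output format); check "gocr -h" for your 
version before relying on it.

```python
import subprocess
import xml.etree.ElementTree as ET


def run_gocr(image_path):
    """Run gocr on an image file and return the recognised text.

    Assumption: gocr is on PATH and accepts -i (input file) and
    -f (output format). Verify with `gocr -h` on your system.
    """
    result = subprocess.run(
        ["gocr", "-f", "UTF8", "-i", image_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def text_to_xml(text):
    """Wrap plain text in a hypothetical <document>/<para> structure.

    Blank lines separate paragraphs. The element names are
    illustrative only and do not follow any standard DTD.
    """
    root = ET.Element("document")
    for block in text.split("\n\n"):
        # Join the non-empty lines of each block into one paragraph.
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if lines:
            ET.SubElement(root, "para").text = " ".join(lines)
    # ElementTree escapes characters such as & and < for us.
    return ET.tostring(root, encoding="unicode")
```

You would call it as text_to_xml(run_gocr("page.pnm")), where page.pnm 
is a placeholder for your scanned image.  This only guarantees 
well-formed XML; recognition errors in the text still need the human 
review mentioned above, and validating against a real DTD is a separate 
step.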

Roger
