Hi friends,
The OCR code has now been tweaked and tested to work on both WinXP and
Win9x.
It should work on Unix as well.
Here is a summary:
1. Put ocrad 0.16 in the path
2. Change the following in ImageStripper.py
    ocr = os.popen("ocrad -s %s -c %s -x %s < %s 2>ocrerr.txt" %
                   (scale, charset, orf, pnmfile))
into this
    ocr_cmd = ur'ocrad -s %s -c %s "%s"' % (scale, charset, pnmfile)
    # os.popen3() returns [stdin, stdout, stderr]
    ocr = os.popen3(ocr_cmd)[1]
3. Change this
    if os.path.exists(program) and is_executable(program):
into this
    if os.path.exists(program + ".exe") or (os.path.exists(program) and is_executable(program)):
Because "and" short-circuits, is_executable() is only called when the file
exists, so a missing file does not produce a fatal error.
4. Change this
    for line in open(orf):
        if line.startswith("lines"):
            nlines = int(line.split()[1])
            if nlines:
                ctokens.add("image-text-lines:%d" % int(log2(nlines)))
into this
    nlines = ctext.count('\n')
    if nlines:
        ctokens.add("image-text-lines:%d" % nlines)
5. Finally, I suggest you change the default scale from 1 to 2, as
in this line
    scale = options["Tokenizer", "ocrad_scale"] or 2
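
For anyone who wants to try the idea outside ImageStripper.py, steps 2 to 4 can be sketched roughly as below. This is only a standalone illustration, not the actual SpamBayes code: it assumes Python 3 and uses subprocess in place of os.popen3(), and the helper names (find_program's signature, build_ocr_command, run_ocr, line_count_tokens), the "ascii" charset default and the latin-1 decoding are all made up for the example.

```python
import os
import subprocess


def is_executable(path):
    """Hypothetical stand-in for the is_executable() helper named in step 3."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def find_program(name, dirs):
    """Return the first matching path for `name` in `dirs`, or None."""
    for d in dirs:
        program = os.path.join(d, name)
        # Step 3: accept "name.exe" (Windows) or an executable "name".
        if os.path.exists(program + ".exe"):
            return program + ".exe"
        # `and` short-circuits, so is_executable() only sees existing paths.
        if os.path.exists(program) and is_executable(program):
            return program
    return None


def build_ocr_command(program, pnmfile, scale=2, charset="ascii"):
    """Step 2: pass the image as an argument instead of redirecting stdin."""
    # An argument list avoids shell quoting problems with spaces in paths.
    return [program, "-s", str(scale), "-c", charset, pnmfile]


def run_ocr(cmd):
    """Run the command and return its standard output as text."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    # latin-1 is an assumption; it never fails on arbitrary bytes.
    return result.stdout.decode("latin-1")


def line_count_tokens(ctext):
    """Step 4: derive the token from newlines in the OCR output."""
    tokens = set()
    nlines = ctext.count("\n")
    if nlines:
        tokens.add("image-text-lines:%d" % nlines)
    return tokens
```

With ocrad on the path, something like run_ocr(build_ocr_command(find_program("ocrad", os.environ["PATH"].split(os.pathsep)), "image.pnm")) would return the recognised text, which can then be passed to line_count_tokens().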
Compile and enjoy.
Happy coding :)
Vibe
