On Fri, Feb 22, 2013 at 03:20:49PM +0000, Nick White wrote:
> On Sun, Jun 03, 2012 at 10:27:23PM +0100, zdenko podobny wrote:
> > it looks like it is ASCII-oriented only (at least in the report, non-ASCII
> > characters are malformed...); ftk has only a binary distribution, so no
> > fix can be expected...
> > 
> > BTW: the tools are at a new place:
> > http://code.google.com/p/isri-ocr-evaluation-tools
> > ; the report can be found at stephenvrice.com/images/AT-1995.pdf
> 
> I finally got around to working with these tools a bit. It seems
> that they do process Unicode correctly (though I haven't tested
> combining characters, and suspect that may not work). You're correct
> that the reports don't seem to output Unicode properly, but that's
> probably easily fixed.

Right, I created a workaround that lets at least the 'accuracy' tool
(which is the really important one) work properly with UTF-8. It's a
script called utf8toolwrap.sh, attached to this issue; check it out if
you're interested:
https://code.google.com/p/isri-ocr-evaluation-tools/issues/detail?id=2

It makes the 'accuracy' tool genuinely useful: it shows how common
various misrecognitions are, which is very helpful for deciding on
potential unicharambigs rules :)

Nick

P.S. It requires a Linux-ish environment, with the asc2uni and uni2asc
tools from the ISRI toolkit available on the PATH.
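Since the script depends on those two tools being findable, a quick pre-flight check can save some confusion. The following is a minimal sketch and is not part of utf8toolwrap.sh itself; the helper name missing_tools is hypothetical, and it only verifies that asc2uni and uni2asc resolve on the PATH, not that they work.

```python
import shutil

# Tools the P.S. says utf8toolwrap.sh needs on the PATH.
REQUIRED_TOOLS = ["asc2uni", "uni2asc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on the PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required ISRI tools found on PATH.")
```

Run it in the same shell you will run the wrapper from, so it sees the same PATH.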

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en