On Fri, Feb 22, 2013 at 03:20:49PM +0000, Nick White wrote:
> On Sun, Jun 03, 2012 at 10:27:23PM +0100, zdenko podobny wrote:
> > it looks like it is ASCII only oriented (at least in report non-ASCII are
> > malformed...), ftk has only binary distribution, so no possible fix can
> > expected...
> >
> > BTW: tools are at new place:
> > http://code.google.com/p/isri-ocr-evaluation-tools
> > ; report can be found at stephenvrice.com/images/AT-1995.pdf
>
> I finally got around to working with these tools a bit. It seems
> that they do process unicode correctly (though I haven't tested
> combined characters, and suspect that may not work). You're correct
> the reports don't seem to output unicode properly, but that's
> probably easily fixed.

Right, I created a workaround to enable at least the 'accuracy' tool
(which is the really important one) to work fine with UTF-8. It's a
script called utf8toolwrap.sh; if you're interested, check it out;
it's attached to this issue:
https://code.google.com/p/isri-ocr-evaluation-tools/issues/detail?id=2

It makes the 'accuracy' tool actually very useful; it shows how
common various misrecognitions are - very useful for potential
unicharambigs rules :)

Nick

P.S. It requires a Linux-ish environment, and the tools asc2uni and
uni2asc from the isri toolkit to be available on the PATH.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
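
For anyone who wants a feel for the kind of misrecognition counts mentioned above without installing the ISRI tools, here is a minimal Python sketch. It is not the algorithm the 'accuracy' tool uses and has nothing to do with utf8toolwrap.sh; it is only an illustration built on the standard library's difflib, and the function name confusion_counts is made up for this example. It normalises both texts to NFC first, on the assumption that combined characters should compare equal to their precomposed forms.

```python
from collections import Counter
from difflib import SequenceMatcher
import unicodedata


def confusion_counts(truth: str, ocr: str) -> Counter:
    """Count (expected, recognised) pairs where the OCR text differs from the truth.

    Hypothetical helper for illustration only; not part of the ISRI toolkit.
    """
    # Normalise to NFC so combining sequences and precomposed characters match.
    truth = unicodedata.normalize("NFC", truth)
    ocr = unicodedata.normalize("NFC", ocr)

    counts = Counter()
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on either side marks an insertion or a deletion.
            counts[(truth[i1:i2], ocr[j1:j2])] += 1
    return counts


if __name__ == "__main__":
    pairs = confusion_counts("naïve café", "naive cafe")
    for (expected, got), n in pairs.most_common():
        print(f"{n}\t{expected!r} -> {got!r}")
```

Pairs that turn up often across a large test set would be the ones worth looking at as candidate unicharambigs rules; whether they actually help still has to be checked against real recognition results.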

