See http://www.isri.unlv.edu/ISRI/OCRtk

A ^ before a character indicates that the character is "suspicious" in some sense to Tesseract, and ~ indicates a reject. The output is in Latin-1 instead of UTF-8, and may not work at all for non-Latin text.

Ray.
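For anyone post-processing this output, here is a minimal Python sketch of how the markers described above could be stripped. It is not part of Tesseract; `parse_unlv` is a hypothetical helper name. It assumes that ^ flags only the single character that follows it, and that ~ stands in place of a rejected character (the thread does not say whether ~ replaces the character or precedes it). A literal ^ or ~ in the recognized text cannot be told apart from a marker this way.

```python
def parse_unlv(raw: bytes):
    """Split UNLV-style output into clean text plus marker positions.

    Assumptions (not confirmed by the thread):
      - '^' marks the single character that follows it as suspicious.
      - '~' stands in place of a rejected character.
    Returns (text, suspect_indexes, reject_indexes); the indexes refer to
    positions in the returned text.
    """
    # The output is Latin-1, so decode it explicitly instead of assuming UTF-8.
    decoded = raw.decode("latin-1")

    chars = []
    suspects = []
    rejects = []
    next_is_suspect = False

    for ch in decoded:
        if ch == "^":
            # Marker only: remember it and drop it from the clean text.
            next_is_suspect = True
            continue
        if ch == "~":
            # Keep the placeholder so the text length still matches the image.
            rejects.append(len(chars))
        elif next_is_suspect:
            suspects.append(len(chars))
        chars.append(ch)
        next_is_suspect = False

    return "".join(chars), suspects, rejects


if __name__ == "__main__":
    text, suspects, rejects = parse_unlv(b"Then ^c^o^m^e back t^0 ^see")
    print(text)
    print("suspicious positions:", suspects)
    print("rejected positions:", rejects)
```

The returned string is an ordinary Python str, so writing it out with encoding="utf-8" gives UTF-8 text. The suspicious positions are kept rather than discarded because they point at likely errors, such as the 0 in "t0" in the example.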
On Mon, Aug 17, 2009 at 2:52 PM, jia <[email protected]> wrote:
>
> Hi, group,
>
> Here's an example of returned string (excluding double quotes) when
> calling TessBaseAPI::TesseractRectUNLV():
>
> "^L^0^v^e ^c^o^m^e^s ^a^n^d ^g^o^e^s
> ^A^nd o^f^te^n it h^as paused,
> Then ^c^o^m^e back t^0 ^see
> The damage it ^ha^s caused."
>
> I am not sure how I should interpret this result. I searched "UNLV" in
> this group, and nothing shows up. I also google'd around a bit, and
> there wasn't an obvious answer. Can someone explain what exactly
> UNLV-style output is.
>
> Thanks.

