<https://lh3.googleusercontent.com/-AqN-fJd_3ZI/VXChL3kKEHI/AAAAAAAAAEk/R6FctBPztnI/s1600/test.bmp>
Hello,

I have run into a problem recognizing numbers with a leading minus sign, such as '-100', '-200', etc. In my tests with various samples, the result is almost always '400', '100' or something similar, and it almost never includes the leading minus sign. I need help finding out how to improve the recognition quality.

Here are some facts and the things I've tried so far to get the correct output:

1. I'm using Tesseract 3.02.02 through the C API functions (TessBaseAPICreate, TessBaseAPIInit3, TessBaseAPISetVariable, TessBaseAPIProcessPages, TessBaseAPIClear, TessBaseAPIEnd, TessBaseAPIDelete) with Delphi 7 on Windows 7 64-bit.
2. The image format is .bmp (I have not tried any other formats, since I believe BMP involves no quality loss).
3. Before recognition, I process the image with some decolorization and contrast enhancement to improve the recognition quality. An example of a processed image is attached.
4. I set the character whitelist to '-0123456789' (I've also tried '--0123456789', '-0123456789 ' and even '-0' and '-' as a test, but the minus sign was not recognized even once).
5. I've tried increasing the size of the image, but the result is still the same.
7. I've tried stretching the image height (still '400') or width (terrible results like '111 0' or '11 -21').
8. I've tried to "extend" the minus sign by redrawing it myself in MS Paint, but it was still not recognized correctly (only once did I draw a "perfect" minus sign that was recognized correctly, and unfortunately I couldn't reproduce it later). I've also tried recognizing text added with the MS Paint 'Draw Text' tool (Calibri, Arial Black, Times New Roman, 20-72pt, black text on a white background), still with no result. This makes me think that the image itself may be fine and that something is wrong with the way I'm using Tesseract.
9. Nonetheless, unsigned numbers and words (with a different whitelist, of course) are recognized just fine by the same code.
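For reference, the kind of preprocessing mentioned in item 3 can be sketched roughly as follows. This is only an illustration: the function names (`to_gray`, `stretch_contrast`), the integer luma weights, and the min-max stretch are assumptions, not necessarily the exact processing applied to the attached image.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert interleaved 8-bit RGB pixels to 8-bit gray.
   Uses integer luma weights 77/150/29 (they sum to 256), an assumed choice. */
static void to_gray(const uint8_t *rgb, uint8_t *gray, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; ++i) {
        unsigned r = rgb[3 * i];
        unsigned g = rgb[3 * i + 1];
        unsigned b = rgb[3 * i + 2];
        gray[i] = (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
    }
}

/* Hypothetical helper: linear contrast stretch in place, so that the
   darkest pixel maps to 0 and the brightest to 255.
   A completely flat image is left unchanged. */
static void stretch_contrast(uint8_t *gray, size_t n_pixels)
{
    if (n_pixels == 0)
        return;

    uint8_t lo = 255, hi = 0;
    for (size_t i = 0; i < n_pixels; ++i) {
        if (gray[i] < lo) lo = gray[i];
        if (gray[i] > hi) hi = gray[i];
    }
    if (hi == lo)
        return; /* no contrast to stretch */

    unsigned range = (unsigned)(hi - lo);
    for (size_t i = 0; i < n_pixels; ++i)
        gray[i] = (uint8_t)(((unsigned)(gray[i] - lo) * 255u) / range);
}
```

In this sketch the caller would run `to_gray` on the decoded bitmap pixels and then `stretch_contrast` on the result before writing the .bmp that is passed to Tesseract.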
If there are any hints or steps I may have missed to get a proper result, please let me know.

Regards,
Pavel Shcherbakov

