<https://lh3.googleusercontent.com/-AqN-fJd_3ZI/VXChL3kKEHI/AAAAAAAAAEk/R6FctBPztnI/s1600/test.bmp>

Hello,

I have a problem recognizing numbers with a leading minus sign, such as 
'-100' or '-200'. My tests with various samples show that they are almost 
always recognized as '400', '100' and the like: the output almost never 
includes the leading minus sign itself. I need help finding out how I can 
improve the recognition quality.

Here are some facts about my setup and the things I've tried to get the 
correct output:

1. I'm using Tesseract 3.02.02 through the C API functions (TessBaseAPICreate, 
TessBaseAPIInit3, TessBaseAPISetVariable, TessBaseAPIProcessPages, 
TessBaseAPIClear, TessBaseAPIEnd, TessBaseAPIDelete) from Delphi 7 on 
Windows 7 64-bit.
2. The image format is .bmp (I haven't tried any other formats, as I believe 
BMP has no quality loss).
3. Before recognition, I process the image with some decolorization and 
contrast enhancement to improve recognition quality. An example of a 
processed image is attached.
4. I set the character whitelist to '-0123456789' (I've also tried 
'--0123456789', '-0123456789 ' and even '-0' and '-' as a test, but the 
minus sign wasn't recognized even once).
5. I've tried increasing the image size, but the result is still the same.
6. I've tried stretching the image height (still '400') or width (terrible 
results like '111 0' or '11 -21').
7. I've tried to "extend" the minus sign by drawing it in MS Paint myself, 
but it still was not recognized correctly (only once did I draw a "perfect" 
minus sign that was recognized correctly, but unfortunately I couldn't 
repeat it later). I've also tried recognizing text added with the MS Paint 
'Draw Text' tool (Calibri, Arial Black, Times New Roman, 20-72 pt, black 
text on white background), still with no result. This makes me think that 
the image itself could be OK, and that something is wrong with how I'm 
using Tesseract.
8. Nonetheless, unsigned numbers and words (using a different whitelist, of 
course) are recognized nicely with the same code.
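The decolorization, contrast enhancement and enlargement steps described 
above can be sketched roughly as follows. This is only a minimal Python 
illustration under assumptions: the post does not say which methods are 
actually used, so the BT.601 grayscale weights, the linear min/max contrast 
stretch, the nearest-neighbour upscaling, the fixed threshold of 128 and all 
function names here are hypothetical choices. The image is held as nested 
lists of (R, G, B) tuples so no imaging library is needed, and the sketch 
does not call Tesseract at all.

```python
# Hypothetical preprocessing sketch; the poster's real pipeline is unknown.
from typing import List, Tuple

RGB = Tuple[int, int, int]
Gray = List[List[int]]


def to_gray(rgb: List[List[RGB]]) -> Gray:
    # Assumed decolorization: ITU-R BT.601 luma weights.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def stretch_contrast(gray: Gray) -> Gray:
    # Assumed contrast enhancement: map darkest pixel to 0, brightest to 255.
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [row[:] for row in gray]  # flat image: nothing to stretch
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray]


def upscale(gray: Gray, factor: int) -> Gray:
    # Nearest-neighbour enlargement by the same factor on both axes,
    # so thin strokes such as '-' keep their proportions.
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: Gray = []
    for row in gray:
        wide = [p for p in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


def binarize(gray: Gray, threshold: int = 128) -> Gray:
    # Pixels darker than the (assumed) threshold become black text (0),
    # everything else becomes white background (255).
    return [[0 if p < threshold else 255 for p in row] for row in gray]


def preprocess(rgb: List[List[RGB]], factor: int = 3,
               threshold: int = 128) -> Gray:
    return binarize(upscale(stretch_contrast(to_gray(rgb)), factor), threshold)


if __name__ == "__main__":
    bg, ink = (200, 200, 200), (90, 90, 90)
    # A 3x5 light-gray image with a dark-gray minus sign in the middle row.
    minus = [[bg] * 5, [bg] + [ink] * 3 + [bg], [bg] * 5]
    for row in preprocess(minus, factor=2):
        print("".join("#" if p == 0 else "." for p in row))
```

Scaling both axes by the same integer factor keeps the proportions of a 
thin horizontal stroke such as '-', whereas stretching only the height or 
only the width distorts it.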

If there are any hints or steps I might have missed to get the proper 
result, please let me know.

Regards, Pavel Shcherbakov.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
Visit this group at http://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/3fe0cc18-b9e9-4eca-88a0-4ea4449b6a1d%40googlegroups.com.