I didn't go deep into retraining Tesseract-OCR, because at the moment it's 
hard for me to collect enough images of adequate quality to cover all 
possible variations. So, in the end, I solved my problem with a 
workaround.

As things stand, positive and negative numbers in my input data are drawn 
in different colors:

<https://lh3.googleusercontent.com/-neamZkYHLdo/VXQN3h28E_I/AAAAAAAAAIM/vGfS1SMQzpo/s1600/1clean-.bmp>

<https://lh3.googleusercontent.com/-EkF08ybcWc0/VXQN5XjsbaI/AAAAAAAAAIU/i7BVxmQP0yc/s1600/1clean%252B.bmp>


So, after converting them to grayscale, it can be seen that negative 
numbers are darker than positive numbers.


<https://lh3.googleusercontent.com/-5t0gQ9a_5f0/VXQN6r6iP8I/AAAAAAAAAIc/ep5N09rXBG4/s1600/2grey-.bmp>

<https://lh3.googleusercontent.com/-nUYaFYR3ol0/VXQN79FLFVI/AAAAAAAAAIk/ZqlRkLhpRug/s1600/2grey%252B.bmp>
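
To illustrate why the two colors end up at different gray levels, here is a 
minimal Python sketch of a grayscale conversion. It assumes the common 
ITU-R BT.601 luma weights and represents an image as a plain list of rows 
of (R, G, B) tuples. The two sample colors are hypothetical stand-ins, not 
values taken from the screenshots above.

```python
def to_gray(rgb):
    """Convert one (R, G, B) pixel to a 0-255 gray level (BT.601 luma weights)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def image_to_gray(pixels):
    """Convert a 2D list of RGB tuples into a 2D list of gray levels."""
    return [[to_gray(p) for p in row] for row in pixels]


# Hypothetical colors, chosen only for illustration.
NEGATIVE_COLOR = (200, 0, 0)   # assumed color of negative numbers
POSITIVE_COLOR = (0, 220, 0)   # assumed color of positive numbers

# With these weights the assumed negative color maps to a lower gray level.
negative_is_darker = to_gray(NEGATIVE_COLOR) < to_gray(POSITIVE_COLOR)
```

In a real pipeline an imaging library would normally do this conversion; 
the point here is only that different colors map to different brightness.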


After that I find the brightest pixel in the image and, if it's darker 
than some given threshold, I treat the number as negative. In that case I 
just fill the left side of the image, where the minus sign is, with the 
background color, so that the rest can be recognized by Tesseract-OCR 
correctly.


<https://lh3.googleusercontent.com/-kdUtezlhdKQ/VXQN9coCAGI/AAAAAAAAAIs/mMBcRNDJ0fc/s1600/3cut-.bmp>

<https://lh3.googleusercontent.com/-YJaioswXVVE/VXQN_G9_KXI/AAAAAAAAAI0/yZZ4pajc988/s1600/3cut%252B.bmp>
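
The detection and masking step described above can be sketched roughly as 
follows. This is only an illustration, assuming a grayscale image stored as 
a list of rows of 0-255 values with text that is lighter than the 
background. The function names, the threshold and the width of the masked 
strip are all hypothetical and would need tuning on real data.

```python
def brightest_pixel(gray):
    """Return the maximum gray level in a 2D list of 0-255 values."""
    return max(max(row) for row in gray)


def is_negative(gray, threshold):
    """Flag the number as negative if even its brightest pixel is below threshold."""
    return brightest_pixel(gray) < threshold


def blank_left(gray, width, background):
    """Return a copy with the leftmost `width` columns set to the background level."""
    return [[background] * min(width, len(row)) + row[width:] for row in gray]


# Tiny made-up image: dim strokes (level 60) on a black background (level 0).
sample = [
    [0, 60, 0, 60, 60],
    [0, 0, 0, 60, 0],
]

if is_negative(sample, threshold=100):
    # Cover the assumed minus-sign area (first two columns) before OCR.
    sample = blank_left(sample, width=2, background=0)
```

The negative flag has to be remembered alongside the masked image, because 
the sign is no longer visible to the OCR engine after this step.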


Before recognition I also do some more image processing to further improve 
recognition quality.

<https://lh3.googleusercontent.com/-wBvUyWDKMGQ/VXQOASD6neI/AAAAAAAAAI8/ruSDN9jdtsE/s1600/4final-.bmp>

<https://lh3.googleusercontent.com/-HHXqiEDxokc/VXQOBhO7vcI/AAAAAAAAAJE/V7iiKqUz_So/s1600/4final%252B.bmp>


And finally, after recognition, I just prepend the minus sign to the 
result value if the number was detected as negative.


'-100'

'100'
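
The final step can be sketched as a small helper. `apply_sign` is a 
hypothetical name, and the sketch assumes the recognized text arrives as a 
string that may carry surrounding whitespace.

```python
def apply_sign(ocr_text, negative):
    """Prepend '-' to the recognized digits when the image was flagged as negative."""
    text = ocr_text.strip()
    # Avoid a doubled sign in case the OCR engine did read the minus itself.
    if negative and not text.startswith("-"):
        return "-" + text
    return text
```

Here `negative` is the flag computed from the brightness check before the 
left side of the image was filled.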

I believe it's not a very elegant or foolproof solution, but at the 
moment it works fine for me and recognizes 100% of my test dataset. Still, 
I'd be glad to hear any thoughts about detecting a leading minus sign, if 
you have any. Thank you for reading.


Regards, Pavel Shcherbakov.
