I've run some more tests and found that the recognition problem occurs only when the minus sign is at the beginning or at the end of the word; a minus placed between digits is recognized just fine. Here are some test results. To make the test images I used the MS Paint 'Draw Text' tool, Arial font, 26 pt.
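To check results like these mechanically, a box file can be turned back into the recognized string. Each line of a Tesseract box file has the form `symbol left bottom right top page`, with coordinates measured from the bottom-left corner of the image. The following is a minimal Python sketch under that assumption; `Box`, `parse_box_file` and `recognized_text` are hypothetical helpers written for this post, not part of Tesseract. The sample data are the box files of test images 4 and 11, both of which show '-400'.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """One box-file line: symbol left bottom right top page (origin at bottom-left)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_file(text: str) -> List[Box]:
    """Parse box-file text into Box records, skipping blank lines.

    Assumes each symbol is a single whitespace-free token, which holds for
    the digit/minus tests here.
    """
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        symbol, *coords = parts
        left, bottom, right, top, page = (int(c) for c in coords)
        boxes.append(Box(symbol, left, bottom, right, top, page))
    return boxes


def recognized_text(boxes: List[Box]) -> str:
    """Concatenate the recognized symbols in box-file order."""
    return "".join(b.symbol for b in boxes)


# Box files quoted from test images 4 and 11; both images show "-400".
SAMPLES = {
    "test image 4": "- 42 6 53 12 0\n4 50 0 70 25 0\n0 72 0 89 25 0\n0 91 0 108 25 0\n",
    "test image 11": "~ 41 7 51 11 0\n4 52 0 70 25 0\n0 72 0 89 25 0\n0 91 0 108 25 0\n",
}

for name, box_text in SAMPLES.items():
    text = recognized_text(parse_box_file(box_text))
    status = "correct" if text == "-400" else "misread"
    print(f"{name}: {text!r} ({status})")
# test image 4: '-400' (correct)
# test image 11: '~400' (misread)
```

The same loop can be pointed at the other box files to list which leading minus signs came out as '~' and which were merged with the following digit into '4'.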
*Test image 1* <https://lh3.googleusercontent.com/-dqS5C4-FJQI/VXIDnRD9qyI/AAAAAAAAAFA/16PF2xe-iTs/s1600/test-100-.bmp> Box file: > 4 35 1 59 25 0 > 0 65 1 82 25 0 > 0 84 1 101 25 0 > Visualization of the box file: <https://lh3.googleusercontent.com/-gxYb8r01K_Q/VXIDwt5augI/AAAAAAAAAFI/oLZ3nS4aHSc/s1600/test-100-box.bmp> *Test image 2* <https://lh3.googleusercontent.com/-PPTN-wW7e0s/VXID2rbRqLI/AAAAAAAAAFQ/7QoGsGqnen8/s1600/test-200-.bmp> Box file: > ~ 40 8 50 12 0 > 2 51 1 68 26 0 > 0 70 1 87 26 0 > 0 89 1 106 26 0 > Visualization: <https://lh3.googleusercontent.com/-LRkK2ZgZ-Dk/VXID5qE30hI/AAAAAAAAAFY/6b5MJnqz9aY/s1600/test-200-box.bmp> *Test image 3* <https://lh3.googleusercontent.com/-Ks9qfy5GDA0/VXIEGUKW_II/AAAAAAAAAFg/Ckbe27Rf4KE/s1600/test-300-.bmp> Box file: > ~ 35 8 45 12 0 > 3 46 1 63 26 0 > 0 65 1 82 26 0 > 0 84 1 101 26 0 > Visualization: <https://lh3.googleusercontent.com/-GFnXj9-OQcI/VXIEIbIJv0I/AAAAAAAAAFo/ZbLKFxv0CIs/s1600/test-300-box.bmp> *Test image 4* <https://lh3.googleusercontent.com/-VMg-uN6gXvo/VXIEKJEbwMI/AAAAAAAAAFw/sdk0Wt1S9zA/s1600/test-400%252B.bmp> Box file: > - 42 6 53 12 0 > 4 50 0 70 25 0 > 0 72 0 89 25 0 > 0 91 0 108 25 0 > Visualization: <https://lh3.googleusercontent.com/-GsgvahSaDOI/VXIEMVsKCvI/AAAAAAAAAF4/tShOUvUTU6Y/s1600/test-400%252Bbox.bmp> *Test image 5* <https://lh3.googleusercontent.com/-aUUHNFUcL_U/VXIEPDmcIRI/AAAAAAAAAGA/OjVT34eEd2c/s1600/test-500-.bmp> Box file: > ~ 40 7 50 11 0 > 5 51 0 68 25 0 > 0 70 0 87 25 0 > 0 89 0 106 25 0 > Visualization: <https://lh3.googleusercontent.com/-yOEscdTufR8/VXIEQVs8SNI/AAAAAAAAAGI/O65h5OaZY9Q/s1600/test-500-box.bmp> *Test image 6* <https://lh3.googleusercontent.com/-EgL6ujp69Hw/VXIERj6zPfI/AAAAAAAAAGQ/f0Ia0Y_mhMg/s1600/test-600-.bmp> Box file: > 4 39 1 60 26 0 > 5 55 1 67 26 0 > 0 69 1 86 26 0 > 0 88 1 105 26 0 > Visualization: <https://lh3.googleusercontent.com/-YEv0hQYDbCc/VXIES0500wI/AAAAAAAAAGY/kbLh1U_28tA/s1600/test-600-box.bmp> *Test image 7* 
<https://lh3.googleusercontent.com/-eHarD2VM8lg/VXIEUNEx2uI/AAAAAAAAAGg/5L6giRa1meg/s1600/test-700-.bmp> Box file: > ~ 43 7 53 11 0 > 7 54 0 71 25 0 > 0 73 0 90 25 0 > 0 92 0 109 25 0 > Visualization: <https://lh3.googleusercontent.com/--q6Hq0VPCCc/VXIEVlM_9KI/AAAAAAAAAGo/HiEeB1u8lNg/s1600/test-700-box.bmp> *Test image 8* <https://lh3.googleusercontent.com/-ZyIG91bgTs8/VXIEW_JMYdI/AAAAAAAAAGw/VDBAaYKzxQo/s1600/test-800-.bmp> Box file: > ~ 38 7 48 11 0 > 3 49 0 66 25 0 > 0 68 0 85 25 0 > 0 87 0 104 25 0 > Visualization: <https://lh3.googleusercontent.com/-ZUZB4pwVc1Q/VXIEX0ajscI/AAAAAAAAAG4/07nehm3bkuo/s1600/test-800-box.bmp> *Test image 9* <https://lh3.googleusercontent.com/--lo5G7x642k/VXIEZcpNAoI/AAAAAAAAAHA/9ZoqX6WhzPc/s1600/test-900-.bmp> Box file: > ~ 39 7 49 11 0 > 9 50 0 67 25 0 > 0 69 0 86 25 0 > 0 88 0 105 25 0 > Visualization: <https://lh3.googleusercontent.com/-q_A9Yl23yyA/VXIEag6XHiI/AAAAAAAAAHI/2VVd2N9peVY/s1600/test-900-box.bmp> *Test image 10* <https://lh3.googleusercontent.com/-4fYTUHzLNGY/VXIEcIxDY_I/AAAAAAAAAHQ/2h1fDFlnZBg/s1600/test-000-.bmp> Box file: > 4 40 0 61 25 0 > 3 56 0 68 25 0 > 0 70 0 87 25 0 > 0 89 0 106 25 0 > Visualization: <https://lh3.googleusercontent.com/-0NdixkH2v9w/VXIEdcuttPI/AAAAAAAAAHY/_Tfw20u6Q34/s1600/test-000-box.bmp> Note that only 4-th test image ('-400') recognized correctly. That intrigued me and after some experimenting I found out that this specific case is extremly fragile - moving the minus sign even on one pixel from it's position causing incorrect recognition results. So it can be called a fortuity that it recognized correctly. As an example, here are recognition results for a modified picture with a moved minus sign on one pixel to left. 
*Test image 11*
<https://lh3.googleusercontent.com/-8GeSqJuvt_U/VXIEhKFn8aI/AAAAAAAAAHw/v9OX9EMjaj0/s1600/test-400divided-.bmp>
Box file:
> ~ 41 7 51 11 0
> 4 52 0 70 25 0
> 0 72 0 89 25 0
> 0 91 0 108 25 0
Visualization: <https://lh3.googleusercontent.com/-TaXBzR2te3c/VXIEik0IPBI/AAAAAAAAAH4/IPtdaNSZqI0/s1600/test-400divided-box.bmp>

And, for some relief, here is an example of correct recognition. As I said before, a minus sign between digits is recognized perfectly every time.

*Test image 12*
<https://lh3.googleusercontent.com/-XZ-901tr5wg/VXIEeTY5r1I/AAAAAAAAAHg/Hu-B0kxHykU/s1600/test02-02-92%252B.bmp>
Box file:
> 0 10 0 26 25 0
> 2 29 0 45 25 0
> - 49 7 59 11 0
> 0 60 0 76 25 0
> 2 79 0 95 25 0
> - 99 7 109 11 0
> 9 110 0 126 25 0
> 2 129 0 145 25 0
Visualization: <https://lh3.googleusercontent.com/-k7rl0w_UTAc/VXIEfiYzt1I/AAAAAAAAAHo/x7EsqXGWLnw/s1600/test02-02-92%252Bbox.bmp>

I've also tried recognizing other image formats (jpg, png, tif, even gif), but none of them gave correct results. I suppose retraining is the only reliable way to fix this inaccuracy, but I would appreciate any opinions.
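To put numbers on the one-pixel sensitivity, the leading boxes of test images 4 and 11 (both '-400') can be compared directly. The sketch below, again assuming the `symbol left bottom right top page` box format, reports the leading symbol's width (right - left), its height (top - bottom) and the horizontal gap to the first digit (next left - this right; a negative value means the two boxes overlap). `parse` and `describe_leading_symbol` are hypothetical helpers. This only describes the segmentation Tesseract produced; it does not explain why the classifier chose '-' in one case and '~' in the other.

```python
from typing import Dict, List, Tuple, Union

# (symbol, left, bottom, right, top); the trailing page column is dropped.
Box = Tuple[str, int, int, int, int]


def parse(text: str) -> List[Box]:
    """Parse Tesseract box-file lines, ignoring the page number."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if parts:
            left, bottom, right, top = map(int, parts[1:5])
            boxes.append((parts[0], left, bottom, right, top))
    return boxes


def describe_leading_symbol(boxes: List[Box]) -> Dict[str, Union[str, int]]:
    """Size of the first box and its horizontal gap to the second box."""
    symbol, left, bottom, right, top = boxes[0]
    next_left = boxes[1][1]
    return {
        "symbol": symbol,
        "width": right - left,
        "height": top - bottom,
        "gap_to_next": next_left - right,  # negative: boxes overlap
    }


TEST_4 = "- 42 6 53 12 0\n4 50 0 70 25 0\n0 72 0 89 25 0\n0 91 0 108 25 0"
TEST_11 = "~ 41 7 51 11 0\n4 52 0 70 25 0\n0 72 0 89 25 0\n0 91 0 108 25 0"

for name, box_text in (("test image 4", TEST_4), ("test image 11", TEST_11)):
    print(name, describe_leading_symbol(parse(box_text)))
```

For the correctly recognized image 4 the minus box is 11 x 6 and overlaps the '4' by 3 pixels; in image 11 the box Tesseract reports as '~' is 10 x 4 and sits 1 pixel before the '4'.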

