I ran some more tests and found that the recognition problem occurs only 
when the minus sign is at the beginning or at the end of the word; a minus 
sign placed between digits is recognized just fine.
Here are some test results. I made the test images with the MS Paint 'Draw 
Text' tool, using the Arial font at 26pt.
 


*Test image 1*

<https://lh3.googleusercontent.com/-dqS5C4-FJQI/VXIDnRD9qyI/AAAAAAAAAFA/16PF2xe-iTs/s1600/test-100-.bmp>


Box file:

> 4 35 1 59 25 0
> 0 65 1 82 25 0
> 0 84 1 101 25 0
>
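
For reference, each line of a Tesseract box file has the form `<symbol> <left> <bottom> <right> <top> <page>`, with pixel coordinates measured from the bottom-left corner of the image. Below is a minimal Python sketch for reading the listings in this post. The names `Box` and `parse_box_text` are my own helpers, not part of Tesseract.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_text(text: str) -> List[Box]:
    """Parse box-file lines: '<symbol> <left> <bottom> <right> <top> <page>'."""
    boxes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the symbol is whatever precedes the 5 numbers.
        symbol, *numbers = line.strip().rsplit(maxsplit=5)
        boxes.append(Box(symbol, *map(int, numbers)))
    return boxes


# Box file of test image 1, copied from above.
boxes = parse_box_text("""\
4 35 1 59 25 0
0 65 1 82 25 0
0 84 1 101 25 0
""")
recognized = "".join(b.symbol for b in boxes)
widths = [b.right - b.left for b in boxes]
print(recognized, widths)  # 400 [24, 17, 17]
```

The first box is 24 pixels wide against 17 for each '0', which suggests that the minus sign was merged into the first glyph rather than dropped.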

Visualization of the box file:

<https://lh3.googleusercontent.com/-gxYb8r01K_Q/VXIDwt5augI/AAAAAAAAAFI/oLZ3nS4aHSc/s1600/test-100-box.bmp>


*Test image 2*


<https://lh3.googleusercontent.com/-PPTN-wW7e0s/VXID2rbRqLI/AAAAAAAAAFQ/7QoGsGqnen8/s1600/test-200-.bmp>


Box file:

> ~ 40 8 50 12 0
> 2 51 1 68 26 0
> 0 70 1 87 26 0
> 0 89 1 106 26 0
>

Visualization:

<https://lh3.googleusercontent.com/-LRkK2ZgZ-Dk/VXID5qE30hI/AAAAAAAAAFY/6b5MJnqz9aY/s1600/test-200-box.bmp>


*Test image 3*


 
<https://lh3.googleusercontent.com/-Ks9qfy5GDA0/VXIEGUKW_II/AAAAAAAAAFg/Ckbe27Rf4KE/s1600/test-300-.bmp>


Box file:

> ~ 35 8 45 12 0
> 3 46 1 63 26 0
> 0 65 1 82 26 0
> 0 84 1 101 26 0
>

Visualization:

<https://lh3.googleusercontent.com/-GFnXj9-OQcI/VXIEIbIJv0I/AAAAAAAAAFo/ZbLKFxv0CIs/s1600/test-300-box.bmp>


*Test image 4*


<https://lh3.googleusercontent.com/-VMg-uN6gXvo/VXIEKJEbwMI/AAAAAAAAAFw/sdk0Wt1S9zA/s1600/test-400%252B.bmp>


Box file:

> - 42 6 53 12 0
> 4 50 0 70 25 0
> 0 72 0 89 25 0
> 0 91 0 108 25 0
>

Visualization:

<https://lh3.googleusercontent.com/-GsgvahSaDOI/VXIEMVsKCvI/AAAAAAAAAF4/tShOUvUTU6Y/s1600/test-400%252Bbox.bmp>


*Test image 5*


 
<https://lh3.googleusercontent.com/-aUUHNFUcL_U/VXIEPDmcIRI/AAAAAAAAAGA/OjVT34eEd2c/s1600/test-500-.bmp>


Box file:

> ~ 40 7 50 11 0
> 5 51 0 68 25 0
> 0 70 0 87 25 0
> 0 89 0 106 25 0
>

Visualization:

<https://lh3.googleusercontent.com/-yOEscdTufR8/VXIEQVs8SNI/AAAAAAAAAGI/O65h5OaZY9Q/s1600/test-500-box.bmp>


*Test image 6*


 
<https://lh3.googleusercontent.com/-EgL6ujp69Hw/VXIERj6zPfI/AAAAAAAAAGQ/f0Ia0Y_mhMg/s1600/test-600-.bmp>


Box file:

> 4 39 1 60 26 0
> 5 55 1 67 26 0
> 0 69 1 86 26 0
> 0 88 1 105 26 0
>

Visualization:

<https://lh3.googleusercontent.com/-YEv0hQYDbCc/VXIES0500wI/AAAAAAAAAGY/kbLh1U_28tA/s1600/test-600-box.bmp>


*Test image 7*


 
<https://lh3.googleusercontent.com/-eHarD2VM8lg/VXIEUNEx2uI/AAAAAAAAAGg/5L6giRa1meg/s1600/test-700-.bmp>


Box file:

> ~ 43 7 53 11 0
> 7 54 0 71 25 0
> 0 73 0 90 25 0
> 0 92 0 109 25 0
>

Visualization:

<https://lh3.googleusercontent.com/--q6Hq0VPCCc/VXIEVlM_9KI/AAAAAAAAAGo/HiEeB1u8lNg/s1600/test-700-box.bmp>


*Test image 8*


 
<https://lh3.googleusercontent.com/-ZyIG91bgTs8/VXIEW_JMYdI/AAAAAAAAAGw/VDBAaYKzxQo/s1600/test-800-.bmp>


Box file:

> ~ 38 7 48 11 0
> 3 49 0 66 25 0
> 0 68 0 85 25 0
> 0 87 0 104 25 0
>

Visualization:

<https://lh3.googleusercontent.com/-ZUZB4pwVc1Q/VXIEX0ajscI/AAAAAAAAAG4/07nehm3bkuo/s1600/test-800-box.bmp>


*Test image 9*


<https://lh3.googleusercontent.com/--lo5G7x642k/VXIEZcpNAoI/AAAAAAAAAHA/9ZoqX6WhzPc/s1600/test-900-.bmp>


Box file:

> ~ 39 7 49 11 0
> 9 50 0 67 25 0
> 0 69 0 86 25 0
> 0 88 0 105 25 0
>

Visualization:

<https://lh3.googleusercontent.com/-q_A9Yl23yyA/VXIEag6XHiI/AAAAAAAAAHI/2VVd2N9peVY/s1600/test-900-box.bmp>


*Test image 10*


<https://lh3.googleusercontent.com/-4fYTUHzLNGY/VXIEcIxDY_I/AAAAAAAAAHQ/2h1fDFlnZBg/s1600/test-000-.bmp>


Box file:

> 4 40 0 61 25 0
> 3 56 0 68 25 0
> 0 70 0 87 25 0
> 0 89 0 106 25 0
>

Visualization:

<https://lh3.googleusercontent.com/-0NdixkH2v9w/VXIEdcuttPI/AAAAAAAAAHY/_Tfw20u6Q34/s1600/test-000-box.bmp>


Note that only the 4th test image ('-400') was recognized correctly. That 
intrigued me, and after some experimenting I found that this specific 
case is extremely fragile: moving the minus sign by even one pixel from 
its position produces an incorrect result. So it is really just luck 
that it was recognized correctly. As an example, here are the recognition 
results for a modified picture with the minus sign moved one pixel to the left.


*Test image 11*


<https://lh3.googleusercontent.com/-8GeSqJuvt_U/VXIEhKFn8aI/AAAAAAAAAHw/v9OX9EMjaj0/s1600/test-400divided-.bmp>


Box file:

> ~ 41 7 51 11 0
> 4 52 0 70 25 0
> 0 72 0 89 25 0
> 0 91 0 108 25 0
>
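
To put numbers on how little separates the two results, the sketch below compares the first two boxes of test image 4 with those of the shifted image (test image 11). It is only a reading of the box files in this post, not an explanation of what Tesseract does internally, and `leading_geometry` is a hypothetical helper name.

```python
def leading_geometry(first, second):
    """Return (width, height, gap) for the first box relative to the second.

    Each box is (symbol, left, bottom, right, top). A negative gap means
    the two boxes overlap horizontally.
    """
    _, left, bottom, right, top = first
    width = right - left
    height = top - bottom
    gap = second[1] - right
    return width, height, gap


# First two box-file entries, copied from test image 4 and test image 11.
image_4 = [("-", 42, 6, 53, 12), ("4", 50, 0, 70, 25)]
image_11 = [("~", 41, 7, 51, 11), ("4", 52, 0, 70, 25)]

print(leading_geometry(*image_4))   # (11, 6, -3)
print(leading_geometry(*image_11))  # (10, 4, 1)
```

In the correctly recognized image the '-' box is 11x6 pixels and overlaps the '4' box by 3 pixels, while in the shifted image the '~' box is 10x4 pixels and sits 1 pixel clear of the '4'.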

Visualization:

<https://lh3.googleusercontent.com/-TaXBzR2te3c/VXIEik0IPBI/AAAAAAAAAH4/IPtdaNSZqI0/s1600/test-400divided-box.bmp>


And, as a relief, here's an example of correct recognition. As I said 
before, a minus sign between digits is recognized correctly every time.


*Test image 12*


<https://lh3.googleusercontent.com/-XZ-901tr5wg/VXIEeTY5r1I/AAAAAAAAAHg/Hu-B0kxHykU/s1600/test02-02-92%252B.bmp>


Box file:

> 0 10 0 26 25 0
> 2 29 0 45 25 0
> - 49 7 59 11 0
> 0 60 0 76 25 0
> 2 79 0 95 25 0
> - 99 7 109 11 0
> 9 110 0 126 25 0
> 2 129 0 145 25 0
>
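
For comparison, the two '-' boxes in this box file have the same size as the '~' box reported for test image 11 (10 pixels wide, 4 high). Judging by the box files alone, the symbol seems to be segmented the same way in both cases, and only the assigned label differs. A small sketch of that check, with values copied from the listings in this post (`size` is a hypothetical helper name):

```python
def size(box):
    """Return (width, height) of a (symbol, left, bottom, right, top) box."""
    _, left, bottom, right, top = box
    return right - left, top - bottom


# '-' boxes between digits (test image 12) and the leading '~' box (test image 11).
inner_minus = [("-", 49, 7, 59, 11), ("-", 99, 7, 109, 11)]
leading_tilde = ("~", 41, 7, 51, 11)

print([size(b) for b in inner_minus])  # [(10, 4), (10, 4)]
print(size(leading_tilde))             # (10, 4)
```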

Visualization:

<https://lh3.googleusercontent.com/-k7rl0w_UTAc/VXIEfiYzt1I/AAAAAAAAAHo/x7EsqXGWLnw/s1600/test02-02-92%252Bbox.bmp>


I've also tried recognizing other image formats (jpg, png, tif, even gif), 
but none of them gave correct results.
I suppose that retraining is the only reliable way to fix this 
inaccuracy, but I would appreciate any opinions.
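
Short of retraining, one stopgap I can think of is a post-processing pass over the recognized text. The sketch below rests on an assumption, namely that the input is numeric-only so a '~' can never be legitimate: it rewrites a '~' that starts a word and is directly followed by a digit as '-'. It would cover the '~' misreads above (for example test images 2 and 11), but not the cases where no '~' appears at all (test images 1, 6 and 10). `fix_leading_minus` is a hypothetical helper name.

```python
import re

# A '~' at the start of a word (start of text or after whitespace),
# immediately followed by a digit.
_LEADING_TILDE = re.compile(r"(?<!\S)~(?=\d)")


def fix_leading_minus(text: str) -> str:
    """Replace a word-initial '~' before a digit with '-'."""
    return _LEADING_TILDE.sub("-", text)


print(fix_leading_minus("~400"))      # -400
print(fix_leading_minus("4500"))      # 4500 (unchanged)
print(fix_leading_minus("a~b ~ 12"))  # a~b ~ 12 (unchanged)
```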

