Have a look at the API [1]. The output of the attached program shows that a
lot of detail can be gleaned at this level, including the confidence of the
selected character and of each alternative candidate. Compiling against
Tesseract is fairly straightforward on Ubuntu, at least; I don't know about
Windows or OS X.
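
For the filtering itself, here is a minimal sketch of thresholding on confidence once the per-symbol values have been collected. `SymbolResult` and `keepConfidentDigits` are hypothetical names made up for illustration, not part of Tesseract; in a real program the fields would be filled from the result iterator, as the attached program does. The confidences in the attached output are on a 0-100 scale, so a threshold of `0.9` would correspond to 90 here.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical record for one recognized symbol (not a Tesseract type).
struct SymbolResult {
    std::string text;   // UTF-8 text of the best choice for the symbol
    float confidence;   // 0-100 scale, as printed by the attached program
};

// Keep only single ASCII digits whose confidence is at least min_conf.
std::vector<SymbolResult> keepConfidentDigits(
    const std::vector<SymbolResult>& symbols, float min_conf)
{
    // The threshold is expected on the same 0-100 scale as the confidences.
    assert(min_conf >= 0.0f && min_conf <= 100.0f);

    std::vector<SymbolResult> kept;
    for (const SymbolResult& s : symbols) {
        const bool is_digit = s.text.size() == 1 &&
            std::isdigit(static_cast<unsigned char>(s.text[0])) != 0;
        if (is_digit && s.confidence >= min_conf) {
            kept.push_back(s);
        }
    }
    return kept;
}
```

Fed the best choices `1` (89.93), `E` (59.24), `-` (88.71), `3` (79.71), `1` (95.12) and `5` (87.61) from the first text line of the attached output, a threshold of 85 keeps the two `1`s and the `5`; the `3` falls below the threshold and the other symbols are not digits.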

art
---
1. https://github.com/tesseract-ocr/tesseract/wiki/APIExample

From: [email protected] [mailto:[email protected]] On 
Behalf Of ???
Sent: Friday, October 20, 2017 4:18 AM
To: tesseract-ocr <[email protected]>
Subject: Re: [tesseract-ocr] How to get digital and the confidence?

I think that is a different question. Here I want to filter the recognized
characters by confidence, but I don't know how to get the confidence from
Tesseract.

On Friday, October 20, 2017 at 4:13:35 PM UTC+8, shree wrote:
Your image is 96 dpi. Increase the dpi to 300 and try.

Preprocess the image to remove the boxes around letters, if possible.

See https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality
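
A small sketch of the arithmetic behind the DPI advice: going from 96 dpi to 300 dpi means enlarging the pixel dimensions by 300/96 = 3.125. The `PixelSize` struct and both function names are hypothetical, for illustration only; the actual resampling would be done with an image tool or library (for example Leptonica's `pixScale`), which is not shown here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type: image dimensions in pixels.
struct PixelSize {
    int width;
    int height;
};

// Factor by which the pixel dimensions must grow so that an image scanned
// at source_dpi has the same physical size at target_dpi.
double dpiScaleFactor(double source_dpi, double target_dpi)
{
    assert(source_dpi > 0.0 && target_dpi > 0.0);
    return target_dpi / source_dpi;
}

// Pixel dimensions after scaling by factor, rounded to the nearest pixel.
PixelSize scaledSize(PixelSize original, double factor)
{
    return PixelSize{
        static_cast<int>(std::lround(original.width * factor)),
        static_cast<int>(std::lround(original.height * factor))};
}
```

For instance, an 800 x 160 pixel image at 96 dpi would become 2500 x 500 pixels at 300 dpi.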

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Fri, Oct 20, 2017 at 1:24 PM, 朱裕清 <[email protected]> wrote:
This is my target image:

![target image](https://i.stack.imgur.com/UYMZJ.png)

Actually, my question is similar to [this 
post](https://stackoverflow.com/questions/4944830/how-to-make-tesseract-to-recognize-only-numbers-when-they-are-mixed-with-letter).
 But I don't know why the answer there goes in a different direction. I 
just want to get the digits that have a high degree of confidence. For 
example, I can do this with another language:

![](https://i.stack.imgur.com/rF1gP.png)

Then I can keep only the results whose confidence is above a threshold of 
`0.9`. But now I hope to do this with *Tesseract*.

First, I trained a *number.traineddata* just for recognizing numbers. You can 
get it [here](https://1drv.ms/u/s!Aumb0ijJibxOi1KVXFjwDzOVRQrm).


tesseract.exe target.jpg stdout -l number --oem 0 -psm 6


![](https://i.stack.imgur.com/OzgBS.png)

Note that this returns all digits, both high-confidence and low-confidence 
ones. Can we recognize the numbers and also get the confidence of each one? I 
cannot find any information on how to implement this. If *Tesseract* cannot do 
it, is there another **C++**-based method that can? Could anyone give me some 
pointers?
--
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/273d9f86-39ce-42fe-8934-781f2103e4fa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

->Start block
-->Start para
--->Start textline
---->Start word
     Font details:
      name: Century_Schoolbook_L_Bold
      bold: true
      italic: false
      underlined: false
      monospace: false
      serif: true
      smallcaps: false
      pointsize: 21
      font_id: 60
----->Symbol(s): x1: 11, y1: 14, x2: 21 y2: 31
      1 conf: 89.931801
       l conf: 83.995880
       i conf: 76.661751
----->Symbol(s): x1: 37, y1: 15, x2: 74 y2: 36
      E conf: 59.237701
       @ conf: 53.401047
       $ conf: 50.958675
       fl conf: 48.651180
       § conf: 47.086262
       fi conf: 46.461655
----->Symbol(s): x1: 83, y1: 15, x2: 121 y2: 36
      E conf: 58.209717
       K conf: 52.477173
       $ conf: 52.456875
       i conf: 52.380096
       @ conf: 49.374390
       § conf: 48.515774
       £ conf: 47.806328
       fi conf: 47.651409
       fl conf: 45.653095
       & conf: 43.965816
----->Symbol(s): x1: 131, y1: 15, x2: 168 y2: 36
      - conf: 88.712219
       _ conf: 86.145760
       — conf: 84.812286
----->Symbol(s): x1: 177, y1: 15, x2: 215 y2: 36
      @ conf: 59.478523
       E conf: 58.393898
       $ conf: 49.511791
---->Start word
     Font details:
      name: Verdana_Bold
      bold: true
      italic: false
      underlined: false
      monospace: false
      serif: false
      smallcaps: false
      pointsize: 21
      font_id: 317
----->Symbol(s): x1: 296, y1: 14, x2: 307 y2: 31
      3 conf: 79.707809
       $ conf: 68.919724
       S conf: 68.013824
----->Symbol(s): x1: 319, y1: 15, x2: 405 y2: 36
      " conf: 79.930550
       “ conf: 77.794098
       n conf: 76.540306
       u conf: 75.111694
       ” conf: 70.770447
----->Symbol(s): x1: 414, y1: 15, x2: 416 y2: 36
      | conf: 99.690750
       I conf: 97.735306
       l conf: 96.670387
----->Symbol(s): x1: 414, y1: 15, x2: 452 y2: 36
      E conf: 62.716747
       £ conf: 54.273201
       $ conf: 54.205250
       fl conf: 54.006363
       @ conf: 53.053898
       § conf: 52.804672
       fi conf: 50.822838
       3 conf: 50.051003
----->Symbol(s): x1: 461, y1: 15, x2: 463 y2: 36
      | conf: 88.676117
       l conf: 81.228409
----->Symbol(s): x1: 461, y1: 15, x2: 496 y2: 36
      I conf: 63.768780
       1 conf: 63.171150
       l conf: 62.177311
       i conf: 58.743431
       E conf: 55.793533
       £ conf: 55.474258
       j conf: 49.850060
       § conf: 49.404373
       K conf: 49.255898
----->Symbol(s): x1: 494, y1: 15, x2: 499 y2: 36
      | conf: 99.678108
       l conf: 98.545494
       I conf: 98.339149
---->Start word
     Font details:
      name: Verdana_Bold
      bold: true
      italic: false
      underlined: false
      monospace: false
      serif: false
      smallcaps: false
      pointsize: 21
      font_id: 317
----->Symbol(s): x1: 593, y1: 15, x2: 602 y2: 32
      1 conf: 95.122108
       l conf: 87.608086
----->Symbol(s): x1: 605, y1: 15, x2: 615 y2: 32
      5 conf: 87.609360
       § conf: 74.390923
       S conf: 73.789215
----->Symbol(s): x1: 625, y1: 14, x2: 664 y2: 36
      - conf: 88.515839
       _ conf: 84.997162
       — conf: 84.629189
----->Symbol(s): x1: 718, y1: 14, x2: 758 y2: 36
      - conf: 85.411400
       _ conf: 82.293083
       — conf: 81.396370
----->Symbol(s): x1: 672, y1: 14, x2: 709 y2: 36
      I conf: 66.698105
       1 conf: 63.526737
       l conf: 61.748486
       E conf: 60.197342
       i conf: 58.322693
       £ conf: 55.986095
       $ conf: 52.553886
----->Symbol(s): x1: 707, y1: 14, x2: 711 y2: 36
      | conf: 99.683777
       I conf: 97.731819
       l conf: 96.670387
----->Symbol(s): x1: 767, y1: 15, x2: 770 y2: 36
      | conf: 88.676117
       l conf: 81.228409
----->Symbol(s): x1: 767, y1: 15, x2: 805 y2: 36
      I conf: 64.318085
       1 conf: 61.494984
       l conf: 60.348309
       Z conf: 56.929646
       i conf: 56.537701
       E conf: 54.513718
       £ conf: 54.003593
       K conf: 50.321625
----->Symbol(s): x1: 801, y1: 15, x2: 805 y2: 36
      J conf: 82.018463
       l conf: 78.605148
       I conf: 76.047478
       1 conf: 75.599884
       ] conf: 74.380348
       ! conf: 74.294319
       } conf: 74.253632
--->Start textline
---->Start word
     Font details:
      name: Verdana
      bold: false
      italic: false
      underlined: false
      monospace: false
      serif: false
      smallcaps: false
      pointsize: 21
      font_id: 316
----->Symbol(s): x1: 11, y1: 48, x2: 21 y2: 64
      2 conf: 79.138992
       Z conf: 77.948776
       z conf: 70.518303
       l conf: 68.821663
----->Symbol(s): x1: 37, y1: 48, x2: 74 y2: 69
      E conf: 60.971474
       $ conf: 53.594707
       £ conf: 52.193092
       i conf: 50.993881
       § conf: 50.710670
       @ conf: 49.891197
       fi conf: 46.740971
       & conf: 46.092949
----->Symbol(s): x1: 70, y1: 48, x2: 89 y2: 69
      H conf: 79.952812
       N conf: 74.207687
       M conf: 74.169350
----->Symbol(s): x1: 83, y1: 48, x2: 121 y2: 69
      I conf: 64.840027
       E conf: 60.899055
       1 conf: 58.392452
       £ conf: 57.880394
       l conf: 56.296944
       K conf: 55.399208
       i conf: 55.059166
       X conf: 52.708771
       § conf: 52.596466
----->Symbol(s): x1: 131, y1: 48, x2: 168 y2: 69
      - conf: 88.712219
       _ conf: 86.145760
       — conf: 84.812286
----->Symbol(s): x1: 177, y1: 48, x2: 215 y2: 69
      @ conf: 59.407616
       E conf: 54.503513
---->Start word
     Font details:
      name: Trebuchet_MS
      bold: false
      italic: false
      underlined: false
      monospace: false
      serif: false
      smallcaps: false
      pointsize: 21
      font_id: 295
----->Symbol(s): x1: 296, y1: 48, x2: 307 y2: 65
      9 conf: 86.562469
----->Symbol(s): x1: 319, y1: 48, x2: 357 y2: 69
      - conf: 88.942566
       _ conf: 86.370323
       — conf: 85.031242
----->Symbol(s): x1: 413, y1: 48, x2: 452 y2: 69
      - conf: 88.942566
       _ conf: 86.370323
       — conf: 85.031242
----->Symbol(s): x1: 366, y1: 48, x2: 403 y2: 69
      E conf: 61.234783
       § conf: 53.852997
       £ conf: 51.375343
       i conf: 51.331131
       K conf: 51.299850
       $ conf: 49.915943
       fi conf: 49.619556
       @ conf: 49.598892
       fl conf: 49.552441
       ® conf: 46.762474
----->Symbol(s): x1: 400, y1: 48, x2: 406 y2: 69
      I conf: 80.192734
       l conf: 80.140518
       1 conf: 75.111221
       ] conf: 75.103256
       i conf: 72.018944
       } conf: 71.933495
       ! conf: 68.319191
----->Symbol(s): x1: 461, y1: 48, x2: 496 y2: 69
      E conf: 56.619549
       $ conf: 51.349327
       § conf: 50.727558
       J conf: 49.731922
       fi conf: 49.565971
       @ conf: 49.054981
       i conf: 48.830215
       fl conf: 48.670063
       £ conf: 48.016785
       Q conf: 47.968769
----->Symbol(s): x1: 493, y1: 48, x2: 499 y2: 69
      ] conf: 82.970512
       l conf: 82.810471
       I conf: 80.548431
       1 conf: 77.296616
       i conf: 75.613907
       } conf: 75.108841
---->Start word
     Font details:
      name: Verdana_Bold
      bold: true
      italic: false
      underlined: false
      monospace: false
      serif: false
      smallcaps: false
      pointsize: 21
      font_id: 317
----->Symbol(s): x1: 593, y1: 48, x2: 602 y2: 65
      1 conf: 96.127319
       l conf: 88.378372
----->Symbol(s): x1: 604, y1: 48, x2: 615 y2: 65
      6 conf: 80.859253
       G conf: 73.840683
       fi conf: 71.738495
       € conf: 71.620415
       E conf: 70.836113
       § conf: 69.450500
       S conf: 68.940498
       fl conf: 66.893127
       $ conf: 66.766830
----->Symbol(s): x1: 625, y1: 48, x2: 758 y2: 69
      " conf: 79.498199
       ” conf: 78.719879
       “ conf: 75.898613
       u conf: 75.464127
       n conf: 73.290276
       H conf: 70.139511
       U conf: 69.778442
----->Symbol(s): x1: 672, y1: 48, x2: 709 y2: 69
      I conf: 65.886452
       1 conf: 62.073864
       E conf: 57.697769
       i conf: 56.664619
       $ conf: 51.611328
----->Symbol(s): x1: 707, y1: 48, x2: 711 y2: 69
      | conf: 98.356125
       I conf: 96.818039
       l conf: 96.705261
----->Symbol(s): x1: 767, y1: 48, x2: 770 y2: 69
      | conf: 89.581299
       l conf: 82.386742
----->Symbol(s): x1: 767, y1: 48, x2: 805 y2: 69
      I conf: 63.940907
       1 conf: 59.138775
       l conf: 58.213902
       Z conf: 56.786251
       i conf: 56.111240
       £ conf: 53.649902
       E conf: 53.434944
       § conf: 51.297283
----->Symbol(s): x1: 801, y1: 48, x2: 805 y2: 69
      ] conf: 82.581589
       1 conf: 76.473320
       } conf: 71.298218
       l conf: 68.645561
--->Start textline
---->Start word
     Font details:
      name: Verdana_Bold
      bold: true
      italic: false
      underlined: false
      monospace: false
      serif: false
      smallcaps: false
      pointsize: 21
      font_id: 317
----->Symbol(s): x1: 11, y1: 81, x2: 21 y2: 98
      3 conf: 85.559586
----->Symbol(s): x1: 37, y1: 86, x2: 50 y2: 103
      L conf: 63.747787
----->Symbol(s): x1: 37, y1: 82, x2: 74 y2: 103
      T conf: 70.555412
----->Symbol(s): x1: 61, y1: 82, x2: 74 y2: 103
      J conf: 60.052521
----->Symbol(s): x1: 83, y1: 82, x2: 86 y2: 102
      l conf: 92.044502
       I conf: 91.600403
       | conf: 90.471542
       [ conf: 84.678436
       i conf: 78.262184
----->Symbol(s): x1: 87, y1: 82, x2: 108 y2: 102
      3 conf: 62.102150
       E conf: 56.818970
       $ conf: 53.791138
       § conf: 52.500908
       i conf: 52.108189
       £ conf: 51.891251
       } conf: 51.663471
       fi conf: 50.492054
----->Symbol(s): x1: 110, y1: 82, x2: 134 y2: 102
      U conf: 47.428085
       W conf: 46.122742
       H conf: 45.945786
       T conf: 45.747604
       1 conf: 45.260666
       I conf: 43.324814
       N conf: 41.100128
       K conf: 37.174133
       fi conf: 36.480530
       X conf: 36.448853
----->Symbol(s): x1: 135, y1: 82, x2: 168 y2: 102
      I conf: 59.096680
       f conf: 56.660698
       Z conf: 53.634380
       E conf: 53.483654
       § conf: 53.450756
       3 conf: 51.726677
       i conf: 49.731819
       K conf: 48.559818
       fi conf: 47.356853
       ? conf: 46.857929
----->Symbol(s): x1: 165, y1: 82, x2: 180 y2: 102
      H conf: 79.664330
       N conf: 69.253906
----->Symbol(s): x1: 181, y1: 82, x2: 211 y2: 102
      j conf: 56.324745
       I conf: 55.692390
       E conf: 52.452335
       § conf: 51.819668
       f conf: 51.451538
       Z conf: 51.244164
       fi conf: 50.728230
       fl conf: 49.817715
       i conf: 49.536133
       1 conf: 49.320961
----->Symbol(s): x1: 205, y1: 82, x2: 215 y2: 102
      J conf: 75.697052
---->Start word
     Font details:
      name: Verdana_Bold
      bold: true
      italic: false
      underlined: false
      monospace: false
      serif: false
      smallcaps: false
      pointsize: 21
      font_id: 317
----->Symbol(s): x1: 592, y1: 82, x2: 615 y2: 98
      H conf: 72.904251
       N conf: 71.727982
       fl conf: 71.510674
       n conf: 70.780136
       fi conf: 69.174377
       v conf: 64.948578
       y conf: 58.213177
----->Symbol(s): x1: 625, y1: 81, x2: 663 y2: 102
      @ conf: 56.882828
       w conf: 53.960770
       ® conf: 53.951904
       E conf: 50.006145
       Q conf: 45.344917
       & conf: 44.476360
       $ conf: 43.746780
----->Symbol(s): x1: 672, y1: 81, x2: 710 y2: 102
      £ conf: 59.107914
       i conf: 55.733929
       E conf: 47.672760
       $ conf: 45.034451
----->Symbol(s): x1: 707, y1: 81, x2: 722 y2: 102
      H conf: 87.944679
       M conf: 78.044083
       N conf: 75.669800
----->Symbol(s): x1: 719, y1: 81, x2: 758 y2: 102
      L conf: 57.087746
       Q conf: 54.929131
       £ conf: 53.656570
       fl conf: 52.296772
       E conf: 50.422203
       @ conf: 50.020218
       § conf: 48.993279
       fi conf: 46.869476
       g conf: 43.846817
       $ conf: 43.130569
----->Symbol(s): x1: 767, y1: 81, x2: 770 y2: 102
      I conf: 72.456284
       [ conf: 70.036972
       i conf: 67.350258
       { conf: 67.206100
       K conf: 60.327148
----->Symbol(s): x1: 767, y1: 81, x2: 802 y2: 102
      l conf: 60.077377
       i conf: 55.453255
       1 conf: 55.393631
       j conf: 54.418472
       L conf: 54.331619
       £ conf: 54.100616
       J conf: 49.822998
       Q conf: 46.445877
       § conf: 46.120979
       @ conf: 45.312500
----->Symbol(s): x1: 800, y1: 81, x2: 805 y2: 102
      | conf: 91.200478
       l conf: 90.304131
       I conf: 88.383553
       i conf: 76.797340
--->Start textline
---->Start word
     Font details:
      name: Trebuchet_MS
      bold: false
      italic: false
      underlined: false
      monospace: false
      serif: false
      smallcaps: false
      pointsize: 21
      font_id: 295
----->Symbol(s): x1: 11, y1: 114, x2: 23 y2: 131
      4 conf: 81.138107
----->Symbol(s): x1: 37, y1: 115, x2: 74 y2: 136
      E conf: 57.821243
       $ conf: 52.355614
       fl conf: 51.528450
       @ conf: 50.219479
       fi conf: 48.979576
       £ conf: 48.071854
       § conf: 46.317833
----->Symbol(s): x1: 83, y1: 115, x2: 87 y2: 136
      | conf: 91.059731
       l conf: 90.372162
       I conf: 87.832039
       ! conf: 81.340881
----->Symbol(s): x1: 83, y1: 115, x2: 121 y2: 136
      E conf: 61.190586
       £ conf: 57.133331
       1 conf: 55.225716
       i conf: 54.875031
       § conf: 54.231987
       $ conf: 51.957684
       fi conf: 47.398830
----->Symbol(s): x1: 131, y1: 115, x2: 168 y2: 136
      - conf: 88.712219
       _ conf: 86.145760
       — conf: 84.812286
----->Symbol(s): x1: 177, y1: 115, x2: 215 y2: 136
      @ conf: 60.305641
       $ conf: 50.887508
--->Start textline
---->Start word
     Font details:
      name: Verdana_Bold
      bold: true
      italic: false
      underlined: false
      monospace: false
      serif: false
      smallcaps: false
      pointsize: 21
      font_id: 317
----->Symbol(s): x1: 11, y1: 149, x2: 21 y2: 166
      5 conf: 90.268242
       $ conf: 77.653046
       S conf: 77.592651
       § conf: 75.649452
---->Start word
     Font details:
      name: Verdana_Bold
      bold: true
      italic: false
      underlined: false
      monospace: false
      serif: false
      smallcaps: false
      pointsize: 21
      font_id: 317
----->Symbol(s): x1: 37, y1: 149, x2: 75 y2: 170
      w conf: 56.742764
       W conf: 53.940559
       @ conf: 43.838638
----->Symbol(s): x1: 84, y1: 149, x2: 122 y2: 170
      @ conf: 50.607704
       m conf: 50.549244
       ® conf: 47.672546
       fl conf: 47.557049
       w conf: 47.033524
       E conf: 44.665962
       & conf: 42.775955
       $ conf: 39.692707
       fi conf: 38.713459
----->Symbol(s): x1: 131, y1: 149, x2: 135 y2: 170
      | conf: 92.728096
       l conf: 92.188660
       I conf: 87.789482
----->Symbol(s): x1: 131, y1: 149, x2: 169 y2: 170
      £ conf: 56.105175
       i conf: 50.756767
       L conf: 50.313000
       1 conf: 47.445774
       l conf: 46.876053
       Q conf: 42.748451
       § conf: 41.579365
       $ conf: 41.516594
----->Symbol(s): x1: 165, y1: 149, x2: 181 y2: 170
      H conf: 86.541916
       M conf: 77.263245
       N conf: 74.392952
       U conf: 71.702118
       fl conf: 71.566193
----->Symbol(s): x1: 177, y1: 149, x2: 216 y2: 170
      L conf: 57.775211
       i conf: 47.642044
       l conf: 46.858337
       £ conf: 44.453754
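
Since the output above lists the best candidate first under each `Symbol(s)` line, it can also be post-processed as plain text. The following is a minimal sketch, assuming exactly the line layout shown above; `BestChoice` and `parseBestChoices` are hypothetical names for illustration, and malformed numbers are not handled (`std::stod` would throw).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: the first (best) candidate listed for one symbol.
struct BestChoice {
    std::string text;
    double confidence;   // 0-100 scale, as printed in the dump
};

// Collect the first "<text> conf: <value>" line after each
// "----->Symbol(s):" line; the remaining candidates are skipped.
std::vector<BestChoice> parseBestChoices(const std::string& dump)
{
    const std::string symbol_tag = "----->Symbol(s):";
    const std::string conf_tag = " conf: ";

    std::vector<BestChoice> best;
    bool expect_best = false;
    std::istringstream in(dump);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, symbol_tag.size(), symbol_tag) == 0) {
            expect_best = true;   // the next candidate line is the best one
            continue;
        }
        const std::size_t pos = line.rfind(conf_tag);
        if (pos == std::string::npos || !expect_best) {
            continue;             // not a candidate line, or not the first
        }
        const std::size_t first = line.find_first_not_of(' ');
        if (first == std::string::npos || first >= pos) {
            continue;             // no candidate text before " conf: "
        }
        assert(pos + conf_tag.size() <= line.size());
        best.push_back(BestChoice{
            line.substr(first, pos - first),
            std::stod(line.substr(pos + conf_tag.size()))});
        expect_best = false;
    }
    return best;
}
```

The result is one entry per symbol, which can then be filtered by confidence and character class as needed.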
/*
    info.cpp - extract detailed information using Tesseract API

        ./info -i [image] -l [lang] -p [psm]
        note that psm is based on Tesseract numbers, see tesseract --help-psm
 
    - art rhyno <https://github.com/artunit/>
    (c) Copyright GNU General Public License (GPL)
*/

#include <cstdio>    // printf, fprintf
#include <cstdlib>   // atoi, exit
#include <iostream>
#include <string>

#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

// Map the numeric -p argument to Tesseract's PageSegMode, following the
// numbering printed by `tesseract --help-psm`.
tesseract::PageSegMode sortOutPsms(int psm)
{
    if (psm == 0) return tesseract::PSM_OSD_ONLY;
    if (psm == 1) return tesseract::PSM_AUTO_OSD;
    if (psm == 2) return tesseract::PSM_AUTO_ONLY;
    if (psm == 3) return tesseract::PSM_AUTO;
    if (psm == 4) return tesseract::PSM_SINGLE_COLUMN;
    if (psm == 5) return tesseract::PSM_SINGLE_BLOCK_VERT_TEXT;
    if (psm == 6) return tesseract::PSM_SINGLE_BLOCK;
    if (psm == 7) return tesseract::PSM_SINGLE_LINE;
    if (psm == 8) return tesseract::PSM_SINGLE_WORD;
    if (psm == 9) return tesseract::PSM_CIRCLE_WORD;
    if (psm == 10) return tesseract::PSM_SINGLE_CHAR;
    if (psm == 11) return tesseract::PSM_SPARSE_TEXT;
    if (psm == 12) return tesseract::PSM_SPARSE_TEXT_OSD;
    if (psm == 13) return tesseract::PSM_RAW_LINE;
    // Any other value falls back to automatic segmentation with OSD
    return tesseract::PSM_AUTO_OSD;
}//sortOutPsms

void showFontInfo(const char* font_name, bool is_bold, bool is_italic, 
    bool is_underlined, bool is_monospace, bool is_serif, bool is_smallcaps,
    int pointsize, int font_id) 
{
    printf("     Font details:\n");
    printf("      name: %s\n", font_name);
    printf("      bold: %s\n",(is_bold ? "true" : "false"));
    printf("      italic: %s\n",(is_italic ? "true" : "false"));
    printf("      underlined: %s\n",(is_underlined ? "true" : "false"));
    printf("      monospace: %s\n",(is_monospace ? "true" : "false"));
    printf("      serif: %s\n",(is_serif ? "true" : "false"));
    printf("      smallcaps: %s\n",(is_smallcaps ? "true" : "false"));
    printf("      pointsize: %d\n", pointsize);
    printf("      font_id: %d\n", font_id);
}//showFontInfo

int main(int argc, char* argv[])
{
    const char* default_lang = "eng";
    const char* default_img = "default.jpg";
    int default_psm = 4;
 
    bool bold, italic, underlined, monospace, serif, smallcaps; 
    int pointsize, font_id;

    std::string lang = default_lang;
                
    std::string img = default_img;
    int psm = default_psm;

    //sort out parameters
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg == "-i") {
            if (i + 1 < argc) {
                img = argv[++i];
            } else {
                std::cerr << "-i option requires image argument." << std::endl;
                return 1;   // non-zero exit status for a usage error
            }//if
        } else if (arg == "-l") {
            if (i + 1 < argc) {
                lang = argv[++i];
                printf("LANG: %s\n",lang.c_str());
            } else {
                std::cerr << "-l option requires lang argument." << std::endl;
                return 1;
            }//if
        } else if (arg == "-p") {
            if (i + 1 < argc) {
                psm = atoi(argv[++i]);
            } else {
                std::cerr << "-p option requires psm argument." << std::endl;
                return 1;
            }//if
        }//if
    }//for

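    //set up API
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();

    if (api->Init(NULL, lang.c_str())) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }//if
    // Set the page segmentation mode after Init(): a mode set on an
    // uninitialized API object may be discarded when Init() runs.
    api->SetPageSegMode(sortOutPsms(psm));

    Pix *image = pixRead(img.c_str());
    if (image == NULL) {
        fprintf(stderr, "Could not read image %s.\n", img.c_str());
        api->End();
        exit(1);
    }//if
    api->SetImage(image);
    // Important: Recognize() must run before the iterator has results
    api->Recognize(NULL);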

    // Get OCR result
    tesseract::ResultIterator* ri = api->GetIterator();
    tesseract::PageIteratorLevel level = tesseract::RIL_SYMBOL;
                
    if(ri != 0) {
        do {
            // Caller owns the returned text; it is freed with delete[] below
            const char* symbol = ri->GetUTF8Text(level);
            if (ri->IsAtBeginningOf(tesseract::RIL_BLOCK)) { printf("->Start block\n");}
            if (ri->IsAtBeginningOf(tesseract::RIL_PARA)) { printf("-->Start para\n");}
            if (ri->IsAtBeginningOf(tesseract::RIL_TEXTLINE)) { printf("--->Start textline\n");}
            if (ri->IsAtBeginningOf(tesseract::RIL_WORD)) {
                printf("---->Start word\n");
                const char *font_name = ri->WordFontAttributes(&bold, &italic,
                    &underlined, &monospace, &serif, &smallcaps,
                    &pointsize, &font_id);
                // font_name is null when no font information is available
                if (font_name != 0) {
                    showFontInfo(font_name, bold, italic, underlined,
                        monospace, serif, smallcaps, pointsize, font_id);
                }
            }
            if (symbol != 0) {
                int x1, y1, x2, y2;
                ri->BoundingBox(level, &x1, &y1, &x2, &y2);
                printf("----->Symbol(s): x1: %d, y1: %d, x2: %d y2: %d\n",
                    x1, y1, x2, y2);
                // Walk all candidates for this symbol, best one first
                tesseract::ChoiceIterator ci(*ri);
                bool start = true;
                do {
                    // Owned by the ChoiceIterator; must not be freed here
                    const char* choice = ci.GetUTF8Text();
                    if (!start) printf(" ");
                    printf("      %s conf: %f\n", choice, ci.Confidence());
                    start = false;
                } while(ci.Next());
            }//if
            delete[] symbol;
        } while((ri->Next(level)));
        delete ri;
    }//if
    // Destroy used objects and release memory
    api->End();
    delete api;
    pixDestroy(&image);

    return 0;
}
