I am reading the MRZ of ID cards and passports. Most of the time the OCR is perfect, but sometimes I would like to iterate over the choices in order to fix errors. However, for some images there are choices missing, and as far as I have seen it is always one full row.

Why? Am I doing it wrong? Or is it a bug?

In the example below the first row of the image does not return any choices at all, even though the row itself is recognized, as the OCR output shows.

Image: <https://cloud.githubusercontent.com/assets/1453778/10577389/2f78984c-766b-11e5-9791-61a79165e3b0.jpg>

Output:

    IELVAEA99907431101080<88884<<<
    8010100M1702091EST<<<<<<<<<<<2
    SPECIMEN<<ANDREW<<<<<<<<<<<<<<

So far all good. choiceIterator output (one entry per character; blank lines between rows added for readability):

    (
    ( ), ( ), ( ), ( ), ( ), ( ), ( ), ( ), ( ), ( ),
    ( ), ( ), ( ), ( ), ( ), ( ), ( ), ( ), ( ), ( ),
    ( ), ( ), ( ), ( ), ( ), ( ), ( ), ( ), ( ), ( ),

    ( "(81.30%) '8'" ),
    ( "(82.51%) '0'", "(75.10%) 'B'", "(71.87%) 'O'", "(71.62%) 'Q'", "(71.30%) 'C'", "(68.84%) 'G'" ),
    ( "(89.18%) '1'" ),
    ( "(85.36%) '0'", "(77.56%) 'O'" ),
    ( "(86.12%) '1'" ),
    ( "(81.99%) '0'", "(74.86%) 'O'", "(70.67%) 'Q'", "(68.59%) 'B'", "(68.47%) 'C'" ),
    ( "(85.11%) '0'", "(76.91%) 'O'", "(71.51%) 'Q'" ),
    ( "(94.15%) 'M'" ),
    ( "(88.53%) '1'" ),
    ( "(85.22%) '7'" ),
    ( "(80.44%) '0'", "(76.15%) 'O'", "(69.74%) 'Q'", "(69.29%) 'C'", "(67.53%) 'B'" ),
    ( "(88.68%) '2'" ),
    ( "(85.94%) '0'", "(75.14%) 'B'", "(71.71%) 'O'" ),
    ( "(76.29%) '9'" ),
    ( "(89.28%) '1'" ),
    ( "(94.65%) 'E'" ),
    ( "(86.10%) 'S'", "(77.95%) '5'" ),
    ( "(92.35%) 'T'" ),
    ( "(81.21%) '<'" ),
    ( "(76.13%) '<'" ),
    ( "(83.40%) '<'" ),
    ( "(85.28%) '<'" ),
    ( "(85.74%) '<'" ),
    ( "(83.62%) '<'" ),
    ( "(83.62%) '<'" ),
    ( "(81.84%) '<'" ),
    ( "(80.28%) '<'" ),
    ( "(82.61%) '<'" ),
    ( "(85.72%) '<'" ),
    ( "(91.66%) '2'" ),

    ( "(82.86%) 'S'", "(79.72%) '5'" ),
    ( "(87.99%) 'P'" ),
    ( "(90.25%) 'E'", "(75.38%) 'B'" ),
    ( "(73.48%) 'C'", "(63.71%) 'E'" ),
    ( "(85.36%) 'I'" ),
    ( "(92.14%) 'M'" ),
    ( "(92.45%) 'E'" ),
    ( "(93.64%) 'N'", "(79.42%) 'M'" ),
    ( "(73.11%) '<'" ),
    ( "(72.99%) '<'" ),
    ( "(90.35%) 'A'" ),
    ( "(86.72%) 'N'" ),
    ( "(92.94%) 'D'" ),
    ( "(85.07%) 'R'" ),
    ( "(94.44%) 'E'" ),
    ( "(88.69%) 'W'" ),
    ( "(83.70%) '<'" ),
    ( "(80.63%) '<'" ),
    ( "(75.83%) '<'" ),
    ( "(81.21%) '<'" ),
    ( "(84.20%) '<'" ),
    ( "(84.55%) '<'" ),
    ( "(83.27%) '<'" ),
    ( "(83.06%) '<'" ),
    ( "(81.36%) '<'" ),
    ( "(81.34%) '<'" ),
    ( "(78.78%) '<'" ),
    ( "(80.69%) '<'" ),
    ( "(85.49%) '<'" ),
    ( "(82.61%) '<'" )
    )

The first row is all NULL.

The problem seems to be the double "1"s on the first row. Using tessedit_dump_choices I can see that two words are present on the first row, and only one word on each of the other rows. As the character "1" is narrow, two in a row produce a large gap, so it is quite natural for it to be deemed a space. However, when I use two words with a proper space between them, the choiceIterator functions as expected again. It seems as if the gap is too large, but also too narrow?

Any ideas how to solve it? Can I force Tesseract to treat each row as a single word, perhaps?
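
For context, here is a minimal sketch of the kind of correction I want to do with the choices once they are available for every row. It is plain Python, not my actual app code. The names `check_digit` and `repair_field` and the list-of-(confidence, character) layout are made up for illustration. The only thing taken from outside this post is the standard MRZ check digit rule from ICAO Doc 9303 (weights 7, 3, 1 repeating, digits as their value, A-Z as 10-35, '<' as 0, sum modulo 10). The sample data is the date-of-birth field from row 2 of the dump in this post.

```python
from itertools import product

DIGITS = "0123456789"


def char_value(c):
    """Numeric value of one MRZ character for check digit purposes."""
    if c in DIGITS:
        return int(c)
    if c == "<":
        return 0
    if "A" <= c <= "Z":
        return ord(c) - ord("A") + 10
    raise ValueError("not an MRZ character: %r" % c)


def check_digit(field):
    """MRZ check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    weights = (7, 3, 1)
    return sum(char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def repair_field(choices, expected_digit):
    """Pick the most confident reading whose check digit matches.

    choices: one list per character, each holding (confidence, char) pairs
    (hypothetical layout, mirroring the choiceIterator dump).
    Returns the best matching string, or None if no combination validates.
    """
    best = None
    for combo in product(*choices):
        text = "".join(ch for _, ch in combo)
        if check_digit(text) != expected_digit:
            continue
        score = sum(conf for conf, _ in combo)
        if best is None or score > best[0]:
            best = (score, text)
    return best[1] if best else None


# Date-of-birth field (first six characters of row 2 in the dump).
birth_date_choices = [
    [(81.30, "8")],
    [(82.51, "0"), (75.10, "B"), (71.87, "O"), (71.62, "Q"), (71.30, "C"), (68.84, "G")],
    [(89.18, "1")],
    [(85.36, "0"), (77.56, "O")],
    [(86.12, "1")],
    [(81.99, "0"), (74.86, "O"), (70.67, "Q"), (68.59, "B"), (68.47, "C")],
]

# The seventh character of row 2 is the check digit for this field ('0').
print(repair_field(birth_date_choices, 0))
```

This brute-force search is only practical for short fields with few alternatives per character, and a matching check digit does not prove the reading is right. With no choices at all for a row, as in the first row here, there is nothing to search, which is why the missing row matters.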

