Hi Nikola,

I suggest you don't try training it. Training is mostly for adding
new languages, or at least significantly different fonts. As your
input is English in a common font, I doubt it would help much over
the standard English training file.

The results I got from running Tesseract 3 on your sample were
pretty good, though. I'll attach them here. Using -psm 6 (treat the
page as a single uniform block of text) made a big improvement, as it
meant the table cells ended up on the correct row. So I ran:

  tesseract ocr1.png outtest2 -psm 6

The remaining problems in the output are that 7 is consistently
recognised as ?, and m is regularly misrecognised as r'n or r‘n. I have
suggestions for this.

If your input data will never contain a ?, create an ambig rule that
always changes a ? to a 7 (and similarly for the r'n issues). The best
way to do this would be:

1) unpack the English training data:

  combine_tessdata -u eng.traineddata eng.

2) add the following lines to the end of eng.unicharambigs (the fields
   must be separated by tabs):

1       ?       1       7       1
3       r ' n   1       m       1
3       r ‘ n   1       m       1

3) recombine the training data:

  combine_tessdata eng.

The eng.traineddata file will then contain the extra ambig rules.
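
In case it helps, here is a rough sketch of step 2 as a shell function.
It writes the fields with real tab characters, which are easy to lose
when copying rules out of an email. The function name append_ambigs is
made up for illustration, and it assumes eng.unicharambigs has already
been unpacked as in step 1.

```shell
#!/bin/sh
# append_ambigs (hypothetical helper): append the three ambig rules
# from this message to an unpacked unicharambigs file.
append_ambigs() {
    ambigs_file=$1

    # If the file does not end with a newline, add one so the first
    # new rule does not get glued onto the last existing line.
    if [ -n "$(tail -c 1 "$ambigs_file")" ]; then
        printf '\n' >> "$ambigs_file"
    fi

    # Each rule: count, source chars, count, replacement, type,
    # separated by tabs. Type 1 marks a mandatory substitution.
    printf '1\t?\t1\t7\t1\n' >> "$ambigs_file"
    printf "3\tr ' n\t1\tm\t1\n" >> "$ambigs_file"
    printf '3\tr ‘ n\t1\tm\t1\n' >> "$ambigs_file"
}

# Intended use between steps 1 and 3 (assumes combine_tessdata is on
# your PATH and eng.traineddata is in the current directory):
#   combine_tessdata -u eng.traineddata eng.
#   append_ambigs eng.unicharambigs
#   combine_tessdata eng.
```

It is worth keeping a copy of the original eng.traineddata before
recombining, so you can compare the output with and without the rules.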

Hope this helps, and let us know how you get on.

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
04-Jan-2012 00:22
Ward:
Physician:
Operator:
Total r'nAs 1088? Total DLP 1206 r'nGycr‘n
Scan l<\-f r'nAs I ref. CTDlvol DLP Tl :SL
r'nGy r'nGycr‘n s mm
PatientPosition F-SP
Topograrn 1 120 36 mA 5.3 0.6
Thorax 2 120 50 3.3? 140 0.5 0.6
Topograrn 3 120 36 mA 5.3 0.6
F|_CaSc 4D 120 66 I S0 1.00 24 0.20 0.6
Premonitoring 5 100 42 1.2? 1 0.20 10.0
Premonitoring 6 100 42 1.2? 1 0.20 10.0
Premonitoring T 100 42 1.2? 1 0.20 10.0
Contrast
Monitoring S 100 42 12.?3 13 0.20 10.0
DS_CorCTA 10D 100 320 50.64 1010 0.20 0.6
Medium Type Iodine Conc. Volume Flow CM Ratio
mgfml ml mlfs
Contrast Ultravist 3?0 S0 5.5 100%
Saline 40 5.5
