OK, I have now tried the "best" traineddata file (no difference) and removing the alpha channel from the image (no difference). I even created a new, simpler bitmap using the Courier New font (attached), which still fails.
Tesseract just can't distinguish between the number 5 and an S.

On Sunday, April 28, 2019 at 12:41:35 AM UTC-5, RangerRick wrote:
> Hi,
>
> I'm new to Tesseract, using latest version 4 executable on Windows 7.
>
> I'm converting Morse code CW from JPG into text using Tesseract. It works almost right, just missing on the number 5, which is usually misinterpreted as an "S". Here's an example of the issue.
>
> [image: output.jpg]
>
> Here's how it's being interpreted:
>
> 3AMWA DE
> FASMX QFSMXQ CQ CQ DE FSMXQ FSMXQ CQ DE FSMXQ ENSMAA I III FSMXQ FSMXQ
> NHE K »
>
> I have tried adjusting the various command line parameters but no joy. I believe the font is Fontcraft Courier DemiBold, but shouldn't matter. In this case, the image is 96 DPI and 24 pixels tall (total, including border).
>
> I started to try and retrain to optimize for this font, but that looks like a pretty daunting task.
>
> Any guidance would be greatly appreciated.
>
> Rick

