I think part of the problem is that the network does not expect a mix of letters and digits. It processes the text as a continuous stream rather than as individual characters, which works well for ordinary text but not for codes.
So if you want to fine-tune, you need to provide similar mixed sequences as training data. Also, if possible, try to use bigger text: here it is 13px, and something between 30px and 50px should work better. Preprocessing (or generating) the image so that it is a high-contrast black/white image might also help (not a binary threshold, just a little more contrast). If you can choose which font to use, try a few different ones. Of course, if the structure of the codes is regular, you can simply replace S with 5 in the output.

Lorenzo

On Sun, 28 Apr 2019 at 16:49, RangerRick <[email protected]> wrote:

> Ok. Now I have tried the "best" traindata file (no difference) and
> removing the alpha layer (no difference). I even created a new, simpler
> bitmap using Courier New font (attached), which still fails.
>
> Tesseract just can't distinguish between the number 5 and an S.
>
> On Sunday, April 28, 2019 at 12:41:35 AM UTC-5, RangerRick wrote:
>>
>> Hi,
>>
>> I'm new to Tesseract, using latest version 4 executable on Windows 7.
>>
>> I'm converting Morse code CW from JPG into text using Tesseract. It works
>> almost right, just missing on the number 5, which is usually misinterpreted
>> as an "S". Here's an example of the issue.
>>
>> [image: output.jpg]
>>
>> Here's how it's being interpreted:
>>
>> 3AMWA
>> DE FASMX QFSMXQ CQ CQ DE FSMXQ FSMXQ CQ DE FSMXQ ENSMAA I III FSMXQ FSMXQ
>> NHE K »
>>
>> I have tried adjusting the various command line parameters but no joy. I
>> believe the font is Fontcraft Courier DemiBold, but shouldn't matter. In
>> this case, the image is 96 DPI and 24 pixels tall (total, including border).
>>
>> I started to try and retrain to optimize for this font, but that looks
>> like a pretty daunting task.
>>
>> Any guidance would be greatly appreciated.
>>
>> Rick
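As a rough illustration of the last suggestion above (replacing S with 5 when the codes have a regular structure), here is a minimal Python sketch that post-processes the OCR text. It assumes, purely for illustration, that a call sign is 1-2 prefix letters, one digit, and 2-3 suffix letters, so a token of that shape with an "S" in the digit slot is presumably a misread 5. The function name `fix_callsigns` and the pattern are hypothetical and not part of Tesseract; ordinary words with the same shape (e.g. "LASER") would also be rewritten, so the pattern would need tightening for real traffic.

```python
import re

# Assumed call-sign shape (an illustration, not a rule from the thread):
# 1-2 prefix letters, one digit, 2-3 suffix letters.
# A whole token with that shape but "S" where the digit belongs
# is treated as a misread "5".
MISREAD_CALLSIGN = re.compile(r"\b([A-Z]{1,2})S([A-Z]{2,3})\b")


def fix_callsigns(text: str) -> str:
    """Replace the 'S' in the digit position of call-sign-like tokens with '5'."""
    return MISREAD_CALLSIGN.sub(r"\g<1>5\g<2>", text)


if __name__ == "__main__":
    ocr_output = "CQ CQ DE FSMXQ FSMXQ CQ"
    print(fix_callsigns(ocr_output))  # prints "CQ CQ DE F5MXQ F5MXQ CQ"
```

Short tokens such as "DE", "CQ" and "K" are left alone because they do not fit the assumed shape.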

