Have you tried training with images made by TRDG (https://textrecognitiondatagenerator.readthedocs.io/en/latest/overview.html)? I recently mentioned this tool in this forum because it seems to produce more realistic images than text2image, the default tool that ships with tesseract. TRDG also gives you more control over the image output than text2image does, so your synthetic data could be tuned to resemble the bank checks much more closely.
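Here is a rough sketch of how the TRDG route could look in Python, in case it helps. It is only an illustration under several assumptions: the function names are made up, the line layout (routing number, account number, check number) is simplified, the E-13B special symbols are written as the letters A and C (adjust this to whatever your ground truth uses), and the font path must point to a MICR font that you supply yourself. The corpus part uses only the standard library; the rendering part needs `pip install trdg` and relies on the `GeneratorFromStrings` class from the TRDG documentation.

```python
import os
import random

# Assumption: the E-13B transit and on-us symbols are transcribed as letters.
# Change these to match the transcription used in your own ground truth.
TRANSIT, ON_US = "A", "C"


def random_digits(rng, n):
    """Return n random decimal digits as a string."""
    return "".join(rng.choice("0123456789") for _ in range(n))


def random_micr_line(rng):
    """Build one simplified MICR-like line: routing, account, check number."""
    routing = TRANSIT + random_digits(rng, 9) + TRANSIT
    account = random_digits(rng, rng.randint(8, 12)) + ON_US
    check_no = random_digits(rng, 4)
    return f"{routing} {account} {check_no}"


def make_micr_corpus(count, seed=0):
    """Generate `count` lines; a fixed seed makes the corpus reproducible."""
    rng = random.Random(seed)
    return [random_micr_line(rng) for _ in range(count)]


def render_with_trdg(strings, font_path, out_dir):
    """Render strings with TRDG and write .png / .gt.txt pairs to out_dir.

    font_path is a hypothetical path to an E-13B (MICR) font file.
    """
    from trdg.generators import GeneratorFromStrings  # third-party package

    os.makedirs(out_dir, exist_ok=True)
    generator = GeneratorFromStrings(
        strings,
        count=len(strings),
        fonts=[font_path],
        size=48,            # image height in pixels
        skewing_angle=2,    # small random skew, like a scanned check
        random_skew=True,
        blur=1,
        random_blur=True,
        background_type=0,  # 0 = Gaussian noise background
    )
    for i, (image, label) in enumerate(generator):
        if image is None:   # skip samples TRDG failed to render
            continue
        image.save(os.path.join(out_dir, f"micr_{i:05d}.png"))
        with open(os.path.join(out_dir, f"micr_{i:05d}.gt.txt"), "w") as fh:
            fh.write(label + "\n")


if __name__ == "__main__":
    # Print a few sample lines; call render_with_trdg() once you have a font.
    for line in make_micr_corpus(3):
        print(line)
```

The skew, blur, and background values are starting points to adjust by comparing the output against real check scans.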
As for detecting and removing the signatures, OpenCV is probably the best tool you can have. But I have no clue how it works, so I cannot help there. OpenCV has a steep learning curve; I tried to learn it once, but it was not a good fit for me. If the signatures appear at the same coordinates (place) across your images, other tools can also be programmed to crop them out.

On Tuesday, November 14, 2023 at 5:46:22 PM UTC+3 Keith Smith wrote:

> The short answer is "no", but a fuller answer is that my use case is a bit
> different from others and is as follows ...
>
> I trained tesseract to read the MICR line at the bottom of bank checks
> using only 20K checks (i.e. real data, not synthetic). I was able to get
> 85% accuracy where the reason for about 13% of the failures was that the
> person's signature overlapped the MICR line. If I could figure out a way
> to detect and remove the overlapping signature contours, then I think I
> would be able to reach 98% accuracy. Any suggestions? I don't know if
> tesseract would ever be able to do this alone.
>
> I also tried training tesseract from scratch using synthetic data but have
> not yet achieved the same accuracy. I think the problem is that the
> synthetic data doesn't simulate real data closely enough.
>
> On Tue, Nov 14, 2023 at 12:55 AM Des Bw <[email protected]> wrote:
>
>> It looks like every one is having issues with tesseract. I am not able to
>> find any one who has a great success with this software.
>> It would be really encouraging to hear any success story from
>> any language.
>>
>> Has anybody a successful training of tesseract?
>> (like, a model that can detect with higher accuracy: 98% or more ?)
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/6509904e-c308-49a6-99a6-a8fd4e4d67bfn%40googlegroups.com.

