Ah! Thanks for the heads-up; that probably saved me a lot of time. I'll definitely have a look at OpenCV text detection and Cloud Vision. I really appreciate the tips.
On Mon, 13 Nov 2023 at 17:14, Tom Morris <[email protected]> wrote:

> On Monday, November 13, 2023 at 5:35:20 AM UTC-5 [email protected] wrote:
>
>> Yeah, it seems page segmentation is the crucial issue. If the bounding
>> boxes are good, the recognition is usually very good.
>>
>> I think I've sort of reached the limit of what I can do with base
>> Tesseract. I think the next step would be custom training / fine-tuning.
>
> Tesseract's page layout analysis / segmentation isn't training-based, so I
> don't think this is going to help you. If you wanted to recognize the C/L
> glyph, you could do fine-tuning training for it, but it's not going to help
> you with the problem of finding rotated text and accurately determining
> bounding boxes for text of interest.
>
> It's been ages since I've done serious image processing, but I'd recommend
> looking at something like OpenCV's text detection:
> https://docs.opencv.org/4.8.0/d4/d43/tutorial_dnn_text_spotting.html
>
> Aspirationally, you can get some idea of what's possible by playing with
> Google's Cloud Vision API demo:
> https://cloud.google.com/vision/docs/drag-and-drop
>
> It lets you just drag & drop an image and then inspect the results both
> visually and via the JSON that the API produces.
>
> Good luck!
>
> Tom
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/3a6e0271-db4b-4624-bada-51167dd6d744n%40googlegroups.com
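Tom's point about rotated text is largely a geometry problem: text detectors of the kind covered in the OpenCV tutorial linked in this thread typically report each word as a rotated quadrilateral, whereas an upright crop around a tilted word also picks up background and neighbouring text. The following is a minimal sketch in plain Python (no OpenCV) of that geometry. The helper names `rotated_box_corners`, `text_angle` and `axis_aligned_bbox`, the corner ordering, and the example box are all hypothetical and illustrative, not OpenCV's or Tesseract's API.

```python
import math
from typing import List, Tuple

# Hypothetical helpers for illustration only; not an OpenCV or Tesseract API.
Point = Tuple[float, float]


def rotated_box_corners(cx: float, cy: float, w: float, h: float,
                        angle_deg: float) -> List[Point]:
    """Four corners of a w x h box centred at (cx, cy), rotated by angle_deg.

    A positive angle turns the +x axis towards +y; in image coordinates
    (y pointing down) that appears clockwise on screen. Corners are returned
    in the order top-left, top-right, bottom-right, bottom-left of the
    unrotated box (an assumed convention).
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Standard 2-D rotation of each offset, then translation to the centre.
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in offsets]


def text_angle(corners: List[Point]) -> float:
    """Angle in degrees of the edge from corner 0 to corner 1.

    Under the assumed corner order this is the baseline direction of the text.
    """
    (x0, y0), (x1, y1) = corners[0], corners[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def axis_aligned_bbox(corners: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest upright (x, y, width, height) box containing every corner."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


if __name__ == "__main__":
    # Hypothetical word box: 80 x 20 px, centred at (100, 50), tilted 30 degrees.
    quad = rotated_box_corners(100, 50, 80, 20, 30)
    print(f"angle = {text_angle(quad):.1f} deg")
    x, y, bw, bh = axis_aligned_bbox(quad)
    print(f"upright box = ({x:.1f}, {y:.1f}, {bw:.1f}, {bh:.1f})")
```

For a w x h box tilted by θ, the upright box measures w|cos θ| + h|sin θ| by w|sin θ| + h|cos θ|, so even a modest tilt produces a much larger crop than the word itself. Rotating each crop by the measured angle before handing it to Tesseract is one common workaround, but that is an assumption about the pipeline, not something tested in this thread.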

