Ah! Thanks for the heads-up; that probably saved me a lot of time. I'll
definitely have a look at OpenCV text detection and Cloud Vision. I really
appreciate the tips.
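
For anyone following along, here is a rough sketch of how the JSON from the
Cloud Vision demo might be inspected for rotated text. It assumes the response
carries a "textAnnotations" list whose entries have a "description" and a
"boundingPoly" with four "vertices", and that the vertices are ordered along
the reading direction of the text (first edge = top edge of the text). The
sample data and the helper name text_boxes are made up for illustration; this
is not real API output and not a definitive implementation.

```python
import json
import math

# Hypothetical, hand-written fragment shaped like the "textAnnotations" part
# of a Cloud Vision text detection response. Not real API output.
SAMPLE = """
{
  "textAnnotations": [
    {"description": "C/L",
     "boundingPoly": {"vertices": [
        {"x": 100, "y": 100}, {"x": 100, "y": 160},
        {"x": 80, "y": 160}, {"x": 80, "y": 100}]}},
    {"description": "12.5",
     "boundingPoly": {"vertices": [
        {"y": 10}, {"x": 40, "y": 10},
        {"x": 40, "y": 25}, {"y": 25}]}}
  ]
}
"""


def text_boxes(response):
    """Yield (text, angle_in_degrees, vertices) for each annotation."""
    for ann in response.get("textAnnotations", []):
        # Zero-valued coordinates may be left out of the JSON, so default to 0.
        pts = [(v.get("x", 0), v.get("y", 0))
               for v in ann["boundingPoly"]["vertices"]]
        (x0, y0), (x1, y1) = pts[0], pts[1]
        # Angle of the first edge in image coordinates (y points down).
        # Assumption: the first edge runs along the top of the text.
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
        yield ann["description"], angle, pts


boxes = list(text_boxes(json.loads(SAMPLE)))
for text, angle, pts in boxes:
    print(f"{text!r}: first edge at {angle:.0f} degrees, box {pts}")
```

An angle far from 0 would suggest the region needs to be rotated upright
before it is cropped and handed to Tesseract.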

On Mon, Nov 13, 2023 at 17:14, Tom Morris <[email protected]> wrote:

>
>
> On Monday, November 13, 2023 at 5:35:20 AM UTC-5 [email protected] wrote:
>
>
> Yeah it seems page segmentation is the crucial issue. If the bounding
> boxes are good, the recognition is usually very good.
>
> I think I've sort of reached the limit on what I can do with base
> Tesseract. I think the next step would be custom training / fine-tuning.
>
>
> Tesseract's page layout analysis / segmentation isn't training-based, so I
> don't think this is going to help you. If you wanted to recognize the C/L
> glyph, you could do fine-tuning for it, but that's not going to help
> you with the problem of finding rotated text and accurately determining
> bounding boxes for text of interest.
>
> It's been ages since I've done serious image processing, but I'd recommend
> looking at something like OpenCV's text detection:
> https://docs.opencv.org/4.8.0/d4/d43/tutorial_dnn_text_spotting.html
>
> Aspirationally, you can get some idea of what's possible by playing with
> Google's Cloud Vision API demo
> https://cloud.google.com/vision/docs/drag-and-drop
>
> It lets you just drag & drop an image and then inspect the results both
> visually and via the JSON that the API produces.
>
> Good luck!
>
> Tom
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/3a6e0271-db4b-4624-bada-51167dd6d744n%40googlegroups.com.
>
