Yeah, that could be true.  But I'm still trying to figure out *where* in 
the code to put any new segmentation and glyph identification.

BTW, to generate additional training data, I wrote a program on the Mac to 
scrape text from the subtitle images.  The resulting OCR output from Apple's 
Vision framework is leagues ahead of Tesseract's.  Too bad it isn't open 
source and doesn't run on Linux.
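For anyone curious, the core of the Mac program is just a Vision text-recognition request.  A minimal sketch (assuming macOS 10.15+; the function name and image-loading path are my own, not part of the framework):

```swift
import Vision
import AppKit

// Minimal sketch: run Apple's Vision text recognition on a subtitle
// image and return the recognized lines, top candidate per observation.
func recognizeText(in url: URL) throws -> [String] {
    // Load the image and get a CGImage for the Vision handler.
    guard let image = NSImage(contentsOf: url),
          let cgImage = image.cgImage(forProposedRect: nil,
                                      context: nil, hints: nil) else {
        return []
    }
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // slower, but far better than .fast
    request.usesLanguageCorrection = true  // helps on low-contrast subtitles

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    // Each observation is one text region; take its best candidate string.
    return (request.results ?? []).compactMap {
        $0.topCandidates(1).first?.string
    }
}
```

Each `VNRecognizedTextObservation` also carries a bounding box and a confidence, which is handy if you want to pair the recognized strings back with glyph positions for training data.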

On Thursday, August 8, 2024 at 1:27:59 AM UTC+8 [email protected] wrote:

> On Monday, August 5, 2024 at 8:15:27 PM UTC-4 Danny wrote:
>
>
> So, I'm thinking the issue is with the preprocessing, segmentation, and 
> glyph identification more than the model itself.  
>
>
> I agree with that, and I suspect you can do a better job of line 
> segmentation than Tesseract can, since you have more information available 
> to you about font size, context, etc.
>
> Tom
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/55adb88d-8a3e-4dd5-b5ac-3b64a4975a31n%40googlegroups.com.