date:20210130

[tesseract-ocr] Re: New release for tessdata_{fast,best}?

2021-01-30 Thread Tom Morris

On Wednesday, January 27, 2021 at 5:28:27 AM UTC-5 Merlijn Wajer wrote:
>
> The Internet Archive has switched to using Tesseract for all our OCR,

That's great to hear! It's certainly been a long time coming. Nick White & I tried to get this to happen 7 years ago and even volunteered to help,

[tesseract-ocr] Re: Digits reading optimalisation.

2021-01-30 Thread Владимир Калачихин

Digits are included in the language model together with letters, and the model is trained mostly for recognizing phrases, not separate digits. Mistakes on digits are unavoidable.

On Saturday, January 30, 2021 at 7:12:39 PM UTC+3, Benek wrote:
> I still need to read the dot in the correct place which makes it a bit
> harder. So you
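Since digit mistakes cannot be avoided inside the recognizer itself, one common workaround (not something proposed in this thread) is to post-process the OCR string when the field is known to be numeric. Below is a minimal Python sketch of that idea. The `clean_digits` helper, its confusion table and the `decimals` parameter are all hypothetical; it assumes the number of decimal places is known in advance, so the dot is re-placed by position rather than trusted from the OCR output.

```python
import re
from typing import Optional

# Hypothetical confusion table: characters a mixed letter/digit model may
# emit where a digit (or decimal point) was meant. Tune it on your own data.
CONFUSIONS = str.maketrans({
    "O": "0", "o": "0", "l": "1", "I": "1", "|": "1",
    "S": "5", "B": "8", "Z": "2", ",": ".",
})

DIGITS = re.compile(r"[0-9]+")
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")  # digits, optional single decimal part


def clean_digits(raw: str, decimals: Optional[int] = None) -> Optional[str]:
    """Normalize an OCR'd numeric field, or return None if it cannot be trusted.

    If `decimals` is given, any dots the OCR produced are discarded and a
    single dot is re-inserted that many digits from the right.
    """
    text = raw.strip().replace(" ", "").translate(CONFUSIONS)
    if decimals is not None:
        digits = text.replace(".", "")
        # Require at least one digit before the re-inserted dot.
        if not DIGITS.fullmatch(digits) or len(digits) <= decimals:
            return None
        cut = len(digits) - decimals
        text = digits[:cut] + ("." + digits[cut:] if decimals else "")
    return text if NUMBER.fullmatch(text) else None


print(clean_digits("12O.5"))              # 120.5
print(clean_digits("123.4", decimals=2))  # 12.34
print(clean_digits("12.3.4"))             # None
```

Returning `None` instead of guessing lets the caller route doubtful fields to manual review; the confusion table only helps if it matches the errors your model actually makes.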

[tesseract-ocr] Re: Digits reading optimalisation.

2021-01-30 Thread Владимир Калачихин

Heh. It's an old issue. For 100% accuracy you must use a digits-only language model, but there is no such model. Besides, a trivial perceptron shows good results on digit recognition.

On Saturday, January 30, 2021 at 6:41:13 PM UTC+3, Benek wrote:
> Hello! I'm trying to read some digits and I thought it was
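Below is a minimal sketch of the kind of "trivial perceptron" mentioned above. It is a plain multi-class perceptron in Python, not anything shipped with Tesseract, and it assumes hypothetical 3x5 binary bitmaps (`GLYPHS`) in place of real segmented glyph crops.

```python
from typing import List, Sequence, Tuple

# Hypothetical 3x5 bitmaps for the digits 0-9 (row-major, 1 = ink).
# Real glyphs would come from segmented, binarized and resized crops.
GLYPHS = {
    0: "111101101101111",
    1: "010110010010111",
    2: "111001111100111",
    3: "111001111001111",
    4: "101101111001001",
    5: "111100111001111",
    6: "111100111101111",
    7: "111001001001001",
    8: "111101111101111",
    9: "111101111001111",
}


def features(bits: str) -> List[int]:
    """Pixel vector plus a constant 1 that acts as the bias input."""
    return [int(b) for b in bits] + [1]


def predict(weights: Sequence[Sequence[int]], x: Sequence[int]) -> int:
    """Return the class whose weight vector scores highest (ties -> lowest class)."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def train(samples: List[Tuple[List[int], int]],
          n_classes: int = 10, max_epochs: int = 1000) -> List[List[int]]:
    """Multi-class perceptron: on each mistake, add the input to the true
    class's weights and subtract it from the wrongly predicted class's."""
    dim = len(samples[0][0])
    weights = [[0] * dim for _ in range(n_classes)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in samples:
            guess = predict(weights, x)
            if guess != label:
                for i, x_i in enumerate(x):
                    weights[label][i] += x_i
                    weights[guess][i] -= x_i
                mistakes += 1
        if mistakes == 0:  # every training glyph is classified correctly
            break
    return weights


samples = [(features(bits), digit) for digit, bits in GLYPHS.items()]
weights = train(samples)
```

Because the ten bitmaps are distinct binary vectors, they are linearly separable, so the perceptron is guaranteed to stop making mistakes on them eventually. On real scans the quality depends far more on segmentation and normalization than on the classifier itself.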