Re: [tesseract-ocr] Help for training tesseract to recognize a new (dead) language

2018-05-29 Thread ShreeDevi Kumar
Please see https://github.com/OCR-D/ocrd-train. You can use it with image files and matching ground truth text in UTF-8. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, May 29, 2018 at 9:57 PM, wrote: > Hi, >
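Since the advice above depends on every image having a matching UTF-8 transcription, a quick check of the training folder before starting may save time. This is only a minimal sketch: the `.tif` and `.gt.txt` suffixes and the function name `find_unpaired` are assumptions for illustration, not something stated in the message, so adjust them to whatever the training tool actually expects.

```python
from pathlib import Path


def find_unpaired(folder, image_suffix=".tif", text_suffix=".gt.txt"):
    """Return (images without a transcription, transcriptions that are not UTF-8).

    The suffixes are assumptions for illustration; change them to match
    the layout your training tool expects.
    """
    missing, bad_encoding = [], []
    for image in sorted(Path(folder).glob("*" + image_suffix)):
        # Swap the image suffix for the text suffix to find the partner file.
        text = image.with_name(image.name[: -len(image_suffix)] + text_suffix)
        if not text.is_file():
            missing.append(image.name)
            continue
        try:
            text.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad_encoding.append(text.name)
    return missing, bad_encoding
```

Calling `find_unpaired("path/to/ground-truth")` returns two lists of file names; both should be empty before training starts.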

[tesseract-ocr] Help for training tesseract to recognize a new (dead) language

2018-05-29 Thread ramast . com
Hi, I belong to a group that studies an old Egyptian writing system called "Coptic". It's based mostly on Greek (with some variation). The great majority of books written in Coptic were produced during the last century, mostly in the same [typewriter] font. Here is a sample picture:

[tesseract-ocr] Re: Training error "Couldn't find a matching blob"

2018-05-29 Thread Paul Kitchen
I'm actually training with several other TIFF images which contain the "Circle M" symbol (uppercase M inside a circle). In all cases, tesseract reports the error message "Couldn't find a matching blob". So I think the issue is something fundamental with the algorithm rather than just an

Re: [tesseract-ocr] Help for training tesseract to recognize a new (dead) language

2018-05-29 Thread ShreeDevi Kumar
See http://www.moheb.de/ocr.html. It provides a traineddata file for Coptic for use with tesseract version 3. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Tue, May 29, 2018 at 9:57 PM, wrote: > Hi, > I belong to a

[tesseract-ocr] pre-trained number plate database

2018-05-29 Thread Reiner Richter
I'm looking at possibly using tesseract 4 for license plate recognition. Currently I've downloaded the trained English database from https://github.com/tesseract-ocr/tessdata/ and limited the results to digits and capital letters only. It still isn't very good, but that's probably because
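One common way to restrict output to digits and capital letters is the `tessedit_char_whitelist` config variable, passed with `-c` on the command line. The sketch below only builds the argument list; the helper name `plate_command`, the image name, and the `stdout` output base are placeholders, and it is an assumption that the poster used this variable. Whether the whitelist is honored can depend on the engine and tesseract version, so treat this as something to verify rather than a guaranteed fix.

```python
import string


def plate_command(image, output_base="stdout", lang="eng"):
    """Build a tesseract command line limited to A-Z and 0-9.

    The whitelist is passed as a config variable via -c; nothing is
    executed here, so the list can be inspected before running it.
    """
    allowed = string.ascii_uppercase + string.digits
    return [
        "tesseract", image, output_base,
        "-l", lang,
        "-c", "tessedit_char_whitelist=" + allowed,
    ]
```

The resulting list can be handed to `subprocess.run(plate_command("plate.png"), check=True)` on a machine where tesseract is installed.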

Re: [tesseract-ocr] Some spaces are not recognized

2018-05-29 Thread ShreeDevi Kumar
Set the config variable "preserve_interword_spaces" to 1 and to 0 for different runs, and see if that makes any difference. On Tue 29 May, 2018, 4:30 PM ShreeDevi Kumar, wrote: > >The traineddata from tesseract does not have a spacing problem, > > Then the problem is related to training. > > > > >
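The two runs suggested above could be set up roughly as follows. This is a sketch under assumptions: the helper name `spacing_commands`, the output base names, and the default language `eng` are placeholders (the thread does not say which traineddata is in use), and the commands are only built here, not executed.

```python
def spacing_commands(image, lang="eng"):
    """Build one tesseract command per value of preserve_interword_spaces.

    Each run writes to its own output base (out_spaces_0, out_spaces_1)
    so the two results can be compared afterwards.
    """
    return {
        value: [
            "tesseract", image, f"out_spaces_{value}",
            "-l", lang,
            "-c", f"preserve_interword_spaces={value}",
        ]
        for value in (0, 1)
    }
```

Running both commands and diffing `out_spaces_0.txt` against `out_spaces_1.txt` shows whether the variable changes the spacing at all.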

Re: [tesseract-ocr] Some spaces are not recognized

2018-05-29 Thread Sumedhe Dissanayake
On Friday, May 18, 2018 at 6:32:44 PM UTC+5:30, shree wrote: > > image is not visible. > > ShreeDevi > > भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com > > On Fri, May 18, 2018 at 5:39 PM, Sumedhe Dissanayake < >

Re: [tesseract-ocr] Some spaces are not recognized

2018-05-29 Thread ShreeDevi Kumar
>The traineddata from tesseract does not have a spacing problem, Then the problem is related to training. On Tue 29 May, 2018, 4:16 PM Sumedhe Dissanayake, < sumedhedissanay...@gmail.com> wrote: > > > On Friday, May 18, 2018 at 6:32:44 PM UTC+5:30, shree wrote: >> >> image is not visible. >>